[hashcash] Re: Hashcash performance improvements

  • From: Malcolm Howell <hashcash@xxxxxxxxxxxxxxxxxxx>
  • To: <hashcash@xxxxxxxxxxxxx>
  • Date: Sun, 30 May 2004 16:00:24 +0100 (BST)

> From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
> Date: Sun, 30 May 2004 03:47:37 +0100
> 
> > [..] to place some arbitrary limit on stamp size to protect against 
> > extremes - 5000 bytes for example.  This should be much more than
> > enough to satisfy any conceivable future expansion, but small enough
> > to restrict processing time and storage space to manageable
> > (negligible?) levels.

> I think this is fair.  Is there already a size limit on e-mail headers, 
> though?  If so, we should use that as a guide.

After some quick reading: SMTP (RFC 2821) places a 1000 character limit on
message lines, including the CRLF at the end, effectively making it a 998
character limit.  RFC 2822 copies this 998 as a hard limit on header line
length, plus it recommends restricting lines to 78 characters for ease of
display.  To allow for longer header fields it defines a folding mechanism,
so headers can ultimately be as long as you like, as long as they can be
broken across multiple lines - I think there was some discussion about
splitting hashcash stamps some time ago.

In case hashcash ever finds its way into an SMTP extension, there is a limit
of 512 characters on SMTP command lines although this can be extended by
"SMTP extensions", which I assume means some sort of negotiation between
client and server.

Well there are the guidelines.  In any case, a limit should be named.
Standards normally have arbitrary limits written into them, as without them
no computer with finite memory could ever claim to be standards compliant.

  [multiple sub-puzzles]
> How about if the secondary counters are the same length as the primary? 
>   Then they can be overlaid on the primary field, so there's less 
> padding involved - and leading zeroes could be omitted from the 
> secondaries.

I don't get what you're saying here.  How can the counters be the same
length and also be optimally aligned within blocks?  That is the sole reason
for padding in the first place.

> The above example could be recoded:
>      
> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416,5970,6119,7023

Also, you seem to be reading meaning into the leading zeros, which could
just as easily have been leading 'A's had I used base64 for the counter.
That said, an extra comma separator would sort this out, but would
effectively make the padding field compulsory:

1:10,4:040525:foo::c82a215126afb2c2:0000000000000000,416,5970,6119,7023

I'd like to rewind this whole discussion a bit though and point out that I
don't see the need for the sub-puzzles.  Is the only reason that it makes
minting time more predictable by reducing the variance?  I can't see much
value in that, although maybe I will when I'm desperate to send an email
quickly :-)

> >     - Common code assembles stamp including random,
> >     - Efficient mint function calculates counter.
> 
> I already do something like this, actually.  The random field is added 
> and the SHA-1 block is assembled by a central routine, which then looks 
> up a function pointer to the optimised minter in a table and calls it.  

Admittedly I hadn't read the code and was working from mailing list posts.

> The optimised minter then overwrites part of the pre-assembled SHA-1 
> block with the counter.

It's the overwriting of random that I find distasteful.  The sole purpose
of the minter should be to append a counter that works.  It doesn't even
need to care about the format of the string it's been given to work on.

> Now, admittedly, padding with random data does potentially consume more 
> entropy from the system's RNG,

Ah, this was one of my original points against, but I removed it as it
wasn't very convincing.  The random field can easily be padded with
non-random data, and who's going to prove you didn't generate it randomly?
This is a case where it's in the user's interest to make the field random,
as it decreases the probability of collision with someone else's stamp.
However, once there are 64 bits of entropy in there putting in more
doesn't add any value.

I'll sum up my position by saying that the minting function's job is to
append a counter to some string, without parsing it in any way, within the
restrictions of the character set and total length limit (both of which are
being discussed elsewhere).  Tricks such as padding which can be applied to
the counter string are fair game, modifying the string passed in is
definitely bad form.

  [cached values of W[16]..W[34]]
> Bear in mind that for a single-pipe routine running on a superscalar 
> processor, the XOR and shift calculations are essentially free, because 
> they can very easily be overlapped with the other half of each round 
> calculation.

A fair point.  I haven't implemented any of this yet myself so I'm not
familiar with what overlaps well.  Still, eliminating operations is
generally good as sometimes those "free" instructions can be used to get
another pipeline in.  And not every processor is superscalar, the vector
unit on a G4 (7450) effectively isn't as it only does one simple integer op
per cycle.

> > Is there any reason recipients would want to store just the random
> > field, as opposed to, say, a sha1 hash (or some similar hash) of the
> > entire stamp?

> I can't think of any good objection to that.

Neither can I.  I was just working on previous comments about the database
holding date, recipient and random field.  A hash of the whole thing could
work just as well (although the date would need to be stored separately
anyway, to allow expiration).

Also, if you're concerned about the leading zeros which are guaranteed to be
in the SHA-1 hash of a valid stamp, you could hash the stamp without
including the counter.

Regards,
Malcolm.


Other related posts: