[hashcash] Re: Hashcash performance improvements

  • From: Malcolm Howell <hashcash@xxxxxxxxxxxxxxxxxxx>
  • To: <hashcash@xxxxxxxxxxxxx>
  • Date: Sat, 29 May 2004 19:28:23 +0100 (BST)

(I should point out first that I get the list in digest format, hence the
slightly delayed bulk replies.)

> Date: Fri, 28 May 2004 17:39:58 -0400
> From: Adam Back <adam@xxxxxxxxxxxxxxx>

  [large random/counter fields]

> This makes a slight conflict between the sender's interests and the
> receiver's (hostile sender may send massive stamp which costs nothing
> extra practically to compute).  If we cared about this we would ideally
> want to enforce small stamps on sender (measured in SHA1 blocks) to
> make the receiver's verification fast.  Simple "enforcement" would be
> if it cost the sender disproportionately more.

As Jonathan has shown, the time taken to check a stamp is negligible;
however, there is still the matter of storage space.  Quoting Adam from
29th April:

| > Does the recipient even need to care about the random field?  
|
| He might care to enable him to store something compact in his double
| spend database.  He need only store the date, recipient, and random
| field.

A maliciously large random field could then be used to fill up somebody's
double spend database, or at least hurt performance if the DB is designed
on the assumption that it contains small data.  I think it might be fair
to place some arbitrary limit on stamp size to protect against extremes -
5000 bytes for example.  This should be much more than enough to satisfy
any conceivable future expansion, but small enough to restrict processing
time and storage space to manageable (negligible?) levels.
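A size limit like that is simple to enforce before any parsing or hashing
happens.  A minimal sketch (the names here are hypothetical, not from any
existing implementation):

```python
MAX_STAMP_BYTES = 5000   # the arbitrary ceiling suggested above

def accept_size(stamp: str) -> bool:
    """Reject maliciously oversized stamps up front, protecting both
    verification time and the double spend database."""
    return len(stamp.encode("utf-8")) <= MAX_STAMP_BYTES
```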

> One way to do this would be to put the counter at the beginning,
> however this is typically dangerous because then work can be
> pre-computed and shared across multiple recipients.

I don't think there's a significant problem here.  Carrying out some
precomputation once for many recipients instead of doing it once for
each recipient will only save around a second per many million recipients.

> In version 1 format so far I generate hex digits.  But I accept
> anything if I recall.  Need to decide what is clean / simple etc.  Ie
> if we say base64 only, then it doesn't matter that its less efficient
> because it will also be less efficient for spammers.

Personally I think it's best to choose some restricted character set in
order to avoid trouble with parsing.  Colons can't be permitted because
they are the field separators within a stamp.  Special shell characters
could be dangerous because they will mess up even simple manipulations
such as "% echo -n <stamp> | sha1".  Every so often a new method of
processing will cause another unusual character to become unsuitable.
All in all I think there should be a limited set of safe characters which
will never change.  Hex digits are one option; base64 is another.

That applies to both the random and counter fields, by the way.
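For illustration, a check against one such fixed safe alphabet might look
like this (the names are hypothetical, and the spec could equally pick
hex instead of base64):

```python
import string

# One possible "safe characters" convention: the base64 alphabet.
# No colons (field separators) and no shell metacharacters.
SAFE = set(string.ascii_letters + string.digits + "+/")

def field_ok(field: str) -> bool:
    """True if a random or counter field is non-empty and uses only
    characters from the fixed safe set."""
    return bool(field) and all(c in SAFE for c in field)
```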

> It might also help in some sense if we chose things which are simple
> to code efficiently, with no unrolled pre-computed tricks available to
> keep code simple if that makes same overhead for spammer and normal
> user and results in simpler, less unrolled pre-computed, funny-order
> counted, manipulated aligned field tricked etc.

> (So think about that also in your optimization -- what would help
> fairly prevent this optimization and keep things simpler -- we could
> adapt v1 spec.)

Interesting, I never thought of making optimisations impossible, only
taking advantage of them.  My opinion now is that some optimisation will
always be available to those who are willing to find tricks and create the
required spaghetti code.  Creating a system which cannot be optimised
would be about as difficult as creating a secure hash function in the
first place.

I think the stamp format is fine as it is and would not benefit from any
rearrangement simply to enforce minting times.  Remember we are only talking
about 10-15% improvements over an efficiently implemented SHA-1 function.
If hashcash is too easy to compute, this is equivalent to saying that
SHA-1 is broken.

> Izzy Kindred was also suggesting earlier that version 1 should support
> multiple sub-puzzles; if we do this it will make align at end of SHA1
> input block work less well.  As there will be multiple counters.

Simply align every counter by padding out to the next boundary:

1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416,
000000000000000000000000000000000000000000000000000000000005970,
000000000000000000000000000000000000000000000000000000000006119,
000000000000000000000000000000000000000000000000000000000007023

Line breaks added for readability.  The first portion (not including the
comma) is 55 bytes, then each counter adds 64 bytes == 1 block.  Best
computation time, horrible to look at though.  The extra puzzles just
added an incentive to add more padding.

(See comments below about padding counter rather than random.)
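The padding in that example can be computed mechanically.  A sketch,
assuming (as the example's lengths suggest) that each counter should end
9 bytes short of a block boundary, so SHA-1's mandatory padding (one 0x80
byte plus the 64-bit length) exactly fills the final block; the helper
name is made up:

```python
BLOCK = 64   # SHA-1 input block size in bytes
TAIL = 9     # minimum SHA-1 padding: 0x80 byte + 8-byte length field

def append_aligned_counter(stamp: str, counter: str) -> str:
    """Append a comma-separated counter, zero-padded so the result
    ends at byte 55 of a block, leaving exactly TAIL bytes for the
    hash's own padding."""
    total = len(stamp) + 1 + len(counter)       # +1 for the ','
    pad = (BLOCK - TAIL - total) % BLOCK
    return stamp + "," + "0" * pad + counter
```

Applied to the 55-byte first portion above with counter "5970", this
reproduces the 63-character padded field shown.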

> ------------------------------
>
> From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
> Date: Sat, 29 May 2004 01:46:33 +0100
>
>
> At the moment, I'm filling in random data until the following 
> conditions are satisfied:
> 
> - There are at least 16 characters of random (96 bits, as it's in 
> base-64).
> 
> - After a separator, an 8-character count field (enough for a 32-bit 
> count in base-16) won't straddle a block boundary.
>
> It would be easy enough to change the second condition to something 
> like "...will be aligned at the end of the block".

Having considered how to tie these fast minting routines together, and
looked over some other issues, here is what I think:

Padding should be added to the counter, not to the random field.  Where
you have defined the setup to be

        - Common code assembles stamp string,
        - Efficient mint function appends random and calculates counter,

a better way would be

        - Common code assembles stamp including random,
        - Efficient mint function calculates counter.

Reasons:
- Appending the random string is a common, non-time-critical function.
  In your arrangement, every interchangeable minting function performs the
  same task of appending the random string.
- The amount of random salt can be determined by policy or convention at a
  higher level, rather than by some obscure optimisation need specific to
  whatever processor you're running on.
- Recipients will likely be storing the random field in a double spend
  database.  This way, they're not storing your padding in there too.
- This is perhaps a minor point relating to my brain being stuck in
  assembly language, but in your arrangement the mint function needs
  access to a random number generator.  My way around, it's simply a pure
  calculating function which calls no other functions.
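In code, the proposed split might look like this (function names and the
choice of 12 random bytes are illustrative only):

```python
import base64
import os

def assemble_prefix(bits: int, date: str, resource: str,
                    ext: str = "") -> str:
    """Common, non-time-critical code: assemble the stamp up to and
    including the trailing ':' before the counter, random field and
    all, so the mint function needs no RNG of its own."""
    rand = base64.b64encode(os.urandom(12)).decode()   # 16 chars, no '='
    return f"1:{bits}:{date}:{resource}:{ext}:{rand}:"

# An efficient mint function is then a pure calculation:
#     counter = mint(assemble_prefix(20, "040529", "foo"), 20)
```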

> That means we could potentially be looking 
> at a line-full (63 bytes) of extra padding near the end of the stamp, 
> depending on the size of the resource and extension fields.  Does this 
> matter?

I guess we could just provide a selection of minting functions, some
"fast", others "small".  There's no one right answer to the question of
which is more important.

> At the moment, I'm working on functions which try to be generically 
> efficient on any compatible processor, rather than trying to 
> micro-optimise for any particular model.  I think I'll leave that part 
> up to you.  :)

I agree, and we're making good progress here at finding speed tricks which
can apply to all processors.  Now put your thinking caps on, because
here are some more potential speedups:

First, a recap/introduction to one part of the SHA-1 algorithm.  The hash
processes each block (64 bytes) of a message using input from an array of
80 words, called W.  W[0] to W[15] are the sixteen 32-bit words of the
block being processed.  W[16] to W[79] are calculated from the earlier
words using

        W[t] = S^1(W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16])

where S^1 means a left rotation by 1 bit.  Hence, each W[t] from W[16]
onwards depends on four of the preceding values.  Some working out gives this:

t       W[t] depends on...
--      ------------------
16      0 2 8 13
17      1 3 9 14
18      2 4 10 15
19      0 2 3 5 8 11 13
20      1 3 4 6 9 12 14
21      2 4 5 7 10 13 15
22      0 2 3 5 6 8 11 13 14
23      1 3 4 6 7 9 12 14 15
24      0 2 4 5 7 8 10 13 15
25      0 1 2 3 5 6 8 9 11 13 14
26      1 2 3 4 6 7 9 10 12 14 15
27      0 2 3 4 5 7 8 10 11 13 15
28      0 1 2 3 4 5 6 8 9 11 12 13 14
29      1 2 3 4 5 6 7 9 10 12 13 14 15
30      0 2 3 4 5 6 7 8 10 11 13 14 15
31      0 1 2 3 4 5 6 7 8 9 11 12 13 14 15
32      0 1 2 3 4 5 6 7 8 9 10 12 13 14 15
33      0 1 2 3 4 5 6 7 8 9 10 11 13 14 15
34      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

From W[34] onwards, every word of the message block has been mixed into
W[t].  Now, counting the occurrences of each number in the above table
gives this:

t       # of values in W[16]..W[34] which depend on W[t] 
--      ------------------------------------------------
0       12
1       11
2       16
3       15
4       14
5       13
6       12
7       11
8       12
9       11
10      10
11      9
12      8
13      14
14      13
15      12

So W[12], which is bytes 48..51 of the block, appears in only 8 of the 19
expanded values W[16]..W[34]; the other 11 do *not* depend on it.  This
means those 11 values can be cached across all blocks which differ only
in those four bytes.  This also fits in very conveniently with the
precalculating trick suggested before.
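Both tables can be reproduced mechanically by walking the recurrence on
index sets (rotations are ignored, since they never cancel a dependency).
A short sketch:

```python
def dependency_sets(limit=35):
    """For each W[t], the set of block words W[0..15] it depends on."""
    deps = [{t} for t in range(16)]    # W[0..15] are the block itself
    for t in range(16, limit):
        deps.append(deps[t - 3] | deps[t - 8] | deps[t - 14] | deps[t - 16])
    return deps

deps = dependency_sets()
# First table, e.g.:  deps[16] == {0, 2, 8, 13}

# Second table: how many of W[16..34] involve each block word.
counts = {i: sum(i in deps[t] for t in range(16, 35)) for i in range(16)}
# counts[12] == 8, so the remaining 19 - 8 = 11 expanded values are
# invariant while only W[12] changes, and can be cached.
```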

So, a fast minting routine would work like this:
- Pad the counter until the stamp is 48 bytes (12 words, W[0]..W[11])
- Fill in W[13]..W[15] with padding and length values
- Calculate the 11 values (W[16], W[17], etc.) which do not vary with W[12]
- Carry out the first 12 steps of the SHA-1 block function, cache the results

- For every possible value of W[12],
        calculate the remaining 68 steps, using cached data in 11 of them.

Using the base64 alphabet for the counter, "every possible value" covers
2^24 possibilities.  When these are exhausted, just change one bit of the
counter in W[11] or before and start again.
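Leaving the caching aside, the outer search is the familiar hashcash
loop.  Here is an unoptimised reference using hashlib (illustrative
names, no W[12] trick); the fast routine above would replace the full
SHA-1 call with the 68 remaining steps plus the cached values:

```python
import hashlib
import itertools

B64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       "abcdefghijklmnopqrstuvwxyz0123456789+/")

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        while byte & 0x80 == 0:
            bits += 1
            byte <<= 1
        break
    return bits

def mint(prefix: str, bits: int) -> str:
    """Append base64 counters of increasing length until the SHA-1
    digest of the stamp has at least `bits` leading zero bits."""
    for length in itertools.count(1):
        for tail in itertools.product(B64, repeat=length):
            stamp = prefix + "".join(tail)
            digest = hashlib.sha1(stamp.encode()).digest()
            if leading_zero_bits(digest) >= bits:
                return stamp
```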

Well, this caching might not bring much of a speed increase, but it can't
be worse than not doing it.  It would even make the unrolled loop slightly
smaller by replacing several integer ops with one memory access (or
register move, if there are enough regs).

I was hoping to find a greater improvement by calculating most of the W
array and XORing in the W[12] value as needed, but the bit rotation
complicates this.  It might still be possible though, give me time to try
it...

Cheers,
Malcolm.

