[hashcash] Hashcash performance improvements

  • From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Wed, 26 May 2004 04:31:22 +0100

I recently sent Adam an updated version of libsha1.c, which is about 
20% faster on x86 machines.  After some more thought in this vein, I'd 
like to help write a set of "fast minting" routines which maximise 
sustained minting performance on a given class of machines.

I think this is in our best interests, as determined spammers will do 
this kind of optimisation as soon as it becomes profitable for them, 
and we want to stay abreast of them at worst.  Optimising the software 
available to the "good guys" limits the advantage spammers can gain 
over us, at least until they decide to use dedicated hardware.

I've already determined that the vanilla x86 instruction set is 
incapable of computing SHA-1 hashes without spilling working variables 
to memory - the revised routine reduces this problem but doesn't 
eliminate it.  There are 5 explicit working variables, 2 temporaries, 
and one frequently-accessed constant needed, but the x86 only has 8 
GPRs, some of which are needed for system purposes.

It should, however, be possible to compute SHA-1 more efficiently using 
MMX, which has a set of 8 registers separate from the GPRs - just 
enough to store all the needed values.  Additionally, MMX would be able 
to compute two hashes simultaneously, potentially doubling the 
collision rate.  I estimate that an efficient MMX routine could be as 
much as 4 times as fast as the present implementation - that's 
potentially 4 million collision tests per second from a recent Athlon.  
One disadvantage, though, is that MMX has no rotate instruction - it 
takes 4 instructions (move, shift left, shift right, or) to emulate 
that operation.

Even more efficient routines could possibly be written using SSE2, but 
I don't have hardware capable of running such code.  Someone else might 
want to implement an SSE2 extension once the MMX and Altivec routines 
are mature, so I'll try to ensure such extensions are easy to add.

PowerPC machines (meaning Macs) have a rich set of registers and are 
not limited by spilling.  I still think a faster implementation is 
possible, by running multiple "pipes" alongside each other and thus 
taking advantage of pipeline bubbles.  Other register-rich 
architectures, including perhaps ARM, could benefit from the same code.

On PowerPC machines supporting Altivec (G4 and G5 at present), there 
are 32 128-bit vector registers available, so it is theoretically 
possible to run a 16-pipe hash routine, consisting of 4 pipes each 
operating on 4-element vectors.  This would probably achieve a rate of 
1.6 million collision tests per second, or better, on a 400MHz G4.

To allow multi-pipe implementations and reduce overhead, it's necessary 
to move away from the standard libsha1-style implementation, towards a 
dedicated minting routine.  The standard libsha1 routines can be used 
to verify the token after minting and upon receipt.  I suggest 
decoupling the token formatting from the hash minting, and simply 
passing a string and a bit-count request to the minter.  The formatting 
only needs to be done once per token, so doesn't need to be specially 

The minter routine would then append the random salt string, which 
would be at least 64 (or 96) bits worth but extended (if necessary) to 
allow minting to progress on a single SHA-1 block - which is much more 
efficient than calculating SHA-1 afresh for the entire message.  
Hexadecimal and base64 strings are both feasible, though the tail of 
the salt string may be more efficiently coded in some subset of base64, 
eg. the first 16 letters of the alphabet.

I'd like this to work within the official hashcash distribution, but if 
that is not possible, I hope there is no objection to my implementing a 
standalone minter.

from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

Other related posts: