[hashcash] Re: Hashcash performance improvements

  • From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Thu, 27 May 2004 11:05:02 +0100

> this may have some effect on the approach you use to short-cut the
> code outside of SHA1_Transform, as the count has been separated from
> the random/collision avoidance field.  Hopefully if anything it should
> make things faster as the counter is simpler (no random offset, just
> incrementing counter).
It doesn't really help.  The code I have now just statically cracks a 
32-bit counter into a 6-character encoding using the base64 alphabet, 
and plants it at the tail of the message, which has been prepared with 
a liberal quantity of random data.  This is actually pretty fast - much 
faster than trying to manage a variable-length string which you then 
have to rejig the SHA-1 trailer for.

That said, it doesn't really hurt either.  I can always stick a colon 
between my random junk and the counter space, and the 
performance-critical code won't change a bit.

> Not sure if we want to fold in speed improvements into pre 1.0 or not.
> If people are going to continue using perhaps, but on the other hand
> better to move forwards.

I'm not trying to put anything in the main hashcash program right now, 
just building a library of routines and a separate test routine, so it 
should be easy for you to integrate yourself.  I'm attaching my work so 
far, so you can see what I mean and play around with it.

> Also note that on redhat I compile with -DOPENSSL which then uses
> openssl assembler optimized SHA1 rather than my C optimized SHA1.

The overhead is a very significant factor - possibly more than the SHA 
code itself - as the performance improvements I've just achieved show.  
With the overhead eliminated, it is now worthwhile to start 
introducing low-level optimisations such as assembly and vectorisation.

              MHz  Before (hashes/s)  After (hashes/s)
Athlon-XP    1600          1,000,000         2,000,000
PPC7450       667            680,000         1,250,000
Pentium-MMX   200            104,000           170,000

I found out what was wrong with the PPC/GCC code - GCC does something 
very stupid indeed when faced with a big, flat function like this, 
which means I have to force it to "-O2 -fno-schedule-insns 
-fschedule-insns2".  Once that's done, I get better than 1.2 million 
per second from my 667MHz G4 - better even than CodeWarrior managed.  
Fortunately, it seems to be a platform-specific quirk.

Attached are the necessary source files to use libfastmint in its 
present form.  Also attached is an altered version of random.c which 
knows about OSX's /dev/urandom support.

from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

-- Binary/unsupported file stripped by Ecartis --
-- Type: application/zip
-- File: fastmint.zip

-- Binary/unsupported file stripped by Ecartis --
-- Type: application/x-gzip
-- File: random.c.gz
