[hashcash] Re: Hashcash performance improvements

  • From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Sun, 30 May 2004 15:04:02 +0100

> I believe I can still get more performance out of a double-pipe
> algorithm, but I'll have to do it by writing assembly (or, preferably,
> generating it using a custom macro processor) rather than relying on
> the compiler.  I think the time to do that is after implementing the
> single-pipe MMX minter, which I plan to do today.

Good news:  I managed to write an MMX minter that is noticeably faster 
than ANSI code on my Athlon, and *considerably* faster than ANSI on my 
Pentium-MMX.  20-bit hashcash times for the Athlon (1.6GHz) and G4 are 
now well below half a second, and they're below 4 seconds on the 
Pentium-MMX (200MHz).
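
As a rough cross-check of those times against the "Best minter" rates in the benchmark output further down, here is a minimal sketch. It assumes a brute-force search needs 2^bits trials on average, and the function name expected_seconds is made up for illustration; it is not part of libfastmint.

```c
/* Expected seconds to mint a stamp with `bits` bits of collision at
   `rate` hashes per second.  Assumption: a brute-force search needs
   2^bits trials on average.  Only valid for bits < 64. */
static double expected_seconds(unsigned bits, double rate)
{
    return (double)(1ULL << bits) / rate;
}
```

With the rates quoted in this message, expected_seconds(20, 2697709.0) comes to about 0.39 s for the Athlon, expected_seconds(20, 2416697.0) to about 0.43 s for the G4, and expected_seconds(20, 303668.0) to about 3.45 s for the Pentium-MMX.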

Bad news:  To do so, I had to effectively write the whole thing in 
assembler - the compiler just couldn't make efficient code using the 
built-in intrinsics.  (It also really didn't help that there were no 
intrinsics mapped to the "shift left" and "shift right" instructions, 
which I needed to emulate a rotate.)  That means I had to choose 
between GNU and Intel syntax, and of course I chose GNU.  Good luck 
getting it to compile under VC++.
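
For readers unfamiliar with the rotate emulation mentioned above, here is a minimal sketch in portable C rather than MMX assembler. The names rotl32 and rotl32_x2 are hypothetical, and the two-element array only stands in for the two 32-bit lanes of an MMX register; this is not the code from the attached source.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits using only a left shift, a right
   shift and an OR.  Valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical two-lane variant: apply the same rotate to both lanes,
   which is what a packed left shift, a packed right shift and a packed
   OR would achieve on a register holding two 32-bit values. */
static void rotl32_x2(uint32_t lanes[2], unsigned n)
{
    lanes[0] = rotl32(lanes[0], n);
    lanes[1] = rotl32(lanes[1], n);
}
```

The cost is three operations plus a temporary register per rotate, where an ISA with a native rotate needs one. SHA-1, which hashcash uses, rotates several times per round.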

More bad news (for PC fans, anyway):  This is about as fast as the 
classic Athlon is going to get, unless Malcolm wants to dig in and 
micro-optimise it.  It's about as fast as my 667MHz G4, which is pretty 
embarrassing, especially as I haven't hand-optimised the Altivec code 
yet.  Unfortunately, the x86/MMX ISA is just missing too many really 
useful things, which the PowerPC/Altivec ISA does have.  I haven't yet 
looked at SSE2 to see if it's any better.

NB: I haven't yet put in the work-reduction tricks Malcolm suggested.  
The following numbers are all from brute-force algorithms.  Also, the 
"MMX Compact" minter is still on GCC intrinsics, and is therefore 
painfully suboptimal.  The "MMX Standard" minter is the hand-coded 
version.

Here are the benchmark outputs:

Athlon-XP 1600MHz (compiled for Pentium-MMX)
     Rate  Name (* machine default)
   1784638 ANSI Compact 1-pipe
   1567587 ANSI Standard 1-pipe
   1705904 ANSI Compact 2-pipe
   1526335 ANSI Standard 2-pipe
    ---    PowerPC Altivec Standard 1x4-pipe  (Not available on this machine)
    ---    PowerPC Altivec Standard 2x4-pipe  (Not available on this machine)
   2697709 AMD64/x86 MMX Standard 1x2-pipe
   1487198 AMD64/x86 MMX Compact 1x2-pipe *
Best minter: AMD64/x86 MMX Standard 1x2-pipe (2697709 hashes/sec)

Pentium-MMX 200MHz
     Rate  Name (* machine default)
    169841 ANSI Compact 1-pipe
    104883 ANSI Standard 1-pipe
    135357 ANSI Compact 2-pipe
    138591 ANSI Standard 2-pipe
    ---    PowerPC Altivec Standard 1x4-pipe  (Not available on this machine)
    ---    PowerPC Altivec Standard 2x4-pipe  (Not available on this machine)
    303668 AMD64/x86 MMX Standard 1x2-pipe
    180687 AMD64/x86 MMX Compact 1x2-pipe *
Best minter: AMD64/x86 MMX Standard 1x2-pipe (303668 hashes/sec)

PowerPC 7450 667MHz (GCC 3.3)
     Rate  Name (* machine default)
    682361 ANSI Compact 1-pipe
   1247327 ANSI Standard 1-pipe
    811199 ANSI Compact 2-pipe
   1008708 ANSI Standard 2-pipe
   2416697 PowerPC Altivec Standard 1x4-pipe
   1221068 PowerPC Altivec Standard 2x4-pipe *
    ---    AMD64/x86 MMX Standard 1x2-pipe  (Not available on this machine)
    ---    AMD64/x86 MMX Compact 1x2-pipe  (Not available on this machine)
Best minter: PowerPC Altivec Standard 1x4-pipe (2416697 hashes/sec)

PowerPC 7450 667MHz (CodeWarrior 8)
     Rate  Name (* machine default)
    690485 ANSI Compact 1-pipe
   1017557 ANSI Standard 1-pipe
    758179 ANSI Compact 2-pipe
    892319 ANSI Standard 2-pipe
   2636397 PowerPC Altivec Standard 1x4-pipe
   1195891 PowerPC Altivec Standard 2x4-pipe *
    ---    AMD64/x86 MMX Standard 1x2-pipe  (Not available on this machine)
    ---    AMD64/x86 MMX Compact 1x2-pipe  (Not available on this machine)
Best minter: PowerPC Altivec Standard 1x4-pipe (2636397 hashes/sec)

PowerPC 750 400MHz (GCC 3.3)
     Rate  Name (* machine default)
    401389 ANSI Compact 1-pipe
    659099 ANSI Standard 1-pipe *
    465869 ANSI Compact 2-pipe
    527279 ANSI Standard 2-pipe
    ---    PowerPC Altivec Standard 1x4-pipe  (Not available on this machine)
    ---    PowerPC Altivec Standard 2x4-pipe  (Not available on this machine)
    ---    AMD64/x86 MMX Standard 1x2-pipe  (Not available on this machine)
    ---    AMD64/x86 MMX Compact 1x2-pipe  (Not available on this machine)
Best minter: ANSI Standard 1-pipe (659099 hashes/sec)

The updated source is attached - I'd like to see what the P4 makes of 
the MMX code.

Oh, one further caveat:  the MMX detection code will probably fail 
noisily on very old machines without the CPUID instruction.  I forget 
which CPUs it was introduced with.  If someone wants to fix this 
shortcoming, they're welcome.  In the meantime, the workaround is to 
compile with -march={i386|i486} and without -mmmx (or, alternatively, 
using a non-GCC compiler) - this will short-circuit the whole source 
file out of the binary.
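
One possible shape for a quieter MMX check is sketched below. It is not the detection code from the attached source. It assumes a GCC or Clang recent enough to ship <cpuid.h>, which is newer than the GCC 3.3 used for the benchmarks above, and the function name has_mmx is made up for illustration.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Returns 1 if the CPU reports MMX, otherwise 0.  Also returns 0 when
   CPUID itself is unavailable, or when this is not an x86 build with a
   GCC-compatible compiler. */
static int has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported.  On
       32-bit x86 it first checks that the CPUID instruction exists, so
       an old CPU gets a clean "no" instead of an illegal instruction. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 23) & 1u);   /* leaf 1, EDX bit 23 = MMX */
#else
    return 0;
#endif
}
```

A minter table could call has_mmx() once at startup and mark the MMX entries as "Not available on this machine" when it returns 0.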

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.



-- Binary/unsupported file stripped by Ecartis --
-- Type: application/zip
-- File: libfastmint.zip


