[hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
- To: hashcash@xxxxxxxxxxxxx
- Date: Sun, 30 May 2004 15:04:02 +0100
> I believe I can still get more performance out of a double-pipe
> algorithm, but I'll have to do it by writing assembly (or, preferably,
> generating it using a custom macro processor) rather than relying on
> the compiler. I think the time to do that is after implementing the
> single-pipe MMX minter, which I plan to do today.
Good news: I managed to write an MMX minter that is noticeably faster
than ANSI code on my Athlon, and *considerably* faster than ANSI on my
Pentium-MMX. 20-bit hashcash times for the Athlon (1.6GHz) and G4 are
now well below half a second, and they're below 4 seconds on the
Pentium-MMX (200MHz).
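As a rough cross-check of those times against the benchmark rates quoted further down: a 20-bit stamp needs about 2^20 hash trials on average, so the expected minting time is 2^20 divided by the hash rate. The sketch below is only that arithmetic; the function name is made up for illustration, and any individual mint can take much less or much more than the average.

```c
/* Expected time to mint a stamp by brute force, assuming each trial
 * succeeds independently with probability 2^-bits, so that the mean
 * number of trials is 2^bits. Hypothetical helper, not part of
 * libfastmint. Valid for bits < 64. */
double expected_mint_seconds(unsigned bits, double hashes_per_sec)
{
    double expected_trials = (double)(1ULL << bits);
    return expected_trials / hashes_per_sec;
}
```

Fed with the best rates from the benchmark outputs (2697709, 2416697 and 303668 hashes/sec), this gives roughly 0.39 s, 0.43 s and 3.45 s for 20 bits.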
Bad news: To do so, I had to effectively write the whole thing in
assembler - the compiler just couldn't make efficient code using the
built-in intrinsics. (It also really didn't help that there were no
intrinsics mapped to the "shift left" and "shift right" instructions,
which I needed to emulate a rotate.) That means I had to choose
between GNU and Intel syntax, and of course I chose GNU. Good luck
getting it to compile under VC++.
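For anyone wondering what "emulate a rotate" means here: MMX has no rotate instruction, so each rotate has to be built from a left shift, a right shift and an OR. The scalar C below is only an illustration of that identity on one 32-bit word; it is not the MMX assembly from the attached source, and the function name is an assumption.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits using two shifts and an OR.
 * The masks keep both shift counts in the range 0..31, so n == 0
 * does not turn into an undefined shift by 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

SHA-1 uses rotates by 1, 5 and 30 bits, so in a vector minter each of those costs two shifts plus an OR per register instead of a single instruction.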
More bad news (for PC fans, anyway): This is about as fast as the
classic Athlon is going to get, unless Malcolm wants to dig in and
micro-optimise it. It's about as fast as my 667MHz G4, which is pretty
embarrassing, especially as I haven't hand-optimised the Altivec code
yet. Unfortunately, the x86/MMX ISA is just missing too many really
useful things, which the PowerPC/Altivec ISA does have. I haven't yet
looked at SSE2 to see if it's any better.
NB: I haven't yet put in the work-reduction tricks Malcolm suggested.
The following numbers are all from brute-force algorithms. Also, the
"MMX Compact" minter is still on GCC intrinsics, and is therefore
painfully suboptimal. The "MMX Standard" minter is the hand-coded
version.
Here are the benchmark outputs:
Athlon-XP 1600MHz (compiled for Pentium-MMX)
   Rate  Name (* machine default)
1784638  ANSI Compact 1-pipe
1567587  ANSI Standard 1-pipe
1705904  ANSI Compact 2-pipe
1526335  ANSI Standard 2-pipe
    ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
    ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
2697709  AMD64/x86 MMX Standard 1x2-pipe
1487198  AMD64/x86 MMX Compact 1x2-pipe *
Best minter: AMD64/x86 MMX Standard 1x2-pipe (2697709 hashes/sec)
Pentium-MMX 200MHz
   Rate  Name (* machine default)
 169841  ANSI Compact 1-pipe
 104883  ANSI Standard 1-pipe
 135357  ANSI Compact 2-pipe
 138591  ANSI Standard 2-pipe
    ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
    ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
 303668  AMD64/x86 MMX Standard 1x2-pipe
 180687  AMD64/x86 MMX Compact 1x2-pipe *
Best minter: AMD64/x86 MMX Standard 1x2-pipe (303668 hashes/sec)
PowerPC 7450 667MHz (GCC 3.3)
   Rate  Name (* machine default)
 682361  ANSI Compact 1-pipe
1247327  ANSI Standard 1-pipe
 811199  ANSI Compact 2-pipe
1008708  ANSI Standard 2-pipe
2416697  PowerPC Altivec Standard 1x4-pipe
1221068  PowerPC Altivec Standard 2x4-pipe *
    ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
    ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
Best minter: PowerPC Altivec Standard 1x4-pipe (2416697 hashes/sec)
PowerPC 7450 667MHz (CodeWarrior 8)
   Rate  Name (* machine default)
 690485  ANSI Compact 1-pipe
1017557  ANSI Standard 1-pipe
 758179  ANSI Compact 2-pipe
 892319  ANSI Standard 2-pipe
2636397  PowerPC Altivec Standard 1x4-pipe
1195891  PowerPC Altivec Standard 2x4-pipe *
    ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
    ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
Best minter: PowerPC Altivec Standard 1x4-pipe (2636397 hashes/sec)
PowerPC 750 400MHz (GCC 3.3)
   Rate  Name (* machine default)
 401389  ANSI Compact 1-pipe
 659099  ANSI Standard 1-pipe *
 465869  ANSI Compact 2-pipe
 527279  ANSI Standard 2-pipe
    ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
    ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
    ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
    ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
Best minter: ANSI Standard 1-pipe (659099 hashes/sec)
The updated source is attached - I'd like to see what the P4 makes of
the MMX code.
Oh, one further caveat: the MMX detection code will probably fail
noisily on very old machines without the CPUID instruction. I forget
which CPUs it was introduced with. If someone wants to fix this
shortcoming, they're welcome. In the meantime, the workaround is to
compile with -march={i386|i486} and without -mmmx (or, alternatively,
using a non-GCC compiler) - this will short-circuit the whole source
file out of the binary.
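One possible shape for a more tolerant check, sketched under assumptions rather than taken from the attached source: later GCC releases (and Clang) ship a <cpuid.h> header whose __get_cpuid() helper reports failure instead of faulting when the requested CPUID leaf is unavailable, and on 32-bit x86 it first tests whether the CPUID instruction exists at all. The function name cpu_has_mmx and the HAVE_X86_CPUID macro are made up for this example; the MMX flag being bit 23 of EDX in CPUID leaf 1 is the documented location.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns 1 if the CPU reports MMX support, 0 otherwise.
 * On non-x86 targets, or with a compiler lacking <cpuid.h>,
 * it simply returns 0 so the caller falls back to ANSI code. */
int cpu_has_mmx(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when leaf 1 cannot be queried. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 23) & 1u);   /* CPUID.1:EDX bit 23 = MMX */
#else
    return 0;
#endif
}
```

A minter table could call this once at start-up and skip the MMX entries when it returns 0, instead of failing on machines that predate CPUID.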
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@xxxxxxxxxxxxxxxxxxxxx
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.
-- Binary/unsupported file stripped by Ecartis --
-- Type: application/zip
-- File: libfastmint.zip
- Follow-Ups:
- [hashcash] Re: Hashcash performance improvements
- From: Adam Back
- References:
- [hashcash] Re: Hashcash performance improvements
- From: Malcolm Howell
- [hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton