> I believe I can still get more performance out of a double-pipe
> algorithm, but I'll have to do it by writing assembly (or, preferably,
> generating it using a custom macro processor) rather than relying on
> the compiler. I think the time to do that is after implementing the
> single-pipe MMX minter, which I plan to do today.

Good news: I managed to write an MMX minter that is noticeably faster than ANSI code on my Athlon, and *considerably* faster than ANSI on my Pentium-MMX. 20-bit hashcash times for the Athlon (1.6GHz) and G4 are now well below half a second, and they're below 4 seconds on the Pentium-MMX (200MHz).

Bad news: To do so, I had to effectively write the whole thing in assembler - the compiler just couldn't make efficient code using the built-in intrinsics. (It also really didn't help that there were no intrinsics mapped to the "shift left" and "shift right" instructions, which I needed to emulate a rotate.) That means I had to choose between GNU and Intel syntax, and of course I chose GNU. Good luck getting it to compile under VC++.

More bad news (for PC fans, anyway): This is about as fast as the classic Athlon is going to get, unless Malcolm wants to dig in and micro-optimise it. It's about as fast as my 667MHz G4, which is pretty embarrassing, especially as I haven't hand-optimised the Altivec code yet. Unfortunately, the x86/MMX ISA is just missing too many really useful things, which the PowerPC/Altivec ISA does have. I haven't yet looked at SSE2 to see if it's any better.

NB: I haven't yet put in the work-reduction tricks Malcolm suggested. The following numbers are all from brute-force algorithms. Also, the "MMX Compact" minter is still on GCC intrinsics, and is therefore painfully suboptimal. The "MMX Standard" minter is the hand-coded version.
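
For anyone wondering what "emulating a rotate" means here: SHA-1 (which hashcash uses) rotates 32-bit words left by 1, 5 and 30 bits, and MMX has no rotate instruction, so each rotate has to be built from a shift left, a shift right and an OR. The following is only a rough sketch of that idea in portable C, not the hand-coded assembler from the attached source. The names rotl32, vec2x32 and vec2x32_rotl are made up for illustration, and the struct is merely a stand-in for one 64-bit MMX register holding two 32-bit lanes (the "1x2-pipe" arrangement).

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32, using only
 * shift-left, shift-right and OR. These are the three operations
 * that have to be combined when no rotate instruction exists. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Hypothetical stand-in for one 64-bit MMX register treated as
 * two independent 32-bit lanes. */
typedef struct {
    uint32_t lane[2];
} vec2x32;

/* Apply the same rotate to both lanes. With real MMX this would be
 * two packed shifts plus a packed OR acting on both lanes at once;
 * here the lanes are simply processed in a loop. */
static vec2x32 vec2x32_rotl(vec2x32 v, unsigned n)
{
    vec2x32 r;
    for (int i = 0; i < 2; i++) {
        r.lane[i] = rotl32(v.lane[i], n);
    }
    return r;
}
```

As a quick check, rotl32(0x80000001, 5) gives 0x00000030 and rotl32(0x12345678, 5) gives 0x468ACF02. Note that the emulated rotate costs three operations and needs a spare register for the temporary, which adds up given how often SHA-1 rotates.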

Here are the benchmark outputs:

Athlon-XP 1600MHz (compiled for Pentium-MMX)

       Rate  Name (* machine default)
    1784638  ANSI Compact 1-pipe
    1567587  ANSI Standard 1-pipe
    1705904  ANSI Compact 2-pipe
    1526335  ANSI Standard 2-pipe
        ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
        ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
    2697709  AMD64/x86 MMX Standard 1x2-pipe
    1487198  AMD64/x86 MMX Compact 1x2-pipe *
    Best minter: AMD64/x86 MMX Standard 1x2-pipe (2697709 hashes/sec)

Pentium-MMX 200MHz

       Rate  Name (* machine default)
     169841  ANSI Compact 1-pipe
     104883  ANSI Standard 1-pipe
     135357  ANSI Compact 2-pipe
     138591  ANSI Standard 2-pipe
        ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
        ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
     303668  AMD64/x86 MMX Standard 1x2-pipe
     180687  AMD64/x86 MMX Compact 1x2-pipe *
    Best minter: AMD64/x86 MMX Standard 1x2-pipe (303668 hashes/sec)

PowerPC 7450 667MHz (GCC 3.3)

       Rate  Name (* machine default)
     682361  ANSI Compact 1-pipe
    1247327  ANSI Standard 1-pipe
     811199  ANSI Compact 2-pipe
    1008708  ANSI Standard 2-pipe
    2416697  PowerPC Altivec Standard 1x4-pipe
    1221068  PowerPC Altivec Standard 2x4-pipe *
        ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
        ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
    Best minter: PowerPC Altivec Standard 1x4-pipe (2416697 hashes/sec)

PowerPC 7450 667MHz (CodeWarrior 8)

       Rate  Name (* machine default)
     690485  ANSI Compact 1-pipe
    1017557  ANSI Standard 1-pipe
     758179  ANSI Compact 2-pipe
     892319  ANSI Standard 2-pipe
    2636397  PowerPC Altivec Standard 1x4-pipe
    1195891  PowerPC Altivec Standard 2x4-pipe *
        ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
        ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
    Best minter: PowerPC Altivec Standard 1x4-pipe (2636397 hashes/sec)

PowerPC 750 400MHz (GCC 3.3)

       Rate  Name (* machine default)
     401389  ANSI Compact 1-pipe
     659099  ANSI Standard 1-pipe *
     465869  ANSI Compact 2-pipe
     527279  ANSI Standard 2-pipe
        ---  PowerPC Altivec Standard 1x4-pipe (Not available on this machine)
        ---  PowerPC Altivec Standard 2x4-pipe (Not available on this machine)
        ---  AMD64/x86 MMX Standard 1x2-pipe (Not available on this machine)
        ---  AMD64/x86 MMX Compact 1x2-pipe (Not available on this machine)
    Best minter: ANSI Standard 1-pipe (659099 hashes/sec)

The updated source is attached - I'd like to see what the P4 makes of the MMX code.

Oh, one further caveat: the MMX detection code will probably fail noisily on very old machines without the CPUID instruction. I forget which CPUs it was introduced with. If someone wants to fix this shortcoming, they're welcome. In the meantime, the workaround is to compile with -march={i386|i486} and without -mmmx (or, alternatively, using a non-GCC compiler) - this will short-circuit the whole source file out of the binary.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

-- Binary/unsupported file stripped by Ecartis --
-- Type: application/zip
-- File: libfastmint.zip