[hashcash] Re: Hashcash performance improvements

  • From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Thu, 27 May 2004 18:29:15 +0100

> % fastmint_benchtest
>     Rate  Name (* machine default)
>   1520290 ANSI Compact 1-pipe *
>   1689211 ANSI Standard 1-pipe
>
> % hashcash -s
> 840336
>
> 3.06Ghz P4-xeon w. hyperthreading (actually hyperthreading is disabled
> until I install fedora core2 as the keyboard repeat rate goes crazy if
> you enable it on fedora core1).

What optimisation flags are you using?  As a Gentoo user, I tune my 
flags to match each machine I test on:

Athlon:   -O3 -funroll-loops -march=athlon-xp
P-MMX:    -O3 -funroll-loops -march=pentium-mmx  <-- this is reasonably generic
G4:       -O2 -fno-schedule-insns -mcpu=7450

It's good to see a similar improvement on the P4, given how different 
it is from the other CPUs, though it's odd to see "compact" being 
slightly slower than "standard" on an x86.  I understand the P4 is very 
sensitive to code optimisations, so we might see some exceptionally 
large step-changes in its performance with better minters.

> (results from hashcash -s seem a bit quantized, so perhaps I am not
> running long enough, however same test is used while generating hashcash
> so don't want to take too much real-time).

I strongly suggest overhauling timer.c - ATM it uses "wall time" 
instead of "CPU time", which is probably where a lot of the variance 
comes from, and it has rather low resolution on some machines.  I use 
clock() in my code, which tends to have at least 1/60 sec resolution, 
and works on CPU time rather than wall time.  It's also better to use 
several ticks instead of just one, as that smooths out a lot of kinks.
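To illustrate the timing approach described above, here is a minimal sketch of a rate estimate built on clock(). It is not the code from timer.c or from my minter; measure_rate(), dummy_work() and BATCH are hypothetical names made up for this example, and dummy_work() merely stands in for one minting attempt. The sketch first waits for clock() to change so that timing starts on a tick boundary, then counts work across several whole ticks.

```c
#include <assert.h>
#include <time.h>

#define BATCH 1024  /* work items between clock() calls (arbitrary choice) */

/* Hypothetical stand-in for one minting attempt; not real hashcash code. */
static unsigned long dummy_work(unsigned long x)
{
    return x * 2654435761UL + 1;
}

/* Estimate work items per CPU-second, timed across `ticks` changes of
 * clock().  Returns -1.0 if CPU time is not available. */
double measure_rate(int ticks)
{
    volatile unsigned long sink = 0;
    unsigned long count = 0;
    clock_t start, now, last;
    int seen = 0, i;

    assert(ticks > 0);

    start = clock();
    if (start == (clock_t)-1)
        return -1.0;

    /* Spin until the clock value changes, so the first measured
     * tick is a whole one rather than a fragment. */
    while ((now = clock()) == start)
        ;
    start = last = now;

    /* Run batches of work until the clock has changed `ticks` times. */
    while (seen < ticks) {
        for (i = 0; i < BATCH; i++)
            sink = dummy_work(sink);
        count += BATCH;
        now = clock();
        if (now != last) {
            seen++;
            last = now;
        }
    }

    /* clock() counts CPU time in units of 1/CLOCKS_PER_SEC seconds. */
    return (double)count * CLOCKS_PER_SEC / (double)(last - start);
}
```

Calling measure_rate(4), for instance, averages over four clock changes. On systems where clock() advances in very small steps, a larger count (or a larger BATCH) would be needed to get a measurement of useful length.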

> Well it would be possible to use the other hyperthread, but one thought
> is its nice to leave a thread for non-hashcash things to avoid the
> machine getting sluggish.

I'm shying away from low-level multithreading right now.  For bulk 
operations, it's trivial to have the hashcash server start multiple 
threads at a relatively high level (fork & pipe is probably the most 
portable way), so overall throughput is maximised.  The 
platform-specific details of multithreading aren't something I want to 
deal with in a minter core.  (That doesn't stop someone else from 
adding multithreaded minter cores if they really want to - it just 
means I'm not going to do it myself.)
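As a rough illustration of the fork & pipe approach mentioned above, the following sketch starts several worker processes and collects one result from each over its own pipe. It assumes a POSIX system; run_workers(), worker_job() and MAX_WORKERS are hypothetical names invented for this example, and worker_job() is only a placeholder for a real bulk minting job.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16  /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a bulk minting job. */
static unsigned long worker_job(int id)
{
    unsigned long acc = 0, i;
    for (i = 0; i < 1000000UL; i++)
        acc += i ^ (unsigned long)id;
    return acc;
}

/* Fork `n` workers, each sending one result back through its own pipe.
 * Returns 0 if every worker started and delivered a result, else -1. */
int run_workers(int n, unsigned long *results)
{
    int fds[MAX_WORKERS][2];
    pid_t pids[MAX_WORKERS];
    int i, started = 0, ok = 0;

    if (n < 1 || n > MAX_WORKERS)
        return -1;

    for (i = 0; i < n; i++) {
        if (pipe(fds[i]) < 0)
            break;
        pids[i] = fork();
        if (pids[i] < 0) {
            close(fds[i][0]);
            close(fds[i][1]);
            break;
        }
        if (pids[i] == 0) {
            /* Child: do the work, write the result, exit. */
            unsigned long r;
            close(fds[i][0]);
            r = worker_job(i);
            if (write(fds[i][1], &r, sizeof r) != (ssize_t)sizeof r)
                _exit(1);
            _exit(0);
        }
        /* Parent: keep only the read end. */
        close(fds[i][1]);
        started++;
    }

    /* Collect results and reap every worker that was started. */
    for (i = 0; i < started; i++) {
        int status;
        if (read(fds[i][0], &results[i], sizeof results[i])
                == (ssize_t)sizeof results[i])
            ok++;
        close(fds[i][0]);
        waitpid(pids[i], &status, 0);
    }

    return (started == n && ok == n) ? 0 : -1;
}
```

All the workers run concurrently while the parent blocks on the first pipe, so on a multi-CPU machine the operating system is free to spread them across processors without the minter core knowing anything about threads.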

For single-user operations, most machines capable of multiple threads 
have CPUs which are also capable of more generic optimisations 
(particularly vectorisation and/or multi-piping), which I reckon have a 
bigger payoff.  MMX support, for example, is now nearly universal on 
x86.  By contrast, CPUs capable of such optimisations are not always 
found in multithreading-capable systems - e.g. there are a lot more single 
Athlons than duals.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

