[hashcash] a note about hyperthreading (was: parallel hashcash)

Hal> I did a little research then and it seems that most hyperthreading
Hal> benchmarks show similar results of only a few percent increase at
Hal> best.  I have to say that this CPU technology is more hype than
Hal> hyper.

I try to avoid the hype. ;-)

I think it all depends on the type of application you're running. If
both processes are numeric-intensive (or are doing the exact same
thing), I guess hyperthreading won't help, since you're still only using
one core. If your processes are doing vastly different things, like in
a typical office app or game, hyperthreading is probably more of a
winner. I don't think hashcash is, or ever will be, an application that
can take advantage of hyperthreading, since it just does a whole bunch
of integer calculations.

Hyperthreading is supposed to help the "front end" part of the processor find more instructions per clock to send to the execution units. Thus it only really helps raw performance if *two* conditions hold:


- Each process, if run on its own, would leave execution units unused during a significant percentage of clock cycles. This generally happens when there is a lot of random branching and/or memory access going on, but it also often happens with floating-point code on the P4.

- The other process(es) make use of the unused unit-cycles. (A particular hyperthreading implementation, e.g. Niagara, may use more than two threads per core.) If the first process is doing a lot of branching, or uses FP code that's not already optimised for the P4, that's quite easy to do. If the problem is memory access and the second process is also doing heavy memory access, the two may contend for resources that have nothing to do with the execution units (such as caches and memory bandwidth), and total performance could actually decrease. The real difficulty, however, arises when both processes are already using the same execution units quite efficiently.

On the P4, running two hashcash threads together doesn't satisfy these conditions. As Hubert pointed out, hashcash uses integer instructions (specifically, bitwise logic and addition) pretty much exclusively. The P4 allegedly has a pair of double-pumped ALUs that *should* be equivalent to four ALUs at normal clock speed, but the evidence suggests that half the clock cycles on each one can only be used for address generation, not for real work. Thus, the P4 tends to be limited by the execution units for this particular algorithm, not by the front end.
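To make the point about instruction mix concrete, here is a minimal pure-Python sketch of SHA-1 (the hash that hashcash stamps are checked against), written out by hand rather than via hashlib so the operations are visible. It is an illustration written for this note, not hashcash's own code. Apart from padding and loading words, every step of the 80-round loop is a 32-bit AND, OR, XOR, NOT, rotate or addition. There is no floating point and no data-dependent memory access, and the branching is entirely predictable (loop counters and the choice of round function), which is why the integer units, not the front end, set the pace.

```python
import struct

MASK = 0xFFFFFFFF  # keep every result to 32 bits


def rotl(x: int, n: int) -> int:
    """32-bit left rotate: two shifts and an OR."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1(message: bytes) -> bytes:
    """Plain SHA-1, built only from integer logic, rotates and additions."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Padding: a 0x80 byte, zeros up to 56 mod 64, then the 64-bit bit length.
    bit_len = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack(">Q", bit_len)
    for off in range(0, len(message), 64):
        # Message schedule: 16 words loaded, 64 more derived by XOR and rotate.
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)
```

For example, `sha1(b"abc").hex()` gives the standard test vector `a9993e364706816aba3e25717850c26c9cd0d89d`. Minting a stamp means running this loop over and over with a different counter each time, so two such threads on one P4 core are simply competing for the same ALUs.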

What hyperthreading *is* useful for, in a hashcash context, is allowing ordinary user applications to remain responsive while hashcash is churning away in the background. Most "business" applications do a lot of memory access and branching, or else they do floating-point calculations, both of which can be neatly slipped into the execution backend of the P4 without interfering much with the hashcash thread. Interestingly, this last use case is roughly what Intel's marketing actually describes.

Demos performed by a variety of hardware-review sites back this up, although they are usually run on Windows, where hyperthreading gets a large advantage from an unexpected source: Windows itself. Its SMP and UP kernels have different schedulers, and the SMP one is considerably more intelligent. Linux is different: the SMP and UP kernel variants use the same scheduler, but the UP kernel is able to take some shortcuts because it can assume only one CPU is being used (it still runs on SMP boxes, but "parks" the unused CPUs so that they cannot interfere). Microsoft persist in using their braindead UP kernel on non-SMP and non-hyperthreading PCs, so a hyperthreading P4, which gets the SMP kernel, enjoys an almost unfair boost in performance. The Linux UP kernel, by contrast, gets a modest performance gain over the SMP kernel on UP hardware, simply because of the shortcuts.

Hyperthreading may provide a direct performance benefit on other processor architectures, however. In particular, the IBM POWER5 uses hyperthreading in conjunction with an unusually wide backend, in an attempt to obtain better-than-dual-core performance from the transistor budget that two separate cores would need. Multithreading at the application level will also, obviously, speed up true dual-core and SMP machines. It should therefore be left up to the user to decide whether to turn multithreading on in hashcash.
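Leaving the choice to the user could look something like the sketch below: the counter space is split so that thread k tries counters k, k+n, k+2n, ..., and the first hit stops everyone. The stamp prefix and the decimal counter are simplifications for illustration (real hashcash stamps have their own field layout and counter encoding), and `parallel_mint` and `has_zero_bits` are hypothetical names. Note also that in a standard CPython build the GIL stops these threads from hashing in parallel, so this shows the structure only; a C implementation (or separate processes) would be needed to actually load a second core or logical CPU.

```python
import hashlib
import threading


def has_zero_bits(stamp: str, bits: int) -> bool:
    """True if SHA-1(stamp) starts with at least `bits` zero bits."""
    value = int.from_bytes(hashlib.sha1(stamp.encode()).digest(), "big")
    return value >> (160 - bits) == 0


def parallel_mint(prefix: str, bits: int, workers: int = 1) -> str:
    """Hypothetical minting loop; `workers` is the user's choice of thread count."""
    found = []
    lock = threading.Lock()
    done = threading.Event()

    def search(start: int) -> None:
        # Thread `start` owns counters start, start + workers, start + 2*workers, ...
        counter = start
        while not done.is_set():
            stamp = f"{prefix}:{counter}"
            if has_zero_bits(stamp, bits):
                with lock:
                    found.append(stamp)
                done.set()  # tell the other threads to stop
                return
            counter += workers

    threads = [threading.Thread(target=search, args=(k,)) for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0]  # any hit is valid if two threads finish together
```

Usage would be something like `parallel_mint("1:16:040601:someone@example.com::rand", 16, workers=2)`, with `workers=1` as the sensible default on a single hyperthreaded P4.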

Finally, I've found an application that looks as though it might have been designed for hyperthreading, except that I know for certain it's been around for over a decade. It's about as different from hashcash as possible. While it is not presently multithreaded, it could easily be split into a producer thread (doing heavy FPU work to iterate over an IFS-type fractal) and a consumer thread (doing lots of random memory access to render the fractal data points into an image). For SMP boxes, multiple producer and consumer threads would also be beneficial.
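A minimal sketch of that producer/consumer split, assuming a Sierpinski-triangle IFS (the "chaos game") as a stand-in for the unnamed application. The producer does the floating-point iteration and hands points over in batches through a bounded queue; the consumer scatters them into a hit-count image. As with any threaded Python, the GIL means this shows the division of labour rather than a real speedup; in C, the two threads would map naturally onto the two logical CPUs of a hyperthreaded core.

```python
import queue
import random
import threading

# Hypothetical stand-in IFS: the three halving maps of the Sierpinski triangle.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def producer(out: queue.Queue, iterations: int, batch: int = 1024, seed: int = 1) -> None:
    """FPU-heavy side: iterate the IFS and pass points on in batches."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # a vertex, so the orbit starts on the attractor
    points = []
    for _ in range(iterations):
        vx, vy = rng.choice(VERTICES)
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
        if len(points) == batch:
            out.put(points)
            points = []
    if points:
        out.put(points)
    out.put(None)  # sentinel: no more data


def consumer(inp: queue.Queue, image: list, size: int) -> None:
    """Memory-heavy side: scatter each point into a size x size hit-count image."""
    while (points := inp.get()) is not None:
        for x, y in points:
            col = min(int(x * size), size - 1)
            row = min(int((1.0 - y) * size), size - 1)  # row 0 at the top
            image[row][col] += 1


def render(iterations: int = 100_000, size: int = 64) -> list:
    image = [[0] * size for _ in range(size)]
    q = queue.Queue(maxsize=16)  # bounded, so the producer can't run far ahead
    threads = [
        threading.Thread(target=producer, args=(q, iterations)),
        threading.Thread(target=consumer, args=(q, image, size)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Batching keeps the synchronisation overhead per point small, and the bounded queue stops the producer from racing arbitrarily far ahead of the consumer. Extending this to several producers for SMP boxes would need one sentinel per producer (or a counter of finished producers) before the consumer stops.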

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.
