[hashcash] Re: libfastsha1 vs libfastmint

  • From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Fri, 11 Jun 2004 16:07:52 +0100

I've been pondering whether a more general form of the fast minting
code might be a fast, partially-precomputed SHA1 library.

The "best" one could do, I suppose, would be to give the minter a
bit-mask of the bytes in the input which are to be pre-computed, and
have it output a SHA1 context with as much of that work done as
possible.

The problem is, there are only about three words in the block for which it makes sense to do roughly half of this work-reduction optimisation. I've already coded in support for two of them (on about 80% of the minters so far), and for hashcash it doesn't make much sense to add the third, because it sits immediately before one of the others.


The other half of the optimisation only gives single-digit percentage performance gains (at best) on its own.

The other consideration is best described as a "throughput versus latency" design trade-off. I can point you to an Apple tech article about this if you like. In the vast majority of cryptographic situations (other than cracking, which hashcash technically is), you only have one piece of data, or one chained sequence of data, to work on at a time.

The generic libsha1.c implementation is what's known as a low-latency design, which is optimised for single pieces of data, or a single chained sequence of data. You call it with a bundle of data, it does the job as quickly as possible, and returns the hash. It's actually quite efficient for general data.
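That low-latency calling pattern is the familiar one-shot hash API; a minimal sketch in Python (using hashlib rather than libsha1.c itself, and the standard "abc" test vector as input):

```python
import hashlib

# One-shot, low-latency usage: hand the library a complete piece of
# data, it does the job as quickly as possible, and returns the hash.
digest = hashlib.sha1(b"abc").hexdigest()
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d
```

This is the shape of API almost every SHA1 library presents, because it matches the single-stream case that dominates real use.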

My libfastmint implementations are high-throughput designs, especially the vectorised ones. It simply doesn't make sense to try to use them on a single piece of data, because they're designed to work on up to 8 blocks at once, and would run much slower if adapted to run at partial capacity.

Because of the way cryptographic blocks are chained together, you only have multiple blocks to work on if you're cracking something, or if you have multiple genuine data streams to process at once. I gather it's fairly hard to arrange for the latter.

As far as I know, none of the other libraries attempt to do this.
What people who are trying to be efficient do is simply hash in the
key part and copy that state. So they are precomputing, but only in
whole blocks.
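That state-copy trick is easy to demonstrate with Python's hashlib, whose contexts support .copy() for exactly this purpose (the prefix string here is made up for illustration):

```python
import hashlib

# Hash the fixed key/prefix part once...
base = hashlib.sha1()
base.update(b"fixed-key-prefix:")

# ...then, for each candidate suffix, copy the saved state instead of
# re-hashing the prefix from scratch every time.
digests = []
for counter in range(3):
    h = base.copy()
    h.update(str(counter).encode())
    digests.append(h.hexdigest())

# Same result as hashing the whole string in one go:
assert digests[0] == hashlib.sha1(b"fixed-key-prefix:0").hexdigest()
```

The saving is simply that the prefix's compression-function work is paid once instead of once per candidate; the longer the fixed prefix relative to the varying tail, the bigger the win.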

In general, this is really all you can do. Beyond a certain point of generality, you really do have to quit looking for tiny optimisations and just do the job. Hashcash really is a special case, in that we *can* do a few neat optimisations (above and beyond the design-for-throughput basics) without looking silly. Even then, we're getting less than 20% faster, in general.
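For concreteness, here is a deliberately naive sketch of a hashcash-style mint loop with only the whole-block state-copy precompute applied; the stamp string and the 8-bit target are illustrative, not a real stamp:

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # Count zero bits at the top of the first nonzero byte.
        while not (byte & 0x80):
            bits += 1
            byte <<= 1
        break
    return bits

def mint(stamp_prefix: bytes, bits: int) -> bytes:
    """Find a suffix so that SHA1(prefix + suffix) has `bits` leading zeros."""
    base = hashlib.sha1()
    base.update(stamp_prefix)  # the fixed part is hashed exactly once
    counter = 0
    while True:
        h = base.copy()        # restart from the saved state each try
        suffix = str(counter).encode()
        h.update(suffix)
        if leading_zero_bits(h.digest()) >= bits:
            return suffix
        counter += 1

# Illustrative stamp prefix; 8 bits needs ~256 tries on average.
suffix = mint(b"1:12:040611:foo@xxxxxxxxxxx::", 8)
```

This is the low-hanging fruit; the extra few percent libfastmint squeezes out comes from the in-block tricks described above, which this sketch makes no attempt at.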


Besides, as I've already pointed out, chances are libsha1.c is faster than your network interface already. :)

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@xxxxxxxxxxxxxxxxxxxxx
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

