[hashcash] Re: libfastsha1 vs libfastmint
- From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
- To: hashcash@xxxxxxxxxxxxx
- Date: Fri, 11 Jun 2004 16:07:52 +0100
I've been pondering whether a more general form for the fast minting
code might be a fast, partially-precomputed SHA1 library.
The best one could do, I suppose, would be to give the minter a
bit-mask of the bytes in the input which are to be precomputed, and
have it output a SHA1 context with as much of that work done in
advance as possible.
The problem is, there are only about three words in the block for
which it makes sense to do half of this work-reduction optimisation.
I've already coded support for two of them (in about 80% of the
minters so far), and for hashcash it doesn't make much sense to add
the third (because it sits immediately before one of the others).
The other half of the optimisation only gives single-digit percentage
performance gains (at best) on its own.
The other consideration is best described as "throughput versus
latency" design. I can point you to an Apple tech article about this
if you like. In the vast majority of cryptographic situations (except
for cracking, which hashcash is technically), you only have one piece
or chained sequence of data to work on at a time.
The generic libsha1.c implementation is what's known as a low-latency
design, which is optimised for single pieces of data, or a single
chained sequence of data. You call it with a bundle of data, it does
the job as quickly as possible, and returns the hash. It's actually
quite efficient for general data.
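To make the distinction concrete, here's a minimal sketch of the
low-latency calling pattern in Python, with hashlib standing in for
libsha1.c's C interface (the function names are mine, not the
library's):

```python
import hashlib

# Low-latency usage: one piece of data in, one digest out, as quickly
# as possible.  There is nothing else to work on in the meantime.
def hash_once(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# A chained sequence: feed each piece as it arrives.  Each block's
# computation depends on the previous block's output, so there is no
# parallelism to exploit within the stream.
def hash_stream(chunks) -> str:
    ctx = hashlib.sha1()
    for chunk in chunks:
        ctx.update(chunk)
    return ctx.hexdigest()

print(hash_once(b"abc"))
# -> a9993e364706816aba3e25717850c26c9cd0d89d
```

Note that `hash_stream([b"ab", b"c"])` necessarily equals
`hash_once(b"abc")` - chaining is just incremental delivery of the
same single message.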
My libfastmint implementations are high-throughput designs, especially
the vectorised ones - it simply doesn't make sense to try and use them
on a single piece of data, because they're designed to work on up to 8
blocks at once, and would run much slower if adapted to run at partial
capacity.
Because of the way cryptographic blocks are chained together, you only
have multiple blocks to work on if you're cracking something, or if you
have multiple genuine data streams to process at once. I gather it's
fairly hard to arrange for the latter.
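The calling pattern of a high-throughput core could be sketched like
this - a plain Python loop stands in for the SIMD lanes, and the
batch size of 8 and function name are my own, not libfastmint's
actual interface:

```python
import hashlib

LANES = 8  # the vectorised cores chew through up to 8 blocks at once

def hash_batch(messages):
    """Hash LANES independent messages in one call.

    In a real vectorised minter these hashes proceed in lock-step
    through the SHA1 rounds in vector registers; running with fewer
    than LANES inputs would waste lanes and run slower per hash.
    """
    assert len(messages) == LANES, "full capacity or nothing"
    return [hashlib.sha1(m).digest() for m in messages]
```

The point is that the caller must *have* eight independent messages -
which, outside of cracking-style searches like minting, it usually
doesn't.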
As far as I know, none of the other libraries attempt to do this.
What people who are trying to be efficient do is simply hash in the
fixed key part once and copy that intermediate state. So they are
precomputing, but only whole blocks.
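That whole-block trick is easy to demonstrate with Python's hashlib,
whose contexts support copy(); the stamp prefix below is illustrative,
and the names are mine:

```python
import hashlib

# Hash the fixed prefix once, up front.  Internally, every complete
# 64-byte block of it has now been compressed and need never be
# compressed again.
PREFIX = b"1:20:040611:someone@example.com::"  # illustrative stamp prefix
midstate = hashlib.sha1()
midstate.update(PREFIX)

def hash_candidate(counter: bytes) -> str:
    # Copy the saved state for each candidate, paying only for the
    # variable suffix instead of rehashing the prefix every time.
    ctx = midstate.copy()
    ctx.update(counter)
    return ctx.hexdigest()
```

Each call to `hash_candidate` yields exactly the same digest as
hashing `PREFIX + counter` from scratch, but skips the compression of
the prefix's whole blocks.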
In general, this is really all you can do. At some point of
generality, you really do have to quit looking for tiny optimisations
and just do the job. Hashcash really is a special case, in that we
*can* do a few neat optimisations (above and beyond the
design-for-throughput basics) without looking silly. Even then, we're
getting less than 20% faster, in general.
Besides, as I've already pointed out, chances are libsha1.c is faster
than your network interface already. :)
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@xxxxxxxxxxxxxxxxxxxxx
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.