[hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
- To: hashcash@xxxxxxxxxxxxx
- Date: Sun, 30 May 2004 03:47:37 +0100
>> This makes a slight conflict between the senders interests and the
>> receivers (hostile sender may send massive stamp which costs nothing
>> extra practially to compute). If we cared about this we would ideally
>> want to enforce small stamps on sender (measured in SHA1 blocks) to
>> make the receivers verification fast. Simple "enforcement" would be
>> if it cost the sender disproportionately more.
>
> As Jonathan has shown, the time taken to check a stamp is negligible,
> however
> there is still a matter of storage space. Quoting Adam from 29th
> April:
>
> | > Does the recipient even need to care about the random field?
> |
> | He might care to enable him to store something compact in his double
> | spend database. He need only store the date, recipient, and random
> | field.
>
> A maliciously large random field could then be used to fill up
> somebody's
> double spend database, or at least hurt performance if the DB is
> designed
> on the assumption that it contains small data. I think it might be
> fair
> to place some arbitrary limit on stamp size to protect against
> extremes -
> 5000 bytes for example. This should be much more than enough to
> satisfy
> any conceivable future expansion, but small enough to restrict
> processing
> time and storage space to manageable (negligible?) levels.
I think this is fair. Is there already a size limit on e-mail headers,
though? If so, we should use that as a guide.
>> One way to do this would be to put the counter at the beginning,
>> however this is typically dangerous because then work can be
>> pre-computed and shared across multiple recipients.
>
> I don't think there's a significant problem here. Carrying out some
> precomputation once for many recipients instead of doing it once for
> each recipient will only save around a second per many million
> recipients.
Also, storing enough precomputed counters to make the job worthwhile
would cause memory bandwidth to be a limiting factor, which is going to
be a lot slower than just computing the thing in the first place.
> All in all I think there should be a limited set of safe characters
> which
> will never change. Hex digits is one option, base64 is another.
>
> That applies to both the random and counter fields, by the way.
Strongly agree. The base64 alphabet is well-established and relatively
easy to encode, because we can easily extract a set of 6-bit fields
from any number and use them in a lookup table. Why make life hard for
ourselves?
>> It might also help in some sense if we chose things which are simple
>> to code efficiently, with no unrolled pre-computed tricks available to
>> keep code simple if that makes same overhead for spammer and normal
>> user and results in simpler, less unrolled pre-computed, funny-order
>> counted, manipulated aligned field tricked etc.
>
>> (So think about that also in your optimization -- what would help
>> fairly prevent this optimization and keep things simpler -- we could
>> adapt v1 spec.)
>
> Interesting, I never thought of making optimisations impossible, only
> taking advantage of them. My opinion now is that some optimisation
> will
> always be available to those who are willing to find tricks and create
> the
> required spaghetti code. Creating a system which cannot be optimised
> would be about as difficult as creating a secure hash function in the
> first place.
>
> I think the stamp format is fine as it is and would not benefit from
> any
> rearrangement simply to enforce minting times. Remember we are only
> talking
> about 10-15% improvements over an efficiently implemented SHA-1
> function.
> If hashcash is too easy to compute, this is equivalent to saying that
> SHA-1 is broken.
Also agreed. I'd rather have a format where the potential
optimisations are relatively obvious, and get within a few percent of
that ideal, than one where we've eliminated all the obvious
optimisations but in which an attacker might find others (which he
isn't going to share with us!).
>> Izzy Kindred was also suggesting earlier that version 1 should support
>> multiple sub-puzzles; if we do this it will make align at end of SHA1
>> input block work less well. As there will be multiple counters.
>
> Simply align every counter by padding out to the next boundary:
>
> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416,
> 000000000000000000000000000000000000000000000000000000000005970,
> 000000000000000000000000000000000000000000000000000000000006119,
> 000000000000000000000000000000000000000000000000000000000007023
>
> Line breaks added for readability. The first portion (not including
> the
> comma) is 55 bytes, then each counter adds 64 bytes == 1 block. Best
> computation time, horrible to look at though. The extra puzzles just
> added an incentive to add more padding.
How about if the secondary counters are the same length as the primary?
Then they can be overlaid on the primary field, so there's less
padding involved - and leading zeroes could be omitted from the
secondaries. The problem then changes to "find the first N counter
values that produce X-bit value", which is on-average equivalent to
"find the first counter value that produces (X+log2(N))-bit value", and
remains easy to implement.
The above example could be recoded:
1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416,5970,6119,7023
>> It would be easy enough to change the second condition to something
>> like "...will be aligned at the end of the block".
>
> Having considered how to tie these fast minting routines together, and
> looked over some other issues, here is what I think:
>
> Padding should be added to the counter, not to the random field. Where
> you have defined the setup to be
>
> - Common code assembles stamp string,
> - Efficient mint function appends random and calculates counter,
>
> a better way would be
>
> - Common code assembles stamp including random,
> - Efficient mint function calculates counter.
I already do something like this, actually. The random field is added
and the SHA-1 block is assembled by a central routine, which then looks
up a function pointer to the optimised minter in a table and calls it.
The optimised minter then overwrites part of the pre-assembled SHA-1
block with the counter.
Now, admittedly, padding with random data does potentially consume more
entropy from the system's RNG, of which there is often a limited
supply, but I don't know if that's significant or not, considering
there's a whole lot of computation going on in-between each set of RNG
calls. For systems that are specifically built to bulk-generate
hashcash, I suppose any such limitation could be circumvented by adding
a hardware entropy generator - which is already present in a
significant fraction of x86 systems.
>> That means we could potentially be looking
>> at a line-full (63 bytes) of extra padding near the end of the stamp,
>> depending on the size of the resource and extension fields. Does this
>> matter?
>
> I guess we could just provide a selection of minting functions, some
> "fast", others "small". There's no one right answer to the question of
> which is more important.
True. I think most people, at least at the beginning, will have
relatively small stamps, so padding them to the (55+N*64)-byte mark
will still result in a 1-line stamp, which is quite acceptable. One
wonders, however, about the minority of people who have unusually large
e-mail addresses, or need to use the extension field for something.
>> At the moment, I'm working on functions which try to be generically
>> efficient on any compatible processor, rather than trying to
>> micro-optimise for any particular model. I think I'll leave that part
>> up to you. :)
>
> I agree, and we're making good progress here at finding speed tricks
> which
> can apply to all processors. Now put your thinking caps on, because
> here's some more potential speedups:
>
> <snip>
> So W[12], which is bytes 48..51 of the block, does *not* affect the
> values
> of 11 of the W values which are calculated. This means that these 11
> values can be cached for all blocks which only vary in those bytes.
> This
> also fits in very conveniently with the precalculating trick suggested
> before.
Bear in mind that for a single-pipe routine running on a superscalar
processor, the XOR and shift calculations are essentially free, because
they can very easily be overlapped with the other half of each round
calculation.
Even for a double-pipe routine, omitting the XOR calculations only
begins to make sense when less than four bitwise operations can be done
in parallel, which is still only true for a minority of modern
processors (which happens to include the G3).
The XOR stage can also be done one or two rounds in advance without
consuming any extra load-store bandwidth or more than one additional
register, which is particularly helpful on deeply pipelined machines.
At the moment, the single-pipe routines I have are still faster than
the double-pipe ones (except, paradoxically, on the P4), though mainly
because neither GCC nor CodeWarrior seem capable of keeping track of
the huge number of operations and temporaries in a double-pipe routine.
I've done all kinds of juggling to try and make the compiler behave,
but it's been only slightly more effective than headbutting a brick
wall.
I believe I can still get more performance out of a double-pipe
algorithm, but I'll have to do it by writing assembly (or, preferably,
generating it using a custom macro processor) rather than relying on
the compiler. I think the time to do that is after implementing the
single-pipe MMX minter, which I plan to do today.
The custom macro processor would also allow special improvements to the
single-pipe routines, including storing the entire 16-entry circular W
buffer in registers (on the PowerPC, that is - it's the only popular
machine with enough registers). Regrettably, I don't know of any
architecture (besides the Itanic, which doesn't really count) which has
enough registers to store two versions of the W buffer at once!
Just to recap, I'd like to make my goals plain:
- Good performance on the vast majority of consumer hardware, so few
consumers need to "buy" stamps rather than generating them. The
improvements on x86 so far, and the potential performance using
carefully optimised MMX, combined with the fairly spectacular results
on scalar PowerPC, all look promising for this. If possible, extension
of tests to 68K and ARM hardware would be helpful, as these
architectures are in wide use in the mobile arena.
- At the very least, I want low-value stamps (which I term
"signature-grade") to be available to all reasonable hardware (meaning
"anything that can run a TCP/IP stack") within a few seconds. I
already know this is practical, because I pseudo-implemented a
generator for the 6502 - it turned out to be capable of generating an
8-bit token in about 5 seconds. :)
- Excellent to spectacular performance on commodity server-grade
hardware, so that the cost of buying stamps is low for those who still
need to. The good first-generation results using Altivec, and the
potential for similar performance using SSE2 on AMD64, seem very
promising on this front.
- Better performance on conventional hardware also makes the economics
of building custom hardware less attractive. This should at least
delay the "arms race", even if it doesn't eliminate it.
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@xxxxxxxxxxxxxxxxxxxxx
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.
- Follow-Ups:
- [hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton
- References:
- [hashcash] Re: Hashcash performance improvements
- From: Malcolm Howell
Other related posts:
- » [hashcash] Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- » [hashcash] Re: Hashcash performance improvements
- [hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton
- [hashcash] Re: Hashcash performance improvements
- From: Malcolm Howell