[hashcash] Re: Hashcash performance improvements
- From: Jonathan Morton <chromi@xxxxxxxxxxxxxxxxxxxxx>
- To: hashcash@xxxxxxxxxxxxx
- Date: Sun, 30 May 2004 21:26:13 +0100
In case hashcash ever finds its way into an SMTP extension, there is a limit of 512 characters on SMTP command lines, although this can be extended by "SMTP extensions", which I assume means some sort of negotiation between client and server.
Well, there are the guidelines. In any case, a limit should be named. Standards normally have arbitrary limits written into them, as without them no computer with finite memory could ever claim to be standards compliant.
I have a feeling the 512 bytes of SMTP should be the guideline, minus some reasonable amount for the command itself. That's over six 80-column lines, BTW.
OTOH, just on the off-chance that there's a really good reason why we
need to put over a half-kilobyte of data in a secured e-mail header
(*cough*), perhaps we might word it as SHOULD be less than 500 and MUST
be less than 5000. The latter is to discourage attacks, the former is
for interoperability.
How about if the secondary counters are the same length as the primary? Then they can be overlaid on the primary field, so there's less padding involved - and leading zeroes could be omitted from the secondaries.
I don't get what you're saying here. How can the counters be the same length and also be optimally aligned within blocks? That is the sole reason for padding in the first place.
The above example could be recoded:
1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416,5970,6119,7023
Also, you seem to be reading meaning into the leading zeros, which could just as easily have been leading 'A's had I used base64 for the counter.
Perhaps I should manually decode it for you - it's a kind of shorthand
recording only the differences between four similar stamps:
1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416
1:10,4:040525:foo::c82a215126afb2c2:0000000000000005970
1:10,4:040525:foo::c82a215126afb2c2:0000000000000006119
1:10,4:040525:foo::c82a215126afb2c2:0000000000000007023
Each stamp in the group has its own validity, and these are added
together like postage stamps - you can put three 10p stamps on to cover
a 30p charge, at least in the UK. Because hashcash value is counted in
bits, however, it only makes sense to attach power-of-two numbers of
stamps to this particular envelope.
The "leading zeroes" is misleading - it doesn't matter what fills the
space before the secondary counters, only that it's the same for each.
The secondary counters simply overwrite the tail of the primary stamp.
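To spell the expansion out in code - this is only a sketch of the scheme as I've described it, and the parsing assumptions (counter field after the last ':', comma-separated secondaries) and the function name are mine:

#include <stdio.h>
#include <string.h>

/* Expand the shorthand: print the primary stamp, then each stamp
   formed by overlaying a secondary counter on the primary's tail.
   Assumes well-formed input short enough for the buffer. */
static void expand_stamps(const char *shorthand)
{
    char stamp[256];
    const char *counters = strrchr(shorthand, ':') + 1;
    const char *sep = strchr(counters, ',');
    size_t plen = sep ? (size_t)(sep - shorthand) : strlen(shorthand);

    memcpy(stamp, shorthand, plen);
    stamp[plen] = '\0';
    printf("%s\n", stamp);                 /* the primary stamp */

    while (sep) {
        const char *next = strchr(sep + 1, ',');
        size_t slen = next ? (size_t)(next - (sep + 1)) : strlen(sep + 1);
        memcpy(stamp + plen - slen, sep + 1, slen);  /* overlay the tail */
        printf("%s\n", stamp);
        sep = next;
    }
}

Fed the shorthand above, that prints exactly the four stamps listed. The power-of-two point is just arithmetic: four stamps worth 2^18 hash tests each, say, add up to exactly one 20-bit stamp's worth of work (4 x 2^18 = 2^20), whereas three of them wouldn't land on a whole number of bits.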
I'd like to rewind this whole discussion a bit though, and point out that I don't see the need for the sub-puzzles. Is the only reason that it makes minting time more predictable by reducing the variance? I can't see much value in that, although maybe I will when I'm desperate to send an email quickly :-)
I think that's the primary goal, yes. However, it does open up a
potential resource-consumption hole at the receiver that wasn't there
before. Each sub-stamp has to be tracked as if it were a normal one,
and you can specify a rather large number of stamps using a relatively
small amount of data - thus weakening my earlier argument about the DoS
risks.
We'd need to think carefully about how to limit this damage, and
whether the benefits outweigh the additional complexity in the spec.
An obvious solution is to limit the number of microstamps allowed per
macrostamp - this automatically puts a lower limit on the value of each
microstamp. Processing of a macrostamp should stop as soon as an
invalid microstamp is found, but valid microstamps should be kept so
that the same stamp can't be reused in the same attack.
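In loop form, roughly - valid_microstamp() and record_spent() here are placeholders for whatever validity check and double-spend store the verifier already has, and the cap is an arbitrary number I picked:

#include <stddef.h>

#define MAX_MICROSTAMPS 16   /* arbitrary cap, not a spec value */

extern int  valid_microstamp(const char *stamp);  /* placeholder */
extern void record_spent(const char *stamp);      /* placeholder */

/* Returns how many microstamps were accepted. Stops at the first
   invalid one, but records the valid ones already seen so the same
   stamps can't be replayed in a repeat of the attack. */
static size_t check_macrostamp(const char *micro[], size_t count)
{
    size_t i;

    if (count > MAX_MICROSTAMPS)
        count = MAX_MICROSTAMPS;   /* bounds the tracking cost */
    for (i = 0; i < count; i++) {
        if (!valid_microstamp(micro[i]))
            break;                 /* stop at the first failure */
        record_spent(micro[i]);    /* keep the valid ones regardless */
    }
    return i;
}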
OK, I'm just sounding off here, it's not something I'm actually
interested in pushing. Since we're already at the stage where all
"respectable" processors take only a few seconds on average to produce
a full-value stamp, is the variance really a problem?
But here's yet another perspective: the Apple Human Interface Guidelines. These state that, where possible, you should provide some indication of progress while performing a time-consuming task (that is, one likely to take more than 1 second). With a monolithic stamp, this is
near impossible - the best you can do is a beachball/hourglass pointer
and a barber-pole, along with a blithe estimate of the average-case
time. But with microstamps, you can show how many you've done so far,
and actually make a reasonable prediction of when it'll finish, which
is much more helpful to the ordinary user.
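The arithmetic behind that prediction is trivial once there are sub-units to count - a sketch, with the function name mine:

#include <time.h>

/* Estimate seconds remaining from microstamps minted so far. */
static double eta_seconds(clock_t start, unsigned done, unsigned total)
{
    double elapsed = (clock() - start) / (double)CLOCKS_PER_SEC;

    if (done == 0)
        return -1.0;   /* no data yet - fall back to the barber-pole */
    return elapsed * (double)(total - done) / done;
}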
The optimised minter then overwrites part of the pre-assembled SHA-1
block with the counter.
It's the overwriting of random that I find distasteful. The sole purpose of the minter should be to append a counter that works. It doesn't even need to care about the format of the string it's been given to work on.
Mmmm... I think I disagree here. I consider the minter to be the
"value add" part of the program. The random field and the counter work
in concert to provide value.
If the minter only handled the counter, and it was presented with the
same "random" field, it would produce exactly the same stamp - which
would then be considered invalid and therefore worthless by the
recipient.
Now, the counting part of the minter doesn't actually overwrite the
random data. It overwrites a blank field, which is provided after the
random data for that purpose. So don't worry, there's no "waste" of
entropy going on!
Now, admittedly, padding with random data does potentially consume more entropy from the system's RNG,

Ah, this was one of my original points against, but I removed it as it wasn't very convincing. The random field can easily be padded with non-random data, and who's going to prove you didn't generate it randomly?
Entirely true - you could take 96 bits of data from /dev/urandom
(making a round 16 characters of base64) and repeat it as much as
necessary.
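For instance - a sketch only, where to_base64() stands in for whatever encoder the codebase actually provides:

#include <stdio.h>
#include <string.h>

extern void to_base64(char *out, const unsigned char *in, size_t n);

/* Fill the random field by reading 96 bits (12 bytes, i.e. 16 base64
   characters) once from /dev/urandom and tiling them across the
   requested width. */
static int fill_random_field(char *out, size_t want)
{
    unsigned char raw[12];
    char b64[17];
    size_t i;
    FILE *f = fopen("/dev/urandom", "rb");

    if (!f)
        return -1;
    if (fread(raw, 1, sizeof raw, f) != sizeof raw) {
        fclose(f);
        return -1;
    }
    fclose(f);
    to_base64(b64, raw, sizeof raw);   /* placeholder encoder */
    for (i = 0; i < want; i++)
        out[i] = b64[i % 16];          /* repeat as much as necessary */
    out[want] = '\0';
    return 0;
}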
I'll sum up my position by saying that the minting function's job is to append a counter to some string, without parsing it in any way, within the restrictions of the character set and total length limit (both of which are being discussed elsewhere). Tricks such as padding which can be applied to the counter string are fair game; modifying the string passed in is definitely bad form.
Ah, another misunderstanding I think. I'm only appending things to the
string, not overwriting any of it. The random stuff goes first, of
course.
Here's an example of input and output, using the benchmark harness:
static const unsigned int test_bits = 20;
static const char *test_string =
    "0:040404:foo@xxxxxxxx:0123456789abcdef00000000000000000";
static const int test_tail = 55; /* must be less than 56 */

[...]

for(i = 0; i < num_minters; i++) {
    [...]

    /* set up SHA-1 block: terminator byte, zero pad, then the
       message length in bits in the final word */
    strncpy(block, test_string, SHA1_INPUT_BYTES);
    block[test_tail] = 0x80;
    memset(block + test_tail + 1, 0, 59 - test_tail);
    PUT_WORD(block + 60, test_tail << 3);

    /* Run minter, with clock running */
    end = clock();
    while((begin = clock()) == end)
        ;   /* synchronise to a clock edge for accurate timing */
    got_bits = minters[i].func(test_bits, block, SHA1_IV, test_tail,
                               1 << 30);
    end = clock();
    elapsed = (end - begin) / (double)CLOCKS_PER_SEC;
The above is actually simulating an input of "0:040404:foo@xxxxxxxx:"
to the regular minting routine, except that it appends a fixed "random"
string rather than a real one in order to provide a fair comparison
between the cores. And the output?
2621997 AMD64/x86 MMX Standard 1x2-pipe
Solution: 0:040404:foo@xxxxxxxx:0123456789abcdef0000000000001HDvz
Iterations: 21290621
Time taken: 8.120
There is a slight difference in the real minting routine, which
actually assumes it's going to be fed a v1 stamp and therefore puts a
separator between the random and count fields. I still don't see the
value in that separator, especially if the double-spend database is
going to contain the hash rather than the random field. :)
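For completeness, the check the recipient performs on that solution is just a SHA-1 and a leading-zero-bit count - sketched here using OpenSSL's SHA1() for brevity, which is an assumption about what's linked in, not what the hashcash tree actually uses:

#include <string.h>
#include <openssl/sha.h>

/* Hash the whole stamp and count leading zero bits of the digest;
   the stamp is valid if this meets the claimed bit count (20 here). */
static unsigned stamp_value(const char *stamp)
{
    unsigned char md[SHA_DIGEST_LENGTH];
    unsigned bits = 0;
    int i;

    SHA1((const unsigned char *)stamp, strlen(stamp), md);
    for (i = 0; i < SHA_DIGEST_LENGTH; i++) {
        unsigned char b = md[i];
        if (b == 0) { bits += 8; continue; }
        while (!(b & 0x80)) { bits++; b <<= 1; }
        break;
    }
    return bits;
}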
Bear in mind that for a single-pipe routine running on a superscalar processor, the XOR and shift calculations are essentially free, because they can very easily be overlapped with the other half of each round calculation.
A fair point. I haven't implemented any of this yet myself, so I'm not familiar with what overlaps well. Still, eliminating operations is generally good, as sometimes those "free" instructions can be used to get another pipeline in. And not every processor is superscalar; the vector unit on a G4 (7450) effectively isn't, as it only does one simple integer op per cycle.
Perhaps, and some machines (grrr, MMX) don't have a rotate instruction,
which makes the XOR stage about twice as expensive in both instructions
and registers. I ran into that problem *after* writing that post. GCC
decided to be braindead about it, too, which made it a real problem and
not just an annoyance.
What I did find was that I could calculate any block of four W values
with only one data dependency, which I was able to overlap with two-way
parallelism. The values in a block of three can be computed entirely independently of each other. I also found that W, K, S(5,A) and E could be summed
together while waiting for the result of F(B,C,D), and that these two
processes seem to have similar latencies.
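For anyone following along, here's the shape of that in plain C - the scalar message schedule, W[t] = S(1, W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]), grouped four at a time; only the last value in each block depends on another member of the same block:

#include <stdint.h>

#define ROTL1(x) (((x) << 1) | ((x) >> 31))

/* Compute W[t..t+3] (16 <= t <= 76). The first three have no
   dependencies on each other; only the fourth uses a result from
   this block, so three can be issued in parallel. */
static void schedule_four(uint32_t W[80], int t)
{
    uint32_t a = ROTL1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]);
    uint32_t b = ROTL1(W[t-2] ^ W[t-7] ^ W[t-13] ^ W[t-15]);
    uint32_t c = ROTL1(W[t-1] ^ W[t-6] ^ W[t-12] ^ W[t-14]);
    uint32_t d = ROTL1(a      ^ W[t-5] ^ W[t-11] ^ W[t-13]);  /* needs a */

    W[t] = a; W[t+1] = b; W[t+2] = c; W[t+3] = d;
}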
If I'd had more than 8 registers to work with, I'd have put the W
computation in parallel with the sum and function as well, making a
three-way parallel operation that should saturate most modern machines
even without double-piping. Indeed, I plan to do exactly that with
PowerPC scalar and Altivec optimisations, complete with a circular W
buffer that lives entirely in core ... perhaps tomorrow. Who knows, my
TiBook may yet regain the lead over Adam's P4.
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@xxxxxxxxxxxxxxxxxxxxx
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.