[hashcash] Re: Hashcash performance improvements

  • From: Adam Back <adam@xxxxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Sun, 30 May 2004 17:50:22 -0400

On Sun, May 30, 2004 at 09:26:13PM +0100, Jonathan Morton wrote:
> >Well there are the guidelines.  In any case, a limit should be named.
> >Standards normally have arbitrary limits written into them, as without 
> >them no computer with finite memory could ever claim to be standards 
> >compliant.
> 
> I have a feeling the 512 bytes of SMTP should be the guideline, minus 
> some reasonable amount for the command itself.  That's over 6 80-column 
> lines, BTW.

Sounds reasonable.  People might chew into it a bit because eg CAMRAM
was proposing to transport a public key.  Now these are in the
1024-bit+ range, which is only 2-3 lines, but they have to be careful
what other cruft they put in there.  If they were to literally use
S/MIME certs (very cruft laden) or PGP certs (smaller, but bloated if
WoT signatures are in there) they would have problems.  So I think
that is mostly their problem: to define a public key format that is a
small profile of one of those, or a simple replacement like SSH etc
(bare public key, hex encoded or whatever).
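
For the "2-3 lines" figure, a back-of-envelope (my numbers, just for
illustration), in python:

    # a bare 1024-bit key is 128 bytes, about 172 characters in base64,
    # ie 2-3 of the ~6 80-column lines that fit under SMTP's 512-byte
    # command-line limit
    key_bytes = 1024 // 8
    b64_chars = -(-key_bytes // 3) * 4     # ceil(128 / 3) * 4 = 172
    print(b64_chars, b64_chars / 80)       # 172, ~2.2 lines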

> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000000416
> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000005970
> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000006119
> 1:10,4:040525:foo::c82a215126afb2c2:0000000000000007023

This introduces a sub-stamp re-use problem (unless you key the
double-spend database on the random field only, and not the hash of
the whole stamp -- it is also not clear what the hash of the whole
stamp would be, without taking another hash, which means more cost for
the recipient if you do it this way).

Also you have to prevent re-use of sub-stamps across stamps, so you
need the random field to be unique (or probably random+date).  So I
don't think you can use the hash of the stamp alone if you use
multiple sub-puzzles.  And you don't ideally want to have to store one
db entry for each sub-puzzle, even if there are only 16 or so of them
at most.
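
A minimal sketch of what keying on random+date could look like (field
names and parsing are mine, purely illustrative, not the hashcash
implementation):

    def spent_key(stamp):
        # v1 stamp: ver:bits:date:resource:ext:rand:counter
        ver, bits, date, resource, ext, rand, counter = stamp.split(":", 6)
        return (date, rand)

    def check_and_record(stamp, spent_db):
        # spent_db is a set of (date, rand) tuples
        key = spent_key(stamp)
        if key in spent_db:
            return False       # random+date already seen: treat as double-spend
        spent_db.add(key)      # one entry covers every sub-puzzle in the stamp
        return True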

> >I'd like to rewind this whole discussion a bit though and point out
> >that I don't see the need for the sub-puzzles.  Is the only reason
> >that it makes minting time more predictable by reducing the variance?
> >I can't see much value in that, although maybe I will when I'm
> >desperate to send an email quickly :-)
> 
> I think that's the primary goal, yes.  

I believe that was Izzy's motivation.  I'm kind of hung up on this
one; I did consider it earlier too (back in 97 I was already weighing
whether 2-way or k-way collisions would be better technically, but
rejected k-way for complexity).  I hate adding complexity, and
sub-stamps add complexity.  (I like the fact that you can verify
stamps with the sha1sum binary plus a bit of sh script.)
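
For concreteness, the core check is just "does the SHA-1 of the stamp
string have >= bits leading zero bits"; a sketch in python, equivalent
in spirit to the sha1sum + sh approach (date/resource/double-spend
checks omitted):

    import hashlib

    def leading_zero_bits(digest):
        n = 0
        for byte in digest:
            if byte == 0:
                n += 8
                continue
            while byte & 0x80 == 0:   # count zeros in first non-zero byte
                n += 1
                byte <<= 1
            break
        return n

    def check_stamp(stamp, bits):
        return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits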

> However, it does open up a potential resource-consumption hole at
> the receiver that wasn't there before.  Each sub-stamp has to be
> tracked as if it were a normal one, and you can specify a rather
> large number of stamps using a relatively small amount of data -
> thus weakening my earlier argument about the DoS risks.

I was thinking they would be evaluated as follows:

1:10,4:040525:foo::c82a215126afb2c2:416
1:10,4:040525:foo::c82a215126afb2c2:416,5970
1:10,4:040525:foo::c82a215126afb2c2:416,5970,6119
1:10,4:040525:foo::c82a215126afb2c2:416,5970,6119,7023

where you stop as soon as you hit an invalid sub-stamp.  However I
think your comment above is correct: if the 1st sub-stamp was valid
but the next one was not, the attacker can turn around and have you
verify the 1st sub-stamp again, ad nauseam, and you won't notice you
already verified it unless you really do store double-spent stamps
separately.  So in this case he doesn't have to do more work (he gets
re-use).  Now obviously someone can send you invalid monolithic stamps
and you'll keep verifying them, but only at the cost of 1 hash each;
here they can have you do n-1 verifications from an n sub-stamp stamp
where the n-th sub-stamp is invalid.  I don't think you want to store
failed stamps/random values etc, as disk is likely much slower than
stamp verification anyway.  However I'm not sure this is a big deal.
I mean someone can send you multiple invalid hashcash headers each
addressed to you, multiple-Cc style, and you'll be obliged to verify
them all anyway.  QED.

The other thing would be to cap sub-stamps at 16 or something to limit
the damage.
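
Putting those two pieces together, a sketch of the prefix-wise
evaluation (the stamp layout, per-sub-puzzle bits argument and the
MAX_SUBSTAMPS cap are illustrative only, not spec):

    MAX_SUBSTAMPS = 16

    def check_substamps(prefix, counters, bits):
        # prefix is everything up to the final ':', counters is the
        # ','-separated counter list, bits the per-sub-puzzle difficulty
        if not counters or len(counters) > MAX_SUBSTAMPS:
            return 0
        valid = 0
        for i in range(len(counters)):
            candidate = prefix + ":" + ",".join(counters[:i + 1])
            if not check_stamp(candidate, bits):  # as sketched above
                break                             # stop at first invalid
            valid += 1
        return valid        # number of valid sub-stamps found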

> We'd need to think carefully about how to limit this damage, and 
> whether the benefits outweigh the additional complexity in the spec.

Yes.

> An obvious solution is to limit the number of microstamps allowed per 
> macrostamp - this automatically puts a lower limit on the value of each 
> microstamp.  Processing of a macrostamp should stop as soon as an 
> invalid microstamp is found, but valid microstamps should be kept so 
> that the same stamp can't be reused in the same attack.

I presume you mean one would have to avoid:

1:10,4:040525:foo::c82a215126afb2c2:416,416,416,416

being considered valid under the over-writing semantics.
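
ie in the sketch above you would also want a duplicate check on the
counter list, something like:

    def counters_unique(counters):
        # reject 416,416,416,416 style stamps: each sub-stamp distinct
        return len(set(counters)) == len(counters)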

The down-side of the approach I proposed earlier is that
speed-optimized versions might literally try padding out to another
64-9 byte boundary for each sub-stamp, which would eat into extension
space and make ugly, less human-readable stamps.

> OK, I'm just sounding off here, it's not something I'm actually 
> interested in pushing.  Since we're already at the stage where all 
> "respectable" processors take only a few seconds on average to produce 
> a full-value stamp, is the variance really a problem?

The problem would be the 366MHz laptop that I use quite a bit because
it weighs about half what my work 3GHz laptop weighs, coupled with the
variance examples anonymous gave -- eg say it takes 20sec instead of
1sec, and you get unlucky and it takes 4x longer than expected...

> But here's yet another perspective:  the Apple Human Interface 
> Guidelines.  This states that where possible, you should provide some 
> indication of progress while performing a time-consuming task (that is, 
> likely to take more than 1 second).  With a monolithic stamp, this is 
> near impossible - the best you can do is a beachball/hourglass pointer 
> and a barber-pole, along with a blithe estimate of the average-case 
> time.

Lapo had a cute UI suggestion for dealing with the unclear end-time.
In the 1st time-period the progress goes from 0 to 50%; in the next
time-period 50% to 75%, the next 75% to 87.5%, etc, so that it just
goes slower and slower, asymptotically approaching 100%.  Talk about
non-linear progress bars.
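
ie (my reading of it, as a sketch) the bar position is just:

    def progress(elapsed, expected):
        # 50% after one expected minting time, 75% after two, 87.5%
        # after three, ... creeping asymptotically towards 100%
        return 1.0 - 2.0 ** (-elapsed / expected)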

> But with microstamps, you can show how many you've done so far, and
> actually make a reasonable prediction of when it'll finish, which is
> much more helpful to the ordinary user.

Yes.
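
And the prediction is the obvious extrapolation, eg:

    def eta_seconds(done, total, elapsed):
        # with `done` of `total` sub-stamps minted in `elapsed` seconds,
        # extrapolate time remaining; meaningless for a monolithic stamp
        return (total - done) * (elapsed / done) if done else None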

> There is a slight difference in the real minting routine, which 
> actually assumes it's going to be fed a v1 stamp and therefore puts a 
> separator between the random and count fields.  I still don't see the 
> value in that separator, especially if the double-spend database is 
> going to contain the hash rather than the random field.  :)

Yes, the reason for the separator, if I recall, was to recognize where
the random part ended.  There are issues with keeping just the stamp
output:

- if you do sub-stamps you have to avoid re-use of sub-stamps

- to be crypto conservative you want a fresh computation -- storing
the output does not ensure that; someone could make another
computation with the same random field (I think it makes more
conservative assumptions about the hash function's security if you do
not allow people to hammer on selected areas of the input space; also
perhaps someone finds some short-cut or re-use we have not thought of,
so forced fresh work is a defense)

Another nice thing about the separator is that if you do have
sub-stamps, you can see more easily where the sub-stamp definitions
are grouped together: n of them separated by ',' after the final ':'.

And finally it's safer, I think, to have people not try to (mentally)
combine randomness and counters.  If they are told "ok, put 96-bit
randomness here, and then count as required here", security-critical
coding errors are less likely.  (My original code had problems as you
got to larger bit sizes: there was only a 64-bit counter, and the
chance of accidental collision would increase.)  Now granted you can
defend against that with or without the ':' separation, but it is
simpler for people to understand, and not repeat my earlier mistake,
if there is a logically distinct field with a different purpose --
entirely to avoid collision.
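
A sketch of minting done that way (format details illustrative only):
96 bits of fresh randomness in its own field, and the counting
confined to the final field:

    import base64, os

    def mint(bits, date, resource, ext=""):
        rand = base64.b64encode(os.urandom(12)).decode()  # 12 bytes = 96 bits
        prefix = f"1:{bits}:{date}:{resource}:{ext}:{rand}:"
        counter = 0
        while True:
            stamp = prefix + str(counter)
            if check_stamp(stamp, bits):    # as sketched above
                return stamp
            counter += 1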

So some musings towards retaining clean, short stamps (aka taking the
performance advantage out of padding):

- define max padding (eg 96 to 128 bits)

- define counters with no leading 0s (non-starter -- padding with 1s
  or anything else works equally well)

- define a max counter-width (per sub-stamp counter) -- careful, there
  is no hard max, but practically eg 16 digits of base64 should be
  "enough" future-proofing because it takes forever to count to 2^96
  (see the back-of-envelope sketch after this list)

- put counters at the beginning (earlier I said this leads to
  pre-computation across stamps; Malcolm said that is negligible.  The
  other reason I am not sure it is a good idea is that to be crypto
  conservative you want to force fresh computation, in case someone
  finds something you did not think of.  (This is another reason for
  storing the randomness field -- it simply and reliably prevents any
  related computation from being accepted.)

  - another reason putting counters at the beginning is bad is that it
    makes large extensions expensive -- have 256B of extensions and
    your stamps would cost 5x more; can't have that, another
    non-starter!
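
The back-of-envelope for the counter-width bullet (my numbers,
assuming 6 bits per base64 digit):

    digits, bits_per_digit = 16, 6
    counter_space = 2 ** (digits * bits_per_digit)     # 2^96 counters
    years = counter_space / 2**32 / (3600 * 24 * 365)  # at 2^32 tries/sec
    print("%e counters, ~%.0e years to exhaust" % (counter_space, years))
    # roughly 6e11 years even at 2^32 counter values per second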

alternative: just give up; how much perf-difference does it make?

Estimates?

So we could have an option to do it, or do it by default and turn it
off automatically if you start using large extensions.  Or, if the
advantage is smallish (eg in the 1% range), just not do it (at least
by default) and concede that to the spammers.

Adam
