*From*: Adam Back <adam@xxxxxxxxxxxxxxx>*To*: hashcash@xxxxxxxxxxxxx*Date*: Tue, 25 May 2004 15:30:22 -0400

On Tue, May 25, 2004 at 07:01:40AM +0200, Anonymous wrote: > Without further ado, here is the table: > > v range Probability > ----------------------- > 0 - 1/16 6% > 1/16 - 1/8 6% > 1/8 - 1/4 10% > 1/4 - 1/2 17% > 1/2 - 1 24% > 1 - 2 23% > 2 - 4 12% > 4 - 8 1.8% > 8 or over 0.03% This distribution is borne out by empirical tests also. > This was pretty surprising to me, as it seems like I often spend more > time waiting for a collision to happen in my tests. Another peculiarity > is that if we run for the time corresponding to v=1, there is actually > a greater than 1/2 chance of having seen a hit. The chance is 1 - 1/e > or 63%. Nevertheless the mean or expected time is actually the nominal. Ie as given by the (discrete) geometric distribution where P(n) = q^{n-1}.p, mean = 1/p variance = q/(p^2) and p = 1/2^b. So mean = 2^b. (Where b is the unmber of bits). > One way to smooth these values would be instead of calculating, say, > a single 23-bit collision, calculate 8 20-bit collisions. Correct. This was proposed by Juels & Brainard. Earlier I also explored using 2-way and k-way parital collisions (such as used by Rivest and Shamir in micromint). The micromint collision function is to set z bits to zero, and put bits into buckets until you find k of them, constituting a k-way partial collision (k-inputs which give partially matching output). However there isn't much difference in variance between multiple sub-puzzles and k-way (k-way is a little lower, but more complex to code). Also multiple sub-puzzle doesn't have any memory requirement, where-as k-way does. The downside, or trade-off with either of these approaches is that a) it makes the verification k-times larger (for k sub-puzzles or k-way collisions); and b) they are a constant plus k times larger. Wrt to verification time you can deter time-wasting (attacker sending one valid sub-puzzle, but having the verifier verify all of them) by making them dependent in some way. eg. example approach I was considering for v1 format: 1:10,4:040525:foo::c82a215126afb2c2:16,970,119,23 (for 4 sub-puzzles each of size 10 bits) where the collision is on the string including the previous one so 1st is on 1:10,4:040525:foo::c82a215126afb2c2: 2nd on: 1:10,4:040525:foo::c82a215126afb2c2:16 and so forth. This means the verifier will only verify one invalid stamp worst case. Clearly it would desirable if we could have a fixed cost (exact same computation on each try), or at least bounded-probabilistic cost (as provided by Juels & Brainard's client-puzzles approach). The problem with the client-puzzles approach is that it is interactive. (The scheme is to send a challenge with a known solution and fixed solution space to explore: H( x || 1234 ) = 5678 and user must find x such that the equality is true. So worst case it takes the user 2^|x| operations. Unfortunately it seems that most problems with self-evident solutions (solutions that can be verified much more quickly than computed) and fixed solution spaces are public key style operations. So I give a few examples of those in the paper, one being cube-root (which is the non-interactive version of Dwork and Naor's square-root they proposed in their original paper). The downside with these approaches are: - larger stamps (16kbit operations to slow things down much) - more complex code (bignum operations) - (relatively) more expensive verify (raising 16kbit number to 3rd power costs more) - scales badly over time (verification cost increases as computation cost increases; stamp size increases as computation cost increases) - note there is still some small variance due to the addition-chain sliding window etc approaches that provide efficient modular exponentiation However for there is a range of computation sizes where cube-root competes with multiple-sub-puzzle on variance, size and verification cost. If you or anyone has any insights into achieving fixed cost, or even bounded probabilistic cost non-interactively without resorting to public key operations, let me know. I've been looking at it on-and-off and have not found anything. Adam

**References**:**[hashcash] Statistics of hashcash collisions***From:*Anonymous

- » [hashcash] Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions