[hashcash] Statistics of hashcash collisions
- From: Anonymous <cripto@xxxxxxx>
- To: hashcash@xxxxxxxxxxxxx, mail2news@xxxxxxxxx
- Date: Tue, 25 May 2004 07:01:40 +0200 (CEST)
One of the surprising properties of hashcash type collisions is the
variation in how long it takes to find them. A collision which takes
a nominal or expected time of 1 minute sometimes seems to happen in
just a few seconds, and other times drags on for several minutes.
Here is some information on the statistical nature of this variation.
(Note: these are not actually collisions, but rather outputs of the hash
function which are distinguished by having several bits in predefined
states, for example a certain number of leading zero bits.)
The time for hashcash collisions of a substantial number of bits,
10 or more, can be represented by a Poisson distribution. This is a
distribution characteristic of a low-probability event which has many
independent attempts at success. For such collision sizes, this is
exactly how hashcash works. Each individual hash has a low probability
of success, but the program calculates many hashes until it finds a hit.
Call v the product of the probability of a success on each trial times
the number of trials. For a 20 bit hashcash collision, v of 1 would
correspond to approximately 1 million trials. Then the chance of having
exactly n hits in time v is:
P_v(n) = v^n * exp(-v) / n!
Now, with hashcash we don't run past our first hit. So for this analysis
I will calculate the probability of 1 - P_v(0) for various values of v.
This will tell us the probability that we will have seen a hit by that
value of v. By taking the difference of successive values this will
tell us the probability of a hit in various intervals of v.
Without further ado, here is the table:
v range Probability
-----------------------
0 - 1/16 6%
1/16 - 1/8 6%
1/8 - 1/4 10%
1/4 - 1/2 17%
1/2 - 1 24%
1 - 2 23%
2 - 4 12%
4 - 8 1.8%
8 or over 0.03%
To use an example, suppose you have a computer and a collision size which
you calculate (based on the counts needed and the speed of the iteration
in counts/second) will take 1 minute on average to create a collision.
Then there is a 6% chance it will take less than 4 seconds; 6% for 4-7
seconds; 10% for 7-15 seconds; 17% for 15-30 seconds; 24% for 30-60
seconds; 23% for 1-2 minutes; 12% for 2-4 minutes, 1.8% for 4-8 minutes,
and roughly 0 for any time over that.
This was pretty surprising to me, as it seems like I often spend more
time waiting for a collision to happen in my tests. Another peculiarity
is that if we run for the time corresponding to v=1, there is actually
a greater than 1/2 chance of having seen a hit. The chance is 1 - 1/e
or 63%. So the median time to a collision is somewhat less, about 69%
of the nominal time. In the one minute nominal trial case (calculated
on the basis of trials/second), the median time would be 42 seconds.
One way to smooth these values would be instead of calculating, say,
a single 23-bit collision, calculate 8 20-bit collisions. You'd have
to make the hashcash field bigger to hold 8 collisions rather than
just one. But this would even out the compute time by averaging over
several attempts, making the time more predictable. My calculations
indicate that with 8 collisions, there is a 5% chance of it taking half
as long as it "should", and a 1% chance of it taking twice as long,
compared to 39% and 14% respectively with a single collision.
- Follow-Ups:
- [hashcash] Re: Statistics of hashcash collisions
- From: Adam Back
- [hashcash] empirical results (Re: Statistics of hashcash collisions)
- From: Adam Back
- [hashcash] Re: Statistics of hashcash collisions
- From: Hubert Chan
- [hashcash] geometric vs poisson (Re: Statistics of hashcash collisions)
- From: Adam Back
Other related posts:
- » [hashcash] Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions
- [hashcash] Re: Statistics of hashcash collisions
- From: Adam Back
- [hashcash] empirical results (Re: Statistics of hashcash collisions)
- From: Adam Back
- [hashcash] Re: Statistics of hashcash collisions
- From: Hubert Chan
- [hashcash] geometric vs poisson (Re: Statistics of hashcash collisions)
- From: Adam Back