*From*: Anonymous <cripto@xxxxxxx>*To*: hashcash@xxxxxxxxxxxxx, mail2news@xxxxxxxxx*Date*: Tue, 25 May 2004 07:01:40 +0200 (CEST)

One of the surprising properties of hashcash type collisions is the variation in how long it takes to find them. A collision which takes a nominal or expected time of 1 minute sometimes seems to happen in just a few seconds, and other times drags on for several minutes. Here is some information on the statistical nature of this variation. (Note: these are not actually collisions, but rather outputs of the hash function which are distinguished by having several bits in predefined states, for example a certain number of leading zero bits.) The time for hashcash collisions of a substantial number of bits, 10 or more, can be represented by a Poisson distribution. This is a distribution characteristic of a low-probability event which has many independent attempts at success. For such collision sizes, this is exactly how hashcash works. Each individual hash has a low probability of success, but the program calculates many hashes until it finds a hit. Call v the product of the probability of a success on each trial times the number of trials. For a 20 bit hashcash collision, v of 1 would correspond to approximately 1 million trials. Then the chance of having exactly n hits in time v is: P_v(n) = v^n * exp(-v) / n! Now, with hashcash we don't run past our first hit. So for this analysis I will calculate the probability of 1 - P_v(0) for various values of v. This will tell us the probability that we will have seen a hit by that value of v. By taking the difference of successive values this will tell us the probability of a hit in various intervals of v. Without further ado, here is the table: v range Probability ----------------------- 0 - 1/16 6% 1/16 - 1/8 6% 1/8 - 1/4 10% 1/4 - 1/2 17% 1/2 - 1 24% 1 - 2 23% 2 - 4 12% 4 - 8 1.8% 8 or over 0.03% To use an example, suppose you have a computer and a collision size which you calculate (based on the counts needed and the speed of the iteration in counts/second) will take 1 minute on average to create a collision. Then there is a 6% chance it will take less than 4 seconds; 6% for 4-7 seconds; 10% for 7-15 seconds; 17% for 15-30 seconds; 24% for 30-60 seconds; 23% for 1-2 minutes; 12% for 2-4 minutes, 1.8% for 4-8 minutes, and roughly 0 for any time over that. This was pretty surprising to me, as it seems like I often spend more time waiting for a collision to happen in my tests. Another peculiarity is that if we run for the time corresponding to v=1, there is actually a greater than 1/2 chance of having seen a hit. The chance is 1 - 1/e or 63%. So the median time to a collision is somewhat less, about 69% of the nominal time. In the one minute nominal trial case (calculated on the basis of trials/second), the median time would be 42 seconds. One way to smooth these values would be instead of calculating, say, a single 23-bit collision, calculate 8 20-bit collisions. You'd have to make the hashcash field bigger to hold 8 collisions rather than just one. But this would even out the compute time by averaging over several attempts, making the time more predictable. My calculations indicate that with 8 collisions, there is a 5% chance of it taking half as long as it "should", and a 1% chance of it taking twice as long, compared to 39% and 14% respectively with a single collision.

**Follow-Ups**:**[hashcash] Re: Statistics of hashcash collisions***From:*Adam Back

**[hashcash] empirical results (Re: Statistics of hashcash collisions)***From:*Adam Back

**[hashcash] Re: Statistics of hashcash collisions***From:*Hubert Chan

**[hashcash] geometric vs poisson (Re: Statistics of hashcash collisions)***From:*Adam Back

- » [hashcash] Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions
- » [hashcash] Re: Statistics of hashcash collisions