On Mon, Aug 16, 2004 at 02:25:58AM +0100, Jonathan Morton wrote: > Just call the initialisation function - it guesstimates the "best" > function to use immediately after populating the core table. OK done. > BTW, it sounds like you're calling the benchmark at least twice - yes the code is supposed to use hc_per_second which calls hashcash_per_second and caches the result so the work is only done once. An call directly to hashcash_per_second crept in, fixed. > >Also I think it would help if we used similar to the old approach > >where you work on hashcash in a loop _until_ the timer changes (ie > >measure the minimum timer resolution worth of work). > > If it would help, I could whip up a relatively fast function to measure > the speed of the currently-selected core. This would let you keep the > present statistics while avoiding the huge startup penalty. Yes that would help a lot, to use the current method I need a way to do a known quantity of work. I put up pr4 (still has problems, tho less of them, but for discussion purposes): http://www.hashcash.org/source/current/hashcash-1.03/ long hashcash_per_sec( void ) { #if defined( CHROMATIX ) if ( !hashcash_init ) { hashcash_select_minter(); hashcash_init = 1; } #endif clock_t t1, t2, t1c,t2c,tmp; double elapsed; unsigned long n_collisions = 0; char token[ MAX_TOK+1 ]; word32 step = 100; /* wait for start of tick */ t2 = clock(); do { t1 = clock(); } while ( t1 == t2 ); t1c = t1; /* do computations for next tick */ do { n_collisions += step; find_collision( "000101", "flame", 40, token, step, "", "", "" ); t2c = t2 = clock(); if ( t2c < t1c ) { tmp = t2c; t2c = t1c; t1c = tmp; } } while ( (t2c - t1c) == 0 ); if ( t2 < t1 ) { tmp = t2; t2 = t1; t1 = tmp; } fprintf( stderr, "collisions = %ld\n", n_collisions ); /* see how many us the tick took */ elapsed = (double)(t2 - t1 )/CLOCKS_PER_SEC; return (word32) ( 1.0 / elapsed * (double)n_collisions + 0.499999999 ); } so this code is broken because it is always timing the legacy code (not libfastmint). It should have #if defined( CHROMATIX ) call something which does some chunk of work. (Note: the actual hashcash_mint() function calls hashcash_fastmint() when CHROMATIX is defined). > >However this elapsed / wall-clock time > >rather than process time so inaccurate on loaded systems, so we > >switched to clock() however on linux I think I am finding this timer > >is lower resolution. > > But not so low resolution that it requires many seconds to get an even > approximately accurate result. Most platforms have a clock() > resolution of 1/60 (Classic Mac) or 1/100 (Linux/x86) sec, sometimes > better like 1/1000 sec on Linux/Alpha. It should normally be easy to > get a reliable result within 1/10 sec. So in the above code I did work for one clock resolution interval. The output is like this: collisions = 4400 speed: 440000 collision tests per second collisions = 5800 speed: 580000 collision tests per second collisions = 7400 speed: 740000 collision tests per second collisions = 7200 speed: 720000 collision tests per second collisions = 5000 speed: 500000 collision tests per second collisions = 1400 speed: 140000 collision tests per second collisions = 7300 speed: 730000 collision tests per second collisions = 4400 speed: 440000 collision tests per second collisions = 7300 speed: 730000 collision tests per second collisions = 8100 speed: 810000 collision tests per second accuracy +/- 50%! (as you can see the resolution is 1/100 sec). I intentionally do a busy wait for timer tick before starting to be at the beginning of an interval. So I think you can only explain this variance by the counter not only being output in 1/100th second units, but also massively erratic and yet this is supposed to be cpu work accounting on a machine with a clock rate 30 million times faster than that. The work inside the loop is purely deterministic, all that is left is process scheduling (which I would think might be discounted from cpu work accounting), and processor cache effects which I do not think can account for 50% speed difference over such a large chunk of CPU cycles. Weird. I also tried timing for 1/10 sec eg instead of minimum timer interval instead of this line: } while ( (t2c - t1c) == 0 ); using: } while ( (t2c - t1c) < CLOCKS_PER_SEC/10 ); and there the output is: collisions = 74500 speed: 745000 collision tests per second collisions = 73600 speed: 736000 collision tests per second collisions = 66800 speed: 668000 collision tests per second collisions = 66700 speed: 667000 collision tests per second collisions = 67400 speed: 674000 collision tests per second collisions = 74800 speed: 748000 collision tests per second collisions = 74400 speed: 744000 collision tests per second collisions = 70700 speed: 707000 collision tests per second collisions = 70200 speed: 702000 collision tests per second collisions = 70200 speed: 702000 collision tests per second ie still pretty variable. Actually the wall-clock time code was also pretty variable: repeat 10 ./hashcash -s speed: 393701 collision tests per second speed: 555556 collision tests per second speed: 366300 collision tests per second speed: 378788 collision tests per second speed: 529101 collision tests per second speed: 555556 collision tests per second speed: 375940 collision tests per second speed: 170940 collision tests per second speed: 510204 collision tests per second speed: 534759 collision tests per second (from hashcash-0.32) The gettimeofday() call seems to give wall clock resolution of 5 usec or maybe less (tight loop calling gettimeofday() until timers differ). > The core-selection benchmark takes a long time because it has to > benchmark every core, not just one, and it has to use a work quantity > that will take long enough to be accurate even on a very fast computer. Yes this is why I use the idiom of doing small chunks of work repeatedly until the next interval starts -- just want the work chunk to reasonably out weigh the function call overhead (which is negligible compared to even 100 hashcash tests I would think) and the rest can cope with arbitrary processor speeds. At this point I am thinking /proc/cpuinfo and pentium tick counter! This is crazy. > Here are some useful sets of compiler options: I also made some targets for these (by calling make recursively with the chosen macros). btw In enabling the static selection I seem to have re-exposed this problem: ./hashcash -mb 16 -r foo collisions = 63800 time estimate: 0 seconds (103 milli-seconds) ERROR: requested 16 bits, reported 16 bits, got 1 bits using AMD64/x86 MMX Standard 1x2-pipe minter: "1:16:040816:foo::09bQn59OYq6RRjso:00000XtE" Adam