Urias McCullough - 2009-06-16 02:32 :
FWIW, I have other P4 variants I can test with - including some more modern variants. Tonight I'll use a USB stick to test my wife's P4 Prescott 3.0ghz and tomorrow I'll try to pop it into an even faster P4 machine here at work ;) I also have a P4 Celeron 2.0ghz laptop I can test with... are these potentially useful?
Er, don't overdo it! ;-) It won't be too interesting with the current version of the benchmark anyway; I'd rather have you do extensive tests with the next release. I have already made several improvements to the various routines, so the current benchmark is already obsolete. The next version will also include some unrolled routines, which might behave very differently on various architectures.
The current build includes:

       Minimum  Average  Maximum
# 1:   323765   325858   343043  - 'C, original'
# 2:   363579   364208   365526  - 'C, precise'
# 3:   393465   394915   397851  - 'C, precise DIV'
# 4:   174734   175592   182045  - 'MMX/SSE v1.1'
# 5:   174692   174937   175210  - 'MMX/SSE v1.2'
# 6:   172006   172236   172907  - 'MMX/SSE unrolled x2'
# 7:   168955   169235   171547  - 'SSE2 v1.1'
# 8:   152464   152580   153196  - 'SSE2 v1.2 unrolled x2'
# 9:   144602   144706   145087  - 'SSSE3 v1.1'
#10:   126846   127011   127687  - 'SSSE3 v1.1 unrolled x2'

where 'MMX/SSE v1.1' is an improved version of the 'optim-test' of the first release, and the v1.2 adds a few tests with different addressing encodings; while the results are the same on my Core2, the versions might behave differently on other CPUs.
The SSE2/SSSE3 routines are also improved. Of the unrolled versions, only the SSSE3 variant is finished; the MMX and SSE2 variants need more work. I'm sceptical that they will yield much improvement anyway: the unrolled SSSE3 routine gives only about 14% more performance than the non-unrolled version. I don't think the improvements will be much greater for MMX/SSE2, but maybe some CPUs will perform well on them.
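For anyone who wants to check the 14% figure, here is a minimal sketch of the arithmetic using the Average column from the table above. It assumes the numbers are timings where lower means faster; the helper name `speedup_percent` is made up for illustration and is not part of the benchmark.

```python
# Average timings copied from the benchmark table (assumed: lower is faster).
averages = {
    "SSSE3 v1.1": 144706,
    "SSSE3 v1.1 unrolled x2": 127011,
}


def speedup_percent(baseline, improved):
    """Return how much more work per unit time `improved` does, in percent.

    Hypothetical helper: performance is taken as the inverse of the timing,
    so the gain is baseline / improved - 1.
    """
    return (baseline / improved - 1.0) * 100.0


ssse3_gain = speedup_percent(
    averages["SSSE3 v1.1"],
    averages["SSSE3 v1.1 unrolled x2"],
)
print(f"SSSE3 unrolled x2 vs. SSSE3 v1.1: {ssse3_gain:.1f}% more performance")
```

With the averages above this comes out at roughly 14%, which is the gain from unrolling the SSSE3 routine.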
It might be a few days until the next release, though; the unrolled SSE2 variant in particular is giving me a serious headache.
Christian