Christian Packmann wrote:
I've put the program with source up at <http://www.elenthara.de/BeOS/B_OP_ADD_Test.zip>, if anybody wants to look at it (it's just a quick hack, don't expect comments; if you have questions, contact me). I'd love benchmark results from a P4, as I'm very curious how much the SIMD and integer code differ there. The program should auto-detect the supported SIMD sets and run only the appropriate routines, but the CPU ID routine has never been tested on PII/III/4s, so it might crash.
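For anyone wondering what this kind of auto-detection might look like, here is a minimal sketch. It is not the routine from B_OP_ADD_Test; it assumes a GCC or Clang toolchain (for `<cpuid.h>` and `__get_cpuid`), and the `SIMD_*` flag names and function names are made up for illustration. The bit positions are the documented ones for CPUID leaf 1, register EDX.

```c
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper; assumed toolchain, not from the original program */
#endif

/* Hypothetical flag names, not taken from B_OP_ADD_Test. */
enum { SIMD_MMX = 1 << 0, SIMD_SSE = 1 << 1, SIMD_SSE2 = 1 << 2 };

/* Decode the EDX feature word returned by CPUID leaf 1. */
unsigned decode_simd_flags(unsigned edx)
{
    unsigned flags = 0;
    if (edx & (1u << 23)) flags |= SIMD_MMX;   /* bit 23: MMX  */
    if (edx & (1u << 25)) flags |= SIMD_SSE;   /* bit 25: SSE  */
    if (edx & (1u << 26)) flags |= SIMD_SSE2;  /* bit 26: SSE2 */
    return flags;
}

/* Query the running CPU. Returns 0 (no SIMD sets) on non-x86 builds
 * or when CPUID leaf 1 is not available. */
unsigned detect_simd_flags(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return decode_simd_flags(edx);
#endif
    return 0;
}

/* Print which benchmark routines would be safe to run on this CPU. */
void report_simd(unsigned flags)
{
    printf("MMX:  %s\n", (flags & SIMD_MMX)  ? "yes" : "no");
    printf("SSE:  %s\n", (flags & SIMD_SSE)  ? "yes" : "no");
    printf("SSE2: %s\n", (flags & SIMD_SSE2) ? "yes" : "no");
}
```

Calling `report_simd(detect_simd_flags())` from `main` prints the detected sets. Keeping the decoding separate from the CPUID call means the bit logic can be checked on any machine, which is handy when the detection code cannot be tested on every CPU family.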
Here are your tests on a P4 2.6GHz HT:

$ ./B_OP_ADD_Test 800 600 1
Benchmarking C integer 93.13 MPixels/second
Benchmarking C integer, loop unrolling x4 146.16 MPixels/second
Benchmarking plain MMX 33.75 MPixels/second
Benchmarking SSE, loop unrolling x4, PREFETCHT0 657.53 MPixels/second

$ ./B_OP_ADD_Test 800 600 2
Benchmarking C integer 99.21 MPixels/second
Benchmarking C integer, loop unrolling x4 153.87 MPixels/second
Benchmarking plain MMX 35.94 MPixels/second
Benchmarking SSE, loop unrolling x4, PREFETCHT0 587.52 MPixels/second

$ ./B_OP_ADD_Test 100 100 1
Benchmarking C integer 101.05 MPixels/second
Benchmarking C integer, loop unrolling x4 174.55 MPixels/second
Benchmarking plain MMX 40.51 MPixels/second
Benchmarking SSE, loop unrolling x4, PREFETCHT0 1600.00 MPixels/second

$ ./B_OP_ADD_Test 100 100 2
Benchmarking C integer 109.09 MPixels/second
Benchmarking C integer, loop unrolling x4 174.55 MPixels/second
Benchmarking plain MMX 40.42 MPixels/second
Benchmarking SSE, loop unrolling x4, PREFETCHT0 1476.92 MPixels/second
=======================
The MMX performance looks a bit odd, doesn't it? It is slower than even the plain C integer loop in every run.
Adi.
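
For readers without the source at hand, a rough idea of what a "C integer" routine and an MPixels/second figure could look like. This is only a sketch under assumptions: it takes B_OP_ADD to mean a per-channel add clamped at 255 on 32-bit pixels, treats all four channels alike, and every function name here is hypothetical. The actual test program may well do it differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 8-bit channel values, clamping the result at 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* dst[i] = saturate(dst[i] + src[i]) on each of the four 8-bit channels.
 * Assumption: all four channels, alpha included, are handled the same way. */
void blend_add_c(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint8_t s = (uint8_t)(src[i] >> shift);
            uint8_t d = (uint8_t)(dst[i] >> shift);
            out |= (uint32_t)add_sat_u8(s, d) << shift;
        }
        dst[i] = out;
    }
}

/* Throughput in MPixels/second for `passes` runs over a
 * width x height image that took `seconds` in total. */
double mpixels_per_second(size_t width, size_t height,
                          size_t passes, double seconds)
{
    return (double)width * (double)height * (double)passes / seconds / 1e6;
}
```

As a worked figure, 100 passes over an 800x600 image in one second would be 800 * 600 * 100 / 1e6 = 48 MPixels/second. MMX also has a packed saturating byte add (PADDUSB) that does the inner clamp for eight bytes at once; whether the MMX routine in the test program uses it is something only the source can tell.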