Christian Packmann wrote:
After being sidetracked by several other things, I'm working on the SIMD optimizations for BilinearCopy again.

Benchmark binaries for Haiku/GCC2 and Windows/Cygwin can be downloaded here:
http://www.elenthara.de/Haiku/Benchmarks/AppserverBilinCopyBench_v1.0.zip
(The Windows version *requires* a basic Cygwin environment to be installed; I'll have to set up a MinGW environment to get rid of the Cygwin dependency, but don't have time for this right now.)

The benchmark should automatically detect the SIMD instruction sets supported by the installed CPU and skip any routines that use instructions the CPU does not support, but I've only tested the code on my Core2 system so far. If the program crashes, please let me know on what CPU that happened so I can fix the problem. The code only supports SIMD capability detection for AMD/Intel/VIA so far. Transmeta CPUs should work, but MMX/SSE will not be detected on them, so only the C integer benchmarks will be run.
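For readers wondering how such detection and skipping can work, here is a minimal sketch in C. It is not the benchmark's actual code: all names (`decode_simd_features`, `routine_can_run`, the `SIMD_*` flags) are made up for illustration. It assumes the caller has already run CPUID leaf 0 to get the vendor string and leaf 1 to get the EDX/ECX feature words; the bit positions are the documented ones for that leaf. Unknown vendors (Transmeta, for example) get an empty feature set, which matches the described fallback to the C integer routines.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative feature flags; not taken from the benchmark source. */
enum {
    SIMD_MMX   = 1 << 0,
    SIMD_SSE   = 1 << 1,
    SIMD_SSE2  = 1 << 2,
    SIMD_SSE3  = 1 << 3,
    SIMD_SSSE3 = 1 << 4,
    SIMD_SSE41 = 1 << 5
};

/* Returns 1 for the vendors the post says are handled (Intel, AMD, VIA).
 * "CentaurHauls" is the vendor string reported by VIA/Centaur CPUs. */
static int vendor_is_known(const char *vendor_id)
{
    return strcmp(vendor_id, "GenuineIntel") == 0
        || strcmp(vendor_id, "AuthenticAMD") == 0
        || strcmp(vendor_id, "CentaurHauls") == 0;
}

/* Decodes the EDX/ECX words returned by CPUID leaf 1 into SIMD_* flags.
 * Any other vendor yields 0, so only plain C routines would run. */
static unsigned decode_simd_features(const char *vendor_id,
                                     uint32_t edx, uint32_t ecx)
{
    unsigned features = 0;

    if (!vendor_is_known(vendor_id))
        return 0;

    if (edx & (1u << 23)) features |= SIMD_MMX;
    if (edx & (1u << 25)) features |= SIMD_SSE;
    if (edx & (1u << 26)) features |= SIMD_SSE2;
    if (ecx & (1u << 0))  features |= SIMD_SSE3;
    if (ecx & (1u << 9))  features |= SIMD_SSSE3;
    if (ecx & (1u << 19)) features |= SIMD_SSE41;
    return features;
}

/* A routine may run only if every instruction set it needs is available. */
static int routine_can_run(unsigned required, unsigned available)
{
    return (available & required) == required;
}
```

A benchmark loop would then call `routine_can_run()` with each routine's requirements and print a "skipped" message when it returns 0.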
Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024
CPU vendor ID: GenuineIntel
CPU: Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3 SSSE3 SSE4.1
Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 231 clock cycles.
10 runs per benchmark.

-- Results --
       Minimum  Average  Maximum
# 1:    359513   370865   464366 - 'C, original'
# 2:    334362   334476   334873 - 'C, precise'
# 3:    349440   349717   349979 - 'C, precise DIV'
# 4:    186858   186991   187173 - 'MMX/SSE'
# 5:    176309   176410   176729 - 'MMX/SSE optim-test'
# 6:    178493   179086   184079 - 'SSE2'
# 7:    155708   155750   155904 - 'SSSE3'

And:

Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024
CPU vendor ID: GenuineIntel
CPU: Intel(R) Xeon(TM) CPU 3.20GHz
SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3
Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 528 clock cycles.
10 runs per benchmark.

-- Results --
       Minimum  Average  Maximum
# 1:    459336   504028   883224 - 'C, original'
# 2:    537852   541143   560832 - 'C, precise'
# 3:    494304   495826   506484 - 'C, precise DIV'
# 4:    349416   349748   352692 - 'MMX/SSE'
# 5:    325248   337629   381876 - 'MMX/SSE optim-test'
# 6:    334044   336484   355212 - 'SSE2'
Skipped 'SSSE3', insufficient SIMD support
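To compare routines across the two machines, it can help to turn the Minimum column into relative speedups. The following is a small sketch in C, not part of the benchmark; the helper names `speedup` and `fastest_index` are hypothetical. The numbers in the two arrays are copied from the Minimum columns of the results above, and the output does not state their unit, so the code treats them as opaque counts where lower is better.

```c
#include <stddef.h>

/* Minimum values copied from the two result listings (lower is better). */
static const unsigned long core2_min[] = {
    359513, 334362, 349440, 186858, 176309, 178493, 155708
};
static const unsigned long xeon_min[] = {
    459336, 537852, 494304, 349416, 325248, 334044
};

/* Ratio of baseline to candidate; a value above 1.0 means the candidate
 * routine needed fewer counts than the baseline. */
static double speedup(unsigned long baseline, unsigned long candidate)
{
    return (double)baseline / (double)candidate;
}

/* Index (0-based) of the smallest entry in a non-empty array. */
static size_t fastest_index(const unsigned long *values, size_t count)
{
    size_t best = 0;
    size_t i;

    for (i = 1; i < count; i++) {
        if (values[i] < values[best])
            best = i;
    }
    return best;
}
```

With these inputs, `speedup(core2_min[0], core2_min[6])` comes out at roughly 2.3 ('C, original' versus 'SSSE3' on the Core2), and `speedup(xeon_min[0], xeon_min[4])` at roughly 1.4 ('C, original' versus 'MMX/SSE optim-test' on the Xeon).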