[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: Adam K Kirchhoff <adamk@xxxxxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Mon, 15 Jun 2009 09:01:48 -0400

Christian Packmann wrote:
After being sidetracked by several other things, I'm currently working on the SIMD optimizations for BilinearCopy again.

Benchmark binaries for Haiku/GCC2 and Windows/Cygwin can be downloaded here:
http://www.elenthara.de/Haiku/Benchmarks/AppserverBilinCopyBench_v1.0.zip
(The Windows version *requires* a basic Cygwin environment to be installed; I'll have to set up a MinGW environment to get rid of the Cygwin dependency, but don't have time for this right now.)

The benchmark should automatically detect the SIMD instruction sets supported by the installed CPU and skip any routines that use unsupported instructions, but I've only tested the code on my Core2 system so far. If the program crashes, please let me know on what CPU that happened so I can fix the problem. SIMD capability detection currently only supports AMD/Intel/VIA CPUs. Transmeta CPUs should work, but MMX/SSE will not be detected on them, so only the C integer benchmarks will be run.
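
For readers unfamiliar with this kind of detection, here is a minimal sketch of how the "detect, then skip" step could look. It is not the benchmark's actual code: the flag names, `decode_simd_features`, `routine_t` and `routine_supported` are all hypothetical. The only facts it relies on are the standard CPUID leaf 1 feature bits (EDX bit 23 = MMX, 25 = SSE, 26 = SSE2; ECX bit 0 = SSE3, 9 = SSSE3, 19 = SSE4.1). On a recent GCC or Clang the two register values could be obtained with `__get_cpuid(1, ...)` from `<cpuid.h>`; here they are passed in as plain integers so the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature flags; the names are illustrative only. */
enum {
    SIMD_MMX   = 1 << 0,
    SIMD_SSE   = 1 << 1,
    SIMD_SSE2  = 1 << 2,
    SIMD_SSE3  = 1 << 3,
    SIMD_SSSE3 = 1 << 4,
    SIMD_SSE41 = 1 << 5
};

/* Decode the CPUID leaf 1 feature registers (EDX, ECX) into the flags above. */
static unsigned decode_simd_features(uint32_t edx, uint32_t ecx)
{
    unsigned features = 0;

    if (edx & (1u << 23)) features |= SIMD_MMX;
    if (edx & (1u << 25)) features |= SIMD_SSE;
    if (edx & (1u << 26)) features |= SIMD_SSE2;
    if (ecx & (1u << 0))  features |= SIMD_SSE3;
    if (ecx & (1u << 9))  features |= SIMD_SSSE3;
    if (ecx & (1u << 19)) features |= SIMD_SSE41;

    return features;
}

/* A benchmark routine together with the SIMD features it needs (assumed layout). */
typedef struct {
    const char *name;
    unsigned    required;   /* bitwise OR of SIMD_* flags, 0 for plain C */
} routine_t;

/* Returns nonzero if every feature the routine requires is available. */
static int routine_supported(const routine_t *routine, unsigned available)
{
    assert(routine != NULL);
    return (routine->required & ~available) == 0;
}
```

A driver loop would then call `routine_supported()` for each entry and print a "Skipped" line instead of running the routine when it returns zero. A CPU whose vendor is not recognized would simply be treated as having no flags set, which leaves only the plain C routines.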


Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024

CPU vendor ID: GenuineIntel
CPU: Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
 SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3 SSSE3 SSE4.1

Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 231 clock cycles.
10 runs per benchmark.

                   --  Results  --

      Minimum    Average    Maximum
# 1:    359513     370865     464366  - 'C, original'
# 2:    334362     334476     334873  - 'C, precise'
# 3:    349440     349717     349979  - 'C, precise DIV'
# 4:    186858     186991     187173  - 'MMX/SSE'
# 5:    176309     176410     176729  - 'MMX/SSE optim-test'
# 6:    178493     179086     184079  - 'SSE2'
# 7:    155708     155750     155904  - 'SSSE3'

And:

Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024

CPU vendor ID: GenuineIntel
CPU:                   Intel(R) Xeon(TM) CPU 3.20GHz
 SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3

Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 528 clock cycles.
10 runs per benchmark.

                   --  Results  --

      Minimum    Average    Maximum
# 1:    459336     504028     883224  - 'C, original'
# 2:    537852     541143     560832  - 'C, precise'
# 3:    494304     495826     506484  - 'C, precise DIV'
# 4:    349416     349748     352692  - 'MMX/SSE'
# 5:    325248     337629     381876  - 'MMX/SSE optim-test'
# 6:    334044     336484     355212  - 'SSE2'
Skipped 'SSSE3', insufficient SIMD support
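
To compare the routines across the two machines, it can help to turn the cycle counts into speedups relative to 'C, original'. The helper below is only a sketch with a hypothetical name (`speedup`); it assumes that the figures are clock cycles and that lower is faster.

```c
#include <assert.h>

/* Speedup of a routine relative to a baseline, given cycle counts for each.
 * Assumes lower cycle counts mean faster code. */
static double speedup(unsigned long baseline_cycles, unsigned long routine_cycles)
{
    assert(routine_cycles != 0);
    return (double)baseline_cycles / (double)routine_cycles;
}
```

Going by the Minimum columns above, `speedup(359513, 155708)` puts SSSE3 at about 2.3x over 'C, original' on the Core2 Quad, and `speedup(459336, 334044)` puts SSE2 at about 1.4x on the Xeon.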

