[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: Urias McCullough <umccullough@xxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Sun, 14 Jun 2009 10:44:14 -0700

On Sun, Jun 14, 2009 at 7:16 AM, Christian Packmann
<Christian.Packmann@xxxxxx> wrote:
> I could use a few volunteers now to run the benchmark on various systems
> and post/mail the results. This would help me decide which routines
> should be aggressively optimized.
> I'd be especially interested in the following systems (but other systems
> would be welcome as well):
> * Intel Atom
> * Intel Core2 65nm (can be recognized by lack of SSE4.1 support)
> * Intel Pentium 4
> * Intel Core/Pentium M
> * AMD K10 - Phenom/Shanghai
> * AMD K8 - Athlon64/Sempron
> * AMD K7 - Athlon(XP)/Duron

PIII 450 results running on gcc4 Haiku r30993 (I apologize for the
multitude of gcc4 results, but I use gcc4 Haiku far more than gcc2
these days):

~> sysinfo
Kernel name: kernel_x86 built on: Jun  7 2009 10:27:27 version 0x1
1 Intel Pentium III, revision 0673 running at 447MHz (ID: 0x00000000 0x00000000)

CPU #0: GenuineIntel
        Type 0, family 6, model 7, stepping 3, features 0x0383f9ff
                FPU VME DE PSE TSC MSR PAE MCE CX8 SEP MTRR PGE MCA CMOV PAT PSE36
                MMX FXSTR SSE
        Extended Intel: 0x00000000

        Instruction TLB: 4k-byte pages, 4-way set associative, 32 entries
        Instruction TLB: 4M-byte pages, fully associative, 2 entries
        Data TLB: 4k-byte pages, 4-way set associative, 64 entries
        L2 cache: 512 KB, 4-way set associative, 32 bytes/line
        L1 inst cache: 16 KB, 4-way set associative, 32 bytes/line
        Data TLB: 4M-byte pages, 4-way set associative, 8 entries
        L1 data cache: 16 KB, 4-way set associative, 32 bytes/line

 194973696 bytes free      (used/max   73453568 /  268427264)
                           (cached     24084480)
     31547 semaphores free (used/max       1221 /      32768)
      3971 ports free      (used/max        125 /       4096)
      3989 threads free    (used/max        107 /       4096)
      2031 teams free      (used/max         17 /       2048)
~> runme_haiku
Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024

CPU vendor ID: GenuineIntel
CPU:
  SIMD instructions: MMX SSE SSE-Integer

Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 109 clock cycles.
10 runs per benchmark.

                    --  Results  --

       Minimum    Average    Maximum
# 1:    453962     492521     676056  - 'C, original'
# 2:    502890     523050     652266  - 'C, precise'
# 3:    495008     499859     516316  - 'C, precise DIV'
# 4:    291554     298556     343949  - 'MMX/SSE'
Skipped 'MMX/SSE optim-test', insufficient SIMD support
Skipped 'SSE2', insufficient SIMD support
Skipped 'SSSE3', insufficient SIMD support
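
To put the table in perspective, the minimum timings can be compared
against the 'C, original' routine with a short script. This is only a
sketch: the helper name relative_to is made up for illustration, and the
benchmark output does not state its units (presumably clock cycles, given
the RDTSC note). On these numbers the MMX/SSE routine comes out roughly
1.56 times faster than 'C, original'.

```python
# Minimum timings copied from the results table above.
minimum = {
    "C, original": 453962,
    "C, precise": 502890,
    "C, precise DIV": 495008,
    "MMX/SSE": 291554,
}


def relative_to(baseline, timings):
    """Return baseline_time / time per routine (above 1.0 means faster)."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items()}


speedup = relative_to("C, original", minimum)
for name, ratio in speedup.items():
    print(f"{name:16s} {ratio:.2f}x")
```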

Since the CPUID results from the benchmark were incomplete, I threw in
a sysinfo too.
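
For anyone comparing the two outputs, the feature word printed by sysinfo
can be decoded by hand. The sketch below assumes that "features
0x0383f9ff" is the EDX register from CPUID leaf 1 and uses the bit names
from Intel's documentation (sysinfo prints FXSR as "FXSTR"). The table and
function names are made up for illustration. SSSE3 and SSE4.1 are reported
in ECX, so this word says nothing about them directly.

```python
# Bit positions in the CPUID leaf 1 EDX feature word (Intel naming).
# Assumption: this is the word that sysinfo prints as "features".
EDX_FEATURES = {
    0: "FPU", 1: "VME", 2: "DE", 3: "PSE", 4: "TSC", 5: "MSR",
    6: "PAE", 7: "MCE", 8: "CX8", 9: "APIC", 11: "SEP", 12: "MTRR",
    13: "PGE", 14: "MCA", 15: "CMOV", 16: "PAT", 17: "PSE36",
    18: "PSN", 19: "CLFSH", 21: "DS", 22: "ACPI", 23: "MMX",
    24: "FXSR", 25: "SSE", 26: "SSE2", 27: "SS", 28: "HTT",
    29: "TM", 31: "PBE",
}


def decode_features(mask):
    """Return the names of all feature bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(EDX_FEATURES.items())
            if mask & (1 << bit)]


flags = decode_features(0x0383f9ff)
print(" ".join(flags))
print("SSE2 available:", "SSE2" in flags)
```

The SSE2 bit (bit 26) is clear in this word, which is consistent with the
benchmark skipping the 'SSE2' routine.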
