[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: André Braga <meianoite@xxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Sun, 14 Jun 2009 19:16:13 -0300

2009/6/14 Christian Packmann <Christian.Packmann@xxxxxx>:
> After looking through the results, I think that an SSE2 codepath is not
> interesting, as the MMX/SSE code always performs better. An SSSE3 version may
> make sense for modern Core2 and i7/Nehalem systems, and maybe AMD's Bulldozer
> (due 2011). So I'll try to optimize the MMX/SSE routine further, as it is
> the most useful for general use.

My CPU liked the SSE2 code :)

I don't know if this makes a difference for this code, but this is an
AM2 board, not AM2+ (which is the native socket type for this CPU).

Benchmark: Haiku app_server bilinear copy
Compile date: Jun 14 2009 14:38:02
GCC version: 2.95.3-haiku-081024

CPU vendor ID: AuthenticAMD
CPU: AMD Phenom(tm) 9950 Quad-Core Processor
  SIMD instructions: MMX SSE SSE-Integer SSE2 SSE3  MOVU

Can't lock process to CPU on this platform.
Estimated CPUID/RDTSC overhead: 122 clock cycles.
10 runs per benchmark.

                    --  Results  --

       Minimum    Average    Maximum
# 1:    429197     439534     507568  - 'C, original'
# 2:    440918     440982     441223  - 'C, precise'
# 3:    449571     453421     474110  - 'C, precise DIV'
# 4:    198232     200319     218137  - 'MMX/SSE'
# 5:    196354     199110     217796  - 'MMX/SSE optim-test'
# 6:    178408     180968     203687  - 'SSE2'
Skipped 'SSSE3', insufficient SIMD support
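
To put the averages above in perspective, here is a minimal sketch (not part
of the benchmark program) that turns them into relative speedups. The
dictionary and the speedup() helper are names made up for illustration; the
values are copied from the Average column.

```python
# Average clock counts copied from the "Average" column of the results above.
averages = {
    "C, original": 439534,
    "C, precise": 440982,
    "C, precise DIV": 453421,
    "MMX/SSE": 200319,
    "MMX/SSE optim-test": 199110,
    "SSE2": 180968,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`.

    Hypothetical helper: a plain ratio of average clock counts,
    so values above 1.0 mean `candidate` needed fewer clocks.
    """
    return averages[baseline] / averages[candidate]


if __name__ == "__main__":
    # Compare every routine against the original C implementation.
    for name in averages:
        ratio = speedup("C, original", name)
        print(f"{name:20s} {ratio:.2f}x vs 'C, original'")
```

On these numbers SSE2 comes out roughly 2.4 times faster than 'C, original'
and roughly 1.1 times faster than the plain MMX/SSE routine.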

sysinfo
Kernel name: kernel_x86 built on: Jun  8 2009 01:28:05 version 0x1
4 AMD Phenom, revision 40f23 running at 2611MHz (ID: 0x00000000 0x00000000)

CPU #0: "AMD Phenom(tm) 9950 Quad-Core Processor"
        Type 0, family 16, model 2, stepping 3, features 0x178bfbff
                FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT
                PSE36 CFLUSH MMX FXSTR SSE SSE2 HTT
        Extended Intel: 0x00802009
                SSE3 MONITOR CMPXCHG16B
        Extended AMD: type 0, family 16, model 2, stepping 3, features 0xefd3fbff
                SCE NX AMD-MMX FFXSTR RDTSCP 64 3DNow+ 3DNow!
        Power Management Features: TS TTP TM STC

        Inst TLB: 2M/4M-byte pages, 16 entries, fully associative
        Data TLB: 2M/4M-byte pages, 48 entries, fully associative
        Inst TLB: 4K-byte pages, 32 entries, fully associative
        Data TLB: 4K-byte pages, 48 entries, fully associative
        L1 inst cache: 64 KB, 2-way set associative, 1 lines/tag, 64 bytes/line
        L1 data cache: 64 KB, 2-way set associative, 1 lines/tag, 64 bytes/line
        L2 cache: 512 KB, 16-way set associative, 1 lines/tag, 64 bytes/line

(same for CPUs #1, 2 and 3)

2018222080 bytes free      (used/max  128081920 / 2146304000)
                           (cached     56213504)
    129516 semaphores free (used/max       1556 /     131072)
      3967 ports free      (used/max        129 /       4096)
      3965 threads free    (used/max        131 /       4096)
      2031 teams free      (used/max         17 /       2048)

-- 
One last piece of advice: "ice".
