[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Adi Oanca <e2joseph@xxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sun, 08 Aug 2004 17:19:19 +0300

Hi,

Christian Packmann wrote:

I've got a blur routine (3x3 matrix) for B_RGB32 bitmaps, which gives the following results on my Athlon XP 2100+ (1733 MHz) with DDR266 memory:

Results in MegaPixels/second:

Code        Bitmap 640x480 (1200 KB)   Bitmap 100x100 (39.06 KB)
C integer   33                         35
MMX         80                         125
3DNow!      110                        134
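For readers who haven't written one, here is roughly what a plain C integer 3x3 blur looks like. This is a minimal sketch, not Christian's routine: it assumes a simple box kernel (all nine weights equal), treats each B_RGB32 pixel as a 32-bit word laid out 0xAARRGGBB (little-endian; the top byte is alpha/padding), and copies the 1-pixel border unchanged. The function name blur3x3_rgb32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar 3x3 box blur for 32-bit pixels laid out 0xAARRGGBB.
   Each output channel is the rounded-down mean of the 3x3 neighbourhood;
   the 1-pixel border is copied unchanged.
   'src' and 'dst' must be different buffers of width*height pixels. */
void blur3x3_rgb32(const uint32_t *src, uint32_t *dst, int width, int height)
{
    assert(src != dst);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int i = y * width + x;
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[i] = src[i];                  /* border: copy as-is */
                continue;
            }
            uint32_t sum[4] = {0, 0, 0, 0};      /* per-channel sums: B, G, R, A */
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    uint32_t p = src[(y + dy) * width + (x + dx)];
                    for (int c = 0; c < 4; c++)   /* shift + mask isolates a channel */
                        sum[c] += (p >> (8 * c)) & 0xFFu;
                }
            }
            uint32_t out = 0;
            for (int c = 0; c < 4; c++)           /* shift + OR joins channels again */
                out |= (sum[c] / 9) << (8 * c);
            dst[i] = out;
        }
    }
}
```

Every output pixel costs nine loads plus 36 shift-and-mask steps, one channel at a time, which is exactly the work SIMD code can do several values at once.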

Wow! I was sure there was at least a 2x difference. Thanks for these tests.

The MMX routine is faster by virtue of processing multiple values with one instruction. The 3DNow! routine adds data prefetching, so that the CPU preloads the next chunk of data while the current chunk is being processed. The C version could be improved slightly by using loop unrolling, which both MMX and 3DNow! use; but this would give a 10-20% increase at best.
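The "multiple values with one instruction" idea can be shown in miniature even without MMX, using ordinary 32-bit registers ("SIMD within a register"). The sketch below is an illustration of that general technique, not MMX code and not taken from any routine discussed here: it averages all four 8-bit channels of two 0xAARRGGBB pixels with four integer operations, next to a scalar reference that handles one channel at a time. Both function names are hypothetical.

```c
#include <stdint.h>

/* Per-channel average (rounded down) of two 0xAARRGGBB pixels, all four
   channels at once. Uses a + b == 2*(a & b) + (a ^ b), so per byte
   (a + b) / 2 == (a & b) + ((a ^ b) >> 1). Masking with 0xFEFEFEFE clears
   each byte's low bit so the shift cannot leak into the neighbouring channel. */
uint32_t avg_rgb32_swar(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Scalar reference: unpack, average and repack each channel separately. */
uint32_t avg_rgb32_scalar(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int c = 0; c < 4; c++) {
        uint32_t ca = (a >> (8 * c)) & 0xFFu;
        uint32_t cb = (b >> (8 * c)) & 0xFFu;
        out |= ((ca + cb) / 2) << (8 * c);
    }
    return out;
}
```

For example, avg_rgb32_swar(0xFF00FF00, 0x01020304) yields 0x80018102, the same as the scalar loop. MMX applies the same principle to 64-bit registers with dedicated packed-byte instructions, so no masking tricks are needed there.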

What about SSE, SSE2 and SSE3? What can you tell us about them?
Since they use 128-bit registers, do they deliver a 4x performance gain over ordinary (non-SIMD) integer code? They also support floating-point instructions, don't they?


Similar speedups are likely for many bitmap operations which use alpha or blending. In some extreme cases the improvements might be far more spectacular, especially on the P4. The P4 design made many compromises in the integer engine in order to achieve high clock speeds - shifts and multiplies are very slow compared to other architectures (PIII, K7/8). This will hurt performance of integer code using these instructions; and especially in graphics processing you need shifts all the time to isolate and join color components. By using SIMD you can alleviate this problem, as the P4 delivers very good SIMD performance.
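To make the point about shifts and multiplies concrete, here is a minimal sketch of scalar alpha blending for 0xAARRGGBB pixels. It is an illustration under assumptions, not app_server code: it uses one constant alpha in the range 0..256 (256 meaning "all source") rather than per-pixel alpha, and the name blend_rgb32 is hypothetical. It already uses a common trick of processing two channels per 32-bit word (R+B, then A+G), yet it still needs several shifts, masks and four multiplies per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Blend 'src' over 'dst' with a constant alpha in 0..256 (256 = fully src).
   Pixels are 0xAARRGGBB. R and B are handled together (mask 0x00FF00FF),
   then A and G after shifting them down by 8, so each channel sits in a
   16-bit field with room for the product: 255 * 256 < 65536. */
uint32_t blend_rgb32(uint32_t src, uint32_t dst, uint32_t alpha)
{
    assert(alpha <= 256);
    uint32_t inv = 256 - alpha;
    /* R and B: multiply, add, then shift the results back down. */
    uint32_t rb = ((src & 0x00FF00FFu) * alpha + (dst & 0x00FF00FFu) * inv) >> 8;
    /* A and G: the results already land in the right byte positions. */
    uint32_t ag = ((src >> 8) & 0x00FF00FFu) * alpha
                + ((dst >> 8) & 0x00FF00FFu) * inv;
    return (rb & 0x00FF00FFu) | (ag & 0xFF00FF00u);
}
```

Each channel comes out as (src * alpha + dst * (256 - alpha)) / 256, rounded down; for instance blending white over black at alpha 128 gives 0x7F7F7F7F. A SIMD version performs these multiplies on many channels per instruction instead of two.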

I have a P4. I am curious about the results. :-P

I'm not really a SIMD pro, but I'll gladly help with whatever I know. And I already have a few suggestions about data alignment of bitmaps, which would help SIMD coders a lot in writing efficient code.
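As a purely hypothetical illustration of what bitmap alignment can mean for SIMD code (this is a generic example, not Christian's suggestions, which were not given in this message): SSE aligned loads need 16-byte-aligned addresses, so one common approach is to pad each row's byte count up to a multiple of 16 so that every row starts on such a boundary. The helper names below are made up for the sketch; B_RGB32 is assumed to use 4 bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per row of a 32-bit bitmap, rounded up to a
   multiple of 'align' (must be a power of two, e.g. 16 for SSE). */
size_t aligned_row_bytes(size_t width, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    size_t raw = width * 4;                              /* 4 bytes per pixel */
    return (raw + align - 1) & ~(align - 1);
}

/* Hypothetical helper: true if 'p' lies on an 'align'-byte boundary. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

With a 16-byte-aligned buffer start and a row stride from aligned_row_bytes(), every row begins on a 16-byte boundary; a 101-pixel row, for example, would be padded from 404 to 416 bytes.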

Good, let's hear them. But first: would you like to write some code for the Haiku project?


I guess we should move this to interfacekit@xxxxxxxxxxxxx?

Let's continue here, maybe others can help too.


Adi.
