
[openbeos] Re: app_server: MMX/SSE help wanted
- From: Adi Oanca <e2joseph@xxxxxxxxxx>
- To: openbeos@xxxxxxxxxxxxx
- Date: Sun, 08 Aug 2004 17:19:19 +0300
Hi,
Christian Packmann wrote:
I've got a blur routine (3x3 matrix) for B_RGB32 bitmaps, which gives the
following results on my Athlon XP 2100+ (1733MHz) with DDR266 memory:
Code        Bitmap 640x480, 1200 KB   Bitmap 100x100, 9.76 KB
            (megapixels/second)       (megapixels/second)
C integer   33                        35
MMX         80                        125
3DNow!      110                       134
Wow! I was sure there was at least a 2x difference.
Thanks for these tests.
The MMX routine is faster by virtue of processing multiple values with one
instruction. The 3DNow! routine adds data prefetching, so that the CPU
preloads the next chunk of data while the current chunk is being processed.
The C version could be improved slightly by using loop unrolling, which
both MMX and 3DNow! use; but this would give 10-20% increase at best.
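To make "multiple values with one instruction" concrete without MMX itself, here is a minimal portable sketch in plain C. It is not the blur routine discussed above; it only averages two packed 32-bit pixels, first one channel at a time and then with all four 8-bit channels handled by a single set of 32-bit operations. The function names avg_scalar and avg_packed are made up for this illustration.

```c
#include <stdint.h>

/* Average two packed 32-bit pixels one 8-bit channel at a time. */
uint32_t avg_scalar(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t ca = (a >> shift) & 0xFF;   /* isolate channel of a */
        uint32_t cb = (b >> shift) & 0xFF;   /* isolate channel of b */
        out |= ((ca + cb) >> 1) << shift;    /* rounded-down average */
    }
    return out;
}

/* Same result, all four channels at once.
 * Per channel: a + b = 2*(a & b) + (a ^ b), so the average is
 * (a & b) + ((a ^ b) >> 1). The 0xFEFEFEFE mask clears the low bit of
 * each byte so the shift cannot leak a bit into the neighbouring channel. */
uint32_t avg_packed(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```

MMX applies the same idea in hardware on 64-bit registers, so a real MMX loop would handle two such pixels per instruction rather than one.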
What about SSE, SSE2, SSE3? What can you tell us?
Knowing they use 128-bit registers, do they deliver a 4x performance
gain over plain CPU code? These have support for floating-point
instructions, don't they?
Similar speedups are likely for many bitmap operations which use alpha or
blending. In some extreme cases the improvements might be far more
spectacular, especially on the P4. The P4 design made many compromises in
the integer engine in order to achieve high clock speeds - shifts and
multiplies are very slow compared to other architectures (PIII, K7/8). This
will hurt performance of integer code using these instructions; and
especially in graphics processing you need shifts all the time to isolate
and join color components. By using SIMD you can alleviate this problem, as
the P4 delivers very good SIMD performance.
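As an illustration of the shifts needed "to isolate and join color components", here is a minimal scalar sketch of an alpha blend on one pixel. It assumes the pixel is read as a 32-bit word laid out as 0xAARRGGBB, which is how a B_RGB32 pixel appears on a little-endian machine; blend_channel and blend_rgb32 are hypothetical names, not app_server functions.

```c
#include <stdint.h>

/* Blend one 8-bit channel: alpha = 255 gives src, alpha = 0 gives dst. */
static inline uint32_t blend_channel(uint32_t src, uint32_t dst, uint32_t alpha)
{
    return (src * alpha + dst * (255 - alpha)) / 255;
}

/* Blend src over dst. Assumes a 0xAARRGGBB word layout (an assumption
 * made for this sketch); the top byte of dst is passed through unchanged. */
uint32_t blend_rgb32(uint32_t src, uint32_t dst, uint8_t alpha)
{
    /* Three shifts and masks per pixel just to isolate the components. */
    uint32_t r = blend_channel((src >> 16) & 0xFF, (dst >> 16) & 0xFF, alpha);
    uint32_t g = blend_channel((src >> 8) & 0xFF, (dst >> 8) & 0xFF, alpha);
    uint32_t b = blend_channel(src & 0xFF, dst & 0xFF, alpha);

    /* Two more shifts to join them back into one word. */
    return (dst & 0xFF000000u) | (r << 16) | (g << 8) | b;
}
```

Every pixel costs several shifts and three multiplies here, which is exactly the kind of integer work described above as slow on the P4.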
I have a P4. I am curious about the results. :-P
I'm not really a SIMD pro, but I'll gladly help with whatever I know. And I
already have a few suggestions about data alignment of bitmaps, which would
help SIMD coders a lot in writing efficient code.
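The suggestions themselves are not spelled out in this message. As general background only, SIMD loads tend to be simplest and fastest when every bitmap row starts on a 16-byte boundary. A minimal sketch of that idea, assuming 4 bytes per pixel and C11 aligned_alloc, could look like the following; SIMD_ALIGN, aligned_bytes_per_row and alloc_aligned_bitmap are hypothetical names and not part of any BeOS or Haiku API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16  /* assumed alignment, matching 128-bit registers */

/* Round a row of 32-bit pixels up to a multiple of SIMD_ALIGN bytes. */
size_t aligned_bytes_per_row(size_t width)
{
    size_t raw = width * 4;  /* 4 bytes per pixel */
    return (raw + SIMD_ALIGN - 1) & ~(size_t)(SIMD_ALIGN - 1);
}

/* Allocate pixel storage whose base address and row stride are both
 * multiples of SIMD_ALIGN, so every row starts aligned.
 * The caller releases the buffer with free(). */
uint8_t *alloc_aligned_bitmap(size_t width, size_t height, size_t *stride_out)
{
    size_t stride = aligned_bytes_per_row(width);
    *stride_out = stride;
    /* stride * height is a multiple of SIMD_ALIGN, as C11 requires. */
    return aligned_alloc(SIMD_ALIGN, stride * height);
}
```

For a 100-pixel-wide bitmap the stride stays at 400 bytes, while a 101-pixel-wide one is padded from 404 to 416 bytes per row.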
Good, let's hear them. Before that: do you want to write some code for
the Haiku project?
I guess we should move this to interfacekit@xxxxxxxxxxxxx?
Let's continue here, maybe others can help too.
Adi.