[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sun, 08 Aug 2004 14:04:49 +0000
On 2004-08-07 17:46:31 [+0000], DarkWyrm wrote:

> Perhaps I should clarify. Adi and I were curious and are thinking ahead.
> Unless I have misjudged the code at this point, depending on how quickly 
> he and I can fling code, we could be seeing some quite useful milestones 
> in the coming months. We are thinking about potential optimizations in 
> the drawing code and such. While I won't speak for Adi, I personally was 
> wondering what can be done with these other instructions with respect to 
> graphics and whether or not it is worth it to even consider writing code 
> to utilize them.

It is, at least when you're doing bitmap processing. SIMD *rules* for 
mass-processing of data. Speedups of 2x-3x should be achievable in most 
cases. 

I've got a blur routine (3x3 matrix) for B_RGB32 bitmaps, which gives the
following results on my Athlon XP 2100+ (1733MHz) with DDR266 memory:

                   640x480 bitmap (1200 KB)   100x100 bitmap (39 KB)
Code               MegaPixels/second          MegaPixels/second
C integer                  33                         35
MMX                        80                        125
3DNow!                    110                        134
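For reference, the "C integer" row corresponds to a plain scalar loop. The routine itself isn't posted here, so the following is only a minimal sketch of what such a baseline might look like, assuming a 3x3 box filter (all nine weights equal) and four bytes per pixel. The name blur3x3_rgb32 and the choice to copy border pixels unchanged are hypothetical, not Chris's code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar baseline: 3x3 box blur over a 32-bit-per-pixel bitmap
 * such as B_RGB32 (four 8-bit channels per pixel). Rows may be padded, so
 * the stride is given in bytes. Border pixels are copied unchanged to keep
 * the sketch short. */
void blur3x3_rgb32(const uint8_t *src, uint8_t *dst,
                   int width, int height, size_t stride)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + (size_t)y * stride;
        uint8_t *out = dst + (size_t)y * stride;
        if (y == 0 || y == height - 1) {
            memcpy(out, row, (size_t)width * 4);      /* top/bottom border */
            continue;
        }
        for (int x = 0; x < width; x++) {
            if (x == 0 || x == width - 1) {
                memcpy(out + x * 4, row + x * 4, 4);  /* left/right border */
                continue;
            }
            for (int c = 0; c < 4; c++) {             /* one channel at a time */
                unsigned sum = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    const uint8_t *r = row + (ptrdiff_t)dy * (ptrdiff_t)stride;
                    for (int dx = -1; dx <= 1; dx++)
                        sum += r[(x + dx) * 4 + c];
                }
                out[x * 4 + c] = (uint8_t)(sum / 9);
            }
        }
    }
}
```

Every channel of every pixel costs nine loads, nine adds and a divide here; this per-byte work is exactly what MMX/3DNow! code can do for several channels at once.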

The MMX routine is faster because it processes multiple values with one
instruction. The 3DNow! routine adds data prefetching, so the CPU loads
the next chunk of data while the current chunk is being processed. The C
version could be improved slightly with loop unrolling, which both the MMX
and 3DNow! versions use, but that would give a 10-20% increase at best.
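The idea of "multiple values with one instruction" can be shown even without MMX. The sketch below uses the portable "SIMD within a register" (SWAR) trick to average all four 8-bit channels of two 32-bit pixels with a handful of 32-bit operations. It is an illustration of the principle, not MMX code; the function names are hypothetical. MMX and SSE registers simply widen the same approach to 64 or 128 bits.

```c
#include <stdint.h>

/* Average two 32-bit pixels channel by channel with one set of 32-bit ops.
 * Uses a + b == 2*(a & b) + (a ^ b), so floor((a + b) / 2) per byte is
 * (a & b) + ((a ^ b) >> 1). Masking with 0xFEFEFEFE before the shift stops
 * each channel's low bit from leaking into the neighbouring channel.
 * The result rounds down. */
static inline uint32_t avg_rgb32(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Average two rows of n pixels, four channels per loop iteration. */
void avg_rows_rgb32(const uint32_t *r0, const uint32_t *r1,
                    uint32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = avg_rgb32(r0[i], r1[i]);
}
```

For example, averaging 0x10203040 and 0x30405060 yields 0x20304050: each byte is averaged independently, with no per-channel shifting or masking in the caller.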

Similar speedups are likely for many bitmap operations that use alpha or
blending. In some extreme cases the improvements might be far more
spectacular, especially on the P4. The P4 design made many compromises in
the integer engine to reach high clock speeds: shifts and multiplies are
very slow compared to other architectures (PIII, K7/K8). This hurts integer
code that relies on these instructions, and graphics code in particular
needs shifts all the time to isolate and join color components. SIMD
alleviates this problem, since the P4 delivers very good SIMD performance.
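To make the shift-and-multiply cost concrete, here is a sketch of an alpha blend of two 32-bit pixels written two ways. The scalar version isolates each channel with shifts, which is the pattern described above. The packed version keeps two channels in separate 16-bit lanes (mask 0x00FF00FF), so one multiply blends two channels; it needs half the multiplies and far fewer shifts. This is again portable C rather than MMX (with MMX/SSE, instructions like pmullw multiply four or eight 16-bit lanes at once). The function names are hypothetical, and alpha is assumed to be pre-scaled to 0..256 so that the divide becomes a shift by 8.

```c
#include <stdint.h>

/* Scalar version: shift each 8-bit channel out, blend it, shift it back.
 * alpha is the weight of src, assumed to be in 0..256. */
static uint32_t blend_rgb32_scalar(uint32_t src, uint32_t dst, uint32_t alpha)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFFu;
        uint32_t d = (dst >> shift) & 0xFFu;
        uint32_t v = (s * alpha + d * (256u - alpha)) >> 8;  /* 0..255 */
        out |= v << shift;
    }
    return out;
}

/* Packed version: two channels per multiply. Each channel sits in its own
 * 16-bit lane, which has room for the product (at most 255 * 256). */
static uint32_t blend_rgb32_packed(uint32_t src, uint32_t dst, uint32_t alpha)
{
    uint32_t inv = 256u - alpha;
    /* channels at bits 0-7 and 16-23 */
    uint32_t lo = ((src & 0x00FF00FFu) * alpha
                 + (dst & 0x00FF00FFu) * inv) >> 8;
    /* channels at bits 8-15 and 24-31, moved down into the same lanes;
     * after the multiply the results land back at bits 8-15 and 24-31 */
    uint32_t hi = ((src >> 8) & 0x00FF00FFu) * alpha
                + ((dst >> 8) & 0x00FF00FFu) * inv;
    return (lo & 0x00FF00FFu) | (hi & 0xFF00FF00u);
}
```

Both functions perform the same per-channel arithmetic, so they return identical results. Only the instruction count differs.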

> I don't know much about any of this kind of stuff, and 
> I'm pretty sure Adi doesn't, either, so at this point we just wanted to 
> see if there is anyone who is knowledgeable in this kind of thing who 
> could answer some questions and such.

I'm not really a SIMD pro, but I'll gladly help with whatever I know. And I 
already have a few suggestions about data alignment of bitmaps, which would 
help SIMD coders a lot in writing efficient code.
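The alignment suggestions aren't spelled out in this message. One common approach, sketched here purely as an assumption, is to start the bitmap and every row on a 16-byte boundary, so SSE code can use aligned loads and stores (movaps, or movdqa with SSE2), which fault on unaligned addresses. The helpers aligned_stride and alloc_aligned_bitmap are hypothetical and rely on C11 aligned_alloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round a row of `width` 32-bit pixels up to a
 * multiple of `align` bytes (align must be a power of two). */
size_t aligned_stride(int width, size_t align)
{
    size_t bytes = (size_t)width * 4;
    return (bytes + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate a bitmap whose base address and every row
 * start are `align`-byte aligned. Returns NULL on failure; release the
 * memory with free(). */
uint8_t *alloc_aligned_bitmap(int width, int height, size_t align,
                              size_t *stride_out)
{
    size_t stride = aligned_stride(width, align);
    size_t size = stride * (size_t)height;  /* a multiple of align, as C11 requires */
    if (size == 0)
        size = align;                       /* avoid a zero-size request */
    uint8_t *p = aligned_alloc(align, size);
    if (p)
        *stride_out = stride;
    return p;
}
```

The padding costs at most align - 4 bytes per row, but it lets the inner loop of a SIMD routine assume aligned rows instead of handling a misaligned head on every line.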

I guess we should move this to interfacekit@xxxxxxxxxxxxx?

Bye,
Chris




