[openbeos] Re: app_server: MMX/SSE help wanted

  • From: "Marcus Overhagen" <ml@xxxxxxxxxxxx>
  • To: <openbeos@xxxxxxxxxxxxx>
  • Date: Wed, 11 Aug 2004 08:50:50 +0200

Christian Packmann <Christian.Packmann@xxxxxx> wrote:

> Not quite; MMX can access the bytes as single operands and perform the 
> addition on all 8 values in register at once, it has no need for shifting 
> any values - this is a huge advantage. And additionally it can do saturated 
> additions, i.e. all values >255 are automatically clipped to 255; in C you 
That's correct. The automatic saturation to 0 and 255 is a huge benefit of MMX.

> need to do that in a separate step with masking (value & 0xff).
That's wrong. Doing saturation with masking won't work. For example, the value
256 (0x100) would become 0 this way instead of 255, which is the wrong result.

Thus you need to compare against <0 and >255, which creates ugly jump
instructions in the generated assembly code and is slow. MMX doesn't need
them, and is faster.
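
To make the difference concrete, here is a minimal sketch in C. The function
names mask_to_byte and clamp_to_byte are made up for this illustration and do
not come from the app_server or the media plugin code. Masking wraps 256
around to 0 and 300 to 44, while the compare-based version clamps both to 255.

```c
#include <stdint.h>

/* Hypothetical helper: masking keeps only the low 8 bits,
 * so out-of-range values wrap around instead of clipping. */
static uint8_t mask_to_byte(int32_t v)
{
    return (uint8_t)(v & 0xff);
}

/* Hypothetical helper: real saturation clamps out-of-range
 * values to the nearest bound, at the cost of two compares. */
static uint8_t clamp_to_byte(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```

The two compares in clamp_to_byte are exactly the branches that the MMX
saturating add avoids.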

I had to implement such saturation code when writing a color space conversion
from planar YCbCr 4:2:0 (YCbCr420p) to the RGB32 colorspace.

A fast way to do saturation on a 32-bit signed integer was:
#define SATURATE(a) if (0xffffff00 & (uint32)a) { if (a < 0) a = 0; else a = 255; }
I also tried these others, but they were slower:
// #define SATURATE(a) if (0xffffff00 & (uint32)a) { if (0x80000000 & (uint32)a) a = 0; else a = 0xff; }
// #define SATURATE(a) if (a < 0) a = 0; else if (a > 255) a = 255;  
// #define SATURATE(a) if (a < 0) a = 0; else if (a & 0xffffff00) a = 255;  
// #define SATURATE(a) if (a < 0) a = 0; if (a & 0xffffff00) a = 255;  
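
For readers who want to try the mask-first idea outside a macro, the following
is a sketch of the same test written as a function. The name saturate_u8 is
invented here, and the standard int32_t/uint32_t types stand in for the BeOS
int32/uint32 typedefs. The point of the trick is that in-range values (the
common case) pass with a single test, and the sign is only examined once a
value is already known to be out of range.

```c
#include <stdint.h>

/* Hypothetical function form of the mask-first SATURATE macro. */
static inline int32_t saturate_u8(int32_t a)
{
    /* Any bit set above the low 8 means a is outside 0..255;
     * negative values always have high bits set. */
    if ((uint32_t)a & 0xffffff00u)
        a = (a < 0) ? 0 : 255;
    return a;
}
```

A function also avoids a pitfall of the macro form: a bare `if` inside a macro
can capture a following `else` when the macro is used in an if/else statement.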

Even faster saturation was possible by using a lookup table. I precalculated
the range of the input data, like -100 to +350, and made a lookup table with
all entries from -100 to 0 having the value 0, everything above 255 having the
value 255, and the entries in between holding their own value.
This was the fastest code that I was able to write in C for this purpose.
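
A rough sketch of the lookup table idea follows. It is not the code from
gfx_conv_c_lookup.cpp; the names SAT_MIN, SAT_MAX, sat_table, sat_table_init
and sat_lookup are made up, and the range -100 to +350 is just the example
figure from above. A real converter would derive the range from its own
arithmetic.

```c
#include <stdint.h>

/* Assumed input range, taken from the example figures in the text. */
#define SAT_MIN (-100)
#define SAT_MAX 350

static uint8_t sat_table[SAT_MAX - SAT_MIN + 1];

/* Fill the table once; the entry for value v lives at index v - SAT_MIN. */
static void sat_table_init(void)
{
    for (int v = SAT_MIN; v <= SAT_MAX; v++) {
        if (v < 0)
            sat_table[v - SAT_MIN] = 0;
        else if (v > 255)
            sat_table[v - SAT_MIN] = 255;
        else
            sat_table[v - SAT_MIN] = (uint8_t)v;
    }
}

/* No bounds check: the caller must guarantee SAT_MIN <= v <= SAT_MAX. */
static uint8_t sat_lookup(int v)
{
    return sat_table[v - SAT_MIN];
}
```

After one call to sat_table_init(), each saturation is a single indexed load
with no branch, which is presumably why it beat the compare-based variants.
The price is that any input outside the precalculated range reads past the
table.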

The code with saturation checking is at line 175 and below:
http://cvs.sourceforge.net/viewcvs.py/open-beos/current/src/add-ons/media/plugins/avcodec/gfx_conv_c.cpp?annotate=1.7

The code which uses lookup tables is this:
http://cvs.sourceforge.net/viewcvs.py/open-beos/current/src/add-ons/media/plugins/avcodec/gfx_conv_c_lookup.cpp?annotate=1.1

regards
Marcus

