Christian Packmann <Christian.Packmann@xxxxxx> wrote:
> Not quite; MMX can access the bytes as single operands and perform the
> addition on all 8 values in register at once, it has no need for shifting
> any values - this is a huge advantage. And additionally it can do saturated
> additions, i.e. all values >255 are automatically clipped to 255; in C you

That's correct. The automatic saturation to 0 and 255 is a huge benefit of MMX.

> need to do that in a separate step with masking (value & 0xff).

That's wrong. Saturation by masking doesn't work. For example, the value
256 (0x100) would become 0 this way (and 257 would become 1), instead of
the expected 255. So you have to compare against <0 and >255, which
creates ugly jump instructions in the generated assembly code and is slow.
MMX doesn't need them and is faster.

I had to implement such saturation code when writing a color space
conversion from YCbCr420p(lanar) to RGB32. The fastest way I found to
saturate a 32 bit signed integer was:

  #define SATURATE(a) if (0xffffff00 & (uint32)a) { if (a < 0) a = 0; else a = 255; }

It tests the common in-range case with a single mask and only branches on
the sign when the value is out of range. I also tried these variants, but
they were slower:

  // #define SATURATE(a) if (0xffffff00 & (uint32)a) { if (0x80000000 & (uint32)a) a = 0; else a = 0xff; }
  // #define SATURATE(a) if (a < 0) a = 0; else if (a > 255) a = 255;
  // #define SATURATE(a) if (a < 0) a = 0; else if (a & 0xffffff00) a = 255;
  // #define SATURATE(a) if (a < 0) a = 0; if (a & 0xffffff00) a = 255;

Even faster saturation was possible with a lookup table. I precalculated
the range of the input data, like -100 to +350, and made a lookup table in
which all entries from -100 to 0 have the value 0, the entries from 0 to
255 keep their own value, and everything above 255 has the value 255. This
was the fastest code that I was able to write in C for this purpose.

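As a rough illustration of the lookup table idea, here is a minimal sketch.
It is not the code from the files linked in this mail; the names SAT_MIN,
SAT_MAX, sat_table, sat_table_init, saturate_lookup, saturate_branch and
mask_only are made up for this example, and the -100..350 range is just the
example range mentioned above. It assumes the caller can guarantee that
every input stays inside that range.

```c
#include <stdint.h>

/* Hypothetical input range, taken from the "-100 to +350" example. */
#define SAT_MIN (-100)
#define SAT_MAX 350

/* One entry per possible input value; index 0 corresponds to SAT_MIN. */
static uint8_t sat_table[SAT_MAX - SAT_MIN + 1];

/* Branching reference version: clamp a signed 32 bit value to 0..255. */
static uint8_t saturate_branch(int32_t a)
{
    if (a < 0)
        return 0;
    if (a > 255)
        return 255;
    return (uint8_t)a;
}

/* Fill the table once at startup using the branching version. */
static void sat_table_init(void)
{
    for (int32_t v = SAT_MIN; v <= SAT_MAX; v++)
        sat_table[v - SAT_MIN] = saturate_branch(v);
}

/* Table lookup: no compare in the hot path.
 * Assumption: SAT_MIN <= a <= SAT_MAX, otherwise this reads out of bounds. */
static uint8_t saturate_lookup(int32_t a)
{
    return sat_table[a - SAT_MIN];
}

/* Masking only, for comparison: this wraps around instead of saturating. */
static uint8_t mask_only(int32_t a)
{
    return (uint8_t)(a & 0xff);
}
```

After calling sat_table_init() once, saturate_lookup(300) gives 255 and
saturate_lookup(-20) gives 0, while mask_only(256) gives 0, which is why
masking alone can't replace saturation. Whether the table really beats the
branching macro depends on the CPU and on the table staying in cache, so it
is worth measuring.
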
The code with saturation checking is at line 175 and below:
http://cvs.sourceforge.net/viewcvs.py/open-beos/current/src/add-ons/media/plugins/avcodec/gfx_conv_c.cpp?annotate=1.7

The code which uses lookup tables is this:
http://cvs.sourceforge.net/viewcvs.py/open-beos/current/src/add-ons/media/plugins/avcodec/gfx_conv_c_lookup.cpp?annotate=1.1

regards
Marcus