[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 10 Aug 2004 21:54:08 +0200

On 2004-08-10 14:52:32 [+0200], Alexander G. M. Smith wrote:
> Christian Packmann wrote on Mon, 09 Aug 2004 23:03:49 +0200:

>> Even for non-cacheable data and simple operations, SIMD processing (and 
>> use of data prefetch instructions) can give more than decisive 
>> advantages.
 
> Looks like somewhere between 2 and 3 times speedup for large data.

That is on my system with its slow RAM; P4s with fast RAM are a different
breed, and the same goes for Athlon64s. So on modern systems a speedup of
about 4 times seems more likely.

> Sure are lots of shift instructions in the C code - that's what MMX does 
> do all in one operation.

Not quite; MMX can access the bytes as single operands and perform the
addition on all 8 values in a register at once, so it has no need to shift
any values - this is a huge advantage. Additionally it can do saturated
additions, i.e. all sums >255 are automatically clipped to 255; in C you
need to do that in a separate step with a compare and clamp (plain masking
with value & 0xff only wraps around, it does not clip).
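
To make the difference concrete, here is a rough sketch of both variants.
It is not the code from my test program; the function names are made up
for illustration, and I use SSE2 intrinsics (16 bytes per register)
instead of MMX (8 bytes), assuming a compiler that defines __SSE2__ and
ships <emmintrin.h>. Without SSE2 it falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; assumed to be available */
#endif

/* Scalar version (hypothetical helper): add two packed 4x8-bit pixels,
 * extracting each channel by shift and mask, then clipping it to 255. */
uint32_t add_pixel_saturated(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (sum > 255)          /* the separate clipping step */
            sum = 255;
        result |= sum << shift;
    }
    return result;
}

/* Adds src to dst for `count` pixels. Where SSE2 is available, one
 * saturating byte add handles 4 pixels (16 channels) with no shifts. */
void add_pixels_saturated(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= count; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif
    /* Leftover pixels, or everything when SSE2 is not available. */
    for (; i < count; i++)
        dst[i] = add_pixel_saturated(dst[i], src[i]);
}
```

The scalar loop needs a shift, a mask, a compare and an OR per channel;
the SIMD loop replaces all of that with a single saturating add.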

> I wonder if it would be faster or slower with 
> byte pointers and math rather than shift operations to extract the bytes. 

Good idea about the byte pointers, I just tested this and while it gives a
marginal +2% improvement for RAM data, it's +30% for cache.

You can't do the arithmetic in byte-sized variables though, as scalar x86
code has no saturated integer addition; any overflow would wrap around and
give garbage results. But ADDs are usually heavily optimized nowadays and
should execute in 1 cycle regardless of operand width.
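
Roughly, the byte pointer variant looks like the sketch below. Again this
is an illustration rather than the exact code from my program, and the
function name is made up. The bytes are read through byte pointers, but
the sum is formed in a wider integer so the overflow can be detected
before storing the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = min(dst[i] + src[i], 255) for each byte. */
void add_bytes_saturated(uint8_t *dst, const uint8_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Widen before adding; an 8-bit sum would silently wrap. */
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}
```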

>  I'd also check the generated code to make sure *src was not being 
> reloaded for every operation (copy it to a local variable first in that 
> case) and compile with optimization.

The byte pointer version uses a local var, so this shouldn't be a problem. 
And I had opt=full from the beginning.

I'll clean up the program a bit and upload a new version, hopefully by 
tomorrow.

> Anyway, it's nice to see those actual numbers!

A pleasure! :)

Bye,
Chris

