[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 10 Aug 2004 20:59:15 +0200

On 2004-08-10 19:20:59 [+0200], Scott Donaldson wrote:

> AMD's code optimizing manual describes how to allocate memory aligned.
> 
> double *p;
> double *np;
> 
> p = (double *)malloc(sizeof(double) * number_of_doubles + 7L);
> np = (double *)((((long)(p)) + 7L) & (-8L));

I use memalign(), easier and safer.
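
For reference, a minimal sketch of the memalign() route. The helper names alloc_doubles_aligned16 and is_aligned16 are made up for illustration, and the 16-byte alignment is an assumption on my part (it is what aligned SSE loads such as MOVAPS want; the quoted snippet aligns to 8). memalign() is declared in <malloc.h> on glibc and is not part of ISO C, so treat this as a sketch for that environment rather than portable code.

```c
#include <malloc.h>   /* memalign(); a glibc extension, not ISO C */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* free() */

/* Hypothetical helper: allocate `count` doubles on a 16-byte boundary.
 * Returns NULL on overflow or allocation failure.
 * On glibc the result can be released with free(). */
double *alloc_doubles_aligned16(size_t count)
{
    if (count > SIZE_MAX / sizeof(double))
        return NULL;                      /* size computation would overflow */
    return memalign(16, count * sizeof(double));
}

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Unlike the manual version, there is no second pointer to keep around: the pointer you get back is the one you use and the one you free.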
 
> Can someone clear something up for me with SSE. PADDUSB on XMM will 
> affect all 128b, so why does the instruction need SSE2 support according 
> to the AMD x86-64 manual?

SSE only defines the 4x32bit floating point operations working on xmm regs, 
and some additional MMX instructions which only work on mm regs.
SSE2 adds 2x64bit floating point, and the ability to perform integer 
operations on 128bit registers. AFAIK this extends to all integer 
instructions of the MMX and SSE instruction sets. 
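
To make that concrete, here is a small sketch of PADDUSB on a full xmm register, written with the compiler intrinsic _mm_adds_epu8 from <emmintrin.h> (the SSE2 header). The function name add_saturate_u8x16 is invented for this example, and the scalar fallback is only there so the code still builds where SSE2 is not enabled.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = min(a[i] + b[i], 255) for 16 unsigned bytes. */
void add_saturate_u8x16(const uint8_t a[16], const uint8_t b[16],
                        uint8_t out[16])
{
#if defined(__SSE2__)
    /* Unaligned loads, so the caller does not need 16-byte aligned buffers. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PADDUSB xmm, xmm: all 16 byte lanes are added with unsigned saturation. */
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
#else
    /* Scalar fallback with the same result, for builds without SSE2. */
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}
```

The mm-register form of the same instruction handles 8 bytes per operation and only needs MMX; the 16-byte xmm form is the part that SSE2 adds.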

> I do recall there is a difference in AMD's SSE and Intel's, I can't find 
> it again but I recall stumbling across it a while back when I was 
> writing a benchmarking program. It was something like 3DNow! Pro allowed 
> operation on all 128b of the XMM regs using MMX instructions whereas 
> Intel's SSE required the appropriate SSE instruction to use the whole 
> 128b otherwise it would only work on the lower 64b.

I think that all MMX instructions which work on xmm regs at all will 
automatically work on all 128bits.
However, there are a few SSE2 move and conversion instructions which only 
work on 64bits of an xmm register, like MOVQ2DQ which moves an mm register 
to the lower half of an xmm reg. Maybe that is what's been bugging you?
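
A small sketch of the MOVQ2DQ case, using the intrinsic _mm_movpi64_epi64 that compilers map to that instruction. The function name widen_q_to_dq is invented here, and the scalar branch is an assumption-free stand-in for builds without MMX/SSE2 support. It shows that only the low 64 bits of the xmm register receive data while the upper 64 bits are zeroed.

```c
#include <stdint.h>
#if defined(__SSE2__) && defined(__MMX__)
#include <emmintrin.h>   /* SSE2 intrinsics; pulls in the MMX ones as well */
#endif

/* Hypothetical helper: build a 64-bit mm value from two 32-bit halves and
 * move it into an xmm register. out[0] receives the low 64 bits of the xmm
 * register, out[1] the high 64 bits. */
void widen_q_to_dq(uint32_t lo, uint32_t hi, uint64_t out[2])
{
#if defined(__SSE2__) && defined(__MMX__)
    __m64 m = _mm_set_pi32((int)hi, (int)lo);  /* 64-bit MMX value          */
    __m128i x = _mm_movpi64_epi64(m);          /* MOVQ2DQ: low = m, high = 0 */
    _mm_storeu_si128((__m128i *)out, x);
    _mm_empty();                               /* EMMS: leave MMX state clean */
#else
    /* Scalar equivalent for builds without MMX/SSE2. */
    out[0] = ((uint64_t)hi << 32) | lo;
    out[1] = 0;
#endif
}
```

So an instruction like this touches all 128 bits of the destination, but only 64 of them carry data from the source.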

Bye,
Chris
