[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sun, 08 Aug 2004 18:52:11 +0000

On 2004-08-08 15:30:10 [+0000], François Revol wrote:

>> MMX uses 64 bit integer registers, independent of CPU core. That means 
>> that the CPU can process anything else while we use MMX. There is a 
> That's plain wrong, intel in its infinite wisdom reused the FPU register 
> file, so you can't interleave floating point ops and mmx, defeating the
> very purpose of it...

What was the alternative? A separate register file would have required new 
instructions to save/restore it during task switches, and it would have 
been unusable until OS support was added. MMX may be a hack, but at least 
it could be used right away.

> and you have to call an opcode before using mmx to save 
> the fpu regs.

Nope, the other way around. Use of any MMX instruction automatically puts 
the CPU into MMX mode, but using FP again afterwards requires calling EMMS 
(or FEMMS on 3DNow!).
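
A minimal sketch of that ordering in C, assuming a GCC/Clang-style compiler 
that provides the MMX intrinsics in <mmintrin.h>. The helper name 
add_pixels_then_scale is made up for illustration, and the scalar branch is 
only a fallback for builds without MMX. Note that on x86-64 compilers 
normally do float math in SSE registers, so EMMS matters mainly where x87 
code follows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>   /* MMX intrinsics: __m64, _mm_adds_pu8, _mm_empty */
#endif

/* Hypothetical helper: saturating add of 8 packed unsigned bytes
 * (two 32-bit pixels), followed by a floating point operation. */
static float add_pixels_then_scale(const uint8_t a[8], const uint8_t b[8],
                                   uint8_t out[8], float scale)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, 8);            /* one 64-bit load covers two pixels */
    memcpy(&vb, b, 8);
    vr = _mm_adds_pu8(va, vb);    /* MMX op: no setup instruction needed first */
    memcpy(out, &vr, 8);
    _mm_empty();                  /* EMMS: release the registers before FP code */
#else
    /* Scalar fallback with the same per-byte saturating behaviour. */
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(s > 255u ? 255u : s);
    }
#endif
    return (float)out[0] * scale; /* FP work happens only after EMMS */
}
```

Nothing has to be called before the MMX part; the only obligation is the 
_mm_empty() (EMMS) between the last MMX instruction and the first FP one.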

>> catch with MMX. If you want to use floating point operations then MMX 
>> "burrows" the CPU core halting the process executing. That's why MMX 
> I think it's "borrow" there.
> Yup, so it's not so 'independent' :)
> 
>> program loops should use integer processing only. As I see things, 
>> using 
>> 64 bit registers in parallel with CPU core gives us a 2+ performance 
>> gain when handling 32bit pixels bitmaps.
> Yes, usually they are read 2 by 2 and then treated as int64.

Hmm? Can't make sense of that "2 by 2". MMX mem accesses usually operate 
with 64bits; this alone gives a speed advantage, as only one memory access 
has to be processed (instruction decoding, scheduling, address generation, 
etc.) for reading 2 dwords.

And MMX certainly doesn't treat a quadword as int64, but defines operations 
for packed datatypes, whose elements can be of size byte, word or doubleword 
(that's the whole purpose of SIMD=Single Instruction Multiple Data processing).
The only way to do int64s natively on x86 is the new 64bit extensions 
AFAIK.
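
To illustrate the difference, here is a small portable C sketch that models 
a packed word add (what PADDW does) next to a plain 64-bit integer add. It 
uses no real MMX instructions; the function names are invented for the 
example and the loop only simulates the four independent 16-bit lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 64-bit integer add: a carry out of one 16-bit field
 * spills into the next higher field. */
static uint64_t add_as_int64(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Model of a packed word add: four independent 16-bit lanes,
 * each wrapping around on its own, no carry between lanes. */
static uint64_t add_as_packed_words(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t s = (uint16_t)(x + y);        /* wraps within the lane */
        r |= (uint64_t)s << (16 * lane);
    }
    return r;
}

/* Self-check showing where the two interpretations diverge. */
static void lanes_demo(void)
{
    uint64_t a = 0x0001FFFF0002FFFFULL;
    uint64_t b = 0x0001000100010001ULL;
    assert(add_as_int64(a, b)        == 0x0003000000040000ULL);
    assert(add_as_packed_words(a, b) == 0x0002000000030000ULL);
}
```

The two overflowing lanes wrap to 0000 in the packed version and leave 
their neighbours alone, whereas the int64 add carries into them.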

>> SSE introduces additional integer AND floating point instructions, all 
>> operating on 128 bit registers! See, you can process 4(!) 32bit pixels 
>> at a time in parallel with CPU core. ALSO, floating point registers 
>> could help us calculate pixels for *any* drawing instruction. (at least 
>> I think so)
>> 
>> You see, this sounds TOO good even to be left out for R1.1. If you Gabe 
>> can study this and make use of the MMX power, then I will cancel this 
>> post. But if you are busy with other things we can use outside help.
>> 
> Yes but it's x86 specific, and even vendor specific for some points, 
> (some cpus have 3Dnow instead, ...

Nowadays that's not too much of a problem; by doing MMX and SSE versions 
you have SIMD support for all current CPUs. Adding SSE2 would improve 
support for P4/A64, but this isn't so crucial unless you need fast 64bit FP 
ops.

> even though AMD now also implements SSE(and 
> badly, which is why R5 crashes on AthlonXPs, since the cpu fakes an intel 
> chip and BeOS treats it as such)).

The XP certainly doesn't fake any kind of Intel CPU; the CPU identification 
in BeOS is broken, badly. It's no small feat to identify a CPU as Intel 
which returns a vendor ID of "AuthenticAMD". :)
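
For reference, reading the vendor ID takes only a few lines. This is a 
sketch assuming GCC or Clang on x86, using the __get_cpuid helper from 
<cpuid.h>; cpu_vendor and is_amd are hypothetical names, and it says 
nothing about how BeOS itself does its detection.

```c
#include <assert.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>      /* GCC/Clang helper: __get_cpuid */
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Hypothetical helper: writes the CPUID vendor string into out
 * (12 characters plus NUL). Returns 1 on success, 0 if CPUID is
 * not available in this build or on this CPU. */
static int cpu_vendor(char out[13])
{
    assert(out != NULL);
    out[0] = '\0';
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Leaf 0 returns the string in EBX, EDX, ECX order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}

/* The vendor string alone is enough to tell AMD from Intel. */
static int is_amd(const char *vendor)
{
    return strcmp(vendor, "AuthenticAMD") == 0;
}
```

Feature checks (MMX, SSE, 3DNow!) should then come from the CPUID feature 
flags rather than from assumptions based on the vendor or model.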

Bye,
Chris
