[openbeos] Re: app_server: MMX/SSE help wanted

> Hi Gabe,
[...]
> 
> I have talked with some of my colleagues at work and then searched the
> net for some info about the MMX concept, and I found that using this
> instruction set can dramatically improve performance: at least 2 times
> faster when using MMX.
> 
> Let me say what I know about this:
> MMX uses 64-bit integer registers, independent of the CPU core. That
> means that the CPU can process anything else while we use MMX. There is a
That's plain wrong. Intel, in its infinite wisdom, reused the FPU
register file, so you can't interleave floating point ops and MMX,
defeating the very purpose of it... and you have to issue an extra
opcode (EMMS) after the MMX code to hand the registers back to the FPU.

> catch with MMX. If you want to use floating point operations then MMX
> "burrows" the CPU core, halting the executing process. That's why MMX
I think it's "borrow" there.
Yup, so it's not so 'independent' :)

> program loops should use integer processing only. As I see things,
> using 64-bit registers in parallel with the CPU core gives us a 2x or
> better performance gain when handling bitmaps with 32-bit pixels.
Yes, usually they are read 2 by 2 and then treated as int64.
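
A rough sketch of that "two pixels as one int64" idea, in plain C rather
than real MMX. The names darken_pair and darken_row are made up for
illustration, and the operation (halving every 8-bit channel, alpha
included) is only an example of work that fits this pattern:

```c
#include <stdint.h>
#include <string.h>

/* Halve all eight 8-bit channels of two 32-bit pixels packed into one
 * 64-bit integer. The shift divides every byte by two; the mask clears
 * the bit that leaks in from the neighbouring byte. */
static uint64_t darken_pair(uint64_t two_pixels)
{
    return (two_pixels >> 1) & UINT64_C(0x7F7F7F7F7F7F7F7F);
}

/* Darken a row of 32-bit pixels, two pixels per loop iteration. */
static void darken_row(uint32_t *pixels, size_t count)
{
    size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        uint64_t pair;
        memcpy(&pair, &pixels[i], sizeof pair);   /* read 2 pixels  */
        pair = darken_pair(pair);
        memcpy(&pixels[i], &pair, sizeof pair);   /* write 2 pixels */
    }
    if (i < count)                                /* odd pixel left over */
        pixels[i] = (pixels[i] >> 1) & UINT32_C(0x7F7F7F7F);
}
```

Because the operation is applied per byte, the result does not depend
on the byte order of the machine.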

> 
> SSE introduces additional integer AND floating point instructions, all
> operating on 128-bit registers! See, you can process 4(!) 32-bit pixels
> at a time in parallel with the CPU core. ALSO, floating point registers
> could help us calculate pixels for *any* drawing instruction. (at least
> I think so)
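
For illustration, here is a minimal sketch of handling four 32-bit
pixels per step. It assumes a compiler that defines __SSE2__ and ships
<emmintrin.h> (the 128-bit integer instructions used here belong to
SSE2), and it falls back to a scalar loop otherwise. brighten_row is a
hypothetical name, not anything from the app_server sources:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Add `amount` to every 8-bit channel of each pixel, clamping at 255. */
static void brighten_row(uint32_t *pixels, size_t count, uint8_t amount)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* One 128-bit register holds 4 pixels = 16 channels. */
    const __m128i add = _mm_set1_epi8((char)amount);
    for (; i + 4 <= count; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)&pixels[i]);
        px = _mm_adds_epu8(px, add);              /* saturating add */
        _mm_storeu_si128((__m128i *)&pixels[i], px);
    }
#endif
    /* Scalar path: leftover pixels, or every pixel without SSE2. */
    for (; i < count; i++) {
        uint32_t in = pixels[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = ((in >> shift) & 0xFFu) + amount;
            if (c > 255u)
                c = 255u;
            out |= c << shift;
        }
        pixels[i] = out;
    }
}
```

The unaligned load/store variants are used so the sketch makes no
assumption about how the bitmap row is aligned in memory.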
> 
> You see, this sounds TOO good to be left out, even for R1.1. If you,
> Gabe, can study this and make use of the MMX power, then I will cancel
> this post. But if you are busy with other things we can use outside help.
> 
Yes, but it's x86-specific, and even vendor-specific on some points
(some CPUs have 3DNow! instead, even though AMD now also implements
SSE, and badly, which is why R5 crashes on AthlonXPs: the CPU fakes an
Intel chip and BeOS treats it as such).
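
One common way to live with that is to pick the code path at run time
and always keep a generic fallback. The sketch below assumes GCC or
Clang on x86, where the __builtin_cpu_supports builtin is available; on
anything else it simply reports "no SSE2". All function names are
hypothetical, and blit_sse2 is only a stand-in for a real vectorised
routine:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the running CPU reports SSE2. Uses a GCC/Clang x86
 * builtin; other compilers and architectures conservatively get false. */
static bool cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

typedef void (*blit_fn)(uint32_t *dst, const uint32_t *src, size_t count);

/* Portable copy loop that works on every CPU. */
static void blit_generic(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Placeholder: a real version would use SSE2 instructions here. */
static void blit_sse2(uint32_t *dst, const uint32_t *src, size_t count)
{
    blit_generic(dst, src, count);
}

/* Choose the routine once, based on what the CPU says it supports. */
static blit_fn choose_blit(void)
{
    return cpu_has_sse2() ? blit_sse2 : blit_generic;
}
```

Of course this only helps as far as the CPU's own feature report can be
trusted, which is exactly the problem with the AthlonXP case above.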

François.

