[openbeos] Re: app_server: MMX/SSE help wanted

  • From: "François Revol" <revol@xxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sun, 08 Aug 2004 19:29:46 +0200 CEST

> On 2004-08-08 15:30:10 [+0000], François Revol wrote:
> 
> >> MMX uses 64 bit integer registers, independent of CPU core. That means
> >> that the CPU can process anything else while we use MMX. There is a
> > That's plain wrong; Intel, in its infinite wisdom, reused the FPU register
> > file, so you can't interleave floating point ops and MMX, defeating the
> > very purpose of it...
> 
> What was the option? A larger register file would have required new
> commands to store/recall the file during task switches, and this would have
Isn't that what was done in SSE anyway?

> been unusable until OS support was added. MMX may be a hack, but at least
> it could be used right away.
> 
> > and you have to call an opcode before using MMX to save
> > the FPU regs.
> 
> Nope, the other way around. Use of any MMX command automatically puts the
> CPU into MMX mode, but to use FP again requires calling EMMS (or FEMMS on
> 3DNow!).
Whichever.

> >> catch with MMX. If you want to use floating point operations then MMX
> >> "burrows" the CPU core halting the process executing. That's why MMX
> > I think it's "borrow" there.
> > Yup, so it's not so 'independent' :)
> > 
> >> program loops should use integer processing only. As I see things, using
> >> 64 bit registers in parallel with CPU core gives us a 2+ performance
> >> gain when handling 32bit pixels bitmaps.
> > Yes, usually they are read 2 by 2 and then treated as int64.
> 
> Hmm? Can't make sense of that "2 by 2". MMX mem accesses usually operate
> with 64bits; this alone gives a speed advantage, as only one memory access
> has to be processed (instruction decoding, scheduling, address generation,
> etc.) for reading 2 dwords.
2 by 2 == 2 at once.

> And MMX certainly doesn't treat quadword as int64, but defines operations
> for packed datatypes, which can be of size byte, word or doubleword (that's
> the whole purpose of SIMD=Single Instruction Multiple Data processing).
> The only way to do int64s natively on x86 is the new 64bit extensions
> AFAIK.
I meant int64 as "sizeof(int64)"
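
To make the packed-versus-int64 distinction concrete, here is a minimal
sketch in portable C. It does not use MMX at all; it only emulates, lane by
lane, what a packed unsigned saturating byte add (PADDUSB in MMX) does to a
64-bit quadword holding two 32-bit pixels. The function names pack2 and
add_sat_u8x8 are made up for this illustration.

```c
#include <stdint.h>

/* Pack two 32-bit pixels into one 64-bit quadword (first pixel in the
 * low half), i.e. "2 at once". Hypothetical helper for illustration. */
uint64_t pack2(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Emulation of a packed unsigned saturating byte add: eight independent
 * 8-bit lanes, each clamped at 255, with no carry crossing a lane
 * boundary. Real MMX code would do this in a single instruction. */
uint64_t add_sat_u8x8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)((a >> (8 * i)) & 0xFF)
                   + (unsigned)((b >> (8 * i)) & 0xFF);
        if (s > 255)
            s = 255;                    /* saturate instead of wrapping */
        r |= (uint64_t)s << (8 * i);
    }
    return r;
}
```

A plain 64-bit integer add on the same two quadwords lets carries ripple
from one colour channel into the next, so the results generally differ; the
quadword is only int64-sized, not an int64 in the arithmetic sense.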

> 
> >> SSE introduces additional integer AND floating point instructions, all
> >> operating on 128 bit registers! See, you can process 4(!) 32bit pixels
> >> at a time in parallel with CPU core. ALSO, floating point registers
> >> could help us calculate pixels for *any* drawing instruction. (at least
> >> I think so)
> >> 
> >> You see, this sounds TOO good even to be left out for R1.1. If you, Gabe,
> >> can study this and make use of the MMX power, then I will cancel this
> >> post. But if you are busy with other things we can use outside help.
> >> 
> > Yes but it's x86 specific, and even vendor specific for some points,
> > (some cpus have 3DNow! instead, ...
> 
> Nowadays that's not too much of a problem; by doing MMX and SSE versions
> you have SIMD support for all current CPUs. Adding SSE2 would improve
> support for P4/A64, but this isn't so crucial unless you need fast 64bit FP
> ops.
> 
> > even though AMD now also implements SSE (and
> > badly, which is why R5 crashes on AthlonXPs, since the cpu fakes an Intel
> > chip and BeOS treats it as such)).
> 
> The XP certainly doesn't fake any kind of Intel CPU; the CPU identification
> in BeOS is broken, badly. It's no small feat to identify a CPU as Intel
> which returns a vendor ID of "AuthenticAMD". :)

AFAIK it does.
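
For reference, the vendor ID under discussion is the 12-character string
that CPUID leaf 0 returns in EBX, EDX and ECX, in that order. The sketch
below only shows the decoding step in portable C; it takes the three
register values as parameters instead of executing CPUID, and the names
cpuid_vendor_string and vendor_is_amd are hypothetical. It says nothing
about how BeOS actually performs its check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 12-character vendor string from the register values that
 * CPUID leaf 0 returns. The order is EBX, EDX, ECX, with the lowest byte
 * of each register first. The values are passed in by the caller; this
 * sketch does not execute CPUID itself. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    assert(out != NULL);
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Hypothetical check: compare the whole string, not just a prefix. */
int vendor_is_amd(const char *vendor)
{
    return strcmp(vendor, "AuthenticAMD") == 0;
}
```

With EBX=0x68747541, EDX=0x69746E65 and ECX=0x444D4163 this yields
"AuthenticAMD"; an Intel part reports "GenuineIntel" through the same
mechanism.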

François.