[openbeos] Re: app_server: MMX/SSE help wanted

  • From: Adi Oanca <e2joseph@xxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sun, 08 Aug 2004 11:03:39 +0300

Hi Gabe,

Gabe Yoder wrote:
>> Perhaps I should clarify. Adi and I were curious and are thinking ahead.
>> Unless I have misjudged the code at this point, depending on how
>> quickly he and I can fling code, we could be seeing some quite useful
>> milestones in the coming months. We are thinking about potential
>> optimizations in the drawing code and such. While I won't speak for
>> Adi, I personally was wondering what can be done with these other
>> instructions with respect to graphics and whether or not it is worth it
>> to even consider writing code to utilize them. I don't know much about
>> any of this kind of stuff, and I'm pretty sure Adi doesn't, either, so
>> at this point we just wanted to see if there is anyone who is
>> knowledgeable in this kind of thing who could answer some questions and
>> such.
>
> Well, that's nice of you two to go looking for outside help without even checking with the person who has written most of the graphics code. BTW, Francois hit the nail on the head. First we worry about getting stuff working and checking our current performance. After that, we can consider using non-portable optimizations where needed.

Gabe, I am REALLY sorry that I posted without talking to you first. I honestly can't explain why I didn't talk to you. It was stupid, and I apologize.


I talked with some of my colleagues at work and then searched the net for information about MMX, and I found that using this instruction set can dramatically improve performance: at least 2 times faster when using MMX.

Let me say what I know about this:
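MMX uses eight 64-bit integer registers (MM0-MM7) that are separate from the general-purpose integer registers, so the integer core can keep working on other things (loop counters, addresses) while MMX instructions crunch pixel data. There is a catch with MMX: its registers are aliased onto the x87 floating point registers, so you cannot freely mix MMX and floating point operations. Before any floating point code runs, the MMX state has to be cleared with the EMMS instruction, and switching back and forth costs time. That's why MMX loops should use integer processing only. As I see it, one 64-bit register holds two 32-bit pixels, so processing both at once, in parallel with the integer core, gives us a 2x or better performance gain when handling 32-bit bitmaps.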
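To make "two 32-bit pixels per 64-bit register" concrete, here is a minimal sketch, assuming GCC or Clang on x86 and 8 bits per channel. The function name `add_pixels_mmx` is hypothetical and not app_server code. It brightens pixels with a per-byte saturating add (8 channels per instruction) and calls `_mm_empty()`, which emits EMMS, before returning, because of the floating point catch described above. When the compiler does not target MMX, the plain C loop does all the work.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: add `amount` to every 8-bit channel of `count`
 * 32-bit pixels, clamping each channel at 255 (saturating add). */
static void add_pixels_mmx(uint32_t *pixels, size_t count, uint8_t amount)
{
    size_t i = 0;
#if defined(__MMX__)
    /* Replicate `amount` into all 8 bytes of a 64-bit MMX register. */
    __m64 delta = _mm_set1_pi8((char)amount);
    for (; i + 2 <= count; i += 2) {
        __m64 two;
        /* Load two 32-bit pixels into one 64-bit register. */
        memcpy(&two, &pixels[i], sizeof two);
        /* PADDUSB: 8 unsigned saturating byte adds in one instruction. */
        two = _mm_adds_pu8(two, delta);
        memcpy(&pixels[i], &two, sizeof two);
    }
    /* MMX registers alias the x87 FPU registers: clear the MMX state
     * (EMMS) so later floating point code sees a clean FPU. */
    _mm_empty();
#endif
    /* Scalar path: the leftover pixel, or everything without MMX. */
    for (; i < count; i++) {
        uint32_t p = pixels[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            unsigned c = ((p >> shift) & 0xFFu) + amount;
            out |= (uint32_t)(c > 255 ? 255 : c) << shift;
        }
        pixels[i] = out;
    }
}
```

For example, brightening `{ 0x10203040, 0xFFF00001, 0x00000000 }` by `0x20` handles the first two pixels in one MMX add and the third in the scalar loop; channels that would overflow, like `0xFF` and `0xF0`, stay clamped at `0xFF`.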


SSE introduces eight 128-bit registers (XMM0-XMM7) with floating point instructions that work on four single-precision values at once, and SSE2 adds integer instructions that operate on those same 128-bit registers! See, with SSE2 you can process 4(!) 32-bit pixels at a time in parallel with the CPU core. ALSO, the packed floating point instructions could help us calculate pixels for *any* drawing instruction (at least I think so).
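As a rough illustration of "4 pixels at a time", here is a minimal sketch, assuming SSE2 (the 128-bit integer instructions arrived with SSE2, not the original SSE) and GCC or Clang. The function name `blend_half_sse2` is hypothetical; it performs a simple 50% blend of two pixel rows using `_mm_avg_epu8`, which averages 16 unsigned bytes (four 32-bit pixels) in one instruction, rounding up. It is not a claim about how app_server blends. Unlike MMX, the XMM registers are separate from the x87 FPU, so no EMMS is needed.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = per-channel average of dst[i] and src[i],
 * i.e. a 50% blend of 32-bit pixels, rounding up like PAVGB does. */
static void blend_half_sse2(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= count; i += 4) {
        /* Load four 32-bit pixels from each row into 128-bit registers. */
        __m128i d = _mm_loadu_si128((const __m128i *)&dst[i]);
        __m128i s = _mm_loadu_si128((const __m128i *)&src[i]);
        /* 16 byte averages (a + b + 1) >> 1 in a single instruction. */
        _mm_storeu_si128((__m128i *)&dst[i], _mm_avg_epu8(d, s));
    }
#endif
    /* Scalar path with the same rounding: leftovers, or everything
     * when SSE2 is not available. */
    for (; i < count; i++) {
        uint32_t a = dst[i], b = src[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t ca = (a >> shift) & 0xFFu;
            uint32_t cb = (b >> shift) & 0xFFu;
            out |= ((ca + cb + 1) >> 1) << shift;
        }
        dst[i] = out;
    }
}
```

Keeping the scalar loop with identical rounding means a row whose length is not a multiple of 4 produces the same result whether a pixel went through the SSE2 path or not.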

You see, this sounds TOO good to be left out, even for R1.1. Gabe, if you can study this and make use of the MMX power, then I will cancel this post. But if you are busy with other things, we can use outside help.


Adi.
