[openbeos] Re: app_server: MMX/SSE help wanted

Hi,

Mat Hounsell wrote:
We are looking for somebody who can help us using MMX/SSE for 2D graphics (color conversion, transparency/alpha_blending, translucency, other_things possible_that_I_am_not_aware_of). If you feel you can do it please let us know.

Thank you,
    InterfaceKit Team.

I'm working in the imaging field.

Then I assume that you know enough. Still, maybe I am wrong, but some of what you are saying here doesn't make sense to me.


The industry has found that the algorithms make all the speed difference.

If that were the case, why did Intel introduce (and why does it continue to improve) MMX/SSE? Don't tell me it was for market share! If you do, I would ask you: why does a movie play nicely on a Pentium 133A (with MMX), while on a plain P133 you can (not) see some skipped frames?
If something is designed to solve certain problems, why not use it?
I bet you can work six months to find/optimize one of your algorithms and still be (maybe a lot) slower than one that uses MMX and was developed in just a month.
My opinion: you cannot compare best-engineered code on a general-purpose platform with well-engineered code on a specific platform.


In most cases, the consensus is to keep everything as shapes as long as possible
and render the minimal amount per refresh.

I don't agree here either. That's flickering.

Microsoft in Longhorn is taking a different approach. They are passing
everything off to the GPU and every window paints its entire self every time a
draw is requested.

And, Apple is delegating to the GPU.

Yes, but not in the way you describe it.
Windows are double buffered. That means only one draw request is made, and its output is rendered into that window's offscreen buffer. When the display manager wants to display a piece of that window, it just uses the GPU to blit that piece on screen from the cached window.
::Invalidate is the way a window can force a redraw (a redraw is also needed when a cached (offscreen) window has been lost, for example when exiting a game).
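
To make the idea concrete, here is a minimal sketch of that blit, done on the CPU with memcpy instead of the GPU. The Surface struct and blit_rect() are hypothetical names for illustration only, not app_server types, and the window is assumed to sit at the screen origin to keep the arithmetic short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit-per-pixel surface (illustration only). */
typedef struct {
    uint32_t *pixels;
    int width;   /* pixels per row */
    int height;  /* number of rows */
} Surface;

/* Copy the rectangle starting at (x, y) with size w x h from the cached
 * offscreen window to the same position on screen, clipped to both
 * surfaces. Returns the number of pixels copied. */
static long blit_rect(const Surface *offscreen, Surface *screen,
                      int x, int y, int w, int h)
{
    assert(offscreen && screen && offscreen->pixels && screen->pixels);

    int max_w = offscreen->width < screen->width ? offscreen->width : screen->width;
    int max_h = offscreen->height < screen->height ? offscreen->height : screen->height;

    /* Clip the requested rectangle to the area both surfaces share. */
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > max_w ? max_w : x + w;
    int y1 = y + h > max_h ? max_h : y + h;
    if (x1 <= x0 || y1 <= y0)
        return 0;

    /* One memcpy per row: no drawing code runs, only a copy. */
    for (int row = y0; row < y1; row++) {
        memcpy(screen->pixels + (size_t)row * screen->width + x0,
               offscreen->pixels + (size_t)row * offscreen->width + x0,
               (size_t)(x1 - x0) * sizeof(uint32_t));
    }
    return (long)(x1 - x0) * (y1 - y0);
}
```

The point is that exposing a piece of a window costs only a copy from the cache; the window's own drawing code is not called again until it invalidates itself.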


I don't know exactly how Longhorn's app_server works, but DW, Gabe, Rudolf and I discussed this a long time ago, and after reading an article about Avalon, it seems they have thought of "it" the same way.

The display industry is looking at selling larger displays and higher
resolutions. Because, it's very hard to increase market share with the same old
product. They are looking at making displays double the size with at least the
same dpi or the same size and double the horizontal and vertical dpi's. Either
way that's quadruple the number of pixels. Some displays require two DV
connections. The same problem applies for multiple monitors.

That means, more work for MMXs! :-)

As such it doesn't matter about CPU optimization if your algorithm isn't up to
the task.

Surely you don't imagine that anyone would hand MMX a low-performance algorithm.
But, if I think again, there is no algorithm involved. We just need to execute an instruction for a number of pixels, and that means:
for () {
    for () {
        // do your magic here...
    }
}
It's all about bandwidth!
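
As an illustration of such a loop, here is a minimal sketch of a constant-alpha blend, written once in plain C and once with SSE2 intrinsics. The function names are made up for this example, alpha is assumed to be in the range 0..256 (256 meaning "fully source"), and every byte is treated the same way, the alpha channel included. Because the pixel buffer is contiguous, the two nested loops collapse into one. The SSE2 path is compiled only when the compiler defines __SSE2__; otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: dst = (src * alpha + dst * (256 - alpha)) / 256. */
static void blend_scalar(uint8_t *dst, const uint8_t *src, size_t n,
                         unsigned alpha)
{
    assert(alpha <= 256u);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] * alpha + dst[i] * (256u - alpha)) >> 8);
}

/* Same arithmetic, 16 bytes per iteration where SSE2 is available. */
static void blend_simd(uint8_t *dst, const uint8_t *src, size_t n,
                       unsigned alpha)
{
    size_t i = 0;
    assert(alpha <= 256u);
#ifdef __SSE2__
    const __m128i zero = _mm_setzero_si128();
    const __m128i a = _mm_set1_epi16((short)alpha);
    const __m128i ia = _mm_set1_epi16((short)(256u - alpha));
    for (; i + 16 <= n; i += 16) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        /* Widen bytes to 16-bit lanes; the sum never exceeds 255 * 256. */
        __m128i lo = _mm_add_epi16(
            _mm_mullo_epi16(_mm_unpacklo_epi8(s, zero), a),
            _mm_mullo_epi16(_mm_unpacklo_epi8(d, zero), ia));
        __m128i hi = _mm_add_epi16(
            _mm_mullo_epi16(_mm_unpackhi_epi8(s, zero), a),
            _mm_mullo_epi16(_mm_unpackhi_epi8(d, zero), ia));
        lo = _mm_srli_epi16(lo, 8);
        hi = _mm_srli_epi16(hi, 8);
        /* Narrow back to bytes and store 16 blended values at once. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(lo, hi));
    }
#endif
    /* Leftover bytes (or everything, without SSE2) use the scalar loop. */
    blend_scalar(dst + i, src + i, n - i, alpha);
}
```

Both versions do the same amount of arithmetic per byte; the SIMD one simply moves sixteen bytes through each instruction, which is why memory bandwidth quickly becomes the limit.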


But the simple fact is that these instructions are difficult and not always
useful; hand-optimized code may not be worth the maintenance work.

I do not agree. A blur effect can be written in 10 lines of assembler (that is, the lines you have to modify after generating the .asm file). And what maintenance work are you talking about? Once you have a blur or transparency effect, what more needs to be done?
Also, I don't think AMD has a different (or buggy) set of MMX/SSE instructions, so the code, once written, will work anywhere. OK, there may be some differences, but I bet they are very small and easy to fix if you have the manuals.


Get the latest GCC and compile tests optimized for the 686 and 686 + MMX etc
and compare the results. An improved GCC may be good enough. (Or you could
improve it).

I don't think GCC can get it more right. It doesn't know how to optimally organize my data. I do.
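
Either claim is easy to check by measuring. Below is a minimal sketch of a timing harness, assuming the code under test can be wrapped in a function that takes a byte buffer and a length; pixel_fn, invert() and time_passes() are hypothetical names, and invert() is only a stand-in workload. The same source can then be built with different GCC target options (for example -march together with -mmmx or -msse; the exact set depends on the GCC version and target) and the timings compared against a hand-written version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for any per-pixel routine under test. */
typedef void (*pixel_fn)(uint8_t *buf, size_t n);

/* Stand-in workload: invert every byte of the buffer. */
static void invert(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(255 - buf[i]);
}

/* Run fn over buf `passes` times and return the elapsed CPU seconds. */
static double time_passes(pixel_fn fn, uint8_t *buf, size_t n, int passes)
{
    assert(fn && buf && passes > 0);
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        fn(buf, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For meaningful numbers the buffer should be screen-sized and the pass count large enough that the run takes well over a second, since clock() is coarse.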



Adi.
