[openbeos] Re: app_server: MMX/SSE help wanted

On 2004-08-11 02:27:33 [+0200], Mat Hounsell wrote:

> (Scott was right.) I was referring to the higher level drawing control 
> algorithms; these will determine your real general performance. Surprisingly, 
> the next speed bottleneck will be the font/glyph engine. Then your 
> compositing, blending and drawing algorithms.

All of these are important, and if you mess up any part, total speed will 
suffer. But as the different parts are pretty much decoupled from one 
another, you can optimize them separately as the need arises. A 
low-resource project like Haiku does not have the capacity to perfect the 
algorithms on every level, so it might be best to aim for 70-80%-optimal 
solutions now and try to get to 90-95% later on. And if Haiku should be 
successful, there should be a lot more resources available for that purpose.

> [...]
> By all means hand code assembler for MMX. But if Intel/AMD etc. are going 
> dual core, then multithreading will give you better speed than MMX could.

If SIMD code runs up to 4x faster on a P4, how would two P4s running 
integer code (2x at best) beat one P4 running SIMD? And with dual-core 
chips, fine-tuned control over RAM access becomes even more important, 
because the cores share memory bandwidth. That is one more reason to use 
the prefetch and streaming store instructions of modern CPUs.
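
To make the prefetch and streaming store point a bit more concrete, here is 
a rough sketch in C with SSE2 intrinsics. It is only an illustration, not 
app_server code: the name copy_streaming, the 256-byte prefetch distance and 
the alignment requirements are assumptions of mine, and whether it actually 
wins on a given CPU has to be measured. Without SSE2 it falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 loads/stores; also provides _mm_prefetch, _mm_sfence */
#endif

/* Hypothetical helper: copy `bytes` bytes from src to dst without pulling
 * the destination into the cache. Both pointers must be 16-byte aligned
 * and `bytes` must be a multiple of 16. */
void copy_streaming(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

#if defined(__SSE2__)
    const char *s = (const char *)src;
    char *d = (char *)dst;
    const size_t ahead = 256;  /* assumed prefetch distance, tune per CPU */

    for (size_t off = 0; off < bytes; off += 16) {
        /* Once per 64 bytes, hint that data further ahead is needed soon.
         * The bounds check keeps the pointer inside the source buffer. */
        if (off % 64 == 0 && off + ahead < bytes)
            _mm_prefetch(s + off + ahead, _MM_HINT_NTA);

        __m128i v = _mm_load_si128((const __m128i *)(s + off));
        /* Non-temporal store: written to memory, bypassing the cache. */
        _mm_stream_si128((__m128i *)(d + off), v);
    }
    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
#else
    memcpy(dst, src, bytes);
#endif
}
```

Something like this only makes sense for large blits where the destination 
is not read again soon; for small copies the plain cached path is probably 
better.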

> At the same time don't rule out the other end of the market like VIA's 
> Eden. These low cost, low electricity but good performance platforms will 
> be important in the commercial and "developing" (i.e. poorer) world. 
> Longhorn won't run on these. http://www.via.com.tw/en/Products/eden.jsp

They are nice, I know. From looking at the datasheet, I'd guess they'll 
benefit more from good SIMD code than K7/K8s or P4s. If you need 
single-precision FP calculations, 3DNow! may be up to 4x faster: the FPU is 
clocked at only half the CPU frequency, while 3DNow! can perform two FP ops 
simultaneously (2 x 2 = 4). On this chip it may be useful to use 3DNow! for 
FP in general, whenever you can live with single precision.

Also, the Eden has only one integer pipeline, compared to three on the K7 
or two double-pumped ones on the P4. So the ratio of integer to MMX power is 
worse on the Eden, giving SIMD code more of an advantage.
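
For illustration, this is the kind of inner loop I have in mind: a 
saturating add of pixel channel bytes, once with plain integer code and 
once with packed instructions. It is a sketch under my own assumptions. The 
function names are made up, and I use SSE2 intrinsics (16 bytes per 
operation) as a stand-in; hand-written MMX would handle 8 bytes per 
operation, which the Eden's single integer pipeline cannot come close to.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_adds_epu8, _mm_storeu_si128 */
#endif

/* Integer version: one channel byte per iteration, clamped at 255. */
void add_saturate_scalar(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Packed version: 16 channel bytes per iteration where SSE2 is available,
 * with the integer loop handling the remaining tail (or everything, if
 * SSE2 is not available). */
void add_saturate_simd(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Unsigned saturating add on all 16 bytes at once. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    add_saturate_scalar(dst + i, a + i, b + i, n - i);
}
```

Both functions produce the same result; the packed one just needs far fewer 
instructions per pixel.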

Thanks for providing arguments in favor of SIMD! ;)

Bye,
Chris
