[haiku-development] Re: Fixing get_cpu_model_string(), Ticket 3541

  • From: André Braga <meianoite@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Sun, 21 Jun 2009 14:17:56 -0300

[Warning: the first part of this message is pretty much off-topic, but IMO necessary, essential even. The second part goes back to discussing vector acceleration of code, which could be considered off-topic as well, as Richard pointed out, and would really be a better fit for the "Optimizing Painter::_DrawBitmapBilinearCopy32" thread. But I won't move it there, so as not to disrupt THAT thread with the first part of this message. And I won't split this message, at the risk of losing all the context and making myself sound even harsher than I suppose this will already sound.


Skip at your discretion.]


[--- Part one ---]


2009/6/21 Richard Jasmin <jasminr@xxxxxxxxxxx>:
> You mean like a mmx/opencL routine? you are confusing me.

Richard... *You* are confusing the hell out of me. And re-reading your posting history to the Haiku lists, you seem to have confused most of the people who have actually *tried* to understand your emails. Many didn't even bother.

**Which is, IMHO, to the loss of Haiku the project, not yours.**

And I mean it.

However, I can sympathise with them. You top-post. You don't punctuate correctly. You do direct translations from French to English, and even those here whose native language *is* French have the hardest time grasping what you're talking about.

We have advised you on all these issues before. I urge you *not* to ignore them again, and *not* pretend they're not directed at you, like you did with an email Urias wrote last August.

Sorry if this sounds harsh, but it's because I actually consider you an *asset* to the Haiku community, given your extensive knowledge of assembly. Your knowledge of Pascal is appreciated as well. I'm afraid that if the language impedance continues to escalate like this, we'll lose you. *OUR* loss.

I also urge you to enrol in a formal English-for-Speakers-of-Other-Languages course. If you don't have the time, at least take the time to proof-read your messages, or ask a co-worker to do it for you. You have mentioned Firefox before; if you use webmail, please install and enable the Firefox English dictionary add-on. If you don't use webmail, Thunderbird supports spell-checking too.

I'm not a native English speaker, and believe me when I tell you that while in theory I should be able to understand you with only a little extra effort, as our respective native languages share a common root, it's actually *a hell of a lot harder* to understand you, because you swing back and forth between two different grammar structures over the course of most of your messages.

So, despite risking not understanding your points at all, I'll take the plunge and at least try to answer you. Because I want you to stick around. Because I care and I'm willing to make the effort to demonstrate it.


[--- Part two ---]


> vendors ship optimized drivers like nvidia drivers and such when VESA
> modes would work fine.I can understand somewhat why, but the rest is
> crappy programming.MMX isn't optimal when you think about it and -O3
> does most of the optimizations anyway when compiling.You can use -O4,
> but even some optimizations with -O3 break certain compiles.

There is no -O4 in GCC; everything above -O3 is mapped back to -O3. However, Apple has aliased -Os to -O4 in Xcode. Which is brain-dead, but perhaps they're laying the groundwork for some future optimisation level in clang.

[explanation on MMX snipped]

> OpenCl is different.It uses hardware in the GPU to do the work,similar
> to what apple did in the G4 series[also works with audio routines, BTW.]

Now you're confusing OpenCL with Altivec. Which is actually good, because if nothing else it demonstrates how useful a unified acceleration kit would be.

> Its nothing new, just on Intel, not PPC systems.Why do you think
> everything requires a G4 or better these days? It allows part of the
> GPU to function as an added processor, not to mention the speed of the
> GPU is always faster than ram.

You *definitely* mixed up the two. And you're confusing the purpose of GPGPU acceleration. It's not that the GPU is faster than RAM; it's that you can treat the GPU as a massively parallel floating point math coprocessor. If anything, reading back from GPU texture memory is usually *very* slow, which is why GPGPU acceleration is not a panacea. Speedups are very dataset-dependent. You'd better be doing a lot of operations on that data for offloading to the GPU to be effective, and it works best if your result is much smaller than the data you submitted for processing. Canonical examples: large matrix multiplication and physics simulation. Streaming-data acceleration works fine *if* the CPU doesn't need to consume the results back. Canonical example: audio acceleration.
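
To make the "lots of operations per byte" point concrete, here's a minimal sketch of an OpenCL C kernel for a naive n-by-n matrix multiply (my own illustration, not code from any real project): each work-item performs n multiply-adds to produce a single output float, so the amount of computation per byte the CPU eventually reads back is high.

    /* Minimal sketch: naive OpenCL C kernel for an n x n matrix
     * multiply. Each work-item does n multiply-adds to produce one
     * output float -- exactly the high compute-to-readback ratio
     * where offloading to the GPU pays off. */
    __kernel void matmul(__global const float *a,
                         __global const float *b,
                         __global float *c,
                         const int n)
    {
        int row = get_global_id(1);
        int col = get_global_id(0);
        float acc = 0.0f;
        for (int k = 0; k < n; k++)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }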

Accelerate.framework abstracts away vector operations into intrinsics-like interfaces. There are code paths for Altivec, SSE (and I'm supposing also ARM NEON now), and optimised "plain" C code. It then tests the CPU for what level of those instructions it supports, and uses the fastest available. In this sense it behaves exactly like liboil, only universal.
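
The dispatch idea boils down to something like the following C sketch (every name here is hypothetical, not liboil's or Apple's actual API): provide one implementation per instruction set, probe the CPU once at startup, and publish the fastest implementation through a function pointer.

    /* A minimal sketch of liboil-style runtime dispatch;
     * all names are hypothetical. */
    #include <stddef.h>

    typedef void (*add_f32_fn)(float *dst, const float *a,
                               const float *b, size_t n);

    /* Portable fallback in plain C. */
    static void add_f32_c(float *dst, const float *a,
                          const float *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    #ifdef __SSE__
    #include <xmmintrin.h>

    /* SSE path: four floats per iteration, with a scalar tail. */
    static void add_f32_sse(float *dst, const float *a,
                            const float *b, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(dst + i,
                          _mm_add_ps(_mm_loadu_ps(a + i),
                                     _mm_loadu_ps(b + i)));
        for (; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    /* CPUID leaf 1: EDX bit 25 signals SSE support. */
    static int cpu_has_sse(void)
    {
        unsigned eax, ebx, ecx, edx;
        __asm__ __volatile__("cpuid"
                             : "=a"(eax), "=b"(ebx),
                               "=c"(ecx), "=d"(edx)
                             : "a"(1));
        return (edx >> 25) & 1;
    }
    #endif

    /* Probed once at startup; callers never care which path they get. */
    add_f32_fn add_f32 = add_f32_c;

    void init_dispatch(void)
    {
    #ifdef __SSE__
        if (cpu_has_sse())
            add_f32 = add_f32_sse;
    #endif
    }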

Since it's an API that produces the right acceleration code for whatever capabilities are present, there's no fundamental reason why it shouldn't also support generating code that runs on the GPU.

The converse *is* being done already with OpenGL.framework: when a GPU lacks capabilities that Mac OS X already uses, it falls back to emulating them on the CPU using highly optimised vector code. But this code is generated on the fly by leveraging LLVM. So the actual "assembly" vector code used in OpenGL.framework is LLVM IL, and the job of compiling this abstract vector code to the best representation available on the target CPU falls to the LLVM JIT compiler.
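
To give a flavour of what such target-neutral vector code looks like, here's a tiny sketch using GCC's generic vector extensions as a stand-in (an approximation on my part; Apple's pipeline really emits LLVM IL, not C): the source expresses the vector operation once, and the compiler lowers it to whatever the target offers.

    /* A sketch of "abstract" vector code, written with GCC's generic
     * vector extensions rather than actual LLVM IL. The compiler
     * lowers the arithmetic to SSE, Altivec, or plain scalar code
     * depending on the target it's compiling for. */
    typedef float v4f __attribute__((vector_size(16)));

    v4f vmadd(v4f a, v4f b, v4f c)
    {
        return a * b + c;    /* one multiply-add across four lanes */
    }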

This was done because on PowerPC-based Macs the GPU was always discrete, and usually decent enough. But since the move to Intel, it made economic sense to use integrated Intel GMA, which lacks some capabilities that Mac OS X was already using on the PowerPC for Quartz Extreme.

So, as you can see, there's nothing fundamental that prevents OpenGL code from running on the CPU; if anything, this is actually the norm. It's worth noting, however, that Be did the *same* thing with the unreleased OpenGL Kit, because back then 3D accelerators lacked features that today we consider basic and take for granted, like Transform and Lighting (T&L) hardware units, and support for vector code on x86 CPUs was just beginning. The OpenGL Kit generated machine code on the fly to cover those cases.

And finally, I didn't propose much beyond raising awareness of the following: despite the very surprising fact that the stream computing industry has settled on a unified programming API in the form of OpenCL, in its current form it only targets off-CPU stream processors like NVIDIA Tesla, AMD FireStream, Intel Larrabee and the GPU products derived from those. (It was actually the reverse, but for simplicity of argument let's pretend it was not.)

So I argued that a Grand Unified acceleration kit that can also target in-CPU vector units would be ideal, and I opined that the easiest and cleanest way to accomplish this would be to use a language that can be compiled into both kinds of code. However, nobody wants to abandon their language of choice for some esoteric alternative, which is why compiling code with LLVM and then JITting it to the target architecture would work wonders. And I strongly suspect this is the direction Apple will eventually head in, unless they're completely stupid and short-sighted, because it would solve most of their cross-platform support issues in one sweep. I think they'll move the entire OS to LLVM, and universal binaries will be nothing but LLVM IL objects.

Even if Apple ends up being stupid, we should not be. By using LLVM we can lower the barrier for supporting multiple architectures without falling into the JVM/CLR tarpit, *and* we get generalised acceleration as a bonus. Or the other way around. Pick your preferred buzzword-compliant marketing pitch. :)

> Start a new thread please, this one is on detection, not
> optimization.[hence the title.]

Oh well. Maybe next time :P


Cheers,
A.


--
One last piece of advice: "ice".

