[haiku-development] Re: Fixing get_cpu_model_string(), Ticket 3541

  • From: André Braga <meianoite@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Sun, 21 Jun 2009 14:17:56 -0300

[Warning: the first part of this message is pretty much off-topic, but IMO necessary, essential even. The second part goes back to discussing vector acceleration of code, which could be considered off-topic as well, as Richard pointed out, and would really be a better fit for the "Optimizing Painter::_DrawBitmapBilinearCopy32" thread. But I won't move it there, so as not to disrupt THAT thread with the first part of this message. And I won't split this message, at the risk of losing all the context and making myself sound even harsher than I suppose this will already sound.


Skip at your discretion.]


[--- Part one ---]


2009/6/21 Richard Jasmin <jasminr@xxxxxxxxxxx>:
> You mean like a mmx/opencL routine? you are confusing me.

Richard... *You* are confusing the hell out of me. And re-reading your posting history to the Haiku lists, you seem to have confused most of the people who have actually *tried* to understand your emails. Many didn't even bother.

**Which is, IMHO, to the loss of Haiku the project, not yours.**

And I mean it.

However, I can sympathise with them. You top-post. You don't punctuate correctly. You do direct translations from French to English, and even those here whose native language *is* French have the hardest time grasping what you're talking about.

We have advised you on all these issues before. I urge you *not* to ignore them again, and *not* pretend they're not directed at you, like you did with an email Urias wrote last August.

Sorry if this sounds harsh, but it's because I actually consider you an *asset* to the Haiku community, given your extensive knowledge of assembly. Your knowledge of Pascal is appreciated as well. I'm afraid that if the language impedance continues to escalate like this, we'll lose you. *OUR* loss.

I also urge you to enrol in a formal English-for-Speakers-of-Other-Languages course. If you don't have the time, at least take the time to proof-read your messages, or ask a co-worker to do it for you. You have mentioned Firefox before; if you use webmail, please install and enable the Firefox English dictionary add-on. If you don't use webmail, Thunderbird supports spell-checking too.

I'm not a native English speaker, and believe me when I tell you that while in theory I should be able to understand you with only a little extra effort, as our respective native languages share a common root, it's actually *a hell of a lot harder* to understand you, because you swing back and forth between two different grammar structures over the course of most of your messages.

So, despite risking not understanding your points at all, I'll take the plunge and at least try to answer you. Because I want you to stick around. Because I care and I'm willing to make the effort to demonstrate it.


[--- Part two ---]


> vendors ship optimized drivers like nvidia drivers and such when VESA
> modes would work fine.I can understand somewhat why, but the rest is
> crappy programming.MMX isn't optimal when you think about it and -O3
> does most of the optimizations anyway when compiling.You can use -O4,
> but even some optimizations with -O3 break certain compiles.

There is no -O4 in GCC; everything above -O3 is mapped back to -O3. However, Apple has aliased -Os to -O4 in Xcode. Which is brain-dead, but perhaps they're laying the groundwork for some future optimisation level in clang.

[explanation on MMX snipped]

> OpenCl is different.It uses hardware in the GPU to do the work,similar
> to what apple did in the G4 series[also works with audio routines, BTW.]

Now you're confusing OpenCL with Altivec. Which is actually good, because if nothing else it demonstrates how useful a unified acceleration kit would be.

> Its nothing new, just on Intel, not PPC systems.Why do you think
> everything requires a G4 or better these days? It allows part of the
> GPU to function as an added processor, not to mention the speed of the
> GPU is always faster than ram.

You *definitely* mixed up the two. And you're confusing the purpose of GPGPU acceleration. It's not that the GPU is faster than RAM; it's that you can treat the GPU as a massively parallel floating point math coprocessor. If anything, reading back from GPU texture memory is usually *very* slow, which is why GPGPU acceleration is not a panacea. Speedups are very dataset-dependent. You'd better be doing a lot of operations on that data for offloading to the GPU to be effective, and it works best if your result is much smaller than the data you submitted for processing. Canonical examples: large matrix multiplication and physics simulation. Streaming-data acceleration works fine *if* the CPU doesn't need to consume the results back. Canonical example: audio acceleration.
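
To make the "lots of operations per byte" point concrete, here's a minimal sketch of an OpenCL C kernel for a naive n-by-n matrix multiply (my own illustration, not code from any real project): each work-item performs n multiply-adds to produce a single output float, so the amount of computation per byte the CPU eventually reads back is high.

    /* Minimal sketch: naive OpenCL C kernel for an n x n matrix
     * multiply. Each work-item does n multiply-adds to produce one
     * output float -- exactly the high compute-to-readback ratio
     * where offloading to the GPU pays off. */
    __kernel void matmul(__global const float *a,
                         __global const float *b,
                         __global float *c,
                         const int n)
    {
        int row = get_global_id(1);
        int col = get_global_id(0);
        float acc = 0.0f;
        for (int k = 0; k < n; k++)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }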

Accelerate.framework abstracts away vector operations into intrinsics-like interfaces. There are code paths for Altivec, SSE (and I'm supposing also ARM NEON now), and optimised "plain" C code. It then tests the CPU for what level of those instructions it supports, and uses the fastest available. In this sense it behaves exactly like liboil, only universal.
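
The dispatch idea boils down to something like the following C sketch (every name here is hypothetical, not liboil's or Apple's actual API): provide one implementation per instruction set, probe the CPU once at startup, and publish the fastest implementation through a function pointer.

    /* A minimal sketch of liboil-style runtime dispatch;
     * all names are hypothetical. */
    #include <stddef.h>

    typedef void (*add_f32_fn)(float *dst, const float *a,
                               const float *b, size_t n);

    /* Portable fallback in plain C. */
    static void add_f32_c(float *dst, const float *a,
                          const float *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    #ifdef __SSE__
    #include <xmmintrin.h>

    /* SSE path: four floats per iteration, with a scalar tail. */
    static void add_f32_sse(float *dst, const float *a,
                            const float *b, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(dst + i,
                          _mm_add_ps(_mm_loadu_ps(a + i),
                                     _mm_loadu_ps(b + i)));
        for (; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    /* CPUID leaf 1: EDX bit 25 signals SSE support. */
    static int cpu_has_sse(void)
    {
        unsigned eax, ebx, ecx, edx;
        __asm__ __volatile__("cpuid"
                             : "=a"(eax), "=b"(ebx),
                               "=c"(ecx), "=d"(edx)
                             : "a"(1));
        return (edx >> 25) & 1;
    }
    #endif

    /* Probed once at startup; callers never care which path they get. */
    add_f32_fn add_f32 = add_f32_c;

    void init_dispatch(void)
    {
    #ifdef __SSE__
        if (cpu_has_sse())
            add_f32 = add_f32_sse;
    #endif
    }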

Since it's an API that produces the right acceleration code for whatever capabilities are present, there's no fundamental reason why it shouldn't also support generating code that runs on the GPU.

The converse *is* being done already with OpenGL.framework: when a GPU lacks capabilities that Mac OS X already uses, it falls back to emulating them on the CPU using highly optimised vector code. But this code is generated on the fly by leveraging LLVM. So the actual "assembly" vector code used in OpenGL.framework is LLVM IL, and the job of compiling this abstract vector code to the best representation available on the target CPU falls to the LLVM JIT compiler.
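
To give a flavour of what such target-neutral vector code looks like, here's a tiny sketch using GCC's generic vector extensions as a stand-in (an approximation on my part; Apple's pipeline really emits LLVM IL, not C): the source expresses the vector operation once, and the compiler lowers it to whatever the target offers.

    /* A sketch of "abstract" vector code, written with GCC's generic
     * vector extensions rather than actual LLVM IL. The compiler
     * lowers the arithmetic to SSE, Altivec, or plain scalar code
     * depending on the target it's compiling for. */
    typedef float v4f __attribute__((vector_size(16)));

    v4f vmadd(v4f a, v4f b, v4f c)
    {
        return a * b + c;    /* one multiply-add across four lanes */
    }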

This was done because on PowerPC-based Macs the GPU was always discrete, and usually decent enough. But since the move to Intel, it made economic sense to use integrated Intel GMA, which lacks some capabilities that Mac OS X was already using on the PowerPC for Quartz Extreme.

So, as you can see, there's nothing fundamental that prevents OpenGL code from running on the CPU; if anything, this is actually the norm. It's worth noting, however, that Be did the *same* thing with the unreleased OpenGL Kit, because back then 3D accelerators lacked features that today we consider basic and take for granted, like Transform and Lighting (T&L) hardware units, and support for vector code on x86 CPUs was just beginning. The OpenGL Kit generated machine code on the fly to cover those cases.

And finally, I didn't propose much beyond raising awareness of the following: despite the very surprising fact that the stream computing industry has settled on a unified programming API in the form of OpenCL, in its current form it only targets off-CPU stream processors like NVIDIA Tesla, AMD FireStream, Intel Larrabee and the GPU products derived from those. (It was actually the reverse, but for simplicity of argument let's pretend it was not.)

So I argued that a Grand Unified acceleration kit that can also target in-CPU vector units would be ideal, and I opined that the easiest and cleanest way to accomplish this would be to use a language that can be compiled into both kinds of code. However, nobody wants to abandon their language of choice for some esoteric alternative, which is why compiling code with LLVM and then JITting it to the target architecture would work wonders. And I strongly suspect this is the direction Apple will eventually head in, unless they're completely stupid and short-sighted, because it would solve most of their cross-platform support issues in one sweep. I think they'll move the entire OS to LLVM, and universal binaries will be nothing but LLVM IL objects.

Even if Apple ends up being stupid, we should not be. By using LLVM we can lower the barrier for supporting multiple architectures without falling into the JVM/CLR tarpit, *and* we get generalised acceleration as a bonus. Or the other way around. Pick your preferred buzzword-compliant marketing pitch. :)

> Start a new thread please, this one is on detection, not
> optimization.[hence the title.]

Oh well. Maybe next time :P


Cheers,
A.


--
One last piece of advice: "ice".

