[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: André Braga <meianoite@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Tue, 16 Jun 2009 14:16:33 -0300

On 16/06/2009, at 13:32, Christian Packmann <Christian.Packmann@xxxxxx> wrote:


> Hm, it doesn't strike me as that interesting for high-performance code. More a general solution for platform-independent JIT-code (but a very elegant one at that). The instruction set has too many omissions though, and at least on x86 it wouldn't be easy to do proper compilation for efficient SSE3+ code, especially when doing JIT.

The IR of LLVM is an extensible bitstream format, not something set in stone. They have some vector ops already, and numbers are limitless in precision as far as the IR is concerned. Which means that you can do math on 1024-bit-wide integers and it will be converted to operations specific to your architecture. How optimised it is depends on extending the compiler so it can match the semantics of the ops to the available instructions. As a matter of fact, LLVM is built around the idea of extensibility.
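For the curious, here is roughly what that looks like through LLVM's C++ IRBuilder. This is a sketch against the current API (so the exact names are modern ones, and the makeAdd helper is my own); the point is just that the IR accepts these types without blinking:

#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/raw_ostream.h>

using namespace llvm;

// Emits "define T @name(T %a, T %b) { ret T (a + b) }" for any first-class T.
static void makeAdd(Module &m, IRBuilder<> &b, Type *t, const char *name) {
  auto *fty = FunctionType::get(t, {t, t}, false);
  auto *f = Function::Create(fty, Function::ExternalLinkage, name, &m);
  b.SetInsertPoint(BasicBlock::Create(m.getContext(), "entry", f));
  b.CreateRet(b.CreateAdd(f->getArg(0), f->getArg(1), "sum"));
}

int main() {
  LLVMContext ctx;
  Module mod("wide_math", ctx);
  IRBuilder<> b(ctx);

  // A 1024-bit integer add: perfectly legal IR. Codegen later "legalizes"
  // it into a chain of native add / add-with-carry instructions.
  makeAdd(mod, b, b.getIntNTy(1024), "add_i1024");

  // An <8 x i32> vector add: lowered to whatever SIMD width the target
  // actually has (SSE, AltiVec, ...), or to scalar code if it has none.
  makeAdd(mod, b, FixedVectorType::get(b.getInt32Ty(), 8), "add_v8i32");

  mod.print(outs(), nullptr);  // dump the textual IR to stdout
}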

> OpenCL is the solution for high-performance computing, and I'm looking forward to it being implemented widely. Using graphics cards as very wide SIMD units is really the most efficient way of achieving high performance at acceptable power consumption for the current hardware. And as OpenCL code can run on anything from graphics cards to normal CPUs to Larrabee, OpenCL code will likely be the best solution for doing high-performance code, as you can run it nearly anywhere.

Take a wild guess if it's possible to extend LLVM so it maps arbitrary vector code to OpenCL calls. Or calls to accelerator boards. Or... :)


Should be *interesting* to do with LLVM :D
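To make the OpenCL point concrete: a kernel is just C with a few qualifiers, handed to the driver as a plain string and compiled at run time for whatever device is present (GPU, multicore CPU, in principle Larrabee). A minimal sketch; saxpy here is my own example, not anything from this thread:

// OpenCL C kernel source, embedded host-side as the string that
// clCreateProgramWithSource() expects. The vendor's runtime compiles it
// at run time for the selected device, so the same source runs on a GPU,
// a multicore CPU, or any other device with an OpenCL driver.
static const char *saxpy_src = R"CLC(
__kernel void saxpy(__global const float *x,
                    __global float *y,
                    const float a)
{
    size_t i = get_global_id(0);  /* one work-item per array element */
    y[i] = a * x[i] + y[i];
}
)CLC";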

> As stated, it doesn't match the x86 SSE instructions too well and thus would lose performance compared to native code.

Re-read my statement with what I just said in mind. LLVM is a toolchest for building compilers. It's not set in stone and you don't depend on someone taking pity on you to allow for a sane, clean interface for extending the compiler *cough* GCC *cough*.

> And doing JIT for performance-critical code is really a nice idea, but so far it never seemed to work out too well. :-) It all depends on compiler technology which always seems to lag a bit behind hardware abilities.

That's a given for non-commercial compilers. That is, until LLVM came along. It's almost trivial to build frontends for it, and backends aren't much more difficult either!

(I should arrange for being hired as a LLVM evangelist before people start calling me "LLVM Cheerleader", or worse, Chris Lattner's bitch :P)
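To put some meat on the JIT point: with today's LLVM the whole build-IR-compile-call cycle fits on one page. A hedged sketch against the modern ORC LLJIT API (which postdates this thread; names like LLJITBuilder and toPtr are from recent releases, and older versions spelled this differently):

#include <llvm/ExecutionEngine/Orc/LLJIT.h>
#include <llvm/ExecutionEngine/Orc/ThreadSafeModule.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/TargetSelect.h>
#include <cstdio>
#include <memory>

using namespace llvm;
using namespace llvm::orc;

int main() {
  InitializeNativeTarget();            // enable codegen for the host CPU
  InitializeNativeTargetAsmPrinter();

  auto ctx = std::make_unique<LLVMContext>();
  auto mod = std::make_unique<Module>("jit_demo", *ctx);

  // Hand-build the IR for: int add(int a, int b) { return a + b; }
  IRBuilder<> b(*ctx);
  auto *i32 = b.getInt32Ty();
  auto *f = Function::Create(FunctionType::get(i32, {i32, i32}, false),
                             Function::ExternalLinkage, "add", mod.get());
  b.SetInsertPoint(BasicBlock::Create(*ctx, "entry", f));
  b.CreateRet(b.CreateAdd(f->getArg(0), f->getArg(1)));

  ExitOnError check;
  auto jit = check(LLJITBuilder().create());
  check(jit->addIRModule(ThreadSafeModule(std::move(mod), std::move(ctx))));

  // Compilation happens on lookup; we get back a host-native pointer.
  auto sym = check(jit->lookup("add"));
  auto *add = sym.toPtr<int (*)(int, int)>();
  std::printf("add(20, 22) = %d\n", add(20, 22));  // prints 42
}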

> Hm, I'm not sure that VLIW would actually be that useful. It just puts all responsibility for good performance on the compiler; as such it can only work on the desktop if you use JIT compilation, as static compilation will prevent any drastic changes to your CPU architecture. And on the desktop, you want the ability to distribute binaries without having to recompile on every machine. Okay, let's ignore Linux. ;-)

I see you're joining the dots now :)
Platform-independent binary format, tracing, JIT, LLVM, "esoteric" architectures like VLIW...

> And VLIW fared well from neither a performance nor an efficiency POV. Itanium never beat the other architectures decisively on all scores (for some benchmarks yes, but not universally)

You can mostly blame GCC for that, because Joseph Fisher's compilers at HP Labs ought to have been fantastic for Itanium. They sure were for the Multiflow Trace/xx machines. (As far as history goes, that is; I never touched any of those!)

> and Transmeta's CPUs turned out to be no better from a power/performance perspective than Intel's CPUs once Intel started to optimize for power.

Well, no wonder, they had the overhead of decoding x86 instead of exposing their actual architecture!

> And as x86 development costs can be shared across mobile, desktop and server CPUs, it just has the most R&D money available, which puts it at an advantage over other CPU architectures.

The fact that it has economies of scale in its favour doesn't mean x86/x64 is the ultimate computer architecture. As GPGPUs came to demonstrate. (Who would have thought that '70s-style transputers would become the salvation of the computational performance race 40 years later? :))

Using "dedicated" hardware like GPUs for high-width vector processing is the better solution here IMO. For vectorizable algorithms they can deliver 2-4x times the MIPS/Watt than CPUs. And as most really compute-intensive workloads happen to be vector- friendly, that is a basically perfect solution. Now we just need an OpenCL port and drivers for Haiku. :-)

Should we start lobbying at Intel and AMD now? :)

I remember reading somewhere that both had some kind of informal partnership with Be and used BeOS as a testing platform when they needed a low-overhead but still practical OS to run media applications and the like... We can sure make the case for them to contribute accelerated libraries to Haiku, as we don't demand GPL'd code and all that. Binary blobs even.

Not ideal from the FSF's point of view, but our ideals for Haiku, both as OS builders and end users, actually point elsewhere anyway, right?


Cheers,
A.
