[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: André Braga <meianoite@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Tue, 16 Jun 2009 14:16:33 -0300

On 16/06/2009, at 13:32, Christian Packmann <Christian.Packmann@xxxxxx> wrote:


> Hm, it doesn't strike me as that interesting for high-performance code. More a general solution for platform-independent JIT-code (but a very elegant one at that). The instruction set has too many omissions though, and at least on x86 it wouldn't be easy to do proper compilation for efficient SSE3+ code, especially when doing JIT.

The IR of LLVM is an extensible bitstream format, not something set in stone. They have some vector ops already, and numbers are limitless in precision as far as the IR is concerned. Which means that you can do math on 1024-bit-wide integers and it will be converted to operations specific to your architecture. How optimised it is depends on extending the compiler so it can match the semantics of the ops to the available instructions. As a matter of fact, LLVM is built around the idea of extensibility.
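For the curious, here is roughly what that looks like through LLVM's C++ IRBuilder. This is a sketch against the current API (so the exact names are modern ones, and the makeAdd helper is my own); the point is just that the IR accepts these types without blinking:

#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/raw_ostream.h>

using namespace llvm;

// Emits "define T @name(T %a, T %b) { ret T (a + b) }" for any first-class T.
static void makeAdd(Module &m, IRBuilder<> &b, Type *t, const char *name) {
  auto *fty = FunctionType::get(t, {t, t}, false);
  auto *f = Function::Create(fty, Function::ExternalLinkage, name, &m);
  b.SetInsertPoint(BasicBlock::Create(m.getContext(), "entry", f));
  b.CreateRet(b.CreateAdd(f->getArg(0), f->getArg(1), "sum"));
}

int main() {
  LLVMContext ctx;
  Module mod("wide_math", ctx);
  IRBuilder<> b(ctx);

  // A 1024-bit integer add: perfectly legal IR. Codegen later "legalizes"
  // it into a chain of native add / add-with-carry instructions.
  makeAdd(mod, b, b.getIntNTy(1024), "add_i1024");

  // An <8 x i32> vector add: lowered to whatever SIMD width the target
  // actually has (SSE, AltiVec, ...), or to scalar code if it has none.
  makeAdd(mod, b, FixedVectorType::get(b.getInt32Ty(), 8), "add_v8i32");

  mod.print(outs(), nullptr);  // dump the textual IR to stdout
}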

> OpenCL is the solution for high-performance computing, and I'm looking forward to it being implemented widely. Using graphics cards as very wide SIMD units is really the most efficient way of achieving high performance at acceptable power consumption for the current hardware. And as OpenCL code can run on anything from graphics cards to normal CPUs to Larrabee, OpenCL code will likely be the best solution for doing high-performance code, as you can run it nearly anywhere.

Take a wild guess if it's possible to extend LLVM so it maps arbitrary vector code to OpenCL calls. Or calls to accelerator boards. Or... :)


Should be *interesting* to do with LLVM :D
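To make the OpenCL point concrete: a kernel is just C with a few qualifiers, handed to the driver as a plain string and compiled at run time for whatever device is present (GPU, multicore CPU, in principle Larrabee). A minimal sketch; saxpy here is my own example, not anything from this thread:

// OpenCL C kernel source, embedded host-side as the string that
// clCreateProgramWithSource() expects. The vendor's runtime compiles it
// at run time for the selected device, so the same source runs on a GPU,
// a multicore CPU, or any other device with an OpenCL driver.
static const char *saxpy_src = R"CLC(
__kernel void saxpy(__global const float *x,
                    __global float *y,
                    const float a)
{
    size_t i = get_global_id(0);  /* one work-item per array element */
    y[i] = a * x[i] + y[i];
}
)CLC";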

> As stated, it doesn't match the x86 SSE instructions too well and thus would lose performance compared to native code.

Re-read my statement with what I just said in mind. LLVM is a toolchest for building compilers. It's not set in stone and you don't depend on someone taking pity on you to allow for a sane, clean interface for extending the compiler *cough* GCC *cough*.

> And doing JIT for performance-critical code is really a nice idea, but so far it never seemed to work out too well. :-) It all depends on compiler technology which always seems to lag a bit behind hardware abilities.

That's a given for non-commercial compilers. That is, until LLVM came along. It's almost trivial to build frontends for it, and backends aren't much more difficult either!

(I should arrange for being hired as a LLVM evangelist before people start calling me "LLVM Cheerleader", or worse, Chris Lattner's bitch :P)
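To put some meat on the JIT point: with today's LLVM the whole build-IR-compile-call cycle fits on one page. A hedged sketch against the modern ORC LLJIT API (which postdates this thread; names like LLJITBuilder and toPtr are from recent releases, and older versions spelled this differently):

#include <llvm/ExecutionEngine/Orc/LLJIT.h>
#include <llvm/ExecutionEngine/Orc/ThreadSafeModule.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/TargetSelect.h>
#include <cstdio>
#include <memory>

using namespace llvm;
using namespace llvm::orc;

int main() {
  InitializeNativeTarget();            // enable codegen for the host CPU
  InitializeNativeTargetAsmPrinter();

  auto ctx = std::make_unique<LLVMContext>();
  auto mod = std::make_unique<Module>("jit_demo", *ctx);

  // Hand-build the IR for: int add(int a, int b) { return a + b; }
  IRBuilder<> b(*ctx);
  auto *i32 = b.getInt32Ty();
  auto *f = Function::Create(FunctionType::get(i32, {i32, i32}, false),
                             Function::ExternalLinkage, "add", mod.get());
  b.SetInsertPoint(BasicBlock::Create(*ctx, "entry", f));
  b.CreateRet(b.CreateAdd(f->getArg(0), f->getArg(1)));

  ExitOnError check;
  auto jit = check(LLJITBuilder().create());
  check(jit->addIRModule(ThreadSafeModule(std::move(mod), std::move(ctx))));

  // Compilation happens on lookup; we get back a host-native pointer.
  auto sym = check(jit->lookup("add"));
  auto *add = sym.toPtr<int (*)(int, int)>();
  std::printf("add(20, 22) = %d\n", add(20, 22));  // prints 42
}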

> Hm, I'm not sure that VLIW would actually be that useful. It just puts all responsibility for good performance on the compiler; as such it can only work on the desktop if you use JIT compilation, as static compilation will prevent any drastic changes to your CPU architecture. And on the desktop, you want the ability to distribute binaries without having to recompile on every machine. Okay, let's ignore Linux. ;-)

I see you're joining the dots now :)
Platform-independent binary format, tracing, JIT, LLVM, "esoteric" architectures like VLIW...

> And VLIW fared well from neither a performance nor an efficiency POV. Itanium never beat the other architectures decisively on all scores (for some benchmarks yes, but not universally)

You can mostly blame GCC for that, because Joseph Fisher's compilers at HP Labs ought to have been fantastic for Itanium. They sure were for the Multiflow Trace/xx machines. (As far as history goes, that is; I never touched any of those!)

> and Transmeta's CPUs turned out to be no better from a power/performance perspective than Intel's CPUs once Intel started to optimize for power.

Well, no wonder, they had the overhead of decoding x86 instead of exposing their actual architecture!

> And as x86 development costs can be shared across mobile, desktop and server CPUs, it just has the most R&D money available, which puts it at an advantage over other CPU architectures.

The fact that it has economies of scale in its favour doesn't mean x86/x64 is the ultimate computer architecture. As GPGPUs came to demonstrate. (Who would have thought that '70s-style transputers would become the salvation of the computational performance race 40 years later? :))

Using "dedicated" hardware like GPUs for high-width vector processing is the better solution here IMO. For vectorizable algorithms they can deliver 2-4x times the MIPS/Watt than CPUs. And as most really compute-intensive workloads happen to be vector- friendly, that is a basically perfect solution. Now we just need an OpenCL port and drivers for Haiku. :-)

Should we start lobbying at Intel and AMD now? :)

I remember reading somewhere that both had some kind of informal partnership with Be and used BeOS as a testing platform when they needed a low-overhead but still practical OS to run media applications and the like... We can sure make the case for them to contribute accelerated libraries to Haiku, as we don't demand GPL'd code and all that. Binary blobs even.

Not ideal from the FSF's point of view, but our ideals for Haiku, both as OS builders and end users, actually point elsewhere anyway, right?


Cheers,
A.
