[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Tue, 16 Jun 2009 17:12:42 +0200

Stephan Assmus - 2009-06-16 15:16 :
> Hm, it appears the optimized code is already about twice as fast as the plain C code for pretty much every architecture that was benchmarked. So what I would love to see is a patch against app_server which integrates this code so I can watch movies fullscreen with smooth scaling. If the code can later be made even faster, nice, but it's darn useful already. Or would be if there were a patch. ;-)

Impatient, are we? :-) But you're right, it's time to think about integration of the code.

As a first step I would only add the simple MMX/SSE routine; adding other variants should be easier to implement and test once the framework changes for this have been done.

However, I think it's better if you do the actual integration into the source tree, as I don't have experience with any wider changes this may require. And as the current Haiku revisions only boot in safe mode natively on my machine, I'll have trouble testing the integration properly anyway. :-) But I will provide any required code/patches; you'll just have to integrate and test the code. I hope that is okay.

Integration of the simple MMX/SSE routine will require the following:

- include the assembly source file in the tree and the Jamfile, and make sure it builds okay. As yasm has been added to the build requirements, this should pose no problem. We also have to decide on a naming scheme for assembly files to ease future additions of other assembly routines. I'd strongly advise giving each supported C function its own file holding that function's assembly variants; it's confusing enough to have all the different variants in one file, and mixing code for different routines would be nightmarish.
Hm, "painter_bilinearcopy_simd.asm" for this code?

- we need CPU identification code and storage for the results. The CPU identification should be done once at startup and the results stored in global variables or made available via global methods. This should eventually be done in the kernel, but AFAIK there are no such functions yet. So we could integrate this into app_server as an intermediate step; it could be replaced by kernel functionality once that is implemented. I have code for doing the identification, which is being developed and tested together with the benchmark. SIMD detection seems to work reliably enough for production code. However, I don't know where this should be added within app_server. Any suggestions?

- modification of _DrawBitmapBilinearCopy32() to call the MMX/SSE version if the CPU supports that. I can write that code, you'll just have to integrate it.
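To make the last two points more concrete, here is a rough C++ sketch of how one-time detection and dispatch could fit together. It is not the identification code from the benchmark. All names in it (kCPUFeatureMMX, kCPUFeatureSSE, FeaturesFromCPUID1EDX, DetectCPUFeatures, CPUFeatures, SelectBilinearCopyVariant) are made up for illustration and are not existing app_server API. It assumes a GCC-compatible compiler that provides <cpuid.h>, and it assumes the simple routine needs both MMX and SSE. The only hard facts used are the CPUID leaf 1 EDX bits: bit 23 for MMX and bit 25 for SSE.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>	// GCC/Clang helper for the CPUID instruction
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical feature flags, for illustration only.
constexpr uint32_t kCPUFeatureMMX = 1u << 0;
constexpr uint32_t kCPUFeatureSSE = 1u << 1;

// Translate the EDX register of CPUID leaf 1 into the flags above.
// Bit 23 reports MMX, bit 25 reports SSE.
uint32_t
FeaturesFromCPUID1EDX(uint32_t edx)
{
	uint32_t features = 0;
	if ((edx & (1u << 23)) != 0)
		features |= kCPUFeatureMMX;
	if ((edx & (1u << 25)) != 0)
		features |= kCPUFeatureSSE;
	return features;
}

// Query the CPU. Returns 0 (no SIMD) on non-x86 builds or if leaf 1
// is not available.
uint32_t
DetectCPUFeatures()
{
#ifdef SKETCH_HAVE_CPUID
	unsigned int eax, ebx, ecx, edx;
	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return FeaturesFromCPUID1EDX(edx);
#endif
	return 0;
}

// Global accessor: detection runs once, on first use, and is cached.
uint32_t
CPUFeatures()
{
	static const uint32_t sFeatures = DetectCPUFeatures();
	return sFeatures;
}

// Which implementation the drawing code should call.
enum BilinearCopyVariant {
	kVariantGenericC,
	kVariantMMXSSE
};

// Assumption: the simple routine needs both MMX and SSE.
BilinearCopyVariant
SelectBilinearCopyVariant(uint32_t features)
{
	const uint32_t required = kCPUFeatureMMX | kCPUFeatureSSE;
	if ((features & required) == required)
		return kVariantMMXSSE;
	return kVariantGenericC;
}
```

_DrawBitmapBilinearCopy32() would then check SelectBilinearCopyVariant(CPUFeatures()) once per call and either jump into the assembly routine or fall through to the existing C loop. Keeping the bit decoding separate from the CPUID call means the selection logic can be tested on any machine, and the accessor could later be backed by kernel functionality without touching the callers.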


If that plan is okay with you, I'll start adapting the required code segments.

Christian
