[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

From: Christian Packmann <Christian.Packmann@xxxxxx>
To: haiku-development@xxxxxxxxxxxxx
Date: Tue, 16 Jun 2009 17:12:42 +0200

Stephan Assmus - 2009-06-16 15:16 :

Hm, it appears the optimized code is already about twice as fast as the plain C code for pretty much every architecture that was benchmarked. So what I would love to see is a patch against app_server which integrates this code so I can watch movies fullscreen with smooth scaling. If the code can later be made even faster, nice, but it's darn useful already. Or would be if there were a patch. ;-)

Impatient, are we? :-) But you're right, it's time to think about integration of the code.

As a first step I would only add the simple MMX/SSE routine; the other variants should be easier to add and test once the framework changes for this have been done.

However, I think it's better if you do the actual integration into the source tree, as I don't have experience with any wider changes this may require. And as the current Haiku revisions only boot natively in safe mode on my machine, I'll have trouble testing the integration properly anyway. :-) But I will provide all the actual code/patches required; you'll just have to integrate and test the code. I hope that is okay.


Integration of the simple MMX/SSE routine will require the following:

- include the assembly source file in the tree and the Jamfile, and make sure it builds okay. As yasm has been added to the build requirements, this should pose no problem. We also have to decide on a naming scheme for assembly files to ease future additions of other assembly routines. I'd strongly advise keeping the different assembly variants for each supported C function in a separate file; it's confusing enough to have all the different variants in one file, and mixing code for different routines would be nightmarish.

Hm, "painter_bilinearcopy_simd.asm" for this code?

- we need CPU identification code and storage for the results. The CPU identification should be done once at startup and the results stored in global variables or made available via global functions. This should eventually be done in the kernel, but AFAIK there are no such functions yet. So we could integrate this into app_server as an intermediate step; it could be replaced by kernel functionality once that is implemented. I have code for doing the identification, which is being developed and tested together with the benchmark. SIMD detection seems to work reliably enough for production code. However, I don't know where this should be added within app_server; any suggestions?

- modification of _DrawBitmapBilinearCopy32() to call the MMX/SSE version if the CPU supports it. I can write that code; you'll just have to integrate it.
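A minimal sketch of the once-at-startup CPU identification from the second point, assuming a compiler that ships <cpuid.h> (modern GCC or Clang, which provide `__get_cpuid()` and the `bit_MMX`/`bit_SSE`/`bit_SSE2` masks). The names `CpuFeatures` and `GetCpuFeatures()` are hypothetical and not existing app_server or kernel API. It only decodes the CPUID feature bits and does not check whether the OS has enabled SSE state saving.

```cpp
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

// Hypothetical container for the detected SIMD capabilities.
struct CpuFeatures {
    bool mmx = false;
    bool sse = false;
    bool sse2 = false;
};

// Queries CPUID leaf 1 and decodes the EDX feature bits.
// On non-x86 targets all flags stay false.
static CpuFeatures DetectCpuFeatures()
{
    CpuFeatures features;
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid() returns 0 if CPUID or the requested leaf is unavailable.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        features.mmx = (edx & bit_MMX) != 0;
        features.sse = (edx & bit_SSE) != 0;
        features.sse2 = (edx & bit_SSE2) != 0;
    }
#endif
    return features;
}

// Global accessor: detection runs once, on first use, and the result is
// cached (function-local statics are initialized thread-safely in C++11).
const CpuFeatures& GetCpuFeatures()
{
    static const CpuFeatures sFeatures = DetectCpuFeatures();
    return sFeatures;
}
```

In this sketch app_server would call GetCpuFeatures() once early during startup, so the CPUID query never shows up in drawing code; a later kernel facility could replace the body without changing callers.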
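The third point boils down to choosing an implementation once and then calling through a function pointer. A minimal C++ sketch, assuming the SIMD routine is exported from the assembly file with C linkage and shares the signature of a plain C fallback; `BlendRowsC`, `SelectBlendRows()`, `DrawScaledRow()` and the row-blending signature are hypothetical illustrations, not the actual interface of _DrawBitmapBilinearCopy32(). The fallback shown covers only the vertical blending step of a bilinear filter.

```cpp
#include <cstdint>

// Hypothetical signature shared by the C fallback and the SIMD routine.
typedef void (*BlendRowsFunc)(const uint32_t* top, const uint32_t* bottom,
    uint32_t* dst, int count, uint32_t weight);

// Plain C reference: per-channel blend of two 32-bit scanlines.
// weight is in [0, 256]: 0 yields 'top', 256 yields 'bottom'.
static void BlendRowsC(const uint32_t* top, const uint32_t* bottom,
    uint32_t* dst, int count, uint32_t weight)
{
    const uint32_t inverse = 256 - weight;
    for (int i = 0; i < count; i++) {
        uint32_t result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t a = (top[i] >> shift) & 0xff;
            uint32_t b = (bottom[i] >> shift) & 0xff;
            result |= ((a * inverse + b * weight) >> 8) << shift;
        }
        dst[i] = result;
    }
}

// The routine actually used; defaults to the portable version.
static BlendRowsFunc sBlendRows = BlendRowsC;

// Called once at startup with the result of CPU detection. Falls back to
// the C version if SIMD is unsupported or no SIMD routine is available.
void SelectBlendRows(bool cpuSupportsSimd, BlendRowsFunc simdVersion)
{
    sBlendRows = (cpuSupportsSimd && simdVersion != nullptr)
        ? simdVersion : BlendRowsC;
}

// Drawing code calls through the pointer, with no per-call feature checks.
void DrawScaledRow(const uint32_t* top, const uint32_t* bottom,
    uint32_t* dst, int count, uint32_t weight)
{
    sBlendRows(top, bottom, dst, count, weight);
}
```

Keeping the choice in a pointer set at startup keeps the inner loop free of CPU checks; an assembly routine (for example a hypothetical `extern "C"` symbol from the new .asm file) would simply be passed as simdVersion when the detected flags allow it.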

If that plan is okay with you, I'll start adapting the required code segments.


Christian

Follow-Ups:
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Stefano Ceccherini

References:
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Christian Packmann
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Adam K Kirchhoff
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Christian Packmann
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Urias McCullough
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Christian Packmann
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: André Braga
- [haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32
  - From: Stephan Assmus
