[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32 (was: Re: ShowImage patch)

  • From: Christian Packmann <Christian.Packmann@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Fri, 13 Mar 2009 12:48:51 +0100

Stephen Deken - 2009-03-11 21:25 :
> On Wed, Mar 11, 2009 at 11:59 AM, Christian Packmann <Christian.Packmann@xxxxxx> wrote:
>
>> So we can do
>>     (component_value * 129) >> 23
>> to approximate the division by 65025.
>
> I'll prefix this by saying I know only slightly more than nothing about MMX and SSE. But I do notice:
>
>   (x * 129) >> 23 ≈ (x >> 16) + (x >> 23)
>
> That follows from 129 = 128 + 1, though the two sides can differ by one because the right-hand side truncates each shift separately. Is that faster at all? (One more bit of accuracy could be eked out by adding x >> 31, but that's probably overkill.)
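The quoted trick replaces a division by 65025 (= 255 * 255) with one multiply and one shift, because 2^23 / 129 is about 65028. The C sketch below compares it with exact division. The helper names are hypothetical, and the input range 0 to 255 * 65025 is an assumption (an 8-bit component times bilinear weights summing to 65025), not taken from Painter's code. Since 129 / 2^23 is slightly below 1 / 65025, the approximation never overshoots and is at most one too low. Every nonzero exact multiple in this range comes out one low: 65025 maps to 0 and 255 * 65025 maps to 254.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only; not Haiku's Painter code. */

/* Exact division by 65025 (= 255 * 255). */
static inline uint32_t div65025_exact(uint32_t x)
{
    return x / 65025u;
}

/* Multiply-and-shift approximation: 129 / 2^23 is roughly 1 / 65028. */
static inline uint32_t div65025_mul_shift(uint32_t x)
{
    return (x * 129u) >> 23;
}

/* Largest value of (exact - approx) for x in [0, max_value].
   Assumes max_value * 129 fits in 32 bits (true for 255 * 65025). */
static uint32_t div65025_max_error(uint32_t max_value)
{
    uint32_t worst = 0;
    for (uint32_t x = 0; x <= max_value; x++) {
        uint32_t exact = div65025_exact(x);
        uint32_t approx = div65025_mul_shift(x);
        /* 129 / 2^23 < 1 / 65025, so approx never exceeds exact. */
        uint32_t err = exact - approx;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

For example, div65025_max_error(255u * 65025u) returns 1: the shortcut is never off by more than one step in this range.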

Nice, didn't realize this. But this would only be faster in integer code, and then only on some CPUs. MUL is very fast on most recent designs, and your version needs four instructions compared to two in my variant on x86 (x has to be duplicated into another register). It might be faster on K7 and Pentium III, but probably not on anything more modern.
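A small C sketch can make the comparison concrete by computing both forms side by side; the function names are hypothetical. (x * 129) >> 23 equals floor(x / 2^16 + x / 2^23), while the shift-and-add version floors each term separately, so its result is either equal or exactly one smaller. x = 65535 is one input where they differ (1 versus 0). The 0 to 255 * 65025 input range is the same assumption as above.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Multiply form: one multiply and one shift. */
static inline uint32_t scale_mul(uint32_t x)
{
    return (x * 129u) >> 23;
}

/* Shift-and-add form: 129 = 128 + 1, so x * 129 / 2^23 = x / 2^16 + x / 2^23,
   computed as two shifts and an add. */
static inline uint32_t scale_shift_add(uint32_t x)
{
    return (x >> 16) + (x >> 23);
}

/* Counts inputs in [0, max_value] where the two forms disagree and stores
   the largest difference in *max_diff. Assumes max_value * 129 fits in
   32 bits. */
static uint32_t count_disagreements(uint32_t max_value, uint32_t *max_diff)
{
    uint32_t count = 0;
    *max_diff = 0;
    for (uint32_t x = 0; x <= max_value; x++) {
        /* Flooring each term separately can only lose a carry, never gain. */
        uint32_t diff = scale_mul(x) - scale_shift_add(x);
        if (diff != 0)
            count++;
        if (diff > *max_diff)
            *max_diff = diff;
    }
    return count;
}
```

Which form is faster depends on the CPU and compiler, as discussed above; the sketch only checks that the two compute nearly the same value.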

Christian
