[haiku-development] Re: Optimizing Painter::_DrawBitmapBilinearCopy32

  • From: Stephan Assmus <superstippi@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Tue, 16 Jun 2009 15:16:49 +0200

On 2009-06-16 at 15:09:06 [+0200], André Braga <meianoite@xxxxxxxxx> wrote:
> On 16/06/2009, at 09:14, Christian Packmann  
> <Christian.Packmann@xxxxxx> wrote:
> > The SSE2/SSSE3 routines are also improved. Of the unrolled versions 
> > only the SSSE3 variant is finished, the MMX and SSE2 variants need more 
> > work. I'm sceptical that they will yield much improvement anyway; the 
> > unrolled SSSE3 only gives 14% more performance than the non-unrolled 
> > version. I don't think improvements will be much greater for MMX/SSE2, 
> > but maybe some CPUs will perform well on them.
> Just for kicks, could you compile a static .o for AMD64 that we could 
> then link to produce an executable for a 64-bit OS of choice? I'd like to 
> see what GCC 4.2+ manages to do to your code with extra registers, 
> optimization levels and autovectorization switches.
> Also, I see that you have SSSE3 versions for the routines, but why not 
> SSE3 "plain" with 33.33% less S? :)
> No useful added functionality in those 13 extra instructions compared to 
> what you're already doing in SSE2?

Hm, it appears the optimized code is already about twice as fast as the 
plain C code for pretty much every architecture that was benchmarked. So 
what I would love to see is a patch against app_server which integrates 
this code so I can watch movies fullscreen with smooth scaling. If the code 
can later be made even faster, nice, but it's darn useful already. Or would 
be if there were a patch. ;-)

Best regards,
