[haiku-development] PixelConverter Results

  • From: David McPaul <dlmcpaul@xxxxxxxxx>
  • To: dlmcpaul@xxxxxxxxx
  • Date: Mon, 14 Sep 2009 11:05:18 +1000

Attached is a PNG of the spreadsheet I have collated the results in.
Nice to see so many helpful people and good to see a result slower
than my N270 Atom :-)

I would really like to replace the 2 V1 results with V3 results so if
you had the AMDX2 or X9650 please send me your latest results.

(The last 2 columns are the speedup of the SSE2 code over the C code
and the speedup of the SSE2 code going from 1 to 2 CPUs.)
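
For anyone wanting to reproduce those two columns from their own run,
here is a minimal sketch of the arithmetic. It assumes the figures you
have are elapsed times for the same workload (if your figures are
throughput such as frames per second, invert the ratio). The function
name speedup() is hypothetical and not part of PixelConverter.

```c
#include <stdio.h>

/* Speedup of a new implementation over a baseline, given the elapsed
 * time each one needs for the same workload.
 * Assumption: inputs are times, so a larger result means "faster".
 * Returns 0.0 for invalid (non-positive) input. */
double speedup(double baseline_time, double new_time)
{
    if (baseline_time <= 0.0 || new_time <= 0.0)
        return 0.0;
    return baseline_time / new_time;
}

/* Print both columns for one machine from three timings:
 * C code, SSE2 on 1 CPU, SSE2 on 2 CPUs. */
void print_speedups(double c_time, double sse2_1cpu, double sse2_2cpu)
{
    printf("C -> SSE2: %.2f  1 -> 2 CPUs: %.2f\n",
           speedup(c_time, sse2_1cpu),
           speedup(sse2_1cpu, sse2_2cpu));
}
```

With these definitions, a C run of 8 time units against an SSE2 run of
2 time units gives a speedup of exactly 4, the minimum expected below.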

The minimum speedup from C to SSE2 should be 4, as the code is
processing 4 times the data in parallel.
We generally just beat that on the AMD CPUs and easily beat that on
the Intel CPUs (AMD's first-generation SSE2 engine was not that good
:-( )

Any other speedup comes from more efficient unpacking of YUV data and
packing of RGB data (those are basically the changes from V2 to V3).
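
To make the unpack/convert/pack steps concrete, here is a scalar
sketch of the per-pixel work in plain C. It is not the PixelConverter
code: it assumes the common BT.601 "video range" integer coefficients
and a 0xAARRGGBB output word, and the names clamp_u8() and
yuv_to_rgb32() are hypothetical. The SSE2 path would do the same kind
of arithmetic on several pixels at once.

```c
#include <stdint.h>

/* Clamp an intermediate result to the 0..255 range of one channel. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Convert one YUV sample triple to a packed 32-bit pixel.
 * Assumptions: BT.601 video-range input (Y 16..235, U/V centred on
 * 128), fixed-point coefficients scaled by 256, output laid out as
 * 0xAARRGGBB with alpha forced to 255. */
uint32_t yuv_to_rgb32(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    /* Division instead of >> keeps negative intermediates well defined;
     * they are clamped to 0 either way. */
    uint8_t r = clamp_u8((298 * c + 409 * e + 128) / 256);
    uint8_t g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    uint8_t b = clamp_u8((298 * c + 516 * d + 128) / 256);

    /* Pack the three channels plus opaque alpha into one word. */
    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
```

Each pixel costs a handful of multiplies, adds, clamps and shifts, so
how cheaply the bytes are unpacked beforehand and packed afterwards
makes up a noticeable share of the total.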

As for scaling from 1 processor to 2 processors: although there is
some benefit, it quickly drops away as the process is limited by
memory bandwidth.

I am going to start looking at implementing the SSE2 code in libORC
(http://www.schleef.org/blog/2009/05/31/orc-040/) and see the
differences.

The eventual aim is to implement a colour conversion node in the media
kit, similar to the audio format conversion we already have built in,
so that video codecs, like audio codecs, can work in whichever format
suits them best.

-- 
Cheers
David


