[haiku] Re: Haiku's SMP

  • From: "Cyan" <cyanh256@xxxxxxxxxxxx>
  • To: haiku@xxxxxxxxxxxxx
  • Date: Tue, 18 Nov 2008 10:14:46 GMT

> Okay.  I appreciate your benchmarks but you haven't proven me
> right or wrong.

Glad to hear it -- because doing neither was my intent. ;)

I only brought that up because I've found benchmarking with
video-intensive applications to be quite hit-and-miss -- depending
on how much time is spent transferring data to the video card,
whether the transfer takes place in a separate thread (e.g., BBitmap
drawing vs DirectWindow), etc.

For comparison, with Chart on R5, I get 155 frames per second with
the star count at maximum, all the star colours enabled, the
requested FPS set to 600, the drawing mode set to BBitmap (no other
option on a dual-head Matrox card) with 16-bit colour,
"slow rotation" selected, and the window size set by clicking the
zoom box.

Enabling two-thread mode brings it up to 169 frames per second, a 9%
increase. Looking at ProcessController, almost all of the CPU time is
being spent inside a single thread in app_server (presumably handling
BBitmap painting), which wipes out most of the benefit of using two
threads for me. DirectWindow should fix that where available, since
it lets the application write straight to the frame buffer instead
of going through app_server.
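For reference, here is the arithmetic behind the 9% figure, plus a rough Amdahl's-law reading of it. This assumes (a simplification, not something measured) that each frame splits into a serial part, such as the single busy app_server thread, and a part that scales perfectly across N = 2 threads, with p the parallel fraction:

```latex
S = \frac{169}{155} \approx 1.090 \quad\text{(about a 9\% increase)}

S = \frac{1}{(1 - p) + p/N}, \qquad N = 2

\frac{1}{S} = 1 - \frac{p}{2}
\;\Rightarrow\;
p = 2\left(1 - \frac{155}{169}\right) = \frac{28}{169} \approx 0.17
```

On that idealized model only about 17% of the frame time benefits from the second thread, which fits a single app_server thread dominating the profile.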


All I'm really saying is that the time the CPUs waste waiting to
write to the video card can be very severe in visual demos.
The problem is a lot less noticeable on a PCIe x16 card, but on a
PCIe x1 card (or a standard PCI card, which has even less bandwidth),
it doesn't matter how many CPUs are generating data; it's all got to
be crammed down a tiny shared channel!


There's another interesting thing that can be observed in Chart --
try partially obscuring the window. I don't know what happens in
Haiku, but in R5, performance increases drastically the more of the
window is hidden by other windows. This might only apply to BBitmap
drawing mode, but it's interesting to see clipping make a measurable
performance difference.



> What you've shown is what happens when running multiple 
> applications on a multi-core system.  What I was trying to point
> out was running 1 highly multi-threaded application on a multi-core
> system. Two different things.

Fair enough, but multi-threaded applications also vary in terms of
how much data-sharing takes place. For applications that split a
large data set up into (megabyte-size) blocks, and have each thread
process its own block, wouldn't you say that's pretty comparable
to running separate apps?

I can see that there'd be a difference in performance if each thread
was working with a very small data set, but then I'd be concerned
about how much time is wasted in locking and communication...



> The perfect quad core benchmark would be running a single ( one )
> program which maxes out all 4 cores at 100%.  Then we could disable
> cores and bench how it scales.

XaoS is the only real-world program I know of, off the top of my
head, that actually does something productive with multiple heavy
number-crunching threads, but there might be others. BeRometer
includes some multithreaded tests too, but I'm not sure how well
those reflect real-world usage.

I did write a multi-threaded filter a few weeks ago as part of
another project, so if you want to do some more benchmarking I could
post the (very untidy) code somewhere?
It uses the "split dataset into large blocks" technique, with no
inter-thread communication apart from the "finished" semaphore,
and has the advantage of being quite a bad (CPU-intensive) algorithm.

I've not benchmarked it yet -- I did notice it was a bit quicker
after adding multithreading, so that was enough. =P


> When I ran 2 cores + 1 Thread I said 100% CPU load because that's 
> what Chart reported but I'm sure that was not correct.

Actually, I've noticed something funny about Chart's CPU usage
report too. If you select "Off" for animation (so the starfield is
static), the frames-per-second and CPU usage figures stay the same
as they are with animation turned on.
But ProcessController reports almost 0% CPU usage across the whole
system! (All four kernel idle threads sit near 100%, too.)

That suggests it takes a measurement of drawing time just once and
then stops updating the screen. But the strange thing is that the
display still fluctuates slightly (suggesting ongoing measurements),
and if you cover the window up, the frames-per-second count increases
just as before.
But according to ProcessController it can't be doing much, if
anything! Maybe it's updating, but at a very low frame rate?

Either way, it sounds like the measurements shown in Chart are fudged
in some way. The best overall impression of real CPU usage seems to
come from looking at the idle threads in ProcessController. Sometimes
Pulse or ProcessController catches a thread midway through switching
cores or whatever, so the report has artifacts, but the idle threads
always seem to show exactly the inverse of CPU usage.
