> Okay. I appreciate your benchmarks but you haven't proven me
> right or wrong.

Glad to hear it -- because doing neither was my intent. ;)

I only brought that up because I've found benchmarking with video-intensive applications to be quite hit-and-miss -- depending on how much time is spent transferring data to the video card, whether the transfer takes place in a separate thread (e.g., BBitmap drawing vs DirectWindow), etc.

For comparison, with Chart on R5, I get 155 frames per second with the star count at maximum, all the star colours enabled, the requested FPS set to 600, the drawing mode set to BBitmap (no other option on a dual-head Matrox card) with 16-bit colour, "slow rotation" selected, and the window size set by clicking the zoom box. Enabling two-thread mode brings it up to 169 frames per second; a 9% increase.

Looking at ProcessController, almost all of the CPU time is being spent inside a single thread in app server (presumably handling BBitmap painting), which is wiping out the benefits of using two threads for me. DirectWindow should fix that where available.

All I'm really saying is that the time wasted by the CPUs, waiting to write to the video card, can be very severe in visual demos. The problem is a lot less noticeable on a PCIe x16 card, but on a PCIe x1 card (and to a lesser extent standard PCI), it doesn't matter how many CPUs are generating data; it's all got to be crammed down a tiny shared channel!

There's another interesting thing that can be observed in Chart -- try partially obscuring the window. I don't know what happens in Haiku, but in R5, the performance increases drastically the more the window is hidden by other windows. This might only apply to BBitmap drawing mode, but it's interesting to see clipping making a measurable performance difference.

> What you've shown is what happens when running multiple
> applications on a multi-core system. What I was trying to point
> out was running 1 highly multi-threaded application on a multi-core
> system. Two different things.

Fair enough, but multi-threaded applications also vary in terms of how much data-sharing takes place. For applications that split a large data set up into (megabyte-size) blocks, and have each thread process its own block, wouldn't you say that's pretty comparable to running separate apps?

I can see that there'd be a difference in performance if each thread was working with a very small data set, but then I'd be concerned about how much time is wasted in locking and communication...

> The perfect quad core benchmark would be running a single ( one )
> program which maxes out all 4 cores at 100%. Then we could disable
> cores and bench how it scales.

XaoS is the only real-world one I know of off the top of my head which actually does something productive with multiple heavy number-crunching threads, but there might be others. BeRometer includes some multithreaded tests too, but I'm not sure how well those reflect real-world usage.

I did write a multi-threaded filter a few weeks ago as part of another project, so if you want to do some more benchmarking I could post the (very untidy) code somewhere? It uses the "split dataset into large blocks" technique, with no inter-thread communication apart from the "finished" semaphore, and has the advantage of being quite a bad (CPU-intensive) algorithm. I've not benchmarked it yet -- I did notice it was a bit quicker after adding multithreading, so that was enough. =P

> When I ran 2 cores + 1 Thread I said 100% CPU load because that's
> what Chart reported but I'm sure that was not correct.

Actually, I've noticed something funny about Chart's CPU usage report too. If you select "Off" for animation (so the starfield is static), the frames per second and CPU usage reports stay about the same as they would be when animation is turned on. But ProcessController reports almost 0% CPU usage across the whole system! (all four kernel idle threads near 100% too)

That suggests that it's taking a measurement of drawing time just once, and then not updating the screen periodically. But the strange thing is that the display is still fluctuating slightly (suggesting ongoing measurements), and if you cover the window up, then like before, the frames per second count increases. But according to ProcessController it can't be doing much, if anything! Maybe it's updating but at a very low framerate? Either way it sounds like the measurements shown in Chart are fudged in some way.

The best overall impression of real CPU usage seems to come from looking at the idle threads in ProcessController -- sometimes Pulse or ProcessController catch a thread mid-way between flipping cores or whatever so the report has artifacts, but the idle threads always seem to show exactly the inverse of CPU usage.
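
To make the "split dataset into large blocks" technique mentioned above a bit more concrete, here is a minimal sketch in Python. It is not the actual filter code (which hasn't been posted); the names `slow_filter` and `filter_in_blocks` and the per-value "brighten" operation are made up for illustration. Each thread gets its own contiguous block, and the only synchronisation is a "finished" semaphore. Note that in CPython, threads won't give a real speed-up for pure-Python number crunching, so this only shows the structure, not the scaling.

```python
import threading


def slow_filter(block):
    # Hypothetical stand-in for a CPU-heavy filter: doubles each value,
    # clamped to 255. Works on one block without touching any other data.
    return [min(255, value * 2) for value in block]


def filter_in_blocks(data, thread_count):
    # Split the data into at most thread_count contiguous blocks.
    block_size = max(1, (len(data) + thread_count - 1) // thread_count)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    results = [None] * len(blocks)
    finished = threading.Semaphore(0)  # the only inter-thread communication

    def worker(index):
        # Each thread writes only to its own slot, so no locking is needed.
        results[index] = slow_filter(blocks[index])
        finished.release()

    for index in range(len(blocks)):
        threading.Thread(target=worker, args=(index,)).start()

    # Wait until every worker has signalled that it is done.
    for _ in blocks:
        finished.acquire()

    # Stitch the filtered blocks back together in their original order.
    output = []
    for part in results:
        output.extend(part)
    return output
```

Calling `filter_in_blocks(data, 1)` and `filter_in_blocks(data, 4)` on the same input should give identical output, which makes it easy to time the same workload at different thread counts.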