Nick <tonestone57@xxxxxxxxxxx> wrote:
> I used the Chart benchmark because it was quick, simple and CPU
> intensive.. Only thing is that it supports 1 and 2 Threads meaning
> it's only good for single and dual core benches & comparisons.
> Chart is not good for testing quad core performance. Require app
> with 4+ CPU intensive threads. Video encoder?

I don't think any of the currently-available video encoders use
multiple threads yet. One application that comes to mind offhand is
XaoS -- a realtime fractal zooming app. I haven't tested it with
Haiku, but under R5 it uses all four cores to accelerate zooming.

Benchmarking is quite difficult really; there are so many variables.
The old BeOS standby is usually Teapot, and when set to
multiple-launch mode, several copies can be spawned (conducted in R5):

One teapot, four cores:
http://knothole.no-ip.org/Tea1

Two teapots, four cores:
http://knothole.no-ip.org/Tea2

Three teapots, four cores:
http://knothole.no-ip.org/Tea3
(note the imperfect balancing here -- partly due to R5's scheduler,
possibly also due to teapot positions)

Four teapots, four cores:
http://knothole.no-ip.org/Tea4

Notice how the performance drops after each teapot is added, and the
sharp drop after it goes from two to three teapots. Two reasons:
Video bandwidth is very limited (PCIe x1 card) which is a resource
shared by all four CPUs. And the memory bandwidth is also shared
between all four CPUs.

The sudden drop after two teapots is due to the cache architecture of
the Intel Core 2 Quad -- each pair of CPUs shares a common cache, but
the two pairs have independent caches. You can see the difference
cache sharing makes by comparing:

http://knothole.no-ip.org/Teacache1 (CPUs #1 and #2 enabled)
http://knothole.no-ip.org/Teacache2 (CPUs #1 and #3 enabled)

Different applications will exhibit dramatically different behaviour
depending on how they access memory.
The only algorithms which scale completely smoothly with the number
of CPUs are those which fit entirely (with data) into the L1 cache of
each core, which is really quite small on the Intel chips.

So yeah, finding a good benchmark for SMP systems is going to be very
difficult. The simplest option is to launch a bunch of CPU-intensive
apps (preferably ones with low memory requirements and minimal video
output) and measure the amount of slowdown with each extra
application. If SMP is working properly, it should be better than
half for each doubling of the number of instances (up to the number
of CPUs), but how much better depends on very many factors.
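
As a rough illustration of that "launch several instances and measure
the slowdown" approach, here is a minimal Python sketch. The workload
(burn), the helper names and the iteration count are all hypothetical,
chosen only to keep the working set small and avoid any video output;
this is not a tuned benchmark and has nothing to do with how Chart or
Teapot measure things.

```python
import multiprocessing as mp
import time


def burn(iterations):
    """Hypothetical CPU-bound workload: integer math, tiny working set, no I/O."""
    x = 0
    for i in range(iterations):
        x = (x * 31 + i) % 1000003
    return x


def run_instances(n, iterations):
    """Start n copies of burn() at once; return wall-clock seconds until all finish."""
    procs = [mp.Process(target=burn, args=(iterations,)) for _ in range(n)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


def per_instance_speed(t_single, t_n):
    """Speed of each instance relative to one instance alone (1.0 = no slowdown)."""
    return t_single / t_n


def doubling_ok(t_prev, t_doubled):
    """True if each instance kept more than half its speed after a doubling."""
    return t_prev / t_doubled > 0.5


def slowdown_table(max_instances, iterations=2_000_000):
    """Time 1, 2, 4, ... instances; return (count, seconds, relative speed) rows."""
    rows = []
    t_single = None
    n = 1
    while n <= max_instances:
        t = run_instances(n, iterations)
        if t_single is None:
            t_single = t  # baseline: one instance running alone
        rows.append((n, t, per_instance_speed(t_single, t)))
        n *= 2
    return rows
```

Calling slowdown_table(mp.cpu_count()) from under an
if __name__ == "__main__": guard gives one row per doubling. The
timings include process start-up, so the iteration count needs to be
large enough for that overhead to be negligible, and the numbers will
still vary with cache sharing and memory bandwidth as described above.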