[haiku-development] Re: On timeslices and cycles

André Braga - 2009-03-13 13:03 :
On 13/03/2009, at 08:12, Christian Packmann <Christian.Packmann@xxxxxx> wrote:

but there already exists research in this regard for desktop products,
Hm, what designs? Apart from graphics, sound, PhysX and RAID cards, I mean.

http://news.cnet.com/Secret-recipe-inside-Intels-latest-competitor/2100-1006_3-6230748.html?part=rss&tag=2547-1_3-0-20&subj=news

Bought by Sun last April. Maybe some of their ideas will show up in the
Niagara series, where this might even make sense. As Niagara is basically designed for high-latency high-throughput webserver loads, having two types of cores may be useful depending on load. And AFAIK Niagara is only
used with Solaris, so writing a custom scheduler for their own CPUs would
be no problem for Sun. This is not comparable to general-purpose systems which have to deal with highly variant types of thread loads, though.

Researchers are often a bit crazy, you know. :)

Yeah. And most of their ideas amount to nothing in the end. If something like this had come from IBM or Intel, I would have been interested. I'm certain they look at these kinds of ideas, too. And for the most part they find that it isn't really useful, they forget about the $100e6 they invested in the investigation, and go on with business.

I've grown pretty tired of all these "great" ideas that are often hyped in the press. Show me working silicon with clear performance and performance/watt advantages, or sod off (not you personally, the companies which produce that hype).

(please ignore the article trying to compare it to Cell; two very different beasts.)

Er, yes. CNet.

and there *are* such products for embedded markets.
See above. Resources will usually be handled by one master CPU running the OS to perform specific tasks on the slave processors. This doesn't affect the OS itself.

True; but I'm sure you meant a master process on the "regular" CPU firing specialized tasks on the special processors.

Yes.

This is a slightly different situation, where the slave processor has the same ISA as the master, the difference being in speed, number of processing units etc.

Yeah, I'd forgotten that these same-ISA-but-asymmetrical-core ideas are
tossed around. I don't believe they're useful for general-purpose
platforms, because they try to solve a software problem (lack of proper
multi-threading in applications) with a hardware solution. This is just a workaround for the real problem - software which has not adapted to the new reality of heavily multi-threaded execution environments.

The software side will have to be solved, because we're not ever going back to big increases in the speed of single cores. And the only way forward on the hardware side is creating many cores with SMT to provide higher computational power. Once software has adapted to a proper multi-threaded approach, symmetrical systems with many slow cores will work just as well as any AMP system.

And the AMP systems would run into real problems running symmetrical workloads, e.g. a number of worker threads which are equal in their demands for execution resources. Depending on the nature of the different cores, it might be impossible to do proper allocation of timeslices, because the IPC would be very different for the different core types. You'd have to reschedule threads between full and crippled cores all the time. Depending on the granularity of thread data exchange, this alone might lead to significant performance drops. If you used very fine-grained rescheduling, you'd play havoc with cache efficiency. And so on. No clean solution to this one IMO.
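To make the timeslice problem concrete, here is a minimal sketch of one naive workaround: stretching the timeslice on a slower core so that equal worker threads get roughly equal work done. The function names, the IPC figures and the clock speed are all assumptions made up for illustration; in practice IPC varies per workload, which is exactly why this does not amount to a clean solution.

```python
def scaled_timeslice(base_slice_us, core_ipc, reference_ipc):
    """Timeslice (in microseconds) on a core with `core_ipc` that gives a thread
    about as much work as `base_slice_us` would on a core with `reference_ipc`.

    Both IPC values are assumed to be known constants, which is a big
    simplification: real IPC depends on the code being executed.
    """
    if core_ipc <= 0 or reference_ipc <= 0:
        raise ValueError("IPC values must be positive")
    return base_slice_us * reference_ipc / core_ipc


def work_done(slice_us, core_ipc, clock_mhz):
    """Instructions retired in one slice: us * cycles/us * instructions/cycle."""
    return slice_us * clock_mhz * core_ipc


# Hypothetical AMP system: a "full" core with IPC 2.0 and a "crippled" core
# with IPC 1.0, both at 2000 MHz, with a 3000 us base timeslice.
full_slice = 3000
slow_slice = scaled_timeslice(full_slice, core_ipc=1.0, reference_ipc=2.0)
```

With these made-up numbers the slow core needs a slice twice as long to match the full core, so threads on it respond later, and any thread that moves between core types has to have its slice recomputed.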

Furthermore it is unclear if such systems would actually save power; this depends on the average speed of the different cores. This is not easy to judge and needs extensive power-consumption benchmarks. The German magazine c't did such benchmarks for various x86 CPUs a while ago and found that the "power-saving" VIA chips actually use more power/workload when real computations are done - because they take so much longer to perform a single task than the "high-power" CPUs. As long as normal symmetrical CPUs continue to improve their idle power consumption, I don't see any distinct advantages in creating complicated and hard-to-use CPU designs which, at the end of the day, won't perform better than traditional CPUs which are highly optimized to perform their tasks quickly and efficiently. Just KISS.
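The power/workload argument is simple arithmetic: energy is power multiplied by time, so a chip that draws less but runs longer can still lose. The sketch below works through that with invented figures; they are not the c't measurements, and the function name and all wattages are assumptions chosen only to show the shape of the calculation.

```python
def energy_over_window(active_watts, idle_watts, task_seconds, window_seconds):
    """Energy in joules used over a fixed time window in which one task is run.

    The CPU draws `active_watts` while computing and `idle_watts` for the
    rest of the window.
    """
    if task_seconds > window_seconds:
        raise ValueError("task does not finish within the window")
    idle_seconds = window_seconds - task_seconds
    return active_watts * task_seconds + idle_watts * idle_seconds


# Hypothetical "high-power" CPU: 40 W active, 2 W idle, finishes in 5 s.
fast_cpu = energy_over_window(40, 2, task_seconds=5, window_seconds=30)

# Hypothetical "power-saving" CPU: 10 W active, 1 W idle, needs the full 30 s.
slow_cpu = energy_over_window(10, 1, task_seconds=30, window_seconds=30)
```

With these numbers the fast CPU uses 250 J and the slow one 300 J for the same task. The result flips if the fast CPU idles badly (at 5 W idle it would use 325 J), which is why idle power consumption matters so much here.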

Probably not in x86 space. The

But Haiku won't be limited to x86 only, will it? :)

Of course not. :) But I think that the general principles will apply to all CPU architectures. Of course there may be embedded designs which use AMP to great effect; but that's no problem in embedded space, as you write your software specifically for one chip, or even design the chip to fit the requirements of your software. This is in no way comparable to desktop applications with unpredictable workloads, which depend entirely on the wishes of the user.

(OF COURSE I'm not implying I'm complicating the hell out of the scheduler design in order to support technology that's unavailable for the next 5 years or something; only that this is also a possibility to take into consideration when designing a topology-aware load balancer.)

I agree that flexibility in design is useful. But making it too flexible may introduce such problems of complexity that you either never finish it, or that the scheduler will not perform well. Even good support for the "mainstream case" of SMT many-cores is hard enough to implement properly, especially if you want to take cases like NUMA, asymmetrical L2 caches on the Core2-Quads, 4-way SMT, etc. into account.

The Cool'n'Quiet drivers wouldn't ramp running cores up fast enough, leading to inconsistent runtime behavior. AFAIK AMD has removed this feature from Shanghai/Phenom II because of these problems.

Still there. And works just as badly when they're run on an AM2 socket (as opposed to AM2+).

Ah, I didn't know it was only an AM2 problem.

It would probably be better to dynamically shut off the idling cores, at least partially, like PA-Semi did with its PowerPC designs. I guess that Intel/AMD will look into this; the results PA-Semi achieved were really impressive.

Being considered as well. But this crosses the boundaries of a thread scheduler and becomes the realm of a power daemon. Of course, designing the data structures so that it's easy to migrate whole groups of threads helps a bit in this case.
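As a rough illustration of what "easy to migrate whole groups of threads" could look like, here is a minimal sketch in which each core owns a table of thread groups, so emptying a core before it is powered down is one move per group rather than one per thread. The ThreadGroup and Core types and both functions are hypothetical; nothing here reflects the actual Haiku scheduler structures.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadGroup:
    """A set of threads that should stay together (e.g. they share data)."""
    name: str
    thread_ids: list


@dataclass
class Core:
    core_id: int
    groups: dict = field(default_factory=dict)  # group name -> ThreadGroup


def migrate_group(group_name, source, target):
    """Move one whole group from `source` to `target` in a single step."""
    group = source.groups.pop(group_name)
    target.groups[group_name] = group
    return group


def evacuate(core, target):
    """Move every group off `core`, e.g. before a power daemon shuts it down."""
    for name in list(core.groups):
        migrate_group(name, core, target)


core0, core1 = Core(0), Core(1)
core1.groups["media"] = ThreadGroup("media", [11, 12, 13])
core1.groups["net"] = ThreadGroup("net", [21])
evacuate(core1, core0)
```

After the call, core1 holds no groups and could be handed to whatever decides about sleep states; the scheduler itself only had to provide the grouping.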

Uh, I wasn't thinking about software. PA-Semi's CPUs used aggressive
power-saving features which switched off unused parts of the core, under
the CPU's control only. That's why they reached record-breaking
performance/watt levels with their full OOOE PPC cores. I'd guess that
Intel and AMD will use such techniques in the future as well. The
32nm-Atom is supposed to cut idle power by a factor of 100x-1000x compared
to the 45nm version. These things will probably make their way up the line
too.

Of course, bundling threads optimally on a few cores helps with activating
such sleep modes. But not using all cores also implies a
performance-penalty in most cases, as the L1 (and possibly L2) caches on the idling chips go unused, which is a waste of resources. There should be user preferences to adjust the scheduling behavior: "full performance" and "power saving". And maybe steps in between.
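The two user preferences could map onto thread placement roughly as follows. This is only a sketch under assumed names: place_threads, the policy strings and the threads_per_core limit are all made up here, and a real scheduler would also have to weigh cache sharing and topology.

```python
def place_threads(num_threads, num_cores, policy, threads_per_core=2):
    """Return a list with the number of threads assigned to each core.

    "full_performance" spreads threads over all cores so every cache is used.
    "power_saving" packs up to `threads_per_core` threads onto as few cores as
    possible, so that the remaining cores can go to sleep.
    """
    counts = [0] * num_cores
    if policy == "full_performance":
        for i in range(num_threads):
            counts[i % num_cores] += 1
    elif policy == "power_saving":
        remaining = num_threads
        for core in range(num_cores):
            take = min(threads_per_core, remaining)
            counts[core] = take
            remaining -= take
        # If every core is already full, spread the overflow round-robin.
        for i in range(remaining):
            counts[i % num_cores] += 1
    else:
        raise ValueError("unknown policy: " + policy)
    return counts


spread = place_threads(4, 4, "full_performance")
packed = place_threads(4, 4, "power_saving")
```

With four threads on four cores, the first policy yields one thread per core, while the second leaves two cores completely idle. Steps in between could be had by varying threads_per_core.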

Christian
