[haiku-development] Re: On timeslices and cycles

From: Christian Packmann <Christian.Packmann@xxxxxx>
To: haiku-development@xxxxxxxxxxxxx
Date: Sat, 14 Mar 2009 12:47:18 +0100

André Braga - 2009-03-13 13:03 :

On 13/03/2009, at 08:12, Christian Packmann <Christian.Packmann@xxxxxx> wrote:

but there already exists research in this regard for desktop products,
Hm, what designs? Apart from graphics, sound, PhysX and RAID cards, I mean.
http://news.cnet.com/Secret-recipe-inside-Intels-latest-competitor/2100-1006_3-6230748.html?part=rss&tag=2547-1_3-0-20&subj=news


Bought by Sun last April. Maybe some of their ideas will show up in the
Niagara series, where this might even make sense. As Niagara is basically
designed for high-latency high-throughput webserver loads, having two
types of cores may be useful depending on load. And AFAIK Niagara is only
used with Solaris, so writing a custom scheduler for their own CPUs would
be no problem for Sun. This is not comparable to general-purpose systems
which have to deal with highly variant types of thread loads, though.

Researchers are often a bit crazy, you know. :)

Yeah. And most of their ideas amount to nothing in the end. If something like this had come from IBM or Intel, I would have been interested. I'm certain they look at these kinds of ideas, too. And for the most part they find that it isn't really useful, they forget about the $100e6 they invested for the examination, and go on with business.

I've grown pretty tired of all these "great" ideas that are often hyped in the press. Show me working silicon with clear performance and performance/watt advantages, or sod off (not you personally, the companies which produce that hype).

(please ignore the article trying to compare it to Cell; two very different beasts.)


Er, yes. CNet.

and there *are* such products for embedded markets.
See above. Resources will usually be handled by one master CPU running the OS to perform specific tasks on the slave processors. This doesn't affect the OS itself.
True; but I'm sure you meant a master process on the "regular" CPU firing specialized tasks on the special processors.


Yes.

This is a little different situation where the slave processor has the same ISA as the master, the difference being in speed, number of processing units etc.


Yeah, I'd forgotten that these same-ISA-but-asymmetrical-core ideas are
tossed around. I don't believe they're useful for general-purpose
platforms, because they try to solve a software problem (lack of proper
multi-threading in applications) with a hardware solution. This is just
a workaround for the real problem - software which has not adapted to the
new reality of heavily multi-threaded execution environments.

The software side will have to be solved, because we're not ever going back to big increases in the speed of single cores. And the only way forward on the hardware side is creating many cores with SMT to provide higher computational power. Once software has adapted to a proper multi-threaded approach, symmetrical systems with many slow cores will work just as well as any AMP system.

And the AMP systems would run into real problems running symmetrical workloads, e.g. a number of worker threads which are equal in their demands for execution resources. Depending on the nature of the different cores, it might be impossible to do proper allocation of timeslices, because the IPC would be very different for the different core types. You'd have to reschedule threads between full and crippled cores all the time. Depending on the granularity of thread data exchange, this alone might lead to significant performance drops. If you used very fine-grained rescheduling, you'd play havoc with cache efficiency. And so on. No clean solution to this one IMO.
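The trade-off described above can be put into a toy model. The following is only a rough sketch under invented assumptions: the relative core speeds, the quantum length and the per-migration cost are made-up numbers, and both function names are hypothetical. It compares leaving equal workers pinned to unequal cores against rotating them across all cores, where every rotation costs some time for refilling caches.

```python
def pinned_finish_time(work, core_speeds):
    """One worker per core, never migrated.

    The batch is only done when the worker on the slowest core finishes.
    """
    return max(work / speed for speed in core_speeds)


def rotated_finish_time(work, core_speeds, quantum, migration_cost):
    """Rotate the workers across the cores after every quantum.

    Assumption (hypothetical model): each migration wastes `migration_cost`
    time units of the next quantum, e.g. for warming up cold caches.
    The result is rounded up to whole quanta.
    """
    if migration_cost >= quantum:
        raise ValueError("quantum must be longer than the migration cost")
    n = len(core_speeds)
    remaining = [float(work)] * n
    elapsed = 0.0
    step = 0
    while any(r > 0 for r in remaining):
        # The very first quantum needs no migration, all later ones do.
        useful = quantum if step == 0 else quantum - migration_cost
        for worker in range(n):
            core = (worker + step) % n
            remaining[worker] -= useful * core_speeds[core]
        elapsed += quantum
        step += 1
    return elapsed


# Invented example: two full-speed cores and two cores at half speed.
speeds = [1.0, 1.0, 0.5, 0.5]
print("pinned:          ", pinned_finish_time(100, speeds))
print("coarse rotation: ", rotated_finish_time(100, speeds, quantum=10, migration_cost=1))
print("fine rotation:   ", rotated_finish_time(100, speeds, quantum=2, migration_cost=1))
```

In this model coarse rotation beats pinning, but once the quantum gets close to the migration cost the rotation overhead eats the gain, which is the fine-grained rescheduling problem mentioned above. Real behavior depends on cache sizes and on how much data the threads share, which this sketch does not model at all.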

Furthermore it is unclear if such systems would actually save power; this depends on the average speed of the different cores. This is not easy to judge and needs extensive power-consumption benchmarks. The German magazine c't did such benchmarks for various x86 CPUs a while ago and found that the "power-saving" VIA chips actually use more power/workload when real computations are done - because they take so much longer to perform a single task than the "high-power" CPUs. As long as normal symmetrical CPUs continue to improve their idle power consumption, I don't see any distinct advantages in creating complicated and hard-to-use CPU designs which, at the end of the day, won't perform better than traditional CPUs which are highly optimized to perform their tasks quickly and efficiently. Just KISS.
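The power/workload argument boils down to energy = power x time. A minimal worked example follows; all wattages and runtimes are invented for illustration and are not the c't measurements, and the function name is hypothetical.

```python
def energy_over_window(active_watts, idle_watts, task_seconds, window_seconds):
    """Energy in joules used within a fixed time window.

    The CPU runs the task at `active_watts` for `task_seconds` and then
    idles at `idle_watts` for the rest of the window.
    """
    if task_seconds > window_seconds:
        raise ValueError("task does not fit into the window")
    idle_seconds = window_seconds - task_seconds
    return active_watts * task_seconds + idle_watts * idle_seconds


WINDOW = 60  # seconds, hypothetical

# Hypothetical "power-saving" chip: low draw, but busy for the whole window.
slow = energy_over_window(active_watts=10, idle_watts=3, task_seconds=60, window_seconds=WINDOW)

# Hypothetical "high-power" chip: finishes in 10 s, then idles.
fast_poor_idle = energy_over_window(active_watts=40, idle_watts=5, task_seconds=10, window_seconds=WINDOW)
fast_good_idle = energy_over_window(active_watts=40, idle_watts=2, task_seconds=10, window_seconds=WINDOW)

print(slow, fast_poor_idle, fast_good_idle)
```

With these made-up numbers the fast chip needs less energy for the task itself (40 W x 10 s against 10 W x 60 s), and whether it also wins over the whole window depends on its idle draw, which is why improvements in idle power consumption matter for the comparison.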

Probably not in x86 space. The


But Haiku won't be limited to x86 only, will it? :)

Of course not. :) But I think that the general principles will apply to all CPU architectures. Of course there may be embedded designs which use AMP to great effect; but that's no problem in embedded space, as you write your software specifically for one chip, or even design the chip to fit the requirements of your software. This is in no way comparable to desktop applications with unpredictable workloads, which depend entirely on the wishes of the user.

(OF COURSE I'm not implying I'm complicating the hell out of the scheduler design in order to support technology that's unavailable for the next 5 years or something; only that this is also a possibility to take into consideration when designing a topology-aware load balancer.)

I agree that flexibility in design is useful. But making it too flexible may introduce such problems of complexity that you either never finish it, or that the scheduler will not perform well. Even good support for the "mainstream case" of SMT many-cores is hard enough to implement properly, especially if you want to take cases like NUMA, asymmetrical L2 caches on the Core2-Quads, 4-way SMT, etc. into account.

The Cool'n'Quiet drivers wouldn't ramp running cores up fast enough, leading to inconsistent runtime behavior. AFAIK AMD has removed this feature from Shanghai/Phenom II because of these problems.
Still there. And works just as badly when they're run on an AM2 socket (as opposed to AM2+).


Ah, I didn't know it was only an AM2 problem.

It would probably be better to dynamically shut off the idling cores, at least partially, like PA-Semi did with its PowerPC designs. I guess that Intel/AMD will look into this; the results PA-Semi achieved were really impressive.
Being considered as well. But this crosses the boundaries of a thread scheduler and becomes the realm of a power daemon. Of course, designing the data structures so that it's easy to migrate whole groups of threads helps a bit in this case.


Uh, I wasn't thinking about software. PA-Semi's CPUs used aggressive
power-saving features which switched off unused parts of the core, under
the CPU's control only. That's why they reached record-breaking
performance/watt levels with their full OOOE PPC cores. I'd guess that
Intel and AMD will use such techniques in the future as well. The
32nm-Atom is supposed to cut idle power by a factor of 100x-1000x compared
to the 45nm version. These things will probably make their way up the line
too.

Of course, bundling threads optimally on a few cores helps with activating
such sleep modes. But not using all cores also implies a performance
penalty in most cases, as the L1 (and possibly L2) caches on the idling
chips go unused, which is a waste of resources. There should be user
preferences to adjust the scheduling behavior: "full performance" and
"power saving". And maybe steps in between.
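As a rough illustration of such a preference, here is a sketch of a thread placement function with the two modes named above. It is not the Haiku scheduler; the function name, the `smt_slots` parameter and the tie-breaking rule are assumptions made for the example.

```python
def place_threads(n_threads, n_cores, policy, smt_slots=2):
    """Return the number of threads assigned to each core.

    "full performance": spread threads over all cores, so every thread
    gets its own L1 cache for as long as cores are available.
    "power saving": fill the SMT slots of one core before waking the
    next one, so unused cores can stay in a sleep mode.
    (Hypothetical model: `smt_slots` hardware threads per core.)
    """
    if n_cores <= 0 or smt_slots <= 0:
        raise ValueError("need at least one core and one SMT slot")
    load = [0] * n_cores

    def least_loaded():
        # Ties go to the lowest core index.
        return min(range(n_cores), key=lambda core: load[core])

    for _ in range(n_threads):
        if policy == "full performance":
            target = least_loaded()
        elif policy == "power saving":
            free = [core for core in range(n_cores) if load[core] < smt_slots]
            # Once every SMT slot is taken, fall back to spreading.
            target = free[0] if free else least_loaded()
        else:
            raise ValueError("unknown policy: " + policy)
        load[target] += 1
    return load


for policy in ("full performance", "power saving"):
    load = place_threads(4, 4, policy)
    print(policy, load, "idle cores:", load.count(0))
```

For four threads on four 2-way SMT cores, the first mode keeps all cores busy with one thread each, while the second packs the threads onto two cores and leaves two idle. Steps in between could be modeled by changing how many threads a core takes before the next core is woken.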


Christian
