[haiku-development] Re: On timeslices and cycles

From: Christian Packmann <Christian.Packmann@xxxxxx>
To: haiku-development@xxxxxxxxxxxxx
Date: Fri, 13 Mar 2009 11:28:30 +0100

André Braga - 2009-03-13 01:53 :

> On Thu, Mar 12, 2009 at 20:58, Stephan Aßmus <superstippi@xxxxxx> wrote:
>
>> The human perception is a more or less fixed factor. I don't think anything
>> can be gained (i.e. be made to appear more fluid) by switching more often,
>> unless you have a lot of threads actually running in parallel (i.e. not
>> waiting on something). So switching more often sounds like it would only
>> waste CPU.

CPU overhead itself should be the lesser problem on modern designs. Cache thrashing may have a bigger impact. Each running thread fills the cache with its own data; if you switch too soon, the cache is constantly reloaded with different data instead of being used efficiently.

Modern systems have high memory bandwidth, which makes this reloading fast, but cache sizes are growing as fast as memory bandwidth, so the time needed to refill the cache after a switch is not shrinking. We're quickly heading for 10+ MiB as a standard 2nd-level cache, and that will probably increase to 32-64 MiB in the next six years or so.
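Here is a rough back-of-the-envelope sketch of the refill cost. It assumes a 12 MiB L2 cache and 10 GB/s of sustained bandwidth; both are hypothetical figures, not measurements. It also treats a full refill as a pure streaming transfer, which is optimistic. Real refills happen miss by miss and are usually latency-bound, and a thread rarely touches the whole cache.

```python
# Estimate how much of a timeslice a full L2 refill could take after a switch.
# All figures are illustrative assumptions, not measurements.

MiB = 1024 * 1024
GB = 1e9

def refill_time_us(cache_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Microseconds needed to stream cache_bytes from memory at the given bandwidth."""
    return cache_bytes / bandwidth_bytes_per_s * 1e6

def refill_fraction(cache_bytes: int, bandwidth_bytes_per_s: float, slice_us: float) -> float:
    """Fraction of a timeslice spent on one full refill, capped at 1.0."""
    return min(1.0, refill_time_us(cache_bytes, bandwidth_bytes_per_s) / slice_us)

# Hypothetical system: 12 MiB L2, 10 GB/s sustained bandwidth.
for slice_us in (1000, 3000, 10000):
    frac = refill_fraction(12 * MiB, 10 * GB, slice_us)
    print(f"{slice_us:>6} us slice: {frac:.0%} spent refilling")
```

The estimate is just size divided by bandwidth. Doubling both the cache size and the bandwidth therefore leaves the refill time unchanged, which is the point made above.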

> On the other hand, as far as cycles and IPS are concerned, a
> millisecond on a 200MHz CPU is a *lot* different than a millisecond on a
> 3GHz CPU. Not taking this into consideration if you boost thread
> priorities based on consumed quantum is a *bad* idea.

Clock frequency doesn't matter in itself; average IPC (instructions per cycle) times clock frequency does. IPC varies wildly between different microarchitectures. A dual-issue in-order design like Intel Atom should see an average IPC of maybe 1.25-1.5. When running two threads, that is obviously <=1 per thread, as only two instructions can be dispatched per cycle, one for each thread. A highly efficient out-of-order design like the Core2 should normally achieve an IPC of 2.5, often higher for integer code (the core can sustain 4 dispatches and retirements per cycle). So when comparing them at the same frequency, the microarchitecture alone gives you a performance difference of more than 2x.
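The relationship is simply instructions per second = average IPC × clock frequency. The sketch below plugs in the rough IPC figures from this thread. Two values are assumptions chosen only for illustration: the 1.0 IPC for the 200 MHz part and the 1.6 GHz common clock.

```python
MHz, GHz, ms = 1e6, 1e9, 1e-3

def instructions_retired(ipc: float, clock_hz: float, seconds: float) -> float:
    """Approximate instructions retired in `seconds` at a given average IPC."""
    return ipc * clock_hz * seconds

# One millisecond on a 200 MHz CPU vs. a 3 GHz CPU (IPC values assumed).
slow = instructions_retired(1.0, 200 * MHz, 1 * ms)  # about 200,000 instructions
fast = instructions_retired(2.5, 3 * GHz, 1 * ms)    # about 7,500,000 instructions
print(f"{fast / slow:.1f}x more work in the same millisecond")

# Same clock, different microarchitecture: Atom-like vs. Core2-like IPC.
atom = instructions_retired(1.25, 1.6 * GHz, 1 * ms)
core2 = instructions_retired(2.5, 1.6 * GHz, 1 * ms)
print(f"Core2/Atom at equal clock: {core2 / atom:.1f}x")
```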

It is also possible that Intel will push a design like Larrabee into the CPU sector at some point. Larrabee is in-order and dual-issue, with 4-way SMT; with four threads, this would give an average IPC of <=0.5 per thread. Compared to Core2, that is a difference of a factor of 5 or more per cycle.
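The SMT bound follows from dividing the issue width among the hardware threads. This sketch assumes an even split between threads, which real cores do not guarantee; the function name is illustrative.

```python
def per_thread_ipc_bound(issue_width: int, smt_threads: int) -> float:
    """Upper bound on the average IPC per hardware thread when `smt_threads`
    threads evenly share a core issuing at most `issue_width` instructions/cycle."""
    if smt_threads < 1:
        raise ValueError("need at least one hardware thread")
    return issue_width / smt_threads

CORE2_IPC = 2.5  # typical single-thread figure quoted in the text
atom_like = per_thread_ipc_bound(issue_width=2, smt_threads=2)      # 1.0
larrabee_like = per_thread_ipc_bound(issue_width=2, smt_threads=4)  # 0.5
print(f"Core2 vs. Larrabee-like thread: {CORE2_IPC / larrabee_like:.1f}x per cycle")
```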

If something like your idea is implemented, the clock frequencies need to be normalized against the CPU architecture. This applies not only on x86 but on other CPU architectures as well; ARM also has in-order and out-of-order designs, even though the IPC doesn't vary as wildly as it does on x86.
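Here is one possible shape for such a normalization, as a sketch only. Each CPU is described by its clock and an assumed average IPC. Consumed CPU time is then converted into time on a hypothetical reference CPU before it is used for priority decisions. `CpuModel`, `REFERENCE` and the IPC values are invented for illustration. In practice the per-architecture factor would have to come from a table or a calibration run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CpuModel:
    name: str
    clock_hz: float
    avg_ipc: float  # assumed per-architecture average, not a measured value

    def work_rate(self) -> float:
        """Approximate instructions per second."""
        return self.clock_hz * self.avg_ipc

# Hypothetical baseline: 1 GHz at IPC 1.0.
REFERENCE = CpuModel("reference", 1e9, 1.0)

def normalized_usage_us(used_us: float, cpu: CpuModel, ref: CpuModel = REFERENCE) -> float:
    """Convert CPU time consumed on `cpu` into the equivalent time on `ref`."""
    return used_us * cpu.work_rate() / ref.work_rate()

slow = CpuModel("200 MHz in-order", 200e6, 1.0)
fast = CpuModel("3 GHz out-of-order", 3e9, 2.5)
print(normalized_usage_us(1000, slow), normalized_usage_us(1000, fast))
```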

>> As for the CPUs which have different speeds, I think it's also a concern for
>> Hyper Threading. You wouldn't want to schedule a thread on a second logical
>> core, if another physical core is readily available at the same time. So you
>> need some kind of speed-bonus associated with each CPU anyways.
>
>
> I'm discussing this very matter in the article I'm writing. :)

First a nitpick: Hyper-Threading is Intel's trademarked name for its SMT implementation on x86. It would be better to call SMT SMT in a general discussion. :-) Eh, and SMT = Simultaneous Multi-Threading.

Efficient use of SMT CPUs is a problem in itself. I don't think this can be elegantly solved on the OS side in current architectures (except POWER). For efficient use of SMT, you'd need to know whether a thread is memory-bound or CPU-bound. Two memory-bound threads on one core will perform very badly, as they're competing for memory bandwidth. Two CPU-bound threads will compete for execution resources and also perform badly. The ideal solution is to run one memory-bound and one CPU-bound thread on one core.
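Here is a minimal sketch of the placement policy described above, combined with the earlier point about preferring an idle physical core. It assumes each thread carries a memory-bound/CPU-bound hint. The `Load` enum and the list-of-lists topology are hypothetical; no current scheduler API provides them.

```python
from enum import Enum
from typing import Optional

class Load(Enum):
    GENERAL = "general"
    MEMORY = "memory-bound"
    CPU = "cpu-bound"

# Hypothetical topology: cores[i] lists what runs on each logical CPU of
# physical core i; None marks an idle logical CPU.
Topology = list[list[Optional[Load]]]

def complementary(a: Load, b: Load) -> bool:
    """True if one thread is memory-bound and the other CPU-bound."""
    return {a, b} == {Load.MEMORY, Load.CPU}

def pick_cpu(cores: Topology, load: Load) -> Optional[tuple[int, int]]:
    """Return (core, logical_cpu) for a new thread, or None if all are busy."""
    # 1. A completely idle physical core: no sharing at all.
    for c, slots in enumerate(cores):
        if all(s is None for s in slots):
            return (c, 0)
    # 2. An idle sibling next to a thread with the opposite bottleneck.
    for c, slots in enumerate(cores):
        for t, s in enumerate(slots):
            if s is None and any(o is not None and complementary(load, o) for o in slots):
                return (c, t)
    # 3. Otherwise, any idle logical CPU, whatever its neighbor is running.
    for c, slots in enumerate(cores):
        for t, s in enumerate(slots):
            if s is None:
                return (c, t)
    return None

print(pick_cpu([[Load.MEMORY, None], [Load.CPU, None]], Load.CPU))
```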


I can think of two approaches for this:

1. the ability to set CPU affinity for a thread, so that an application developer can select the CPU/thread layout himself.
2. adding a flag to thread spawning routines which indicates whether a thread is memory-bound, CPU-bound or general (i.e., a mixture of both).
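Both approaches could be exposed through one set of spawn-time attributes. The sketch below is hypothetical: `ThreadAttrs`, `LoadHint` and `mask_of` are invented names, not an existing Haiku API. It represents affinity as a bitmask with bit i standing for CPU i, a common convention.

```python
from dataclasses import dataclass
from enum import Enum

class LoadHint(Enum):
    GENERAL = 0       # default: a mixture of memory- and CPU-bound work
    MEMORY_BOUND = 1
    CPU_BOUND = 2

def mask_of(*cpus: int) -> int:
    """Build an affinity bitmask with bit i set for each CPU index i."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

@dataclass(frozen=True)
class ThreadAttrs:
    """Hypothetical spawn-time attributes; not an existing Haiku API."""
    affinity_mask: int = 0  # 0 means "no restriction"
    load_hint: LoadHint = LoadHint.GENERAL

    def allowed_cpus(self, cpu_count: int) -> list[int]:
        """CPU indices the scheduler may use for this thread."""
        if self.affinity_mask == 0:
            return list(range(cpu_count))
        cpus = [i for i in range(cpu_count) if (self.affinity_mask >> i) & 1]
        if not cpus:
            raise ValueError("affinity mask selects no existing CPU")
        return cpus

# A transcoder worker pinned to CPUs 0 and 2 and marked CPU-bound.
attrs = ThreadAttrs(affinity_mask=mask_of(0, 2), load_hint=LoadHint.CPU_BOUND)
print(attrs.allowed_cpus(4))
```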

Actually, a mixture of both would be good. For some applications which can use all cores, setting the CPU affinity is extremely useful to prevent "core hopping". This would apply to e.g. Handbrake or other video transcoders, which can fully load most CPUs, making thread rescheduling superfluous. It would also allow an application to maximize cache usage on systems with asymmetric caches (Intel's Core2 Quads have two 2nd-level caches, one for cores 0+1 and one for cores 2+3). The threads of a specific application might benefit from a particular thread/CPU affinity because some threads share lots of data while others don't. In that case fixed CPU affinity could optimize performance in a way the OS never can, since the OS has to treat threads as "black boxes".
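For the Core2 Quad case, here is a sketch of how an application might derive affinity masks so that threads sharing data also share an L2 cache. The topology table is hard-coded as an assumption; a real tool would query it from the system. The greedy largest-first placement is a simple heuristic, not an optimal one.

```python
# Hypothetical topology modelled on the Core2 Quad example:
# one L2 cache shared by cores 0+1, another shared by cores 2+3.
L2_DOMAINS = [(0, 1), (2, 3)]

def mask_of(cpus) -> int:
    """Affinity bitmask with bit i set for every CPU index i in `cpus`."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def assign_affinity(groups: dict[str, int], domains=L2_DOMAINS) -> dict[str, int]:
    """Pin each group of data-sharing threads to a single L2 domain.

    `groups` maps a group name to its thread count. Groups are placed
    largest-first on the currently least-loaded domain, so threads that
    share data also share a cache. Returns group name -> affinity mask.
    """
    load = [0] * len(domains)
    result = {}
    for name, threads in sorted(groups.items(), key=lambda kv: -kv[1]):
        d = min(range(len(domains)), key=lambda i: load[i])
        load[d] += threads
        result[name] = mask_of(domains[d])
    return result

print({k: bin(v) for k, v in assign_affinity({"decoder": 2, "encoder": 2}).items()})
```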

Oh, and fixed CPU affinity would also allow me to write proper benchmarking tools without having to worry about core hopping, so I'm not quite neutral in this matter. ;-)


Christian

Follow-Ups:
- [haiku-development] Re: On timeslices and cycles
  - From: Axel Dörfler
- [haiku-development] Re: On timeslices and cycles
  - From: André Braga
- [haiku-development] Re: On timeslices and cycles
  - From: Danny Robson

References:
- [haiku-development] On timeslices and cycles
  - From: André Braga
- [haiku-development] Re: On timeslices and cycles
  - From: Danny Robson
- [haiku-development] Re: On timeslices and cycles
  - From: André Braga
- [haiku-development] Re: On timeslices and cycles
  - From: Stephan Aßmus
- [haiku-development] Re: On timeslices and cycles
  - From: André Braga
