[haiku-development] Re: Scheduler algorithm improvements

From: Mikhail Panasyuk <otoko@xxxxxxxxx>
To: haiku-development@xxxxxxxxxxxxx
Date: Fri, 27 Nov 2009 18:18:46 +0300
Hello, All.

This time I have something to share with you.

Experimental schedulers (based on r33922):
http://otoko.narod.ru/files/haiku/haiku_kernel_schedulers.zip

Updated SOS Tools:
http://otoko.narod.ru/files/haiku/sos_tools.zip


There are 3 types of schedulers in the package: Haiku-fast, BeOS-fast and Fair.


Haiku-fast and BeOS-fast are optimised schedulers with original Haiku (skip 
threads 20% of time) and BeOS-like (skip 1/(2^n) of time, n - difference 
between priorities) CPU time-share distribution.

Most of time these schedulers are as fast as original Haiku. In very simple 
cases they are tiny bit slower and WAY faster (up to ~200x times) when there 
are lots of threads in run queue.


sos_emulator -n 4095 -p 1 3 -r "2:1:*" -q 10000 -Q 3000 -P 0 -s -c

-a "Haiku2":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler           592429     1.936522
  Others            30000000    98.063478

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                   427     4.270000    2.135000
    4093: 2 - 4094      3                  9573    95.730000    0.023389

-a "Haiku2fast":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             2565     0.008549
  Others            30000000    99.991451

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                  1989    19.890000    9.945000
    4093: 2 - 4094      3                  8011    80.110000    0.019572


sos_emulator -n 4 -p 1 3 -r "2:1:*" -q 10000 -Q 3000 -P 0 -s -c

-a "Haiku2":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             1797     0.005990
  Others            30000000    99.994010

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                   435     4.350000    2.175000
       2: 2 - 3         3                  9565    95.650000   47.825000

-a "Haiku2fast":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             2465     0.008216
  Others            30000000    99.991784

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                  2044    20.440000   10.220000
       2: 2 - 3         3                  7956    79.560000   39.780000


When making optimised version of Haiku scheduler I've accidentally fixed bug of 
original: scheduler claimed to skip threads 20% of time but did not, sometimes 
(very often) it skips much less than 20% of time.

And there was another problem I've encountered and fixed. After thread 
insertion into run queue which is based on thread->next_priority value original 
scheduler does meaningless operation.

In scheduler_simple.cpp (102) and scheduler_affine.cpp (186):

        thread->next_priority = thread->priority;
        
Should be:

        thread->priority = thread->next_priority;

Also in scheduler_simple.cpp (122) and scheduler_affine.cpp (284):

                thread->priority = priority;

Should be:

                thread->priority = thread->next_priority = priority;

In some cases when next_priority and priority are not equal run queue messes 
itself up, not sorted anymore. Original scheduler still kind of "works" in such 
circumstances but mine is more picky because it synchronises additional data 
structures with run queue and expects certain things.

Can somebody please tell me what is the whole point of next_priority field if 
it's not used anywhere for anything that priority field alone would be enough 
for? There was no such field in NewOS...


Okay, back on track. All those optimisations are good but not very important 
for Haiku/BeOS-like schedulers. But they ARE crutial for the third scheduler - 
Fair.

Fair scheduler is an experimental scheduler which distributes CPU time more 
fairly. Threads with lower priority are not allowed to get more CPU time than 
threads with higher priority. Actually they can get more but only for very 
short period of time, in long run they won't.

Algorithm is a bit more complex and additional data needed. I do not go over 
run queue extracting all data needed for calculations each time reshedule 
function is called, instead I always keep track of additional data. Very few 
additional cycles wasted when adding and removing threads but much more 
economied when there is a need to select next thread to run. Actually in 
majority of cases there is time economy even during add/remove operations.

Decision is being made very quickly and it's guaranteed to be made in very 
small fixed amount of time (exact time depends on your CPU) which not vary much 
for cases when you select from 2 threads or 4096. Fair scheduler is ~1.5-2x 
times slower than Haiku/BeOS-fast and for the most part faster or as fast as 
original Haiku. With optimised schedulers (all 3 of them) it's safe to make 
quantum even less than 3 milliseconds. My Athlon64 1.8Ghz allows me 300 
microseconds without problems. I've tried even 30 microseconds but penalties 
were too much :P

Important note: Since I only have single-processor computer to test on and QEMU 
is the only emulator I know which can emulate more than one processor on such a 
computer but does it not well enough (not fair power distribution between 
virtual processors, very slow emulation in general) I can't fully optimise 
scheduler_affine. There are certain places (not in my code) which are potential 
bottlenecks, like steal_thread_from_other_cpus() function. All numbers provided 
apply for single-processor computers only.


sos_emulator -n 1000 -p 1 20 -r "2:3:15:4:*:1:6:2:*:4:20" -q 10000 -Q 3000 -P 0 
-s -c

-a "Haiku2":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             6781     0.022598
  Others            30000000    99.977402

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                     0     0.000000    0.000000
      15: 2 - 16        5                     0     0.000000    0.000000
     479: 17 - 495      10                    1     0.010000    0.000021
       6: 496 - 501     12                   13     0.130000    0.021667
     478: 502 - 979     15                  399     3.990000    0.008347
      20: 980 - 999     20                 9587    95.870000    4.793500

-a "Haiku2fast":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             2580     0.008599
  Others            30000000    99.991401

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                     8     0.080000    0.040000
      15: 2 - 16        5                    13     0.130000    0.008667
     479: 17 - 495      10                   65     0.650000    0.001357
       6: 496 - 501     12                  358     3.580000    0.596667
     478: 502 - 979     15                 1613    16.130000    0.033745
      20: 980 - 999     20                 7943    79.430000    3.971500

-a "BeOSfast":

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             2824     0.009412
  Others            30000000    99.990588

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                     1     0.010000    0.005000
      15: 2 - 16        5                     0     0.000000    0.000000
     479: 17 - 495      10                    9     0.090000    0.000188
       6: 496 - 501     12                   26     0.260000    0.043333
     478: 502 - 979     15                  266     2.660000    0.005565
      20: 980 - 999     20                 9698    96.980000    4.849000


Note 'Each* (%)' column:

-a "Fair2" (1/2):

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             4150     0.013831
  Others            30000000    99.986169

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                     3     0.030000    0.015000
      15: 2 - 16        5                    31     0.310000    0.020667
     479: 17 - 495      10                 1775    17.750000    0.037056
       6: 496 - 501     12                   45     0.450000    0.075000
     478: 502 - 979     15                 7534    75.340000    0.157615
      20: 980 - 999     20                  612     6.120000    0.306000

N = difference between priorities.

-a "FairN" (1/(N+1)):

  Task                  Time    Share (%)
-------------------------------------------
  Scheduler             4136     0.013785
  Others            30000000    99.986215

  Amount: Numbers       Priority       Quantums    Share (%)   Each* (%)
--------------------------------------------------------------------------
       2: 0 - 1         1                     0     0.000000    0.000000
      15: 2 - 16        5                     2     0.020000    0.001333
     479: 17 - 495      10                  607     6.070000    0.012672
       6: 496 - 501     12                   24     0.240000    0.040000
     478: 502 - 979     15                 7432    74.320000    0.155481
      20: 980 - 999     20                 1935    19.350000    0.967500


So how does Haiku perform with fast/'fair' schedulers as part of it's kernel?

- Haiku-fast and BeOS-fast are close to original one, as user I can't really 
feel any difference.

- Fair schedulers are different. When time share gap between priorities is too 
small or too big it affects user experience with OS in bad way (slow redraws, 
sound glitches, ...). Too big gap means low priority threads get almost 
nothing, too small gap means high priority often skipped and may not get enough 
timeshare for time-critical operations. Something in the middle is not bad.

1/(N+1) formula works well enough for me and it's selected by default. But you 
can select another one. There is certain #define just before 
select_next_thread_to_run() function:

#define SCHED_FORMULA(n) SCHED_FORMULA_N(n)

Just pick another formula from the list or add your own.


But in general Fair scheduler isn't good enough for this OS. Yes you can select 
formula which is good enough for the most cases but there are certain cases 
when any formula will be no good.

As countermeasure from high priority threads being skipped too often I've added 
rule "all threads of highest priority together should always get no less than 
certain percent of time share", certain percent = 50%.

Well, that helped. But still user can spawn enough threads of some priority 
(ex. B_NORMAL_PRIORITY) and cut off all threads which lie below that level, 
they will not recieve anything. System can even freeze because there are some 
important lower priority threads (ex. in kernel).

I've tried some additional restrictions and always there were bad cases which 
render them useless.


My conclusion is: you can't use fair schedulers in Haiku untill all priorities 
of system threads redefined. Or you should somehow protect important system 
threads from being cut off. Or implement some intelligent mechanism which would 
change priorities of CPU-hungry threads on-the-fly. Or give up "some" fairness.

That brings us back to traditional BeOS/Haiku schedulers. Yes they are no 
absolutely fair and have their own drawbacks but still they allow low priority 
threads to be executed and high priority threads are not skipped very often, 
reasonable balance maintained.



But fair scheduler isn't completely useless even now. Fair scheduler + 
sos_tester combo can be used for OS stress/debug. You can find race conditions, 
priority / buffer size problems etc. And they must be identified and fixed for 
OS to be perfectly stable and responsive.

IMHO, new better scheduler requires too much (at least for me alone) changes 
and research / tests to be done, can wait for post-R1.


Thank you for attention.

Mikhail.
Follow-Ups:
- [haiku-development] Re: Scheduler algorithm improvements
  - From: André Braga
- [haiku-development] Re: Scheduler algorithm improvements
  - From: Ingo Weinhold
- [haiku-development] Re: Scheduler algorithm improvements
  - From: Fredrik Holmqvist
References:
- [haiku-development] Scheduler algorithm improvements
  - From: Mikhail Panasyuk
- [haiku-development] Re: Scheduler algorithm improvements
  - From: Mikhail Panasyuk
- [haiku-development] Re: Scheduler algorithm improvements
  - From: Mikhail Panasyuk
[haiku-development] Re: Scheduler algorithm improvements

Other related posts: