Lottery Scheduling
fair-share scheduling: give every process lottery tickets, more tickets for more priority. each scheduling decision, hold a lottery; whoever holds the winning ticket runs
TIP
- random algs often avoid strange edge cases
- random algs are usually lightweight, and maintain little to no state
- random algs are useful when very fast decisions are required, so long as generating a random number is fast
TIP
if you are in need of a mechanism to represent a proportion of ownership, try using tickets
Ticket Mechanisms
- ticket currency: allow a user with a set of tickets to allocate tickets among their own jobs in whatever currency they like
User A -> 500 (A’s currency) to A1 -> 50 (global currency)
-> 500 (A’s currency) to A2 -> 50 (global currency)
User B -> 10 (B’s currency) to B1 -> 100 (global currency)
- ticket transfer: a process can transfer its tickets temporarily. useful for client-server cases: the client asks the server to do some task and transfers its tickets to the server (the client can't make progress during that time anyway)
- ticket inflation: if a process needs more cpu time it temporarily boosts its own ticket value (only sensible among mutually trusting processes)
Implementation
- good rng → difficult
- good process list data structure
- total number of tickets
simple ^_^
TIP
knowing which data structure to choose is a hallmark of good engineering. carefully consider its access patterns and frequency of usage
WARNING
we have not discussed how to assign tickets i.e. how to assign priority to a process
Stride Scheduling
a non-random approach to a fair-share scheduler
basically run the process with the least progress so far
stride: a large constant divided by the process's ticket count (so more tickets ⇒ smaller stride). pass: progress tracker, incremented by the stride each time the process runs; always schedule the process with the lowest pass
curr = remove_min(queue);    // pick client with minimum pass
schedule(curr);              // run it for its quantum
curr->pass += curr->stride;  // advance its pass by its stride
insert(queue, curr);         // return curr to the queue
NOTE
lottery scheduling has a nice property in that there is no global state: no extra memory, and it makes adding new processes much easier. if a new process were added in a stride scheduler and had its pass value = 0, it would monopolize the cpu
The Linux Completely Fair Scheduler (CFS)
Basic Operation
fairly divide the cpu evenly among all competing processes. uses a counting-based technique: virtual runtime (vruntime). as each process runs it accumulates vruntime. each process's vruntime increases at the same rate, in proportion with real time. when a scheduling decision occurs, CFS picks the process with the lowest vruntime to run next
QUESTION
how does the scheduler know when to context switch? what exactly is a time slice?
maintains various control parameters
- sched_latency: used to determine how long a process should run before considering a switch. basically creates a dynamic time slice: time_slice = sched_latency / n, where n = number of runnable processes (e.g. sched_latency = 48ms, n = 4 ⇒ 12ms each). now what if n is huge? that's too small a time slice
- min_granularity: the minimum time slice that can be assigned, typically min_granularity = 6ms
NOTE
time slices only govern when context switches happen; the scheduler still receives periodic timer interrupts from the hardware, which it uses to precisely update vruntime
Weighting (Niceness)
CFS enables control over process priority via the nice level, ranging from -20 to +19, with a default of 0. each niceness is mapped to a weight, which allows us to compute the effective time slice of a process:
time_slice_k = (weight_k / Σ weight_i) · sched_latency
weight of nice -20 = 88761, weight of nice +19 = 15 ⇒ ratio of ~5900
thus vruntime is also updated in proportion to weight:
vruntime_i += (weight_0 / weight_i) · runtime_i, where weight_0 is the weight of nice 0
Using Red-Black Trees
only running or runnable processes are kept in the red-black tree; a process that blocks on i/o etc. is moved out of the tree
[1] https://www.cs.usfca.edu/~galles/visualization/RedBlack.html → red-black tree visualiser
[2] https://stackoverflow.com/questions/11224830/inside-the-linux-2-6-completely-fair-scheduler → “why not avl???”
Dealing with I/O and Sleeping Processes
if a process sleeps for a long time, its vruntime falls far behind everyone else's, so when it wakes up it would monopolise the CPU :((
instead of letting it keep its same vruntime, CFS assigns it the vruntime of the minimum process in the RB tree (which only contains runnable processes), preventing it from starving everyone else
however, frequently sleeping processes often do not get their fair share of the CPU
[3] https://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF → why numbers should start from 0
[4] https://www.usenix.org/system/files/conference/atc18/atc18-bouron.pdf → battle of the schedulers (bsd vc linux)
[5] https://www.waldspurger.org/carl/papers/esx-mem-osdi02.pdf → memory resource management in VMware ESX server
[6] https://www.cs.unm.edu/~eschulte/data/bfs-v-cfs_groves-knockel-schulte.pdf → CFS vs BFS