SLIDE 1 CPU Inheritance Scheduling
Bryan Ford and Sai Susarla
Computer Systems Laboratory
Department of Computer Science
University of Utah
flux@cs.utah.edu
http://www.cs.utah.edu/projects/flux/
October 30, 1996
SLIDE 2 Key Concepts
- Threads schedule each other by donating the CPU using a directed yield primitive.
- One root scheduler thread per processor sources all CPU time.
- The kernel dispatcher manages threads, events, and CPU donation without making any scheduling policy decisions.
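To make the donation idea concrete, here is a minimal Python model. It assumes that each thread records at most one donation target and that the dispatcher finds the thread that actually runs by following these links from a CPU's root scheduler thread. The names `Thread`, `donating_to`, and `resolve_running_thread` are hypothetical and illustrative only. They are not the prototype's interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: compare and hash threads by identity
class Thread:
    name: str
    # Hypothetical field: the thread this one currently donates its CPU time to.
    donating_to: Optional[Thread] = None
    blocked: bool = False


def resolve_running_thread(root: Thread) -> Optional[Thread]:
    """Follow donation links from a CPU's root scheduler thread.

    Returns the thread at the end of the chain if it can run, or None if it is
    blocked (in which case a real dispatcher would wake some scheduler instead).
    """
    seen: set[Thread] = set()
    current = root
    while current.donating_to is not None:
        if current in seen:
            raise RuntimeError(f"donation cycle at {current.name}")
        seen.add(current)
        current = current.donating_to
    return None if current.blocked else current


# Usage: root scheduler -> application scheduler -> worker thread.
root, app_sched, worker = Thread("root"), Thread("app-sched"), Thread("worker")
root.donating_to = app_sched
app_sched.donating_to = worker
print(resolve_running_thread(root).name)  # worker
```

In this model the dispatcher never decides *who* runs. It only follows the choices that the scheduler threads have already made by donating.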
SLIDE 3 The Dispatcher
- Implements thread sleep, wakeup, schedule, etc.
- Runs in the context of the currently running thread.
- Has no notion of thread priority, CPU usage, clocks, or timers.
- Wakes a scheduler thread when:
  - its client blocks, or
  - an event of interest to the scheduler occurs.
SLIDE 4 Scheduling Example
[Figure: a scheduler thread keeps ready queues for App 1 and App 2, receives scheduling requests on its port, and donates the CPU to the running thread; other threads are shown as ready or waiting.]
SLIDE 5 The schedule() Operation
schedule(thread, port, sensitivity)
Sensitivity levels:
- BLOCK: Wake the scheduler any time its client thread blocks.
- SWITCH: Wake the scheduler when a different client is requesting the CPU.
- CONFLICT: Wake the scheduler when two or more clients are runnable at the same time.
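The sensitivity levels above can be read as a single decision that the dispatcher makes. The following is a minimal sketch. It assumes the dispatcher knows the scheduler's current client, whether that client just blocked, and which of the scheduler's clients are runnable. The `Sensitivity` enum and `should_wake_scheduler` function are hypothetical. Scheduling requests that arrive on the scheduler's port are not modeled.

```python
from __future__ import annotations

from enum import Enum


class Sensitivity(Enum):
    BLOCK = "block"        # wake whenever the client thread blocks
    SWITCH = "switch"      # wake when a different client is requesting the CPU
    CONFLICT = "conflict"  # wake when two or more clients are runnable at once


def should_wake_scheduler(
    sensitivity: Sensitivity,
    current_client: str,
    client_blocked: bool,
    runnable_clients: set[str],
) -> bool:
    """Hypothetical dispatcher check, run when a client blocks or becomes runnable.

    `runnable_clients` holds the scheduler's clients that currently want the CPU.
    Requests arriving on the scheduler's port are not modeled here.
    """
    if sensitivity is Sensitivity.BLOCK:
        return client_blocked
    if sensitivity is Sensitivity.SWITCH:
        return any(c != current_client for c in runnable_clients)
    # CONFLICT: only a genuine choice between clients needs the scheduler.
    return len(runnable_clients) >= 2
```

Under this reading, a scheduler that uses CONFLICT and has a single runnable client is never woken. The dispatcher can therefore keep running that client without an extra switch to the scheduler thread.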
SLIDE 6 Implicit Donation
Works like schedule(), except done implicitly; e.g.:
- attempting to lock a held mutex donates the CPU to the mutex's current holder;
- a client thread donates to the server thread for the duration of an RPC.
Analogous to priority inheritance in traditional systems.
[Figure: scheduler S0 with a high-priority thread T0 and a low-priority thread T1, illustrating implicit donation.]
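The mutex case can be sketched as follows. This is a hypothetical `Mutex` whose contended `lock()` makes the caller donate to the current holder instead of simply sleeping. The class, its hand-off-to-first-waiter rule, and `effective_thread` are assumptions for illustration. If a scheduler picks a high-priority T0 while a low-priority T1 holds the mutex, the CPU time actually goes to T1. T1 can then finish its critical section on T0's behalf, which is the priority-inheritance effect.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Thread:
    name: str
    donating_to: Optional[Thread] = None  # implicit or explicit donation target


class Mutex:
    """Hypothetical mutex whose contended lock() donates the caller's CPU to the owner."""

    def __init__(self) -> None:
        self.owner: Optional[Thread] = None
        self.waiters: list[Thread] = []

    def lock(self, thread: Thread) -> bool:
        if self.owner is None:
            self.owner = thread
            return True
        # Contention: instead of just sleeping, the caller donates to the holder,
        # so the holder runs with whatever CPU time the caller was given.
        thread.donating_to = self.owner
        self.waiters.append(thread)
        return False

    def unlock(self, thread: Thread) -> None:
        if self.owner is not thread:
            raise RuntimeError("unlock by a thread that does not hold the mutex")
        # Hand the mutex to the first waiter; it stops donating and runs itself.
        self.owner = self.waiters.pop(0) if self.waiters else None
        if self.owner is not None:
            self.owner.donating_to = None
        # Remaining waiters now donate to the new owner.
        for waiter in self.waiters:
            waiter.donating_to = self.owner


def effective_thread(thread: Thread) -> Thread:
    """The thread that actually consumes CPU time given to `thread`."""
    while thread.donating_to is not None:
        thread = thread.donating_to
    return thread
```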
SLIDE 7 Multiprocessor Scheduling
[Figure: scheduler threads on CPU 0 and CPU 1, with ready queues for App 1 and App 2.]
SLIDE 8 Benefits
- Hierarchical, stackable scheduling policies
- Application-specific scheduling policies
- Modular CPU usage control
- Priority inheritance
- CPU usage accounting
- Extends to multiprocessors
- Processor affinity policies and scheduler activations
SLIDE 9 Prototype Implementation
Implemented as a fancy threads package in a BSD process. Schedulers implemented:
- Fixed-priority round-robin and FIFO
- Rate-monotonic
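As an illustration of the first scheduler type listed above, here is a minimal sketch of the decision logic that a fixed-priority scheduler thread could run before donating the CPU. It is assumed to support both round-robin and FIFO behavior within a priority level. The class name, its methods, and the "level 0 is highest" convention are hypothetical and are not the prototype's code.

```python
from collections import deque
from typing import Deque, List, Optional


class FixedPriorityScheduler:
    """Hypothetical fixed-priority policy; level 0 is the highest priority.

    With round_robin=True, threads at the same level take turns on each pick
    (e.g., at every quantum). With round_robin=False it behaves as FIFO: the
    thread at the head keeps the CPU until it is removed (blocks or exits).
    """

    def __init__(self, levels: int, round_robin: bool = True) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(levels)]
        self.round_robin = round_robin

    def make_ready(self, thread: str, priority: int) -> None:
        self.queues[priority].append(thread)

    def remove(self, thread: str, priority: int) -> None:
        self.queues[priority].remove(thread)

    def pick(self) -> Optional[str]:
        """Choose the client this scheduler thread should donate the CPU to next."""
        for queue in self.queues:
            if queue:
                if self.round_robin:
                    queue.rotate(-1)  # move the head to the tail...
                    return queue[-1]  # ...and run it this time
                return queue[0]
        return None
```

The only difference between the two variants is whether the chosen thread moves to the tail of its level. This is why one scheduler class can plausibly cover both policies.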
SLIDE 10 Scheduling Hierarchy
[Figure: example scheduling hierarchy under a fixed-priority root scheduler, with real-time (rate-monotonic), timesharing (lottery scheduling), and background classes, plus a web browser (lottery scheduling) running Java applet threads. Threads shown: RM1, RM2 (real-time periodic threads), LS1, JAVA1, JAVA2 (Java applet threads), FIFO1, FIFO2 (cooperating non-preemptive threads under a FIFO scheduler), RR1, RR2 (round-robin).]
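The hierarchy above can be approximated by a toy model in which each scheduler picks one client and a client may itself be a scheduler. The CPU therefore passes down one level at a time until it reaches an ordinary thread. In this sketch each scheduler's decision is modeled as a plain function call rather than a separate scheduler thread that donates. All names are hypothetical. `fixed_priority` stands in for the rate-monotonic policy, which assigns fixed priorities by period, so a priority-ordered client list is a reasonable simplification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Leaf:
    """An ordinary (non-scheduler) thread."""
    name: str
    runnable: bool = True


@dataclass
class Scheduler:
    """A scheduler thread that donates the CPU to one of its clients."""
    name: str
    clients: List[Node]
    policy: Callable[[List[Node]], Node]


Node = Union[Leaf, Scheduler]


def has_work(node: Node) -> bool:
    if isinstance(node, Leaf):
        return node.runnable
    return any(has_work(c) for c in node.clients)


def dispatch(node: Node) -> Optional[Leaf]:
    """Follow one donation per level until an ordinary thread receives the CPU."""
    if isinstance(node, Leaf):
        return node if node.runnable else None
    candidates = [c for c in node.clients if has_work(c)]
    if not candidates:
        return None
    return dispatch(node.policy(candidates))


def fixed_priority(candidates: List[Node]) -> Node:
    return candidates[0]  # clients are listed in priority order


class RoundRobin:
    def __init__(self) -> None:
        self.turn = 0

    def __call__(self, candidates: List[Node]) -> Node:
        choice = candidates[self.turn % len(candidates)]
        self.turn += 1
        return choice
```

For example, a fixed-priority root whose clients are a real-time scheduler followed by a round-robin scheduler gives the CPU to the round-robin threads in turn. As soon as a real-time thread becomes runnable, it takes precedence.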
SLIDE 11 Results
Three measures:
- Scheduling behavior (correctness)
- Performance
- Code complexity
SLIDE 12 Multi-policy Scheduling Behavior
[Figure: accumulated CPU usage (sec) vs. time (clock ticks, 1 to 100) for RM1 (rate-monotonic, 50%), RM2 (rate-monotonic, 25%), LS1 (lottery, interactive and bursty), and RR1 and RR2 (round-robin, compute-bound).]
SLIDE 13 Modular Control of CPU Usage
[Figure: relative CPU time allocation (percent) vs. time (clock ticks, up to about 9800) for round-robin threads 1 and 2, FIFO threads 1 and 2, and applet threads 1 and 2.]
SLIDE 14 Real-time Scheduling Behavior
[Figure: histogram of mutex lock latency for a real-time thread (clock ticks, 1 to 15) vs. number of occurrences, comparing CPU donation on mutex contention with no CPU donation.]
SLIDE 15 Performance
- Base cost
- Sensitivity to hierarchy depth
Context switching:
- Number of additional context switches
- Cost of context switches
SLIDE 16 Dispatcher Micro-benchmarks
Scheduling hierarchy depth    Dispatch time (µs)
Root scheduler                       8.0
2-level scheduling                  11.2
3-level scheduling                  14.0
4-level scheduling                  16.2
8-level scheduling                  24.4
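The table above can be read as dispatch time as a function of depth. If the root-scheduler-only row is treated as a one-level hierarchy (an assumption), each extra level adds about (24.4 − 8.0) / 7 ≈ 2.3 µs on average. The short helper below computes that slope and the per-step costs. `DISPATCH_US`, `marginal_cost_per_level`, and `step_costs` are illustrative names.

```python
from __future__ import annotations

# Dispatch times (µs) from the micro-benchmark table, keyed by hierarchy depth.
# Assumption: the "Root scheduler" row is a one-level hierarchy (depth 1).
DISPATCH_US = {1: 8.0, 2: 11.2, 3: 14.0, 4: 16.2, 8: 24.4}


def marginal_cost_per_level(times: dict[int, float]) -> float:
    """Average extra dispatch time per additional scheduling level."""
    shallowest, deepest = min(times), max(times)
    return (times[deepest] - times[shallowest]) / (deepest - shallowest)


def step_costs(times: dict[int, float]) -> list[tuple[int, int, float]]:
    """Per-level cost between consecutive measured depths."""
    depths = sorted(times)
    return [
        (a, b, (times[b] - times[a]) / (b - a))
        for a, b in zip(depths, depths[1:])
    ]


print(f"{marginal_cost_per_level(DISPATCH_US):.2f} us per level")  # 2.34 us per level
```

The per-step costs shrink from 3.2 µs to about 2.05 µs per level (between depths 4 and 8). In these measurements, dispatch time therefore grows roughly linearly, and slightly sublinearly, with hierarchy depth.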
SLIDE 17 Context Switches
- In the prototype, measure what proportion of context switches are to scheduler threads (i.e., extra).
- In a real OS, measure the rate of context switches in various workloads.
- Estimate the slowdown in two systems (monolithic and microkernel-based), based on the expected rate and speed of context switches.
SLIDE 18 Context Switches for Simple Tests
                     Client/   Parallel   Real-   General
                     Server    Database   time
RM1                     57        -        322      101
RM2                     19        -          -       26
RM3                     19        -          -        -
LS1                     25        -        622       17
JAVA1                   46        -          -        -
FIFO1                    9        -          -        -
RR1                    114      238        249        7
RR2                      3      242          -       14
RR3                      -      234          -        -
RR4                      -      243          -        -
User invocations       492      957       1193      165
Root scheduler         262      956       1237      142
Rate monotonic          43        -          1       65
Lottery scheduler       30        -         57        3
Applet scheduler         2        -          -        -
FIFO scheduler           1        -          -        -
Round-robin sched        8        -          8        8
Scheduler invoc.       346      956       1303      218
Total csw              838     1913       2496      383
Scheduler %            41%      50%        52%      56%
SLIDE 19 Statistics for Common Applications
                          gzip    gcc    tar   configure
Run time (sec)            26.4   35.3    9.6     26.0
Context switches/sec        11     32     81      202
Traps/sec                   10    562     22     3470
System calls/sec            23    651    517     1807
Device interrupts/sec      427    509   3337     1055
SLIDE 20
[Figure: overall slowdown (percent) vs. additional overhead per context switch (µs, log scale, 1 to 1000) for configure, gcc, and gzip on a microkernel (13000, 3500, and 930 csw/s) and on FreeBSD (202, 32, and 11 csw/s).]
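One simple way to read this kind of estimate is a linear model. It assumes that every context switch pays the same additional overhead, so the fraction of time lost equals the switch rate times the per-switch overhead. For example, 13000 csw/s × 1 µs is 13 ms of overhead per second, a 1.3% slowdown. The model and the helper names below are assumptions for illustration. They are not necessarily the exact model used to produce the figure.

```python
# Context-switch rates (switches/sec) for the workloads in the figure.
RATES = {
    "Microkernel:configure": 13000,
    "Microkernel:gcc": 3500,
    "Microkernel:gzip": 930,
    "FreeBSD:configure": 202,
    "FreeBSD:gcc": 32,
    "FreeBSD:gzip": 11,
}


def slowdown_percent(csw_per_sec: float, extra_us_per_csw: float) -> float:
    """Assumed linear model: time lost per second = rate x added cost per switch."""
    return csw_per_sec * extra_us_per_csw / 1e6 * 100


def max_overhead_us(csw_per_sec: float, budget_percent: float) -> float:
    """Largest added cost per switch (us) that keeps the slowdown within budget."""
    return budget_percent / 100 / csw_per_sec * 1e6


for name, rate in RATES.items():
    print(f"{name:22s} 10 us/switch -> {slowdown_percent(rate, 10):6.2f}%  "
          f"5% budget -> {max_overhead_us(rate, 5):8.1f} us/switch")
```

Under this model, the tolerable extra cost per switch is inversely proportional to the switch rate. This is why the high-rate microkernel workloads are far more sensitive than the same workloads on FreeBSD.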
SLIDE 21 Code Complexity
- 550 raw lines, 160 lines with semicolons
- Schedulers: each is 100–200 lines with semicolons
SLIDE 22 Related Work
Existing multi-policy systems:
- Scheduling-class systems: Mach, NT
- Exokernel
SLIDE 23 Related Work
Existing hierarchical scheduling policies:
- OS meters
- … scheduling
- Start-time Fair Queuing (SFQ)
CPU inheritance scheduling is not a policy.
SLIDE 24 Status
- Works, but needs to be tried in a real OS.
- Fluke kernel implementation in progress.
- Source for the prototype will be available from the OSDI and Flux project web pages: http://www.cs.utah.edu/projects/flux/
SLIDE 25 Conclusion
CPU inheritance scheduling:
- Provides flexible CPU scheduling, and supports many existing policies and mechanisms
- Is efficient enough for common uses
- Is straightforward to implement (in user mode)
- Fits the Fluke nested process model