CPU Inheritance Scheduling
Bryan Ford and Sai Susarla
Computer Systems Laboratory, Department of Computer Science, University of Utah
flux@cs.utah.edu, http://www.cs.utah.edu/projects/flux/
October 30, 1996


slide-1
SLIDE 1: CPU Inheritance Scheduling

Bryan Ford and Sai Susarla
Computer Systems Laboratory
Department of Computer Science
University of Utah
flux@cs.utah.edu
http://www.cs.utah.edu/projects/flux/
October 30, 1996
slide-2
SLIDE 2: Key Concepts

• Threads schedule each other by donating the CPU using a directed yield primitive.
• One root scheduler thread per processor sources all CPU time.
• The kernel dispatcher manages threads, events, and CPU donation without making any scheduling policy decisions.
slide-3
SLIDE 3: The Dispatcher

• Implements thread sleep, wakeup, schedule, etc.
• Runs in the context of the currently running thread.
• Has no notion of thread priority, CPU usage, clocks, or timers.
• The dispatcher wakes a scheduler thread when:
  - The scheduler's client blocks.
  - An event of interest to the scheduler occurs.
slide-4
SLIDE 4: Scheduling Example

[Figure: a scheduler thread holding the CPU donates it to one of its ready client threads (App 1, App 2). Ready queues hold runnable threads; a waiting thread sends scheduling requests to the scheduler through a port.]
slide-5
SLIDE 5: The schedule() Operation

schedule(thread, port, sensitivity)

Sensitivity levels:
• ON_BLOCK: Wake the scheduler any time its client thread blocks.
• ON_SWITCH: Wake the scheduler only when a different client is requesting the CPU.
• ON_CONFLICT: Wake the scheduler only when two or more clients are runnable at the same time.
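As a rough illustration of the three sensitivity levels, the wakeup decision might be modeled as below. This is a toy sketch, not the talk's dispatcher; the names `Dispatcher`, `must_wake`, and the string-valued `event` argument are invented for the example, and the port argument of schedule() is omitted.

```python
from enum import Enum

class Sensitivity(Enum):
    ON_BLOCK = 1     # wake the scheduler whenever its client blocks
    ON_SWITCH = 2    # wake only when a different client requests the CPU
    ON_CONFLICT = 3  # wake only when two or more clients are runnable

class Dispatcher:
    """Toy model of the scheduler-wakeup decision (hypothetical API)."""

    def __init__(self):
        self.clients = {}  # thread name -> registered sensitivity

    def schedule(self, thread, sensitivity):
        # Stands in for schedule(thread, port, sensitivity);
        # the port argument is left out of this sketch.
        self.clients[thread] = sensitivity

    def must_wake(self, thread, event, runnable):
        """event is 'block' or 'request'; runnable is the set of
        clients runnable after the event."""
        s = self.clients[thread]
        if s is Sensitivity.ON_BLOCK:
            return True                      # most sensitive: always wake
        if s is Sensitivity.ON_SWITCH:
            return event == 'request'        # a different client wants the CPU
        return event == 'request' and len(runnable) >= 2  # ON_CONFLICT

d = Dispatcher()
d.schedule('A', Sensitivity.ON_CONFLICT)
print(d.must_wake('A', 'request', {'A'}))       # False: only one runnable client
print(d.must_wake('A', 'request', {'A', 'B'}))  # True: two clients contend
```

Lower sensitivity (ON_CONFLICT) means fewer scheduler wakeups, and therefore fewer of the extra context switches measured later in the talk.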
slide-6
SLIDE 6: Implicit Donation

Works like schedule(), except done implicitly; e.g.:
• A thread attempting to lock a held mutex donates to the current owner.
• A client thread donates to a server thread for the duration of an RPC.

Analogous to priority inheritance in traditional systems.

[Figure: the CPU flows to scheduler S0, which runs high-priority thread T0; T0 donates its CPU to low-priority thread T1.]
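The donation chain on mutex contention can be sketched as follows. This is a minimal model with invented names (`Thread`, `lock_mutex`, `resolve`), not the prototype's code: the dispatcher simply follows donation links to find the thread that actually runs, which is how a lock holder ends up running on its waiter's CPU time.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.donating_to = None  # set while blocked on a mutex or an RPC

def lock_mutex(thread, mutex):
    """Acquire the mutex, or implicitly donate the CPU to its owner."""
    if mutex['owner'] is not None:
        thread.donating_to = mutex['owner']  # implicit CPU donation
    else:
        mutex['owner'] = thread

def resolve(thread):
    """Follow donation links to the thread that really runs."""
    while thread.donating_to is not None:
        thread = thread.donating_to
    return thread

t0, t1 = Thread('T0'), Thread('T1')  # T0 high priority, T1 low priority
m = {'owner': None}
lock_mutex(t1, m)        # T1 acquires the mutex
lock_mutex(t0, m)        # T0 blocks and donates its CPU to T1
print(resolve(t0).name)  # prints T1: it runs on T0's donated time
```

Because the donated CPU carries whatever share T0's scheduler gave it, the priority-inheritance effect falls out of the mechanism for free rather than being a special case in each scheduler.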
slide-7
SLIDE 7: Multiprocessor Scheduling

[Figure: multiprocessor scheduling. A scheduler thread with ready queues donates CPU 0 and CPU 1 to client threads of App 1 and App 2.]
slide-8
SLIDE 8: Benefits

• Hierarchical, stackable scheduling policies
• Application-specific scheduling policies
• Modular CPU usage control
• Automatic priority inheritance
• Accurate CPU usage accounting
• Naturally extends to multiprocessors
• Supports processor affinity policies and scheduler activations
slide-9
SLIDE 9: Prototype Implementation

Implemented as a fancy threads package in a BSD process.

Schedulers implemented:
• Fixed-priority round-robin and FIFO
• Rate monotonic
• Lottery
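To give a flavor of how small such a scheduler can be, here is a minimal lottery draw in the style of Waldspurger-Weihl lottery scheduling. It is a sketch, not the prototype's scheduler: `lottery_pick` and the client list format are invented for the example, and a real scheduler would loop, donating the CPU to each winner via the dispatcher.

```python
import random

def lottery_pick(clients, rng=random):
    """Pick the next thread to run: each client holds some number of
    tickets, and a winner is drawn in proportion to its share."""
    total = sum(tickets for _, tickets in clients)
    draw = rng.randrange(total)
    for thread, tickets in clients:
        draw -= tickets
        if draw < 0:
            return thread

clients = [('A', 75), ('B', 25)]  # A should get ~75% of the CPU
wins = sum(1 for _ in range(10000) if lottery_pick(clients) == 'A')
print(wins)  # roughly 7500 of 10000 draws
```

The same few lines could sit anywhere in the hierarchy of slide 10: a lottery scheduler is just another thread that picks a winner and donates its CPU to it.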
slide-10
SLIDE 10: Scheduling Hierarchy

[Figure: example scheduling hierarchy. A fixed-priority root scheduler runs a rate-monotonic real-time scheduler (real-time periodic threads RM1, RM2), a lottery scheduler for the timesharing class (thread LS1, a web browser, and background work, with a second lottery scheduler running Java applet threads JAVA1 and JAVA2), a FIFO scheduler for cooperating non-preemptive threads (FIFO1, FIFO2), and a round-robin scheduler (RR1, RR2).]
slide-11
SLIDE 11: Results

Three measures:
• Scheduling behavior (correctness)
• Overhead
• Implementation complexity
slide-12
SLIDE 12: Multi-policy Scheduling Behavior

[Figure: accumulated CPU usage (sec, 0.5-2.5) vs. time (clock ticks, 1-100) for five threads under different policies: rate-monotonic thread RM1 (50%), rate-monotonic thread RM2 (25%), lottery thread LS1 (interactive, bursty), and insatiable round-robin threads RR1 and RR2.]
slide-13
SLIDE 13: Modular Control of CPU Usage

[Figure: relative CPU time allocation (percent, 10-100) vs. time (clock ticks, 200-9800) for round-robin threads 1 and 2, FIFO threads 1 and 2, and applet threads 1 and 2.]
slide-14
SLIDE 14: Real-time Scheduling Behavior

[Figure: histogram of mutex lock latency for a real-time thread (clock ticks, 1-15) vs. number of occurrences (10-70), comparing CPU donation on mutex contention against no CPU donation.]
slide-15
SLIDE 15: Performance

• Dispatcher overhead
  - Base cost
  - Sensitivity to hierarchy depth
• Context switching overhead
  - Number of additional context switches
  - Cost of context switches
slide-16
SLIDE 16: Dispatcher Micro-benchmarks

Scheduling Hierarchy Depth   Dispatch Time (µs)
Root scheduler only           8.0
2-level scheduling           11.2
3-level scheduling           14.0
4-level scheduling           16.2
8-level scheduling           24.4
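A quick check on the table above (my arithmetic, not a claim from the talk) shows the dispatch cost is roughly linear in hierarchy depth: a base cost of about 8 µs plus a few microseconds per additional scheduling level.

```python
# Dispatch times from the table above; depth 1 = root scheduler only.
times = {1: 8.0, 2: 11.2, 3: 14.0, 4: 16.2, 8: 24.4}  # microseconds

# Per-level increment between successive measured depths
incs = [(times[b] - times[a]) / (b - a)
        for a, b in [(1, 2), (2, 3), (3, 4), (4, 8)]]
print([round(i, 2) for i in incs])  # each extra level costs ~2-3 us
```

The per-level increment even shrinks slightly at greater depths, so deep hierarchies are not disproportionately expensive.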
slide-17
SLIDE 17: Context Switch Overhead

• In the prototype, measure what proportion of context switches are to scheduler threads (i.e., extra).
• On a real OS, measure the rate of context switches in various workloads.
• Project slowdown in two OSs, based on expected rate and speed of context switches.
slide-18
SLIDE 18: Context Switches for Simple Tests

                    Client/   Parallel   Real-   General
                    Server    Database   time
RM1                    57                 322       101
RM2                    19                  26
RM3                                        19
LS1                    25                 622        17
JAVA1                  46
FIFO1                   9
RR1                   114        238      249         7
RR2                     3        242                  14
RR3                              234
RR4                              243
User invocations      492        957     1193       165
Root scheduler        262        956     1237       142
Rate monotonic         43                   1        65
Lottery scheduler      30                  57         3
Applet scheduler        2
FIFO scheduler          1
Round-robin sched       8                   8         8
Scheduler invoc.      346        956     1303       218
Total csw             838       1913     2496       383
Scheduler %           41%        50%      52%       56%
slide-19
SLIDE 19: Statistics for Common Applications

                       gzip   gcc    tar   configure
Run time (sec)         26.4   35.3   9.6    26.0
Context switches/sec     11     32    81     202
Traps/sec                10    562    22    3470
System calls/sec         23    651   517    1807
Device interrupts/sec   427    509  3337    1055
slide-20
SLIDE 20

[Figure: overall slowdown (percent, 2-10) vs. additional overhead per context switch (microseconds, 1-1000, log scale) for six workloads: Microkernel configure (13000 csw/s), Microkernel gcc (3500 csw/s), Microkernel gzip (930 csw/s), FreeBSD configure (202 csw/s), FreeBSD gcc (32 csw/s), FreeBSD gzip (11 csw/s).]
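The projection behind this plot is simple arithmetic: the fraction of CPU time lost is the context-switch rate times the extra overhead added to each switch. The sketch below (my reconstruction; `projected_slowdown` is an invented helper) reproduces the kind of numbers the figure shows.

```python
def projected_slowdown(csw_per_sec, extra_us):
    """Fraction of CPU time lost to added context-switch overhead."""
    return csw_per_sec * extra_us * 1e-6

# Context-switch rates from the workloads above; assume, say,
# 10 microseconds of extra dispatcher work per switch.
for name, rate in [('Microkernel:configure', 13000),
                   ('FreeBSD:configure', 202),
                   ('FreeBSD:gzip', 11)]:
    print(name, round(100 * projected_slowdown(rate, 10), 2), '%')
```

On FreeBSD-style switch rates the projected slowdown is a fraction of a percent; only the microkernel's very high switch rates make per-switch dispatcher cost matter.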
slide-21
SLIDE 21: Code Complexity

• Dispatcher: 550 raw lines, 160 semicolon lines
• Example schedulers: each is 100-200 semicolon lines
slide-22
SLIDE 22: Related Work

Existing multi-policy systems:
• Multi-class systems: Mach, NT
• Aegis Exokernel
slide-23
SLIDE 23: Related Work

Existing hierarchical scheduling policies:
• KeyKOS meters
• Lottery/stride scheduling
• Start-time Fair Queuing (SFQ)

CPU inheritance scheduling is not a policy.
slide-24
SLIDE 24: Status

• Works, but needs to be tried in a real OS.
• Fluke kernel implementation in progress.
• Source for the prototype will be available from the OSDI and Flux project web pages:
  http://www.cs.utah.edu/projects/flux/
slide-25
SLIDE 25: Conclusion

CPU inheritance scheduling:
• Provides flexible CPU scheduling, and supports many existing policies and mechanisms
• Is efficient enough for common uses
• Is straightforward to implement (in user mode)
• Supports the Fluke nested process model