CPU Inheritance Scheduling: Bryan Ford, Sai Susarla

SLIDE 1 CPU Inheritance Scheduling
Bryan Ford, Sai Susarla
Computer Systems Laboratory, Department of Computer Science, University of Utah
flux@cs.utah.edu
http://www.cs.utah.edu/projects/flux/
October 30, 1996

SLIDE 2 Key Concepts
- Threads schedule each other by donating the CPU using a directed yield primitive.
- One root scheduler thread per processor sources all CPU time.
- Kernel dispatcher manages threads, events, and CPU donation without making any scheduling policy decisions.

SLIDE 3 The Dispatcher
- Implements thread sleep, wakeup, schedule, etc.
- Runs in the context of the currently running thread.
- Has no notion of thread priority, CPU usage, clocks, or timers.
- Dispatcher wakes a scheduler thread when:
  - Scheduler's client blocks.
  - Event of interest to the scheduler occurs.

SLIDE 4 Scheduling Example
[Figure: scheduling example. Labels: CPU, scheduler thread, port, scheduling requests, ready queues, CPU donation, App 1, App 2. Legend: running thread, ready threads, waiting thread.]

SLIDE 5 The schedule() Operation
schedule(thread, port, sensitivity)
Sensitivity levels:
- ON_BLOCK: Wake the scheduler any time its client thread blocks.
- ON_SWITCH: Wake the scheduler only when a different client is requesting the CPU.
- ON_CONFLICT: Wake the scheduler only when two or more clients are runnable at the same time.
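The three sensitivity levels can be read as a wake-up predicate that the dispatcher evaluates on behalf of a scheduler. The following is a minimal Python sketch of that reading, not the prototype's code: the names `Sensitivity` and `should_wake_scheduler` and the three event parameters are hypothetical, and each branch encodes only the condition stated on the slide.

```python
from enum import Enum


class Sensitivity(Enum):
    """Sensitivity levels named on the slide (the enum itself is illustrative)."""
    ON_BLOCK = 1
    ON_SWITCH = 2
    ON_CONFLICT = 3


def should_wake_scheduler(sensitivity, client_blocked,
                          other_client_requesting, runnable_clients):
    """Return True if the scheduler should be woken.

    client_blocked: the thread the scheduler donated to has blocked.
    other_client_requesting: a different client is asking for the CPU.
    runnable_clients: number of the scheduler's clients currently runnable.
    These parameters are assumptions made for this sketch.
    """
    if sensitivity is Sensitivity.ON_BLOCK:
        return client_blocked
    if sensitivity is Sensitivity.ON_SWITCH:
        return other_client_requesting
    if sensitivity is Sensitivity.ON_CONFLICT:
        return runnable_clients >= 2
    raise ValueError(f"unknown sensitivity: {sensitivity!r}")


# A client blocks while no other client wants the CPU:
print(should_wake_scheduler(Sensitivity.ON_BLOCK, True, False, 0))   # True
print(should_wake_scheduler(Sensitivity.ON_SWITCH, True, False, 0))  # False
```

Under this reading, the less sensitive levels let the dispatcher handle more events on its own, so the scheduler thread is woken less often.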

SLIDE 6 Implicit Donation
Works like schedule(), except done implicitly; e.g.:
- Thread attempting to lock a held mutex donates to current owner
- Client thread donates to server thread for the duration of an RPC
Analogous to priority inheritance in traditional systems.
[Figure: scheduler S0 and the CPU, with threads T0 (high-priority) and T1 (low-priority).]
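The mutex case can be illustrated with a small Python model of donation chains. This is a sketch under stated assumptions rather than the prototype's implementation: `Thread`, `Mutex`, `donating_to`, and `effective_runner` are hypothetical names, and the thread names S0, T0, and T1 are borrowed from the figure.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.donating_to = None  # thread currently receiving our CPU time


def effective_runner(thread):
    """Follow the donation chain to the thread that actually runs."""
    visited = set()
    while thread.donating_to is not None:
        if id(thread) in visited:
            raise RuntimeError("donation cycle")
        visited.add(id(thread))
        thread = thread.donating_to
    return thread


class Mutex:
    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, thread):
        """Acquire if free; otherwise donate the caller's CPU to the owner."""
        if self.owner is None:
            self.owner = thread
            return True
        thread.donating_to = self.owner
        self.waiters.append(thread)
        return False

    def unlock(self):
        """Hand the mutex to the first waiter and redirect remaining donations."""
        for waiter in self.waiters:
            waiter.donating_to = None
        self.owner = self.waiters.pop(0) if self.waiters else None
        for waiter in self.waiters:
            waiter.donating_to = self.owner


s0, t0, t1 = Thread("S0"), Thread("T0"), Thread("T1")
mutex = Mutex()
mutex.lock(t1)            # low-priority T1 takes the mutex
s0.donating_to = t0       # scheduler S0 picks high-priority T0
mutex.lock(t0)            # T0 finds the mutex held and donates to T1
print(effective_runner(s0).name)  # T1
mutex.unlock()            # T1 releases; T0 becomes the owner
print(effective_runner(s0).name)  # T0
```

In this model T1 runs on the CPU time that S0 gave to T0, which is the effect the slide compares to priority inheritance.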

SLIDE 7 Multiprocessor Scheduling
[Figure: multiprocessor scheduling. Labels: CPU 0, CPU 1, scheduler threads, scheduler, ready queues, App 1, App 2.]

SLIDE 8 Benefits
- Hierarchical, stackable scheduling policies
- Application-specific scheduling policies
- Modular CPU usage control
- Automatic priority inheritance
- Accurate CPU usage accounting
- Naturally extends to multiprocessors
- Supports processor affinity policies and scheduler activations

SLIDE 9 Prototype Implementation
Implemented as a fancy threads package in a BSD process.
Schedulers implemented:
- Fixed priority round-robin and FIFO
- Rate monotonic
- Lottery
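As an illustration of how small such a scheduler's policy decision can be, here is a Python sketch of the selection step of a lottery scheduler. It shows the general lottery-scheduling idea (the chance of winning is proportional to tickets held), not the prototype's code; `pick_winner`, `hold_lottery`, and the ticket table are hypothetical names. The winner would then receive the CPU through a donation such as schedule(), which this sketch does not model.

```python
import random


def pick_winner(tickets, draw):
    """Map a ticket number to its holder.

    tickets: dict of client name -> number of tickets held.
    draw: integer in the range [0, total tickets).
    """
    total = sum(tickets.values())
    if not 0 <= draw < total:
        raise ValueError("draw out of range")
    for name, count in tickets.items():
        if draw < count:
            return name
        draw -= count


def hold_lottery(tickets, rng):
    """Draw a random ticket and return the client holding it."""
    return pick_winner(tickets, rng.randrange(sum(tickets.values())))


tickets = {"A": 3, "B": 1}   # A should win about three draws in four
print(pick_winner(tickets, 2))  # A
print(pick_winner(tickets, 3))  # B
print(hold_lottery(tickets, random.Random()))
```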

SLIDE 10 Scheduling Hierarchy
[Figure: scheduling hierarchy. Scheduler labels: root scheduler (fixed-priority), real-time scheduler (rate-monotonic), FIFO scheduler, round-robin, lottery scheduling (shown twice). Other labels: timesharing class, background, web browser, real-time periodic threads, cooperating non-preemptive threads, Java applet threads. Threads: RM1, RM2, LS1, JAVA1, JAVA2, FIFO1, FIFO2, RR1, RR2.]

SLIDE 11 Results
Three measures:
- Scheduling behavior (correctness)
- Overhead
- Implementation complexity

SLIDE 12 Multi-policy Scheduling Behavior
[Figure: accumulated CPU usage (sec) versus time (clock ticks) for rate-monotonic thread 1 (RM1, 50%), rate-monotonic thread 2 (RM2, 25%), lottery thread (LS1, interactive, bursty), round-robin thread 1 (RR1, insatiable, compute), and round-robin thread 2 (RR2, insatiable, compute).]

SLIDE 13 Modular Control of CPU Usage
[Figure: relative CPU time allocation (percent) versus time (clock ticks) for applet threads 1 and 2, FIFO threads 1 and 2, and round-robin threads 1 and 2.]

SLIDE 14 Real-time Scheduling Behavior
[Figure: number of occurrences versus mutex lock latency for real-time thread (clock ticks), comparing CPU donation on mutex contention with no CPU donation.]

SLIDE 15 Performance
- Dispatcher overhead
  - Base cost
  - Sensitivity to hierarchy depth
- Context switching overhead
  - Number of additional context switches
  - Cost of context switches

SLIDE 16 Dispatcher Micro-benchmarks
Scheduling Hierarchy Depth    Dispatch Time (µs)
Root scheduler only                  8.0
2-level scheduling                  11.2
3-level scheduling                  14.0
4-level scheduling                  16.2
8-level scheduling                  24.4
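The sensitivity to hierarchy depth can be read off the table by dividing each increase in dispatch time by the number of levels added. The Python sketch below does that arithmetic; it assumes the times are in microseconds and treats "root scheduler only" as depth 1. The function name `per_level_costs` is illustrative.

```python
# Dispatch times from the table, keyed by hierarchy depth
# (depth 1 = root scheduler only; microseconds are assumed).
dispatch_us = {1: 8.0, 2: 11.2, 3: 14.0, 4: 16.2, 8: 24.4}


def per_level_costs(table):
    """Average added dispatch time per extra level between adjacent rows."""
    depths = sorted(table)
    return {
        (lo, hi): (table[hi] - table[lo]) / (hi - lo)
        for lo, hi in zip(depths, depths[1:])
    }


for (lo, hi), cost in per_level_costs(dispatch_us).items():
    print(f"depth {lo} -> {hi}: {cost:.2f} per added level")
```

The increments come out at roughly 3.2, 2.8, 2.2, and 2.05 per added level, so each extra level costs a few microseconds on top of the 8.0 base cost.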

SLIDE 17 Context Switch Overhead
- In prototype, measure what proportion of context switches are to scheduler threads (i.e., extra)
- On a real OS, measure rate of context switches in various workloads
- Project slowdown in two OSs, based on expected rate and speed of context switches
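The projection step can be sketched as a first-order estimate: the slowdown is the fraction of each second spent on additional context-switch overhead. This formula is an assumption made for illustration, not necessarily the exact model behind the slides, and `projected_slowdown_percent` is a hypothetical name. The example rates (202 and 13000 context switches per second for configure) are the ones quoted elsewhere in the deck.

```python
def projected_slowdown_percent(csw_per_sec, extra_us_per_csw):
    """Percent of run time lost to extra per-context-switch overhead.

    csw_per_sec: measured context switches per second for the workload.
    extra_us_per_csw: additional overhead per context switch, in microseconds.
    """
    extra_seconds_per_second = csw_per_sec * extra_us_per_csw * 1e-6
    return extra_seconds_per_second * 100.0


# configure at 202 csw/s with 10 microseconds of extra overhead per switch
print(f"{projected_slowdown_percent(202, 10):.3f}%")
# configure at 13000 csw/s with the same extra overhead
print(f"{projected_slowdown_percent(13000, 10):.1f}%")
```

With 10 microseconds of extra overhead per switch, the estimate is about 0.2% at 202 switches per second and about 13% at 13000 switches per second.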

SLIDE 18 Context Switches for Simple Tests
                    Client/Server  Parallel Database  Real-time  General
RM1                            57                  -        322      101
RM2                            19                  -          -       26
RM3                            19                  -          -        -
LS1                            25                  -        622       17
JAVA1                          46                  -          -        -
FIFO1                           9                  -          -        -
RR1                           114                238        249        7
RR2                             3                242          -       14
RR3                             -                234          -        -
RR4                             -                243          -        -
User invocations              492                957       1193      165
Root scheduler                262                956       1237      142
Rate monotonic                 43                  -          1       65
Lottery scheduler              30                  -         57        3
Applet scheduler                2                  -          -        -
FIFO scheduler                  1                  -          -        -
Round-robin sched               8                  -          8        8
Scheduler invoc.              346                956       1303      218
Total csw                     838               1913       2496      383
Scheduler %                   41%                50%        52%      56%
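The summary rows of the table are related by simple arithmetic: total context switches are user invocations plus scheduler invocations, and the scheduler percentage is the scheduler share of that total. The Python sketch below recomputes both from the four-column rows; the variable names are illustrative.

```python
workloads = ("Client/Server", "Parallel Database", "Real-time", "General")
user_invocations = (492, 957, 1193, 165)
scheduler_invocations = (346, 956, 1303, 218)

# Total context switches = user invocations + scheduler invocations.
totals = [u + s for u, s in zip(user_invocations, scheduler_invocations)]

# Scheduler share of all context switches, in percent.
scheduler_share = [100.0 * s / t for s, t in zip(scheduler_invocations, totals)]

for name, total, share in zip(workloads, totals, scheduler_share):
    print(f"{name}: {total} context switches, {share:.1f}% to schedulers")
```

The recomputed totals match the table's "Total csw" row, and the shares agree with its "Scheduler %" row to within one percentage point.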

SLIDE 19 Statistics for Common Applications
                         gzip    gcc    tar  configure
Run time (sec)           26.4   35.3    9.6       26.0
Context switches/sec       11     32     81        202
Traps/sec                  10    562     22       3470
System calls/sec           23    651    517       1807
Device interrupts/sec     427    509   3337       1055

SLIDE 20
[Figure: overall slowdown (percent) versus additional overhead per context switch (microsec, axis from 1 to 1000) for Microkernel:configure (13000 csw/s), Microkernel:gcc (3500 csw/s), Microkernel:gzip (930 csw/s), FreeBSD:configure (202 csw/s), FreeBSD:gcc (32 csw/s), and FreeBSD:gzip (11 csw/s).]

SLIDE 21 Code Complexity
- Dispatcher: 550 raw lines, 160 semicolons
- Example schedulers: each is 100-200 semicolons

SLIDE 22 Related Work
Existing multi-policy systems:
- Multi-class systems: Mach, NT
- Aegis Exokernel

SLIDE 23 Related Work
Existing hierarchical scheduling policies:
- KeyKOS meters
- Lottery/stride scheduling
- Start-time Fair Queuing (SFQ)
CPU inheritance scheduling is not a policy.

SLIDE 24 Status
- Works, but needs to be tried in a real OS
- Fluke kernel implementation in progress
- Source for prototype will be available from the OSDI and Flux project web pages: http://www.cs.utah.edu/projects/flux/

SLIDE 25 Conclusion
CPU inheritance scheduling:
- Provides flexible CPU scheduling, and supports many existing policies and mechanisms
- Is efficient enough for common uses
- Is straightforward to implement (in user mode)
- Supports the Fluke nested process model