SLIDE 1

Quest-V – a Virtualized Multikernel

Richard West (richwest@cs.bu.edu), Ye Li and Eric Missimer ({liye, missimer}@cs.bu.edu)

Computer Science, Boston University

SLIDE 2

Goals

  • Develop a system for high-confidence (embedded) systems
  • Predictable – real-time support
  • Resistant to component failures & malicious manipulation
  • Self-healing
    – Online recovery from software component failures

SLIDE 3

Target Applications

  • Healthcare
  • Avionics
  • Automotive
  • Factory automation
  • Robotics
  • Space exploration
  • Other safety-critical domains

SLIDE 4

Case Studies

  • $327 million Mars Climate Orbiter
    – Loss of spacecraft due to an Imperial/Metric conversion error (September 23, 1999)
  • 10 yrs & $7 billion to develop the Ariane 5 rocket
    – June 4, 1996: rocket destroyed during flight
    – Conversion error from a 64-bit double to a 16-bit value
  • 50+ million people in 8 states & Canada left without electricity in 2003, due to a software race condition

SLIDE 5

Approach

  • Quest-V for multicore processors
    – Distributed system on a chip
    – Time as a first-class resource
      • Cycle-accurate time accountability
    – Separate sandbox kernels for system sub-components
    – Isolation using h/w-assisted memory virtualization
      • Extended page tables (EPTs – Intel)
      • Nested page tables (NPTs – AMD)
    – Security enforceable using VT-d + interrupt remapping (IR)
      • Device interrupts scoped to specific sandboxes
      • DMA transfers restricted to specific host memory

SLIDE 6

Architecture Overview

[Figure: Sandboxes 1..M, each with its own kernel, apps, Main VCPU, and I/O VCPU on a dedicated CPU, running above a per-core monitor. Sandboxes communicate over shared-memory message channels, share drivers, and support migration between monitors.]

SLIDE 7

Isolation

  • Memory virtualization using EPTs isolates sandboxes and their components
  • Dedicated physical cores assigned to sandboxes
  • Temporal isolation using Virtual CPUs (VCPUs)

SLIDE 8

Extended Page Tables

SLIDE 9

Quest-V Memory Layout

[Figure: Physical memory (0x00000000–0xFFFFFFFF) holds the BIOS plus, for each sandbox 1..M, its sandbox kernel, shared driver region, EPT data structures, and monitor. Each sandbox's virtual memory layout maps its own kernel, user space, the shared driver, its EPT data structures, its monitor, and the shared memory region.]

SLIDE 10

Predictability

  • VCPUs for budgeted real-time execution of threads and system events (e.g., interrupts)
  • Threads mapped to VCPUs
  • VCPUs mapped to physical cores
  • Sandbox kernels perform local scheduling on assigned cores
  • Avoid VM-Exits to the monitor – eliminates cache/TLB flushes

SLIDE 11

VCPUs in Quest(-V)

[Figure: Threads map onto Main VCPUs and I/O VCPUs, which in turn map onto PCPUs (cores, hyperthreads).]

SLIDE 12

VCPUs in Quest(-V)

  • Two classes
    – Main → for conventional tasks
    – I/O → for I/O event threads (e.g., ISRs)
  • Scheduling policies
    – Main → sporadic server (SS)
    – I/O → priority-inheritance bandwidth-preserving server (PIBS)

SLIDE 13

SS Scheduling

  • Models periodic tasks (see the sketch below)
    – Each SS has a pair (C, T) s.t. the server is guaranteed C CPU cycles every period of T cycles when runnable
      • Guarantee applied at foreground priority
      • Background priority when the budget is depleted
    – Rate-Monotonic Scheduling theory applies
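
A minimal sketch (not Quest's actual code; all names are illustrative) of how an SS budget can gate whether a Main VCPU competes at foreground or background priority:

    #include <stdint.h>

    /* Sketch: Sporadic Server budget accounting for a Main VCPU. */
    struct ss_vcpu {
        uint64_t C;       /* guaranteed cycles per period */
        uint64_t T;       /* replenishment period (cycles) */
        uint64_t budget;  /* cycles left in the current period */
        int fg_prio;      /* foreground priority (RMS: derived from T) */
        int bg_prio;      /* background priority when budget is depleted */
    };

    /* Charge 'used' cycles of execution; return the priority at which
     * this VCPU should now compete for the PCPU. */
    static int ss_charge(struct ss_vcpu *v, uint64_t used)
    {
        v->budget = (used >= v->budget) ? 0 : v->budget - used;
        return (v->budget > 0) ? v->fg_prio : v->bg_prio;
    }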

SLIDE 14

PIBS Scheduling

  • I/O VCPUs have a utilization factor, U_{V,IO}
  • I/O VCPUs inherit the priorities of tasks (or Main VCPUs) associated with I/O events
    – Currently, priorities are f(T) for the corresponding Main VCPU
    – I/O VCPU budget is limited to: T_{V,main} * U_{V,IO} for period T_{V,main}

SLIDE 15

PIBS Scheduling

  • I/O VCPUs have eligibility times that determine when they can execute (sketch below):

    t_e = t + C_actual / U_{V,IO}

    – t = start of latest execution
    – t >= previous eligibility time
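
The eligibility computation transcribes to one line (illustrative names; u_io is the I/O VCPU's utilization factor, e.g., 0.01 for 1%). Since a VCPU cannot run before it is eligible, t_start already satisfies t_start >= previous eligibility time:

    #include <stdint.h>

    /* Sketch: next eligibility time for a PIBS I/O VCPU that started
     * running at t_start and consumed c_actual cycles. */
    static uint64_t pibs_next_eligible(uint64_t t_start, uint64_t c_actual,
                                       double u_io)
    {
        return t_start + (uint64_t)(c_actual / u_io);
    }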

SLIDE 16

Example VCPU Schedule

SLIDE 17

Sporadic Constraint

  • Worst-case preemption of all other tasks by a sporadic task is no greater than that caused by an equivalent periodic task
    (1) A replenishment, R, must be deferred until at least t + T_V
    (2) It can be deferred longer
    (3) Two overlapping replenishments can be merged (sketch below):
      • If R1.time + R1.amount >= R2.time, then MERGE
      • Allow a replenishment of R1.amount + R2.amount at R1.time
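
A sketch of merge rule (3) (illustrative types, not Quest's implementation; time and amount are both in cycles, so the comparison is well-defined):

    #include <stdint.h>

    struct replenishment {
        uint64_t time;    /* when the budget is posted back */
        uint64_t amount;  /* how much budget is posted */
    };

    /* If r1 still overlaps r2's start, fold r2 into r1 so that
     * r1.amount + r2.amount is posted at r1.time.
     * Returns 1 on merge, 0 if the replenishments are disjoint. */
    static int replenish_merge(struct replenishment *r1,
                               const struct replenishment *r2)
    {
        if (r1->time + r1->amount >= r2->time) {
            r1->amount += r2->amount;
            return 1;
        }
        return 0;
    }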

SLIDE 18

Example Replenishments

[Figure: Replenishment-queue timelines (elements are (amount, time) pairs) over t = 0..100 for VCPU 0 (C=10, T=40, start=1), VCPU 1 (C=20, T=50, start=0), and an I/O VCPU (4% utilization), contrasting premature replenishment with the corrected algorithm. Interval [t=0,100]: (A) VCPU 1 = 40%, (B) VCPU 1 = 46%.]

SLIDE 19

Utilization Bound Test

  • Sandbox with 1 PCPU, n Main VCPUs, and m I/O VCPUs
    – C_i = budget capacity of Main VCPU, V_i
    – T_i = replenishment period of V_i
    – U_j = utilization factor for I/O VCPU, V_j
  • Schedulability test (see the sketch below):

    Σ_{i=0}^{n-1} C_i/T_i + Σ_{j=0}^{m-1} (2 − U_j)·U_j ≤ n·(2^{1/n} − 1)
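
The bound transcribes directly into an admission test (a sketch under the slide's notation, not Quest's code):

    #include <math.h>

    /* Sketch: admit n Main VCPUs (budgets c[i], periods t[i]) and
     * m I/O VCPUs (utilization factors u[j]) on one PCPU iff the
     * slide's utilization bound holds. */
    static int admit(const double *c, const double *t, int n,
                     const double *u, int m)
    {
        double load = 0.0;
        for (int i = 0; i < n; i++)
            load += c[i] / t[i];              /* Main VCPU utilizations */
        for (int j = 0; j < m; j++)
            load += (2.0 - u[j]) * u[j];      /* I/O VCPU contributions */
        return load <= n * (pow(2.0, 1.0 / n) - 1.0);
    }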

SLIDE 20

Efficiency

  • Lightweight I/O virtualization & interrupt passthrough capabilities
    – e.g., VNICs provide separate interfaces to a single NIC device
  • Avoid VM-Exits into the monitor for scheduling & I/O management

SLIDE 21

I/O Passthrough

[Figure: The architecture of slide 6, with an I/O device (e.g., a NIC) assigned directly to a sandbox's I/O VCPU, so device interrupts and data bypass the monitors.]

SLIDE 22

Virtualization Costs

  • Example data TLB overheads
  • Xeon E5506, 4 cores @ 2.13 GHz, 4 GB RAM

SLIDE 23

Device (Driver) Sharing

  • Example: NIC RX ring buffer

SLIDE 24

Shared Driver Costs

[Figure: Netperf UDP throughput test. UDP throughput (Mbps, 100–1000 scale) for 1x, 2x, and 4x concurrent Netperf instances, comparing Quest-V, Quest, Linux, Xen (PVM), and Xen (HVM).]

SLIDE 25

Example Fault Recovery

[Figure: Fault recovery flow. (1) Component failure detection in the sandbox kernel (guest) causes a VM-Exit into the monitor (host); (2) the monitor performs fault identification and handling; (3) remote event notification via IPI; (4) component recovery (e.g., of the NIC driver) in either the remote or the local sandbox, after which messages again flow over the channel between sandboxes.]

SLIDE 26

Faulting Driver for Web Server

  • httperf with a web server, in the presence of a Realtek NIC driver fault
  • Requests/replies set at 120/s under normal operation
    – Single-threaded server
    – Focus on one process
    – Recovery time rather than throughput

SLIDE 27

Performance Costs

  • Core i5-2500K with 8GB RAM

  Recovery Phase             | Local Recovery (cycles) | Remote Recovery (cycles)
  ---------------------------+-------------------------+-------------------------
  VM-Exit                    | 885                     | 885
  Driver Switch              | 10,503                  | N/A
  IPI Round Trip             | N/A                     | 4,542
  VM-Enter                   | 663                     | 663
  Driver Re-initialization   | 1.45E+07                | 1.45E+07
  Network Re-initialization  | 78,351                  | 78,351

SLIDE 28

Inter-Sandbox Communication

  • Via communication VCPUs (see the channel sketch below)
    – High-rate VCPUs: 50 ms / 100 ms
    – Low-rate VCPUs: 40 ms / 100 ms
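
The slides do not show the channel implementation; below is a generic single-producer/single-consumer ring that could live in the shared memory region between two sandboxes (layout, sizes, and names are assumptions for illustration, not Quest-V's actual channel):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <string.h>

    #define SLOTS    64     /* ring capacity (power of two) */
    #define MSG_SIZE 128    /* fixed-size messages */

    struct channel {
        _Atomic uint32_t head;            /* next slot to read  */
        _Atomic uint32_t tail;            /* next slot to write */
        uint8_t msg[SLOTS][MSG_SIZE];
    };

    static int chan_send(struct channel *ch, const void *buf, size_t len)
    {
        uint32_t tail = atomic_load_explicit(&ch->tail, memory_order_relaxed);
        if (tail - atomic_load_explicit(&ch->head, memory_order_acquire) == SLOTS)
            return -1;                    /* ring full */
        memcpy(ch->msg[tail % SLOTS], buf, len < MSG_SIZE ? len : MSG_SIZE);
        atomic_store_explicit(&ch->tail, tail + 1, memory_order_release);
        return 0;
    }

    static int chan_recv(struct channel *ch, void *buf)
    {
        uint32_t head = atomic_load_explicit(&ch->head, memory_order_relaxed);
        if (head == atomic_load_explicit(&ch->tail, memory_order_acquire))
            return -1;                    /* ring empty */
        memcpy(buf, ch->msg[head % SLOTS], MSG_SIZE);
        atomic_store_explicit(&ch->head, head + 1, memory_order_release);
        return 0;
    }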

SLIDE 29

The Quest Team

  • Rich West
  • Ye Li
  • Eric Missimer
  • Matt Danish
  • Gary Wong

SLIDE 30

Further Information

  • Quest website
    – http://www.cs.bu.edu/fac/richwest/quest.html
  • GitHub public repo
    – http://questos.github.com

SLIDE 31

Quest(-V) Summary

  • About 11,000 lines of kernel code
  • 175,000+ lines including lwIP, drivers, regression tests
  • SMP, IA32, paging, VCPU scheduling, USB, PCI, networking, etc.
  • Quest-V requires the BSP to send INIT-SIPI-SIPI to the APs, as in an SMP system
    – The BSP launches the 1st (guest) sandbox
    – APs "VM fork" their sandboxes from the BSP copy

SLIDE 32

Final Remarks

  • Quest-V multikernel
    – Leverages h/w virtualization for safety/isolation
    – Avoids VM-Exits for VCPU/thread scheduling
    – Online fault recovery
    – Shared memory communication channels
    – Lightweight I/O virtualization
    – Predictable VCPU scheduling framework

SLIDE 33

Isolation

  • 4 sandboxes: SB0, ..., SB3
    – SB1 sends msgs to SB0, SB2 & SB3 at 50 ms intervals
      • SB0, SB2 & SB3 receive at 100, 800, and 1000 ms intervals, respectively
    – SB0 handles ICMP requests
      • Sent remotely at 500 ms intervals
    – Observe failure + recovery in SB0
    – Messaging threads on Main VCPUs: 20 ms / 100 ms
    – NIC driver I/O VCPU: 1 ms / 10 ms

SLIDE 34

Isolation

SLIDE 35

Next Steps

  • VCPU/thread migration
  • API extensions
  • Application development
  • Hardware performance monitoring
  • RT-USB sub-system
  • Fault detection

SLIDE 36

Real-Time Migration

  • At time t, guarantee that VCPU V_src moves from SB_src → SB_dest without violating:
    (a) Remote VCPU requirements, ∀ V_dest ∈ SB_dest
    (b) The requirements of V_src
  • Use migration VCPUs, V_migrate [C_mig, T_mig]
  • Ensure: C[memcpy of V_src + thread(s)] <= C_mig
    – while V_src is ineligible for execution
  • Admission test at the destination (sketch below):

    U_dest + C_src/T_src ≤ (n+1)·(2^{1/(n+1)} − 1),  where |V_dest| = n at t' < t
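
The admission condition transcribes directly into a feasibility check (a sketch under the slide's notation; u_dest is the combined utilization of the n VCPUs already in SB_dest at time t' < t):

    #include <math.h>

    /* Sketch: admit migrating VCPU V_src (budget c_src, period t_src)
     * into a destination sandbox already hosting n VCPUs with total
     * utilization u_dest, per the bound on this slide. */
    static int migration_ok(double u_dest, double c_src, double t_src, int n)
    {
        return u_dest + c_src / t_src
               <= (n + 1) * (pow(2.0, 1.0 / (n + 1)) - 1.0);
    }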

SLIDE 37

Real-Time Migration

[Figure: Migration flow between a sandbox kernel (guest) and its monitor (host). (1) A migration thread event is received; (2) the kernel makes the migration decision (finds a destination) and VM-Exits into the monitor; (3) quest_tss address(es) are pushed to the destination; (4) quest_tss structure(s) are copied and the address space and VCPU are moved from the source; (5) both sandboxes resume local scheduling.]

SLIDE 38

VCPU API

  • Full thread support
    – NB: limit a VCPU to one address space
    – Reduces migration costs

    int VCPU_create(struct vcpu_param *param);

    struct vcpu_param {
      int vcpuid;
      int policy;  // SCHED_SPORADIC, SCHED_PIBS
      int mask;    // affinity mask
      int C;       // budget
      int T;       // period
    };

SLIDE 39

VCPU API

  • int VCPU_destroy(int vcpuid, int force);
  • int VCPU_setparam(int vcpuid, struct vcpu_param *param);
  • int VCPU_getparam(struct vcpu_param *param);
  • int VCPU_bind_task(int vcpuid);  (usage sketch below)
  • Policy:
    – Which sandboxes are assigned which VCPUs?
      • Utilization considerations
      • Cache usage (perfmon)
      • Have SBs announce their utilization (bidding)
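
A hypothetical usage of the API above (whether VCPU_create returns the new vcpuid, and whether vcpuid = -1 requests auto-assignment, are assumptions; error handling abbreviated):

    /* Create a sporadic-server Main VCPU with a 20/100 (budget/period)
     * reservation on PCPU 0, then bind the calling task to it. */
    struct vcpu_param p = {
        .vcpuid = -1,              /* assumed: ask the kernel to assign an id */
        .policy = SCHED_SPORADIC,
        .mask   = 0x1,             /* affinity: PCPU 0 */
        .C      = 20,              /* budget */
        .T      = 100,             /* period */
    };
    int vcpu = VCPU_create(&p);
    if (vcpu < 0 || VCPU_bind_task(vcpu) < 0)
        /* handle failure */ ;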

SLIDE 40

Real-Time Fault Recovery

  • Real-time fault recovery
    – Local & remote
    – Requires an SB with a working scheduler for predictable recovery
    – Remote recovery can avoid re-initialization of the faulting service

SLIDE 41

Real-Time Fault Recovery

[Figure: A fault-recovery thread in the sandbox kernel is driven by the monitor's LAPIC timer handler: on a LAPIC timer interrupt the monitor saves or restores the machine state for the recovery code, scheduling and de-scheduling the thread, whose entry/exit code starts or continues the recovery procedure.]

SLIDE 42

Applications

  • RacerX
  • TORCS
  • Benchmarks
    – Web server
    – Netperf
    – Canny
    – Others?

SLIDE 43

Performance Monitoring

  • (LLC) cache hits, misses, instructions retired, TSC, ...
  • Can predict s/w thread LLC occupancy in real time (sketch below):

    E' = E + (1 − E/C)·M_l − (E/C)·M_o

  • See West, Zaroo, Waldspurger & Zhang, OSR, December 2010
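
The occupancy update transcribes to one line (M_o for the co-runners' misses is our reading of the garbled second term, consistent with the cited model):

    /* Sketch: update a thread's LLC occupancy estimate E (cache size C)
     * after a sampling window with m_local misses by the thread and
     * m_other misses by its co-runners. */
    static double llc_occupancy_update(double E, double C,
                                       double m_local, double m_other)
    {
        return E + (1.0 - E / C) * m_local - (E / C) * m_other;
    }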

SLIDE 44

Experiments

  • Intel Core2 Extreme QX6700 @ 2.66 GHz
  • 4 GB RAM
  • Gigabit Ethernet (Intel 8254x "e1000")
  • UHCI USB Host Controller
    – 1 GB USB memory stick
  • Parallel ATA CD-ROM in PIO mode
  • Measurements over 5 sec windows using a bandwidth-preserving logging thread

SLIDE 45

Experiments

  • CPU-bound threads: increment a counter
  • CD-ROM/USB threads: read 64 KB of data from the filesystem on the corresponding device

SLIDE 46

I/O Effects on VCPUs

  VCPU   | C_V | T_V | Threads
  -------+-----+-----+-----------------------
  VCPU0  | 2   | 5   | CPU-bound
  VCPU1  | 2   | 8   | Reading CD, CPU-bound
  VCPU2  | 1   | 4   | CPU-bound
  VCPU3  | 1   | 10  | Logging, CPU-bound
  IOVCPU | 10% utilization | ATA

SLIDE 47

I/O Effects on VCPUs

SLIDE 48

PIBS vs SS IO VCPU Scheduling

  VCPU   | C_V | T_V | Threads
  -------+-----+-----+-----------------------
  VCPU0  | 1   | 20  | CPU-bound
  VCPU1  | 1   | 30  | CPU-bound
  VCPU2  | 10  | 100 | Network, CPU-bound
  VCPU3  | 20  | 100 | Logging, CPU-bound
  IOVCPU | 1% utilization  | Network

SLIDE 49

PIBS vs SS IO VCPU Scheduling

At t=50, an ICMP ping flood starts. Here we compare the overheads of the two scheduling policies.

SLIDE 50

PIBS vs SS IO VCPU Scheduling

Network bandwidth under the two scheduling policies.

SLIDE 51

IO VCPU Sharing

  Shared I/O VCPU:
  VCPU    | C_V | T_V | Threads
  --------+-----+-----+-----------------------
  VCPU0   | 30  | 100 | USB, CPU-bound
  VCPU1   | 10  | 110 | CPU-bound
  VCPU2   | 10  | 90  | Network, CPU-bound
  VCPU3   | 100 | 200 | Logging, CPU-bound
  IOVCPU  | 1% utilization  | USB, Network

  Separate I/O VCPUs (same Main VCPUs as above):
  IOVCPU1 | 1% utilization  | USB
  IOVCPU2 | 1% utilization  | Network

SLIDE 52

IO VCPU Sharing

SLIDE 53

Conclusions

  • Temporal isolation of I/O events and tasks
  • PIBS + SS Main & I/O VCPUs can guarantee utilization bounds
  • Future investigation of higher-level policies
  • Future investigation of h/w performance counters for VCPU-to-PCPU scheduling

SLIDE 54

Architecture Overview

SLIDE 55

Example Fault Recovery

[Figure: Duplicate of the slide-25 fault-recovery flow: (1) component failure detection in the sandbox kernel (guest) causes a VM-Exit into the monitor (host); (2) fault identification and handling; (3) remote event notification via IPI; (4) component recovery (e.g., of the NIC driver) in the remote or local sandbox.]