Nested Virtualization on ARM - NEVE: Nested Virtualization Extensions




slide-1
SLIDE 1

connect.linaro.org

Leading Collaboration in the ARM Ecosystem

Nested Virtualization on ARM

NEVE: Nested Virtualization Extensions

Jin Tack Lim (jitack@cs.columbia.edu)
Christoffer Dall (christoffer.dall@linaro.org)
Shih-Wei Li (shih-wei@cs.columbia.edu)
Jason Nieh (nieh@cs.columbia.edu)
Marc Zyngier (marc.zyngier@arm.com)

slide-2
SLIDE 2

Nested Virtualization

[Diagram: the hardware runs a hypervisor that hosts VMs, each with a kernel and apps. One VM runs its own hypervisor, which in turn hosts VMs with their own kernels and apps.]

slide-3
SLIDE 3

Terminology

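[Diagram: the hypervisor running on the hardware is the host hypervisor. It hosts VMs, each with a kernel and apps. A hypervisor running inside a VM is a guest hypervisor, and the VMs it hosts are nested VMs, each with its own kernel and apps.]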

slide-4
SLIDE 4

Use Cases

  • Run guest operating systems with built-in virtualization
  • IaaS hosting private clouds
  • Test your hypervisor in a VM
  • Debug your hypervisor in a VM
  • Develop hypervisors using a cloud
slide-5
SLIDE 5

ARM Virtualization Extensions

[Diagram: two VMs, each with user space at EL0 and a kernel at EL1, run above the hypervisor at EL2.]

slide-6
SLIDE 6

ARM Nested Virtualization

[Diagram: the host hypervisor runs at EL2. Each VM has user space at EL0 and a kernel at EL1, and also contains a guest hypervisor, which expects to run at a virtual EL2.]

slide-7
SLIDE 7

ARM Nested Virtualization

[Diagram: same layout, with the guest hypervisor's exception level marked "EL ??": which real exception level should it run at?]

slide-8
SLIDE 8

ARMv8.0 Nested Virtualization

[Diagram: same layout, with the guest hypervisor placed at EL0. Label: trap-and-emulate.]

slide-9
SLIDE 9

ARMv8.0 Nested Virtualization

[Diagram: same layout, with the guest hypervisor placed at EL1. Label: ??-and-emulate.]

slide-10
SLIDE 10

ARMv8.3 Nested Virtualization

[Diagram: the guest hypervisor runs at EL1, with user space at EL0 and a kernel at EL1, above the host hypervisor at EL2. Label: trap-and-emulate.]

  • Gives you software emulation of vEL2 in EL1
  • HCR_EL2.NV:
      • Traps EL2 operations executed in EL1 to EL2
      • Traps eret to EL2
      • CurrentEL reports EL2 even in EL1
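
The HCR_EL2.NV behaviour listed above can be modelled in a few lines of C. This is a minimal sketch, not code from the patches: the op_class enum and both function names are invented for illustration. Only the NV bit position (bit 42 of HCR_EL2) and the CurrentEL encoding (exception level in bits [3:2]) are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.NV is bit 42 of HCR_EL2 (ARMv8.3 nested virtualization). */
#define HCR_EL2_NV (UINT64_C(1) << 42)

/* Hypothetical operation classes, for illustration only. */
enum op_class { OP_EL2_SYSREG_ACCESS, OP_ERET, OP_OTHER };

/* Value a CurrentEL read returns in this model: the EL sits in bits [3:2]. */
static uint64_t model_read_currentel(unsigned hw_el, uint64_t hcr_el2)
{
    unsigned el = hw_el;

    if (hw_el == 1 && (hcr_el2 & HCR_EL2_NV))
        el = 2; /* code at EL1 is told it runs at EL2 */
    return (uint64_t)el << 2;
}

/* Does an operation executed at hw_el trap to EL2 because NV is set? */
static bool model_nv_traps(unsigned hw_el, uint64_t hcr_el2, enum op_class op)
{
    if (hw_el != 1 || !(hcr_el2 & HCR_EL2_NV))
        return false;
    return op == OP_EL2_SYSREG_ACCESS || op == OP_ERET;
}
```

With NV set, a guest hypervisor at EL1 reads CurrentEL as EL2 and its eret traps; with NV clear, neither happens in this model.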
slide-11
SLIDE 11

KVM/ARM Nested Virtualization Implementation

  • EL2 Emulation
  • Stage 2 MMU Virtualization
  • Hyp Timer Virtualization
  • Nested Virtual Interrupts
slide-12
SLIDE 12

Nested CPU Virtualization

struct kvm_cpu_context {
        u64 sys_regs[NR_SYS_REGS];
+       u64 el2_regs[NR_EL2_REGS];
};

struct kvm_vcpu_arch {
        …
        struct kvm_cpu_context ctxt;
};
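
One plausible use of an el2_regs array is to back trapped EL2 system register accesses: when the guest hypervisor's access traps, the host reads or writes the in-memory copy. The sketch below assumes that design. The register indices, the placeholder sizes, and emulate_el2_access() are hypothetical and are not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical register indices; the real patches define their own. */
enum { VBAR_EL2_IDX, ELR_EL2_IDX, SPSR_EL2_IDX, NR_EL2_REGS };
enum { NR_SYS_REGS = 1 }; /* placeholder size */

struct kvm_cpu_context {
    u64 sys_regs[NR_SYS_REGS];
    u64 el2_regs[NR_EL2_REGS]; /* virtual EL2 state, kept in memory */
};

/*
 * Emulate one trapped EL2 register access.
 * gpr points at the guest's general-purpose register operand.
 * Returns false for an index this model does not know.
 */
static bool emulate_el2_access(struct kvm_cpu_context *ctxt, unsigned idx,
                               bool is_write, u64 *gpr)
{
    if (idx >= NR_EL2_REGS)
        return false;
    if (is_write)
        ctxt->el2_regs[idx] = *gpr;
    else
        *gpr = ctxt->el2_regs[idx];
    return true;
}
```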

slide-13
SLIDE 13

Hypervisor-VM Switch

[Diagram: the host (Linux and apps) and a VM (kernel and apps) run at EL1/EL0 above KVM at EL2. Switching between them saves one side's EL1 sys_regs and restores the other's.]

slide-14
SLIDE 14

Hypervisor-Hypervisor Switch

[Diagram: the host (Linux and apps) and a VM running a guest hypervisor, above KVM at EL2. Switching between them saves/restores the EL1 sys_regs and, for the guest hypervisor, el2_regs.]

slide-15
SLIDE 15

Emulating EL2 in EL1

  • Define mapping of EL2 registers to EL1 registers
  • Example: TTBR0_EL2 to TTBR0_EL1
  • Example: SCTLR_EL2 adapted to SCTLR_EL1
  • Shadow EL1 registers
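
The mapping idea above can be sketched as a small table that says, for each EL2 register, which EL1 register shadows it and how the value is translated. Everything here is illustrative: the indices, the table, and build_shadow_regs() are hypothetical, and masking SCTLR to the bits that sit at the same positions in SCTLR_EL1 and SCTLR_EL2 (M, A, C, SA in bits 0-3 and I in bit 12) is a stand-in policy, not the adaptation the patches perform.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical indices into the two register arrays. */
enum { TTBR0_EL2, SCTLR_EL2, NR_EL2_REGS };
enum { TTBR0_EL1, SCTLR_EL1, NR_SYS_REGS };

/* Bits found at the same positions in SCTLR_EL1 and SCTLR_EL2. */
#define SCTLR_COMMON_BITS (UINT64_C(0xf) | (UINT64_C(1) << 12))

static u64 copy_as_is(u64 v)  { return v; }
static u64 adapt_sctlr(u64 v) { return v & SCTLR_COMMON_BITS; } /* illustrative */

struct el2_to_el1 {
    unsigned el2_idx;       /* source: virtual EL2 register */
    unsigned el1_idx;       /* destination: shadow EL1 register */
    u64 (*translate)(u64);  /* how the value is adapted */
};

static const struct el2_to_el1 reg_map[] = {
    { TTBR0_EL2, TTBR0_EL1, copy_as_is },  /* direct mapping */
    { SCTLR_EL2, SCTLR_EL1, adapt_sctlr }, /* adapted mapping */
};

/* Fill the shadow EL1 registers from the virtual EL2 registers. */
static void build_shadow_regs(const u64 *el2_regs, u64 *shadow_sys_regs)
{
    for (size_t i = 0; i < sizeof(reg_map) / sizeof(reg_map[0]); i++) {
        const struct el2_to_el1 *m = &reg_map[i];

        shadow_sys_regs[m->el1_idx] = m->translate(el2_regs[m->el2_idx]);
    }
}
```

A table keeps the direct and adapted cases in one place, so adding a register is a one-line change in this sketch.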
slide-16
SLIDE 16

Nested CPU Virtualization

struct kvm_cpu_context {
        u64 sys_regs[NR_SYS_REGS];
+       u64 el2_regs[NR_EL2_REGS];
+       u64 shadow_sys_regs[NR_SYS_REGS];
};

struct kvm_vcpu_arch {
        …
        struct kvm_cpu_context ctxt;
};

slide-17
SLIDE 17

Shadow Registers

[Diagram: the pointer u64 *vcpu->ctxt.hw_regs is set to either &sys_regs or &shadow_sys_regs, depending on whether PSTATE.mode is EL2 or EL0/1.]
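
The pointer switch on this slide can be sketched as follows. The pairing is an assumption, inferred from the "Emulating EL2 in EL1" slide rather than stated on this one: the shadow registers are loaded while the vCPU is in virtual EL2, and the ordinary sys_regs otherwise. The vcpu_mode enum, the array size, and select_hw_regs() are hypothetical names.

```c
#include <stdint.h>

typedef uint64_t u64;

#define NR_SYS_REGS 4 /* placeholder size */

/* Hypothetical encoding of the vCPU's virtual exception level. */
enum vcpu_mode { VCPU_MODE_EL0, VCPU_MODE_EL1, VCPU_MODE_EL2 };

struct kvm_cpu_context {
    u64 sys_regs[NR_SYS_REGS];        /* the VM's own EL1 state */
    u64 shadow_sys_regs[NR_SYS_REGS]; /* EL1 values derived from virtual EL2 state */
    u64 *hw_regs;                     /* what gets loaded into hardware */
};

/* Assumption: shadow registers back virtual EL2, sys_regs back EL0/1. */
static void select_hw_regs(struct kvm_cpu_context *ctxt, enum vcpu_mode mode)
{
    ctxt->hw_regs = (mode == VCPU_MODE_EL2) ? ctxt->shadow_sys_regs
                                            : ctxt->sys_regs;
}
```

The save/restore code can then always go through hw_regs and does not need to know which mode the vCPU is in.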

slide-18
SLIDE 18

Virtual Exceptions

  • Trap to virtual EL2
  • “Forward” exceptions
  • Emulate virtual exceptions

[Diagram: host KVM runs at EL2. The VM contains user space at EL0, a kernel at EL1, and guest KVM at vEL2.]
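
Emulating a virtual exception means doing in software what the hardware does on an exception to EL2, but against the virtual EL2 registers. The sketch below assumes a synchronous exception taken from a lower exception level running AArch64, for which the architectural vector offset is 0x400. The struct, the register indices, and inject_sync_to_vel2() are hypothetical.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical indices into the virtual EL2 register array. */
enum { VBAR_EL2, ELR_EL2, SPSR_EL2, ESR_EL2, NR_EL2_REGS };

#define PSR_MODE_EL2h      UINT64_C(0x9)   /* PSTATE.M for EL2 using SP_EL2 */
#define PSR_DAIF_MASK      UINT64_C(0x3c0) /* D, A, I, F mask bits */
#define VEC_LOWER_A64_SYNC UINT64_C(0x400) /* sync, lower EL, AArch64 */

struct vcpu_model {
    u64 pc;
    u64 pstate;
    u64 el2_regs[NR_EL2_REGS];
};

/* Deliver a synchronous exception to virtual EL2. */
static void inject_sync_to_vel2(struct vcpu_model *v, u64 esr)
{
    v->el2_regs[ELR_EL2]  = v->pc;     /* where to return to */
    v->el2_regs[SPSR_EL2] = v->pstate; /* interrupted state */
    v->el2_regs[ESR_EL2]  = esr;       /* why the exception happened */

    v->pstate = PSR_MODE_EL2h | PSR_DAIF_MASK;
    v->pc = v->el2_regs[VBAR_EL2] + VEC_LOWER_A64_SYNC;
}
```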

slide-19
SLIDE 19

Virtual Exceptions

  • Returning from virtual EL2
  • Trap eret to EL2 (ARMv8.3)
  • Emulate virtual exception return

[Diagram: host KVM runs at EL2. The VM contains user space at EL0, a kernel at EL1, and guest KVM at vEL2.]
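
The return path is the mirror image: once the guest hypervisor's eret has trapped, the host restores the program counter and PSTATE from the virtual ELR_EL2 and SPSR_EL2. A minimal sketch, with a hypothetical struct and function name; the only architectural fact used is that PSTATE.M[3:2] holds the exception level in AArch64 state.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical indices into the virtual EL2 register array. */
enum { ELR_EL2, SPSR_EL2, NR_EL2_REGS };

struct vcpu_model {
    u64 pc;
    u64 pstate;
    u64 el2_regs[NR_EL2_REGS];
};

/*
 * Emulate a trapped eret executed in virtual EL2.
 * Returns true if the vCPU is still in virtual EL2 afterwards.
 */
static bool emulate_eret(struct vcpu_model *v)
{
    v->pc     = v->el2_regs[ELR_EL2];
    v->pstate = v->el2_regs[SPSR_EL2];

    /* PSTATE.M[3:2] encodes the target exception level. */
    return ((v->pstate >> 2) & 0x3) == 2;
}
```

The return value tells the caller whether it now has to switch from the virtual EL2 register state to the EL0/1 state.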

slide-20
SLIDE 20

KVM/ARM Nested Virtualization Implementation

  • EL2 Emulation
  • Stage 2 MMU Virtualization
  • Hyp Timer Virtualization
  • Nested Virtual Interrupts
slide-21
SLIDE 21

Memory Virtualization

[Diagram: user space at EL0 and a kernel at EL1. Stage 1 translates VA -> IPA.]

slide-22
SLIDE 22

Memory Virtualization

[Diagram: a VM with user space at EL0 and a kernel at EL1, above the host hypervisor at EL2. Stage 1 translates VA -> IPA; Stage 2 translates IPA -> PA.]

slide-23
SLIDE 23

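Memory Virtualization

[Diagram: a VM containing a guest hypervisor and a nested VM (user space at EL0, kernel at EL1), above the host hypervisor at EL2. Stage 1 translates VA -> IPA and Stage 2 translates IPA -> PA; the guest hypervisor's own translation stage is marked "????".]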

slide-24
SLIDE 24

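Memory Virtualization

[Diagram: a VM containing a guest hypervisor and a nested VM (user space at EL0, kernel at EL1), above the host hypervisor at EL2. Stage 1 translates VA -> IPA. The guest hypervisor maintains a virtual stage 2; the host hypervisor maintains a shadow stage 2 that translates IPA -> PA.]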
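
A shadow stage 2 can be thought of as the composition of two translations: the guest hypervisor's virtual stage 2 (nested-VM IPA to VM IPA) followed by the host's stage 2 (VM IPA to PA). The toy model below uses flat, single-level arrays of page numbers instead of real page tables and fills the shadow table lazily on a miss. All names and sizes are invented for illustration.

```c
#include <stdint.h>

typedef uint64_t u64;

#define NPAGES     8          /* toy address-space size, in pages */
#define S2_INVALID UINT64_MAX /* marks an unmapped page */

/*
 * Translate a nested-VM IPA page to a PA page through the shadow table.
 * virt_s2:   guest hypervisor's table, nested IPA page -> VM IPA page
 * host_s2:   host's table,             VM IPA page     -> PA page
 * shadow_s2: combined table,           nested IPA page -> PA page
 * Returns S2_INVALID when either stage has no mapping (a fault).
 */
static u64 shadow_s2_translate(const u64 *virt_s2, const u64 *host_s2,
                               u64 *shadow_s2, u64 nested_ipa_page)
{
    u64 vm_ipa_page, pa_page;

    if (nested_ipa_page >= NPAGES)
        return S2_INVALID;
    if (shadow_s2[nested_ipa_page] != S2_INVALID)
        return shadow_s2[nested_ipa_page]; /* already shadowed */

    vm_ipa_page = virt_s2[nested_ipa_page];
    if (vm_ipa_page == S2_INVALID || vm_ipa_page >= NPAGES)
        return S2_INVALID; /* fault for the guest hypervisor to handle */

    pa_page = host_s2[vm_ipa_page];
    if (pa_page == S2_INVALID)
        return S2_INVALID; /* fault for the host to handle */

    shadow_s2[nested_ipa_page] = pa_page; /* install the combined mapping */
    return pa_page;
}
```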

slide-25
SLIDE 25

KVM/ARM Nested Virtualization Implementation

  • EL2 Emulation
  • Stage 2 MMU Virtualization
  • Hyp Timer Virtualization
  • Nested Virtual Interrupts
slide-26
SLIDE 26

Nested Timer Virtualization

  • ARM provides a virtual and physical timer in EL1
  • EL2 provides a separate EL2 “hyp” timer
  • Nested KVM/ARM supports a virtual CPU with EL2 and the hyp timer
slide-27
SLIDE 27

KVM/ARM Nested Virtualization Implementation

  • EL2 Emulation
  • Stage 2 MMU Virtualization
  • Hyp Timer Virtualization
  • Nested Virtual Interrupts
slide-28
SLIDE 28

ARM Generic Interrupt Controller (GIC)

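[Diagram: device interrupt lines feed the GIC distributor (Dist.), which routes interrupts to per-CPU CPU interfaces. Each CPU receives IRQs from, and sends ACK/EOI to, its CPU interface.]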

slide-29
SLIDE 29

ARM Generic Interrupt Controller (GIC)

[Diagram: as on the previous slide, the distributor (Dist.) routes to per-CPU CPU interfaces (IRQ, ACK/EOI). In addition, each CPU has a virtual CPU interface (VIRQ, ACK/EOI) backed by list registers (LRs).]

slide-30
SLIDE 30

Nested Interrupt Virtualization

  • Deliver virtual interrupts from the host to the VM

[Diagram: the host VMM, and a VM containing the guest VMM and a nested VM (kernel and user space). Interrupts are delivered through the virtual CPU interface and its LRs.]

slide-31
SLIDE 31

Nested Interrupt Virtualization

  • Deliver virtual interrupts from the guest hypervisor to the nested VM
  • Shadow list registers
  • The nested VM can ACK and EOI virtual interrupts without trapping

[Diagram: the host VMM, and a VM containing the guest VMM and a nested VM (kernel and user space). Interrupts are delivered through the virtual CPU interface and its LRs.]
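
Shadow list registers can be pictured as a copy step: the list register values the guest hypervisor programmed are held in memory, and the host loads them into the hardware list registers before the nested VM runs. The model below is deliberately simplified. The two-field list_reg struct, NR_LRS, and load_shadow_lrs() are invented for illustration and do not reflect the real GIC list register format.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_LRS 4 /* placeholder number of list registers */

/* Simplified stand-in for a list register. */
struct list_reg {
    uint32_t intid; /* virtual interrupt number */
    bool pending;   /* is there an interrupt to deliver? */
};

struct lr_file {
    struct list_reg lr[NR_LRS];
};

/*
 * Copy the pending entries the guest hypervisor programmed (virt) into
 * the hardware list registers (hw), packed from slot 0, and clear the
 * remaining slots. Returns the number of entries loaded.
 */
static unsigned load_shadow_lrs(const struct lr_file *virt, struct lr_file *hw)
{
    unsigned used = 0;

    for (unsigned i = 0; i < NR_LRS; i++) {
        if (virt->lr[i].pending)
            hw->lr[used++] = virt->lr[i];
    }
    for (unsigned i = used; i < NR_LRS; i++) {
        hw->lr[i].intid = 0;
        hw->lr[i].pending = false;
    }
    return used;
}
```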

slide-32
SLIDE 32

Performance Evaluation

  • Problem: No ARMv8.3 hardware available.
  • Solution: Use ARMv8.0 hardware with software modifications (paravirtualization)
slide-33
SLIDE 33

Emulating v8.3 on v8.0

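[Diagram: on ARMv8.0 hardware, the host hypervisor runs at EL2 and the VM contains the guest hypervisor and a nested VM (OS kernel and apps). The guest hypervisor is paravirtualized and issues HVC instructions to reach the host hypervisor.]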

slide-34
SLIDE 34

Hypercall MicroBenchmark

[Diagram, left: a VM (OS kernel at EL1, apps at EL0) makes a hypercall to the hypervisor at EL2, which returns.]

[Diagram, right: a nested VM (OS kernel, apps) makes a hypercall to the guest hypervisor, which returns; the host hypervisor runs at EL2.]

slide-35
SLIDE 35

Hypercall MicroBenchmark

ARMv8.3          VM       Nested VM
Cycle counts     2,729    422,720
Ratio to VM      1        155x

slide-36
SLIDE 36

Application Benchmarks

[Chart: normalized overhead (lower is better) for ARMv8.3 VM and ARMv8.3 Nested across Kernbench, Hackbench, SPECjvm2008, TCP RR, TCP STREAM, TCP MAERTS, Apache, Nginx, Memcached, and MySQL.]

slide-37
SLIDE 37

Nested VM Exit/Entry on ARM

[Diagram: the host hypervisor runs at EL2; the VM contains the guest hypervisor and a nested VM (OS kernel and apps). Between a nested VM exit and the following VM entry, accesses to EL1 and EL2 registers cause more than 120 traps.]

slide-38
SLIDE 38

NEVE: NEsted Virtualization Extensions for ARM

  • Supports unmodified guest hypervisors and OSes
  • Improves performance by providing register redirection
slide-39
SLIDE 39

Register Classification

  • VM registers: EL1 registers only affecting the nested VM’s execution
  • Hypervisor registers: EL2 registers affecting the hypervisor’s execution
slide-40
SLIDE 40

VM Registers

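[Diagram: the host hypervisor runs at EL2; the VM contains the guest hypervisor and a nested VM (OS kernel and apps). The EL1 registers are marked between VM exit and VM entry, and the time the nested VM runs is annotated: "This is when VM register states are used".]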

slide-41
SLIDE 41

VM Registers: Logging to Memory

Without NEVE:

msr TTBR0_EL1, x0

[Diagram: the write to the VM register traps ("Trap!"); memory is not involved.]

slide-42
SLIDE 42

VM Registers: Logging to Memory

With NEVE:

msr TTBR0_EL1, x0

[Diagram: the TTBR0_EL1 value written to the VM register is logged to memory.]
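
The difference between the two slides can be captured in a toy model: in both cases the value the guest hypervisor writes ends up in memory, but without NEVE every write costs a trap to the host first. The struct, the register indices, and guest_hyp_write_vm_reg() below are hypothetical, and the model ignores everything except the trap count.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical VM register indices. */
enum { TTBR0_EL1, TCR_EL1, NR_VM_REGS };

struct vcpu_model {
    bool neve;               /* is NEVE-style logging enabled? */
    u64 vm_regs[NR_VM_REGS]; /* in-memory copy of the VM registers */
    unsigned traps;          /* traps taken to the host hypervisor */
};

/* The guest hypervisor writes a VM register (e.g. msr TTBR0_EL1, x0). */
static void guest_hyp_write_vm_reg(struct vcpu_model *v, unsigned idx, u64 val)
{
    if (!v->neve)
        v->traps++;        /* without NEVE: trap, host emulates the write */
    v->vm_regs[idx] = val; /* either way, the value lands in memory */
}
```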

slide-43
SLIDE 43

Hypervisor control registers

[Diagram: the guest hypervisor at EL1 and the host hypervisor at EL2, with the EL1 registers and EL2 registers.]

  • Can’t apply the technique for VM registers
  • They have an immediate impact (EL2 system registers)
  • Traps are handled by redirecting to EL1 registers in software
slide-44
SLIDE 44

Hypervisor control registers

  • Can’t apply the technique for VM registers
  • They have an immediate impact (EL2 system registers)
  • Traps are handled by redirecting to EL1 registers in software
  • Redirect in hardware instead!

[Diagram: the guest hypervisor at EL1 and the host hypervisor at EL2, with the EL1 registers and EL2 registers.]

slide-45
SLIDE 45

Hypercall MicroBenchmark

                 ARMv8.3 VM    ARMv8.3 Nested VM    NEVE Nested VM
Cycle counts     2,729         422,720              92,385
Ratio to VM      1             155x                 34x
Trap counts      1             126                  15
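
The ratios in the table follow directly from the cycle counts. As a quick check, the sketch below divides each nested VM cycle count by the VM cycle count and rounds to the nearest integer; the helper name and the enum constants are illustrative.

```c
#include <stdint.h>

/* Cycle counts from the hypercall microbenchmark table. */
enum {
    VM_CYCLES          = 2729,
    NESTED_V83_CYCLES  = 422720,
    NESTED_NEVE_CYCLES = 92385,
};

/* a / b rounded to the nearest integer, using integer arithmetic only. */
static uint64_t rounded_ratio(uint64_t a, uint64_t b)
{
    return (a + b / 2) / b;
}
```

rounded_ratio(NESTED_V83_CYCLES, VM_CYCLES) gives 155 and rounded_ratio(NESTED_NEVE_CYCLES, VM_CYCLES) gives 34, matching the "Ratio to VM" row.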

slide-46
SLIDE 46

Application Workloads

Application           Description
Kernbench             Kernel compile
Hackbench             Scheduler stress
SPECjvm2008           Java Runtime
MySQL                 Database management
Memcached             Key-Value store
Netperf TCP_RR        Network performance
Netperf TCP_STREAM    Network performance
Netperf TCP_MAERTS    Network performance
Apache                Web server stress
Nginx                 Web server stress

slide-47
SLIDE 47

Experimental Setup

  • ARM Hardware
      • APM X-Gene (ARMv8.0)
      • 8-way SMP
      • 64 GB RAM
  • Software
      • KVM on KVM
      • v4.10
  • Native/VM/Nested VM
      • 4-way SMP
      • 12 GB RAM
      • Virt I/O (VM/nested VM)
  • 10 Gb Ethernet
  • x86 Hardware
      • Intel E5-2630 v3
      • VMCS Shadowing
      • 8-way SMP
      • 128 GB RAM
slide-48
SLIDE 48

Application Benchmarks

[Chart: normalized overhead (lower is better) for ARMv8.3 VM, ARMv8.3 Nested, and NEVE Nested across Kernbench, Hackbench, SPECjvm2008, TCP RR, TCP STREAM, TCP MAERTS, Apache, Nginx, Memcached, and MySQL.]

slide-49
SLIDE 49

Application Benchmarks

[Chart: normalized overhead (lower is better) for ARMv8.3 VM, ARMv8.3 Nested, NEVE Nested, and x86 Nested VM across Kernbench, Hackbench, SPECjvm2008, TCP RR, TCP STREAM, TCP MAERTS, Apache, Nginx, Memcached, and MySQL.]

slide-50
SLIDE 50

Conclusion

  • We have an implementation of nested virtualization in KVM/ARM for ARMv8.3
  • Evaluated nested virtualization performance by emulating ARMv8.3
  • Nested virtualization on ARMv8.3 incurs high overhead
      • Due to the exit multiplication problem
  • NEVE improves performance significantly by reducing the number of traps (from 126 to 15 in the hypercall microbenchmark)
  • NEVE is used as the basis for extended nested virtualization support in ARMv8.4
  • NEVE will appear at SOSP later this month; read the paper for more details
slide-51
SLIDE 51

Code

  • Nested CPU Virtualization patches for ARMv8.3 [RFC v2]:
    https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026388.html
  • Nested Memory Virtualization patches for ARMv8.3 [RFC]:
    https://lists.cs.columbia.edu/pipermail/kvmarm/2017-October/027286.html
  • v8.3 and NEVE Paravirtualization on Linux v4.12-rc1:
    https://github.com/columbia/nesting-pub
  • QEMU Patches:
    https://github.com/columbia/qemu-pub nested-v2.3.0-model