Live Block Device Operations in QEMU, by Kashyap Chamarthy



slide-1
SLIDE 1

Live Block Device Operations in QEMU

Kashyap Chamarthy <kashyap@redhat.com> FOSDEM 2018 Brussels

1 / 34

slide-2
SLIDE 2

Part I

Background

2 / 34

slide-3
SLIDE 3

KVM / QEMU virtualization components

[Diagram: OpenStack Compute (Nova) with its virt driver, KubeVirt, and libguestfs / guestfish sit on top of libvirtd; libvirtd talks to each QEMU process (VM1, VM2) over QMP; QEMU runs on Linux with KVM (/dev/kvm).]

3 / 34

slide-4
SLIDE 4

In this talk

[Diagram: the stack from top to bottom is libvirtd, QEMU, the Linux host (VFS, FS, block layer) and the hardware (physical disk). Talk focus: QEMU's block layer (image formats: qcow2, raw, ...; I/O protocols: file, NBD, ...) and QMP.]

4 / 34

slide-5
SLIDE 5

QEMU’s block subsystem

Emulated storage devices: SCSI, IDE, virtio-blk, ...
    $ qemu-system-x86_64 -device help
  (Look for "Storage devices:" in the output)

Block driver types:
  – Format: qcow2, raw, vmdk
  – I/O protocol: NBD, file, RBD/Ceph

Block device operations:
  – Offline image manipulation: qemu-img, qemu-nbd
  – Live: snapshots, image streaming, storage migration, ...

5 / 34

slide-7
SLIDE 7

QEMU Copy-On-Write overlays

base (raw) ← overlay (qcow2)

  – Read from the overlay if allocated, otherwise from base
  – Write to overlay only

Use cases: thin provisioning, snapshots, backups, ...

Create a minimal backing chain:
    $ qemu-img create -f raw base.raw 5G
    $ qemu-img create -f qcow2 overlay.qcow2 2G \
          -b base.raw -F raw
  (-b: backing file; -F: backing file format)

6 / 34
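The read and write rules for overlays can be modelled in a few lines of Python. This is only a toy sketch, not how QEMU implements qcow2: images are plain dicts mapping a cluster index to data, and the names `read_cluster` and `write_cluster` are made up for illustration.

```python
# Toy model of copy-on-write overlays (not QEMU code). Each image is a dict
# mapping a cluster index to its data; a missing key means "unallocated".

def read_cluster(chain, index):
    """Read from the topmost image that has the cluster allocated.

    `chain` is ordered from the base image to the active overlay.
    """
    for image in reversed(chain):
        if index in image:
            return image[index]
    return b"\0"  # unallocated everywhere: modelled as a zero byte


def write_cluster(chain, index, data):
    """Writes only ever go to the active (last) image in the chain."""
    chain[-1][index] = data


base = {0: b"A", 1: b"B"}
overlay = {}
chain = [base, overlay]

write_cluster(chain, 1, b"X")
print(read_cluster(chain, 0))  # cluster 0 is served from base
print(read_cluster(chain, 1))  # cluster 1 is served from the overlay
print(base)                    # base itself is untouched
```

The point of the model: the base never changes after the overlay is created, which is what makes it usable as a snapshot.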

slide-9
SLIDE 9

Accessing disk images opened by QEMU

base ← overlay1 ← overlay2 (Live QEMU)

Disk images that are opened by QEMU should not be accessed by external tools
  – QEMU offers equivalent monitor commands

For safe, read-only access, use libguestfs:
    $ guestfish --ro -i -a disk.img

7 / 34

slide-10
SLIDE 10

Disk image locking

Prevents two concurrent writers to a disk image
  – Using Linux Open File Description (OFD) locks

To query an in-use disk image:
    $ qemu-img info foo.qcow2 --force-share
  (--force-share allows read-only access to an active disk; may return stale data)

When launching QEMU (2.10+):
    $ qemu-system-x86_64 \
          -blockdev driver=qcow2,file.driver=file,file.filename=./foo.qcow2,file.locking=auto,[...]
  (file.locking=auto defaults to OFD locking on Linux 3.15+)

8 / 34

slide-14
SLIDE 14

Part II

Primer on operating QEMU

9 / 34

slide-15
SLIDE 15

QEMU’s QMP monitor

Provides a JSON RPC interface
  – Send commands to query / modify VM state
  – QMP (asynchronous) events on certain state changes

If you zoom into a libvirt-generated QEMU command line:
    /usr/bin/qemu-system-x86_64 [...] \
          -chardev socket,id=charmonitor,path=[...]/monitor.sock,server,nowait \
          -mon chardev=charmonitor,id=monitor,mode=control
  (UNIX stream socket setup for libvirt ← → QEMU communication)

Shorthand for the above:
    -qmp unix:./qmp-sock,server,nowait

10 / 34

slide-18
SLIDE 18

Interacting with the QMP monitor

    $ socat UNIX:./qmp-sock \
          READLINE,history=$HOME/.qmp_history
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
     "package": "(v2.11.0-355-g281f327487)"}, "capabilities": []}}
    {"execute": "qmp_capabilities"}
    {"return": {}}
    {"execute": "query-status"}
    {"return": {"status": "running", "singlestep": false, "running": true}}

  – The {"QMP": ...} greeting indicates a successful connection
  – qmp_capabilities is a prerequisite: it must be sent before any other command
  – After that, issue regular QMP commands, such as query-status

Send arbitrary commands: query-kvm, blockdev-backup, ...

Invoking JSON manually is no fun; thankfully, libvirt automates it all

11 / 34
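For test setups without libvirt, the same exchange (greeting, then qmp_capabilities, then regular commands) can be scripted. A minimal Python sketch, assuming QEMU was started with the `-qmp unix:./qmp-sock,server,nowait` shorthand; the helper names `qmp_connect` and `qmp_command` are hypothetical, and error replies are not handled:

```python
import json
import socket


def qmp_command(sock_file, name, arguments=None):
    """Send one QMP command and return the first reply that is not an event."""
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    sock_file.write(json.dumps(request).encode() + b"\n")
    sock_file.flush()
    while True:
        reply = json.loads(sock_file.readline())
        if "event" not in reply:  # asynchronous events may arrive in between
            return reply


def qmp_connect(path):
    """Connect to a QMP UNIX socket and negotiate capabilities."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock_file = sock.makefile("rwb")
    greeting = json.loads(sock_file.readline())  # the {"QMP": ...} banner
    if "QMP" not in greeting:
        raise RuntimeError("not a QMP monitor")
    qmp_command(sock_file, "qmp_capabilities")  # prerequisite step
    return sock_file
```

With a running guest, `qmp_command(qmp_connect("./qmp-sock"), "query-status")` would return the same kind of reply as in the socat session.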

slide-23
SLIDE 23

Other ways to interact with QMP monitor

qmp-shell: a low-level tool located in the QEMU source. Takes key-value pairs (and JSON dicts):
    $ qmp-shell -v -p ./qmp-sock
    (QEMU) block-job-complete device=virtio1

virsh: libvirt’s shell interface
    $ virsh qemu-monitor-command \
          vm1 --pretty '{"execute":"query-kvm"}'
  (Useful for test / development)

NB: Modifying VM state behind libvirt’s back voids the support warranty!

12 / 34

slide-24
SLIDE 24

Part III

Configuring block devices

13 / 34

slide-25
SLIDE 25

Aspects of a QEMU block device

QEMU block devices have a notion of a:

Frontend: guest-visible devices (IDE, SCSI, virtio-blk, ...)
  – -device: command-line
  – device_add: run-time; like any other type of guest device

Backend: block drivers (NBD, qcow2, raw, ...)
  – -drive (legacy) / -blockdev: command-line
  – blockdev-add: run-time

14 / 34

slide-26
SLIDE 26

Configure on command-line: -blockdev

Provides fine-grained control over configuring block devices
Since QEMU 2.9; successor to the -drive option

E.g. attach a qcow2 disk to a virtio-blk guest device:
    $ qemu-system-x86_64 [...] \
          -blockdev node-name=node-Base,driver=qcow2,file.driver=file,file.filename=./base.qcow2 \
          -device virtio-blk,drive=node-Base

  – -blockdev configures the ‘backend’
  – -device configures the ‘frontend’ (guest-visible device); its drive= value must match the backend’s node-name

→ More details: Talks from previous KVM Forums

15 / 34

slide-30
SLIDE 30

Configure at run-time: QMP blockdev-add

Add a qcow2 block device at run-time:
    {"execute": "blockdev-add",
     "arguments": {"driver": "qcow2", "node-name": "node-A",
                   "file": {"driver": "file", "filename": "disk-A.qcow2"}}}

Command-line is a 1:1 mapping of the JSON (from above):
    -blockdev driver=qcow2,node-name=node-A,file.driver=file,file.filename=./disk-A.qcow2

→ Here too, refer to previous KVM Forum talks
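The 1:1 mapping is mechanical: nested JSON objects become dotted keys. A small Python sketch of that flattening; the helper names `flatten` and `blockdev_option` are made up for illustration, and the sketch handles only string values (no booleans, lists, or escaping of commas inside values):

```python
def flatten(arguments, prefix=""):
    """Turn nested blockdev-add arguments into dotted key=value pairs."""
    pairs = []
    for key, value in arguments.items():
        if isinstance(value, dict):
            # Nested object: recurse, extending the dotted prefix.
            pairs.extend(flatten(value, f"{prefix}{key}."))
        else:
            pairs.append(f"{prefix}{key}={value}")
    return pairs


def blockdev_option(arguments):
    """Build the -blockdev command-line option for the given arguments."""
    return "-blockdev " + ",".join(flatten(arguments))


arguments = {
    "driver": "qcow2",
    "node-name": "node-A",
    "file": {"driver": "file", "filename": "./disk-A.qcow2"},
}
print(blockdev_option(arguments))
```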

16 / 34

slide-31
SLIDE 31

Part IV

Live block operations

17 / 34

slide-32
SLIDE 32

blockdev-snapshot-sync: External snapshots

When invoked while the guest is running:

  1. the existing disk becomes the backing file; and
  2. a QCOW2 overlay is created to track new writes

Base image can be of any format; overlays must be QCOW2
Allows atomic live snapshot of multiple disks
No guest downtime; snapshot creation is instantaneous

18 / 34

slide-33
SLIDE 33

blockdev-snapshot-sync: Example

base (Live QEMU)

Create an external snapshot (qmp-shell invocation):
    blockdev-snapshot-sync node-name=node-Base \
          snapshot-file=overlay1.qcow2 \
          snapshot-node-name=node-Overlay1

libvirt equivalent:
    $ virsh snapshot-create-as vm1 --disk-only

Result:

base ← overlay1 (Live QEMU)

19 / 34
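In raw QMP, the snapshot is a JSON command, and snapshots of several disks can be grouped into one `transaction` command so that they take effect together. A Python sketch that only builds the JSON; the second disk (`node-Data` and its overlay names) is hypothetical, as are the helper names:

```python
import json


def snapshot_action(node, overlay_file, overlay_node):
    """One blockdev-snapshot-sync action, in the form used inside a transaction."""
    return {
        "type": "blockdev-snapshot-sync",
        "data": {
            "node-name": node,
            "snapshot-file": overlay_file,
            "snapshot-node-name": overlay_node,
        },
    }


def atomic_snapshot(actions):
    """Wrap several actions into a single QMP 'transaction' command."""
    return {"execute": "transaction", "arguments": {"actions": actions}}


command = atomic_snapshot([
    snapshot_action("node-Base", "overlay1.qcow2", "node-Overlay1"),
    # Hypothetical second disk, to illustrate a multi-disk snapshot:
    snapshot_action("node-Data", "data-overlay1.qcow2", "node-DataOverlay1"),
])
print(json.dumps(command, indent=2))
```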

slide-36
SLIDE 36

But...long image chains can get cumbersome

base ← overlay1 ← overlay2 ← overlay3 (Live QEMU)

Problems:

  • Revert to external snapshot is non-trivial
  • Multiple files to track
  • I/O penalty with a long disk image chain

There are some solutions. . .

20 / 34

slide-37
SLIDE 37

commit: Live merge a disk image chain (1)

base ← overlay1 ← overlay2 ← overlay3 (Live QEMU)

Problem: Shorten the chain of overlays
Simplest case: Merge all of them into ‘base’

[Diagram: the same chain, with the contents of overlay1, overlay2 and overlay3 merged into base]

21 / 34

slide-38
SLIDE 38

commit: Live merge a disk image chain (2)

base ← overlay1 ← overlay2 ← overlay3

Run-time invocation (using qmp-shell):
    blockdev-snapshot-sync [...]
    block-commit device=node-Overlay3 job-id=jobA
    block-job-complete device=jobA

  – blockdev-snapshot-sync: invoke it thrice, to create 3 overlays
  – block-commit: copy content from all the overlays into base
  – block-job-complete: gracefully end the ‘commit’ job, and pivot QEMU

libvirt equivalent:
    $ virsh blockcommit vm1 vda --pivot

22 / 34

slide-43
SLIDE 43

commit: Live merge a disk image chain (3)

base ← overlay1 ← overlay2 ← overlay3

Two-phase (sync + pivot) operation = a consolidated disk:

base (Live QEMU); overlay1, overlay2, overlay3 (invalid)
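Why the overlays end up invalid can be seen in a toy model. This is a sketch only, not QEMU's algorithm: images are dicts of cluster index to data, and `commit_into_base` and `view` are invented names.

```python
def view(chain):
    """What a reader sees through a chain: newer images shadow older ones."""
    merged = {}
    for image in chain:
        merged.update(image)
    return merged


def commit_into_base(chain):
    """Merge every overlay into the base image (chain[0]), oldest first."""
    base = chain[0]
    for overlay in chain[1:]:
        base.update(overlay)  # newer data overwrites older data in base
    return [base]  # after the pivot, the guest runs on base alone


base = {0: "a0", 1: "a1", 2: "a2"}
overlay1 = {1: "b1"}
overlay2 = {2: "c2"}
overlay3 = {1: "d1"}

before = view([base, overlay1])  # what overlay1 represented before the commit
chain = commit_into_base([base, overlay1, overlay2, overlay3])
after = view([base, overlay1])   # the same overlay on top of the changed base

print(chain)
print(before, after)
```

In this model `before` and `after` differ, because base now contains data written after overlay1 was created; that is the sense in which the old overlays can no longer be used.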

23 / 34

slide-44
SLIDE 44

stream: Copy from backing files to overlays

A bit similar to ‘commit’, but in the opposite direction

Specifics:
  – ‘stream’ operation is safe: data is being pulled forward
  – Intermediate overlays remain valid, unlike with ‘commit’
  – Intermediate image streaming (from QEMU 2.8+)
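The "pulled forward" direction can be shown with a toy model. Again a sketch under simplifying assumptions (images as dicts of cluster index to data, `stream_into_top` an invented name), not QEMU's implementation:

```python
def stream_into_top(chain):
    """Copy data from the backing files into the active (last) image.

    Backing images are only read, never modified, so they stay valid.
    """
    top = chain[-1]
    for image in reversed(chain[:-1]):   # nearest backing file first
        for index, data in image.items():
            top.setdefault(index, data)  # never overwrite newer data
    return [top]  # the active image no longer needs a backing file


base = {0: "a0", 1: "a1", 2: "a2"}
overlay1 = {1: "b1"}
overlay2 = {2: "c2"}
overlay3 = {1: "d1"}

chain = stream_into_top([base, overlay1, overlay2, overlay3])
print(chain)
print(base, overlay1, overlay2)  # unchanged
```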

24 / 34

slide-45
SLIDE 45

mirror: Synchronize active disk to another image

base ← overlay1 ← overlay2 ← overlay3 (Live QEMU), mirrored to: copy

Synchronization modes:
  – ‘full’: copy the entire chain
  – ‘top’: only from the topmost (active) image
  – ‘none’: copy only new writes from now on
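The three modes differ in what is copied up front; new guest writes are mirrored in every mode. A toy Python sketch of the initial copy only, with images modelled as dicts of cluster index to data and `initial_copy` an invented name:

```python
def initial_copy(chain, sync):
    """Return the clusters a mirror job copies up front, per sync mode."""
    if sync == "full":
        merged = {}
        for image in chain:       # base first, so newer images win
            merged.update(image)
        return merged
    if sync == "top":
        return dict(chain[-1])    # only what the active image has allocated
    if sync == "none":
        return {}                 # nothing up front; only new writes follow
    raise ValueError(f"unknown sync mode: {sync}")


chain = [{0: "a0", 1: "a1"}, {1: "b1"}, {2: "c2"}]
for mode in ("full", "top", "none"):
    print(mode, initial_copy(chain, mode))
```

A target produced with ‘top’ holds only the active layer, so it presumably needs the same backing files underneath to be complete.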

25 / 34

slide-46
SLIDE 46

mirror: Operation

base ← overlay1 ← overlay2 ← overlay3 (Live QEMU), mirrored to: copy

    drive-mirror [...] target=copy1.qcow2 sync=full
    query-block-jobs
    block-job-complete device=virtio0

26 / 34

slide-47
SLIDE 47

QEMU NBD server

Network Block Device server, built into QEMU
  – Lets you export images while in use

QMP commands:
    nbd-server-start addr={"type":"unix", "data":{"path":"./nbd-sock"}}
    nbd-server-add device=TargetDisk writable=true
    nbd-server-stop

→ External program for offline use: qemu-nbd

27 / 34

slide-48
SLIDE 48

Combining ‘mirror’ and NBD

Use case: Live VM migration with non-shared storage

  1. Destination QEMU sets up the NBD server
  2. Source QEMU issues drive-mirror to synchronize disk(s):

    {"execute": "drive-mirror",
     "arguments": {"device": "disk0",
                   "target": "nbd:dest:49153:exportname=disk0",
                   "sync": "top", "mode": "existing"}}

Details: qemu/docs/interop/live-block-operations.rst
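The two steps can be sketched as QMP payloads built in Python. This only constructs JSON; nothing is sent to a QEMU. The host name `dest`, port 49153 and device `disk0` come from the slide; the listen address `0.0.0.0` and the helper names are assumptions for illustration.

```python
import json


def destination_commands(listen_host, port, device):
    """Step 1: commands for the destination QEMU (NBD server + export)."""
    return [
        {"execute": "nbd-server-start",
         "arguments": {"addr": {"type": "inet",
                                "data": {"host": listen_host,
                                         "port": str(port)}}}},
        {"execute": "nbd-server-add",
         "arguments": {"device": device, "writable": True}},
    ]


def source_command(device, dest_host, port, export, sync="top"):
    """Step 2: drive-mirror on the source, targeting the NBD export."""
    return {"execute": "drive-mirror",
            "arguments": {"device": device,
                          "target": f"nbd:{dest_host}:{port}:exportname={export}",
                          "sync": sync,
                          "mode": "existing"}}


for command in destination_commands("0.0.0.0", 49153, "disk0"):
    print(json.dumps(command))
mirror = source_command("disk0", "dest", 49153, "disk0")
print(json.dumps(mirror))
```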

28 / 34

slide-49
SLIDE 49

mirror + NBD: libvirt automation

NBD-based live storage migration as done by libvirt:
    $ virsh migrate \
          --live \
          --verbose \
          --p2p \
          --copy-storage-all \
          vm1 \
          qemu+ssh://root@desthost/system

Higher layers, such as OpenStack, use the equivalent libvirt Python APIs: migrateToURI[2,3]()

29 / 34

slide-50
SLIDE 50

backup: Point-in-time copy of a block device

Point-in-time:
  – For backup: when you start the operation
  – For mirror: when you end the sync (via block-job-complete)

Synchronization modes for backup:
  – ‘top’, ‘full’, ‘none’
  – ‘incremental’ (for incremental backups; WIP as of 2.12)

qemu/docs/interop/bitmaps.rst
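The difference in point-in-time can be illustrated with a toy model, in which a disk is a dict of cluster index to data and the guest keeps writing while the job runs. The function name `run_job` is invented; this is not how QEMU implements either job.

```python
def run_job(kind, disk, writes_during_job):
    """Return the contents of the copy produced by a 'backup' or 'mirror' job."""
    if kind not in ("backup", "mirror"):
        raise ValueError(f"unknown job kind: {kind}")
    state_at_start = dict(disk)
    for index, data in writes_during_job:  # guest I/O while the job runs
        disk[index] = data
    if kind == "backup":
        return state_at_start  # point-in-time: when the job was started
    return dict(disk)          # point-in-time: when the sync is ended


writes = [(0, "new"), (1, "extra")]
backup_copy = run_job("backup", {0: "old"}, writes)
mirror_copy = run_job("mirror", {0: "old"}, writes)
print(backup_copy)
print(mirror_copy)
```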

30 / 34

slide-51
SLIDE 51

Combining ‘backup’ and NBD

Use case: Examine guest I/O patterns

base ← overlay1 ← overlay2 ← overlay3 (Live QEMU), backed up to: copy (sync=none)

    {"execute": "blockdev-backup",
     "arguments": {"device": "node-Overlay3", "target": "copy",
                   "job-id": "job0", "sync": "none"}}
    [...]  # Start NBD server and export the ‘copy’

31 / 34

slide-52
SLIDE 52

Summary

commit: Move data from overlays into backing files
stream: Move data from backing files into overlays
mirror: Live VM migration with non-shared storage
backup: Point-in-time copy

qemu/docs/interop/live-block-operations.rst
qemu/docs/interop/bitmaps.rst

32 / 34

slide-53
SLIDE 53

References

Detailed docs with examples:

  • Live Block Device Operations
    https://kashyapc.fedorapeople.org/QEMU-Docs/_build/html/docs/live-block-operations.html
  • "Incremental Backups - Good things come in small packages!" by John Snow
    https://fosdem.org/2017/schedule/event/backup_dr_incr_backups/
  • "Managing the New Block Layer" by Kevin Wolf & Max Reitz
    https://events.static.linuxfound.org/sites/events/files/slides/talk_11.pdf
  • "Backing Chain Management in libvirt and qemu" by Eric Blake
    http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expanded.pdf
  • "Towards a more expressive and introspectable QEMU command line"
    https://events.static.linuxfound.org/sites/events/files/slides/armbru-qapi-cmdline_1.pdf

33 / 34

slide-54
SLIDE 54

Questions?

34 / 34