Linux is a registered trademark of Linus Torvalds.
Evaluating storage APIs for QEMU
Anthony Liguori – aliguori@us.ibm.com Open Virtualization IBM Linux Technology Center Linux Plumbers Conference 2009
The V-Word
– this is not a virtualization talk
that can run a variety of “workloads”
about how they access storage
demands
performance by default
– Should Just Work
We want
requirements
processing
Hello World
Posix read()/write()
read()/write()
– Workload cannot run while processing I/O
request
– I/O performance is terrible
– Because workload doesn't run while waiting for I/O, CPU performance is terrible too
Worker thread
First improvement
– No more horrendous CPU overhead
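The worker-thread idea can be sketched as follows. This is only an illustration under stated assumptions, not QEMU's actual code: the names `io_worker` and `demo_worker_thread`, the request tuple layout, and the use of Python queues are all hypothetical. A single worker performs the blocking lseek/read pair, so the main "workload" keeps running while the request is in flight.

```python
import os
import queue
import tempfile
import threading


def io_worker(requests, completions):
    """Service blocking reads so the main loop never waits on the disk."""
    while True:
        req = requests.get()
        if req is None:  # sentinel: shut the worker down
            break
        fd, offset, length, tag = req
        # Only one worker touches the descriptor, so seek-then-read is safe here.
        os.lseek(fd, offset, os.SEEK_SET)
        completions.put((tag, os.read(fd, length)))


def demo_worker_thread():
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789abcdef")
        f.flush()
        requests, completions = queue.Queue(), queue.Queue()
        worker = threading.Thread(target=io_worker, args=(requests, completions))
        worker.start()
        requests.put((f.fileno(), 4, 4, "req-1"))
        # The "workload" continues here while the read is being processed.
        busy_work = sum(range(1000))
        tag, data = completions.get()
        requests.put(None)
        worker.join()
        return tag, data, busy_work
```

With only one worker, requests are still processed one at a time; the gain is that the workload no longer stalls on each of them.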
posix-aio
Upstream solution
– Can batch requests
– Supports async notification via signals
Posix-aio shortcomings
– New APIs must be accepted by POSIX before implementing in glibc (or so I was told)
to start another thread since this new thread would fight with the running thread for the resources.”
Ulrich about removing this restriction
Other posix-aio's
that's supported by a kernel module
you get a SEGV
kernel module is not loaded, and then a fallback mechanism that isn't posix-aio since a non-privileged user cannot load kernel modules
linux-aio: tux saves the day!
linux-aio
interface
– Supports scatter/gather requests
– Can submit multiple requests at once
linux-aio shortcomings
– Must use special blocking function
– Signal support added
– Eventfd support added
– Neither mechanism is probe-able in software so you have to guess at compile time
– Libaio spent a good period of time in an unmaintained state making eventfd support unavailable in even modern distros (SLES11)
– Usually, O_DIRECT
get no error, io_submit() just blocks
!@#!@@$#!!#@#!#@
linux-maybe-sometimes-aio
actually care about asynchronous IO requests
– Require a user to enable linux-aio
– Be extremely conservative and limit yourselves to things you know work today like O_DIRECT on a physical device
benchmarking tools
Let's fix posix-aio
Our own thread pool
arbitrary limits
descriptor because of seek/read race
– Thread1: lseek -> readv
– Thread2: lseek -> (race) -> writev
– We now have zero copy and simultaneous request processing
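A rough sketch of why positional vectored I/O removes the seek/read race: `preadv`/`pwritev` take the offset as an argument, so several pool threads can share one file descriptor without touching its file position. The helper names (`read_at`, `write_at`, `demo_thread_pool`) and the pool size are hypothetical, and this Python version copies buffers, so it illustrates the simultaneous processing but not the zero-copy aspect. It assumes Linux and Python 3.7 or later, where `os.preadv` and `os.pwritev` are available.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_at(fd, offset, sizes):
    """Scatter-read at an explicit offset; the shared file position is untouched."""
    buffers = [bytearray(n) for n in sizes]
    nread = os.preadv(fd, buffers, offset)
    return nread, [bytes(b) for b in buffers]


def write_at(fd, offset, chunks):
    """Gather-write at an explicit offset; no lseek, so nothing to race on."""
    return os.pwritev(fd, chunks, offset)


def demo_thread_pool():
    fd, path = tempfile.mkstemp()
    try:
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Two writes to disjoint regions of the same descriptor.
            w1 = pool.submit(write_at, fd, 0, [b"AAAA", b"BBBB"])
            w2 = pool.submit(write_at, fd, 8, [b"CCCC", b"DDDD"])
            written = w1.result() + w2.result()
            # Two reads in flight at once, again on the same descriptor.
            r1 = pool.submit(read_at, fd, 0, [4, 4])
            r2 = pool.submit(read_at, fd, 8, [4, 4])
            return written, r1.result(), r2.result()
    finally:
        os.close(fd)
        os.unlink(path)
```

Had the helpers used `lseek` followed by `readv`/`writev`, a second thread could move the shared file position between the two calls, which is the race shown in the Thread1/Thread2 lines.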
Shortcomings
the kernel
– Tagging semantics don't map very well
– Each thread is considered a different IO context, CFQ waits for each thread to submit more requests resulting in long delays
– Fixable with CLONE_IO – not exposed through pthreads
– Some attempts at improving upstream
Compromise
What we do today
– Gives us better performance
– Only use with block devices
– Lose features such as host page cache sharing
– For certain configurations, like c _ _ _ d, making use of the host page cache is absolutely critical
– Most users use file backed images
– Good compromise of performance and features
– But we know we can do better
What's coming
acall/syslets
– Avoid thread creation when request can complete immediately (nice)
– Lighter weight threads
– Potentially better thread pool management
– No clear benefit today over userspace thread pool other than introducing interfaces
– Seems easier to merge upstream
– Complex ability to chain system calls without returning to userspace
– Seems to have lost merge momentum
acall/syslet shortcomings
semantic mapping issues
– Neither are very useful for our workloads without preadv/pwritev
– Neither help request tagging as request
pool
– Still not obvious how to extend preadv/pwritev paradigm to support tagging
– Both have clear benefits though
Overall uncertainty
acall/syslets
difficult though to begin
block IO interfaces to avoid these problems
– Using confusing terms like “in-kernel paravirtual block device backend” to avoid real review
– It would be much better to fix the generic interfaces so everyone benefits
– It's a battle we're losing so far
Questions