Latest evolution of Linux IO stack, explained for database people



slide-1
SLIDE 1

Latest evolution of Linux IO stack, explained for database people

Ilya Kosmodemiansky (ik@dataegret.com)

slide-2
SLIDE 2

Why this talk

  • Linux is the most common OS for databases
  • Fast IO is essential for many workloads
  • DBAs often run into IO problems
  • Most of the information on the topic is written by kernel developers (for kernel developers) or is checklist-style
  • In recent years, Linux IO stack (re)development has been very fast

dataegret.com

slide-3
SLIDE 3

Bird's-eye view

  • How a generic database or PostgreSQL interacts with IO
  • Linux IO as we used to understand it
  • What is new?

slide-4
SLIDE 4

Well, typical database

[Diagram: DRAM holds the database's shared memory and WAL buffer (user space) and the Linux page cache (kernel space); the disks hold the WAL and the datafile.]

slide-5
SLIDE 5

It is easy while read-only

[Diagram: DRAM holds PostgreSQL's shared_buffers and three work_mem areas, plus the Linux page cache, in front of the disks; a single worker runs select foo from bar where foo=3.]

slide-6
SLIDE 6

Writes add complexity

[Diagram: a worker runs update foo set bar=buzz against PostgreSQL; DRAM holds shared_buffers, the WAL buffer and the Linux page cache, with clean and dirty pages marked; the disks hold the datafile and the WAL.]

slide-7
SLIDE 7

Key things about modern database workloads

  • Shared memory segment can be very large
  • Keeping in-memory pages synchronized with disk generates huge IO
  • WAL should be written fast and safely
  • Each and every layer of the OS IO stack is involved
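
To make "fast and safely" concrete, here is a minimal sketch of a durable append: the write first lands in the kernel page cache, and only fsync() asks the kernel to push it to the device. This is an illustration, not PostgreSQL's actual WAL code; the function name append_durably and the file name wal.log are hypothetical.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> int:
    """Append one record and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        while view:
            # write() may be partial, so loop until every byte is handed over.
            # At this point the data is only in the kernel page cache.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data to the storage device.
        os.fsync(fd)
        return len(record)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: append two records to a throwaway log file.
    log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    append_durably(log_path, b"record 1\n")
    append_durably(log_path, b"record 2\n")
```

Every call pays the full device latency of one fsync(), which is why the latency of the storage matters so much for WAL.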

slide-8
SLIDE 8

What generates most of the IO in the case of PostgreSQL

  • Keeping pages synchronized: checkpoints and other sync mechanisms
  • Autovacuum can generate a lot of IO
  • Cache refill
  • Worker IO (sorts and hashing, as well as worst-case fsyncs)

slide-9
SLIDE 9

The main IO problem for databases for a long time was

  • How to maximize page throughput between memory and disks
  • Things involved:

      ◮ Disks
      ◮ Memory
      ◮ CPU
      ◮ IO Schedulers
      ◮ Filesystems
      ◮ Database itself

  • IO problems for databases are not always only about disks

slide-10
SLIDE 10

The main IO problem for databases for a long time was

  • How to maximize page throughput between memory and disks
  • Things involved:

      ◮ Disks - because the latency of this part was very significant
      ◮ Memory
      ◮ CPU
      ◮ IO Schedulers
      ◮ Filesystems
      ◮ Database itself

  • IO problems for databases are not always only about disks

slide-11
SLIDE 11

Throughput and latency

  • Maximizing IO performance by maximizing throughput is easy up to a certain point
  • Minimizing IO latency is usually tricky
  • With the wide adoption of proper SSDs, hardware latency dropped dramatically

slide-12
SLIDE 12

Because of the high latency of rotating disks

  • Database development concentrated on maximizing throughput
  • So did Linux kernel development
  • Many IO optimization techniques from the rotating-disk era are not that good for SSDs

slide-13
SLIDE 13

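IO stack (as it used to look like)

[Diagram: the classic IO stack. Database memory, page cache, VFS and EXT4 sit above the block device interface; Direct IO bypasses the page cache; below are the BIO layer (block IO), the request layer with the elevator/IO scheduler, and the disks.]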

slide-15
SLIDE 15

Elevators: before 2.6 kernel

  • Linus Elevator - the only one in the 2.4 era
  • Merging and sorting request queues
  • Had lots of problems
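
The idea of "merging and sorting" can be sketched with a toy model: sort pending requests by start sector so a rotating disk's head sweeps in one direction, and merge requests that touch or overlap into one larger request. This is an illustration only, not kernel code; the (start_sector, sector_count) representation and the name sort_and_merge are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical request representation: (start_sector, sector_count).
Request = Tuple[int, int]


def sort_and_merge(requests: List[Request]) -> List[Request]:
    """Sort requests by start sector and merge those that touch or overlap."""
    merged: List[Request] = []
    for start, count in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # The request begins inside or right after the previous one:
            # extend the previous request instead of queueing a new one.
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


if __name__ == "__main__":
    queue = [(100, 8), (0, 8), (8, 8), (200, 4)]
    print(sort_and_merge(queue))
```

In this toy queue the requests at sectors 0 and 8 are adjacent, so they leave the queue as one 16-sector request.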

slide-16
SLIDE 16

Elevators: between 2.6 and early 3.*

  • CFQ - universal, the default one
  • deadline - rotating disks
  • noop or none - when disk throughput is so high that it cannot benefit from clever scheduling

      ◮ PCIe SSDs
      ◮ SAN disk arrays
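
To check which elevator a device uses, Linux exposes the list in /sys/block/<device>/queue/scheduler, with the active one in square brackets (for example "noop deadline [cfq]"). A minimal sketch of reading it follows; the helper names and the device name sda are assumptions for the example.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_scheduler_line(line: str) -> Tuple[Optional[str], List[str]]:
    """Return (active scheduler, all available schedulers) from the sysfs text."""
    active: Optional[str] = None
    available: List[str] = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            # The kernel marks the active scheduler with square brackets.
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device: str) -> Tuple[Optional[str], List[str]]:
    """Read the scheduler list for a block device such as 'sda' (assumed name)."""
    text = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    return parse_scheduler_line(text)


if __name__ == "__main__":
    print(parse_scheduler_line("noop deadline [cfq]\n"))
```

Writing one of the listed names to the same file (as root) switches the scheduler for that device.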

slide-17
SLIDE 17

Elevators: 3.13 and newer

  • The effectiveness of noop clearly shows the ineffectiveness of the others, or the ineffectiveness of smart sorting as an approach
  • The blk-mq scheduler was merged into the 3.13 kernel
  • It deals much better with the parallelism of modern SSDs - basically a separate IO queue for each CPU
  • The best option for good SSDs right now
  • blk-mq with the NVMe driver is actually more than a scheduler: it is a system aimed at replacing the whole request layer

slide-18
SLIDE 18

Old approach to elevators

[Diagram: CPUs (a single CPU in one panel, CPU1 and CPU2 in the other) submitting requests through elevator queues to the disks.]

slide-19
SLIDE 19

New approach to elevators

[Diagram: CPU 1 to CPU 4 each have their own software (sw) queue; the sw queues feed two hardware (hw) queues in front of the disks.]
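
The picture above (four per-CPU software queues feeding two hardware queues) can be sketched as a simple mapping. This is a toy model that spreads CPUs evenly; the real mapping is decided by the kernel and the driver, and the function name map_sw_to_hw is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def map_sw_to_hw(num_cpus: int, num_hw_queues: int) -> Dict[int, List[int]]:
    """Spread per-CPU software queues evenly over the hardware queues.

    Returns {hw_queue_index: [cpu indexes whose sw queue feeds it]}.
    """
    mapping: Dict[int, List[int]] = defaultdict(list)
    for cpu in range(num_cpus):
        # Neighbouring CPUs share a hardware queue when there are fewer
        # hardware queues than CPUs.
        hw_queue = cpu * num_hw_queues // num_cpus
        mapping[hw_queue].append(cpu)
    return dict(mapping)


if __name__ == "__main__":
    # Four CPUs and two hardware queues, as in the diagram (0-based indexes).
    print(map_sw_to_hw(4, 2))
```

With as many hardware queues as CPUs, each CPU gets a queue of its own and no submission path is shared.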

slide-20
SLIDE 20

IO stack (with blk-mq)

[Diagram: the IO stack with blk-mq. Database memory, page cache, VFS and EXT4; Direct IO bypasses the page cache; below are the BIO layer (block IO), blk-mq with the Kyber/BFQ IO schedulers, the NVMe driver, and the disks.]

slide-21
SLIDE 21

Good diagram on Linux IO stack

  • https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
  • Regular updates
  • Some things are difficult to draw, but it is a complex topic

slide-22
SLIDE 22

Non-Volatile Memory Express or NVMe

  • A set of standards that helps to use modern SSDs more effectively
  • For Linux it is first of all the NVMe driver (or subsystem)
  • The most common examples of NVMe SSDs are PCIe NAND drives
  • With PCIe v.5 (currently v.3 is ready for production), NVMe drives can work at up to 32GB/sec
  • Are databases NVMe ready?

slide-23
SLIDE 23

Latest development on new block layer

  • IO polling
  • New IO schedulers Kyber and BFQ (Kernel 4.12)
  • IO tagging
  • Direct IO improvements

slide-24
SLIDE 24

Notes on Direct IO

  • Currently PostgreSQL supports Direct IO only for WAL, but it is unusable in practice
  • Requires a lot of development
  • Very OS specific
  • Allows the use of specific features, like O_ATOMIC
  • PostgreSQL is the only database that does not use Direct IO
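
One reason Direct IO "requires a lot of development" is that the application must do what the page cache did for it, starting with buffer alignment. A minimal Linux-only sketch of an O_DIRECT read follows. It assumes a 4096-byte alignment (the real requirement depends on the device and filesystem) and is not how any particular database does it; the helper names are hypothetical.

```python
import mmap
import os

# Assumed alignment; the real requirement depends on device and filesystem.
BLOCK = 4096


def aligned_buffer(nbytes: int, block: int = BLOCK) -> mmap.mmap:
    """Return an anonymous, page-aligned buffer sized to a multiple of block."""
    size = -(-max(nbytes, 1) // block) * block  # round up to a whole block
    return mmap.mmap(-1, size)


def read_direct(path: str, nbytes: int) -> bytes:
    """Read up to nbytes from the start of path, bypassing the page cache."""
    buf = aligned_buffer(nbytes)
    # os.O_DIRECT exists on Linux; some filesystems (e.g. tmpfs) reject it.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        got = os.readv(fd, [buf])  # fills the aligned buffer in place
        return buf[:min(got, nbytes)]
    finally:
        os.close(fd)
        buf.close()


if __name__ == "__main__":
    # Only the buffer helper is exercised here; read_direct needs a file on
    # a filesystem that supports O_DIRECT.
    print(len(aligned_buffer(5000)))
```

An ordinary bytes object gives no alignment guarantee, which is why the sketch borrows a page-aligned region from mmap.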

slide-25
SLIDE 25

Questions?

ik@dataegret.com