MySQL and ZFS, Yves Trudeau, Percona (PowerPoint presentation)

SLIDE 1

MySQL and ZFS

Yves Trudeau, Percona

SLIDE 2

Who am I?

  • Principal architect at Percona since 2009 (10 years already…)
  • With Sun Microsystems and MySQL before Percona
  • Physicist by training
  • I like to understand how things work
SLIDE 3

Why a talk on MySQL and ZFS?

  • I like both and I couldn’t decide…
  • They go well together
  • They have many points in common
SLIDE 4

Plan

  • A quick tour of ZFS
  • Configuration guidelines for MySQL/ZFS
  • Real-world examples
SLIDE 5

A tour of ZFS

SLIDE 6

ZFS Highlights

  • Developed by Sun for Solaris
  • Now available on many platforms
  • B-tree-like storage for file data, not just for directories
  • 128-bit addressing!!!
  • Files are split into records (the b-tree leaves)
  • Records can be compressed
  • Copy-On-Write
  • Native encryption
  • Checksums and self-healing
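The highlights above (records, compression, checksums, self-healing) can be illustrated with a small model. This is a sketch under assumptions, not the ZFS on-disk format: zlib stands in for the lz4/gzip compressors, SHA-256 stands in for the record checksum (ZFS supports sha256 but defaults to fletcher4), and the `Record` type and function names are hypothetical. A corrupt copy in a mirror is detected by its checksum and replaced by a good copy.

```python
from __future__ import annotations

import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    payload: bytes    # bytes as stored on "disk"
    compressed: bool  # True if payload is zlib-compressed
    checksum: str     # checksum of the stored payload


def _checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def write_records(data: bytes, recordsize: int) -> list[Record]:
    """Split a file into fixed-size records, compress each one, checksum it."""
    records = []
    for offset in range(0, len(data), recordsize):
        chunk = data[offset:offset + recordsize]
        packed = zlib.compress(chunk)
        # Keep the raw chunk when compression does not actually save space.
        compressed = len(packed) < len(chunk)
        payload = packed if compressed else chunk
        records.append(Record(payload, compressed, _checksum(payload)))
    return records


def read_record(copies: list[Record]) -> bytes:
    """Read one record from redundant copies (e.g. a mirror), healing bad ones."""
    good = next((r for r in copies if _checksum(r.payload) == r.checksum), None)
    if good is None:
        raise OSError("all copies failed checksum verification")
    for i, r in enumerate(copies):
        if _checksum(r.payload) != r.checksum:
            copies[i] = good  # "self-healing": replace the corrupt copy
    return zlib.decompress(good.payload) if good.compressed else good.payload


data = b"A" * 40000 + bytes(range(256)) * 10
records = write_records(data, 16 * 1024)
assert b"".join(read_record([r]) for r in records) == data
```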
SLIDE 7

ZPOOL

  • Base unit of storage
  • Made of block devices or even just files
  • Disks, files, LVs, mirrors of disks, striping, raidz, raidz2, raidz3…
  • Filesystems are created from a zpool
  • A server can have many zpools
  • SLOG: separate log device
  • Cache devices (L2ARC)
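As a rough sizing aid for the vdev types listed above, here is a hypothetical helper that estimates the usable capacity of a single vdev. It assumes `mirror` means one n-way mirror and ignores metadata, raidz padding and reserved space, so real pools report less.

```python
PARITY = {"stripe": 0, "raidz": 1, "raidz1": 1, "raidz2": 2, "raidz3": 3}


def usable_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Rough usable capacity of a single vdev, ignoring metadata and padding."""
    if disks < 1:
        raise ValueError("a vdev needs at least one disk")
    if layout == "mirror":
        return disk_tb  # every disk holds a full copy of the data
    if layout not in PARITY:
        raise ValueError(f"unknown layout: {layout}")
    data_disks = disks - PARITY[layout]
    if data_disks < 1:
        raise ValueError(f"{layout} needs more than {PARITY[layout]} disks")
    return data_disks * disk_tb


for layout in ("stripe", "mirror", "raidz", "raidz2", "raidz3"):
    print(f"{layout:7} 6 x 4TB -> {usable_tb(layout, 6, 4.0):.0f} TB usable")
```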
SLIDE 8

ZFS Filesystems

  • A filesystem is:
  • 1. a profile of settings
  • 2. a mount point
  • 3. a unit you can snapshot
  • Settings adapted to the expected workload
  • Can be nested
  • Can be based on a snapshot (clone)
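A minimal sketch of how nested filesystems inherit a profile of settings: a property set on a filesystem applies to its children unless a child sets it locally. The `Dataset` class, the `tank/...` names and the default values are hypothetical; on a real system `zfs get` reports each value together with its source (local, inherited or default).

```python
from __future__ import annotations


class Dataset:
    """Toy model of a ZFS filesystem whose settings are inherited by its children."""

    # Hypothetical default values, for illustration only.
    DEFAULTS = {"recordsize": "128K", "compression": "off", "atime": "on"}

    def __init__(self, name: str, parent: Dataset | None = None, **props: str):
        self.name = name
        self.parent = parent
        self.local = dict(props)  # settings made directly on this filesystem

    def get(self, prop: str) -> tuple[str, str]:
        """Return (value, source), similar in spirit to the VALUE/SOURCE columns of `zfs get`."""
        node = self
        while node is not None:
            if prop in node.local:
                source = "local" if node is self else f"inherited from {node.name}"
                return node.local[prop], source
            node = node.parent
        return self.DEFAULTS[prop], "default"


root = Dataset("tank")
vms = Dataset("tank/vms", parent=root, compression="lz4")
pxc1 = Dataset("tank/vms/pxc1", parent=vms, recordsize="16K")
for prop in ("recordsize", "compression", "atime"):
    print(prop, *pxc1.get(prop))
```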
SLIDE 9

ZVols

  • A block device backed by ZFS
  • Uber cool for virtual machine images
  • Steps for a 3-node cluster:
  • 1. Create a base image on a ZVol
  • 2. Snapshot the ZVol
  • 3. Clone the snapshot 3 times (yields 3 new ZVols)
  • 4. Start 3 VMs using the new ZVols

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/zvol/data/vms/kvm_PXC2'/>
  <target dev='vda' bus='virtio'/>  <!-- target device is an assumption -->
</disk>

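The four ZVol steps above map to a handful of `zfs` commands. The sketch below only builds the command lines (it does not run them); the 20G size, the `base`/`golden` names and the `kvm_PXC1..3` clone names are assumptions, chosen to match the `/dev/zvol/data/vms/kvm_PXC2` path in the libvirt snippet.

```python
import shlex


def zvol_cluster_plan(parent: str = "data/vms", size: str = "20G", nodes: int = 3):
    """Return the zfs commands for steps 1-3 and the device paths for step 4."""
    base = f"{parent}/base"          # hypothetical name for the golden image
    snapshot = f"{base}@golden"      # hypothetical snapshot name
    commands = [
        ["zfs", "create", "-V", size, base],  # 1. base image on a ZVol
        # ... install the OS and MySQL/PXC onto /dev/zvol/<base> here ...
        ["zfs", "snapshot", snapshot],        # 2. snapshot the ZVol
    ]
    devices = []
    for i in range(1, nodes + 1):
        clone = f"{parent}/kvm_PXC{i}"
        commands.append(["zfs", "clone", snapshot, clone])  # 3. one clone per node
        devices.append(f"/dev/zvol/{clone}")                # 4. disk for VM i
    return commands, devices


commands, devices = zvol_cluster_plan()
for cmd in commands:
    print(shlex.join(cmd))
print(devices)
```

Each path in `devices` would go into the `<source dev=.../>` element of one VM's disk definition.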

SLIDE 10

The COW Magic

  • ZFS never overwrites data in place
  • How does ZFS overwrite a record?
  • 1. Writes the new version somewhere else
  • 2. Re-points the references from the old record to the new one
  • 3. GC frees the old record (once no snapshot still references it)
  • Easy snapshots (think InnoDB MVCC)
  • Easy cloning
  • Wonderful for backups
  • Transactional!
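A toy copy-on-write store makes the three overwrite steps and the cheap snapshots concrete. It is a simplification under stated assumptions: blocks are tracked with reference counts, whereas ZFS uses a different, more scalable mechanism to decide when a block can be freed, and all names are hypothetical.

```python
from __future__ import annotations


class CowStore:
    """Toy copy-on-write record store with snapshots (not ZFS's real design)."""

    def __init__(self):
        self.blocks: dict[int, bytes] = {}   # physical block id -> contents
        self.refs: dict[int, int] = {}       # physical block id -> reference count
        self.live: dict[int, int] = {}       # record number -> block id (current view)
        self.snapshots: dict[str, dict[int, int]] = {}
        self._next_id = 0

    def _incref(self, bid: int) -> None:
        self.refs[bid] += 1

    def _decref(self, bid: int) -> None:
        self.refs[bid] -= 1
        if self.refs[bid] == 0:       # 3. nothing points here anymore: free it
            del self.refs[bid]
            del self.blocks[bid]

    def write(self, recno: int, data: bytes) -> None:
        bid = self._next_id            # 1. write the new version somewhere else
        self._next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        old = self.live.get(recno)
        self.live[recno] = bid         # 2. re-point the record to the new block
        if old is not None:
            self._decref(old)

    def snapshot(self, name: str) -> None:
        # A snapshot is a frozen copy of the pointers; no data is copied.
        self.snapshots[name] = dict(self.live)
        for bid in self.snapshots[name].values():
            self._incref(bid)

    def destroy_snapshot(self, name: str) -> None:
        for bid in self.snapshots.pop(name).values():
            self._decref(bid)

    def read(self, recno: int, snapshot: str | None = None) -> bytes:
        view = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[view[recno]]
```

After `write(0, b"v1")`, `snapshot("s")` and `write(0, b"v2")`, both versions stay readable; destroying the snapshot lets the old block be freed.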
SLIDE 11

ARC: Adaptive Replacement Cache

  • Sophisticated file cache
  • Configurable
  • Can store compressed data
  • Can be extended to disk (SSD/flash) → L2ARC
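For intuition about what "adaptive" means, here is the textbook ARC policy (Megiddo and Modha) on fixed-size entries. The ZFS ARC is derived from this algorithm but differs in many ways (variable-size buffers, separate metadata accounting, compressed buffers), so treat this as a sketch of the idea, not of the ZFS code. T1 holds entries seen once, T2 entries seen at least twice; the ghost lists B1/B2 remember recently evicted keys and steer the target size `p` of T1.

```python
from collections import OrderedDict


class ARC:
    """Textbook ARC over fixed-size entries (keys only, no payloads)."""

    def __init__(self, c: int):
        self.c = c                                       # capacity, in entries
        self.p = 0.0                                     # adaptive target size of T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # cached: seen once / seen 2+ times
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghosts: keys evicted from T1 / T2

    def _replace(self, key) -> None:
        """Evict one cached entry into the matching ghost list."""
        if self.t1 and (len(self.t1) > self.p or (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)  # LRU end of T1
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)  # LRU end of T2
            self.b2[old] = None

    def access(self, key) -> bool:
        """Record one access to `key`; return True on a cache hit."""
        if key in self.t1 or key in self.t2:      # hit: move to MRU end of T2
            self.t1.pop(key, None)
            self.t2.pop(key, None)
            self.t2[key] = None
            return True
        if key in self.b1:                        # ghost hit in B1: grow T1's target
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                        # ghost hit in B2: shrink T1's target
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room if needed, then insert at the MRU end of T1.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = None
        return False
```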
SLIDE 12

Kernel Modules

  • Many configuration parameters (ls /sys/module/zfs/parameters/)
  • Version 0.7.5 has 169…
  • Examples:

➔ zfs_arc_max: maximum size of the ARC
➔ zfs_arc_meta_limit: caps the amount of metadata in the ARC
➔ zfs_free_max_blocks: how fast the GC goes, i.e. max blocks freed per transaction group (think InnoDB purge batch)
➔ l2arc_write_max: how fast writes to the L2ARC are allowed
➔ zfs_txg_timeout: max time span of a transaction group, in seconds (think async writes)

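On Linux the parameters can also be inspected from Python. A minimal sketch, assuming the standard sysfs location `/sys/module/zfs/parameters`; parameter names differ between ZFS versions, so missing ones are reported rather than assumed.

```python
from __future__ import annotations

from pathlib import Path


def read_zfs_parameters(root: str = "/sys/module/zfs/parameters") -> dict[str, str]:
    """Return {parameter name: current value} for every readable module parameter."""
    params: dict[str, str] = {}
    directory = Path(root)
    if not directory.is_dir():
        return params  # ZFS module not loaded (or not Linux)
    for entry in sorted(directory.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # some entries may not be readable
    return params


if __name__ == "__main__":
    params = read_zfs_parameters()
    print(f"{len(params)} parameters")
    for name in ("zfs_arc_max", "zfs_arc_meta_limit", "zfs_free_max_blocks",
                 "l2arc_write_max", "zfs_txg_timeout"):
        print(name, params.get(name, "<not present in this version>"))
```

Values can be changed at runtime by writing to these files as root, and made persistent with an `options zfs zfs_arc_max=...` line in a file under `/etc/modprobe.d/`.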
SLIDE 13

Configuration Guidelines for MySQL/ZFS

SLIDE 14

When Should You Use MySQL/ZFS?

  • For large compressible datasets
  • Backups are a challenge (mix of engines)
  • Spare CPU capacity (for compression)
  • Not I/O bound
  • Active dataset fits in the L2ARC (compressed)
  • To save your flash devices...
SLIDE 15

ZFS Configuration

  • Two filesystems for easy snapshots

➔ /var/lib/mysql → the parent, configured for sequential ops

✔ recordsize = 128KB
✔ compression can be more aggressive (gzip)

➔ /var/lib/mysql/data → the dataset

✔ recordsize = InnoDB page size (likely 16KB)
✔ fast compressor like lz4

  • Cache devices (L2ARC) are great
  • SLOG devices help with high durability requirements
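The two-filesystem layout and the cache/log devices can be expressed as `zfs`/`zpool` commands. The sketch builds the commands without running them; the pool name `tank` and the device paths are hypothetical. The child filesystem inherits its mount point, so `tank/mysql/data` lands on `/var/lib/mysql/data`.

```python
import shlex


def mysql_zfs_commands(pool="tank", page_size="16k", cache_dev=None, log_dev=None):
    """Build (but do not run) the zfs/zpool commands for the layout above."""
    commands = [
        # Parent: redo logs, binary logs, relay logs... mostly sequential I/O.
        ["zfs", "create", "-o", "mountpoint=/var/lib/mysql",
         "-o", "recordsize=128k", "-o", "compression=gzip", f"{pool}/mysql"],
        # Child: InnoDB data files; mount point inherited as /var/lib/mysql/data.
        ["zfs", "create", "-o", f"recordsize={page_size}",
         "-o", "compression=lz4", f"{pool}/mysql/data"],
    ]
    if cache_dev:
        commands.append(["zpool", "add", pool, "cache", cache_dev])  # L2ARC
    if log_dev:
        commands.append(["zpool", "add", pool, "log", log_dev])      # SLOG
    return commands


for cmd in mysql_zfs_commands(cache_dev="/dev/nvme0n1"):
    print(shlex.join(cmd))
```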
SLIDE 16

MySQL Configuration

  • innodb_doublewrite = 0 (ZFS copy-on-write already prevents torn pages)
  • innodb_flush_method = O_DIRECT?
  • InnoDB buffer pool? Leave some RAM for the ARC

➔ No L2ARC → target an ARC of ~0.5% of the dataset
➔ 1TB of data ~ 5GB ARC
➔ Not a hard rule

  • datadir = /var/lib/mysql/data
  • innodb_log_group_home_dir, log-bin, slow-log, relay-log to /var/lib/mysql
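The sizing rule and the file placement above can be turned into configuration text. A sketch under assumptions: it uses the 0.5% target as a hard `zfs_arc_max` cap (one possible reading, since the rule is "not a hard rule"), and the log file names are hypothetical.

```python
def arc_target_bytes(dataset_bytes: int) -> int:
    """Rule of thumb without L2ARC: ARC of ~0.5% (1/200) of the dataset."""
    return dataset_bytes // 200


def render_configs(dataset_bytes: int):
    """Return (modprobe line, my.cnf fragment) for the layout above."""
    arc = arc_target_bytes(dataset_bytes)
    modprobe = f"options zfs zfs_arc_max={arc}\n"  # e.g. in /etc/modprobe.d/zfs.conf
    my_cnf = "\n".join([
        "[mysqld]",
        "datadir = /var/lib/mysql/data",
        "innodb_log_group_home_dir = /var/lib/mysql",
        "log-bin = /var/lib/mysql/mysql-bin",
        "relay-log = /var/lib/mysql/relay-bin",
        "slow_query_log_file = /var/lib/mysql/slow.log",
        "innodb_doublewrite = 0",
        "# innodb_flush_method = O_DIRECT  # open question: benchmark it",
    ]) + "\n"
    return modprobe, my_cnf


modprobe, my_cnf = render_configs(10**12)  # 1TB dataset
print(modprobe + my_cnf)
```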
SLIDE 17

Real-World Examples

SLIDE 18

A DR MySQL Replica in Google Cloud

XFS

  • n1-standard-2 (~$68/month)
  • 1TB SSD (~$175/month)

Total: $243/month

ZFS

  • n1-standard-2 (~$68/month)
  • local 375GB NVMe ($30/month)
  • 500GB standard disk ($20/month)

Total: $118/month

Dataset: 700GB (2.5x compressible), a fair amount of replication traffic, the whole dataset is active (random primary keys)

ZFS saves $125/month
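The slide's arithmetic, checked in a few lines. The compressed-size line assumes the 2.5x ratio applies to the whole 700GB dataset, which would let it fit on the 375GB local NVMe device (presumably used as L2ARC).

```python
def monthly_total(items: dict) -> float:
    """Sum the monthly cost of each component."""
    return sum(items.values())


xfs = {"n1-standard-2": 68, "1TB SSD": 175}
zfs = {"n1-standard-2": 68, "local 375GB NVMe": 30, "500GB standard disk": 20}

savings = monthly_total(xfs) - monthly_total(zfs)
compressed_gb = 700 / 2.5  # dataset size divided by the compression ratio

print(f"XFS ${monthly_total(xfs)}/month, ZFS ${monthly_total(zfs)}/month, saves ${savings}")
print(f"compressed dataset ~{compressed_gb:.0f}GB vs 375GB local NVMe")
```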

SLIDE 19

A PXC Cluster in AWS

XFS/i3

  • 3x i3.4xlarge: $2700/month

XFS/EBS/io1

  • 3x r5.2xlarge: $1080/month
  • 3x 3TB with 20k provisioned IOPS: $3900/month

ZFS/i3

  • 3x i3.2xlarge: $1350/month
  • 2TB sc1: $50/month

Dataset: 2TB (2.5x compressible), needs more than 20k IOPS

ZFS saves $1300/month compared to XFS/i3
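Same check for the AWS cluster. The $1300/month saving is relative to the XFS/i3 option; against the XFS/EBS/io1 option the gap is larger.

```python
options = {
    "XFS/i3": {"3x i3.4xlarge": 2700},
    "XFS/EBS/io1": {"3x r5.2xlarge": 1080, "3x 3TB io1, 20k IOPS": 3900},
    "ZFS/i3": {"3x i3.2xlarge": 1350, "2TB sc1": 50},
}

# Monthly total per option, then the saving of ZFS/i3 against each one.
totals = {name: sum(parts.values()) for name, parts in options.items()}
for name, total in totals.items():
    print(f"{name:12} ${total}/month, ZFS/i3 saves ${total - totals['ZFS/i3']}")
```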

SLIDE 20

Will ZFS Really Perform Well?

Sysbench TPC-C workload emulation, GCE n1-standard-2 with a local 375GB disk, scale 300, 2 threads

XFS

  • 110 trx/s
  • 3100 QPS
  • 284 GB on disk
  • 76% used

ZFS/lz4

  • 69 trx/s
  • 1954 QPS
  • 102 GB on disk
  • 39% used

ZFS/gzip

  • 59 trx/s
  • 1551 QPS
  • 85 GB on disk
  • 26% used
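Relative throughput and on-disk footprint, computed from the numbers above with XFS as the baseline.

```python
# (trx/s, QPS, GB on disk) from the benchmark above
results = {
    "XFS": (110, 3100, 284),
    "ZFS/lz4": (69, 1954, 102),
    "ZFS/gzip": (59, 1551, 85),
}


def compare(results, baseline="XFS"):
    """Return {name: (throughput relative to baseline, disk space saving factor)}."""
    base_trx, _, base_gb = results[baseline]
    return {name: (trx / base_trx, base_gb / gb) for name, (trx, _, gb) in results.items()}


for name, (rel_trx, space) in compare(results).items():
    print(f"{name:9} {rel_trx:4.0%} of XFS throughput, {space:.2f}x less disk space")
```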
SLIDE 21

Will ZFS Really Perform Well With L2ARC?

Sysbench TPC-C workload emulation, GCE n1-standard-2 with a 500GB standard disk and a 375GB local disk, scale 300, 2 threads

XFS

  • 3 trx/s
  • 87 QPS
  • 284 GB on disk
  • 70% used

ZFS/lz4/L2ARC

  • 29 trx/s (L2ARC warm)
  • 830 QPS
  • 102 GB on disk
  • 21% used
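With a warm L2ARC the picture reverses. A quick computation of the speedup, plus a check that the compressed footprint fits on the 375GB local disk (assuming that disk serves as the L2ARC device).

```python
xfs = {"trx_s": 3, "qps": 87, "disk_gb": 284}
zfs_l2arc = {"trx_s": 29, "qps": 830, "disk_gb": 102}
L2ARC_GB = 375  # local disk size; assumed to be the L2ARC device


def speedup(new, old, metric):
    """Ratio of a metric between two benchmark runs."""
    return new[metric] / old[metric]


print(f"trx/s: {speedup(zfs_l2arc, xfs, 'trx_s'):.1f}x, "
      f"QPS: {speedup(zfs_l2arc, xfs, 'qps'):.1f}x")
print("compressed data fits in L2ARC:", zfs_l2arc["disk_gb"] <= L2ARC_GB)
```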
SLIDE 22

Conclusion

  • MySQL and ZFS are great together
  • Try it, it is pretty easy
  • Careful, you’ll get addicted
SLIDE 23

Thank You to Our Sponsors

SLIDE 24

Rate My Session