Continuous MySQL Restores
Divij Rajkumar (divij@fb.com)
Production Engineer, MySQL Infrastructure, Facebook
Continuous Restores – Why?
- Verify backup integrity
  - Haven’t tested your backups? You don’t have them
- Understand resource requirements for restores
  - Time, server capacity, network capacity
- Test restore orchestration
  - Don’t panic during a disaster event
Backups
- Everything, every day
- Tens of thousands of servers, many, many shards
- Stored in HDFS
- 3 types:
  - Full dumps
  - Differential backups
  - Binary Logs

[Diagram: one server runs multiple mysqld instances, each holding many shards; e.g. Shard 86 holds rows for User 1 .. User 1000]
Backups – Full dumps
- mysqldump
  - --single-transaction
  - --set-gtid-purged=COMMENTED, stored in metadata
  - github.com/facebook/mysql-5.6
- Per shard, not per instance
  - Shards aren’t pinned to an instance
  - Easier to locate
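As a concrete sketch of that invocation in Python: the shard name and output path are made up, and --set-gtid-purged=COMMENTED exists only in the facebook/mysql-5.6 fork (stock mysqldump accepts ON/OFF/AUTO).

```python
import subprocess

def dump_shard_cmd(shard, out_path):
    """Build the per-shard mysqldump invocation (illustrative names)."""
    return [
        "mysqldump",
        "--single-transaction",         # consistent snapshot without locking tables
        "--set-gtid-purged=COMMENTED",  # record purged GTIDs as a comment, kept in metadata
        "--databases", shard,           # dump is per shard, not per instance
        "--result-file=" + out_path,
    ]

# To actually run it (requires a live server):
# subprocess.run(dump_shard_cmd("shard_86", "/backups/shard_86.sql"), check=True)
```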
Backups – Full dumps
- Logical, not physical
  - Easy single-table restore
  - No indexes backed up, reduced size
  - Easy to debug
  - Better compression
- Every 5 days
Backups – Differential backups
- Diff between 2 full dumps
  - 2 files generated: rows deleted, rows inserted
- Based off of existing dump in HDFS
- Take full dump -> compute diff -> upload diff
- Possible because of logical full dumps
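A toy sketch of how such a diff could be computed: a single merge pass over two consistently sorted dumps, producing the rows-deleted and rows-inserted outputs. The real DifferentialBackup works on mysqldump output per table, and the function name here is invented.

```python
def compute_diff(old_rows, new_rows):
    """Diff two sorted row streams into (deleted, inserted) lists.

    Relies on both dumps being consistently sorted, which is why the
    collation change described in the war stories broke diff backups.
    """
    deleted, inserted = [], []
    i = j = 0
    while i < len(old_rows) and j < len(new_rows):
        if old_rows[i] == new_rows[j]:
            i += 1; j += 1
        elif old_rows[i] < new_rows[j]:
            deleted.append(old_rows[i]); i += 1   # row vanished from new dump
        else:
            inserted.append(new_rows[j]); j += 1  # row appeared in new dump
    deleted.extend(old_rows[i:])
    inserted.extend(new_rows[j:])
    return deleted, inserted
```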
Backups – Binary Logs
- Raw binlogs from mysqld
- Record Previous-GTIDs in metadata
  - Using mysqlbinlog:
    - --start-position=4
    - --stop-position=121
- Continuously uploaded
- Binlog Server as the interface
Restore Infrastructure
[Diagram: CRT sends restore requests to Warchief; Warchief schedules jobs through ORC (backed by the ORC DB); peons sync with ORC, and each peon manages a MySQL instance]
Restore Infrastructure - Peons
[Diagram: one server hosts multiple peons; each peon manages a mysqld instance containing several restored databases]
- Python process
- Exposes Thrift interface
- Manages MySQL instance
- Multiple peons per host
- Work on restore jobs
  - 1 database == 1 job
Restore Jobs – State Machine
1. SELECT – Select backup to restore
2. DOWNLOAD – Download backup to disk
3. LOAD – Load tables in parallel
4. VERIFY – Verify loaded full/diff dump
5. REPLAY – Replay binlogs
+ FAILED states
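The state machine can be sketched as a small transition table. This is a hypothetical sketch; real peons track far more per-job metadata, but the FAILED variants per state come straight from the deck.

```python
# Each state advances to the next stage on success, or to its FAILED state.
TRANSITIONS = {
    "SELECT":   ("DOWNLOAD", "SELECT_FAILED"),
    "DOWNLOAD": ("LOAD",     "DOWNLOAD_FAILED"),
    "LOAD":     ("VERIFY",   "LOAD_FAILED"),
    "VERIFY":   ("REPLAY",   "VERIFY_FAILED"),
    "REPLAY":   ("DONE",     "REPLAY_FAILED"),
}

def advance(state, ok=True):
    """Move a restore job to its next state, or to its FAILED state."""
    nxt, failed = TRANSITIONS[state]
    return nxt if ok else failed
```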
Restore Jobs – mysqldump index
- Custom index added to mysqldump output
- Locate tables using per-table byte offsets
- Preserved across diff backups
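A minimal sketch of such an index. The real index format is custom and added to the mysqldump output itself, so keying on INSERT INTO lines here is purely illustrative.

```python
def build_table_index(dump_path):
    """Map table name -> byte offset of its first data line in a dump.

    With this index a restore job can seek straight to one table's
    data instead of scanning the whole dump.
    """
    index = {}
    with open(dump_path, "rb") as f:
        offset = 0
        for line in f:
            if line.startswith(b"INSERT INTO "):
                table = line.split(b"`")[1].decode()
                index.setdefault(table, offset)  # first offset wins
            offset += len(line)
    return index
```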
Restore Jobs – SELECT
- Pre-defined search range per job
  - 1 day == restore most recent backup only
- Use metadata to ignore “bad” backups
- Only full/diff backups selected here
Restore Jobs – DOWNLOAD
- For full dumps:
  - Download mysqldump from HDFS to disk
- For diff backups:
  - Stream diffs + full dump to perform 3-way merge
  - Store recreated mysqldump to disk
- Single mysqldump output at the end
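The merge in the diff path can be sketched like this, with in-memory lists standing in for the three streamed inputs (the real job streams the base dump and both diff files from HDFS):

```python
def apply_diff(base_rows, deleted, inserted):
    """Recreate a full dump from a base dump plus a diff backup.

    The diff backup is two row sets: rows deleted and rows inserted.
    The recreated dump is (base - deleted) + inserted, kept sorted so
    it looks like a freshly taken logical dump.
    """
    dropped = set(deleted)
    merged = [r for r in base_rows if r not in dropped]
    merged.extend(inserted)
    merged.sort()
    return merged
```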
Restore Jobs – LOAD
- Split backup into per-table streams using index
- Load tables in parallel
Restore Jobs – VERIFY
- Perform sanity checks on loaded data
- Possible verification methods:
  - Compare checksums
  - Compare number of tables loaded
- Best verification: replay binlogs
Restore Jobs – REPLAY
- Replay X seconds/minutes/hours worth of binlogs
- Enables point in time restores
- Verify loaded data can be written to
Restore Jobs – State Machine
[Diagram: SELECT -> DOWNLOAD -> LOAD (fanning out into parallel LOAD TABLE tasks) -> VERIFY -> REPLAY]
Restore Jobs – Pipeline
[Diagram: pipeline stages SELECT, DOWNLOAD, LOAD, LOAD TABLE, VERIFY, REPLAY, each with a fixed number of slots (SLOT 1–5); database jobs and per-table tasks occupy slots as they move through the states]
Restore Jobs – Pipeline
- Allow processing many jobs concurrently
- Peon has 1 handler per job state
  - LOAD -> LOAD + LOAD_TABLE
- Limit per-state concurrency
  - Slot capacity can be tuned
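Per-state slot limits can be sketched with one bounded semaphore per state; the slot counts below are hypothetical stand-ins for the tunable capacities.

```python
import threading

# Hypothetical per-state slot capacities (the real values are tunable).
SLOTS = {"SELECT": 5, "DOWNLOAD": 2, "LOAD": 2, "LOAD_TABLE": 8,
         "VERIFY": 2, "REPLAY": 2}
_sems = {state: threading.BoundedSemaphore(n) for state, n in SLOTS.items()}

def run_in_state(state, fn, *args):
    """Run one job's handler for a state, blocking until a slot frees up.

    Many jobs stay in flight across different states, but no single
    state can exceed its slot capacity.
    """
    with _sems[state]:
        return fn(*args)
```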
Restore Jobs – Selecting Binlogs
- Full/diff backups can be taken from slaves
- Binlogs always taken from master
- Need to uniquely identify transactions…
- GTIDs!
Restore Jobs – Selecting Binlogs
- Use GTIDs purged from mysqldump:
  - --set-gtid-purged=COMMENTED
- Use GTIDs purged from mysqlbinlog:
  - --start-position=4
  - --stop-position=121
- These are GTIDs purged, not GTIDs contained
Restore Jobs – Selecting Binlogs
- GTIDs purged from LOAD state (dump_gtids)
- GTIDs purged in binlogs (binlog_gtids)
[Diagram: three consecutive binlogs containing GTIDs 451-529, 530-774, and 775-?; labels mark the first binlog whose purged set is a superset of dump_gtids, and the first binlog to replay]
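One plausible reading of that selection rule, sketched with sets of integers standing in for GTID sets; the function name and the "one binlog before the first superset" interpretation are mine, not stated in the deck.

```python
def first_binlog_to_replay(dump_gtids, binlog_prev_gtids):
    """Pick the binlog where replay starts.

    dump_gtids:        GTIDs already contained in the restored dump
    binlog_prev_gtids: each binlog's purged (Previous-GTIDs) set, oldest first

    The first binlog whose purged set is a superset of dump_gtids holds
    only transactions newer than the dump, so replay begins one binlog
    earlier; transactions in that binlog already covered by dump_gtids
    must still be skipped during replay.
    """
    for i, prev in enumerate(binlog_prev_gtids):
        if dump_gtids <= prev:            # first superset
            return max(i - 1, 0)          # first binlog to replay
    return len(binlog_prev_gtids) - 1     # dump point lies in the newest binlog
```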
Restore Jobs – Replaying Binlogs
- Filter binlog events with mysqlbinlog:
  - --database
  - --skip-gtids
  - --skip-empty-trans
  - github.com/facebook/mysql-5.6
- Last transaction to replay:
  - --stop-datetime
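The replay invocation can be sketched as argv construction; paths and names are illustrative, and --skip-empty-trans comes from the facebook/mysql-5.6 fork rather than stock mysqlbinlog.

```python
def replay_binlog_cmd(binlog_path, database, stop_datetime):
    """Build the mysqlbinlog invocation for filtered replay (illustrative)."""
    return [
        "mysqlbinlog",
        "--database=" + database,            # only this database's events
        "--skip-gtids",                      # strip GTID events so they re-execute
        "--skip-empty-trans",                # drop transactions emptied by filtering
        "--stop-datetime=" + stop_datetime,  # bound for point-in-time restore
        binlog_path,
    ]
```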
Restore Job – Lifecycle
[Diagram: db100 moving through the pipeline slots: SELECT -> DOWNLOAD -> LOAD (tables t1, t2, t3 in parallel LOAD TABLE slots) -> VERIFY -> REPLAY]
CRT
CRT – Continuous Restore Tier
- Create restore jobs for shards
  - Periodically poll backup metadata to find backups
  - Thrift call to Warchief to issue restore job
- Monitor restore progress
  - Peons expose job and health stats via Thrift
  - Detailed stats written to Scuba
CRT – Continuous Restore Tier
- Failure handling
  - Retry on transient errors
  - Mark “unstable”
  - Notify oncall
- Orchestration component
  - Resolve dependency graph
  - Request restores in order of dependency
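Requesting restores in dependency order boils down to a topological sort; here is a sketch using Kahn's algorithm over a hypothetical dependency map (shard names invented).

```python
from collections import deque

def restore_order(deps):
    """Order restores so each shard's dependencies restore first.

    deps maps a shard to the shards it depends on.
    Raises on cycles, which a real orchestrator must also reject.
    """
    pending = {s: set(d) for s, d in deps.items()}
    ready = deque(sorted(s for s, d in pending.items() if not d))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t, d in pending.items():
            if s in d:
                d.remove(s)
                if not d:
                    ready.append(t)
    if len(order) != len(pending):
        raise ValueError("dependency cycle")
    return order
```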
On-demand Restores
“oops, I accidentally ran UPDATE without WHERE”
“oops, I thought I was on my dev database”
“oops”
[Diagram: the same restore infrastructure, now with a UI/CLI issuing on-demand restore requests to Warchief alongside CRT]
Discover Edge Cases
War Stories
War Stories – Collation Change
- COLLATE=latin1_bin
  - Required for MyRocks
- Schema change deployed
- Most diff backup restores fail overnight
  - Thanks, CRT monitoring!
200x increase in DOWNLOAD_FAILED!
War Stories – Collation Change
- DifferentialBackup used default collation
  - Sorting order broken in diff files
  - Consistent sorting is required for 3-way merge
- 3-way merges failed for affected backups
- Wouldn’t have been caught without continuous restores
- DifferentialBackup fixed to understand table collation
War Stories – RBR Binlogs
- Row-based replication deployed
- Lots of REPLAY failures overnight
  “The database used for the current transaction has changed since BEGIN. This is not supported!”
- 2 common patterns:
  - Shards belonged to same database tier
  - Failure only happened after master promotion
War Stories – RBR Binlogs
- SBR binlogs:
  - Each BEGIN contains session database
- RBR binlogs:
  - On master, each BEGIN contains session database
  - On slave, BEGIN has no session database
War Stories – RBR Binlogs
- With mysqlbinlog --skip-gtids --skip-empty-trans
  - Expected: session databases don’t change across events
- Binlogs not rotated after master promotion
  - For some part of binlog, instance is a slave
  - For some part of binlog, session database is empty
  - For some part of binlog, session databases don’t match
“The database used for the current transaction has changed since BEGIN. This is not supported!”
War Stories – RBR Binlogs
- Possible solutions:
  - Rotate binlogs during promotion
  - Fix MySQL server behaviour
  - Patch mysqlbinlog
  - ???