Continuous MySQL Restores
Divij Rajkumar (divij@fb.com)
Production Engineer, MySQL Infrastructure, Facebook
Continuous Restores – Why?
- Verify backup integrity
  - Haven’t tested your backups? You don’t have them
- Understand resource requirements for restores
  - Time, server capacity, network capacity
- Test restore orchestration
  - Don’t panic during a disaster event
Backups
- Everything, every day
- Tens of thousands of servers, many, many shards
- Stored in HDFS
- 3 types:
  - Full dumps
  - Differential backups
  - Binary Logs

[Diagram: one server runs multiple mysqld instances, each holding many shards; e.g. Shard 86 holds rows for User 1 .. User 1000]
Backups – Full dumps
- mysqldump
  - --single-transaction
  - --set-gtid-purged=COMMENTED, stored in metadata
  - github.com/facebook/mysql-5.6
- Per shard, not per instance
  - Shards aren’t pinned to an instance
  - Easier to locate
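As a concrete sketch of that invocation in Python: the shard name and output path are made up, and --set-gtid-purged=COMMENTED exists only in the facebook/mysql-5.6 fork (stock mysqldump accepts ON/OFF/AUTO).

```python
import subprocess

def dump_shard_cmd(shard, out_path):
    """Build the per-shard mysqldump invocation (illustrative names)."""
    return [
        "mysqldump",
        "--single-transaction",         # consistent snapshot without locking tables
        "--set-gtid-purged=COMMENTED",  # record purged GTIDs as a comment, kept in metadata
        "--databases", shard,           # dump is per shard, not per instance
        "--result-file=" + out_path,
    ]

# To actually run it (requires a live server):
# subprocess.run(dump_shard_cmd("shard_86", "/backups/shard_86.sql"), check=True)
```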
Backups – Full dumps
- Logical, not physical
  - Easy single-table restore
  - No indexes backed up, reduced size
  - Easy to debug
  - Better compression
- Every 5 days
Backups – Differential backups
- Diff between 2 full dumps
  - 2 files generated: rows deleted, rows inserted
- Based off of existing dump in HDFS
- Take full dump -> compute diff -> upload diff
- Possible because of logical full dumps
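A toy sketch of how such a diff could be computed: a single merge pass over two consistently sorted dumps, producing the rows-deleted and rows-inserted outputs. The real DifferentialBackup works on mysqldump output per table, and the function name here is invented.

```python
def compute_diff(old_rows, new_rows):
    """Diff two sorted row streams into (deleted, inserted) lists.

    Relies on both dumps being consistently sorted, which is why the
    collation change described in the war stories broke diff backups.
    """
    deleted, inserted = [], []
    i = j = 0
    while i < len(old_rows) and j < len(new_rows):
        if old_rows[i] == new_rows[j]:
            i += 1; j += 1
        elif old_rows[i] < new_rows[j]:
            deleted.append(old_rows[i]); i += 1   # row vanished from new dump
        else:
            inserted.append(new_rows[j]); j += 1  # row appeared in new dump
    deleted.extend(old_rows[i:])
    inserted.extend(new_rows[j:])
    return deleted, inserted
```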
Backups – Binary Logs
- Raw binlogs from mysqld
- Record Previous-GTIDs in metadata
  - Using mysqlbinlog:
    - --start-position=4
    - --stop-position=121
- Continuously uploaded
- Binlog Server as the interface
Restore Infrastructure
[Diagram: CRT sends restore requests to Warchief; Warchief schedules jobs through ORC (backed by the ORC DB); peons sync with ORC, and each peon manages a MySQL instance]
Restore Infrastructure - Peons
[Diagram: one server hosts multiple peons; each peon manages a mysqld instance containing several restored databases]
- Python process
- Exposes Thrift interface
- Manages MySQL instance
- Multiple peons per host
- Work on restore jobs
  - 1 database == 1 job
Restore Jobs – State Machine
1. SELECT – Select backup to restore
2. DOWNLOAD – Download backup to disk
3. LOAD – Load tables in parallel
4. VERIFY – Verify loaded full/diff dump
5. REPLAY – Replay binlogs
+ FAILED states
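The state machine can be sketched as a small transition table. This is a hypothetical sketch; real peons track far more per-job metadata, but the FAILED variants per state come straight from the deck.

```python
# Each state advances to the next stage on success, or to its FAILED state.
TRANSITIONS = {
    "SELECT":   ("DOWNLOAD", "SELECT_FAILED"),
    "DOWNLOAD": ("LOAD",     "DOWNLOAD_FAILED"),
    "LOAD":     ("VERIFY",   "LOAD_FAILED"),
    "VERIFY":   ("REPLAY",   "VERIFY_FAILED"),
    "REPLAY":   ("DONE",     "REPLAY_FAILED"),
}

def advance(state, ok=True):
    """Move a restore job to its next state, or to its FAILED state."""
    nxt, failed = TRANSITIONS[state]
    return nxt if ok else failed
```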
Restore Jobs – mysqldump index
- Custom index added to mysqldump output
- Locate tables using per-table byte offsets
- Preserved across diff backups
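A minimal sketch of such an index. The real index format is custom and added to the mysqldump output itself, so keying on INSERT INTO lines here is purely illustrative.

```python
def build_table_index(dump_path):
    """Map table name -> byte offset of its first data line in a dump.

    With this index a restore job can seek straight to one table's
    data instead of scanning the whole dump.
    """
    index = {}
    with open(dump_path, "rb") as f:
        offset = 0
        for line in f:
            if line.startswith(b"INSERT INTO "):
                table = line.split(b"`")[1].decode()
                index.setdefault(table, offset)  # first offset wins
            offset += len(line)
    return index
```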
Restore Jobs – SELECT
- Pre-defined search range per job
  - 1 day == restore most recent backup only
- Use metadata to ignore “bad” backups
- Only full/diff backups selected here
Restore Jobs – DOWNLOAD
- For full dumps:
  - Download mysqldump from HDFS to disk
- For diff backups:
  - Stream diffs + full dump to perform 3-way merge
  - Store recreated mysqldump to disk
- Single mysqldump output at the end
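The merge in the diff path can be sketched like this, with in-memory lists standing in for the three streamed inputs (the real job streams the base dump and both diff files from HDFS):

```python
def apply_diff(base_rows, deleted, inserted):
    """Recreate a full dump from a base dump plus a diff backup.

    The diff backup is two row sets: rows deleted and rows inserted.
    The recreated dump is (base - deleted) + inserted, kept sorted so
    it looks like a freshly taken logical dump.
    """
    dropped = set(deleted)
    merged = [r for r in base_rows if r not in dropped]
    merged.extend(inserted)
    merged.sort()
    return merged
```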
Restore Jobs – LOAD
- Split backup into per-table streams using index
- Load tables in parallel
Restore Jobs – VERIFY
- Perform sanity checks on loaded data
- Possible verification methods:
  - Compare checksums
  - Compare number of tables loaded
- Best verification: replay binlogs
Restore Jobs – REPLAY
- Replay X seconds/minutes/hours worth of binlogs
- Enables point in time restores
- Verify loaded data can be written to
Restore Jobs – State Machine
[Diagram: SELECT -> DOWNLOAD -> LOAD (fanning out into parallel LOAD TABLE tasks) -> VERIFY -> REPLAY]
Restore Jobs – Pipeline
[Diagram: pipeline stages SELECT, DOWNLOAD, LOAD, LOAD TABLE, VERIFY, REPLAY, each with a fixed number of slots (SLOT 1–5); database jobs and per-table tasks occupy slots as they move through the states]
Restore Jobs – Pipeline
- Allow processing many jobs concurrently
- Peon has 1 handler per job state
  - LOAD -> LOAD + LOAD_TABLE
- Limit per-state concurrency
  - Slot capacity can be tuned
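Per-state slot limits can be sketched with one bounded semaphore per state; the slot counts below are hypothetical stand-ins for the tunable capacities.

```python
import threading

# Hypothetical per-state slot capacities (the real values are tunable).
SLOTS = {"SELECT": 5, "DOWNLOAD": 2, "LOAD": 2, "LOAD_TABLE": 8,
         "VERIFY": 2, "REPLAY": 2}
_sems = {state: threading.BoundedSemaphore(n) for state, n in SLOTS.items()}

def run_in_state(state, fn, *args):
    """Run one job's handler for a state, blocking until a slot frees up.

    Many jobs stay in flight across different states, but no single
    state can exceed its slot capacity.
    """
    with _sems[state]:
        return fn(*args)
```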
Restore Jobs – Selecting Binlogs
- Full/diff backups can be taken from slaves
- Binlogs always taken from master
- Need to uniquely identify transactions…
- GTIDs!
Restore Jobs – Selecting Binlogs
- Use GTIDs purged from mysqldump:
  - --set-gtid-purged=COMMENTED
- Use GTIDs purged from mysqlbinlog:
  - --start-position=4
  - --stop-position=121
- These are GTIDs purged, not GTIDs contained
Restore Jobs – Selecting Binlogs
- GTIDs purged from LOAD state (dump_gtids)
- GTIDs purged in binlogs (binlog_gtids)
[Diagram: three consecutive binlogs containing GTIDs 451-529, 530-774, and 775-?; labels mark the first binlog whose purged set is a superset of dump_gtids, and the first binlog to replay]
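One plausible reading of that selection rule, sketched with sets of integers standing in for GTID sets; the function name and the "one binlog before the first superset" interpretation are mine, not stated in the deck.

```python
def first_binlog_to_replay(dump_gtids, binlog_prev_gtids):
    """Pick the binlog where replay starts.

    dump_gtids:        GTIDs already contained in the restored dump
    binlog_prev_gtids: each binlog's purged (Previous-GTIDs) set, oldest first

    The first binlog whose purged set is a superset of dump_gtids holds
    only transactions newer than the dump, so replay begins one binlog
    earlier; transactions in that binlog already covered by dump_gtids
    must still be skipped during replay.
    """
    for i, prev in enumerate(binlog_prev_gtids):
        if dump_gtids <= prev:            # first superset
            return max(i - 1, 0)          # first binlog to replay
    return len(binlog_prev_gtids) - 1     # dump point lies in the newest binlog
```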
Restore Jobs – Replaying Binlogs
- Filter binlog events with mysqlbinlog:
  - --database
  - --skip-gtids
  - --skip-empty-trans
  - github.com/facebook/mysql-5.6
- Last transaction to replay:
  - --stop-datetime
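The replay invocation can be sketched as argv construction; paths and names are illustrative, and --skip-empty-trans comes from the facebook/mysql-5.6 fork rather than stock mysqlbinlog.

```python
def replay_binlog_cmd(binlog_path, database, stop_datetime):
    """Build the mysqlbinlog invocation for filtered replay (illustrative)."""
    return [
        "mysqlbinlog",
        "--database=" + database,            # only this database's events
        "--skip-gtids",                      # strip GTID events so they re-execute
        "--skip-empty-trans",                # drop transactions emptied by filtering
        "--stop-datetime=" + stop_datetime,  # bound for point-in-time restore
        binlog_path,
    ]
```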
Restore Job – Lifecycle
[Diagram: db100 moving through the pipeline slots: SELECT -> DOWNLOAD -> LOAD (tables t1, t2, t3 in parallel LOAD TABLE slots) -> VERIFY -> REPLAY]
CRT
CRT – Continuous Restore Tier
- Create restore jobs for shards
  - Periodically poll backup metadata to find backups
  - Thrift call to Warchief to issue restore job
- Monitor restore progress
  - Peons expose job and health stats via Thrift
  - Detailed stats written to Scuba
CRT – Continuous Restore Tier
- Failure handling
  - Retry on transient errors
  - Mark “unstable”
  - Notify oncall
- Orchestration component
  - Resolve dependency graph
  - Request restores in order of dependency
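Requesting restores in dependency order boils down to a topological sort; here is a sketch using Kahn's algorithm over a hypothetical dependency map (shard names invented).

```python
from collections import deque

def restore_order(deps):
    """Order restores so each shard's dependencies restore first.

    deps maps a shard to the shards it depends on.
    Raises on cycles, which a real orchestrator must also reject.
    """
    pending = {s: set(d) for s, d in deps.items()}
    ready = deque(sorted(s for s, d in pending.items() if not d))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t, d in pending.items():
            if s in d:
                d.remove(s)
                if not d:
                    ready.append(t)
    if len(order) != len(pending):
        raise ValueError("dependency cycle")
    return order
```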
On-demand Restores
“oops, I accidentally ran UPDATE without WHERE”
“oops, I thought I was on my dev database”
“oops”
[Diagram: the same restore infrastructure, now with a UI/CLI issuing on-demand restore requests to Warchief alongside CRT]
Discover Edge Cases
War Stories
War Stories – Collation Change
- COLLATE=latin1_bin
  - Required for MyRocks
- Schema change deployed
- Most diff backup restores fail overnight
  - Thanks, CRT monitoring!
200x increase in DOWNLOAD_FAILED!
War Stories – Collation Change
- DifferentialBackup used default collation
  - Sorting order broken in diff files
  - Consistent sorting is required for 3-way merge
- 3-way merges failed for affected backups
- Wouldn’t have been caught without continuous restores
- DifferentialBackup fixed to understand table collation
War Stories – RBR Binlogs
- Row-based replication deployed
- Lots of REPLAY failures overnight
  “The database used for the current transaction has changed since BEGIN. This is not supported!”
- 2 common patterns:
  - Shards belonged to same database tier
  - Failure only happened after master promotion
War Stories – RBR Binlogs
- SBR binlogs:
  - Each BEGIN contains session database
- RBR binlogs:
  - On master, each BEGIN contains session database
  - On slave, BEGIN has no session database
War Stories – RBR Binlogs
- With mysqlbinlog --skip-gtids --skip-empty-trans
  - Expected: session databases don’t change across events
- Binlogs not rotated after master promotion
  - For some part of binlog, instance is a slave
  - For some part of binlog, session database is empty
  - For some part of binlog, session databases don’t match
“The database used for the current transaction has changed since BEGIN. This is not supported!”
War Stories – RBR Binlogs
- Possible solutions:
  - Rotate binlogs during promotion
  - Fix MySQL server behaviour
  - Patch mysqlbinlog
  - ???