Write Amplification: An Analysis of In-Memory Database Durability Techniques
Jaemyung Kim, Kenneth Salem, Khuzaima Daudjee
University of Waterloo
IMDM 2015
Durability Matters
OLTP IMDB: orders of magnitude faster!
I/O is unavoidable! (ACID, Durability)
Write I/O efficiency of in-memory DBMS is an important issue.
[Figure: system model. Boxes: Application, In-memory Database, Persistent Storage, On-disk Database. Arrow labels: λ and λ_P.]
Goals:
  Quantify and compare the I/O efficiency of persistent storage management schemes
  Provide insight into the different natures of update-in-place and copy-on-write storage managers
  Lower the cost of operating a database management system (through improved I/O efficiency)
  Improve system performance in situations where I/O capacity is constrained (restart recovery)
The following are not our goals:
  Emulate a specific storage manager implementation
  Compare specific implementations: e.g., "Hekaton is better than H-Store"
Two broad classes: Update In-Place and Copy-On-Write
Update In-Place (UIP)
  UIP: conventional page-based (e.g., Shore-MT)
    random writes for checkpointing
    device sensitive: e.g., HDD vs. SSD
  UIP-S: snapshot checkpointing (e.g., H-Store, SiloR)
Copy-On-Write (COW)
  COW-D: logging only (log-structured) database (e.g., Hekaton)
  COW-M: log-structured memory and disk databases (e.g., RAMCloud)
Example: UIP (page-based)
Space constraint (α) = 1.2 × DB size, page size = 2
Update sequence: I, D, B
Database (memory and disk): A B C D E F G H I J; log space: 2 entries
I/O Per Update = (#DBIO + #LogIO) / #Updates

Step                       Log I/O history   DB I/O history   I/O Per Update
Initial state              (none)            (none)           (no updates yet)
Update I                   I                 (none)           (0 + 1) / 1 = 1
Update D                   I, D              (none)           (0 + 2) / 2 = 1
Checkpoint (log emptied)   I, D              C, D, I, J       (4 + 2) / 2 = 3
Update B                   I, D, B           C, D, I, J       (4 + 3) / 3 ≈ 2.33
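The page-based UIP walk-through above can be replayed with a short simulation. This is a minimal sketch under assumed rules, not the authors' model: the class name `PagedUIP`, the eager checkpoint as soon as the log fills, and the convention that writing a page costs one I/O per item on that page are all assumptions chosen to reproduce the counts on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class PagedUIP:
    """Toy page-based update-in-place store (hypothetical, for illustration)."""
    items: list            # database items in on-disk page order
    page_size: int         # items per page
    log_capacity: int      # log entries that fit before a checkpoint is forced
    log: list = field(default_factory=list)
    log_io: int = 0
    db_io: int = 0
    updates: int = 0

    def update(self, item):
        # Every update appends one record to the log (one log I/O).
        self.log.append(item)
        self.log_io += 1
        self.updates += 1
        # Assumption: checkpoint eagerly as soon as the log is full.
        if len(self.log) == self.log_capacity:
            self.checkpoint()

    def checkpoint(self):
        # Write back every page that holds a logged item, then empty the log.
        dirty = {self.items.index(x) // self.page_size for x in self.log}
        for page in dirty:
            start = page * self.page_size
            self.db_io += len(self.items[start:start + self.page_size])
        self.log.clear()

    def io_per_update(self):
        return (self.db_io + self.log_io) / self.updates


db = list("ABCDEFGHIJ")
alpha = 1.2
# Space beyond the database itself is available to the log: 12 - 10 = 2 slots.
store = PagedUIP(db, page_size=2, log_capacity=round(alpha * len(db)) - len(db))
for item in "IDB":
    store.update(item)
    print(item, store.db_io, store.log_io, round(store.io_per_update(), 2))
```

Under these assumptions the replay ends at (4 + 3) / 3 ≈ 2.33, the final value in the example. Changing `page_size` is a quick way to see how the checkpoint cost of a single dirty item grows with the page.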
Example: UIP-S (snapshot checkpointing)
Space constraint (α) = 1.2 × DB size
Update sequence: I, D, B
Database (memory and disk): A B C D E F G H I J; log space: 2 entries
I/O Per Update = (#DBIO + #LogIO) / #Updates

Step                       Log I/O history   DB I/O history                 I/O Per Update
Updates I and D            I, D              (none)                         (0 + 2) / 2 = 1
Snapshot (log emptied)     I, D              A, B, C, D, E, F, G, H, I, J   (10 + 2) / 2 = 6
Update B                   I, D, B           A, B, C, D, E, F, G, H, I, J   (10 + 3) / 3 ≈ 4.33
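The snapshot example needs no per-item bookkeeping, so its counts can be written as a closed form. The sketch below is an illustration under stated assumptions rather than the authors' model: the function name `uip_s_io_per_update` is hypothetical, each update is assumed to cost one log write, and a full snapshot of the database is assumed to be written each time the log fills.

```python
def uip_s_io_per_update(db_size, log_capacity, num_updates):
    """I/O per update for a toy snapshot-checkpointing (UIP-S) store.

    Assumptions (not from the slides): one log write per update, and a full
    snapshot of db_size items each time log_capacity updates have accumulated.
    """
    if num_updates <= 0:
        raise ValueError("num_updates must be positive")
    checkpoints = num_updates // log_capacity
    db_io = checkpoints * db_size
    log_io = num_updates
    return (db_io + log_io) / num_updates


for n in (1, 2, 3):
    print(n, round(uip_s_io_per_update(db_size=10, log_capacity=2, num_updates=n), 2))
```

The count never looks at which items were updated, only at how many. Whenever the number of updates is a multiple of the log capacity, the toy formula gives 1 + db_size / log_capacity, which is 6 for the sizes used in the example.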
Example: COW-D (log-structured database on disk)
Space constraint (α) = 1.2 × DB size
Update sequence: I, D, B
Memory: A B C D E F G H I J
Disk (log-structured DB), initially: H A E I F G D C B J plus 2 free slots
I/O Per Update = (#ReadIO + #WriteIO) / #Updates

Step                          Read history   Write history      Disk after step           I/O Per Update
Initial state                 (none)         (none)             H A E I F G D C B J _ _   (no updates yet)
Update I                      (none)         I                  H A E I F G D C B J I _   (0 + 1) / 1 = 1
Update D                      (none)         I, D               H A E I F G D C B J I D   (0 + 2) / 2 = 1
Clean: move H, A, E to tail   H, A, E        I, D, H, A, E      I F G D C B J I D H A E   (3 + 5) / 2 = 4
Clean: drop stale I           H, A, E, I     I, D, H, A, E      F G D C B J I D H A E _   (4 + 5) / 2 = 4.5
Update B                      H, A, E, I     I, D, H, A, E, B   F G D C B J I D H A E B   (4 + 6) / 3 ≈ 3.33
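The read and write histories in the log-structured disk example can be reproduced with a small replay. This is a sketch under assumptions, not the authors' simulator: the function name `cow_d_replay` is hypothetical, space is reclaimed from the head of the log, and reclamation runs lazily, just before the update that needs a free slot. Because of the lazy trigger, the intermediate values 4 and 4.5 are not reported separately; only the totals after each update are.

```python
from collections import deque


def cow_d_replay(initial_log, capacity, updates):
    """Replay updates against a toy log-structured disk (COW-D style).

    Returns (reads, writes, final_log_order). Hypothetical model: entries are
    reclaimed from the head of the log, lazily, only when a slot is needed.
    """
    if capacity <= len(set(initial_log) | set(updates)):
        raise ValueError("capacity must exceed the number of distinct items")
    log = deque((item, seq) for seq, item in enumerate(initial_log))
    latest = {item: seq for item, seq in log}   # newest version of each item
    next_seq = len(log)
    reads = writes = 0

    for item in updates:
        while len(log) >= capacity:
            entry = log.popleft()               # read the oldest entry
            reads += 1
            if latest[entry[0]] == entry[1]:    # still live: rewrite at the tail
                log.append(entry)
                writes += 1
            # otherwise the entry is stale and its slot is simply reclaimed
        log.append((item, next_seq))            # write the new version
        latest[item] = next_seq
        next_seq += 1
        writes += 1

    return reads, writes, "".join(item for item, _ in log)


reads, writes, order = cow_d_replay("HAEIFGDCBJ", capacity=12, updates="IDB")
print(reads, writes, order, round((reads + writes) / 3, 2))
```

For the example's inputs the replay performs 4 reads and 6 writes, matching (4 + 6) / 3 ≈ 3.33, and leaves the disk in the order F G D C B J I D H A E B.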
Example: COW-M (log-structured memory and disk)
Space constraint (α) = 1.2 × DB size
Update sequence: I, D, B
Memory (log-structured IMDB), initially: H A E I F G D C B J
Disk (log-structured DB), initially: H A E I F G D C B J plus 2 free slots
I/O Per Update = #WriteIO / #Updates

Step                          Write history      Memory after step     Disk after step           I/O Per Update
Initial state                 (none)             H A E I F G D C B J   H A E I F G D C B J _ _   (no updates yet)
Update I                      I                  H A E F G D C B J I   H A E I F G D C B J I _   1 / 1 = 1
Update D                      I, D               H A E F G C B J I D   H A E I F G D C B J I D   2 / 2 = 1
Clean: move H, A, E to tail   I, D, H, A, E      F G C B J I D H A E   F G D C B J I D H A E _   5 / 2 = 2.5
Update B                      I, D, H, A, E, B   F G C J I D H A E B   F G D C B J I D H A E B   6 / 3 = 2
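The example with a log structure in both memory and disk differs from the disk-only case in one respect: live entries are rewritten from memory, so reclaiming space costs writes but no reads. The sketch below illustrates that accounting under assumptions. The function name `cow_m_replay`, the lazy head-of-log reclamation, and the use of an `OrderedDict` as the in-memory mirror are illustrative choices, not details taken from the slides.

```python
from collections import OrderedDict, deque


def cow_m_replay(initial_log, capacity, updates):
    """Replay updates against a toy COW-M store; returns (writes, memory, disk)."""
    if len(set(initial_log)) != len(initial_log):
        raise ValueError("initial_log must hold one entry per item")
    if capacity <= len(set(initial_log) | set(updates)):
        raise ValueError("capacity must exceed the number of distinct items")
    disk = deque((item, seq) for seq, item in enumerate(initial_log))
    # Memory mirrors the log: live items only, oldest first, with their versions.
    memory = OrderedDict((item, seq) for item, seq in disk)
    next_seq = len(disk)
    writes = 0

    for item in updates:
        while len(disk) >= capacity:
            old_item, old_seq = disk.popleft()
            if memory[old_item] == old_seq:
                # Live: rewrite from the in-memory copy, no disk read needed.
                disk.append((old_item, old_seq))
                memory.move_to_end(old_item)
                writes += 1
            # Stale entries are recognised from memory and reclaimed without I/O.
        disk.append((item, next_seq))           # write the new version
        memory[item] = next_seq
        memory.move_to_end(item)
        next_seq += 1
        writes += 1

    return writes, "".join(memory), "".join(i for i, _ in disk)


writes, memory_order, disk_order = cow_m_replay("HAEIFGDCBJ", 12, "IDB")
print(writes, memory_order, disk_order, writes / 3)
```

With the example's inputs this yields 6 writes for 3 updates, that is 6 / 3 = 2, and the memory and disk orders match the final state of the example.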
Compare UIP and COW
Analyze the effect of persistent storage utilization (α)
Analyze the effect of update workload (uniform vs. skew)
Analyze the effect of page size on UIP
  Find the best page size for storage-specific performance characteristics
Analyze the effect of SSD vs. HDD persistent storage
  Weight random and sequential I/O differently, depending on the device type
[Figure: system model (Application, In-memory Database, Persistent Storage, On-disk Database) with rates λ and λ_P.]
I/O Per Update = λ_P / λ
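The ratio λ_P / λ counts every I/O equally. One way to weight random and sequential I/O differently per device type is sketched below. The slides do not give the weights, so the function name `weighted_io_per_update` and the numbers in `DEVICE_RANDOM_WEIGHT` are placeholders for illustration only, not measured or published values.

```python
def weighted_io_per_update(random_ios, sequential_ios, updates,
                           random_weight, sequential_weight=1.0):
    """Device-weighted I/O cost per update (hypothetical weighting scheme)."""
    if updates <= 0:
        raise ValueError("updates must be positive")
    cost = random_ios * random_weight + sequential_ios * sequential_weight
    return cost / updates


# Placeholder weights, NOT measured values: random I/O is assumed to be
# relatively more expensive on an HDD than on an SSD.
DEVICE_RANDOM_WEIGHT = {"hdd": 10.0, "ssd": 2.0}

for device, weight in DEVICE_RANDOM_WEIGHT.items():
    # Counts from the page-based UIP example: 4 checkpoint writes (treated as
    # random) and 3 log writes (treated as sequential) over 3 updates.
    print(device, round(weighted_io_per_update(4, 3, 3, random_weight=weight), 2))
```

Setting both weights to 1 recovers the unweighted ratio, so the same function can be used to compare a scheme with only sequential writes against one that mixes random and sequential writes.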
[Figure: I/O per update (axis ticks 1, 2, 5, 10, 20) versus α (1.2 to 2.0) for UIP on HDD and SSD, with and without update skew, and for UIP-S. Recovered legend entries: UIP(SSD), UIP(HDD)-SKEW, UIP(SSD)-SKEW, UIP-S.]
[Figure: I/O per update (axis ticks 1, 2, 5, 10, 20) versus α (1.2 to 2.0) for COW-D, COW-M, and UIP-S.]
Write amplification model for in-memory database storage managers
Quantify and compare the I/O efficiency of two broad classes
UIP (page-level)
  commonly used in disk-based databases
  effective when the capacity of persistent storage (or recovery time) is tightly constrained and/or the update workload is highly skewed (e.g., 99:1)
UIP-S
  appealing option among UIPs (only sequential writes)
  insensitive to update skew
COW-M
  the most I/O efficient (lowest write amplification) in many settings
  requires a log structure in memory that mirrors the log structure in persistent storage