Write Amplification: An Analysis of In-Memory Database Durability Techniques



SLIDE 1

Write Amplification: An Analysis of In-Memory Database Durability Techniques

Jaemyung Kim, Kenneth Salem, Khuzaima Daudjee

University of Waterloo

IMDM 2015

SLIDE 2

Durability Matters

OLTP on an IMDB: orders of magnitude faster!
I/O is unavoidable! (ACID durability)
Write I/O efficiency of an in-memory DBMS is an important issue.

Clip art from openclipart.org

SLIDE 3

Write Amplification?

[Diagram: an Application sends updates at rate λ to the In-memory Database, which writes to Persistent Storage (the On-disk Database) at rate λ_P.]

SLIDE 4

Goal of Write Amplification Model

Quantify and compare the I/O efficiency of persistent storage management schemes
Provide insight into the different natures of update-in-place and copy-on-write storage managers
Lower the cost of operating a database management system (through improved I/O efficiency)
Lead to better system performance in situations where I/O capacity is constrained (restart recovery)

The following are not our goals:

Emulate a specific storage manager implementation
Compare specific implementations (e.g., "Hekaton is better than H-Store")

SLIDE 5

Architectural Diversity in IMDB SM

Two broad classes: Update In-Place and Copy-On-Write

Update In-Place (UIP)
  UIP: conventional page-based (e.g., Shore-MT)
    random writes for checkpointing
    device sensitive (e.g., HDD vs. SSD)
  UIP-S: snapshot checkpointing (e.g., H-Store, SiloR)

Copy-On-Write (COW)
  COW-D: logging-only (log-structured) database (e.g., Hekaton)
  COW-M: log-structured memory and disk databases (e.g., RAMCloud)

SLIDE 6

UIP: Page-level Checkpoint

Example space constraint (α) = 1.2 × DBSize, PageSize = 2
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): (empty, 2 slots free)

LOG I/O History: (none)
DB I/O History: (none)

I/O Per Update = (#DBIO + #LogIO) / #Updates

SLIDE 7

UIP: Page-level Checkpoint

Example space constraint (α) = 1.2 × DBSize, PageSize = 2
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): I

LOG I/O History: I
DB I/O History: (none)

I/O Per Update = (#DBIO + #LogIO) / #Updates = (0 + 1) / 1 = 1

SLIDE 8

UIP: Page-level Checkpoint

Example space constraint (α) = 1.2 × DBSize, PageSize = 2
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): I D

LOG I/O History: I, D
DB I/O History: (none)

I/O Per Update = (#DBIO + #LogIO) / #Updates = (0 + 2) / 2 = 1

SLIDE 9

UIP: Page-level Checkpoint

Example space constraint (α) = 1.2 × DBSize, PageSize = 2
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): (empty, 2 slots free)

LOG I/O History: I, D
DB I/O History: C, D, I, J

I/O Per Update = (#DBIO + #LogIO) / #Updates = (4 + 2) / 2 = 3

SLIDE 10

UIP: Page-level Checkpoint

Example space constraint (α) = 1.2 × DBSize, PageSize = 2
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): B (1 slot free)

LOG I/O History: I, D, B
DB I/O History: C, D, I, J

I/O Per Update = (#DBIO + #LogIO) / #Updates = (4 + 3) / 3 ≈ 2.33

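The UIP walk-through above can be replayed with a small simulation. This is a minimal sketch, not the authors' model: it assumes the log holds (α − 1) × DBSize = 2 records, that a checkpoint is taken only when an update arrives and the log is already full, and that a checkpoint writes every record of each dirty page. The function name `simulate_uip` and its parameters are hypothetical.

```python
def simulate_uip(records, updates, page_size, log_slots):
    """Return the running I/O-per-update after each update (illustrative model)."""
    # Map each record to the page that holds it, e.g. C and D share page 1.
    page_of = {rec: i // page_size for i, rec in enumerate(records)}
    dirty_pages = set()
    log_used = db_io = log_io = 0
    history = []
    for n, rec in enumerate(updates, start=1):
        if log_used == log_slots:
            # Assumption: a full log forces a checkpoint of all dirty pages,
            # after which the log can be truncated.
            db_io += len(dirty_pages) * page_size
            dirty_pages.clear()
            log_used = 0
        log_io += 1                      # one log write per update
        log_used += 1
        dirty_pages.add(page_of[rec])
        history.append((db_io + log_io) / n)
    return history


history = simulate_uip("ABCDEFGHIJ", "IDB", page_size=2, log_slots=2)
print([round(x, 2) for x in history])    # [1.0, 1.0, 2.33]
```

The three values correspond to the states after updates I, D, and B. The intermediate value of 3 on the checkpoint slide is the state after the checkpoint but before B is logged, so it does not appear in this per-update history.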

SLIDE 12

UIP-S: Snapshot Checkpoint

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): I D

LOG I/O History: I, D
DB I/O History: (none)

I/O Per Update = (#DBIO + #LogIO) / #Updates = (0 + 2) / 2 = 1

SLIDE 13

UIP-S: Snapshot Checkpoint

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): (empty, 2 slots free)

LOG I/O History: I, D
DB I/O History: A, B, C, D, E, F, G, H, I, J

I/O Per Update = (#DBIO + #LogIO) / #Updates = (10 + 2) / 2 = 6

SLIDE 14

UIP-S: Snapshot Checkpoint

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (DB): A B C D E F G H I J
Disk (LOG): B (1 slot free)

LOG I/O History: I, D, B
DB I/O History: A, B, C, D, E, F, G, H, I, J

I/O Per Update = (#DBIO + #LogIO) / #Updates = (10 + 3) / 3 ≈ 4.33

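The snapshot variant differs from the page-level example only in what a checkpoint writes. The following is a minimal sketch under the same assumptions as the slides' example (10 records, a 2-record log, a checkpoint taken when an update arrives and the log is full); the function name `simulate_uip_s` is hypothetical and the model is illustrative rather than the authors' formulation.

```python
def simulate_uip_s(db_size, updates, log_slots):
    """Return the running I/O-per-update for snapshot checkpointing."""
    log_used = db_io = log_io = 0
    history = []
    for n, _rec in enumerate(updates, start=1):
        if log_used == log_slots:
            # Assumption: a snapshot checkpoint writes every record,
            # dirty or not, so its cost is the whole database size.
            db_io += db_size
            log_used = 0
        log_io += 1                      # one log write per update
        log_used += 1
        history.append((db_io + log_io) / n)
    return history


history = simulate_uip_s(db_size=10, updates="IDB", log_slots=2)
print([round(x, 2) for x in history])    # [1.0, 1.0, 4.33]
```

Because the checkpoint cost does not depend on which records were updated, the result is the same for any three-update sequence, which is consistent with the later observation that UIP-S is insensitive to update skew.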

SLIDE 16

Architectural Diversity in IMDB SM

Two broad classes: Update In-Place and Copy-On-Write

Update In-Place (UIP)
  UIP: conventional page-based (e.g., Shore-MT)
    random writes for checkpointing
    device sensitive (e.g., HDD vs. SSD)
  UIP-S: snapshot checkpointing (e.g., H-Store, SiloR)

Copy-On-Write (COW)
  COW-D: logging-only (log-structured) database (e.g., Hekaton)
  COW-M: log-structured memory and disk databases (e.g., RAMCloud)

SLIDE 17

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): H A E I F G D C B J (2 slots free)

DB I/O History:
Read: (none)
Write: (none)

I/O Per Update = (#ReadIO + #WriteIO) / #Updates

SLIDE 18

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): H A E I F G D C B J I

DB I/O History:
Read: (none)
Write: I

I/O Per Update = (#ReadIO + #WriteIO) / #Updates = (0 + 1) / 1 = 1

SLIDE 19

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): H A E I F G D C B J I D

DB I/O History:
Read: (none)
Write: I, D

I/O Per Update = (#ReadIO + #WriteIO) / #Updates = (0 + 2) / 2 = 1

SLIDE 20

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): I F G D C B J I D H A E

DB I/O History:
Read: H, A, E
Write: I, D, H, A, E

I/O Per Update = (#ReadIO + #WriteIO) / #Updates = (3 + 5) / 2 = 4

SLIDE 21

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): F G D C B J I D H A E (1 slot free)

DB I/O History:
Read: H, A, E, I
Write: I, D, H, A, E

I/O Per Update = (#ReadIO + #WriteIO) / #Updates = (4 + 5) / 2 = 4.5

SLIDE 22

COW-D: Log-structured Disk

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory: A B C D E F G H I J
Disk (log-structured DB): F G D C B J I D H A E B

DB I/O History:
Read: H, A, E, I
Write: I, D, H, A, E, B

I/O Per Update = (#ReadIO + #WriteIO) / #Updates = (4 + 6) / 3 ≈ 3.33

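The COW-D walk-through above can be reproduced with a short log-cleaning simulation. This is a sketch under assumptions read off the example, not the authors' model: the disk log has 12 slots, updates are appended at the tail, and when an update arrives with no free slot the cleaner reads entries from the head, rewrites live ones at the tail, and stops once a stale entry has freed a slot. The name `simulate_cow_d` and the version-counter bookkeeping are hypothetical.

```python
from collections import deque


def simulate_cow_d(initial_log, updates, capacity):
    """Replay updates on a log-structured disk; capacity must exceed the record count."""
    version = {rec: 0 for rec in initial_log}      # latest version of each record
    log = deque((rec, 0) for rec in initial_log)   # head of the log is on the left
    reads = writes = 0
    history = []
    for n, rec in enumerate(updates, start=1):
        while len(log) >= capacity:
            # Cleaner: read the head entry from disk to decide its fate.
            old, ver = log.popleft()
            reads += 1
            if version[old] == ver:
                log.append((old, ver))             # live: relocate to the tail
                writes += 1
            # Stale entries are simply dropped, which frees one slot.
        version[rec] += 1
        log.append((rec, version[rec]))            # append the new version
        writes += 1
        history.append((reads + writes) / n)
    return history, "".join(r for r, _ in log)


history, layout = simulate_cow_d("HAEIFGDCBJ", "IDB", capacity=12)
print([round(x, 2) for x in history])    # [1.0, 1.0, 3.33]
print(layout)                            # FGDCBJIDHAEB
```

The final layout matches the disk contents on the last COW-D slide, and the 4 reads plus 6 writes over 3 updates give the same 3.33 figure. The intermediate values 4 and 4.5 on the slides are mid-cleaning states before B is written.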

SLIDE 24

COW-M: Log-structured Memory

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory (log-structured IMDB): H A E I F G D C B J I
Disk (log-structured DB): H A E I F G D C B J (2 slots free)

DB I/O History:
Write: (none)

I/O Per Update = #WriteIO / #Updates

SLIDE 25

COW-M: Log-structured Memory

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory (log-structured IMDB): H A E F G D C B J I D
Disk (log-structured DB): H A E I F G D C B J I

DB I/O History:
Write: I

I/O Per Update = #WriteIO / #Updates = 1 / 1 = 1

SLIDE 26

COW-M: Log-structured Memory

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory (log-structured IMDB): H A E F G C B J I D B
Disk (log-structured DB): H A E I F G D C B J I D

DB I/O History:
Write: I, D

I/O Per Update = #WriteIO / #Updates = 2 / 2 = 1

SLIDE 27

COW-M: Log-structured Memory

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory (log-structured IMDB): F G C B J I D H A E B
Disk (log-structured DB): F G D C B J I D H A E (1 slot free)

DB I/O History:
Write: I, D, H, A, E

I/O Per Update = #WriteIO / #Updates = 5 / 2 = 2.5

SLIDE 28

COW-M: Log-structured Memory

Example space constraint (α) = 1.2 × DBSize
Example update sequence: I, D, B

Memory (log-structured IMDB): F G C J I D H A E B
Disk (log-structured DB): F G D C B J I D H A E B

DB I/O History:
Write: I, D, H, A, E, B

I/O Per Update = #WriteIO / #Updates = 6 / 3 = 2

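The COW-M example can be sketched the same way as a log cleaner, with one change: no read I/O is counted. The assumption here, taken from the example's write-only I/O history, is that the in-memory log mirrors the disk log, so the cleaner can tell from memory which disk entries are live and can rewrite them from memory. The name `simulate_cow_m` and the bookkeeping are hypothetical, and only the disk log is modelled.

```python
from collections import deque


def simulate_cow_m(initial_log, updates, capacity):
    """Replay updates on the disk log, counting writes only; capacity must exceed the record count."""
    version = {rec: 0 for rec in initial_log}      # latest version of each record
    log = deque((rec, 0) for rec in initial_log)   # head of the disk log is on the left
    writes = 0
    history = []
    for n, rec in enumerate(updates, start=1):
        while len(log) >= capacity:
            old, ver = log.popleft()               # liveness is known from memory: no read I/O
            if version[old] == ver:
                log.append((old, ver))             # live: rewritten at the tail from memory
                writes += 1
            # Stale entries are dropped, which frees one slot.
        version[rec] += 1
        log.append((rec, version[rec]))            # append the new version
        writes += 1
        history.append(writes / n)
    return history, "".join(r for r, _ in log)


history, layout = simulate_cow_m("HAEIFGDCBJ", "IDB", capacity=12)
print(history)    # [1.0, 1.0, 2.0]
print(layout)     # FGDCBJIDHAEB
```

Under these assumptions the only difference from the COW-D example is the missing read cost of cleaning: the same 6 writes over 3 updates give 2 instead of 3.33.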

SLIDE 30

What We Can Do Using WAF Model

Compare UIP and COW
Analyze the effect of persistent storage utilization (α)
Analyze the effect of update workload (uniform vs. skew)
Analyze the effect of page size on UIP
  Find the best page size for storage-specific performance characteristics
Analyze the effect of SSD vs. HDD persistent storage
  Weight random and sequential I/O differently, depending on the device type

[Diagram: an Application sends updates at rate λ to the In-memory Database, which writes to Persistent Storage (the On-disk Database) at rate λ_P.]

I/O Per Update = λ_P / λ
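One way to picture the device-dependent weighting mentioned above is to scale random and sequential I/O counts by separate weights before dividing by the number of updates. This is only an illustrative sketch: the function name and the weight values are hypothetical and are not taken from the presentation. The counts reuse the UIP page-level example (4 checkpoint writes, treated as random, and 3 log writes, treated as sequential, over 3 updates).

```python
def weighted_io_per_update(random_ios, sequential_ios, updates,
                           random_weight, sequential_weight=1.0):
    """Weighted I/O per update; the weights express relative device cost."""
    if updates <= 0:
        raise ValueError("updates must be positive")
    weighted = random_weight * random_ios + sequential_weight * sequential_ios
    return weighted / updates


# Hypothetical weights: random I/O costs the same as sequential on an
# SSD-like device and ten times as much on an HDD-like device.
ssd_like = weighted_io_per_update(4, 3, 3, random_weight=1.0)
hdd_like = weighted_io_per_update(4, 3, 3, random_weight=10.0)
print(round(ssd_like, 2), round(hdd_like, 2))    # 2.33 14.33
```

With equal weights the result reduces to the unweighted I/O per update of the example, so the weights only matter when a scheme mixes random and sequential writes.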

SLIDE 31

Result (Update In-Place)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: UIP(HDD), UIP(SSD), UIP(HDD)-SKEW, UIP(SSD)-SKEW, UIP-S. Annotated curves: HDD, SSD.]

SLIDE 32

Result (Update In-Place)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: UIP(HDD), UIP(SSD), UIP(HDD)-SKEW, UIP(SSD)-SKEW, UIP-S. Annotated curves: HDD, SSD, and one skew curve.]

SLIDE 33

Result (Update In-Place)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: UIP(HDD), UIP(SSD), UIP(HDD)-SKEW, UIP(SSD)-SKEW, UIP-S. Annotated curves: HDD, SSD, and both skew curves.]

SLIDE 34

Result (Update In-Place)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: UIP(HDD), UIP(SSD), UIP(HDD)-SKEW, UIP(SSD)-SKEW, UIP-S. Annotated curve: UIP-S.]

SLIDE 35

Result (Copy-On-Write)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: COW-D, COW-M, UIP-S. Annotated curve: COW-D.]

SLIDE 36

Result (Copy-On-Write)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: COW-D, COW-M, UIP-S. Annotated curves: COW-D, COW-M.]

SLIDE 37

Result (COW vs. UIP)

[Plot: I/O per update (y-axis, ticks 1, 2, 5, 10, 20) vs. α (x-axis, 1.0 to 2.0). Series: COW-D, COW-M, UIP-S. Annotated curves: COW-D, COW-M, UIP-S.]

SLIDE 38

Summary and Discussion

Write amplification model for in-memory database storage managers
Quantify and compare the I/O efficiency of two broad classes of storage managers (UIP, COW)

UIP (page-level)
  commonly used in disk-based databases
  effective when the capacity of persistent storage (or recovery time) is tightly constrained and/or updates are highly skewed (e.g., 99:1)

UIP-S
  appealing option among UIPs (only sequential writes)
  insensitive to update skew

COW-M
  the most I/O efficient (lowest write amplification) in many settings
  requires a log structure in memory that mirrors the log structure in persistent storage

SLIDE 39

Mahalo!
