[PPT] - Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online PowerPoint Presentation

SLIDE 1

Justin DeBrabant

Big and Fast: Anti-Caching in OLTP Systems

SLIDE 2

Online Transaction Processing

▸transaction-oriented
▸small footprint
▸write-intensive

SLIDE 3

A bit of history…

SLIDE 4

OLTP Through the Years

[Timeline figure spanning 1972, 1993, and 2015. Labels: relational model, Ingres/System R, OLAP, rise of the web, “end of an era”]

SLIDE 5

Modern OLTP Requirements

1. web-scale (big)
2. high-throughput (fast)

SLIDE 6

Thesis Motivation

▸traditional disk-based architectures aren’t fast enough
▸newer main memory architectures aren’t big enough

SLIDE 7

Can we have main-memory performance for larger-than-memory datasets?

SLIDE 8

Thesis Overview: Contributions

1. anti-caching architecture
larger-than-memory datasets in main memory DBMS

2. anti-caching + persistent memory
exploring next-generation hardware and OLTP systems

SLIDE 9

Outline

▸Introduction
▸Overview and Motivation
▸Anti-Caching Architecture
▸Memory Optimizations
▸Anti-Caching on NVM
▸Future Work and Conclusions

SLIDE 10

Disk-Oriented Architectures

▸assumption: data won’t fit in memory
▸disk-resident data, main memory buffer pool for execution
▸concurrency is a must
▸ transaction serialization and locks

SLIDE 11

Memory Costs

[Chart: memory price per GB ($), log scale from 1E+00 to 1E+10, by year, 1970 to 2012]

SLIDE 12

Now What?

1. DBMS buffer pool

2. distributed cache
3. in-memory DBMS

SLIDE 13

Buffer Pool

▸must still…

maintain buffer pool
lock/latch data
maintain ARIES-style recovery logs

▸question: What is the overhead of all these things?

SLIDE 14

OLTP Through the Looking Glass, and What We Found There

SIGMOD ’08

[Chart: overhead breakdown with segments labeled Buffer Pool, Locking, Recovery, and Real Work; values shown: 12%, 26%, 31%, 31%]

SLIDE 15

Now What?

1. DBMS buffer pool

2. distributed cache
3. in-memory DBMS

SLIDE 16

[Diagram: Cache Layer, Persistence Layer]

SLIDE 17

Main Memory Cache

▸ fast and scalable, but…
▸ key-value interface
▸ not ACID (atomicity and isolation, but not consistency or durability)

SLIDE 18

Consistency and Durability

▸ reads are easy, writes are not
▸ multiple copies of data
▸ application’s responsibility
▸ for OLTP, writes are common and consistency is essential

SLIDE 19

Now What?

1. DBMS buffer pool

2. distributed cache
3. in-memory DBMS

SLIDE 20

SLIDE 21

H-Store Architecture

▸partitioned, shared-nothing
▸single-threaded main memory execution

no need for locks and latches

▸lightweight recovery

snapshots + command log

SLIDE 22

SLIDE 23

data > memory? virtual memory!

SLIDE 24

persistent storage

SLIDE 25

Big and Fast

big: disk-oriented
fast: memory-oriented
big and fast: anti-caching

SLIDE 26

OLTP workloads are skewed

SLIDE 27

Design Principles

▸ asynchronous disk fetches

don’t block

▸ maintain ordering of evicted data accesses

ensures transactional consistency

▸ single copy of data

consistency is free

▸ efficient memory use, no swizzling

SLIDE 28

Outline

▸Introduction
▸Overview and Motivation
▸Anti-Caching Architecture
▸Memory Optimizations
▸Anti-Caching on NVM
▸Future Work and Conclusions

SLIDE 29

Architectural Overview

▸memory is primary storage, cold data is evicted to disk-based anti-cache
▸reading data from the anti-cache is done in 3 phases

avoids blocking, ensures consistency

SLIDE 30

Anti-Caching Phases

▸evict
▸pre-pass
▸fetch
▸merge

SLIDE 31

Evict

1. data > anti-cache threshold
2. dynamically construct anti-cache blocks of coldest tuples
3. asynchronously write to disk
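
The three eviction steps above can be sketched roughly as follows. This is a toy model, not the H-Store implementation: the class `AntiCacheTable`, its fields, and the use of a tuple count as the threshold (instead of bytes) are all assumptions made for illustration, and a plain dictionary stands in for the on-disk anti-cache.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class AntiCacheTable:
    """Toy in-memory table that evicts its coldest tuples in blocks (hypothetical API)."""

    def __init__(self, threshold, block_size):
        self.threshold = threshold    # max tuples kept in memory (stand-in for bytes)
        self.block_size = block_size  # max tuples per anti-cache block
        self.hot = OrderedDict()      # key -> value, ordered coldest first
        self.evicted = {}             # key -> block id (in-memory tombstone)
        self.disk = {}                # block id -> {key: value}; stands in for a file
        self.next_block = 0
        self.writer = ThreadPoolExecutor(max_workers=1)

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)     # most recently used goes to the end

    def maybe_evict(self):
        """Step 1: check the threshold. Steps 2 and 3: build blocks and write them."""
        pending = []
        while len(self.hot) > self.threshold:
            n = min(self.block_size, len(self.hot) - self.threshold)
            # Step 2: build a block from the n coldest tuples.
            block = dict(self.hot.popitem(last=False) for _ in range(n))
            block_id = self.next_block
            self.next_block += 1
            for key in block:
                self.evicted[key] = block_id
            # Step 3: hand the write to a background thread.
            pending.append(self.writer.submit(self.disk.__setitem__, block_id, block))
        return pending
```

The returned futures let a caller wait for the writes if it needs to; the table itself can keep serving requests for hot tuples in the meantime.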

SLIDE 32

Pre-Pass

1. a transaction enters pre-pass when evicted data is accessed
2. continues execution, creating list of evicted blocks
3. abort, queue blocks to be fetched

SLIDE 33

Fetch

1. data is fetched asynchronously from disk
avoids blocking
2. moved into merge buffer

SLIDE 34

Merge

1. data is moved from in-memory merge buffer to in-memory table
2. previously aborted transaction is restarted
3. transaction executes normally
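
The pre-pass, fetch, and merge phases can be tied together in a small sketch. Everything here is hypothetical naming (`EvictedAccess`, `run_txn`, `fetch`, `merge`, `execute`), transactions are reduced to a list of keys to read, and the fetch runs inline rather than asynchronously, so this only illustrates the control flow described on the slides.

```python
class EvictedAccess(Exception):
    """Raised at the end of a pre-pass; carries the blocks the transaction needs."""

    def __init__(self, block_ids):
        super().__init__(f"needs blocks {sorted(block_ids)}")
        self.block_ids = block_ids


def run_txn(keys, hot, evicted):
    """Read `keys`; if any are evicted, list every needed block, then abort."""
    needed = {evicted[k] for k in keys if k in evicted}  # pre-pass: list evicted blocks
    if needed:
        raise EvictedAccess(needed)                      # abort, nothing was modified
    return [hot[k] for k in keys]                        # normal execution


def fetch(block_ids, disk):
    """Fetch phase: move requested blocks from the anti-cache into a merge buffer."""
    return {b: disk.pop(b) for b in block_ids}


def merge(merge_buffer, hot, evicted):
    """Merge phase: move tuples into the table and drop their tombstones."""
    for block in merge_buffer.values():
        for key, value in block.items():
            hot[key] = value
            evicted.pop(key, None)


def execute(keys, hot, evicted, disk):
    """Drive one transaction through pre-pass, fetch, merge, and restart."""
    try:
        return run_txn(keys, hot, evicted)
    except EvictedAccess as miss:
        merge(fetch(miss.block_ids, disk), hot, evicted)
        return run_txn(keys, hot, evicted)               # restarted transaction
```

Note that `fetch` removes the block from `disk`, which mirrors the single-copy principle: data is moved, not copied. Merging the whole block back is a simplification chosen for this sketch.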

SLIDE 35

[Animated diagram of the four anti-caching phases (Evict, Pre-Pass, Fetch, Merge) and the anti-cache]

SLIDE 36

Tracking Access Patterns

▸done online, more responsive to changes in workload
▸goal is low CPU and memory overhead
▸approximate ordering is OK

SLIDE 37

Approximate LRU (aLRU)

▸maintain LRU chain embedded in tuple headers
▸per-partition
▸transactions that update LRU chain are sampled randomly
▸ configurable sample rate
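
A minimal sketch of the sampling idea, assuming one instance per partition. The class `ApproxLRU` and its methods are hypothetical, and an `OrderedDict` stands in for the chain that the slides describe as embedded in tuple headers.

```python
import random
from collections import OrderedDict


class ApproxLRU:
    """Approximate LRU: only a sampled subset of transactions updates the chain."""

    def __init__(self, sample_rate, seed=None):
        self.sample_rate = sample_rate  # fraction of transactions that update the chain
        self.chain = OrderedDict()      # coldest tuple first
        self.rng = random.Random(seed)

    def insert(self, key):
        self.chain[key] = None
        self.chain.move_to_end(key)     # new tuples start at the hot end

    def begin_txn(self):
        """Decide once per transaction whether its accesses update the chain."""
        return self.rng.random() < self.sample_rate

    def touch(self, key, sampled):
        """Record an access; unsampled transactions skip the bookkeeping entirely."""
        if sampled and key in self.chain:
            self.chain.move_to_end(key)

    def coldest(self, n):
        """Return up to n eviction candidates, coldest first."""
        return list(self.chain)[:n]
```

A lower sample rate reduces the CPU cost of chain maintenance at the price of a less accurate ordering, which fits the earlier point that approximate ordering is acceptable.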

SLIDE 38

Anti-Caching vs. Swapping

▸ fine-grained eviction
▸ blocks constructed dynamically
▸ asynchronous batched fetches
▸ possible because of transactions

SLIDE 39

Anti-Caching vs. Caching

▸data exists in exactly one location

caching architectures have multiple copies, must maintain consistency
data is moved, not copied

▸goal is increased data size, not throughput

SLIDE 40

Benchmarking

▸YCSB
▸Zipfian skew
▸data > memory
▸read/write mix
▸MySQL, MySQL + memcached
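
To get a feel for what the skew parameter means when data is larger than memory, the sketch below computes how much of the access stream lands on the hottest keys. It assumes a plain Zipf law in which the key at rank r has weight 1/r^s; the actual YCSB generator may differ in detail, and `hot_fraction` with its parameters is a hypothetical helper.

```python
def hot_fraction(n_keys, skew, hot_share):
    """Fraction of accesses that land on the hottest `hot_share` of keys.

    Assumes Zipf weights 1 / rank**skew, with rank 1 the hottest key.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, n_keys + 1)]
    n_hot = max(1, int(n_keys * hot_share))
    return sum(weights[:n_hot]) / sum(weights)


# With data 8X memory, at most one eighth of the keys can stay in memory.
for s in (1.5, 1.25, 1.0, 0.75, 0.5):
    print(s, round(hot_fraction(100_000, s, 0.125), 3))
```

Under this model, a skew of 0 gives the hottest eighth of the keys exactly one eighth of the accesses, and the share grows as skew increases, so fewer transactions touch evicted data.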

SLIDE 41

YCSB, read-only, data 8X memory

[Chart: throughput (txn/s, axis 30000 to 120000) vs. workload skew (high to low: 1.5, 1.25, 1, 0.75, 0.5); series: anti-cache, MySQL, MySQL + memcached]

SLIDE 42

YCSB, read-heavy, data 8X memory

[Chart: throughput (txn/s, axis 30000 to 120000) vs. workload skew (high to low: 1.5, 1.25, 1, 0.75, 0.5); series: anti-cache, MySQL, MySQL + memcached]

SLIDE 43

Tracking Accesses Revisited

▸approximate ordering is OK
▸original implementation
▸ aLRU (linked list)
▸compute vs. memory

Can we reduce the memory overhead?

SLIDE 44

Timestamp-Based Eviction

▸ use relative timestamps to track accesses
▸ to evict, take subset of tuples and evict based on timestamp age
▸ questions:
▸ timestamp granularity
▸ sample size (power of two)
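
The sample-then-evict idea can be sketched in a few lines. The function `pick_victims` and its parameters are hypothetical, `last_access` is assumed to map each tuple key to its last-access timestamp, and the power-of-two sample size mentioned above is not modeled.

```python
import random


def pick_victims(last_access, sample_size, n_victims, rng):
    """Sample `sample_size` tuples and return the `n_victims` oldest among them."""
    keys = list(last_access)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    sample.sort(key=lambda k: last_access[k])  # smallest timestamp = oldest access
    return sample[:n_victims]


stamps = {"a": 5, "b": 1, "c": 9, "d": 3}
print(pick_victims(stamps, sample_size=4, n_victims=2, rng=random.Random(0)))
```

Because only a sample is inspected, no global ordering has to be maintained, which is where the memory saving over a linked-list chain comes from. A larger sample finds colder victims but costs more CPU per eviction.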

SLIDE 45

Timestamp Granularity

▸4 byte timestamps
▸ use instruction counter
▸2 byte timestamps
▸use epochs, set the timestamp to the current epoch
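
A possible reading of the 2-byte variant is sketched below. The slides do not say how epochs advance or how wrap-around is handled, so both choices here are assumptions: the epoch advances every `txns_per_epoch` transactions, and ages are computed modulo 2^16. `EpochClock` and `age` are hypothetical names.

```python
EPOCH_BITS = 16
EPOCH_MASK = (1 << EPOCH_BITS) - 1  # a 2-byte stamp holds values 0..65535


class EpochClock:
    """Coarse clock: the epoch advances once every `txns_per_epoch` transactions."""

    def __init__(self, txns_per_epoch):
        self.txns_per_epoch = txns_per_epoch
        self.txns = 0
        self.epoch = 0

    def on_txn(self):
        """Count one transaction and return the epoch to stamp its tuples with."""
        self.txns += 1
        if self.txns % self.txns_per_epoch == 0:
            self.epoch = (self.epoch + 1) & EPOCH_MASK
        return self.epoch


def age(current_epoch, stamp):
    """Epochs elapsed since `stamp`, tolerating one wrap of the 16-bit counter."""
    return (current_epoch - stamp) & EPOCH_MASK
```

Longer epochs make wrap-around rarer but blur the ordering of tuples touched within the same epoch.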

SLIDE 46

YCSB, read-heavy, data 8X

[Chart: throughput (txn/s, axis 22500 to 90000) vs. workload skew (high to low: 1.5, 1.25, 1, 0.75, 0.5); series: aLRU chain, timestamp-low, timestamp-high]

SLIDE 47

Key Take-Aways

▸8-17X improvement for skewed workloads at larger-than-memory data sizes
▸disk becomes the bottleneck for lower skew

SLIDE 48

Hardware Assumptions are Key

▸heavily influence system architectures
▸many factors
▸ capacity
▸ latency
▸ volatility

SLIDE 49

What’s next for OLTP?

SLIDE 50

Non-Volatile Memory

SLIDE 51

Properties of NVM

▸ non-volatile
▸ random-access
▸ high write endurance

except flash

▸ byte-addressable

except flash

SLIDE 52

The NVM Arms Race

▸FeRAM

high write endurance

▸MRAM

DRAM-like latency

▸PCM (PRAM)

DRAM-like capacity

SLIDE 53

Looking Forward…

▸OLTP architectures and NVM

anti-cache architecture
disk-based architecture

▸open questions

Which architecture is best suited for NVM?
What adaptations are needed?

SLIDE 54

NVM Emulation

▸goal: provide product-independent analysis
▸test wide range of latency profiles
▸automatically add specified latency
▸built by collaborators at Intel

SLIDE 55

Anti-Caching on NVM

▸replace disk with NVM
▸several adaptations necessary
▸lightweight array-based anti-cache
▸utilizes mmap interface
▸fine-grained block and tuple eviction interface
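
As a rough illustration of an array-based anti-cache behind an mmap interface, the sketch below stores evicted tuples in fixed-size slots of a memory-mapped file. It is a guess at the general shape only: `MmapAntiCache`, `SLOT_SIZE`, and the one-byte length prefix are assumptions, and an ordinary temporary file stands in for an NVM-backed one.

```python
import mmap
import os
import tempfile

SLOT_SIZE = 64  # bytes per slot (hypothetical); 1 length byte + up to 63 payload bytes


class MmapAntiCache:
    """Fixed-size slot array over a memory-mapped file (hypothetical layout)."""

    def __init__(self, path, n_slots):
        self.file = open(path, "w+b")
        self.file.truncate(n_slots * SLOT_SIZE)          # size the backing file
        self.map = mmap.mmap(self.file.fileno(), n_slots * SLOT_SIZE)
        self.free = list(range(n_slots - 1, -1, -1))     # pop() yields slot 0 first

    def evict(self, payload):
        """Store one tuple and return its slot index."""
        if len(payload) > SLOT_SIZE - 1:
            raise ValueError("payload too large for one slot")
        slot = self.free.pop()
        start = slot * SLOT_SIZE
        self.map[start] = len(payload)                   # one-byte length prefix
        self.map[start + 1:start + 1 + len(payload)] = payload
        return slot

    def fetch(self, slot):
        """Read a tuple back and release its slot."""
        start = slot * SLOT_SIZE
        n = self.map[start]
        data = bytes(self.map[start + 1:start + 1 + n])
        self.free.append(slot)
        return data

    def close(self):
        self.map.close()
        self.file.close()


with tempfile.TemporaryDirectory() as d:
    ac = MmapAntiCache(os.path.join(d, "anticache.bin"), n_slots=4)
    slot = ac.evict(b"tuple-0")
    print(slot, ac.fetch(slot))
    ac.close()
```

Because each slot is addressed directly, a single tuple can be evicted or fetched without reading a whole block, which is the kind of fine-grained access that byte-addressable storage makes practical.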

SLIDE 56

Disk-Oriented Architectures on NVM

▸must adapt both storage and log files to use the NVM mmap interface
▸configure to use fine-grained buffer pool pages

SLIDE 57

YCSB, read-only, data 8X

[Chart: throughput (txn/s, axis 45000 to 180000) vs. workload skew (high to low: 1.5, 1.25, 1, 0.75, 0.5); series: anti-caching, MySQL]

SLIDE 58

YCSB, read-heavy, data 8X

[Chart: throughput (txn/s, axis 45000 to 180000) vs. workload skew (high to low: 1.5, 1.25, 1, 0.75, 0.5); series: anti-caching, MySQL]

SLIDE 59

Future Work

SLIDE 60

Multi-Tier Architectures

▸DRAM -> NVM -> Disk/SSD
▸open questions
▸ indexing structures
▸synchronous/asynchronous fetches

SLIDE 61

Anti-Caching Indexes

▸index size can be significant
▸can cold index ranges be evicted to an anti-cache?
▸open questions
▸ how/what to evict
▸ execution changes

SLIDE 62

Semantic Anti-Caching

▸current implementation makes no assumption about types of skew
▸skew typically has semantic meaning
▸ e.g., temporal, spatial
▸can we leverage these domain semantics?

SLIDE 63

Conclusions

▸anti-caching architecture outperforms and outscales previous OLTP architectures
▸well-suited for next-generation NVM-based architectures

SLIDE 64

SLIDE 65

Questions?    

debrabant@cs.brown.edu