Getting More Performance with Polymorphism from Emerging Memory Technologies



SLIDE 1

Getting More Performance with Polymorphism from Emerging Memory Technologies

Iyswarya Narayanan, Aishwarya Ganesan, Anirudh Badam, Sriram Govindan, Bikash Sharma, Anand Sivasubramaniam

SLIDE 2

Resource needs of cloud storage applications span multiple aspects

  • High performance (latency PDF with a trimmed tail)
  • High capacity (SSD)
  • Volatile and persistent accesses

SLIDE 3

Cloud applications are diverse!

In terms of their capacity needs for both volatile reads and persistent writes

[Figure: unique pages accessed within a time window, reads vs. writes, for Cloud Storage, Map Reduce, Search-Index, and Search-Serve; annotations: write intensive, read intensive, read and write intensive]

SLIDE 4

Cloud applications are diverse!

In terms of volume of read and write accesses

  • Across different applications
  • Temporally within the same application

[Figure: volume of read and write accesses; panel label: Search-Serve]

How can we effectively provision memory and storage resources for diverse cloud storage applications?

SLIDE 5

DRAM and SSD are memory and storage resources

  • DRAM: volatile, low latency, low capacity
  • SSD: persistent, high latency, high capacity

[Figure axes: latency, cost, capacity]

SLIDE 6

They are rigid in their performance characteristics

  • In function: memory or storage
  • In latency: fast DRAM vs. slow SSD
  • In capacity: based on server SKU

[Figure: DRAM (volatile, low latency, low capacity) vs. SSD (persistent, high latency, high capacity); axes: latency, cost, capacity]

SLIDE 7

Can emerging memories help meet diverse resource needs for cloud storage apps across several dimensions?

  • Emerging memories: battery-backed DRAM, 3D XPoint, compressed
  • Non-volatile, with lower latencies than SSD
  • Larger and flexible in capacity and latency

[Figure: DRAM (volatile, low capacity, low latency) and SSD (persistent, high capacity, high latency) compared with emerging memories; axes: latency, cost, capacity]

Can we exploit these emerging memory technologies to overcome the drawbacks of existing resources?

SLIDE 8

What are the design choices to integrate emerging memory technologies in cloud servers?

  • Persistent memory programming: benefits volatile and persistent accesses, but needs intrusive code changes to applications!
  • NVM-based file systems: no changes to applications, but high NVM provisioning cost for the entire storage needs and intrusive code changes to the OS and FS!
  • Transparent cache (memory or storage): low cost and transparent, but benefits reads or writes and not both!

SLIDE 9

Emerging memory technologies are polymorphic

  • Volatile memory cache: memory extension (direct access via loads and stores)
  • Persistent write cache: transparent write cache above the SSD (via block interface)

They can function as both memory and storage!

Can we exploit the functional polymorphism knob?

SLIDE 10

Functional polymorphism can benefit applications with competing volatile and persistent flows

[Figure: tail latency vs. % NVM used as write cache, MySQL TPC-C]

  • dm-cache uses a part of the NVM as write cache
  • The rest is additional memory accessible via loads/stores

Partitioning NVM between memory and storage reduces latency

What if the working set exceeds physical memory/write-cache capacity?

SLIDE 11

Impact of insufficient physical capacity + fixed resource characteristics on application performance

[Figure: probability vs. access latency; the application's working set is split between two fixed-latency tiers (DRAM and SSD); the 95th percentile is marked]

Tail latency is determined by the slowest tier
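As a rough illustration of why the slowest tier sets the tail, the sketch below computes a latency percentile for a working set split between two fixed-latency tiers. The function name, the tier latencies (0.1 us for DRAM, 100 us for SSD), and the fast-tier shares are illustrative assumptions, not measurements from the talk.

```python
def percentile_latency(fast_fraction, fast_latency_us, slow_latency_us,
                       percentile=95.0):
    """Latency at a percentile when fast_fraction of accesses hit the fast tier.

    Each tier is modeled with one fixed latency, so the distribution has
    two spikes: one at the fast latency and one at the slow latency.
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be in [0, 1]")
    # The percentile falls on the fast spike only if the fast tier
    # serves at least that share of all accesses.
    if fast_fraction * 100.0 >= percentile:
        return fast_latency_us
    return slow_latency_us


# Hypothetical latencies, chosen only for illustration.
DRAM_US, SSD_US = 0.1, 100.0

for share in (0.50, 0.90, 0.96):
    p95 = percentile_latency(share, DRAM_US, SSD_US)
    print(f"fast-tier share {share:.0%}: p95 = {p95} us")
```

In this model the 95th percentile stays at the SSD latency until the fast tier serves at least 95% of accesses, no matter how fast the DRAM is.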

SLIDE 12

Representational polymorphism knob to tune latency and capacity

[Figure: probability vs. access latency for DRAM and SSD tiers, before and after morphing; the 95th percentile moves toward the faster tier]

  • Before: the application's working set is split between two fixed-latency tiers
  • After: the faster tier morphs to hold more of the working set
  • Tail latency reduces

SLIDE 13

Representational polymorphism can benefit applications

[Figure: access latency (us) vs. compressed access granularity (4096, 2048, 1024, 512 bytes) for write and read accesses]

Much lower latency compared to SSD!

[Figure: % increase in capacity for MapReduce and SearchServe]

2X to 8X increase in effective capacity

Our goal: effectively serve diverse cloud applications using a polymorphic emerging-memory-based cache
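The capacity side of this trade-off can be sketched as a ratio of raw bytes to compressed bytes. The example below is only a minimal illustration: it uses zlib from the Python standard library as a stand-in compressor (the slides do not name the algorithm), and the helper names and the synthetic pages are assumptions, so the printed gain says nothing about the workloads in the talk.

```python
import zlib

PAGE_SIZE = 4096  # bytes; the largest access granularity shown on the slide


def effective_capacity_gain(pages, level=1):
    """Return total raw bytes divided by total compressed bytes."""
    raw = sum(len(page) for page in pages)
    compressed = sum(len(zlib.compress(page, level)) for page in pages)
    return raw / compressed


def synthetic_page(seed):
    """Build one repetitive page (an assumption; real data compresses less predictably)."""
    pattern = b"key=%06d;value=cloud;" % seed
    repeats = PAGE_SIZE // len(pattern) + 1
    return (pattern * repeats)[:PAGE_SIZE]


pages = [synthetic_page(i) for i in range(64)]
gain = effective_capacity_gain(pages)
print(f"effective capacity gain: {gain:.1f}x")
```

A gain of g means the same physical region holds roughly g times as many pages, at the price of compression and decompression time on each access.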

SLIDE 14

PolyEMT: Polymorphic Emerging Memory Technology based cache

[Diagram: an unmodified application accesses DRAM and NVM through the memory interface, and NVM and SSD through the block interface; latency labels: ns, us, 10 us, 100 us]

  • 1. Functional polymorphism: memory vs. storage?

Cloud applications are diverse: one partition size does not fit all!

NVM can be battery-backed DRAM, 3D XPoint, etc.

SLIDE 15

PolyEMT: Polymorphic Emerging Memory Technology based cache

[Diagram: an unmodified application accesses DRAM and NVM through the memory interface, and NVM and SSD through the block interface, with compressed NVM regions added; latency labels: ns, us, 10 us, 100 us]

  • 1. Functional polymorphism: memory vs. storage?
  • 2. Representational polymorphism: capacity vs. latency?

We need to navigate performance trade-offs across the capacity, latency, and persistence dimensions!

NVM can be battery-backed DRAM, 3D XPoint, etc.

SLIDE 16

Key idea of PolyEMT cache

  • Address the most significant bottleneck first using the emerging memory based cache
  • Then gradually morph its characteristics to further improve performance

What is the most significant bottleneck for a generic application with a mix of reads and writes?

SLIDE 17

Persistent writes (file writes, flushes, msyncs) incur high latency in existing systems

[Figure: average and 95th-percentile latency (ms) for reads and writes]

  • The persistent tier is much slower
  • SSDs are also asymmetric in their read/write latency

[Diagram, existing system: read misses and persistent writes go from DRAM through the block file system to the SSD (SSD reads, SSD writes)]

Use BB-DRAM as a write cache for the SSD

[Diagram, with write cache: a BB-DRAM write cache sits between the block file system and the SSD]
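The write-cache idea can be sketched as a toy write-back cache: a write completes once it is in the cache, and the slower store is updated later. The class and method names here are hypothetical, and plain Python dictionaries stand in for BB-DRAM and the SSD, so this shows only the control flow, not the prototype from the talk.

```python
class WriteBackCache:
    """Toy persistent write cache.

    Writes are acknowledged once they sit in the cache (standing in for
    BB-DRAM) and reach the backing store (standing in for the SSD) only
    when flushed.
    """

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict: block number -> bytes
        self.dirty = {}                     # blocks not yet written back

    def write(self, block, data):
        # Fast path: the write completes without touching the backing store.
        self.dirty[block] = data

    def read(self, block):
        # The newest copy of a block may still be in the cache.
        if block in self.dirty:
            return self.dirty[block]
        return self.backing_store.get(block)

    def flush(self, max_blocks=None):
        """Write back up to max_blocks dirty blocks; return how many were written."""
        blocks = list(self.dirty)[:max_blocks]
        for block in blocks:
            self.backing_store[block] = self.dirty.pop(block)
        return len(blocks)


ssd = {}
cache = WriteBackCache(ssd)
cache.write(7, b"hello")
print(7 in ssd, cache.read(7))  # the block is readable before it reaches the SSD
cache.flush()                   # a real system would do this in the background
print(ssd[7])
```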

SLIDE 18

Using EMT entirely as a write cache is inefficient for read accesses, because EMT is byte addressable

[Diagram, as write cache: read misses and persistent writes go from DRAM through the block file system; a BB-DRAM write cache sits in front of the SSD (SSD reads, SSD writes)]

[Diagram, as write cache and memory extension: read misses are also served from BB-DRAM used as memory, alongside the BB-DRAM write cache in front of the SSD]

Resource is byte addressable!

How to apportion NVM capacity between memory and storage functions?

SLIDE 19

Tuning write-cache capacity in the presence of competing read and write flows

[Figure: persistent write latency vs. % BB-DRAM in storage (1, 25, 50, 75, 100)]

SLIDE 20

Tuning write-cache capacity in the presence of competing read and write flows

[Figure: volatile latency vs. % EMT in memory (1, 25, 50, 75, 100)]

[Figure: persistent write latency vs. % EMT in storage (1, 25, 50, 75, 100)]

SLIDE 21

Balance the overall impact of read and write accesses

[Figure: application performance vs. % EMT in storage and % EMT in memory, shown with the volatile latency and persistent write latency curves]

Incrementally repurpose write-cache blocks as memory pages to balance read/write performance.
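One way to picture the incremental repurposing is a greedy search that moves blocks from the write cache to memory while a measured cost keeps improving. The sketch below is an assumption-laden illustration: `tune_partition` and `toy_cost` are hypothetical names, and the cost model is invented for the example; the actual policy is described in the paper.

```python
def tune_partition(measure, total_blocks, step=1):
    """Move blocks from the write cache to memory while the cost improves.

    measure(memory_blocks) returns a cost to minimize (for example, mean
    access latency) with memory_blocks of the NVM used as memory and the
    rest used as write cache.
    """
    memory_blocks = 0                  # start with all NVM as write cache
    best = measure(memory_blocks)
    while memory_blocks + step <= total_blocks:
        candidate = measure(memory_blocks + step)
        if candidate >= best:          # no further gain: stop morphing
            break
        memory_blocks += step
        best = candidate
    return memory_blocks, best


def toy_cost(memory_blocks, total_blocks=100, read_share=0.6):
    """Hypothetical cost model: each flow slows down as its share of NVM shrinks."""
    storage_blocks = total_blocks - memory_blocks
    read_latency = 1.0 / (1 + memory_blocks)
    write_latency = 1.0 / (1 + storage_blocks)
    return read_share * read_latency + (1 - read_share) * write_latency


split, cost = tune_partition(toy_cost, total_blocks=100)
print(f"memory blocks: {split}, write-cache blocks: {100 - split}")
```

A greedy search like this finds the best split only when the cost curve has a single minimum, as the toy model does; a real system would also need to smooth noisy measurements before comparing them.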

SLIDE 22

When the physical capacity is insufficient, exploit representational polymorphism

[Diagram, functional polymorphic cache: read misses from DRAM go to BB-DRAM used as memory; persistent writes go through the block file system to the BB-DRAM write cache and then the SSD (SSD reads, SSD writes)]

[Diagram, functional + representational polymorphic cache: the same layout with a separate compressed BB-DRAM region added for each function]

No latency benefit from separating memory and storage functions!

SLIDE 23

When the physical capacity is insufficient, exploit representational polymorphism

[Diagram, functional polymorphic cache: read misses from DRAM go to BB-DRAM used as memory; persistent writes go through the block file system to the BB-DRAM write cache and then the SSD (SSD reads, SSD writes)]

[Diagram, functional + representational polymorphic cache: the same layout with a separate compressed BB-DRAM region added for each function]

No latency benefit from separating memory and storage functions!

[Diagram, shared compressed representation: battery-backed DRAM serves read misses and persistent writes, with one shared compressed BB-DRAM region in front of the SSD]

Shared compression layer reduces compute requirements too!

SLIDE 24

PolyEMT optimization steps at a glance

  • 1. EMT as persistent write-back cache
  • 2. Exploit functional polymorphism
  • 3. Exploit representational polymorphism
  • 4. LRU-based capacity management

Triggers shown on the slide: on scheduling a new application; on dynamic phase changes within an application
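Step 4 names LRU-based capacity management without further detail on the slide. As a minimal sketch under that caveat, the class below keeps entries in recency order and evicts the least recently used ones when the region is full or shrunk; the class name and the `resize` method are hypothetical, chosen to mirror blocks being repurposed between functions.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        """Insert or update an entry; return the evicted (key, value) or None."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)
        return None

    def resize(self, capacity):
        """Change the capacity; return the entries evicted to fit the new size."""
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        evicted = []
        while len(self.entries) > self.capacity:
            evicted.append(self.entries.popitem(last=False))
        return evicted


lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")                  # "a" becomes the most recently used entry
victim = lru.put("c", 3)      # the cache is full, so the oldest entry is evicted
print(victim)
```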

SLIDE 25

PolyEMT prototype

  • PolyEMT library and runtime
  • mmap(): native load/store access
  • msync(): persist dirty data to the NVM write cache; persist data to the SSD in the background
  • More implementation details in the paper
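For readers unfamiliar with the mmap()/msync() pattern named above, the sketch below shows it with Python's standard `mmap` module on an ordinary temporary file. This is not the PolyEMT library, whose API is not shown in the slides; the file simply stands in for a persistent region, and `flush()` is Python's wrapper that calls msync() on Unix.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# A temporary file stands in for the persistent backing store.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, PAGE)              # a file must be non-empty to be mapped
    with mmap.mmap(fd, PAGE) as region:
        region[0:5] = b"hello"          # a plain store into the mapped region
        region.flush()                  # msync() on Unix: write dirty pages back
    with open(path, "rb") as f:
        persisted = f.read(5)
finally:
    os.close(fd)
    os.remove(path)

print(persisted)
```

The application only sees loads, stores, and an explicit persist point, which is why an unmodified program that already uses mmap() and msync() can be redirected to a different backing tier.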
SLIDE 26

Evaluation Setup

  • Azure VM
  • DRAM (26GB)
  • Battery-backed DRAM (6GB)
  • SSD
  • CPU-based compression
  • Redis key-value store with persistence capability
  • Data set size: 38GB, much higher than the DRAM + BB-DRAM capacity
  • YCSB benchmarks
SLIDE 27

Transparent integration policies under evaluation

  • DRAM-Extension
  • Write-Cache
  • Write-Cache + Functional polymorphism
  • Write-Cache + Functional polymorphism + Representational polymorphism
SLIDE 28

Performance benefits of PolyEMT on throughput

[Figure: mean throughput normalized to DRAM-Extension for workloads a to f, for Write-Cache, Functional, and Functional+Representational; bar labels: 2.5X, 4.55X, 5X]

  • Addressing the most significant bottleneck improves performance by 2.5X
  • Exploiting polymorphisms further improves performance by 70% and 90%

SLIDE 29

Performance benefits of PolyEMT on tail latency

[Figure: update latency and read latency normalized to DRAM-Extension for Write-Cache, Functional, and Functional+Representational]

  • EMT-based write cache reduces write and read tail latency by 30% and 40% (normalized latency 0.7 and 0.6)
  • Functional polymorphism reduces write and read tail latency by 60% and 80% (normalized latency 0.4 and 0.2)
  • Combining morphing reduces write and read tail latency by 85% and 78% (normalized latency 0.15 and 0.22)

SLIDE 30

PolyEMT achieves performance by apportioning the polymorphic resource across multiple dimensions

[Figure: EMT allocation in % (persistent, compressed, volatile) for F-only and F+R across workloads a to f]

PolyEMT benefits diverse cloud applications via careful apportioning of the polymorphic cache across three dimensions!

SLIDE 31

Diverse storage applications + Polymorphic EMT cache = High performance

To conclude,

  • Explore emerging memory technologies to augment SSD performance
  • For diverse cloud applications
  • In a cost efficient and transparent way
  • Our contributions:
  • Functional and representational polymorphism knobs of emerging memories
  • EMT design as a cache for SSD
  • Transparent mechanism to integrate this cache
  • Policy to morph this cache across multiple dimensions to improve performance
  • Software-defined memory and storage resource provisioning to extract better performance per cost