Warped Mirrors for Flash Yiying Zhang Andrea C. Arpaci-Dusseau - - PowerPoint PPT Presentation

SLIDE 1
Warped Mirrors for Flash

Yiying Zhang
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

SLIDE 2

SLIDE 3
Flash-based SSDs in Storage Systems

  • Using commercial SSDs in the storage layer
    ▫ Good performance
    ▫ Easy to use
    ▫ Relatively cheap
  • Usage
    ▫ MySpace, Facebook, Amazon, etc.
    ▫ All-flash storage, e.g., Pure Storage
  • What about reliability?

SLIDE 4
Flash-based SSD Reliability

  • Flash wears out with erases
    ▫ More writes => more erases
    ▫ FTL and wear leveling help
  • One way to improve SSD reliability
  • Redundancy or RAID
    ▫ Assumes failure independence

SLIDE 5
What About Flash-based Arrays?

Correlated failure!

[Diagram: two SSDs receiving the same writes fail together, causing data loss; labels: Replace Time, $$$]

SLIDE 6
WaM - Warped Mirrors for Flash

  • Write more to one SSD to induce earlier failure
  • Focus on mirrors (RAID1)

[Diagram: one SSD receives more writes than its mirror, fails first, and is replaced with no data loss]

SLIDE 7
WaM Benefits

  • Reliability achieved by failure separation
  • Configurable
    ▫ Approximate model + correction method
  • Low monetary cost
    ▫ 1-2 cents per hour for mirrors using WaM
    ▫ 47-94% of the cost of fixed-time replacement every year
  • Small performance overhead
    ▫ 10% longer response time for 52-hour to 159-day failure separation

SLIDE 8
Outline

  • Introduction
  • WaM design and model
  • Evaluation results
  • Conclusion

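SLIDE 9
Basic Solution - Adding Dummy Writes

[Diagram: SSDearly receives the normal writes plus dummy writes, SSDlate receives only the normal writes; the gap between their failures is the Failure-Separation Interval (FSI)]

Dummy writes are issued by the RAID controller: they rewrite existing content, taken either from the last write or from a random page.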
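The dummy-write mechanism can be illustrated with a toy RAID1 controller. This is a minimal sketch under stated assumptions, not the WaM implementation: the class name `WarpedMirror`, the dict-backed "SSDs", the 50/50 choice between the last-written page and a random written page, and the credit counter that turns a fractional dummy-write percentage into whole dummy writes are all hypothetical.

```python
import random


class WarpedMirror:
    """Hypothetical RAID1 controller that adds dummy writes to SSD_early only."""

    def __init__(self, p_dummy, seed=None):
        self.p_dummy = p_dummy        # dummy writes per normal write (0.25 = 25%)
        self.early = {}               # page -> data stored on SSD_early
        self.late = {}                # page -> data stored on SSD_late
        self.early_writes = 0         # page programs issued to SSD_early
        self.late_writes = 0          # page programs issued to SSD_late
        self.last_page = None
        self.credit = 0.0             # fractional dummy writes owed so far
        self.rng = random.Random(seed)

    def write(self, page, data):
        # Normal mirrored write: both SSDs receive identical data.
        for ssd in (self.early, self.late):
            ssd[page] = data
        self.early_writes += 1
        self.late_writes += 1
        self.last_page = page
        # Accrue P_dummy per normal write; issue whole dummy writes when owed.
        self.credit += self.p_dummy
        while self.credit >= 1.0:
            self.credit -= 1.0
            self._dummy_write()

    def _dummy_write(self):
        # Rewrite existing content: the last written page or a random written page.
        if self.rng.random() < 0.5:
            page = self.last_page
        else:
            page = self.rng.choice(list(self.early))
        data = self.early[page]
        self.early[page] = data       # same bytes, but the SSD still programs a page
        self.early_writes += 1


mirror = WarpedMirror(p_dummy=0.25, seed=1)
for i in range(100):
    mirror.write(i % 10, f"block-{i}")
print(mirror.early_writes, mirror.late_writes)  # 125 100
```

Because dummy writes only rewrite existing content, both mirrors stay identical while SSDearly absorbs 25% more page programs.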

SLIDE 10
Failure Separation Interval

  • FSI: window for detection and reconstruction
    ▫ Set by the administrator at initialization time
    ▫ Can be adjusted
  • Choosing FSI
    ▫ Long enough for recovery
    ▫ Short enough to avoid high performance cost

How many dummy writes to add for a given FSI?

SLIDE 11
Challenges

  • Subverting the FTL
    ▫ No knowledge of the underlying FTL
  • Achieving near-perfect FSI
    ▫ FSI cannot be shorter than the target (reliability)
    ▫ Performance overhead should be minimized

SLIDE 12
WaM Model

  • Model based on
    ▫ Target FSI length
    ▫ SSD properties
    ▫ Workload properties
  • Goal
    ▫ Find the dummy write percentage for a target FSI

SLIDE 13
WaM Model – Dummy Write Percentage

  • Ratio of erases between two mirrored SSDs
  • Dummy write percentage $P_{dummy}$

$$R_{erase} = \frac{N^{early}_{erases}}{N^{late}_{erases}}$$

$$R_{erase} = 1 + P_{dummy} \quad\Longleftrightarrow\quad P_{dummy} = R_{erase} - 1$$

where $N^{early}_{erases}$ is the number of erases issued by SSDearly and $N^{late}_{erases}$ is the number of erases issued by SSDlate.
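The two relations can be checked numerically with a minimal sketch. The function names are hypothetical and the erase counts are illustrative values, not measurements; $P_{dummy}$ is expressed as a fraction of normal writes.

```python
def erase_ratio(n_erases_early, n_erases_late):
    """R_erase: erases issued by SSD_early divided by erases issued by SSD_late."""
    return n_erases_early / n_erases_late


def dummy_write_percentage(r_erase):
    """P_dummy = R_erase - 1, as a fraction of normal writes (0.25 means 25%)."""
    return r_erase - 1.0


r = erase_ratio(1250, 1000)
print(r, dummy_write_percentage(r))  # 1.25 0.25
```

An SSDearly that performs 25% more erases than its mirror corresponds to a 25% dummy write percentage.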

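SLIDE 14
WaM Model – Num Erases Remaining

$$N^{late}_{remaining} = N_{worn} - N^{late}_{erases}$$

$$N^{late}_{erases} = \frac{N_{worn}}{R_{erase}}$$

$$N^{late}_{remaining} = N_{worn} - \frac{N_{worn}}{R_{erase}}$$

where $N_{worn}$ is the maximum number of erases of an SSD block (reached by SSDearly when it dies), $N^{late}_{erases}$ is the number of erases SSDlate has performed when SSDearly dies, and $N^{late}_{remaining}$ is the number of erases SSDlate has left at that point.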
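A numeric sketch of these per-block quantities follows. The 10,000-erase endurance and the erase ratio of 1.25 are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
def late_erases_at_early_failure(n_worn, r_erase):
    """Per-block erases SSD_late has used when SSD_early reaches N_worn."""
    return n_worn / r_erase


def late_remaining_erases(n_worn, r_erase):
    """N_remaining^late = N_worn - N_worn / R_erase."""
    return n_worn - late_erases_at_early_failure(n_worn, r_erase)


print(late_erases_at_early_failure(10_000, 1.25))  # 8000.0
print(late_remaining_erases(10_000, 1.25))         # 2000.0
```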

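SLIDE 15
WaM Model – Num Erases during Time

$$T = N_{I/Os}\,(T_r + T_i)$$

$$N^{total}_{erases}(T) = \frac{T \cdot P_{writes}}{T_r + T_i} \cdot \frac{S_{page}}{S_{block}}$$

$$N^{perblock}_{erases}(T) = \frac{1}{N^{ssd}_{block}} \cdot \frac{T \cdot P_{writes}}{T_r + T_i} \cdot \frac{S_{page}}{S_{block}}$$

Symbols: $T_r$ = average response time, $T_i$ = average idle time, $P_{writes}$ = write percentage, $S_{page}$ = flash page size, $S_{block}$ = flash erase block size, $N^{ssd}_{block}$ = number of erase blocks in the SSD.

Requires: workload-dependent inputs, knowledge of SSD parameters, and perfect wear leveling.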
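The erase-count formulas translate directly into code. This is a sketch assuming, as the formula implies, that each write programs one page; times are in seconds and sizes in bytes. The function names and the example device (4 KiB pages, 256 KiB erase blocks, 80 GiB of blocks) are illustrative assumptions.

```python
def num_ios(t, t_r, t_i):
    """N_I/Os = T / (T_r + T_i): requests served in a period of length T."""
    return t / (t_r + t_i)


def erases_total(t, t_r, t_i, p_writes, s_page, s_block):
    """Total erases in T, assuming each write programs one page."""
    return num_ios(t, t_r, t_i) * p_writes * s_page / s_block


def erases_per_block(t, t_r, t_i, p_writes, s_page, s_block, n_block_ssd):
    """Per-block erases in T, assuming perfect wear leveling."""
    return erases_total(t, t_r, t_i, p_writes, s_page, s_block) / n_block_ssd


# Illustrative values: 1 hour, 1 ms response time, no idle time, 50% writes.
S_PAGE, S_BLOCK = 4 * 1024, 256 * 1024
N_BLOCK_SSD = 80 * 1024**3 // S_BLOCK
print(erases_total(3600, 0.001, 0.0, 0.5, S_PAGE, S_BLOCK))
print(erases_per_block(3600, 0.001, 0.0, 0.5, S_PAGE, S_BLOCK, N_BLOCK_SSD))
```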

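SLIDE 16
WaM Model – Final Steps

$$N^{late}_{remaining} = N^{perblock}_{erases}(FSI)$$

$$N_{worn} - \frac{N_{worn}}{R_{erase}} = \frac{1}{N^{ssd}_{block}} \cdot \frac{FSI \cdot P_{writes}}{T_r + T_i} \cdot \frac{S_{page}}{S_{block}}$$

$$R_{erase} = \frac{N_{worn}}{N_{worn} - \dfrac{1}{N^{ssd}_{block}} \cdot \dfrac{FSI \cdot P_{writes}}{T_r + T_i} \cdot \dfrac{S_{page}}{S_{block}}}$$

$$P_{dummy} = R_{erase} - 1$$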
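Putting the final steps together, the sketch below solves for the dummy write fraction given a target FSI. The function names, the units (seconds and bytes), and the example device (10,000-erase endurance, 4 KiB pages, 256 KiB blocks, 327,680 blocks) are assumptions for illustration only.

```python
def required_erase_ratio(fsi, n_worn, t_r, t_i, p_writes, s_page, s_block, n_block_ssd):
    """R_erase such that SSD_late's remaining erases last exactly one FSI."""
    # Per-block erases SSD_late performs during the FSI (perfect wear leveling).
    erases_in_fsi = fsi * p_writes * s_page / ((t_r + t_i) * s_block * n_block_ssd)
    if erases_in_fsi >= n_worn:
        raise ValueError("target FSI exceeds the whole lifetime of SSD_late")
    return n_worn / (n_worn - erases_in_fsi)


def dummy_fraction_for_fsi(fsi, n_worn, t_r, t_i, p_writes, s_page, s_block, n_block_ssd):
    """P_dummy = R_erase - 1, as a fraction of normal writes."""
    return required_erase_ratio(fsi, n_worn, t_r, t_i, p_writes,
                                s_page, s_block, n_block_ssd) - 1.0


PARAMS = dict(n_worn=10_000, t_r=0.001, t_i=0.0, p_writes=0.5,
              s_page=4096, s_block=256 * 1024, n_block_ssd=327_680)
for hours in (52, 500, 5000):
    p = dummy_fraction_for_fsi(hours * 3600, **PARAMS)
    print(f"FSI {hours:>5} h -> P_dummy = {100 * p:.3f}%")
```

A longer target FSI requires a larger erase ratio and hence more dummy writes; the `ValueError` guards targets that no remaining wear budget can meet.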

SLIDE 17
Assumptions and Limitations

  • Device parameters
    ▫ From the device vendor or detected with a tool
  • Workload changes
    ▫ Adjust the model as workloads change
  • Imperfect or no wear leveling
  • Incorrect SSD lifetime

Violations: FSI too short or too long
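One way to adjust the model as workloads change is to periodically re-estimate its workload-dependent inputs ($T_r$, $T_i$, $P_{writes}$) from a recent window of requests and then recompute $P_{dummy}$. This is a sketch only: the `Request` record and the windowing scheme are hypothetical, and how WaM actually samples the workload is not specified here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    is_write: bool
    response_time: float   # seconds the request took (contributes to T_r)
    idle_after: float      # idle seconds before the next request (contributes to T_i)


def workload_parameters(window):
    """Estimate (T_r, T_i, P_writes) from a recent window of requests."""
    t_r = mean(r.response_time for r in window)
    t_i = mean(r.idle_after for r in window)
    p_writes = sum(r.is_write for r in window) / len(window)
    return t_r, t_i, p_writes


window = [Request(True, 0.002, 0.0), Request(False, 0.001, 0.001),
          Request(True, 0.003, 0.0), Request(True, 0.002, 0.002)]
print(workload_parameters(window))
```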

SLIDE 18
Achieving Target FSI

  • If FSI too short
    ▫ Delay writes to the surviving SSD
  • If FSI too long
    ▫ Performance cost
    ▫ Adjust in future WaM modeling

$$R_{delay} = \frac{N^{late}_{remaining\_target}}{N^{late}_{remaining\_actual}}$$

[Diagram: SSDearly and SSDlate receive writes; the target FSI separates their failures]
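The delay rule can be sketched as follows, assuming that $R_{delay}$ is the ratio of target to actual remaining erases of SSDlate and that slowing writes by that factor slows SSDlate's erase rate by the same factor. The function names and the 10-erases-per-hour rate are hypothetical.

```python
def delay_ratio(n_remaining_target, n_remaining_actual):
    """R_delay = target / actual remaining erases of SSD_late (1.0 if no delay needed)."""
    if n_remaining_actual >= n_remaining_target:
        return 1.0   # FSI already long enough: do not slow writes down
    return n_remaining_target / n_remaining_actual


def delivered_fsi(n_remaining_actual, erases_per_hour, r_delay):
    """Hours until SSD_late wears out if its writes are slowed by r_delay."""
    # Delaying writes by r_delay divides SSD_late's erase rate by r_delay.
    return n_remaining_actual / (erases_per_hour / r_delay)


r = delay_ratio(2000, 1500)
print(delivered_fsi(1500, 10.0, 1.0))  # 150.0: shorter than the target
print(delivered_fsi(1500, 10.0, r))    # matches the target (2000 / 10 = 200 hours, up to rounding)
```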

SLIDE 19
Recovery

  • When the first SSD (SSDearly) fails
    ▫ Replace it with a new SSD
    ▫ Reconstruct the data
  • Replacing the second SSD (SSDlate)
    ▫ At the same time the first SSD fails (no reliability risk, slightly higher cost)
    ▫ When it fails (higher reliability risk, slightly lower cost)

SLIDE 20
Outline

  • Introduction
  • WaM design and model
  • Evaluation results
  • Conclusion

SLIDE 21
Evaluation Environment

  • Simulation based on DiskSim + SSD extension
  • A mirror pair of two 80GB SSDs
  • Workloads
    ▫ Microbenchmark
    ▫ Macrobenchmark
    ▫ Trace
    ▫ No idle time

SLIDE 22
Can Failures Be Separated with Dummy Writes? And How?

[Figure: FSI (h) vs. dummy write percentage (%) for Random Write, 66% Random Write, 33% Random Write, and Sequential Write]

  • Failures can be separated with dummy writes
  • More dummy writes -> longer separation
  • Wear leveling homogenizes workloads

SLIDE 23
What Is the Performance Overhead?

[Figure: average response time increase (%) vs. dummy write percentage (%) for Random Write, 66% Random Write, 33% Random Write, and Sequential Write]

More dummy writes -> worse performance

SLIDE 24
Can the Correct FSI Be Achieved?
Sequential workload

[Figure: FSI delivered (h) vs. FSI target (h); series: Target, WaM Without Delay, WaM With Delay]

SLIDE 25
Can the Correct FSI Be Achieved?
Random workload

[Figure: FSI delivered (h) vs. FSI target (h); series: Target, WaM Without Delay, WaM With Delay]

  • WaM model can be inaccurate
  • Target FSI can be delivered by delaying writes

SLIDE 26
How about Real Workloads? - FSI

[Figure: FSI (h) vs. dummy write percentage (%), one panel each for Postmark, TPC-C, and WebSearch]

  • FSI vs. dummy write relationship is as expected
  • Larger FSI with read-intensive workloads

SLIDE 27
How about Real Workloads? - Performance

[Figure: average response time increase (%) vs. dummy write percentage (%) for Postmark, TPC-C, and WebSearch; annotated range: 50-5000 hours of FSI]

  • Higher overhead with write-intensive workloads
  • Performance overhead is small for typical FSI

SLIDE 28
What is the Monetary Cost?

  • WaM: cost of SSDs + a sys-admin check every FSI
  • Fixed replacement: replace the SSD after one year

[Figure: cost (dollars/h) vs. dummy write percentage (%), with the cost of fixed replacement as a reference]

WaM costs less than fixed-time replacement

3-year total cost of ownership: fixed replacement $594; WaM $275-$366
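The 3-year totals can be converted to hourly costs with simple arithmetic, which lines up with the 1-2 cents per hour quoted for WaM. This sketch assumes 365-day years (no leap days) and cost spread evenly over the period; the names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24           # ignores leap days
HOURS_3_YEARS = 3 * HOURS_PER_YEAR  # 26,280 hours


def cost_per_hour(total_dollars, hours=HOURS_3_YEARS):
    """Average ownership cost in dollars per hour over the period."""
    return total_dollars / hours


for label, total in [("Fixed replacement", 594), ("WaM (low)", 275), ("WaM (high)", 366)]:
    print(f"{label}: {100 * cost_per_hour(total):.2f} cents/hour")
# prints 2.26, 1.05 and 1.39 cents/hour respectively
```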

SLIDE 29
Summary of Results

  • Failures are separated with the desired FSI
  • Model is approximate
  • Desired FSI is achieved by delaying writes
  • Small performance overhead
  • Low monetary cost

SLIDE 30
Outline

  • Introduction
  • WaM design and model
  • Evaluation results
  • Conclusion

SLIDE 31
Conclusion

  • Correlated failure of flash-based RAID
  • Separate failures by carefully adding dummy writes and delaying writes
  • Other techniques for failure separation
    ▫ Wear out one SSD to some extent before using it
    ▫ Stagger SSDs of different ages in a RAID
    ▫ Vendor control of when SSDs in a RAID fail

SLIDE 32
Conclusion

  • Applying existing solutions directly to new devices may not work
  • WaM is a simple solution that guarantees failure separation and pushes aggressive use of SSDs
  • Other techniques may work well
  • WaM model can be useful

SLIDE 33
Thank You
Questions?

http://wisdom.cs.wisc.edu/home
http://research.cs.wisc.edu/adsl