Flash-based SSDs [ Material based on slides from Tyler - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Flash-based SSDs

[ Material based on slides from Tyler Caraza-Harter ] www: https://tyler.caraza-harter.com

slide-2
SLIDE 2

Source: http://www.tomshardware.com/news/ssd-hdd-solid-state-drive-hard-disk-drive-prices,14336.html

Cost: HDD vs. SSD

slide-3
SLIDE 3

Source: http://www.tomshardware.com/news/ssd-hdd-solid-state-drive-hard-disk-drive-prices,14336.html

Cost: HDD vs. SSD

Note: These are trends, not the most up-to-date data. There are different classes of HDDs and SSDs, which complicate this graph, but the thing to note is that there is a gap, it is narrowing, and all costs are trending downward.

slide-4
SLIDE 4

Disk Overview

I/O requires: seek, rotate, transfer

  • Inherently:
  • not parallel (only one head)
  • slow (mechanical)
  • poor random I/O (locality around disk head)
  • Random requests take 10ms+
slide-5
SLIDE 5

Flash

Hold charge in cells. No moving parts!

  • Inherently parallel.
  • No seeks!
slide-6
SLIDE 6

SLC: Single-Level Cell

[Figure: charge stored in a NAND cell]

slide-7
SLIDE 7

SLC: Single-Level Cell

[Figure: charge stored in a NAND cell; value: 1]

slide-8
SLIDE 8

SLC: Single-Level Cell

[Figure: charge stored in a NAND cell]

slide-9
SLIDE 9

MLC: Multi-Level Cell

[Figure: charge stored in a NAND cell; value: 00]

slide-10
SLIDE 10

MLC: Multi-Level Cell

[Figure: charge stored in a NAND cell; value: 01]

slide-11
SLIDE 11

MLC: Multi-Level Cell

[Figure: charge stored in a NAND cell; value: 10]

slide-12
SLIDE 12

MLC: Multi-Level Cell

[Figure: charge stored in a NAND cell; value: 11]

slide-13
SLIDE 13

Single- vs. Multi-Level Cell

[Figure: charge levels in an SLC cell vs. an MLC cell]

slide-14
SLIDE 14

Single- vs. Multi-Level Cell

[Figure: charge levels in an SLC cell vs. an MLC cell]

  • SLC: expensive, robust
  • MLC: cheap, sensitive

slide-15
SLIDE 15

Wearout

Problem: flash cells wear out after being overwritten too many times.

  • MLC: ~10K times
  • SLC: ~100K times
  • Usage strategy:
slide-16
SLIDE 16

Wearout

Problem: flash cells wear out after being overwritten too many times.

  • MLC: ~10K times
  • SLC: ~100K times
  • Usage strategy: wear leveling, which prevents some cells from wearing out while others are still fresh.
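One simple wear-leveling policy can be sketched as follows. This is a minimal illustration under assumptions not on the slides: the device tracks an erase count per block, and whenever it needs a free block it hands out the least-erased one. The `WearLevelingAllocator` class is hypothetical; real devices may also move long-lived (static) data so that the blocks it occupies get reused, which this sketch ignores.

```python
import heapq


class WearLevelingAllocator:
    """Hypothetical allocator: always returns the free block erased the fewest times."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (erase_count, block_id) for blocks that are currently free.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)
        self.erase_counts = [0] * num_blocks

    def allocate(self) -> int:
        """Take the least-worn free block (ties broken by block id)."""
        _, block = heapq.heappop(self.free)
        return block

    def erase_and_release(self, block: int) -> None:
        """Erase a block (one more unit of wear) and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


alloc = WearLevelingAllocator(num_blocks=4)
for _ in range(100):
    b = alloc.allocate()
    alloc.erase_and_release(b)
print(alloc.erase_counts)  # → [25, 25, 25, 25]
```

With 4 blocks and 100 allocate/erase cycles, every block ends up erased 25 times instead of one block absorbing all 100 erases.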
slide-17
SLIDE 17

Banks

Flash devices are divided into banks (aka, planes).

  • Banks can be accessed in parallel.

Bank 0 Bank 1 Bank 2 Bank 3

slide-18
SLIDE 18

Banks

Flash devices are divided into banks (aka, planes).

  • Banks can be accessed in parallel.

Bank 0 Bank 1 Bank 2 Bank 3

read read

slide-19
SLIDE 19

Banks

Flash devices are divided into banks (aka, planes).

  • Banks can be accessed in parallel.

Bank 0 Bank 1 Bank 2 Bank 3

slide-20
SLIDE 20

Banks

Flash devices are divided into banks (aka, planes).

  • Banks can be accessed in parallel.

Bank 0 Bank 1 Bank 2 Bank 3

data data

slide-21
SLIDE 21

Banks

Flash devices are divided into banks (aka, planes).

  • Banks can be accessed in parallel.

Bank 0 Bank 1 Bank 2 Bank 3

slide-22
SLIDE 22

Flash Writes

Writing 0’s:

  • fast, fine-grained

Writing 1’s:

  • slow, coarse-grained
slide-23
SLIDE 23

Flash Writes

Writing 0’s:

  • fast, fine-grained
  • called “program”

Writing 1’s:

  • slow, coarse-grained
  • called “erase”
slide-24
SLIDE 24

Flash Writes

Writing 0’s:

  • fast, fine-grained [pages]
  • called “program”

Writing 1’s:

  • slow, coarse-grained [blocks]
  • called “erase”
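The program/erase asymmetry can be modeled with bit operations. This is a toy sketch assuming a page is a small integer whose bits all read 1 after an erase (4-bit pages, mirroring the `1111` notation in the walkthrough below): programming can only clear bits, so it behaves like a bitwise AND, while getting any 1 back requires erasing the whole block. The function names are illustrative only, and real chips typically also limit how many times a page may be programmed between erases, which this model does not enforce.

```python
PAGE_BITS = 4
ERASED = (1 << PAGE_BITS) - 1  # 0b1111: all cells read as 1 after an erase


def erase(block: list[int]) -> None:
    """Erase (slow, coarse-grained): reset every page in the block to all 1's."""
    for i in range(len(block)):
        block[i] = ERASED


def program(block: list[int], page: int, value: int) -> None:
    """Program (fast, fine-grained): clear bits in a single page.

    Only 1 -> 0 transitions are possible, so a value that needs any
    0 -> 1 change is rejected; the whole block must be erased first.
    """
    if value & ~block[page] & ERASED:
        raise ValueError("cannot turn a 0 back into a 1 without erasing the block")
    block[page] &= value


block = [ERASED] * 16                   # a freshly erased 16-page block
program(block, 6, 0b1001)               # one page becomes 1001
program(block, 7, 0b1100)               # the next page becomes 1100
print(format(block[6], "04b"), format(block[7], "04b"))  # → 1001 1100
erase(block)                            # the only way to get the 1's back
print(all(p == ERASED for p in block))  # → True
```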
slide-25
SLIDE 25

Bank 0 Bank 1 Bank 2 Bank 3

A Bank Consists of Blocks

slide-26
SLIDE 26

Bank 0 Bank 2 Bank 3

each bank contains many “blocks”

A Bank Consists of Blocks

slide-27
SLIDE 27

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

A Block Consists of Pages

slide-28
SLIDE 28

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

(one block)

A Block Consists of Pages

slide-29
SLIDE 29

Block

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

(one page)
slide-30
SLIDE 30

The Hierarchy of SSD Components:

One NAND flash chip is made up of several banks; each bank is made up of several blocks; each block is made up of several pages.

slide-31
SLIDE 31

Block

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

slide-32
SLIDE 32

Block

1111 1111 1111 1111 1111 1111 1001 1111 1111 1111 1111 1111 1111 1111 1111 1111 program

slide-33
SLIDE 33

Block

1111 1111 1111 1111 1111 1111 1001 1111 1111 1111 1111 1111 1111 1111 1111 1111

slide-34
SLIDE 34

Block

1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1111 1111 1111 1111 program

slide-35
SLIDE 35

Block

1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1111 1111 1111 1111

slide-36
SLIDE 36

Block

1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111 program

slide-37
SLIDE 37

Block

1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111

slide-38
SLIDE 38

Block

1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111 erase

slide-39
SLIDE 39

Block

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 erase

slide-40
SLIDE 40

Block

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

slide-41
SLIDE 41

APIs

Operations on disk vs. flash:

  • read
  • write

slide-42
SLIDE 42

APIs

Operations on disk vs. flash:

  • read: read sector (disk) vs. read page (flash)
  • write

slide-43
SLIDE 43

APIs

Operations on disk vs. flash:

  • read: read sector (disk) vs. read page (flash)
  • write: write sector (disk) vs. program page to write 0’s and erase block to write 1’s (flash)

slide-44
SLIDE 44

Flash Hierarchy

  • Plane: 1024 to 4096 blocks; planes are accessed in parallel
  • Block: 64 to 256 pages; the unit of erase
  • Page: 2 to 8 KB; the unit of read and program
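Multiplying the slide's ranges shows how much larger the erase unit is than the read/program unit. This is plain arithmetic on the extremes of each range; a given chip will not necessarily hit both extremes, and the helper names are illustrative.

```python
# Ranges from the "Flash Hierarchy" slide: (smallest, largest).
PAGE_KB = (2, 8)
PAGES_PER_BLOCK = (64, 256)
BLOCKS_PER_PLANE = (1024, 4096)


def block_kb(page_kb: int, pages: int) -> int:
    """Size of one erase unit (block) in KB."""
    return page_kb * pages


def plane_mb(page_kb: int, pages: int, blocks: int) -> float:
    """Size of one plane in MB (1 MB = 1024 KB here)."""
    return block_kb(page_kb, pages) * blocks / 1024


smallest_block = block_kb(PAGE_KB[0], PAGES_PER_BLOCK[0])
largest_block = block_kb(PAGE_KB[1], PAGES_PER_BLOCK[1])
print(f"block (erase unit): {smallest_block} KB to {largest_block} KB")
# → block (erase unit): 128 KB to 2048 KB
print(f"plane: {plane_mb(2, 64, 1024):.0f} MB to {plane_mb(8, 256, 4096):.0f} MB")
# → plane: 128 MB to 8192 MB
```

Either way, one erase covers 64 to 256 pages, which is why anything that forces an erase for a single-page change is expensive.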
slide-45
SLIDE 45

Disk vs. Flash Performance

Throughput:

  • disk: ~130 MB/s (sequential)
  • flash: ~200 MB/s - 550 MB/s
slide-46
SLIDE 46

Disk vs. Flash Performance

Throughput:

  • disk: ~130 MB/s (sequential)
  • flash: ~200 MB/s - 550 MB/s

Latency:

  • disk: ~10 ms (one op)
  • flash read: 10-50 μs
  • flash program: 200-500 μs
  • flash erase: 2 ms
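To put these latencies side by side, here is a back-of-the-envelope conversion to operations per second. It assumes one operation at a time (no bank parallelism, no queueing) and takes the slow end of each flash range, so it understates what flash can do.

```python
# Latencies from the slide, in seconds (slow end of each flash range).
LATENCY_S = {
    "disk random op": 10e-3,   # ~10 ms
    "flash read": 50e-6,       # 10-50 us
    "flash program": 500e-6,   # 200-500 us
    "flash erase": 2e-3,       # 2 ms
}


def ops_per_second(latency_s: float) -> float:
    """Operations per second if each op must finish before the next starts."""
    return 1.0 / latency_s


for name, latency in LATENCY_S.items():
    print(f"{name:15s} {ops_per_second(latency):8.0f} ops/s")
```

Even the slowest flash operation, an erase, comes out about 5× faster than a random disk operation, and a read about 200× faster.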
slide-47
SLIDE 47

Traditional File Systems

File System Storage Device

Traditional API:

  • read sector
  • write sector
slide-48
SLIDE 48

Traditional File Systems

File System Storage Device

Traditional API:

  • read sector
  • write sector

not same as flash

slide-49
SLIDE 49

Options

  1. Build/use new file systems for flash (e.g., JFFS, YAFFS): a lot of work!
  2. Build the traditional API over the flash API: then we can use FFS, LFS, or whatever we want.
slide-50
SLIDE 50

Traditional API with Flash

read(addr):
    return flash_read(addr)

write(addr, data):
    block_copy = flash_read(block of addr)
    modify block_copy with data
    flash_erase(block of addr)
    flash_program(block of addr, block_copy)
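The pseudocode above can be made runnable against a toy in-memory flash model. Everything here is a hypothetical stand-in rather than a real device API: `ToyFlash`, `flash_read`, `flash_erase`, and `flash_program` are illustrative names, pages hold 2-bit strings as in the walkthrough that follows, and an erased page reads `11`. Counters show how much work one small write causes.

```python
PAGES_PER_BLOCK = 8
ERASED_PAGE = "11"  # an erased 2-bit page reads as all 1's


class ToyFlash:
    """Hypothetical in-memory flash: a flat list of pages plus op counters."""

    def __init__(self, num_blocks: int) -> None:
        self.pages = [ERASED_PAGE] * (num_blocks * PAGES_PER_BLOCK)
        self.stats = {"read": 0, "erase": 0, "program": 0}

    def flash_read(self, addr: int) -> str:
        self.stats["read"] += 1
        return self.pages[addr]

    def flash_erase(self, block: int) -> None:
        self.stats["erase"] += 1
        start = block * PAGES_PER_BLOCK
        for a in range(start, start + PAGES_PER_BLOCK):
            self.pages[a] = ERASED_PAGE

    def flash_program(self, addr: int, data: str) -> None:
        # Simplification: only an erased page may be programmed.
        if self.pages[addr] != ERASED_PAGE:
            raise ValueError(f"page {addr} must be erased before programming")
        self.stats["program"] += 1
        self.pages[addr] = data


def read(flash: ToyFlash, addr: int) -> str:
    return flash.flash_read(addr)


def write(flash: ToyFlash, addr: int, data: str) -> None:
    """Emulate an in-place sector write with a read-modify-write cycle."""
    block = addr // PAGES_PER_BLOCK
    start = block * PAGES_PER_BLOCK
    # 1. Read every page of the enclosing block into memory.
    block_copy = [flash.flash_read(a) for a in range(start, start + PAGES_PER_BLOCK)]
    # 2. Modify the target page in the in-memory copy.
    block_copy[addr - start] = data
    # 3. Erase the whole block (every cell back to 1).
    flash.flash_erase(block)
    # 4. Program every page of the block from the in-memory copy.
    for offset, page in enumerate(block_copy):
        flash.flash_program(start + offset, page)


# Set up block 0 as in the walkthrough, then reset the counters.
flash = ToyFlash(num_blocks=3)
for a, v in enumerate(["00", "00", "00", "11", "11", "00", "11", "11"]):
    flash.flash_program(a, v)
flash.stats = {"read": 0, "erase": 0, "program": 0}

write(flash, 1, "01")                 # change one page: 00 -> 01
print(flash.pages[:PAGES_PER_BLOCK])  # → ['00', '01', '00', '11', '11', '00', '11', '11']
print(flash.stats)                    # → {'read': 8, 'erase': 1, 'program': 8}
```

Changing a single page cost 8 reads, 1 erase, and 8 programs in this toy; with real block sizes the multiplier is far larger.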

slide-51
SLIDE 51

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory:

slide-52
SLIDE 52

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

FS wants to write 0001

Memory:

slide-53
SLIDE 53

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory:

slide-54
SLIDE 54

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 00 00 11 11 00 11 11 (read all other pages in block)

slide-55
SLIDE 55

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 00 00 11 11 00 11 11

slide-56
SLIDE 56

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11 (modify target page in memory)

slide-57
SLIDE 57

Flash:

block 0: 00 00 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11

slide-58
SLIDE 58

Flash:

block 0: 11 11 11 11 11 11 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11 (erase block)

slide-59
SLIDE 59

Flash:

block 0: 11 11 11 11 11 11 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11

slide-60
SLIDE 60

Flash:

block 0: 00 01 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11 (program all pages in block)

slide-61
SLIDE 61

Flash:

block 0: 00 01 00 11 11 00 11 11
block 1: 11 11 11 11 11 11 11 11
block 2: 00 01 11 11 11 11 11 11

Memory: 00 01 00 11 11 00 11 11

slide-62
SLIDE 62

Write Amplification

Random writes are extremely expensive!

  • Writing one 2KB page may cause:
  • read, erase, and program of 256KB block.
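To make "extremely expensive" concrete, the sketch below combines this slide's sizes with the latency ranges from the performance slide. It assumes the naive read-modify-write scheme from earlier, a 256 KB block of 2 KB pages (128 pages), no bank parallelism, and no caching; the function names are illustrative only.

```python
PAGE_KB = 2
BLOCK_KB = 256
PAGES_PER_BLOCK = BLOCK_KB // PAGE_KB  # 128


def write_amplification(request_kb: float, block_kb: float) -> float:
    """Data physically rewritten per unit of data the FS asked to write,
    assuming each small write forces a full-block read-modify-write."""
    return block_kb / request_kb


def rmw_latency_ms(read_us: float, program_us: float, erase_ms: float) -> float:
    """Read every page, erase once, then program every page (no parallelism)."""
    return (PAGES_PER_BLOCK * read_us + PAGES_PER_BLOCK * program_us) / 1000 + erase_ms


print(write_amplification(PAGE_KB, BLOCK_KB))             # → 128.0
print(f"{rmw_latency_ms(10, 200, 2):.1f} ms (fast end)")  # → 28.9 ms (fast end)
print(f"{rmw_latency_ms(50, 500, 2):.1f} ms (slow end)")  # → 72.4 ms (slow end)
```

Under these assumptions, one 2 KB random write takes tens of milliseconds, slower than the ~10 ms random disk operation.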
slide-63
SLIDE 63

Write Amplification

Random writes are extremely expensive!

  • Writing one 2KB page may cause:
  • read, erase, and program of 256KB block.
  • Would FFS or LFS be better with flash?
slide-64
SLIDE 64

File Systems over Flash

Copy-On-Write FS may prevent some expensive random writes.

  • What about wear leveling? LFS won’t do this.
  • What if we want to use some other FS?

Perhaps some other FS has features or APIs our applications rely on, so we must use it

slide-65
SLIDE 65

Better Solution

Add copy-on-write layer between FS and flash. Avoids RMW (read-modify-write) cycle.

  • Translate logical device addrs to physical addrs.
  • FTL: Flash Translation Layer.
  • Should translation use math or data structure?
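One way to answer with a data structure rather than math is a minimal page-level FTL, sketched below. It assumes a dict from logical to physical page numbers and a simple append pointer for choosing free pages; the class and its methods are hypothetical, and garbage collection is deliberately left out, so this toy eventually runs out of free pages.

```python
class PageMappedFTL:
    """Hypothetical page-level FTL: a dict maps logical pages to physical pages,
    and every write goes to a fresh physical page (copy-on-write)."""

    def __init__(self, num_physical_pages: int) -> None:
        self.flash = [None] * num_physical_pages   # contents of each physical page
        self.mapping: dict[int, int] = {}          # logical page -> physical page
        self.invalid: set[int] = set()             # physical pages holding stale data
        self.next_free = 0                         # next never-written physical page

    def write(self, logical: int, data: str) -> None:
        if self.next_free == len(self.flash):
            raise RuntimeError("out of free pages: garbage collection needed")
        old = self.mapping.get(logical)
        if old is not None:
            self.invalid.add(old)                  # no erase: the old copy just goes stale
        self.flash[self.next_free] = data          # program a free page
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical: int) -> str:
        return self.flash[self.mapping[logical]]


ftl = PageMappedFTL(num_physical_pages=8)
ftl.write(1, "00")
ftl.write(2, "10")
ftl.write(1, "01")   # overwrite logical page 1: remap, no read-modify-write
print(ftl.read(1), ftl.mapping, sorted(ftl.invalid))  # → 01 {1: 2, 2: 1} [0]
```

An overwrite now costs one page program plus a table update; the price is a stale page that must be reclaimed later.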
slide-66
SLIDE 66

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 11 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

slide-67
SLIDE 67

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 11 11 11 11 11

1 2 3 4 5 6 7 physical: logical: write 1101

slide-68
SLIDE 68

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical: write 1101

slide-69
SLIDE 69

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical: write 1101

slide-70
SLIDE 70

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

slide-71
SLIDE 71

Flash Translation Layer

00 01

block 0

00 10 00 11 00 00 10 01

block 1

11 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

must eventually be garbage collected

slide-72
SLIDE 72

FTL

Could be implemented as a device driver or in firmware (usually the latter).

  • Where to store mapping? SRAM.
  • Physical pages can be in three states:
  • valid, invalid, free
slide-73
SLIDE 73

States

free valid invalid

slide-74
SLIDE 74

States

free valid invalid

program erase

slide-75
SLIDE 75

States

free, valid, invalid

  • free → valid: program
  • valid → invalid: relocate or TRIM
  • invalid → free: erase
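The three states and their transitions are enough to sketch garbage collection. Below is a minimal toy under assumptions not stated on the slides: 4-page blocks, writes go to the first free page, and the collector picks the full block with the fewest valid pages (one common greedy policy), relocates its valid pages, then erases it. All names are hypothetical.

```python
PAGES_PER_BLOCK = 4
FREE, VALID, INVALID = "free", "valid", "invalid"


class ToyFTL:
    """Hypothetical page-mapped FTL that tracks the three page states."""

    def __init__(self, num_blocks: int) -> None:
        n = num_blocks * PAGES_PER_BLOCK
        self.state = [FREE] * n
        self.data = [None] * n
        self.owner = [None] * n   # physical page -> logical page it holds
        self.mapping = {}         # logical page -> physical page

    def write(self, logical, data) -> None:
        phys = self.state.index(FREE)  # program: free -> valid (ValueError if none free)
        self.state[phys], self.data[phys], self.owner[phys] = VALID, data, logical
        old = self.mapping.get(logical)
        if old is not None:
            self.state[old] = INVALID  # relocate: old copy goes valid -> invalid
        self.mapping[logical] = phys

    def trim(self, logical) -> None:
        self.state[self.mapping.pop(logical)] = INVALID  # TRIM: valid -> invalid

    def read(self, logical):
        return self.data[self.mapping[logical]]

    def collect(self) -> int:
        """Greedy GC: reclaim the full block with the fewest valid pages."""
        candidates = []
        for b in range(len(self.state) // PAGES_PER_BLOCK):
            pages = self.state[b * PAGES_PER_BLOCK:(b + 1) * PAGES_PER_BLOCK]
            if FREE not in pages and INVALID in pages:
                candidates.append((pages.count(VALID), b))
        if not candidates:
            raise RuntimeError("no block worth collecting")
        _, victim = min(candidates)
        start = victim * PAGES_PER_BLOCK
        for phys in range(start, start + PAGES_PER_BLOCK):
            if self.state[phys] == VALID:  # move live data out of the victim first
                self.write(self.owner[phys], self.data[phys])
        for phys in range(start, start + PAGES_PER_BLOCK):  # erase: invalid -> free
            self.state[phys], self.data[phys], self.owner[phys] = FREE, None, None
        return victim


ftl = ToyFTL(num_blocks=4)
for lp in range(8):
    ftl.write(lp, f"v1-{lp}")     # fills blocks 0 and 1
for lp in (0, 4, 5, 6):
    ftl.write(lp, f"v2-{lp}")     # out-of-place updates fill block 2
victim = ftl.collect()            # block 1: only logical page 7 still valid
print(victim, ftl.mapping[7], ftl.read(7))  # → 1 12 v1-7
```

Choosing the block with the fewest valid pages minimizes the copying done before each erase; note the collector needs some free space outside the victim to relocate into.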
slide-76
SLIDE 76

SSD Architecture

[Figure: the SSD looks like a disk to the file system (traditional block API); inside, the FTL keeps its mapping table in SRAM]

slide-77
SLIDE 77

Problem: Big Mapping Table

Assume 200GB device, 2KB pages, 4-byte entries.

  • SRAM needed: (200GB / 2KB) * 4 bytes = 400 MB.
  • Too big, SRAM is expensive!
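The slide's arithmetic, written out. The helper assumes one 4-byte entry per mapped unit; it gives 400 whether the sizes are read in binary units (GiB, KiB, MiB) or decimal ones (GB, KB, MB), since both scale the same way.

```python
KiB, MiB, GiB = 2**10, 2**20, 2**30


def mapping_table_bytes(device_bytes: int, unit_bytes: int, entry_bytes: int = 4) -> int:
    """One table entry per mapped unit; page-level mapping uses unit = one page."""
    return (device_bytes // unit_bytes) * entry_bytes


entries = (200 * GiB) // (2 * KiB)
table = mapping_table_bytes(200 * GiB, 2 * KiB)
print(entries)             # → 104857600 (about 100 million pages)
print(table / MiB, "MiB")  # → 400.0 MiB
```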
slide-78
SLIDE 78

Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

slide-79
SLIDE 79

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

slide-80
SLIDE 80

Larger Mappings

Advantage: larger mappings decrease table size.

  • Disadvantage?
slide-81
SLIDE 81

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical:

slide-82
SLIDE 82

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 11 11 11 11

1 2 3 4 5 6 7 physical: logical: write 1011

slide-83
SLIDE 83

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 10 11 01 01

1 2 3 4 5 6 7 physical: logical: write 1011

copy

slide-84
SLIDE 84

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 10 11 01 01

1 2 3 4 5 6 7 physical: logical: write 1011

slide-85
SLIDE 85

2-Page Translations

00 01

block 0

00 10 00 11 00 00 10 01

block 1

01 01 10 11 01 01

1 2 3 4 5 6 7 physical: logical: write 1011

slide-86
SLIDE 86

Larger Mappings

Advantage: larger mappings decrease table size.

  • Disadvantages?
  • more read-modify-write updates
  • more garbage
  • less flexibility for placement
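Both sides of this trade-off can be put in numbers. The sketch assumes the 200 GB device with 2 KB pages and 4-byte entries from the SRAM slide and, as in the 2-page example, that overwriting one page inside a larger mapping unit rewrites the whole unit to a new location. It does not model the garbage or placement effects.

```python
DEVICE_BYTES = 200 * 2**30  # same 200 GiB device as the SRAM example
PAGE_BYTES = 2 * 2**10      # 2 KiB pages
ENTRY_BYTES = 4


def tradeoff(pages_per_entry: int) -> tuple[float, int]:
    """Return (mapping table size in MiB, pages copied per 1-page overwrite)."""
    entries = DEVICE_BYTES // (PAGE_BYTES * pages_per_entry)
    table_mib = entries * ENTRY_BYTES / 2**20
    copies = pages_per_entry - 1  # unchanged neighbors rewritten with the new page
    return table_mib, copies


for k in (1, 2, 128):
    table_mib, copies = tradeoff(k)
    print(f"{k:4d} pages/entry: table {table_mib:8.3f} MiB, {copies:3d} extra copies per update")
```

Going from 1 to 128 pages per entry shrinks the table from 400 MiB to about 3 MiB, but a single-page overwrite then drags 127 unchanged pages along with it.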
slide-87
SLIDE 87

Hybrid FTL

Use coarse-grained mapping for most (e.g., 95%) of data: map at block level.

  • Use fine-grained mapping for recent data: map at page level.

slide-88
SLIDE 88

Log Blocks

Write changed pages to designated log blocks.

  • After the log blocks become full, merge the changes with the old data.
  • Eventually garbage collect old pages.
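A minimal sketch of how the two granularities might work together, assuming one block-level table for most data plus a small page-level table that covers only pages written into log blocks. Reads consult the fine-grained log map first and fall back to block-level arithmetic; writes append to free pages in the log blocks. `HybridFTL` and its methods are hypothetical, and merging is left out (the sketch simply raises once the log is full).

```python
PAGES_PER_BLOCK = 4


class HybridFTL:
    """Hypothetical hybrid FTL: block-level map plus page-level map for log blocks."""

    def __init__(self, block_map: dict[int, int], log_blocks: list[int]) -> None:
        self.block_map = block_map         # logical block -> physical block (coarse)
        self.log_map: dict[int, int] = {}  # logical page -> physical page (fine)
        self.free_log_pages = [
            b * PAGES_PER_BLOCK + i for b in log_blocks for i in range(PAGES_PER_BLOCK)
        ]

    def translate(self, logical_page: int) -> int:
        if logical_page in self.log_map:   # recently written: page-level entry
            return self.log_map[logical_page]
        lblock, offset = divmod(logical_page, PAGES_PER_BLOCK)
        return self.block_map[lblock] * PAGES_PER_BLOCK + offset  # block-level entry

    def write(self, logical_page: int) -> int:
        if not self.free_log_pages:
            raise RuntimeError("log blocks full: merge needed")
        phys = self.free_log_pages.pop(0)  # next free page in the log
        self.log_map[logical_page] = phys
        return phys


ftl = HybridFTL(block_map={0: 0}, log_blocks=[1])
print(ftl.translate(3))  # → 3 (block 0, offset 3)
ftl.write(3)             # new version of page 3 lands in log block 1
print(ftl.translate(3))  # → 4 (first page of log block 1)
print(ftl.translate(1))  # → 1 (still served by the block-level map)
```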
slide-89
SLIDE 89

Merging

Merging technique depends on I/O pattern.

  • Three merge types:
  • full merge
  • partial merge
  • switch merge
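How much copying each merge needs depends on where the updated pages sit in the log block, which is why the technique depends on the I/O pattern. The classifier below is a simplified sketch that matches the three walkthroughs that follow, under assumptions of our own: one data block per log block, and `log[i]` records which page offset of the data block was written into log page `i` (`None` if still free). Real FTLs use more detailed rules.

```python
from typing import Optional


def classify_merge(log: list[Optional[int]]) -> tuple[str, int]:
    """Classify the merge of one data block with its log block.

    log[i] is the page offset (within the data block) whose new version was
    written to log page i, or None if log page i is still free.
    Returns (merge type, number of pages that must be copied).
    """
    n = len(log)
    if all(off is None for off in log):
        return "none", 0                   # nothing was logged
    if log == list(range(n)):
        return "switch", 0                 # log block already is the new data block
    if all(off is None or off == i for i, off in enumerate(log)):
        return "partial", log.count(None)  # fill the free pages from the old block
    return "full", n                       # rebuild every page in a fresh block


print(classify_merge([0, 1, 2, 3]))           # A2 B2 C2 D2 → ('switch', 0)
print(classify_merge([None, None, None, 3]))  # free free free D2 → ('partial', 3)
print(classify_merge([3, None, None, None]))  # D2 free free free → ('full', 4)
```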
slide-90
SLIDE 90

Merging

Merging technique depends on I/O pattern.

  • Three merge types:
  • full merge
  • partial merge
  • switch merge
slide-91
SLIDE 91

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-92
SLIDE 92

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write D2

slide-93
SLIDE 93

block 0: A B C D
block 1 (log): D2 free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write D2

slide-94
SLIDE 94

block 0: A B C D
block 1 (log): D2 free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-95
SLIDE 95

block 0: A B C D
block 1 (log): D2 free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

Eventually, we need to get rid of the red arrows, as these represent expensive mappings.

slide-96
SLIDE 96

block 0: A B C D
block 1 (log): D2 free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-97
SLIDE 97

block 0: A B C D
block 1 (log): D2 free free free
block 2: A B C D2

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-98
SLIDE 98

block 0: A B C D
block 1 (log): D2 free free free
block 2: A B C D2

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-99
SLIDE 99

block 0: A B C D
block 1 (log): D2 free free free
block 2: A B C D2

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-100
SLIDE 100

block 0: A B C D
block 1 (log): D2 free free free
block 2: A B C D2

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

(block 0 and block 1 are now garbage)

slide-101
SLIDE 101

Merging

Merging technique depends on I/O pattern.

  • Three merge types:
  • full merge
  • partial merge
  • switch merge
slide-102
SLIDE 102

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-103
SLIDE 103

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write D2

slide-104
SLIDE 104

block 0: A B C D
block 1 (log): free free free D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write D2

slide-105
SLIDE 105

block 0: A B C D
block 1 (log): free free free D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-106
SLIDE 106

block 0: A B C D
block 1 (log): A B C D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-107
SLIDE 107

block 0: A B C D
block 1 (log): A B C D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-108
SLIDE 108

block 0: A B C D
block 1: A B C D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-109
SLIDE 109

block 0: A B C D
block 1: A B C D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

(block 0 is now garbage)

slide-110
SLIDE 110

Merging

Merging technique depends on I/O pattern.

  • Three merge types:
  • full merge
  • partial merge
  • switch merge
slide-111
SLIDE 111

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

slide-112
SLIDE 112

block 0: A B C D
block 1 (log): free free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write A2

slide-113
SLIDE 113

block 0: A B C D
block 1 (log): A2 free free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write A2

slide-114
SLIDE 114

block 0: A B C D
block 1 (log): A2 B2 free free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write B2

slide-115
SLIDE 115

block 0: A B C D
block 1 (log): A2 B2 C2 free
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write C2

slide-116
SLIDE 116

block 0: A B C D
block 1 (log): A2 B2 C2 D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

write D2

slide-117
SLIDE 117

block 0: A B C D
block 1 (log): A2 B2 C2 D2
block 2: free free free free

[Figure: mapping arrows from logical pages 1, 2, 3, … to physical pages]

(block 0 is now garbage)

slide-118
SLIDE 118

Merging

Merging technique depends on I/O pattern.

  • Three merge types:
  • full merge
  • partial merge
  • switch merge
slide-119
SLIDE 119

Summary

Flash is much faster than disk, but…

  • It is more expensive.
  • It’s not a drop-in replacement beneath an FS without a complex layer for emulating the hard disk API.