SLIDE 1 Flash-based SSDs
[ Material based on slides from Tyler Caraza-Harter ] www: https://tyler.caraza-harter.com
SLIDE 2 Source: http://www.tomshardware.com/news/ssd-hdd-solid-state-drive-hard-disk-drive-prices,14336.html
Cost: HDD vs. SSD
SLIDE 3 Source: http://www.tomshardware.com/news/ssd-hdd-solid-state-drive-hard-disk-drive-prices,14336.html
Cost: HDD vs. SSD
Note: These are trends, not the most up-to-date data. Different classes of drives complicate this graph, but the key point is that there is a cost gap between HDDs and SSDs; the gap is narrowing, and all costs are trending downward.
SLIDE 4 Disk Overview
I/O requires: seek, rotate, transfer
- Inherently:
- not parallel (only one head)
- slow (mechanical)
- poor random I/O (locality around disk head)
- Random requests take 10ms+
SLIDE 5 Flash
Hold charge in cells. No moving parts!
- Inherently parallel.
- No seeks!
SLIDE 6 SLC: Single-Level Cell
charge
NAND Cell
SLIDE 7 SLC: Single-Level Cell
charge
NAND Cell
1
SLIDE 8 SLC: Single-Level Cell
charge
NAND Cell
SLIDE 9 MLC: Multi-Level Cell
charge
NAND Cell
00
SLIDE 10 MLC: Multi-Level Cell
charge
NAND Cell
01
SLIDE 11 MLC: Multi-Level Cell
charge
NAND Cell
10
SLIDE 12 MLC: Multi-Level Cell
charge
NAND Cell
11
SLIDE 13 SLC
charge charge
MLC
Single- vs. Multi- Level Cell
SLIDE 14 Single- vs. Multi- Level Cell
SLC: expensive, robust
MLC: cheap, sensitive
SLIDE 15 Wearout
Problem: flash cells wear out after being overwritten too many times.
- MLC: ~10K times
- SLC: ~100K times
SLIDE 16 Wearout
Problem: flash cells wear out after being overwritten too many times.
- MLC: ~10K times
- SLC: ~100K times
- Usage strategy: wear leveling.
- prevents some cells from wearing out while others are still fresh.
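Wear leveling can be sketched as a simple allocation policy. The toy example below (hypothetical erase counts, least-worn-first selection) only illustrates the idea; real devices use more sophisticated policies:

```python
# Toy wear leveling: track per-block erase counts and always
# pick the least-worn block when a fresh block is needed.
erase_counts = [5, 2, 9, 2]   # hypothetical counts for 4 blocks

def pick_block():
    # Choose the block with the fewest erases so far.
    return min(range(len(erase_counts)), key=lambda b: erase_counts[b])

b = pick_block()
erase_counts[b] += 1          # erasing the block wears it a little
print(b)                      # -> 1 (tied with block 3; min picks the first)
```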
SLIDE 17 Banks
Flash devices are divided into banks (aka, planes).
- Banks can be accessed in parallel.
Bank 0 Bank 1 Bank 2 Bank 3
SLIDE 18 Banks
Flash devices are divided into banks (aka, planes).
- Banks can be accessed in parallel.
Bank 0 Bank 1 Bank 2 Bank 3
read read
SLIDE 19 Banks
Flash devices are divided into banks (aka, planes).
- Banks can be accessed in parallel.
Bank 0 Bank 1 Bank 2 Bank 3
SLIDE 20 Banks
Flash devices are divided into banks (aka, planes).
- Banks can be accessed in parallel.
Bank 0 Bank 1 Bank 2 Bank 3
data data
SLIDE 21 Banks
Flash devices are divided into banks (aka, planes).
- Banks can be accessed in parallel.
Bank 0 Bank 1 Bank 2 Bank 3
SLIDE 22 Flash Writes
Writing 0’s:
- fast, fine-grained
- Writing 1’s:
- slow, coarse-grained
SLIDE 23 Flash Writes
Writing 0’s:
- fast, fine-grained
- called “program”
- Writing 1’s:
- slow, coarse-grained
- called “erase”
SLIDE 24 Flash Writes
Writing 0’s:
- fast, fine-grained [pages]
- called “program”
- Writing 1’s:
- slow, coarse-grained [blocks]
- called “erase”
SLIDE 25 Bank 0 Bank 1 Bank 2 Bank 3
A Bank Consists of Blocks
SLIDE 26 Bank 0 Bank 2 Bank 3
each bank contains many “blocks”
A Bank Consists of Blocks
SLIDE 27
1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
A Block Consists of Pages
SLIDE 28 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
A Block Consists of Pages
SLIDE 29 Block
1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
SLIDE 30 The Hierarchy of SSD components:
One NAND flash chip is made up of several banks; each bank is made up of several blocks; each block is made up of several pages.
SLIDE 31
Block
1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
SLIDE 32
Block
1111 1111 1111 1111 1111 1111 1001 1111 1111 1111 1111 1111 1111 1111 1111 1111 program
SLIDE 33
Block
1111 1111 1111 1111 1111 1111 1001 1111 1111 1111 1111 1111 1111 1111 1111 1111
SLIDE 34
Block
1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1111 1111 1111 1111 program
SLIDE 35
Block
1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1111 1111 1111 1111
SLIDE 36
Block
1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111 program
SLIDE 37
Block
1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111
SLIDE 38
Block
1111 1111 1111 1111 1111 1111 1001 1100 1111 1111 1111 1111 1110 0001 1111 1111 erase
SLIDE 39
Block
1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 erase
SLIDE 40
Block
1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
SLIDE 41
APIs
disk flash read write
SLIDE 42 APIs
disk flash read
read sector read page
write
SLIDE 43 APIs
disk: read sector, write sector
flash: read page, program page (0’s), erase block (1’s)
SLIDE 44 Flash Hierarchy
Plane: 1024 to 4096 blocks
- planes accessed in parallel
- Block: 64 to 256 pages
- unit of erase
- Page: 2 to 8 KB
- unit of read and program
SLIDE 45 Disk vs. Flash Performance
Throughput:
- disk: ~130 MB/s (sequential)
- flash: ~200 MB/s - 550 MB/s
SLIDE 46 Disk vs. Flash Performance
Throughput:
- disk: ~130 MB/s (sequential)
- flash: ~200 MB/s
- Latency
- disk: ~10 ms (one op)
- flash:
- read: 10-50 us
- program: 200-500 us
- erase: 2 ms
SLIDE 47 Traditional File Systems
File System Storage Device
Traditional API:
SLIDE 48 Traditional File Systems
File System Storage Device
Traditional API:
not same as flash
SLIDE 49 Options
- 1. Build/use new file systems for flash
- JFFS, YAFFS
- lot of work!
- 2. Build traditional API over flash API.
- use FFS, LFS, whatever we want
SLIDE 50 Traditional API with Flash
read(addr): return flash_read(addr)
write(addr, data):
  block_copy = flash_read(block of addr)
  modify block_copy with data
  flash_erase(block of addr)
  flash_program(block of addr, block_copy)
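The read-modify-write emulation above can be sketched in runnable Python. Everything here (the toy in-memory flash array and the flash_read/flash_erase/flash_program helpers) is a simplified model for illustration, not a real device API:

```python
PAGES_PER_BLOCK = 4

# Toy flash: 3 blocks, each a list of erased ("1111") pages.
flash = [["1111"] * PAGES_PER_BLOCK for _ in range(3)]

def flash_read(page_addr):
    block, page = divmod(page_addr, PAGES_PER_BLOCK)
    return flash[block][page]

def flash_erase(block):
    # Erase resets every page in the block to all 1's (slow, coarse-grained).
    flash[block] = ["1111"] * PAGES_PER_BLOCK

def flash_program(block, pages):
    # Program can only flip 1's to 0's, so it must follow an erase.
    flash[block] = list(pages)

def read(page_addr):
    return flash_read(page_addr)

def write(page_addr, data):
    # Traditional sector-write API emulated with read-modify-write.
    block, page = divmod(page_addr, PAGES_PER_BLOCK)
    block_copy = [flash_read(block * PAGES_PER_BLOCK + i)
                  for i in range(PAGES_PER_BLOCK)]    # read all other pages
    block_copy[page] = data                           # modify target page in memory
    flash_erase(block)                                # erase block
    flash_program(block, block_copy)                  # program all pages back

write(1, "0001")
print(read(1))   # -> 0001
```

Note how one small write touches every page in the block, which is exactly the write amplification problem discussed on the next slides.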
SLIDE 51 Memory: Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2
SLIDE 52 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2
FS wants to write 0001
Memory:
SLIDE 53 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
SLIDE 54 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 00 00 11 11 00 11 11 read all other pages in block
SLIDE 55 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 00 00 11 11 00 11 11
SLIDE 56 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11 modify target page in memory
SLIDE 57 Flash:
00 00 00 11 11 00 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11
SLIDE 58 Flash:
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11 erase block
SLIDE 59 Flash:
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11
SLIDE 60 Flash:
11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11 program all pages in block 00 01 00 11 11 00 11 11
SLIDE 61 Flash:
11 11 11 11 11 11 11 11 00 01 11 11 11 11 11 11
block 0 block 1 block 2 Memory:
00 01 00 11 11 00 11 11
SLIDE 62 Write Amplification
Random writes are extremely expensive!
- Writing one 2KB page may cause:
- read, erase, and program of 256KB block.
SLIDE 63 Write Amplification
Random writes are extremely expensive!
- Writing one 2KB page may cause:
- read, erase, and program of 256KB block.
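The cost of this can be checked with the latency numbers from the performance slide (rough midpoints, assumed here for illustration):

```python
# Rough per-operation latencies, taken as midpoints of the ranges
# on the "Disk vs. Flash Performance" slide.
READ_US = 25        # per-page read (10-50 us)
PROGRAM_US = 300    # per-page program (200-500 us)
ERASE_US = 2000     # per-block erase (2 ms)

PAGE_KB = 2
BLOCK_KB = 256
pages_per_block = BLOCK_KB // PAGE_KB   # 128 pages per block

# Naive read-modify-write cost of writing one 2KB page:
# read 128 pages, erase the block, program 128 pages.
rmw_us = pages_per_block * READ_US + ERASE_US + pages_per_block * PROGRAM_US
print(rmw_us / 1000, "ms")   # -> 43.6 ms, worse than a ~10 ms disk seek!
```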
- Would FFS or LFS be better with flash?
SLIDE 64 File Systems over Flash
Copy-On-Write FS may prevent some expensive random writes.
- What about wear leveling? LFS won’t do this.
- What if we want to use some other FS?
Perhaps some other FS has features or APIs our applications rely on, so we must use it.
SLIDE 65 Better Solution
Add copy-on-write layer between FS and flash. Avoids RMW (read-modify-write) cycle.
- Translate logical device addrs to physical addrs.
- FTL: Flash Translation Layer.
- Should translation use math or data structure?
SLIDE 66 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 11 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
SLIDE 67 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 11 11 11 11 11
1 2 3 4 5 6 7 physical: logical: write 1101
SLIDE 68 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical: write 1101
SLIDE 69 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical: write 1101
SLIDE 70 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
SLIDE 71 Flash Translation Layer
00 01
block 0
00 10 00 11 00 00 10 01
block 1
11 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
must eventually be garbage collected
SLIDE 72 FTL
Could be implemented as a device driver or in firmware (usually the latter).
- Where to store mapping? SRAM.
- Physical pages can be in three states:
- valid, invalid, free
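A minimal page-level FTL using these three states might look like the following toy sketch (the ToyFTL name and structure are illustrative; a real FTL must also garbage collect invalid pages and wear level):

```python
FREE, VALID, INVALID = "free", "valid", "invalid"

class ToyFTL:
    def __init__(self, num_pages):
        self.state = [FREE] * num_pages   # per-physical-page state
        self.data = [None] * num_pages
        self.map = {}                     # logical page -> physical page
        self.next_free = 0

    def write(self, logical, value):
        # Old physical page (if any) becomes invalid: garbage to collect later.
        if logical in self.map:
            self.state[self.map[logical]] = INVALID
        phys = self.next_free             # program the next free page instead
        self.data[phys] = value
        self.state[phys] = VALID
        self.map[logical] = phys          # remap logical -> new physical
        self.next_free += 1

    def read(self, logical):
        return self.data[self.map[logical]]

ftl = ToyFTL(8)
ftl.write(3, "0001")
ftl.write(3, "1101")       # overwrite: remaps, no erase needed yet
print(ftl.read(3))         # -> 1101
print(ftl.state[0])        # -> invalid (old copy awaits garbage collection)
```

Because overwrites go to a fresh page, the expensive read-erase-program cycle is deferred until garbage collection.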
SLIDE 73
States
free valid invalid
SLIDE 74 States
free valid invalid
program erase
SLIDE 75 States
free valid invalid
program erase relocate
SLIDE 76 SSD Architecture
FTL SRAM: mapping tbl SSD: looks like disk
(Traditional block API)
SLIDE 77 Problem: Big Mapping Table
Assume 200GB device, 2KB pages, 4-byte entries.
- SRAM needed: (200GB / 2KB) * 4 bytes = 400 MB.
- Too big, SRAM is expensive!
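The arithmetic can be checked directly (decimal units, matching the slide):

```python
DEVICE_BYTES = 200 * 10**9   # 200 GB device
PAGE_BYTES = 2 * 10**3       # 2 KB pages
ENTRY_BYTES = 4              # one 4-byte mapping entry per page

entries = DEVICE_BYTES // PAGE_BYTES       # 100 million entries
table_bytes = entries * ENTRY_BYTES
print(table_bytes // 10**6, "MB")          # -> 400 MB of SRAM for a page map
```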
SLIDE 78 Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
SLIDE 79 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
SLIDE 80 Larger Mappings
Advantage: larger mappings decrease table size.
SLIDE 81 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical:
SLIDE 82 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 11 11 11 11
1 2 3 4 5 6 7 physical: logical: write 1011
SLIDE 83 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 10 11 01 01
1 2 3 4 5 6 7 physical: logical: write 1011
copy
SLIDE 84 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 10 11 01 01
1 2 3 4 5 6 7 physical: logical: write 1011
SLIDE 85 2-Page Translations
00 01
block 0
00 10 00 11 00 00 10 01
block 1
01 01 10 11 01 01
1 2 3 4 5 6 7 physical: logical: write 1011
SLIDE 86 Larger Mappings
Advantage: larger mappings decrease table size.
- Disadvantages?
- more read-modify-write updates
- more garbage
- less flexibility for placement
SLIDE 87 Hybrid FTL
Use coarse-grained mapping for most (e.g., 95%) of data; map at the block level.
- Use fine-grained mapping for recent data; map at the page level.
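A hybrid lookup can be sketched like this (the maps and sizes below are toy values, purely illustrative):

```python
PAGES_PER_BLOCK = 4

# Hybrid FTL lookup: a small page-level map for recent ("hot") data,
# falling back to a coarse block-level map for everything else.
page_map = {5: 17}            # logical page -> physical page (fine-grained)
block_map = {0: 3, 1: 7}      # logical block -> physical block (coarse-grained)

def translate(logical_page):
    if logical_page in page_map:                       # fine-grained hit
        return page_map[logical_page]
    block, offset = divmod(logical_page, PAGES_PER_BLOCK)
    return block_map[block] * PAGES_PER_BLOCK + offset # coarse-grained path

print(translate(5))   # -> 17 (page-level mapping)
print(translate(2))   # -> 14 (logical block 0 -> physical block 3, offset 2)
```

The block-level entries cover many pages each, which is what keeps the table small.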
SLIDE 88 Log Blocks
Write changed pages to designated log blocks.
- After log blocks become full, merge changes with old data.
- Eventually garbage collect old pages.
SLIDE 89 Merging
Merging technique depends on I/O pattern.
- Three merge types:
- full merge
- partial merge
- switch merge
SLIDE 90 Merging
Merging technique depends on I/O pattern.
- Three merge types:
- full merge
- partial merge
- switch merge
SLIDE 91 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 92 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write D2
SLIDE 93 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write D2
SLIDE 94 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 95 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
eventually, we need to get rid of red arrows, as these represent expensive mappings
SLIDE 96 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 97 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
A
block 2
B C D2
SLIDE 98 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
A
block 2
B C D2
SLIDE 99 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
A
block 2
B C D2
SLIDE 100 A
block 0
B C D D2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
A
block 2
B C D2 garbage
SLIDE 101 Merging
Merging technique depends on I/O pattern.
- Three merge types:
- full merge
- partial merge
- switch merge
SLIDE 102 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 103 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write D2
SLIDE 104 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write D2
SLIDE 105 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 106 A
block 0
B C D A
block 1 (log)
B C D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 107 A
block 0
B C D A
block 1 (log)
B C D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 108 A
block 0
B C D A
block 1
B C D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 109 A
block 0
B C D A
block 1
B C D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11 garbage
SLIDE 110 Merging
Merging technique depends on I/O pattern.
- Three merge types:
- full merge
- partial merge
- switch merge
SLIDE 111 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
SLIDE 112 A
block 0
B C D 11 11
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write A2
SLIDE 113 A
block 0
B C D A2
block 1 (log)
11 11 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write A2
SLIDE 114 A
block 0
B C D A2
block 1 (log)
B2 11 11 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write B2
SLIDE 115 A
block 0
B C D A2
block 1 (log)
B2 C2 11 11
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write C2
SLIDE 116 A
block 0
B C D A2
block 1 (log)
B2 C2 D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11
write D2
SLIDE 117 A
block 0
B C D A2
block 1 (log)
B2 C2 D2
1 2 3 physical: logical: …
11 11
block 2
11 11 11 11 11 11 garbage
SLIDE 118 Merging
Merging technique depends on I/O pattern.
- Three merge types:
- full merge
- partial merge
- switch merge
SLIDE 119 Summary
Flash is much faster than disk, but…
- It is more expensive.
- It’s not a drop-in replacement beneath an FS without a complex layer (the FTL) emulating the hard-disk API.