Fault Isolation and Quick Recovery in Isolation File Systems Lanyue - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Fault Isolation and Quick Recovery in Isolation File Systems

Lanyue Lu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin - Madison

1

slide-6
SLIDE 6

File-System Availability Is Critical

Main data access interface

➡ desktop, laptop, mobile devices, file servers

A wide range of failures

➡ resource allocation, metadata corruption
➡ failed I/O operations, incorrect system states

A small fault can cause global failures

➡ e.g., a single bit can impact the whole file system

Global failures considered harmful

➡ read-only, crash

2

slide-7
SLIDE 7

Server Virtualization

[Figure: guest virtual machines VM1, VM2, and VM3 run on a hypervisor; their disk images VMDK1, VMDK2, and VMDK3 are stored on one shared file system]

3

slide-13
SLIDE 13

[Figure: VM1, VM2, and VM3 keep their disk images (VMDK1, VMDK2, VMDK3) on one shared file system. A single fault, e.g., metadata corruption, makes the file system read-only or crashes it.]

All VMs are affected

6

slide-16
SLIDE 16

Our Solution

A new abstraction for fault isolation

➡ support multiple independent fault domains
➡ protect a group of files for a domain

Isolation file systems

➡ fine-grained fault isolation
➡ quick recovery

7

slide-17
SLIDE 17

Introduction
Study of Failure Policies
Isolation File Systems
Challenges

8

slide-20
SLIDE 20

Questions to Answer

What global failure policies are used?

➡ failure types
➡ number of each type

What are the root causes of global failures?

➡ related data structures
➡ number of each cause

9

slide-23
SLIDE 23

Methodology

Three major file systems

➡ Ext3 (Linux 2.6.32), Ext4 (Linux 2.6.32)
➡ Btrfs (Linux 3.8)

Analyze source code

➡ identify types of global failures
➡ count related error handling functions
➡ correlate global failures to data structures
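The counting step could be approximated with a small script. What follows is only a sketch, not the authors' tooling: it assumes that matching handler names textually is good enough, it uses only handler names that appear on these slides, and the grouping into `read_only` and `crash` is an illustrative assumption. A plain regex also counts matches in comments and definitions, so real numbers would need manual review.

```python
import re
from collections import Counter

# Handler names taken from the slides; the grouping is an assumption.
READ_ONLY_HANDLERS = ("ext3_error", "EXT4_ERROR_INODE")
CRASH_HANDLERS = ("BUG", "BUG_ON", "ASSERT", "panic")


def count_global_failures(source: str) -> Counter:
    """Count textual call sites of global-failure handlers in C source."""
    counts = Counter()
    for policy, names in (("read_only", READ_ONLY_HANDLERS),
                          ("crash", CRASH_HANDLERS)):
        for name in names:
            # \b stops BUG from matching inside BUG_ON or DEBUG.
            pattern = r"\b" + re.escape(name) + r"\s*\("
            counts[policy] += len(re.findall(pattern, source))
    return counts
```

Running it over each file of a file system's source directory and summing the counters would give a rough per-policy tally.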

10

slide-24
SLIDE 24

Q1: What global failure policies are used?

11

slide-27
SLIDE 27

Global Failure Policies

Definition

➡ a failure which impacts all users of the file system or even the operating system

Read-Only

➡ e.g., ext3_error():
    ➡ mark file system as read-only
    ➡ abort the journal

12

slide-28
SLIDE 28

Read-Only Example

ext3/balloc.c, 2.6.32

    read_block_bitmap(...) {
        bitmap_blk = desc->bg_block_bitmap;
        bh = sb_getblk(sb, bitmap_blk);
        if (!bh) {
            ext3_error(sb, "Cannot read block bitmap");
            return NULL;
        }
    }

13

slide-33
SLIDE 33

Global Failure Policies

Definition

➡ a failure which impacts all users of the file system or even the operating system

Read-Only

➡ e.g., ext3_error():
    ➡ mark file system as read-only
    ➡ abort the journal

Crash

➡ e.g., BUG(), ASSERT(), panic()
➡ crash the file system or operating system

14

slide-34
SLIDE 34

Crash Example

btrfs/disk-io.c, 3.8

    open_ctree(...) {
        root->node = read_tree_block(...);
        BUG_ON(!root->node);
        ...
    }

15

slide-38
SLIDE 38

[Figure: bar chart of the number of global-failure instances per file system, split into ReadOnly and Crash. Totals: Ext3 193, Ext4 409, Btrfs 829.]

16

slide-39
SLIDE 39

Read-only and crash are common in modern file systems

Over 67% of global failures will crash the system

17

slide-40
SLIDE 40

Q2: What are the root causes of global failures?

18

slide-42
SLIDE 42

Global Failure Causes

Metadata corruption

➡ metadata inconsistency is detected
➡ e.g., a block/inode bitmap corruption

19

slide-43
SLIDE 43

Metadata Corruption Example

ext3/dir.c, 2.6.32

    ext3_check_dir_entry(...) {
        rlen = ext3_rec_len_from_disk();
        if (rlen < EXT3_DIR_REC_LEN(1)) {
            error = "rec_len is too small";
            ext3_error(sb, error);
        }
        ...
    }

20

slide-47
SLIDE 47

Global Failure Causes

Metadata corruption

➡ metadata inconsistency is detected
➡ e.g., a block/inode bitmap corruption

I/O failure

➡ metadata I/O failure and journaling failure
➡ e.g., fail to read an inode block

21

slide-48
SLIDE 48

I/O Failure Example

ext4/namei.c, 2.6.32

    empty_dir(...) {
        bh = ext4_bread(NULL, inode, &err);
        if (!bh && err)
            EXT4_ERROR_INODE(inode, "fail to read directory block");
        ...
    }

22

slide-52
SLIDE 52

Global Failure Causes

Metadata corruption

➡ metadata inconsistency is detected
➡ e.g., a block/inode bitmap corruption

I/O failure

➡ metadata I/O failure and journaling failure
➡ e.g., fail to read an inode block

Software bugs

➡ unexpected states detected
➡ e.g., allocated block is not in a valid range

23

slide-53
SLIDE 53

Software Bug Example

ext3/balloc.c, 2.6.32

    ext3_rsv_window_add(...) {
        if (start < this->rsv_start)
            p = &(*p)->rb_left;
        else if (start > this->rsv_end)
            p = &(*p)->rb_right;
        else {
            rsv_window_dump(root, 1);
            BUG();
        }
        ...
    }

24

slide-58
SLIDE 58

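Ext3
MC = metadata corruption, IOF = I/O failure, SB = software bug

Data Structure | MC, IOF, SB (nonzero counts, in column order) | Shared
b-bitmap       | 2, 2                | Yes
i-bitmap       | 1, 1                | Yes
inode          | 1, 2, 2             | Yes
super          | 1                   | Yes
dir-entry      | 4, 4, 3             | Yes
gdt            | 3, 2                | Yes
indir-blk      | 1, 1                | No
xattr          | 5, 2, 1             | No
block          | 5                   | Yes/No
journal        | 3, 27               | Yes
journal head   | 31                  | Yes
buf head       | 16                  | Yes
handle         | 22, 9               | Yes
transaction    | 28                  | Yes
revoke         | 2                   | Yes
other          | 1, 11               | Yes/No
Total          | 19, 37, 137 (= 193) |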
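As a quick consistency check on the Ext3 table, the per-row counts can be summed and compared with the stated total. This is a minimal sketch: the row values are copied from the table in the order they appear, and the dictionary and function names are illustrative, not from the talk.

```python
# Nonzero counts per data structure, in the order they appear in the table.
EXT3_ROWS = {
    "b-bitmap": [2, 2], "i-bitmap": [1, 1], "inode": [1, 2, 2],
    "super": [1], "dir-entry": [4, 4, 3], "gdt": [3, 2],
    "indir-blk": [1, 1], "xattr": [5, 2, 1], "block": [5],
    "journal": [3, 27], "journal head": [31], "buf head": [16],
    "handle": [22, 9], "transaction": [28], "revoke": [2],
    "other": [1, 11],
}

# Column totals as printed in the table's last row.
COLUMN_TOTALS = {"MC": 19, "IOF": 37, "SB": 137}


def grand_total(rows):
    """Sum every count in every row."""
    return sum(sum(counts) for counts in rows.values())


def share(column):
    """Fraction of all instances that fall in one cause column."""
    return COLUMN_TOTALS[column] / sum(COLUMN_TOTALS.values())
```

Both `grand_total(EXT3_ROWS)` and the sum of the column totals come to 193, which matches the Ext3 bar in the earlier chart.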

25

slide-68
SLIDE 68

All global failures are caused by metadata and system states

Both local and shared metadata can cause global failures

26

slide-71
SLIDE 71

Not Only Local File Systems

Shared-disk file systems: OCFS2

➡ inspired by Ext3 design
➡ used in virtualization environments
➡ host virtual machine images
➡ allow multiple Linux guests to share a file system

Global failures are also prevalent

➡ a single piece of corrupted metadata can fail the whole file system on multiple nodes!

27

slide-75
SLIDE 75

Current Abstractions

File and directory

➡ metadata is shared for different files or directories

Namespace

➡ virtual machines, chroot, BSD jail, Solaris Zones
➡ multiple namespaces still share a file system

Partitions

➡ multiple file systems on separate partitions
➡ a single panic on a partition can crash the whole operating system
➡ static partitions, dynamic partitions
➡ management of many partitions

28

slide-79
SLIDE 79

All files on a file system implicitly share a single fault domain

Current file-system abstractions do not provide fine-grained fault isolation

29

slide-80
SLIDE 80

Introduction
Study of Failure Policies
Isolation File Systems
    New Abstraction
    Fault Isolation
    Quick Recovery
    Preliminary Implementation on Ext3
Challenges

30

slide-85
SLIDE 85

Isolation File Systems

Fine-grained partitioned

➡ files are isolated into separated domains

Independent

➡ faulty units will not affect healthy units

Fine-grained recovery

➡ repair a faulty unit quickly
➡ instead of checking the whole file system

Elastic

➡ dynamically grow and shrink its size

31

slide-88
SLIDE 88

New Abstraction

File Pod

➡ an abstract partition
➡ contains a group of files and related metadata
➡ an independent fault domain

Operations

➡ create a file pod
➡ set / get file pod’s attributes
    ➡ failure policy
    ➡ recovery policy
➡ bind / unbind a file to pod
➡ share a file between pods
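The operations listed above could be modeled roughly as follows. This is a toy in-memory sketch rather than the proposed kernel interface: the class and method names, the policy strings, and the rule that a file is bound to one pod before it can be shared are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FilePod:
    """Hypothetical in-memory stand-in for a file pod."""
    name: str
    attrs: dict = field(default_factory=dict)
    files: set = field(default_factory=set)


class PodManager:
    """Toy model of the pod operations listed on the slide."""

    def __init__(self):
        self.pods = {}

    def create(self, name, failure_policy="read-only",
               recovery_policy="online"):
        if name in self.pods:
            raise ValueError(f"pod {name!r} already exists")
        pod = FilePod(name, {"failure_policy": failure_policy,
                             "recovery_policy": recovery_policy})
        self.pods[name] = pod
        return pod

    def set_attr(self, name, key, value):
        self.pods[name].attrs[key] = value

    def get_attr(self, name, key):
        return self.pods[name].attrs[key]

    def pods_of(self, path):
        """Names of all pods that currently contain the file."""
        return sorted(p.name for p in self.pods.values() if path in p.files)

    def bind(self, path, name):
        # Assumption: a file starts in at most one pod; share() adds more.
        if self.pods_of(path):
            raise ValueError(f"{path!r} is already bound")
        self.pods[name].files.add(path)

    def unbind(self, path, name):
        self.pods[name].files.discard(path)

    def share(self, path, name):
        # Assumption: only a file that is already bound can be shared.
        if not self.pods_of(path):
            raise ValueError(f"{path!r} is not bound to any pod")
        self.pods[name].files.add(path)
```

Keeping the failure and recovery policies as per-pod attributes mirrors the slide's point that each pod is its own fault domain with its own handling.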

32

slide-90
SLIDE 90

[Figure: a directory tree rooted at / with directories d1, d2, d3, and d4, grouped into two file pods, Pod1 and Pod2]

34

slide-95
SLIDE 95

Metadata Isolation

Observation

➡ metadata is organized in a shared manner
➡ hard to isolate a failure for metadata

For example

➡ multiple inodes are stored in a single inode block
➡ an I/O failure can affect multiple files

[Figure: an inode block holding many inodes (i); a block read failure hits the whole block]
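To make the sharing concrete, the sketch below computes which inode-table slots live in one inode block, and therefore which files a single failed block read would touch. The 4096-byte block and 128-byte inode are assumed example sizes, not figures from the talk, and the function names are illustrative.

```python
def inodes_per_block(block_size=4096, inode_size=128):
    """How many on-disk inodes fit in one block (sizes are assumptions)."""
    return block_size // inode_size


def inodes_in_block(block_index, block_size=4096, inode_size=128):
    """0-based inode-table slots stored in the given inode-table block."""
    per_block = inodes_per_block(block_size, inode_size)
    first = block_index * per_block
    return range(first, first + per_block)


def files_hit_by_read_failure(failed_block, slot_of_file):
    """Files whose inode slot falls inside the unreadable block."""
    hit = inodes_in_block(failed_block)
    return sorted(name for name, slot in slot_of_file.items() if slot in hit)
```

With these sizes one unreadable block takes out 32 inodes at once, regardless of which users or directories own them.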

36

slide-98
SLIDE 98

Key Idea 1: Isolate metadata for file pods

37

slide-102
SLIDE 102

Localize Failures

Local Failures

➡ convert global failures to local failures
➡ same failure semantics
➡ only fail the faulty pod

Read-Only

➡ mark a file pod as Read-Only

Crash

➡ crash a file pod instead of the whole system
➡ provide the same initial states after crash
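One way to picture the conversion from global to local failures is to track the failure state per pod rather than per file system. The sketch below is a toy model under that assumption; `IsolationFS`, its state names, and its methods are hypothetical and do not describe the actual Ext3 prototype.

```python
class IsolationFS:
    """Toy model: failure state is tracked per pod, not per file system."""

    HEALTHY, READ_ONLY, CRASHED = "healthy", "read-only", "crashed"

    def __init__(self, pod_names):
        self.state = {name: self.HEALTHY for name in pod_names}

    def fail(self, pod, policy):
        # Same two policies as the global case, applied to one pod only.
        if policy == "read-only":
            self.state[pod] = self.READ_ONLY
        elif policy == "crash":
            self.state[pod] = self.CRASHED
        else:
            raise ValueError(f"unknown policy {policy!r}")

    def can_read(self, pod):
        return self.state[pod] != self.CRASHED

    def can_write(self, pod):
        return self.state[pod] == self.HEALTHY
```

After `fail("Pod1", "read-only")`, writes to Pod1 are refused while Pod2 stays fully usable, which is the behavior the slide describes.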

38

slide-105
SLIDE 105

[Figure: the directory tree (/, d1, d2, d3, d4) split into Pod1 and Pod2; a fault, e.g., corruption, occurs inside one pod]

40

slide-108
SLIDE 108

Quick Recovery

File system recovery is slow

➡ a small error requires a full check
➡ many random read requests
➡ 7 hours to sequentially read a 2 TB disk
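The 7-hour figure can be turned into an implied sequential throughput with a little arithmetic. The sketch below assumes 2 TB means 2 × 10^12 bytes. It treats a pure sequential scan as a lower bound only, because a real check issues many random reads and would take longer.

```python
def implied_throughput_mb_s(capacity_bytes, hours):
    """Average MB/s needed to read the given capacity in the given time."""
    return capacity_bytes / (hours * 3600) / 1e6


def scan_hours(bytes_to_read, throughput_mb_s):
    """Hours to read a region sequentially at a fixed throughput."""
    return bytes_to_read / (throughput_mb_s * 1e6) / 3600
```

`implied_throughput_mb_s(2e12, 7)` comes to a little under 80 MB/s. At the same rate, scanning a smaller region takes proportionally less time, which is the motivation for shrinking the checked range.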

42

slide-111
SLIDE 111

a small fault requires a full check (slow!)

43

slide-114
SLIDE 114

Key Idea 2: Minimize the file system checking range during recovery

44

slide-118
SLIDE 118

Quick Recovery

Metadata Isolation

➡ file pod as the unit of recovery
➡ check and recover independently
➡ both online and offline

When to recover?

➡ leverage internal detection mechanism

How to recover more efficiently?

➡ only check the faulty pod
➡ narrow down to certain data structures
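A recovery planner along these lines might look like the sketch below. It assumes the file system's internal detection has already flagged each pod as faulty or not, and optionally recorded which data structures look suspect. The dictionary layout and function names are hypothetical.

```python
def pods_to_check(pod_states):
    """Return only the pods that internal detection flagged as faulty."""
    return sorted(name for name, st in pod_states.items() if st["faulty"])


def recovery_plan(pod_states):
    """List (pod, structures) pairs; healthy pods are skipped entirely."""
    plan = []
    for name in pods_to_check(pod_states):
        # Fall back to a whole-pod check when no structure was singled out.
        structures = pod_states[name].get("suspect_structures") or ["all"]
        plan.append((name, list(structures)))
    return plan
```

For example, with Pod1 flagged for its block bitmap and Pod2 healthy, the plan contains a single entry for Pod1's bitmap and nothing for Pod2.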

45

slide-123
SLIDE 123

Ext3 Layout

A disk is divided into block groups

➡ physical partition for disk locality

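[Figure: disk layout of one block group: super block (SB), group descriptors (GDTs), block bitmap (BM), inode bitmap (IM), inode table (Inodes), data blocks (Blocks)]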

47

slide-128
SLIDE 128

[Figure: the disk shown as a sequence of block groups, each laid out as SB, GDTs, BM, IM, Inodes, Blocks; files f1, f2, f3, and f4 are placed in one block group]

multiple files can share a single block group

48

slide-130
SLIDE 130

[Figure: on-disk layout drawn as a row of block groups, each labeled SB, GDTs, BM, Inodes, IM, Blocks]

f1 f2 f3 f4 f5

multiple files can share a single block group

one file can span multiple block groups

48

slide-135
SLIDE 135

Layout

49

slide-136
SLIDE 136

Layout

A file pod contains multiple block groups

➡ one block group only maps to one file pod ➡ performance locality and fault isolation

49

slide-137
SLIDE 137

Layout

A file pod contains multiple block groups

➡ one block group only maps to one file pod ➡ performance locality and fault isolation

disk layout

POD1 POD2 POD3

49
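
The layout constraint above can be sketched in a few lines. This is a minimal illustration in Python, not kernel code from the prototype; the names `PodLayout`, `assign`, and `groups_of` are hypothetical. The only property taken from the slides is that a pod may own several block groups while each block group maps to at most one pod.

```python
class PodLayout:
    """Tracks which file pod owns each block group (illustrative only)."""

    def __init__(self, num_groups):
        # None means the block group is not yet assigned to any pod.
        self.owner = [None] * num_groups

    def assign(self, group, pod):
        """Give a block group to a pod; refuse if another pod already owns it."""
        current = self.owner[group]
        if current is not None and current != pod:
            raise ValueError(f"group {group} already belongs to pod {current}")
        self.owner[group] = pod

    def groups_of(self, pod):
        """Return the block groups owned by a pod, in disk order."""
        return [g for g, p in enumerate(self.owner) if p == pod]


layout = PodLayout(num_groups=6)
for g in (0, 1):
    layout.assign(g, pod=1)   # POD1 spans two block groups
layout.assign(2, pod=2)       # POD2 owns one block group
```

Because a group has a single owner, a fault confined to one block group can be attributed to exactly one pod, which is what makes the pod a unit of fault isolation in this sketch.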

slide-138
SLIDE 138

Data Structures

50

slide-139
SLIDE 139

Data Structures

Pod related structure

➡ no extra mapping structures

50

slide-140
SLIDE 140

Data Structures

Pod related structure

➡ no extra mapping structures ➡ embedded in group descriptors ➡ group descriptors are loaded into memory

SB GDTs BM Inodes IM Blocks Blocks

a block group

pod

50
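
One way to picture "no extra mapping structures" is a pod identifier stored directly in each group descriptor. The sketch below is an assumption about how that could look, not the prototype's actual structure: the `pod_id` field, its default of 0 for "unassigned", and the simplified descriptor fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupDescriptor:
    """Simplified in-memory block group descriptor (illustrative only)."""
    block_bitmap: int   # block number of the block bitmap
    inode_bitmap: int   # block number of the inode bitmap
    inode_table: int    # first block of the inode table
    free_blocks: int
    free_inodes: int
    pod_id: int = 0     # hypothetical field; 0 means "no pod yet"


def pod_of(descriptors, group):
    """Look up a group's pod straight from its descriptor, with no side table."""
    return descriptors[group].pod_id


def groups_in_pod(descriptors, pod):
    """Scan the in-memory descriptors for the groups that belong to a pod."""
    return [g for g, d in enumerate(descriptors) if d.pod_id == pod]


descriptors = [
    GroupDescriptor(3, 4, 5, 100, 50, pod_id=1),
    GroupDescriptor(8195, 8196, 8197, 200, 60, pod_id=1),
    GroupDescriptor(16387, 16388, 16389, 300, 70, pod_id=2),
]
```

Since the descriptors are already held in memory, both lookups avoid extra disk reads in this model.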

slide-141
SLIDE 141

Algorithms

51

slide-142
SLIDE 142

Algorithms

Pod based inode and block allocation

➡ preserve original allocation’s locality ➡ allocation will not cross pod boundary

51

slide-143
SLIDE 143

POD1 POD2 POD3

52

slide-144
SLIDE 144

POD1 POD2 POD3

1. within the same pod
2. an empty block group

52
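
The two-step search order above can be sketched as follows. This is a simplified model in Python, not the Ext3 allocator: `pick_group` is a hypothetical helper, "empty block group" is assumed to mean a group not yet owned by any pod, and scanning outward from a goal group stands in for the original allocator's locality heuristics.

```python
def pick_group(owner, free_blocks, pod, goal):
    """Choose a block group for an allocation on behalf of `pod`.

    owner[g] is the pod that owns group g (None if unowned) and
    free_blocks[g] is the number of free blocks in group g.
    """
    n = len(owner)
    # Step 1: stay within the same pod, scanning from the goal group
    # so that nearby groups are preferred.
    for i in range(n):
        g = (goal + i) % n
        if owner[g] == pod and free_blocks[g] > 0:
            return g
    # Step 2: claim an unowned group and attach it to this pod.
    for i in range(n):
        g = (goal + i) % n
        if owner[g] is None and free_blocks[g] > 0:
            owner[g] = pod
            return g
    # Groups owned by other pods are never used, so the allocation fails.
    return None


owner = [1, 1, None, 2]
free_blocks = [0, 5, 8, 8]

first = pick_group(owner, free_blocks, pod=1, goal=0)   # found inside pod 1
free_blocks[first] = 0
second = pick_group(owner, free_blocks, pod=1, goal=0)  # claims the unowned group
free_blocks[second] = 0
third = pick_group(owner, free_blocks, pod=1, goal=0)   # only pod 2 has space left
```

Returning `None` rather than borrowing space from another pod is what keeps allocation from crossing a pod boundary in this sketch.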

slide-145
SLIDE 145

Algorithms

53

slide-146
SLIDE 146

Algorithms

Pod based inode and block allocation

➡ preserve original allocation’s locality ➡ allocation will not cross pod boundary

De-fragmentation

➡ potential internal fragmentation

53

slide-147
SLIDE 147

Algorithms

Pod based inode and block allocation

➡ preserve original allocation’s locality ➡ allocation will not cross pod boundary

De-fragmentation

➡ potential internal fragmentation ➡ de-fragmentation for file pods ➡ similar solution in Ext4

53

slide-148
SLIDE 148

Journaling

54

slide-149
SLIDE 149

Journaling

Virtual transaction

➡ contains updates only from one pod

T1 T2 T3 Pod 1 On-disk journal Pod 2 Pod 3

independent transactions

54

slide-150
SLIDE 150

Journaling

Virtual transaction

➡ contains updates only from one pod ➡ better performance isolation

T1 T2 T3 Pod 1 On-disk journal Pod 2 Pod 3

independent transactions

54

slide-151
SLIDE 151

Journaling

Virtual transaction

➡ contains updates only from one pod ➡ better performance isolation ➡ commit multiple virtual transactions in parallel

T1 T2 T3 Pod 1 On-disk journal Pod 2 Pod 3

journal reservation independent transactions shared journal

54
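
A rough model of virtual transactions over a shared journal is sketched below. It is an illustration only: `SharedJournal` and its methods are hypothetical names, and journal reservation and parallel commit are not modeled. What it does show is that each running transaction holds updates from a single pod, so committing or aborting one pod leaves the others untouched.

```python
class SharedJournal:
    """One journal shared by all pods, with one running transaction per pod."""

    def __init__(self):
        self.running = {}   # pod id -> list of pending (block, data) updates
        self.log = []       # committed (pod, updates) records in commit order

    def add_update(self, pod, block, data):
        """Add an update to the virtual transaction of the pod it belongs to."""
        self.running.setdefault(pod, []).append((block, data))

    def commit(self, pod):
        """Commit only this pod's virtual transaction to the shared log."""
        updates = self.running.pop(pod, [])
        if updates:
            self.log.append((pod, updates))
        return len(updates)

    def abort(self, pod):
        """Drop this pod's pending updates without affecting other pods."""
        self.running.pop(pod, None)


journal = SharedJournal()
journal.add_update(1, 100, b"a")
journal.add_update(2, 200, b"b")
journal.add_update(1, 101, b"c")
journal.abort(2)                 # a failure in pod 2 discards only its updates
committed = journal.commit(1)    # pod 1 commits independently
```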

slide-152
SLIDE 152

Introduction Study of Failure Policies Isolation File Systems

New Abstraction Fault Isolation Quick Recovery Preliminary Implementation on Ext3

Challenges

55

slide-153
SLIDE 153

Status

56

slide-154
SLIDE 154

Status

What we did

➡ a simple prototype for Ext3 ➡ provides read-only isolation

56

slide-155
SLIDE 155

Status

What we did

➡ a simple prototype for Ext3 ➡ provides read-only isolation

What we plan to do

➡ crash isolation

56

slide-156
SLIDE 156

Status

What we did

➡ a simple prototype for Ext3 ➡ provides read-only isolation

What we plan to do

➡ crash isolation ➡ quick recovery after failure

56

slide-157
SLIDE 157

Status

What we did

➡ a simple prototype for Ext3 ➡ provides read-only isolation

What we plan to do

➡ crash isolation ➡ quick recovery after failure ➡ other file systems: Ext4 and Btrfs

56
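
Read-only isolation can be pictured as a per-pod flag instead of a flag on the whole file system. The sketch below is an assumption about the idea, not the prototype's code: `PodState`, `report_fault`, and `check_write` are hypothetical names, and raising `EROFS` is chosen here only because it is the usual error for writes to a read-only file system.

```python
import errno


class PodState:
    """Tracks which pods have been switched to read-only after a fault."""

    def __init__(self):
        self.read_only = set()

    def report_fault(self, pod):
        """Mark only the faulty pod read-only; other pods keep running."""
        self.read_only.add(pod)

    def check_write(self, pod):
        """Reject writes to a read-only pod, allow them everywhere else."""
        if pod in self.read_only:
            raise OSError(errno.EROFS, f"pod {pod} is read-only")


state = PodState()
state.report_fault(2)   # e.g. metadata corruption detected in pod 2
state.check_write(1)    # pod 1 is unaffected and still writable
```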

slide-158
SLIDE 158

Challenges

57

slide-159
SLIDE 159

Challenges

Metadata isolation

➡ tree-based directory structure ➡ globally shared metadata: super block, journal ➡ shared system states: block allocation tree

57

slide-160
SLIDE 160

Challenges

Metadata isolation

➡ tree-based directory structure ➡ globally shared metadata: super block, journal ➡ shared system states: block allocation tree

Local failure

➡ is it correct to continue running? ➡ light-weight, stateless crash for a pod

57

slide-161
SLIDE 161

Challenges

Metadata isolation

➡ tree-based directory structure ➡ globally shared metadata: super block, journal ➡ shared system states: block allocation tree

Local failure

➡ is it correct to continue running? ➡ light-weight, stateless crash for a pod

Performance

➡ potential overhead of managing pods ➡ better performance isolation ➡ better scalability

57

slide-162
SLIDE 162

58

slide-163
SLIDE 163

Failure is not an option.

58

slide-164
SLIDE 164

Failure is not an option.

- NASA

58

slide-165
SLIDE 165

59

slide-166
SLIDE 166

Global failure is not an option;

59

slide-167
SLIDE 167

Global failure is not an option; local failure with quick recovery

59

slide-168
SLIDE 168

Global failure is not an option; local failure with quick recovery

is an option.

59

slide-169
SLIDE 169

Global failure is not an option; local failure with quick recovery

is an option.

- Isolation File Systems

59

slide-170
SLIDE 170

60

slide-171
SLIDE 171

Questions?

60