Fingerprinting the Checker Policies of Parallel File Systems

Runzhou Han, Duo Zhang, Mai Zheng
PDSW 2020: 5th International Parallel Data Systems Workshop


SLIDE 1

Fingerprinting the Checker Policies of Parallel File Systems

Runzhou Han, Duo Zhang, Mai Zheng

PDSW 2020: 5th International Parallel Data Systems Workshop

SLIDE 2

Parallel File Systems (PFSes)

  • PFS is the cornerstone of high-performance computing
  • Optimized for highly concurrent access

SLIDE 3

PFS Failures: Real-World Cases

Case 1: HPCC Power Outage

SLIDE 4

PFS Failures: Real-World Cases

Case 1: HPCC Power Outage
Case 2: ACCRE Storage Outage*

* Hyperion Research survey of HPC organizations done for Panasas

SLIDE 5

PFS Failures: More Frequent/Expensive Than You Thought

Some statistics*:

  • ≈ half of HPC sites experience storage system failures once a month or more frequently
  • The average HPC storage system failure frequency is 9.8 failures/year

* Hyperion Research survey of HPC organizations done for Panasas

SLIDE 6

PFS Failures: More Frequent/Expensive Than You Thought

Some statistics*:

  • Downtime ranges from 1 day↓ to 1 week↑
  • 40% of HPC sites typically took more than 2 weeks to restore their storage systems

* Hyperion Research survey of HPC organizations done for Panasas

SLIDE 7

PFS Failures: More Frequent/Expensive Than You Thought

Some statistics*:

  • A single day of downtime costs from $100K↓ to $1M↑
  • Average downtime cost is $127K/day

* Hyperion Research survey of HPC organizations done for Panasas

SLIDE 8

PFS & PFS Checkers (FSCKs)

  • Typical PFS architecture

[Figure: typical PFS architecture with a Management Server (MGS) and Management Target (MGT), a Metadata Server (MDS) and Metadata Target (MDT), and Object Storage Servers (OSSes) and Object Storage Targets (OSTs), connected over a network.]

SLIDE 9

PFS & PFS Checkers (FSCKs)

  • Typical PFS architecture
  • Many PFSes are designed with a checker component
      • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS

[Figure: the typical PFS architecture, now with a PFS checker (FSCK) and its I/Os.]

SLIDE 10

PFS & PFS Checkers (FSCKs)

  • Typical PFS architecture
  • Many PFSes are designed with a checker component
      • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS
  • Detect and repair inconsistencies

SLIDE 11

PFS & PFS Checkers (FSCKs)

  • Typical PFS architecture
  • Many PFSes are designed with a checker component
      • e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, PVFS2-FSCK for OrangeFS
  • Detect and repair inconsistencies
  • FSCKs have predefined checker policies

SLIDE 12

Examples of PFS Checker Policies

  • Lustre’s LFSCK policy: mapping between MDT-object and OST-object
      • MDT-object’s LOV EA matches OST-object’s FID
      • OST-object’s Parent FID matches MDT-object’s FID

Structures    Meaning
xattr         inode extended attribute
FID           a global ID of a Lustre object
LOV EA        stores child object’s FID
Parent FID    stores parent object’s FID

[Figure: MDT-object A on the MDT (xattr: FID, LOV EA, …) and OST-object a on the OST (data; xattr: FID, Parent FID).]
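
The two matching conditions in the LFSCK policy can be illustrated with a small sketch. This is not LFSCK code: the classes `MdtObject` and `OstObject`, the function `check_mapping`, and the use of plain strings as FIDs are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MdtObject:
    fid: str            # global ID of the MDT-object
    lov_ea: List[str]   # FIDs of the child OST-objects (hypothetical layout)


@dataclass
class OstObject:
    fid: str                    # global ID of the OST-object
    parent_fid: Optional[str]   # FID of the parent MDT-object


def check_mapping(mdt: MdtObject, osts: Dict[str, OstObject]) -> List[str]:
    """Report every child whose forward or backward pointer does not match."""
    problems = []
    for child_fid in mdt.lov_ea:
        child = osts.get(child_fid)
        if child is None:
            # LOV EA names an OST-object FID that does not exist.
            problems.append(f"LOV EA of {mdt.fid} points to missing OST-object {child_fid}")
        elif child.parent_fid != mdt.fid:
            # The OST-object does not point back to this MDT-object.
            problems.append(f"Parent FID of {child.fid} is {child.parent_fid!r}, expected {mdt.fid!r}")
    return problems
```

An empty result means both directions of the mapping agree for every child listed in the LOV EA; the sketch says nothing about how a real checker would repair a mismatch.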
SLIDE 18

Examples of PFS Checker Policies

  • Lustre’s LFSCK policy: mapping between MDT-object and OST-object
      • MDT-object’s LOV EA matches OST-object’s FID
      • OST-object’s Parent FID matches MDT-object’s FID

[Figure: the MDT-object/OST-object diagram with a corruption injected, labeled "Corruption 1".]

SLIDE 19

Examples of PFS Checker Policies

[Figure: the MDT-object/OST-object diagram after running LFSCK on Corruption 1.]

LFSCK on Corruption 1: ✔ Fixed!

SLIDE 20

Examples of PFS Checker Policies

[Figure: the MDT-object/OST-object diagram with a different corruption injected, labeled "Corruption 2".]

SLIDE 21

Examples of PFS Checker Policies

[Figure: the MDT-object/OST-object diagram after running LFSCK on Corruption 2.]

LFSCK on Corruption 2: Cannot be fixed!

SLIDE 22

Examples of PFS Checker Policies

LFSCK on Corruption 2: Cannot be fixed!

LFSCK’s policy is incomplete!

SLIDE 23

Our Contributions

  • A systematic approach to analyze PFS checker policies
      • PFS type-aware fault injection
      • PFS consistency model & taxonomy

SLIDE 24

Our Contributions

  • A systematic approach to analyze PFS checker policies
      • PFS type-aware fault injection
      • PFS consistency model & taxonomy
  • A comprehensive study on the checkers of two widely used PFSes
      • Has exposed 33 suboptimal repairs
      • Has exposed 2 abnormal behaviors (e.g., kernel panic), which has led to 1 new patch on Lustre

SLIDE 25

Outline

  • Motivation & Contributions
  • Methodology
      • PFS type-aware fault injection
      • Fault models
      • PFS consistency model
      • PFS checker taxonomy
  • Experimental Results
  • Conclusion & Future Work

SLIDE 26

PFS Type-aware Fault Injection

  • Why type-aware fault injection
      • Key observation
          • PFS metadata and the local file system metadata are closely correlated
          • E.g., both Lustre and BeeGFS have metadata structures stored in inode extended attributes (xattr)
      • Benefits of fine-grained fault injection
          • Reveals PFS checker policies precisely
          • Enables analyzing the contract between the PFS checker and the local FS

PFS name    Metadata structures in xattr       Metadata structures shared with Ext4 inode
Lustre      FID, LOV EA, parent FID, linkEA    nlink
BeeGFS      fhgfs                              nlink, size

SLIDE 27

Fault Models

  • Four fault models to capture the typical corruptions that may occur in the local storage stack and be exposed to the PFS checker
      • junk, out-of-sync, zero, duplicate

SLIDE 28

Fault Models

  • Fault model #1: junk
      • Bytes of the on-disk structure are replaced by random values
      • Caused by disk corruptions, local FS bugs, etc.

[Figure: the bytes of a structure before and after the fault.]
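
The junk model can be sketched as a byte-level operation. This is only an illustration, assuming the structure is available as an in-memory `bytes` buffer read from the disk image; the function name `inject_junk` and its parameters are hypothetical and do not come from the authors' tool.

```python
import random


def inject_junk(structure: bytes, offset: int, length: int, seed: int = 0) -> bytes:
    """Return a copy of `structure` with `length` bytes at `offset` replaced by random values."""
    if offset < 0 or length < 0 or offset + length > len(structure):
        raise ValueError("corruption range falls outside the structure")
    rng = random.Random(seed)  # seeded so that an injection can be replayed
    junk = bytes(rng.randrange(256) for _ in range(length))
    # Bytes before and after the target range are left untouched.
    return structure[:offset] + junk + structure[offset + length:]
```

Seeding the generator is a design choice of this sketch: it makes a given corruption reproducible, which helps when a checker's reaction has to be examined more than once.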

SLIDE 29

Fault Models

  • Fault model #2: out-of-sync
      • In-memory copy of the structure is inconsistent with the on-disk copy
      • Caused by software bugs, memory/disk corruptions, etc.
  • Please refer to our paper for fault models #3 & #4

[Figure: in-memory and on-disk copies of Structure A of File 1, before and after the fault.]
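
A minimal sketch of the out-of-sync model follows. It assumes, purely for illustration, that the fault changes the on-disk copy while the in-memory copy stays stale; the slides do not say which copy is modified. `StructureCopies`, `inject_out_of_sync`, and `memory_only_check` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureCopies:
    in_memory: bytes  # cached copy of the structure
    on_disk: bytes    # persistent copy of the same structure

    def in_sync(self) -> bool:
        return self.in_memory == self.on_disk


def inject_out_of_sync(copies: StructureCopies, new_on_disk: bytes) -> StructureCopies:
    """Assumed fault: only the on-disk copy changes, the cached copy is left as it was."""
    return StructureCopies(in_memory=copies.in_memory, on_disk=new_on_disk)


def memory_only_check(copies: StructureCopies, expected: bytes) -> bool:
    """A check that consults only the in-memory copy, so it cannot see on-disk damage."""
    return copies.in_memory == expected
```

Under this assumption, `memory_only_check` still passes after the fault even though `in_sync()` is false, which is the kind of blind spot an out-of-sync fault is meant to expose.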

SLIDE 30

PFS Consistency Model

  • General principles that PFS checkers should ensure to maintain PFS integrity
  • Applicable to diverse PFSes
  • Includes the definition of Consistency Group & 6 consistency rules

SLIDE 31

PFS Consistency Model

  • Consistency Group (CG)
      • Includes an MDT-object and all its associated child OST-objects

[Figure: a valid CG: the MDT-object of a client file with OST-object 1 and OST-object 2.]

SLIDE 32

PFS Consistency Model

  • Consistency rules
      • CG-rule1: every object in a CG should be consistent individually

[Figure: a valid CG: the MDT-object of a client file with OST-object 1 and OST-object 2.]

SLIDE 33

PFS Consistency Model

  • Consistency rules
      • CG-rule2: one MDT-object of a client directory maps to no child OST-object

[Figure: a valid CG: the MDT-object of a client directory, with no OST-object.]

SLIDE 34

PFS Consistency Model

  • Consistency rules
      • CG-rule3: one MDT-object of a client file maps to at least one child OST-object

[Figure: a valid CG: the MDT-object of a client file with OST-object 1 and OST-object 2.]

SLIDE 35

PFS Consistency Model

  • Consistency rules
      • CG-rule4: one OST-object maps to one and only one parent MDT-object

[Figure: a valid CG: the MDT-object of a client file with OST-object 1 and OST-object 2.]

SLIDE 36

PFS Consistency Model

  • Consistency rules
      • CG-rule5: the mapping between a parent MDT-object and a child OST-object is bidirectional

[Figure: a valid CG: the MDT-object of a client file with OST-object 1 and OST-object 2.]

SLIDE 37

PFS Consistency Model

  • Consistency rules
      • CG-rule6: an object violating previous rules may only exist in a specified location

[Figure: the MDT-object of a client file with OST-object 1 and OST-object 2, a CG-rule5 violation, and /lost+found as the specified location.]
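
CG-rule2 through CG-rule5 are structural, so they can be checked mechanically. The sketch below is one possible reading of those four rules, not the authors' implementation; `MdtObj`, `OstObj`, and `check_cg_rules` are hypothetical names, and FIDs are modeled as plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MdtObj:
    fid: str
    is_dir: bool
    children: List[str] = field(default_factory=list)  # FIDs of child OST-objects


@dataclass
class OstObj:
    fid: str
    parents: List[str] = field(default_factory=list)   # FIDs of parent MDT-objects


def check_cg_rules(mdts: Dict[str, MdtObj], osts: Dict[str, OstObj]) -> List[str]:
    """Return one message per violation of CG-rule2 to CG-rule5."""
    violations = []
    for m in mdts.values():
        if m.is_dir and m.children:
            violations.append(f"CG-rule2: directory {m.fid} has child OST-objects")
        if not m.is_dir and not m.children:
            violations.append(f"CG-rule3: file {m.fid} has no child OST-object")
        for c in m.children:
            # Forward pointer must be answered by a back-pointer.
            if c not in osts or m.fid not in osts[c].parents:
                violations.append(f"CG-rule5: {m.fid} -> {c} has no back-pointer")
    for o in osts.values():
        if len(o.parents) != 1:
            violations.append(f"CG-rule4: {o.fid} has {len(o.parents)} parents")
        for p in o.parents:
            # Back-pointer must be answered by a forward pointer.
            if p not in mdts or o.fid not in mdts[p].children:
                violations.append(f"CG-rule5: {o.fid} -> {p} has no forward pointer")
    return violations
```

CG-rule1 and CG-rule6 are left out because they depend on per-object validity and on where objects are stored, which this simplified model does not represent.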

SLIDE 38

PFS Checker Taxonomy

  • A general characterization of checker policies
      • Qualitatively measures the policies
      • Enables cross-PFS comparison
  • Includes 4 detection levels & 4 repair levels based on the consistency model

Detection levels    Definition
Dabn.               PFS checker behaves abnormally w/o reporting detection results
Dzero               PFS checker finishes normally but misses all CG corruptions
Dpar.               PFS checker partially detects CG corruptions
Dcom.               PFS checker detects CG corruptions completely

Repair levels       Definition
Rwro.               PFS checker fixes CG corruptions in a wrong way
Rzero               PFS checker reports failure on repair
Rpar.               PFS checker partially fixes CG corruptions
Rcom.               PFS checker fixes corruptions and CGs are valid again
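
The taxonomy maps naturally onto two enumerations. The sketch below encodes the eight levels and shows one way the detection level could be derived from a checker run; the function `detection_level` and its set-based inputs are assumptions made for illustration, not part of the original methodology.

```python
from enum import Enum
from typing import Set


class Detection(Enum):
    ABNORMAL = "Dabn."   # checker behaves abnormally, no detection results
    ZERO = "Dzero"       # finishes normally but misses all CG corruptions
    PARTIAL = "Dpar."    # detects some of the CG corruptions
    COMPLETE = "Dcom."   # detects all CG corruptions


class Repair(Enum):
    WRONG = "Rwro."      # fixes CG corruptions in a wrong way
    ZERO = "Rzero"       # reports failure on repair
    PARTIAL = "Rpar."    # partially fixes CG corruptions
    COMPLETE = "Rcom."   # CGs are valid again


def detection_level(injected: Set[str], reported: Set[str], abnormal: bool) -> Detection:
    """Classify a run from the injected corruptions and the ones the checker reported."""
    if abnormal:
        return Detection.ABNORMAL
    found = injected & reported
    if not found:
        return Detection.ZERO
    if found == injected:
        return Detection.COMPLETE
    return Detection.PARTIAL
```

For example, reporting one of two injected corruptions yields `Detection.PARTIAL`, while a crash during checking yields `Detection.ABNORMAL` regardless of what was reported.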

SLIDE 47

Outline

  • Motivation & Contributions
  • Methodology
      • PFS type-aware fault injection
      • Fault models
      • PFS consistency model
      • PFS checker taxonomy
  • Experimental Results
  • Conclusion & Future Work

SLIDE 48

Experimental Results

  • Studied 11 Lustre structures and 7 BeeGFS structures

Lustre structures              junk            zero            duplicate       out-of-sync
MDT-object                     —               —               —               Dabn. Rzero
OST-object                     Dzero Rzero     Dzero Rzero     —               Dabn. Rzero
llog record                    Dzero Rzero     Dzero Rzero     —               Dzero Rzero
FID on MDT                     Dcom. Rwro.     Dcom. Rzero     Dcom. Rwro.     Dcom. Rwro.
FID on OST                     Dcom. Rzero     Dcom. Rzero     Dcom. Rzero     Dzero Rzero
FLDB                           Dzero Rzero     Dzero Rzero     —               Dzero Rzero
OI table                       *Dcom. Rzero    *Dcom. Rzero    —               *Dcom. Rcom.
LOV EA                         *Dcom. Rcom.    *Dcom. Rcom.    *Dcom. Rcom.    *Dcom. Rcom.
PFID                           Dpar. Rpar.     Dcom. Rcom.     Dcom. Rcom.     Dzero Rzero
linkEA                         Dcom. Rcom.     Dcom. Rcom.     Dcom. +Rcom.    Dcom. Rcom.
nlink                          Dzero Rzero     *Dcom. Rzero    —               —

BeeGFS structures              junk            zero            duplicate       out-of-sync
dentry-by-name (MDT-object)    —               —               —               Dcom. Rcom.
dentry-by-ID (MDT-object)      —               —               —               Dcom. Rcom.
chunk (OST-object)             Dzero Rzero     Dzero Rzero     —               Dpar. Rzero
fhgfs                          Dcom. Rwro.     Dcom. Rwro.     Dcom. Rzero     Dcom. Rwro.
content directory              —               —               —               Dcom. Rcom.
nlink                          *Dcom. Rzero    *Dcom. Rzero    —               —
size                           *Dcom. Rzero    *Dcom. Rzero    —               —

SLIDE 53

Experimental Results

  • 14 cases: checkers repair CG corruptions completely

SLIDE 54

Experimental Results

  • 18 cases: checkers detect CG corruptions but can’t repair them completely

SLIDE 55

Experimental Results

  • 12 cases: checkers only check the in-memory copy of the structure
      • Could potentially miss corruptions of on-disk structures

SLIDE 56

Experimental Results

  • 2 cases: LFSCK triggers kernel panic
      • Has been confirmed by developers and led to 1 new patch
      • Whamcloud Community Jira: LU-13980, 09/26/2020
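
The case counts quoted on the results slides can be recomputed from the results table. The sketch below transcribes the table cells and applies counting rules that are my own reading, not stated in the slides: "repaired completely" means the cell contains Rcom., "detected but not repaired completely" means Dcom. without Rcom., the `*` marker is taken to flag the in-memory-only cases, and Dabn. marks the kernel panics. `RESULTS` and `tally` are hypothetical names; `-` stands for an empty cell.

```python
# Columns: junk, zero, duplicate, out-of-sync ("-" = no result in the table).
RESULTS = {
    "Lustre": {
        "MDT-object":  ["-", "-", "-", "Dabn. Rzero"],
        "OST-object":  ["Dzero Rzero", "Dzero Rzero", "-", "Dabn. Rzero"],
        "llog record": ["Dzero Rzero", "Dzero Rzero", "-", "Dzero Rzero"],
        "FID on MDT":  ["Dcom. Rwro.", "Dcom. Rzero", "Dcom. Rwro.", "Dcom. Rwro."],
        "FID on OST":  ["Dcom. Rzero", "Dcom. Rzero", "Dcom. Rzero", "Dzero Rzero"],
        "FLDB":        ["Dzero Rzero", "Dzero Rzero", "-", "Dzero Rzero"],
        "OI table":    ["*Dcom. Rzero", "*Dcom. Rzero", "-", "*Dcom. Rcom."],
        "LOV EA":      ["*Dcom. Rcom."] * 4,
        "PFID":        ["Dpar. Rpar.", "Dcom. Rcom.", "Dcom. Rcom.", "Dzero Rzero"],
        "linkEA":      ["Dcom. Rcom.", "Dcom. Rcom.", "Dcom. +Rcom.", "Dcom. Rcom."],
        "nlink":       ["Dzero Rzero", "*Dcom. Rzero", "-", "-"],
    },
    "BeeGFS": {
        "dentry-by-name":    ["-", "-", "-", "Dcom. Rcom."],
        "dentry-by-ID":      ["-", "-", "-", "Dcom. Rcom."],
        "chunk":             ["Dzero Rzero", "Dzero Rzero", "-", "Dpar. Rzero"],
        "fhgfs":             ["Dcom. Rwro.", "Dcom. Rwro.", "Dcom. Rzero", "Dcom. Rwro."],
        "content directory": ["-", "-", "-", "Dcom. Rcom."],
        "nlink":             ["*Dcom. Rzero", "*Dcom. Rzero", "-", "-"],
        "size":              ["*Dcom. Rzero", "*Dcom. Rzero", "-", "-"],
    },
}


def tally(results):
    """Count table cells per category under the assumed counting rules."""
    cells = [c for fs in results.values() for row in fs.values() for c in row if c != "-"]
    complete = sum("Rcom." in c for c in cells)
    abnormal = sum("Dabn." in c for c in cells)
    return {
        "repaired_completely": complete,
        "detected_not_fully_repaired": sum("Dcom." in c and "Rcom." not in c for c in cells),
        "in_memory_only": sum(c.startswith("*") for c in cells),
        "abnormal": abnormal,
        # Assumed reading of "suboptimal": neither fully repaired nor abnormal.
        "suboptimal": len(cells) - complete - abnormal,
    }
```

Under these rules the tally gives 14, 18, 12, and 2 for the four result slides, and 33 for the remaining cells, which agrees with the "33 suboptimal repairs" and "2 abnormal behaviors" listed under the contributions.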

SLIDE 57

Outline

  • Motivation & Contributions
  • Methodology
      • PFS type-aware fault injection
      • Fault models
      • PFS consistency model
      • PFS checker taxonomy
  • Experimental Results
  • Conclusion & Future Work

SLIDE 58

Conclusion & Future Work

  • A systematic approach to study PFS checkers
      • Has led to a new patch on Lustre
  • Future work
      • More automation (e.g., apply fuzzing techniques)
      • Study other PFSes (e.g., OrangeFS, Ceph)
      • Improve PFS checkers
          • Policy completeness
          • Performance

SLIDE 59

Thank you & Questions?