SLIDE 1

CS 744: GOOGLE FILE SYSTEM

Shivaram Venkataraman Fall 2019

SLIDE 2

ANNOUNCEMENTS

  • Assignment 1 out later today
  • Group submission form
  • Anybody on the waitlist?
SLIDE 3

OUTLINE

  1. Brief history
  2. GFS
  3. Discussion
  4. What happened next?
SLIDE 4

HISTORY OF DISTRIBUTED FILE SYSTEMS

SLIDE 5

SUN NFS

[Figure: four clients send RPCs to a central file server, which stores files on its local FS]

SLIDE 6

Example mounts: /dev/sda1 on /, /dev/sdb1 on /backups, NFS on /home

[Figure: the resulting directory tree rooted at /, with backups (bak1, bak2, bak3), etc, bin, and home/tyler (537/p1, 537/p2, .bashrc)]

SLIDE 7

FILE HANDLES

[Figure: client and server stacks connected by NFS; each side has a local FS layer]

Examples of operations that use file handles: open, read

SLIDE 8

CACHING

The client cache records the time when a data block was fetched (t1). Before using the data block, the client sends a STAT request to the server to:

  • get the last-modified timestamp for this file (t2) (for the whole file, not the block…)
  • compare it to the cache timestamp
  • refetch the data block if the file has changed since it was cached (t2 > t1)

[Figure: the server's local FS holds block B (modified at t2), while Client 2's NFS cache still holds the stale block A it fetched at t1]
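A minimal sketch of this validation logic, assuming hypothetical `server.stat` and `server.read` stubs in place of the real NFS RPCs:

```python
import time

class CachedBlock:
    def __init__(self, data, fetched_at):
        self.data = data
        self.fetched_at = fetched_at          # t1: when this block was fetched

class NFSClientCache:
    def __init__(self, server):
        self.server = server                  # hypothetical RPC stub
        self.blocks = {}                      # (file_handle, block_no) -> CachedBlock

    def read_block(self, fh, block_no):
        cached = self.blocks.get((fh, block_no))
        if cached is not None:
            t2 = self.server.stat(fh).mtime   # last-modified time of the whole file
            if t2 <= cached.fetched_at:
                return cached.data            # unchanged since t1: use the cache
        # Miss, or file changed since t1: refetch and remember the new fetch time.
        data = self.server.read(fh, block_no)
        self.blocks[(fh, block_no)] = CachedBlock(data, time.time())
        return data
```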

SLIDE 9

NFS FEATURES

NFS handles client and server crashes well because its protocol is built on requests that are:

  • stateless: servers don’t remember clients
  • idempotent: doing things twice never hurts

Caching is hard, especially with crashes. Problems:

  • The consistency model is odd (a client may not see updates until ~3 seconds after the file is closed)
  • Scalability limits as more and more clients call stat() on the server
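A small sketch of why stateless, idempotent requests make retries after a crash safe; the `read_rpc` helper and the handle-to-path table are illustrative, not the real NFS interface:

```python
# Every request names the file handle, offset, and count explicitly, so the
# server keeps no per-client state, and repeating a request after a timeout
# (or after the server reboots) returns the same result.
def read_rpc(handle_table, file_handle, offset, count):
    with open(handle_table[file_handle], "rb") as f:
        f.seek(offset)
        return f.read(count)

def client_read(handle_table, fh, offset, count, retries=3):
    for _ in range(retries):
        try:
            return read_rpc(handle_table, fh, offset, count)  # safe to resend
        except TimeoutError:
            continue        # server may have crashed and restarted; just retry
    raise IOError("server unavailable")
```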

SLIDE 10

ANDREW FILE SYSTEM

  • Design for scale
  • Whole-file caching
  • Callbacks from server
SLIDE 11

WORKLOAD PATTERNS (1991)

SLIDE 12

WORKLOAD PATTERNS (1991)

SLIDE 13

OCEANSTORE / PAST

  • Wide-area storage systems
  • Fully decentralized
  • Built on distributed hash tables (DHTs)

SLIDE 14

GFS: WHY ?

SLIDE 15

GFS: WHY ?

  • Components fail frequently
  • Files are huge!
  • Applications are different

SLIDE 16

GFS: WORKLOAD ASSUMPTIONS

  • “Modest” number of large files
  • Two kinds of reads: large streaming, and small random
  • Writes: many large, sequential writes; no random writes
  • High bandwidth more important than low latency

SLIDE 17

GFS: DESIGN

  • Single master for metadata
  • Chunkservers for storing data (read path sketched below)
  • No POSIX API!
  • No caches!
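A minimal sketch of the resulting read path, assuming hypothetical `master.lookup` and chunkserver `read` stubs rather than the real GFS RPC interface:

```python
CHUNK_SIZE = 64 * 1024 * 1024   # GFS uses 64 MB chunks

def gfs_read(master, chunkservers, path, offset, length):
    """Read `length` bytes of `path` at `offset` (single-chunk case for brevity)."""
    chunk_index = offset // CHUNK_SIZE
    # 1. Metadata: ask the master which chunk covers this offset and
    #    which chunkservers hold replicas of it.
    chunk_handle, replica_ids = master.lookup(path, chunk_index)
    # 2. Data: read directly from one replica; the master stays off the data path.
    replica = chunkservers[replica_ids[0]]
    return replica.read(chunk_handle, offset % CHUNK_SIZE, length)
```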
SLIDE 18

CHUNK SIZE TRADE-OFFS

  • Client → Master interactions
  • Client → Chunkserver connections
  • Metadata size at the master
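A rough back-of-the-envelope illustration of the metadata side of this trade-off; the ~64 bytes of master metadata per chunk is an assumed figure for illustration, not a number from the slides:

```python
# Larger chunks mean fewer chunks to track at the master, fewer client-to-master
# lookups per file, and longer-lived client-to-chunkserver connections; the cost
# is more internal fragmentation and potential hotspots on small files.
def metadata_bytes(total_data_bytes, chunk_size, bytes_per_chunk=64):
    return (total_data_bytes // chunk_size) * bytes_per_chunk

PB = 10**15
for chunk_mb in (1, 64):
    md = metadata_bytes(1 * PB, chunk_mb * 2**20)
    print(f"{chunk_mb:>3} MB chunks -> ~{md / 2**30:.1f} GiB of chunk metadata per PB")
```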

SLIDE 19

GFS: REPLICATION

  • 3-way replication to handle faults
  • Primary replica for each chunk
  • Chain replication (consistency)
  • Dataflow: Pipelining, network-aware
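A minimal sketch of the write flow these bullets describe, with hypothetical `receive`, `commit`, and `apply` stubs standing in for the real replica RPCs:

```python
def gfs_write(data, primary, secondaries):
    """Sketch of a GFS-style write: pipelined data push, then ordered commit."""
    # 1. Data flow: the client pushes data to the nearest replica, and each
    #    replica forwards to the next while still receiving (pipelined and
    #    network-topology aware); the chain order is simplified here.
    chain = [primary] + secondaries
    chain[0].receive(data, forward_to=chain[1:])
    # 2. Control flow: the primary assigns a serial order (an offset) for the
    #    mutation and tells the secondaries to apply it in that same order.
    offset = primary.commit(data)
    for s in secondaries:
        s.apply(data, offset)
    return offset
```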
SLIDE 20

RECORD APPENDS

  • Write: the client specifies the offset
  • Record Append: GFS chooses the offset
  • Consistency: at-least-once, atomic appends; duplicates and padding are handled at the application level
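A small sketch of that application-level handling, with hypothetical `record_append` and `scan` methods standing in for the GFS client library:

```python
import uuid

def append_record(gfs_file, payload):
    # Tag each record with a unique id so readers can detect duplicates
    # created by at-least-once retries.
    gfs_file.record_append({"id": str(uuid.uuid4()), "payload": payload})

def read_records(gfs_file):
    seen = set()
    for record in gfs_file.scan():      # yields records, possibly with dups/padding
        if record is None:              # padding or partial record: skip
            continue
        if record["id"] in seen:        # duplicate from a retried append: skip
            continue
        seen.add(record["id"])
        yield record["payload"]
```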

SLIDE 21

MASTER OPERATIONS

  • No “directory” inode! Simplifies locking
  • Replica placement considerations
  • Implementing deletes
SLIDE 22

FAULT TOLERANCE

  • Chunk replication with 3 replicas
  • Master:
    – Replication of the operation log and checkpoints
    – Shadow masters
  • Data integrity using checksum blocks
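A minimal sketch of per-block integrity checking; GFS checksums small blocks within each chunk, but the 64 KB block size, the CRC32 choice, and the helper names here are illustrative assumptions:

```python
import zlib

BLOCK_SIZE = 64 * 1024   # assumed checksum granularity

def write_block(store, checksums, block_no, data):
    assert len(data) <= BLOCK_SIZE
    checksums[block_no] = zlib.crc32(data)      # remember a checksum per block
    store[block_no] = data

def read_block(store, checksums, block_no):
    data = store[block_no]
    if zlib.crc32(data) != checksums[block_no]:
        # Corruption detected: fail the read so the client can use another
        # replica, and let the master re-replicate a good copy.
        raise IOError(f"checksum mismatch on block {block_no}")
    return data
```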
SLIDE 23

DISCUSSION

SLIDE 24

GFS SOCIAL NETWORK

You are building a new social networking application. The operations you will need to perform are (a) add a new friend id for a given user (b) generate a histogram of number of friends per user. How will you do this using GFS as your storage system ?

SLIDE 25

GFS EVAL

List your takeaways from “Figure 3: Aggregate Throughputs”

SLIDE 26

GFS SCALE

The evaluation (Table 2) shows clusters with up to 180 TB of data. What part of the design would need to change if we instead had 180 PB of data?

SLIDE 27

WHAT HAPPENED NEXT

SLIDE 28

Keynote at PDSW-DISCS 2017: 2nd Joint International Workshop On Parallel Data Storage & Data Intensive Scalable Computing Systems

SLIDE 29

GFS EVOLUTION

Motivation:

  • GFS Master
    – One machine is not large enough for a large FS
    – A single bottleneck for metadata operations (even with the data path offloaded)
    – Fault tolerant, but not highly available
  • Lack of predictable performance
    – No latency guarantees (one slow chunkserver → slow writes)

SLIDE 30

GFS EVOLUTION

  • GFS master replaced by Colossus
  • Metadata stored in BigTable
  • Recursive structure? If metadata is ~1/10,000 the size of the data:
    – 100 PB data → 10 TB metadata
    – 10 TB metadata → 1 GB metametadata
    – 1 GB metametadata → 100 KB meta...

SLIDE 31

GFS EVOLUTION

Need for efficient storage:

  • Rebalance old, cold data
  • Distribute newly written data evenly across disks
  • Manage both SSDs and hard disks

SLIDE 32

HETEROGENEOUS STORAGE

  • F4 (Facebook)
  • Blob stores
  • Key-value stores

SLIDE 33

NEXT STEPS

  • Assignment 1 out tonight!
  • Next week: MapReduce, Spark