
SLIDE 1

BigTable

CS 452

SLIDE 2

BigTable

In the early 2000s, Google had far more data than anybody else. Traditional databases couldn’t scale, and Google wanted something better than a bare filesystem (GFS). BigTable is optimized for:

  • Lots of data, large infrastructure
  • Relatively simple queries

Relies on Chubby, GFS

SLIDE 3

Chubby

SLIDE 4

Chubby

A distributed coordination service. Goal: allow client applications to synchronize and manage dynamic configuration state. Intuition: only some parts of an app need consensus!

  • Lab 2: Highly available view service
  • Master election in a distributed FS (e.g. GFS)
  • Metadata for sharded services

Implementation: (Multi-)Paxos SMR

SLIDE 5

Why Chubby?

Many applications need coordination (locking, metadata, etc.). “Every sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of Paxos.” Paxos is a known good solution, but (Multi-)Paxos is hard to implement and use.

SLIDE 6

How to do consensus as a service

Chubby provides:

  • Small files
  • Locking
  • “Sequencers”

Filesystem-like API

  • Open, Close, Poison
  • GetContents, SetContents, Delete
  • Acquire, TryAcquire, Release
  • GetSequencer, SetSequencer, CheckSequencer
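A toy sketch of how the locking and sequencer calls fit together: a client acquires a lock and gets back a sequencer, and a downstream server validates that sequencer before trusting the client. Everything here (the class, method names, and lock path) is a hypothetical in-memory stand-in; the real Chubby client library is not public.

```python
# In-memory stand-in for a Chubby cell's locks and sequencers (hypothetical).
class LockService:
    def __init__(self):
        self._holder = {}      # lock name -> current holder, or None
        self._generation = {}  # lock name -> how many times it was acquired

    def acquire(self, name, client):
        """Like TryAcquire: returns a sequencer on success, None if held."""
        if self._holder.get(name) is not None:
            return None
        self._holder[name] = client
        self._generation[name] = self._generation.get(name, 0) + 1
        return (name, self._generation[name])  # the sequencer

    def release(self, name):
        self._holder[name] = None

    def check_sequencer(self, sequencer):
        """Like CheckSequencer: is the client's lock still the current one?"""
        name, gen = sequencer
        return self._holder.get(name) is not None and self._generation[name] == gen

svc = LockService()
seq = svc.acquire("/ls/cell/gfs-master", "server-A")
assert svc.check_sequencer(seq)                                 # lock held: safe
assert svc.acquire("/ls/cell/gfs-master", "server-B") is None   # mutual exclusion
svc.release("/ls/cell/gfs-master")
assert not svc.check_sequencer(seq)                             # stale sequencer rejected
```

The sequencer is the key idea: a server that receives a request can cheaply check that the requester’s lock is still valid, instead of trusting a possibly stale lock holder.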
SLIDE 7

Back to BigTable

SLIDE 8

Uninterpreted strings in rows and columns:

  (r : string) -> (c : string) -> (t : int64) -> string

Mostly schema-less; column “families” are used for access control. Data sorted by row name:

  • lexicographically close names likely to be nearby

Each piece of data versioned via timestamps

  • Either user- or server-generated
  • Control garbage-collection
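The (row, column, timestamp) → string map on this slide can be sketched as a small in-memory class. This `Table` class is illustrative only, not BigTable’s actual on-disk structure; the reversed-domain row keys follow the paper’s webtable example.

```python
import bisect

class Table:
    """Sketch of BigTable's data model: (row, column, timestamp) -> string."""
    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}
        self._rows = []   # distinct row names, kept lexicographically sorted

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)

    def get(self, row, column, timestamp=None):
        """Latest version by default, or an explicit timestamp."""
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        return versions.get(max(versions) if timestamp is None else timestamp)

    def scan(self, start, end):
        """Range scan: lexicographically close row names come back together."""
        i = bisect.bisect_left(self._rows, start)
        j = bisect.bisect_left(self._rows, end)
        return self._rows[i:j]

t = Table()
t.put("com.cnn.www", "contents:", 1, "<html>v1")
t.put("com.cnn.www", "contents:", 2, "<html>v2")
t.put("com.cnn.sports", "contents:", 1, "<html>")
assert t.get("com.cnn.www", "contents:") == "<html>v2"            # newest wins
assert t.get("com.cnn.www", "contents:", timestamp=1) == "<html>v1"
assert t.scan("com.cnn", "com.coz") == ["com.cnn.sports", "com.cnn.www"]
```

Note how keeping rows sorted makes the range scan cheap, which is why reversed domain names (all `com.cnn.*` pages adjacent) are a good row-key choice.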
SLIDES 9–10

BigTable components

[Diagram: Client, Master, six Tablet Servers, GFS]

SLIDES 11–13

Tablets

Each table is composed of one or more tablets. A table starts with one tablet, which splits once it gets big enough.

  • Split at row boundaries

Tablets are ~100MB-200MB.

[Diagram: a tablet filling with rows a–e of data]
SLIDE 14

Tablets

A tablet is indexed by its range of keys

  • <START> - “c”
  • “c” - <END>

Each tablet lives on at most one tablet server. The master coordinates the assignment of tablets to servers.
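Finding which tablet owns a key is a binary search over the tablets’ boundary rows. The sketch below assumes, as in the paper’s METADATA encoding, that each tablet is identified by its end row and that ranges are end-inclusive; the final tablet’s end is <END>, represented by falling off the list.

```python
import bisect

def tablet_for_key(end_rows, key):
    """Index of the first tablet whose end row >= key (sorted end rows)."""
    return bisect.bisect_left(end_rows, key)

end_rows = ["c"]   # tablet 0: <START>-"c", tablet 1: "c"-<END>, as on the slide
assert tablet_for_key(end_rows, "apple") == 0
assert tablet_for_key(end_rows, "c") == 0      # end-inclusive boundary
assert tablet_for_key(end_rows, "dog") == 1    # the "c"-<END> tablet
```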

SLIDE 15

Tablets

Tablet locations are stored in a METADATA table. The root tablet stores the locations of the METADATA tablets, and the root tablet’s location is itself stored in Chubby.
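The hierarchy above can be sketched as three chained lookups. Each dict below stands in for an RPC to the named component; the Chubby path and server names are made up for illustration.

```python
# Level 0: a Chubby file naming the server that holds the root tablet.
chubby = {"/bigtable/root-tablet": "tabletserver-2"}

# Level 1: the root tablet maps (table, row) to the server with the
# right METADATA tablet (simplified here to exact-key lookup).
root_tablet = {("T", "R"): "tabletserver-1"}

# Level 2: the METADATA tablet maps (table, row) to the server
# holding the user tablet itself.
metadata = {("T", "R"): "tabletserver-3"}

def locate(table, row):
    """Three lookups: Chubby -> root tablet -> METADATA -> user tablet server."""
    _root_server = chubby["/bigtable/root-tablet"]  # 1. ask Chubby
    _meta_server = root_tablet[(table, row)]        # 2. ask the root tablet
    user_server = metadata[(table, row)]            # 3. ask the METADATA tablet
    return user_server

assert locate("T", "R") == "tabletserver-3"
```

Because the hierarchy is only three levels deep, even a very large table needs at most three network round trips to locate any row, and clients cache the results.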

SLIDE 16

Tablet serving

Tablet data persisted to GFS

  • GFS writes replicated to 3 nodes
  • One of these nodes should be the tablet server!

Three important data structures:

  • memtable: in-memory map
  • SSTable: immutable, on-disk map
  • Commit log: operation log used for recovery
SLIDE 17

Tablet serving

Writes go to the commit log, then to the memtable. Reads see a merged view of the memtable + SSTables:

  • Data could be in memtable or on disk
  • Or, some columns in each
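The merged read path can be sketched with plain dicts standing in for the memtable and SSTables (layout hypothetical; real SSTables are immutable on-disk files): check the memtable first, then SSTables from newest to oldest, so the freshest value for each cell wins.

```python
memtable = {("rowA", "colX"): "new-value"}            # most recent writes
sstables = [                                           # newest first
    {("rowA", "colY"): "disk-value-1"},
    {("rowA", "colX"): "stale", ("rowA", "colZ"): "disk-value-2"},
]

def read(row, column):
    """Return the freshest value: memtable first, then SSTables in order."""
    key = (row, column)
    if key in memtable:
        return memtable[key]
    for sst in sstables:
        if key in sst:
            return sst[key]
    return None

assert read("rowA", "colX") == "new-value"     # memtable shadows the old SSTable
assert read("rowA", "colY") == "disk-value-1"  # some columns live only on disk
assert read("rowA", "colQ") is None
```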
SLIDE 18

Compaction and compression

Memtables spilled to disk once they grow too big

  • “minor compaction”: converted to SSTable

Periodically, all SSTables for a tablet compacted

  • “major compaction”: many SSTables -> one

Compression: each block of an SSTable compressed

  • Can get enormous ratios with text data
  • Locality helps—similar web pages in same block
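A minimal sketch of the minor-compaction trigger: when the memtable passes a size threshold, freeze it into an immutable sorted SSTable and start a fresh memtable. The threshold and data layout here are illustrative, not BigTable’s real values.

```python
MEMTABLE_LIMIT = 3   # hypothetical entry-count threshold

memtable = {}
sstables = []        # newest first

def write(key, value):
    global memtable
    # Real BigTable first appends the mutation to the commit log in GFS.
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        # Minor compaction: spill a sorted, immutable snapshot to "disk".
        sstables.insert(0, dict(sorted(memtable.items())))
        memtable = {}

for i in range(7):
    write(f"row{i}", f"v{i}")

assert len(sstables) == 2    # two minor compactions happened
assert len(memtable) == 1    # row6 is still buffered in memory
```

A major compaction would then merge `sstables` into a single SSTable, discarding shadowed versions, so reads touch fewer files.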
SLIDES 19–20

BigTable components

[Diagram: Client, Master, six Tablet Servers, GFS]

SLIDE 21

Master

  • Tracks tablet servers (using Chubby)
  • Assigns tablets to servers
  • Handles tablet server failures

SLIDE 22

Master startup

  • Acquire master lock in Chubby
  • Find live tablet servers (each tablet server writes its identity to a directory in Chubby)
  • Communicate with live servers to find out who has which tablet
  • Scan METADATA tablets to find unassigned tablets
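The startup steps above can be sketched as a small function. The in-memory structures below are hypothetical stand-ins for the Chubby server directory and the tablet-server RPCs; step 1 (taking the master lock) is omitted.

```python
def master_startup(chubby_servers_dir, ask_server, all_tablets):
    """Return the set of tablets that still need to be assigned."""
    live_servers = set(chubby_servers_dir)   # step 2: who is alive (from Chubby)
    assigned = set()
    for s in live_servers:                   # step 3: ask each server what it holds
        assigned |= set(ask_server(s))
    return set(all_tablets) - assigned       # step 4: what METADATA lists but nobody has

# Hypothetical state: two live servers holding three of four tablets.
holdings = {"ts1": ["t1", "t2"], "ts2": ["t3"]}
unassigned = master_startup(holdings.keys(),
                            lambda s: holdings[s],
                            ["t1", "t2", "t3", "t4"])
assert unassigned == {"t4"}   # the master must now assign t4 somewhere
```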
SLIDE 23

Master operation

Detect tablet server failures

  • Assign tablets to other servers

Merges tablets (if they fall below a size threshold). Handles tablet splits:

  • Splits initiated by tablet servers
  • Master responsible for assigning new tablet

Clients never read from master

SLIDES 24–36

BigTable components: locating and reading a row

[Diagram: Client, Master, six Tablet Servers, GFS]

The client resolves a row’s location in three lookups and then reads it; the master is not involved:

  • “Where is the root tablet?” → Tablet server 2
  • “Where is the METADATA tablet for table T row R?” → Tablet server 1
  • “Where is table T row R?” → Tablet server 3
  • “Read table T row R” → Row

SLIDE 37

Optimizations

  • Clients cache tablet locations (safe because tablet servers only respond while their Chubby session is active)
  • Locality groups: put column families that are infrequently accessed together into separate SSTables
  • Smart caching on tablet servers
  • Bloom filters on SSTables
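The Bloom-filter optimization is worth sketching: before touching an SSTable on disk, the server checks a small in-memory filter. A negative answer is definitive, so most reads for absent rows skip the disk entirely; a positive answer may rarely be a false positive, costing one wasted read. This is a generic Bloom filter, not BigTable’s actual implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = 0  # the bit array, packed into one integer

    def _positions(self, key):
        # Derive nhashes bit positions from independent salted hashes.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # True means "maybe present"; False means "definitely absent".
        return all(self.bits & (1 << p) for p in self._positions(key))

bf = BloomFilter()
for row in ["com.cnn.www", "com.nytimes.www"]:
    bf.add(row)

assert bf.might_contain("com.cnn.www")   # keys that were added always hit
if not bf.might_contain("org.example"):
    pass  # definitely absent: the SSTable read can be skipped
```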