SLIDE 1

Tao: Facebook's Distributed Data Store For The Social Graph

Bronson et al., ATC 2013

Joy Arulraj CMU 15-799 : Paper Presentation

SLIDE 2

Talk Overview

  • Graph-aware cache backed by a database

– Efficiency vs. consistency

SLIDE 3

Motivation

  • Memcached

– Distributed in-memory key-value store
– Memory object caching system
– Data mapping in client code (PHP API)

SLIDE 4

Limitations

  • Association lists

– Must fetch the entire list to update a single edge

  • Control logic

– Clients manage the look-aside cache
– But each client has only a local view

  • Expensive read-after-write consistency

– Writes are forwarded to the master
– Local state is updated asynchronously
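To make the association-list limitation concrete, here is a minimal Python sketch of client-managed look-aside caching, assuming the whole list for an (id1, atype) pair is stored as one value under one memcached key. `FakeMemcache`, `FakeDB`, and the `assoc:{id1}:{atype}` key format are hypothetical stand-ins, not Facebook's code.

```python
import json


class FakeMemcache:
    """Hypothetical stand-in for a memcached client: a dict of string values."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class FakeDB:
    """Hypothetical stand-in for MySQL: (id1, atype) -> list of id2, newest first."""

    def __init__(self):
        self.rows = {}


def get_assoc_list(cache, db, id1, atype):
    # Look-aside read: the client checks the cache, falls back to the DB on
    # a miss, and fills the cache itself.
    key = f"assoc:{id1}:{atype}"
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    edges = list(db.rows.get((id1, atype), []))
    cache.set(key, json.dumps(edges))
    return edges


def add_edge(cache, db, id1, atype, id2):
    # The whole list lives under one key, so adding one edge means reading
    # the entire list, modifying it, and writing all of it back.
    edges = get_assoc_list(cache, db, id1, atype)
    edges.insert(0, id2)                       # newest first
    db.rows[(id1, atype)] = edges
    cache.delete(f"assoc:{id1}:{atype}")       # the client invalidates its own key
    return len(edges)
```

All of this logic runs in the client, so two clients doing read-modify-write on the same key at the same time can overwrite each other's edge.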

SLIDE 5

Problem Statement

  • Need a “smart” caching layer

– Graph-aware
– Distributed cache management
– Provides read-after-write consistency

  • Solution

– Fix the API and leverage its constraints!

SLIDE 6

Example

Alice was at CMU with Bob
Cathy: Wish we were there!
David likes this

Objects:
– Id: 200, otype: USER, name: Alice
– Id: 300, otype: USER, name: Bob
– Id: 400, otype: USER, name: Cathy
– Id: 500, otype: USER, name: David
– Id: 600, otype: LOCATION, name: CMU
– Id: 700, otype: CHECKIN
– Id: 800, otype: COMMENT, text: Wish we were there!

Associations (edge labels in the figure): LOC, CMT, LIKES, FRIEND

SLIDE 7

Data Model

  • Object

– (id) -> (otype, (key->value)*)
– Entities and repeatable actions
– Ex: users, comments

  • Association

– (id1, atype, id2) -> (time, (key->value)*)
– Relationships and actions that model state transitions
– Ex: tagged at, likes

SLIDE 8

Data Model

  • Association List

– (id1, atype) -> [a_new, …, a_old], sorted by time, newest first
– Supports the Association Query API
– Ex: (“CMU”, “COMMENT”)
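The three shapes in the data model map naturally onto small Python types. This is a sketch only: `TaoObject`, `Association`, and `AssocLists` are hypothetical names, and the edge (700, "COMMENT", 800), the extra comment 801, and the timestamps are assumed for illustration rather than taken from the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaoObject:
    # (id) -> (otype, (key -> value)*)
    id: int
    otype: str
    data: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Association:
    # (id1, atype, id2) -> (time, (key -> value)*)
    id1: int
    atype: str
    id2: int
    time: int
    data: tuple[tuple[str, str], ...] = ()


class AssocLists:
    """(id1, atype) -> [a_new, ..., a_old], kept sorted by time, newest first."""

    def __init__(self) -> None:
        self._lists: dict[tuple[int, str], list[Association]] = {}

    def add(self, assoc: Association) -> None:
        lst = self._lists.setdefault((assoc.id1, assoc.atype), [])
        lst.append(assoc)
        lst.sort(key=lambda a: a.time, reverse=True)

    def get(self, id1: int, atype: str) -> list[Association]:
        # Return a copy so callers cannot reorder the stored list.
        return list(self._lists.get((id1, atype), []))
```

Keying lists by (id1, atype) is what lets every query in the Association Query API start from a single list.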

SLIDE 9

API

  • Association API

– assoc_add(id1, atype, id2, time, (k->v)*)
– assoc_delete(id1, atype, id2)

  • Association Query API

– [POINT] assoc_get(id1, atype, id2)
– [RANGE] assoc_range(id1, atype, pos, limit)
– [COUNT] assoc_count(id1, atype)

SLIDE 10

Client Queries

  • All queries start from an <id1, atype> pair
  • 5 most recent comments on Alice’s checkin

– assoc_range(700, “COMMENT”, 0, 5), where 700 is the id of Alice’s CHECKIN object

  • Number of friends of Bob

– assoc_count(“Bob”, “FRIEND”)
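The Association API and its three query types can be modeled in a few lines, which also shows why every query is cheap once the list for an <id1, atype> pair is in hand. This is a hedged, single-process sketch: `TaoStore` and `_Edge` are hypothetical, there is no cache or database behind it, and the real TAO signatures may carry extra parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class _Edge:
    id2: int
    time: int
    data: dict = field(default_factory=dict)


class TaoStore:
    """Hypothetical single-process model of the association API (no cache, no DB)."""

    def __init__(self) -> None:
        # (id1, atype) -> association list, kept newest first
        self._lists: dict[tuple[int, str], list[_Edge]] = {}

    def assoc_add(self, id1, atype, id2, time, **kv):
        lst = self._lists.setdefault((id1, atype), [])
        lst[:] = [e for e in lst if e.id2 != id2]   # at most one edge per id2
        lst.append(_Edge(id2, time, dict(kv)))
        lst.sort(key=lambda e: e.time, reverse=True)

    def assoc_delete(self, id1, atype, id2):
        lst = self._lists.get((id1, atype), [])
        lst[:] = [e for e in lst if e.id2 != id2]

    def assoc_get(self, id1, atype, id2):
        # [POINT]: (time, data) for the edge, or None if it does not exist
        for e in self._lists.get((id1, atype), []):
            if e.id2 == id2:
                return e.time, e.data
        return None

    def assoc_range(self, id1, atype, pos, limit):
        # [RANGE]: id2s at positions pos .. pos+limit-1, newest first
        return [e.id2 for e in self._lists.get((id1, atype), [])[pos:pos + limit]]

    def assoc_count(self, id1, atype):
        # [COUNT]: number of edges in the list
        return len(self._lists.get((id1, atype), []))
```

With the example's ids, the two client queries become `store.assoc_range(700, "COMMENT", 0, 5)` and `store.assoc_count(300, "FRIEND")`.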

SLIDE 11

Tao’s Goals

  • Low read latency
  • Write consistency
  • High read availability

SLIDE 12

Basic Architecture

  • Webservers

– Stateless

  • Database

– Partitioned by <id>

  • Cache servers

– Store objects and association lists
– Partitioned by <id>

TAO
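Partitioning both tiers by <id> means a request can be routed from the id alone. A minimal sketch, assuming modulo sharding, a static shard-to-server table, and that each association list is stored on the shard of its id1; `NUM_SHARDS`, `SERVERS`, and the routing functions are hypothetical and not TAO's actual placement scheme.

```python
NUM_SHARDS = 8                               # hypothetical shard count
SERVERS = ["cache-a", "cache-b", "cache-c"]  # hypothetical cache servers


def shard_of(object_id: int) -> int:
    # Assumption: the shard is a simple modulo of the id.
    return object_id % NUM_SHARDS


def server_for_shard(shard: int) -> str:
    # Assumption: static shard -> server assignment.
    return SERVERS[shard % len(SERVERS)]


def route_object(object_id: int) -> str:
    return server_for_shard(shard_of(object_id))


def route_assoc_list(id1: int, atype: str) -> str:
    # An association list (id1, atype) lives on id1's shard, so range and
    # count queries touch exactly one server; atype does not affect placement.
    return server_for_shard(shard_of(id1))
```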

SLIDE 13

Low Read Latency

  • Webservers

– Too many network hops

  • Cache servers

– Hotspots with smaller shards

SLIDE 14

Datacenter-level Scalability

  • Tiers

– Distributed write logic

  • Database

– Thundering herds

SLIDE 15

Splitting the cache layer

  • Follower cache

  • Leader cache

SLIDE 16

Write Consistency

  • Followers

– Absorb read hits
– Forward read misses and writes to leaders
– Write-through cache

  • Leader updates

– Sent synchronously to the writer’s follower, in the reply
– Sent asynchronously to other followers

SLIDE 17

Write consistency

  • Leaders

– Serialize concurrent writes
– Can prevent “thundering herds”

  • Association list updates

– Refills instead of invalidations
– Idempotent, pull-based incremental updates
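The follower/leader write path can be sketched as a toy, single-process model under stated simplifications: `Leader`, `Follower`, `inbox`, and `drain` are hypothetical names, a dict stands in for MySQL, the leader reads the DB directly instead of keeping its own cache, and a refill pulls the whole current list rather than an incremental delta.

```python
from __future__ import annotations

import threading
from collections import deque


class Leader:
    """Hypothetical leader: serializes writes for its shards and notifies followers."""

    def __init__(self, db: dict):
        self.db = db                          # stand-in for MySQL: (id1, atype) -> [id2, ...]
        self.followers: list[Follower] = []
        self._lock = threading.Lock()

    def assoc_add(self, origin: Follower, id1, atype, id2):
        with self._lock:                      # one write at a time
            lst = self.db.setdefault((id1, atype), [])
            if id2 not in lst:
                lst.insert(0, id2)            # newest first
            fresh = list(lst)
        for f in self.followers:              # async refill message, not an invalidate
            if f is not origin:
                f.inbox.append((id1, atype))
        return fresh                          # synchronous reply to the writer's follower

    def read(self, id1, atype):
        with self._lock:
            return list(self.db.get((id1, atype), []))


class Follower:
    """Hypothetical follower: absorbs read hits, forwards misses and writes."""

    def __init__(self, leader: Leader):
        self.leader = leader
        self.cache: dict = {}
        self.inbox: deque = deque()
        leader.followers.append(self)

    def get(self, id1, atype):
        key = (id1, atype)
        if key not in self.cache:             # read miss: ask the leader
            self.cache[key] = self.leader.read(id1, atype)
        return self.cache[key]                # read hit: served locally

    def assoc_add(self, id1, atype, id2):
        # Write-through: forward to the leader, then apply its reply locally.
        self.cache[(id1, atype)] = self.leader.assoc_add(self, id1, atype, id2)

    def drain(self):
        # Handle refill messages by pulling the current list from the leader.
        # Re-pulling the same list is harmless, so refills are idempotent.
        while self.inbox:
            key = self.inbox.popleft()
            if key in self.cache:
                self.cache[key] = self.leader.read(*key)
```

The writer's follower sees its own write immediately, while other followers stay stale until they process their refill messages.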

SLIDE 18

Multi-datacenter Scalability

  • Master datacenter

– Handles writes forwarded from replicas

  • Replica datacenter

– Forwards writes to the master
– Receives database updates via asynchronous replication
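The two flows in this figure, forwarded writes and asynchronous DB replication, can be sketched as a toy model. It assumes one master and one replica, a dict per datacenter standing in for MySQL, and a hypothetical `replicate()` step playing the role of asynchronous replication; `Datacenter` is an illustrative name.

```python
from collections import deque


class Datacenter:
    """Hypothetical datacenter with its own DB copy; master if `master` is None."""

    def __init__(self, name, master=None):
        self.name = name
        self.db = {}                 # key -> value (stand-in for MySQL)
        self.master = master
        self.replicas = []
        self._log = deque()          # pending replication events (master only)
        if master is not None:
            master.replicas.append(self)

    def write(self, key, value):
        if self.master is not None:
            # Replica datacenter: forward the write to the master.
            return self.master.write(key, value)
        self.db[key] = value
        self._log.append((key, value))

    def read(self, key):
        # Reads are served from the local DB copy.
        return self.db.get(key)

    def replicate(self):
        # Asynchronous DB replication: ship logged writes to every replica.
        while self._log:
            key, value = self._log.popleft()
            for r in self.replicas:
                r.db[key] = value
```

Between the forwarded write and `replicate()`, a read in the replica returns the old value until replication catches up.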

SLIDE 19

High Read Availability

  • Follower failure

– Client contacts a backup follower tier
– May break read-after-write consistency

  • Leader failure

– Follower tiers reroute read misses directly to the DB
– Writes are sent to another member of the leader tier
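The failover rules can be written as three small routing functions. This is a hedged sketch: the function names, server labels, and the use of `random.choice` to pick a live leader are assumptions for illustration.

```python
from __future__ import annotations

import random


def pick_follower(primary: str, backup: str, down: set[str]) -> str:
    # Follower failure: the client falls back to a backup follower tier,
    # which may not have seen the client's own recent writes yet.
    return backup if primary in down else primary


def route_read_miss(leader_up: bool) -> str:
    # Followers normally send read misses to their leader; if it is down,
    # they go straight to the database.
    return "leader" if leader_up else "database"


def route_write(leader: str, leader_tier: list[str], down: set[str], rng=random) -> str:
    # Writes to a failed leader go to another live member of the leader tier.
    if leader not in down:
        return leader
    candidates = [m for m in leader_tier if m not in down]
    if not candidates:
        raise RuntimeError("no live leader available")
    return rng.choice(candidates)
```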

SLIDE 20

Handling Hot Spots

  • Consistent hashing

– Simplifies cluster expansion
– Request rerouting

  • Load balancing

– Shard cloning
– Small client-side cache
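Consistent hashing is what keeps cluster expansion cheap: adding a server moves only the shards that land on its new ring positions. Below is a minimal ring that assumes MD5 for key spread and 100 virtual nodes per server. `HashRing` and both of those choices are arbitrary and hypothetical, not TAO's configuration.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self._points = []            # sorted list of (hash, server)
        for s in servers:
            self.add(s, vnodes)

    def add(self, server, vnodes=100):
        # Each server owns several points on the ring to smooth out load.
        for i in range(vnodes):
            bisect.insort(self._points, (_h(f"{server}#{i}"), server))

    def lookup(self, shard_id: int) -> str:
        # First ring point clockwise from the shard's hash, wrapping around.
        h = _h(str(shard_id))
        i = bisect.bisect(self._points, (h, ""))
        return self._points[i % len(self._points)][1]
```

Shard cloning and the small client-side cache are not modeled here.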

SLIDE 21

Results

  • Reads dominate writes

– 99.8% of requests are reads
– 40% of requests are range queries

  • Most edge queries have empty results

– Tao can answer these from a cached assoc_count of zero
– Key advantage of application-aware caching
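A cached count helps with empty results because, once the count for an (id1, atype) pair is known to be zero, a range query for that pair needs no backend round trip. The follower-side cache below illustrates this; `CountAwareCache` and its `backend` callable are hypothetical names.

```python
class CountAwareCache:
    """Hypothetical follower-side cache that stores association counts."""

    def __init__(self, backend):
        self.backend = backend          # callable (id1, atype) -> list of id2
        self.counts = {}                # (id1, atype) -> cached assoc_count
        self.backend_calls = 0

    def assoc_count(self, id1, atype):
        key = (id1, atype)
        if key not in self.counts:
            self.counts[key] = len(self._fetch(id1, atype))
        return self.counts[key]

    def assoc_range(self, id1, atype, pos, limit):
        # A cached count of zero answers the query without touching the backend.
        if self.counts.get((id1, atype)) == 0:
            return []
        return self._fetch(id1, atype)[pos:pos + limit]

    def _fetch(self, id1, atype):
        self.backend_calls += 1
        return self.backend(id1, atype)
```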

SLIDE 22

Results

  • Availability

– Fraction of failed queries: 4.9 × 10⁻⁶

  • Follower Throughput

– 8-core Xeon, 144 GB RAM, 10 Gb Ethernet
– 30–60K requests/sec
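Inverting the reported failure fraction gives a rough sense of scale, about one failed query in every two hundred thousand:

```latex
\frac{1}{4.9 \times 10^{-6}} \approx 2.0 \times 10^{5}
\qquad\text{and}\qquad
1 - 4.9 \times 10^{-6} = 0.9999951 \approx 99.9995\%\ \text{of queries succeed}
```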

SLIDE 23

Tao Summary

  • Low read latency

– Application-aware cache layer

  • Write consistency

– Replication model

  • High read availability

– Fault-tolerance

SLIDE 24

Talk Summary

  • Graph-aware cache backed by a database

– Efficiency vs. consistency

  • Why not use a graph database?

– They trust MySQL
– Tao’s cache layer meets their demands

Thanks!