Rollerchain: a DHT for Efficient Replication

IEEE NCA’13. João Paiva, João Leitão, Luís Rodrigues. Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal. August 22, 2013.


SLIDE 1

Rollerchain: a DHT for Efficient Replication

IEEE NCA’13. João Paiva, João Leitão, Luís Rodrigues

Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal

August 22, 2013

SLIDE 2

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 3

Motivation

◮ Distributed Hash Tables are structured overlays where nodes organize into a predefined topology that supports routing.

◮ DHTs allow for scalable key-value storage.
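As a concrete illustration (not from the talk), the key-to-node mapping of a ring-based DHT can be sketched with consistent hashing; all names below are illustrative.

```python
import hashlib

def ring_id(value: str, bits: int = 32) -> int:
    """Hash a string onto a 2^bits identifier ring."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def successor(node_ids: list[int], key_id: int) -> int:
    """The node responsible for a key is the key's clockwise successor."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key_id:
            return nid
    return ring[0]  # wrap around the ring

nodes = [ring_id(f"node-{i}") for i in range(8)]
owner = successor(nodes, ring_id("some-key"))
assert owner in nodes
```

Routing in a real DHT then only needs to locate the successor of a key's identifier, which is what the predefined topology supports.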

SLIDE 4

Motivation

◮ In dynamic environments, replication is paramount to maintaining data.

◮ However, predefined topologies are expensive to maintain in dynamic environments (churn).

◮ DHTs do not handle churn as well as unstructured networks.


SLIDE 6

Main Approaches to DHT replication

  • 1. Neighbour Replication
  • 2. Multi-Publication
SLIDE 7

Neighbour Replication

Each node replicates its data on its R closest neighbours

◮ Good control on replication degree

◮ Simple to locate replicas

◮ Expensive replication: data is moved to respect topological constraints

◮ Not resilient under churn: each node acts on its own

◮ Poor load balancing: no active mechanisms to balance load
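A minimal sketch (illustrative, not the talk's code) of the neighbour-replication rule: a key is stored on its clockwise successor and the R−1 nodes that follow it.

```python
def replica_set(node_ids: list[int], key_id: int, r: int) -> list[int]:
    """Neighbour replication: the key's clockwise successor (primary)
    plus its R-1 next neighbours on the ring all store the key."""
    ring = sorted(node_ids)
    # index of the primary: first node at or after the key, wrapping to 0
    idx = next((i for i, nid in enumerate(ring) if nid >= key_id), 0)
    return [ring[(idx + k) % len(ring)] for k in range(r)]

# keys with id in (10, 20] are replicated on nodes 20, 30 and 40
assert replica_set([10, 20, 30, 40], 15, 3) == [20, 30, 40]
```

This makes the replica-placement constraint explicit: whenever a node joins or fails, the replica set of nearby keys shifts along the ring, which is exactly why data must be moved to respect the topology.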


SLIDES 9–13

Neighbour Replication: operation (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 14

Multi-Publication

Each object is assigned R different identifiers, to be stored by R different nodes.

◮ Better load balancing

◮ Reduced correlated failures

◮ Expensive overlay maintenance: each object has a different set of replicas

◮ Expensive replication: data is moved to respect topological constraints

◮ Not resilient under churn: each node acts on its own
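The multi-publication scheme can be sketched by salting the object's key to derive R independent identifiers (an illustrative reconstruction, not the paper's code):

```python
import hashlib

def publication_ids(key: str, r: int, bits: int = 32) -> list[int]:
    """Derive R independent ring identifiers for one object by salting
    its key; each identifier is stored by a different responsible node."""
    ids = []
    for i in range(r):
        digest = hashlib.sha1(f"{key}#{i}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (1 << bits))
    return ids
```

Because the R identifiers are independent, the replicas land on unrelated nodes, which is what gives the scheme its load-balancing and failure-decorrelation benefits, at the cost of maintaining R placements per object.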


SLIDE 16

Current DHTs

Based on structured networks

Characterized by:

◮ Nodes with fixed positions in the overlay

◮ Static replication degree

◮ Poor performance under churn

SLIDE 17

Main challenges

Challenges:

  • 1. Increase churn resilience
  • 2. Minimize replication costs
  • 3. Improve load balancing
SLIDE 18

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 19

Our approach: Architecture overview

◮ Ring-based overlay: Composed of virtual nodes
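The architecture can be modelled roughly as follows (a sketch under my own naming, not the paper's code): each ring position is a virtual node, collectively held by a group of physical nodes that all replicate the same keys.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualNode:
    """One position on the ring, held by a group of physical members
    that all replicate the keys the position is responsible for."""
    ring_id: int
    members: set = field(default_factory=set)
    keys: set = field(default_factory=set)

ring = [
    VirtualNode(100, {"p1", "p2", "p3"}, {"k7", "k42"}),
    VirtualNode(200, {"p4", "p5"}, {"k150"}),
]
# the replication degree of a key is the size of its virtual node
assert len(ring[0].members) == 3
```

The important design choice is that routing sees only the virtual nodes, so physical membership inside a group can change without disturbing the ring topology.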

SLIDES 21–27

Our approach: Dynamic topology overview (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 28

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 30

Increasing churn resilience

◮ Ring maintained through gossip mechanisms

SLIDE 31

Increasing churn resilience

◮ Gossip to keep virtual node membership up-to-date

SLIDE 32

Increasing churn resilience

◮ Gossip to trade connections between virtual nodes
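The gossip mechanisms in the bullets above can be sketched as a push-pull anti-entropy exchange of membership views (illustrative only; node names and round counts are mine, not the paper's):

```python
import random

def gossip_round(views: dict, rng: random.Random) -> None:
    """One gossip round: every node exchanges its membership view with a
    random peer and both keep the merged view (push-pull anti-entropy)."""
    for node in list(views):
        peer = rng.choice([p for p in views if p != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)

# each node initially only knows itself and one neighbour
views = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c", "a"}}
rng = random.Random(1)
for _ in range(5):
    gossip_round(views, rng)
assert all(view == {"a", "b", "c"} for view in views.values())
```

Because no fixed neighbour set has to be repaired after every failure, this style of maintenance degrades gracefully under churn, which is the resilience the slides claim for unstructured techniques.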

SLIDES 33–37

Increasing churn resilience (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 38

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 39

Minimizing replication costs: node failure

◮ Variable replication degree: No data movement on failure
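This is the key cost saving; a toy model (my own sketch, not the paper's code) shows why a member failure moves no data:

```python
class VNode:
    """A virtual node: a group of physical members sharing one replica set."""
    def __init__(self, members, keys):
        self.members = set(members)
        self.keys = set(keys)
        self.transfers = 0  # keys copied over the network so far

    def fail(self, member):
        # the group simply shrinks: every survivor already holds all keys,
        # so the replication degree drops by one and nothing is copied
        self.members.discard(member)

v = VNode({"p1", "p2", "p3"}, {"k1", "k2"})
v.fail("p2")
assert v.members == {"p1", "p3"}
assert v.keys == {"k1", "k2"} and v.transfers == 0
```

Contrast this with neighbour replication, where every failure forces the successors to copy data to restore exactly R replicas.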

SLIDE 42

Minimizing replication costs: node join

◮ Nodes can select where to join: may join recently-failed virtual nodes
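One plausible join policy (a sketch of the idea, not the paper's exact algorithm): the newcomer targets the virtual node furthest below the desired replication degree, i.e. one that recently lost members.

```python
def pick_join_target(group_sizes: dict) -> int:
    """Return the id of the virtual node with the fewest live members,
    breaking ties by id; a joining node restores its replication degree."""
    return min(group_sizes, key=lambda vid: (group_sizes[vid], vid))

# virtual node 200 recently lost members, so the newcomer joins it
assert pick_join_target({100: 3, 200: 1, 300: 2}) == 200
```

Joining an under-replicated group means the only transfer needed is fetching that group's existing keys, rather than reshuffling data across the ring.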

SLIDE 45

Minimizing replication costs: node join

◮ New nodes can replace failed nodes: Blue’s data was moved only once and never discarded
SLIDE 48

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 49

Improving load balancing: dynamic key distribution

◮ Virtual nodes store a number of keys proportional to their size: Blue’s data is split proportionally among its children
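When a virtual node splits, the proportional key handoff can be sketched like this (illustrative; the paper's actual mechanism may differ):

```python
def split_keys(keys: list, child_sizes: list) -> list:
    """Split a sorted key range among child virtual nodes, giving each
    child a share proportional to its number of physical members."""
    keys = sorted(keys)
    total = sum(child_sizes)
    shares, start = [], 0
    for i, size in enumerate(child_sizes):
        if i == len(child_sizes) - 1:
            end = len(keys)  # last child takes the remainder
        else:
            end = start + round(len(keys) * size / total)
        shares.append(keys[start:end])
        start = end
    return shares

# a 3-member child gets 6 of 10 keys, a 2-member child gets the other 4
assert split_keys(list(range(10)), [3, 2]) == [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

Sizing each child's key share by its membership keeps per-physical-node load roughly uniform, which is the load-balancing effect the slide describes.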

SLIDE 52

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 53

Experimental settings

◮ Overlay simulation in Peersim

◮ 100K nodes

◮ 50K keys

◮ Replication degree = 7

◮ 5M queries

SLIDE 54

Churn resilience

[Figure: percentage of objects reachable vs. churn rate (churn = 1, 10, 100), for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 55

Replication costs

[Figure: objects moved per node vs. churn rate (churn = 1, 10, 100), for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 56

Load Balancing

[Figure: standard deviation of the number of queries processed per node, for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 57

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 58

Conclusions

◮ DHT based on Virtual Nodes

◮ Designed with replication in mind

◮ Unstructured Networks: Increase churn resilience

◮ Variable replication degree: Minimize replication costs

◮ Dynamic key distribution: Improve load balancing

SLIDE 60

Thank you