SLIDE 1
Rollerchain: a DHT for Efficient Replication
IEEE NCA'13
João Paiva, João Leitão, Luís Rodrigues
Instituto Superior Técnico / INESC-ID, Lisboa, Portugal
August 22, 2013

Outline
Introduction
Our approach
Evaluation
Conclusions
SLIDE 2
SLIDE 3
Motivation
◮ Distributed Hash Tables (DHTs) are structured overlays in which nodes organize into a predefined topology that supports routing.
◮ DHTs allow for scalable key-value storage.
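Not part of the talk, but to make the routing idea concrete: a DHT hashes keys and nodes onto the same identifier ring, and each key is served by its successor node. A minimal consistent-hashing sketch in Python (the `Ring` class and node names are invented for illustration):

```python
import hashlib
from bisect import bisect_right

def h(value: str) -> int:
    """Map a string onto the identifier ring via SHA-1 (160-bit space)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class Ring:
    """A minimal DHT identifier ring: each key is stored on its successor,
    the first node whose identifier is >= the key's identifier."""
    def __init__(self, nodes):
        # Sorted (node_id, node_name) pairs form the ring.
        self.ring = sorted((h(n), n) for n in nodes)

    def successor(self, key: str) -> str:
        ids = [node_id for node_id, _ in self.ring]
        idx = bisect_right(ids, h(key)) % len(self.ring)  # wrap around the ring
        return self.ring[idx][1]

ring = Ring(["node-A", "node-B", "node-C", "node-D"])
owner = ring.successor("some-key")  # the node responsible for "some-key"
```

Because both keys and nodes hash into the same space, lookups need no central directory: any node can compute which peer owns a key.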
SLIDE 4
Motivation
◮ In dynamic environments, replication is paramount to maintaining data.
◮ However, predefined topologies are expensive to maintain in dynamic environments (churn).
◮ DHTs do not handle churn as well as unstructured networks.
SLIDE 6
Main Approaches to DHT replication
1. Neighbour Replication
2. Multi-Publication
SLIDE 7
Neighbour Replication
Each node replicates its data on its R closest neighbours.
◮ Good control over the replication degree
◮ Simple to locate replicas
◮ Expensive replication: data is moved to respect topological constraints
◮ Not resilient under churn: each node acts on its own
◮ Poor load balancing: no active mechanism to balance load
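For illustration (not from the slides): under neighbour replication a key lives on its successor plus the next R−1 nodes along the ring, which is why replicas must move whenever the neighbourhood changes. A hypothetical sketch:

```python
import hashlib
from bisect import bisect_right

def h(value: str) -> int:
    """SHA-1 position on the identifier ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def replica_set(nodes, key, R=3):
    """Neighbour replication: the key's successor plus its R-1 following
    neighbours on the ring hold the replicas."""
    ring = sorted(nodes, key=h)
    ids = [h(n) for n in ring]
    start = bisect_right(ids, h(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(R, len(ring)))]

nodes = ["n1", "n2", "n3", "n4", "n5"]
replicas = replica_set(nodes, "user:42", R=3)  # 3 consecutive ring nodes
```

Note how the replica set is entirely determined by ring positions: when a node joins or fails, every key whose neighbourhood shifted must be copied again.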
SLIDE 9
Neighbour Replication: operation
SLIDE 14
Multi-Publication
Each object is assigned R different identifiers, so it is stored by R different nodes.
◮ Better load balancing
◮ Reduced correlated failures
◮ Expensive overlay maintenance: each object has a different set of replicas
◮ Expensive replication: data is moved to respect topological constraints
◮ Not resilient under churn: each node acts on its own
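A hypothetical sketch of the multi-publication idea: salting the key yields R independent identifiers, and each identifier is resolved separately to whichever node owns it, which is why every object ends up with its own unrelated replica set:

```python
import hashlib

def h(value: str) -> int:
    """SHA-1 position on the identifier ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def publication_ids(key: str, R: int = 3):
    """Multi-publication: derive R independent identifiers for one object
    by salting its key; R unrelated nodes each store one replica."""
    return [h(f"{key}#{i}") for i in range(R)]

ids = publication_ids("user:42", R=3)
# Each identifier is looked up independently in the DHT; the successor of
# each one stores that replica.
```

The salt scheme (`key#i`) is an assumption for illustration; any deterministic family of hash functions works.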
SLIDE 16
Current DHTs
Based on structured networks, characterized by:
◮ Nodes with fixed positions in the overlay
◮ Static replication degree
◮ Poor performance under churn
SLIDE 17
Main challenges
Challenges:
1. Increase churn resilience
2. Minimize replication costs
3. Improve load balancing
SLIDE 18
Outline
Introduction
Our approach
Evaluation
Conclusions
SLIDE 19
Our approach: Architecture overview
◮ Ring-based overlay: Composed of virtual nodes
SLIDE 21
Our approach: Dynamic topology overview
SLIDE 28
Our approach: beating the challenges
1. Increase churn resilience: unstructured networks
2. Minimize replication costs: variable replication degree
3. Improve load balancing: dynamic key distribution
SLIDE 30
Increasing churn resilience
◮ Ring maintained through gossip mechanisms
SLIDE 31
Increasing churn resilience
◮ Gossip to keep virtual node membership up-to-date
SLIDE 32
Increasing churn resilience
◮ Gossip to trade connections between virtual nodes
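The gossip mechanism on these slides can be pictured as a push-based membership exchange. This is an illustrative toy, not Rollerchain's actual protocol: each node periodically pushes its local view to a random known peer, which merges it, so membership information spreads without any central coordinator:

```python
import random

def gossip_round(views, cap=8):
    """One push round of membership gossip: each node sends its local view
    to one random known peer; the peer merges it into its own view."""
    for node in list(views):
        known = [p for p in views[node] if p in views]  # live peers only
        if not known:
            continue
        peer = random.choice(known)
        merged = set(views[peer]) | set(views[node]) | {node}
        merged.discard(peer)  # a node does not list itself
        views[peer] = sorted(merged)[:cap]

# Start from a chain: each node initially knows only one neighbour.
nodes = [f"n{i}" for i in range(6)]
views = {n: [nodes[(i + 1) % 6]] for i, n in enumerate(nodes)}
for _ in range(10):
    gossip_round(views)
# Views grow toward full membership round by round.
```

The appeal under churn is that each round costs a constant number of messages per node and tolerates message loss: missed updates are simply re-learned in later rounds.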
SLIDE 38
Our approach: beating the challenges
1. Increase churn resilience: unstructured networks
2. Minimize replication costs: variable replication degree
3. Improve load balancing: dynamic key distribution
SLIDE 39
Minimizing replication costs: node failure
◮ Variable replication degree: No data movement on failure
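One way to picture the variable replication degree (illustrative Python with assumed structures, not the paper's algorithm): when a member of a virtual node fails, the survivors already hold the data, so the group simply shrinks, and only triggers repair when it falls below a minimum size:

```python
def handle_failure(vnode, failed, min_size=2):
    """On failure, the virtual node shrinks: surviving members already
    hold the data, so nothing is copied. Only if the group drops below
    a minimum size does it recruit a new member (repair)."""
    vnode["members"].remove(failed)
    return len(vnode["members"]) < min_size  # True => repair needed

vnode = {"members": ["p1", "p2", "p3"], "keys": {"k1": "v1"}}
first = handle_failure(vnode, "p2")   # degree 3 -> 2, no data moved
second = handle_failure(vnode, "p3")  # degree 2 -> 1, repair needed
```

Contrast with neighbour replication, where every failure forces the successor list to shift and data to be re-copied immediately.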
SLIDE 42
Minimizing replication costs: node join
◮ Nodes can select where to join: they may join recently-failed virtual nodes.
SLIDE 45
Minimizing replication costs: node join
◮ New nodes can replace failed nodes: Blue’s data was moved only once and never discarded.
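The join policy can be sketched as follows (hypothetical names and structures): a newcomer targets the most depleted virtual node, so a single state transfer restores its replication degree while all existing replicas stay put:

```python
def choose_vnode(vnodes):
    """A joining node picks the smallest virtual node (e.g. one that
    recently lost members), restoring its replication degree in place."""
    return min(vnodes, key=lambda v: len(v["members"]))

def join(vnodes, newcomer):
    target = choose_vnode(vnodes)
    # The newcomer fetches the virtual node's state once; the existing
    # replicas stay where they are, so no other data moves.
    target["members"].append(newcomer)
    return target

vnodes = [{"id": "A", "members": ["p1", "p2", "p3"]},
          {"id": "B", "members": ["p4"]}]  # B lost members to churn
joined = join(vnodes, "p9")
```

This is the mechanism behind "moved only once": data written to a virtual node is copied to newcomers, never migrated because of topology constraints.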
SLIDE 48
Our approach: beating the challenges
1. Increase churn resilience: unstructured networks
2. Minimize replication costs: variable replication degree
3. Improve load balancing: dynamic key distribution
SLIDE 49
Improving load balancing: dynamic key distribution
◮ Virtual nodes store a number of keys proportional to their size: Blue’s data is split proportionally among its children.
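A toy sketch of proportional key splitting (assumed structures, not the paper's mechanism): when a virtual node splits, each child takes a contiguous share of the ordered key range weighted by its member count:

```python
import hashlib

def h(value: str) -> int:
    """SHA-1 position used to order keys on the ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def split_keys(keys, children_sizes):
    """Split a parent virtual node's keys among its children in
    proportion to their sizes (number of physical members)."""
    total = sum(children_sizes)
    ordered = sorted(keys, key=h)
    shares, cut, start = [], 0, 0
    for size in children_sizes:
        cut += round(len(ordered) * size / total)
        shares.append(ordered[start:cut])
        start = cut
    shares[-1].extend(ordered[cut:])  # any rounding remainder
    return shares

a, b = split_keys([f"k{i}" for i in range(10)], [3, 2])
# The 3-member child takes 6 of the 10 keys; the 2-member child takes 4.
```

Larger virtual nodes thus absorb proportionally more load, which is what drives the load-balancing results in the evaluation.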
SLIDE 52
Outline
Introduction
Our approach
Evaluation
Conclusions
SLIDE 53
Experimental settings
◮ Overlay simulation in PeerSim
◮ 100K nodes
◮ 50K keys
◮ Replication degree = 7
◮ 5M queries
SLIDE 54
Churn resilience
[Figure: objects reachable (%) vs. churn rate (churn = 1, 10, 100) for Rollerchain, Neighbour replication, and Multi-Publication]
SLIDE 55
Replication costs
[Figure: objects moved per node vs. churn rate (churn = 1, 10, 100) for Rollerchain, Neighbour replication, and Multi-Publication]
SLIDE 56
Load Balancing
[Figure: standard deviation of the number of queries processed per node, for Rollerchain, Neighbour replication, and Multi-Publication]
SLIDE 57
Outline
Introduction
Our approach
Evaluation
Conclusions
SLIDE 58
Conclusions
◮ DHT based on virtual nodes
◮ Designed with replication in mind
◮ Unstructured networks: increase churn resilience
◮ Variable replication degree: minimize replication costs
◮ Dynamic key distribution: improve load balancing
SLIDE 60