Rollerchain: a DHT for Efficient Replication

IEEE NCA’13. João Paiva, João Leitão, Luís Rodrigues. Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal. August 22, 2013.


SLIDE 1

Rollerchain: a DHT for Efficient Replication

IEEE NCA’13. João Paiva, João Leitão, Luís Rodrigues

Instituto Superior Técnico / Inesc-ID, Lisboa, Portugal

August 22, 2013

SLIDE 2

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 3

Motivation

◮ Distributed Hash Tables are structured overlays where nodes organize into a predefined topology that supports routing.

◮ DHTs allow for scalable key-value storage.
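As a concrete illustration (not from the talk), the key-to-node mapping of a ring-based DHT can be sketched with consistent hashing; all names below are illustrative.

```python
import hashlib

def ring_id(value: str, bits: int = 32) -> int:
    """Hash a string onto a 2^bits identifier ring."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def successor(node_ids: list[int], key_id: int) -> int:
    """The node responsible for a key is the key's clockwise successor."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key_id:
            return nid
    return ring[0]  # wrap around the ring

nodes = [ring_id(f"node-{i}") for i in range(8)]
owner = successor(nodes, ring_id("some-key"))
assert owner in nodes
```

Routing in a real DHT then only needs to locate the successor of a key's identifier, which is what the predefined topology supports.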

SLIDE 4

Motivation

◮ In dynamic environments, replication is paramount to maintaining data.

◮ However, predefined topologies are expensive to maintain in dynamic environments (churn).

◮ DHTs do not handle churn as well as unstructured networks.


SLIDE 6

Main Approaches to DHT replication

  • 1. Neighbour Replication
  • 2. Multi-Publication
SLIDE 7

Neighbour Replication

Each node replicates its data on its R closest neighbours

◮ Good control on replication degree

◮ Simple to locate replicas

◮ Expensive replication: data is moved to respect topological constraints

◮ Not resilient under churn: each node acts on its own

◮ Poor load balancing: no active mechanisms to balance load
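A minimal sketch (illustrative, not the talk's code) of the neighbour-replication rule: a key is stored on its clockwise successor and the R−1 nodes that follow it.

```python
def replica_set(node_ids: list[int], key_id: int, r: int) -> list[int]:
    """Neighbour replication: the key's clockwise successor (primary)
    plus its R-1 next neighbours on the ring all store the key."""
    ring = sorted(node_ids)
    # index of the primary: first node at or after the key, wrapping to 0
    idx = next((i for i, nid in enumerate(ring) if nid >= key_id), 0)
    return [ring[(idx + k) % len(ring)] for k in range(r)]

# keys with id in (10, 20] are replicated on nodes 20, 30 and 40
assert replica_set([10, 20, 30, 40], 15, 3) == [20, 30, 40]
```

This makes the replica-placement constraint explicit: whenever a node joins or fails, the replica set of nearby keys shifts along the ring, which is exactly why data must be moved to respect the topology.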


SLIDES 9–13

Neighbour Replication: operation (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 14

Multi-Publication

Each object is assigned R different identifiers, to be stored by R different nodes.

◮ Better load balancing

◮ Reduced correlated failures

◮ Expensive overlay maintenance: each object has a different set of replicas

◮ Expensive replication: data is moved to respect topological constraints

◮ Not resilient under churn: each node acts on its own
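The multi-publication scheme can be sketched by salting the object's key to derive R independent identifiers (an illustrative reconstruction, not the paper's code):

```python
import hashlib

def publication_ids(key: str, r: int, bits: int = 32) -> list[int]:
    """Derive R independent ring identifiers for one object by salting
    its key; each identifier is stored by a different responsible node."""
    ids = []
    for i in range(r):
        digest = hashlib.sha1(f"{key}#{i}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (1 << bits))
    return ids
```

Because the R identifiers are independent, the replicas land on unrelated nodes, which is what gives the scheme its load-balancing and failure-decorrelation benefits, at the cost of maintaining R placements per object.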


SLIDE 16

Current DHTs

Based on structured networks

Characterized by:

◮ Nodes with fixed positions in the overlay

◮ Static replication degree

◮ Poor performance under churn

SLIDE 17

Main challenges

Challenges:

  • 1. Increase churn resilience
  • 2. Minimize replication costs
  • 3. Improve load balancing
SLIDE 18

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 19

Our approach: Architecture overview

◮ Ring-based overlay: Composed of virtual nodes
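The architecture can be modelled roughly as follows (a sketch under my own naming, not the paper's code): each ring position is a virtual node, collectively held by a group of physical nodes that all replicate the same keys.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualNode:
    """One position on the ring, held by a group of physical members
    that all replicate the keys the position is responsible for."""
    ring_id: int
    members: set = field(default_factory=set)
    keys: set = field(default_factory=set)

ring = [
    VirtualNode(100, {"p1", "p2", "p3"}, {"k7", "k42"}),
    VirtualNode(200, {"p4", "p5"}, {"k150"}),
]
# the replication degree of a key is the size of its virtual node
assert len(ring[0].members) == 3
```

The important design choice is that routing sees only the virtual nodes, so physical membership inside a group can change without disturbing the ring topology.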

SLIDES 21–27

Our approach: Dynamic topology overview (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 28

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 30

Increasing churn resilience

◮ Ring maintained through gossip mechanisms

SLIDE 31

Increasing churn resilience

◮ Gossip to keep virtual node membership up-to-date

SLIDE 32

Increasing churn resilience

◮ Gossip to trade connections between virtual nodes
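The gossip mechanisms in the bullets above can be sketched as a push-pull anti-entropy exchange of membership views (illustrative only; node names and round counts are mine, not the paper's):

```python
import random

def gossip_round(views: dict, rng: random.Random) -> None:
    """One gossip round: every node exchanges its membership view with a
    random peer and both keep the merged view (push-pull anti-entropy)."""
    for node in list(views):
        peer = rng.choice([p for p in views if p != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)

# each node initially only knows itself and one neighbour
views = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c", "a"}}
rng = random.Random(1)
for _ in range(5):
    gossip_round(views, rng)
assert all(view == {"a", "b", "c"} for view in views.values())
```

Because no fixed neighbour set has to be repaired after every failure, this style of maintenance degrades gracefully under churn, which is the resilience the slides claim for unstructured techniques.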

SLIDES 33–37

Increasing churn resilience (figure-only animation sequence; diagrams not captured in this transcript)

SLIDE 38

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 39

Minimizing replication costs: node failure

◮ Variable replication degree: No data movement on failure
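This is the key cost saving; a toy model (my own sketch, not the paper's code) shows why a member failure moves no data:

```python
class VNode:
    """A virtual node: a group of physical members sharing one replica set."""
    def __init__(self, members, keys):
        self.members = set(members)
        self.keys = set(keys)
        self.transfers = 0  # keys copied over the network so far

    def fail(self, member):
        # the group simply shrinks: every survivor already holds all keys,
        # so the replication degree drops by one and nothing is copied
        self.members.discard(member)

v = VNode({"p1", "p2", "p3"}, {"k1", "k2"})
v.fail("p2")
assert v.members == {"p1", "p3"}
assert v.keys == {"k1", "k2"} and v.transfers == 0
```

Contrast this with neighbour replication, where every failure forces the successors to copy data to restore exactly R replicas.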

SLIDE 42

Minimizing replication costs: node join

◮ Nodes can select where to join: may join recently-failed virtual nodes
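One plausible join policy (a sketch of the idea, not the paper's exact algorithm): the newcomer targets the virtual node furthest below the desired replication degree, i.e. one that recently lost members.

```python
def pick_join_target(group_sizes: dict) -> int:
    """Return the id of the virtual node with the fewest live members,
    breaking ties by id; a joining node restores its replication degree."""
    return min(group_sizes, key=lambda vid: (group_sizes[vid], vid))

# virtual node 200 recently lost members, so the newcomer joins it
assert pick_join_target({100: 3, 200: 1, 300: 2}) == 200
```

Joining an under-replicated group means the only transfer needed is fetching that group's existing keys, rather than reshuffling data across the ring.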

SLIDE 45

Minimizing replication costs: node join

◮ New nodes can replace failed nodes: Blue’s data was moved only once and never discarded
SLIDE 48

Our approach: beating the challenges

  • 1. Increase churn resilience: unstructured networks
  • 2. Minimize replication costs: variable replication degree
  • 3. Improve load balancing: dynamic key distribution
SLIDE 49

Improving load balancing: dynamic key distribution

◮ Virtual nodes store a number of keys proportional to their size: Blue’s data is split proportionally among its children
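When a virtual node splits, the proportional key handoff can be sketched like this (illustrative; the paper's actual mechanism may differ):

```python
def split_keys(keys: list, child_sizes: list) -> list:
    """Split a sorted key range among child virtual nodes, giving each
    child a share proportional to its number of physical members."""
    keys = sorted(keys)
    total = sum(child_sizes)
    shares, start = [], 0
    for i, size in enumerate(child_sizes):
        if i == len(child_sizes) - 1:
            end = len(keys)  # last child takes the remainder
        else:
            end = start + round(len(keys) * size / total)
        shares.append(keys[start:end])
        start = end
    return shares

# a 3-member child gets 6 of 10 keys, a 2-member child gets the other 4
assert split_keys(list(range(10)), [3, 2]) == [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

Sizing each child's key share by its membership keeps per-physical-node load roughly uniform, which is the load-balancing effect the slide describes.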

SLIDE 52

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 53

Experimental settings

◮ Overlay simulation in Peersim

◮ 100K nodes

◮ 50K keys

◮ Replication degree = 7

◮ 5M queries

SLIDE 54

Churn resilience

[Figure: percentage of objects reachable vs. churn rate (churn = 1, 10, 100), for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 55

Replication costs

[Figure: objects moved per node vs. churn rate (churn = 1, 10, 100), for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 56

Load Balancing

[Figure: standard deviation of the number of queries processed per node, for Rollerchain, Neighbour, and Multi-Pub]

SLIDE 57

Outline

Introduction
Our approach
Evaluation
Conclusions

SLIDE 58

Conclusions

◮ DHT based on Virtual Nodes

◮ Designed with replication in mind

◮ Unstructured Networks: Increase churn resilience

◮ Variable replication degree: Minimize replication costs

◮ Dynamic key distribution: Improve load balancing

SLIDE 60

Thank you