Wide Area Placement of Data Replicas for Fast and Highly Available Data Access



slide-1
SLIDE 1

Fan Ping

Xiaohu Li, Christopher McConnell, Rohini Vabbalareddy, Jeong-Hyon Hwang

State University of New York - Albany

Wide Area Placement of Data Replicas for Fast and Highly Available Data Access

slide-2
SLIDE 2

Outline

  • Background
  • Network Coordinate System
  • Data Replication
  • Data Replication for Performance
  • Data Replication for Performance and Availability
  • Conclusion
slide-5
SLIDE 5

Data Intensive Distributed Systems

  • Google, Amazon, Facebook, Microsoft…
  • Dynamo, Cassandra, PNUTS…
slide-6
SLIDE 6

Data Replica Placement

  • Given a replication degree (e.g., 3), where should we put the data replicas to improve overall data access speed and availability?

  • Challenges
      • Scalability
      • Meeting a given SLA
slide-7
SLIDE 7

Outline

  • Background
  • Network Coordinate System
  • Data Replication
  • Data Replication for Performance
  • Data Replication for Performance and Availability
  • Conclusion
slide-8
SLIDE 8

Network Coordinate Systems

  • Nodes are embedded into a virtual space based on the network latencies between them, so that their distances in the virtual space are close to those latencies.

  • E.g., Vivaldi, RNP
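
The embedding idea can be sketched with a simplified Vivaldi-style spring update. This is a minimal illustration under stated assumptions, not the configuration used in the talk: the constant step size `delta`, the two-dimensional space, the round count, and the names `embed` and `distance` are all hypothetical, and the full Vivaldi algorithm additionally adapts the step size per node.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def embed(latency, dims=2, rounds=2000, delta=0.05, seed=0):
    """Embed nodes so that coordinate distances approximate latencies.

    latency[i][j] is the measured latency between nodes i and j.
    All parameters other than `latency` are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(latency)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)  # sample one peer per step
            if j == i:
                continue
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            dist = math.sqrt(sum(d * d for d in diff))
            if dist == 0.0:
                continue  # coincident points give no direction; skip
            # Positive error: nodes are too close in the virtual space,
            # so node i moves away from node j (and vice versa).
            error = latency[i][j] - dist
            coords[i] = [c + delta * error * d / dist
                         for c, d in zip(coords[i], diff)]
    return coords


# Three nodes with pairwise latencies of 30, 40, and 50 ms.
latency_ms = [[0, 30, 40], [30, 0, 50], [40, 50, 0]]
coords = embed(latency_ms)
```

Once nodes have coordinates, the latency between any pair can be estimated with `distance(coords[i], coords[j])` without measuring that pair directly.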
slide-9
SLIDES 9-20

[Figures: Network Coordinate Systems]

slide-21
SLIDE 21

Outline

  • Background
  • Network Coordinate System
  • Data Replication
  • Data Replication for Performance
  • Data Replication for Performance and Availability
  • Conclusion
slide-23
SLIDE 23

Servers on the map

slide-24
SLIDE 24

Servers in the coordinate system

slide-25
SLIDE 25

Clients in the coordinate system

slide-26
SLIDE 26

Cluster the clients in the coordinate system

slide-28
SLIDE 28

Centroids of the clusters

slide-29
SLIDE 29

Servers near centroids of the clusters
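
The steps on the preceding slides (cluster the clients, take the cluster centroids, pick servers near the centroids) can be sketched as follows. The slides do not name the clustering algorithm, so plain k-means is an assumption here, as are the function names `kmeans` and `place_replicas` and the toy coordinates.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (assumed clustering method); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = []
        for c, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                updated.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dims)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        centroids = updated
    return centroids


def place_replicas(clients, servers, k):
    """Pick, for each client-cluster centroid, the nearest unused server."""
    chosen = []
    for centroid in kmeans(clients, k):
        candidates = [s for s in servers if s not in chosen]
        chosen.append(min(candidates, key=lambda s: dist(s, centroid)))
    return chosen


# Toy coordinates: two client groups and three candidate servers.
clients = [(0, 0), (2, 0), (0, 2), (100, 100), (102, 100), (100, 102)]
servers = [(1, 1), (50, 50), (99, 99)]
replicas = place_replicas(clients, servers, k=2)
```

Here `k` plays the role of the replication degree, and all inputs are positions in the network coordinate space rather than geographic locations.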

slide-30
SLIDE 30

Simulation Settings

  • Java simulator
  • Trace from ~200 PlanetLab nodes as input
  • A certain number of nodes are selected as servers

  • The other nodes are used as clients
slide-31
SLIDE 31

Performance vs. Number of Replicas

slide-32
SLIDE 32

Outline

  • Background
  • Network Coordinate System
  • Data Replication
  • Data Replication for Performance
  • Data Replication for Performance and Availability
  • Conclusion
slide-35
SLIDE 35

Conditional Failure vs. Angle

[Figure: nodes C, R, S1, S2; labels 0.05%, 0.05%]

The conditional probability of the failure of (C, R, S2) given the failure of (C, R, S1) is more than 50%!

slide-36
SLIDE 36

Conditional Failure vs. Angle

[Figure: points C, S1, S2; angle θ; coordinates (0,-20), (-10,50), (30,35)]
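
As a worked illustration of the angle on this slide, the sketch below computes the angle between two servers as seen from a client in the coordinate space. It assumes that the three coordinate pairs belong to C, S1, and S2 in that order and that θ is measured at C; neither is stated explicitly in the extracted text, and the function name `angle_at` is hypothetical.

```python
import math


def angle_at(client, s1, s2):
    """Angle in degrees between the directions client->s1 and client->s2."""
    v1 = (s1[0] - client[0], s1[1] - client[1])
    v2 = (s2[0] - client[0], s2[1] - client[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot gives an unsigned angle in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


# Assumed mapping: C = (0, -20), S1 = (-10, 50), S2 = (30, 35).
theta = angle_at((0, -20), (-10, 50), (30, 35))
print(f"theta = {theta:.1f} degrees")
```

A small angle means the two servers lie in roughly the same direction from the client in the coordinate space.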

slide-37
SLIDES 37-40

[Figures: Conditional Failure vs. Angle θ]

slide-41
SLIDE 41

Estimations for Latency and Availability

  • Per-client latency

$L(c, S) = \mathrm{dist}(c, s)$

  • Per-client availability

$A(c, S) = 1 - F(c, S_1) \cdot F(c, S_2 \mid S_1) \cdots F(c, S_r \mid S_{r-1})$

  • Utility function to combine latency and availability

U =

𝑩 𝑴
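
The two per-client estimates can be sketched as below. Several things are assumptions rather than statements from the slides: that the client is served by its nearest replica (so latency is the minimum coordinate distance), that coordinates are two-dimensional, that the failure probabilities are supplied by the caller, and the function names `latency` and `availability`. The utility function is not reproduced here because its definition did not survive extraction.

```python
import math


def latency(client, replicas):
    """Estimated latency: coordinate distance to the nearest replica (assumed)."""
    return min(math.hypot(client[0] - s[0], client[1] - s[1])
               for s in replicas)


def availability(failure_chain):
    """A(c, S) = 1 - F(c,S1) * F(c,S2|S1) * ... * F(c,Sr|Sr-1).

    failure_chain[0] is F(c, S1); each later entry is the conditional
    failure probability of the next replica given the previous one failed.
    """
    all_fail = 1.0
    for f in failure_chain:
        all_fail *= f
    return 1.0 - all_fail


# Illustrative numbers: a 0.05% failure probability for the first replica,
# then either an independent second replica (0.05%) or a correlated one
# (0.5 is an assumed value standing in for "more than 50%").
independent = availability([0.0005, 0.0005])
correlated = availability([0.0005, 0.5])
print(1 - independent, 1 - correlated)
```

With these illustrative inputs, the correlated case leaves the client roughly a thousand times more likely to lose both replicas than the independent case, which is why replica placement needs to account for conditional failures and not only distance.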

slide-42
SLIDE 42

Simulation Settings

  • Java simulator
  • Traceroute and ping data collected from ~100 PlanetLab nodes for a month

  • Randomly select some nodes as servers
  • The rest are clients
slide-43
SLIDE 43

Unavailability vs. Number of Replicas

slide-44
SLIDE 44

Outline

  • Background
  • Network Coordinate System
  • Data Replication
  • Data Replication for Performance
  • Data Replication for Performance and Availability
  • Conclusion
slide-45
SLIDE 45

Conclusion and Future Work

  • Improves the average user access latency by 35%
  • Improves the overall availability
  • Designs the utility function to take into account both latency and availability

  • Needs more realistic dataset
  • Better utility function
  • Non-exponential algorithm (Greedy…)
  • Take inter-datacenter cost into account
slide-46
SLIDE 46

THANK YOU!

Questions?