SLIDE 1 Fan Ping
Xiaohu Li, Christopher McConnell, Rohini Vabbalareddy, Jeong-Hyon Hwang
State University of New York - Albany
Wide Area Placement of Data Replicas for Fast and Highly Available Data Access
SLIDE 2 Outline
- Background
- Network Coordinate System
- Data Replication
- Data Replication for Performance
- Data Replication for Performance and Availability
- Conclusion
SLIDE 5 Data Intensive Distributed Systems
- Google, Amazon, Facebook, Microsoft…
- Dynamo, Cassandra, PNUTS…
SLIDE 6 Data Replica Placement
- Given a replication degree (e.g., 3), where should the data replicas be placed to improve overall data access speed and availability?
- Challenges
- Scalability
- Meeting a given SLA
SLIDE 7 Outline
- Background
- Network Coordinate System
- Data Replication
- Data Replication for Performance
- Data Replication for Performance and Availability
- Conclusion
SLIDE 8 Network Coordinate Systems
- Nodes are embedded into a virtual space based on their pairwise network latencies, so that the distance between two nodes in this space approximates the network latency between them.
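The embedding described above can be sketched with a simple spring-relaxation update in the style of Vivaldi. The slides do not say which coordinate system is used, so the algorithm choice, the function names `embed` and `distance`, and the parameters `rounds` and `step` are all assumptions made for illustration.

```python
import math
import random

def embed(latency, dims=2, rounds=2000, step=0.05, seed=0):
    """Embed nodes so Euclidean distances approximate measured latencies.

    latency: symmetric matrix (list of lists) of pairwise latencies in ms.
    Returns one coordinate (list of floats) per node.
    Hypothetical sketch, not the system used in the slides.
    """
    rng = random.Random(seed)
    n = len(latency)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)
            if i == j:
                continue
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            dist = math.sqrt(sum(d * d for d in diff)) or 1e-9
            # error > 0: the nodes are closer than their latency, so push i away from j.
            error = latency[i][j] - dist
            coords[i] = [a + step * error * d / dist for a, d in zip(coords[i], diff)]
    return coords

def distance(p, q):
    """Euclidean distance between two coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Once nodes have coordinates, the latency between any two of them can be estimated with `distance` instead of a fresh measurement, which is what makes the placement steps later in the deck cheap to evaluate.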
SLIDE 9 Network Coordinate Systems
SLIDE 10 Network Coordinate Systems
SLIDE 11 Network Coordinate Systems
SLIDE 12 Network Coordinate Systems
SLIDE 13 Network Coordinate Systems
SLIDE 14 Network Coordinate Systems
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21 Outline
- Background
- Network Coordinate System
- Data Replication
- Data Replication for Performance
- Data Replication for Performance and Availability
- Conclusion
SLIDE 23 Servers on the map
SLIDE 24 Servers in the coordinate system
SLIDE 25 Clients in the coordinate system
SLIDE 26 Cluster the clients in the coordinate system
SLIDE 27 Cluster the clients in the coordinate system
SLIDE 28 Centroids of the clusters
SLIDE 29 Servers near centroids of the clusters
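The sequence of slides above (cluster the clients, take the cluster centroids, pick servers near the centroids) can be sketched as follows. The slides do not name the clustering algorithm; this sketch assumes plain k-means with a deterministic farthest-point initialisation, and the names `kmeans` and `place_replicas` are hypothetical.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, k, iters=50):
    """Return k centroids of 2-D points (assumed clustering method: k-means)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def place_replicas(clients, servers, k):
    """For each client-cluster centroid, pick the closest server not yet chosen."""
    chosen = []
    for c in kmeans(clients, k):
        candidates = [s for s in servers if s not in chosen]
        chosen.append(min(candidates, key=lambda s: dist(s, c)))
    return chosen
```

Here `k` plays the role of the replication degree: one replica is placed per client cluster, on the server whose coordinate is closest to that cluster's centroid.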
SLIDE 30 Simulation Settings
- Java simulator
- Input: a trace from ~200 PlanetLab nodes
- A certain number of nodes are selected as servers
- The remaining nodes are used as clients
SLIDE 31 Performance vs. Number of Replicas
SLIDE 32 Outline
- Background
- Network Coordinate System
- Data Replication
- Data Replication for Performance
- Data Replication for Performance and Availability
- Conclusion
SLIDE 35 Conditional Failure vs. Angle
Figure labels: C, R, S1, S2; 0.05%, 0.05%
The conditional probability of the failure of (C, R, S2), given the failure of (C, R, S1), is more than 50%!
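One way to see how two individually reliable paths can have such a high conditional failure probability is a toy model in which both paths cross one shared component. This is only an illustration: the model, the independence assumption, and the numbers `p_shared` and `p_private` are made up and are not taken from the measurements behind the slide.

```python
def conditional_failure(p_shared, p_private):
    """Return (P(path 1 fails), P(path 2 fails | path 1 fails)).

    Toy model (assumption): both paths cross one shared component that fails
    with probability p_shared, plus one private component each that fails with
    probability p_private. All components fail independently.
    """
    p_path = 1 - (1 - p_shared) * (1 - p_private)
    # Both paths fail if the shared part fails, or if both private parts fail.
    p_both = p_shared + (1 - p_shared) * p_private ** 2
    return p_path, p_both / p_path

# Hypothetical numbers chosen so that each path fails about 0.05% of the time.
p_path, p_cond = conditional_failure(p_shared=0.0003, p_private=0.0002)
print(f"P(path fails) = {p_path:.4%}, P(path 2 fails | path 1 fails) = {p_cond:.1%}")
```

With these made-up inputs each path fails roughly 0.05% of the time, yet the conditional probability comes out at about 60%, because most failures of the first path are caused by the shared component.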
SLIDE 36 Conditional Failure vs. Angle
Figure labels: C, S1, S2; θ; (0,-20), (-10,50), (30,35)
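As a worked example of the angle in this figure, the sketch below computes the angle at the client between the directions to the two servers. It assumes that the three coordinates on the slide belong to C, S1, and S2 in the order listed, which the extracted text does not confirm; the function name `angle_at` is hypothetical.

```python
import math

def angle_at(c, s1, s2):
    """Angle in degrees at c between the directions c->s1 and c->s2."""
    v1 = (s1[0] - c[0], s1[1] - c[1])
    v2 = (s2[0] - c[0], s2[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Assumed assignment: C = (0,-20), S1 = (-10,50), S2 = (30,35).
theta = angle_at((0, -20), (-10, 50), (30, 35))
print(f"theta = {theta:.1f} degrees")
```

Under that assumption the angle is a little under 37 degrees.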
SLIDE 37 Conditional Failure vs. Angle
SLIDE 38 Conditional Failure vs. Angle
SLIDE 39 Conditional Failure vs. Angle
Figure label: θ
SLIDE 40 Conditional Failure vs. Angle
SLIDE 41 Estimations for Latency and Availability
$L(c, S) = \mathrm{dist}(c, s)$
$A(c, S) = 1 - F(c, S_1) \cdot F(c, S_2 \mid S_1) \cdots F(c, S_r \mid S_{r-1})$
- Utility function to combine latency and availability
U =
𝑩 𝑴
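The latency and availability estimates on this slide can be sketched as below. Two points are assumptions rather than something the slide states: that the client is served by its nearest replica, so the latency is the minimum coordinate distance, and that the conditional failure probabilities are supplied as inputs. The utility function is not recoverable from the slide text and is deliberately left out.

```python
import math

def latency(client, replicas):
    """Estimated latency L(c, S) from coordinate distances.

    Assumption: the client is served by its nearest replica.
    """
    return min(math.dist(client, s) for s in replicas)

def availability(conditional_failures):
    """Estimated availability A(c, S) = 1 - product of failure terms.

    conditional_failures[0] is F(c, S1); each later entry is the failure
    probability of the next replica given that the previous one failed.
    """
    return 1.0 - math.prod(conditional_failures)

# Hypothetical inputs: two replicas, second fails 60% of the time when the first does.
print(latency((0, 0), [(3, 4), (6, 8)]))
print(availability([0.0005, 0.6]))
```

The chained conditional terms are what make placement matter: two replicas whose failures are strongly correlated add little availability compared with two whose failures are close to independent.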
SLIDE 42 Simulation Settings
- Java simulator
- Traceroute and ping data collected from ~100 PlanetLab nodes over one month
- Some nodes are randomly selected as servers
- The rest are clients
SLIDE 43 Unavailability vs. Number of Replicas
SLIDE 44 Outline
- Background
- Network Coordinate System
- Data Replication
- Data Replication for Performance
- Data Replication for Performance and Availability
- Conclusion
SLIDE 45 Conclusion and Future Work
- Improves the average user access latency by 35%
- Improves the overall availability
- Designs a utility function that takes both latency and availability into account
- Needs a more realistic dataset
- Better utility function
- Non-exponential algorithm (Greedy…)
- Take inter-datacenter cost into account
SLIDE 46 THANK YOU!
Questions?