Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems
Guoxin Liu, Haiying Shen and Haoyu Wang
Holcombe Department of Electrical and Computer Engineering, Clemson University
Presented by Haoyu Wang
Outline
1. Introduction
2. System design
3. Performance evaluation
4. Conclusion
Introduction
Background (Clemson Palmetto Clusters)
Load balancing problem
Existing approaches balance the I/O load and the data storage. Why not also consider the computing workload?
Introduction
Previous work
Challenges for load balancing:
– Data locality
– Task delay
– Long-term load balance
– Cost-efficiency & scalability
Related work:
– Random data allocation
– Balancing the number of data blocks
– Balancing the I/O load
System Design
Main contributions
1. Trace analysis on computing workloads
2. Computing load aware long-view load balancing method
3. Trace-driven experiments
System Design
Trace Data Analysis
[Figure: CDF of task running time (s)]
[Figure: CDF of the number of currently submitted tasks]
[Figure: CDF of the number of currently submitted tasks from different jobs]
[Figure: CDF of the number of data transmissions of a server]
[Figure: CDF of the waiting time of a task (s)]
System Design
CALV System Overview
Coefficient-based data reallocation
Principle 1: Data blocks that contribute more computing workload at more overloaded epochs, in both the spatial and the temporal space, have higher priority to be selected for reallocation.
Principle 2: Among all data blocks contributing workload at an overloaded epoch, those contributing less workload at more underloaded epochs have higher priority to be selected for reallocation.
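The two selection principles can be sketched as a scoring function over a server's blocks. This is an illustrative sketch only: the function name, the per-epoch workload representation, and the simple "overloaded minus underloaded" score are assumptions, not the paper's actual coefficient definition.

```python
def reallocation_priority(block_load, overloaded, underloaded):
    """Score a data block for reallocation (hypothetical sketch).

    block_load: dict mapping epoch -> computing workload this block contributes
    overloaded / underloaded: sets of epochs where the server is over-/underloaded

    Principle 1: more workload at overloaded epochs -> higher priority.
    Principle 2: less workload at underloaded epochs -> higher priority.
    """
    over = sum(load for e, load in block_load.items() if e in overloaded)
    under = sum(load for e, load in block_load.items() if e in underloaded)
    return over - under

# Example: three blocks on one server, epochs 1-3.
blocks = {
    "d1": {1: 5.0, 2: 1.0},   # heavy at the overloaded epoch 1
    "d2": {1: 2.0, 3: 4.0},   # also busy at the underloaded epoch 3
    "d3": {2: 3.0, 3: 0.5},
}
overloaded, underloaded = {1}, {3}

# Reallocate the highest-priority blocks first.
order = sorted(
    blocks,
    key=lambda b: reallocation_priority(blocks[b], overloaded, underloaded),
    reverse=True,
)
```

Under these toy numbers, d1 ranks first (it dominates the overloaded epoch), while d2 ranks last because moving it would drain the server at its underloaded epoch.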
System Design
CALV System Overview
Coefficient-based data reallocation
Selection of data blocks to reallocate
[Figure: Reallocation across servers Si, Sj and Sk over epochs e1–e3 (data blocks d1–d7; the horizontal line marks each server's computing capacity): (a) reduce the number of reported data blocks in spatial space; (b) reduce the number of reported data blocks in temporal space; (c) avoid server underload.]
System Design
CALV System Overview
Lazy data block transmission
[Figure: Lazy data block transmission between servers Si and Sj over epochs e1–e4; the horizontal line marks each server's computing capacity.]
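The idea behind lazy transmission is to avoid moving all reallocated blocks up front: a block stays on its source server and is only copied shortly before the epoch in which it would otherwise cause overload, spreading the transfer cost over time. A minimal sketch under that assumption (the scheduling rule and names are hypothetical, not the paper's exact mechanism):

```python
def schedule_lazy_transfers(reallocations):
    """Hypothetical lazy-transmission scheduler.

    reallocations: list of (block, first_overload_epoch) pairs, where
    first_overload_epoch is the first epoch at which keeping the block
    on its source server would cause overload.

    Returns a dict mapping epoch -> blocks to transmit at that epoch,
    transmitting each block one epoch before it is needed (never before
    epoch 0) instead of all at reallocation time.
    """
    schedule = {}
    for block, epoch in reallocations:
        schedule.setdefault(max(epoch - 1, 0), []).append(block)
    return schedule

plan = schedule_lazy_transfers([("d1", 3), ("d5", 3), ("d2", 1)])
```

Here d1 and d5 are both deferred to epoch 2 rather than transmitted immediately, which is what lowers the peak number of concurrently reallocated blocks.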
Performance Evaluation
Trace-driven experiments
Simulated environment:
– 3000 servers with a typical fat-tree topology
– 8 computing slots per server
– Epoch length set to 1 second
Comparison methods: Random, Sierra, Ursa, CA
Performance Evaluation
Trace-driven experiments Performance of Data locality
[Figure: Network load as a % of Random vs. the number of jobs (0.5x–1.5x) for Random, Sierra, Ursa, CA and CALV.]
Performance Evaluation
Trace-driven experiments Performance of Task Latency
[Figure: Reduced average latency per task (s), relative to Random, vs. the number of jobs (0.5x–1.5x) for Sierra, Ursa, CA and CALV.]
Performance Evaluation
Trace-driven experiments Performance of Cost-Efficiency
[Figure: Number of reported blocks vs. the number of jobs (0.5x–1.5x) for CALV, CALV-MAX, CALV-Random and CALV-All.]
Performance of lazy data transmission
[Figure: Saved % of network load, saved % of peak number of reallocated blocks, and reduced number of overloads (×20) vs. the number of jobs (0.5x–1.5x).]
Conclusion
– Considering the computing workloads is important for load balancing
– CALV is cost-efficient and achieves long-term load balance
The End