Tackling Large Graphs with Secondary Storage
Amitabha Roy EPFL
1
Graphs
Social networks
Document networks
Biological networks
Humans, phones, bank accounts
2
3
4
HPC/Graph500 benchmarks (June 2014)
Graph edges    Hardware
1 trillion     Tsubame
1 trillion     Cray
1 trillion     Blue Gene
1 trillion     NEC
5
Avery Ching, Facebook @Strata, 2/13/2014 Yes, using 3940 machines
6
7
If I can store the graph, then why can’t I process it?
8
[Figure: example with vertices 1 to 6]
9
RAM     SSD     Disk
1.4X    20X     200X
2 ms seeks on a graph with a trillion edges ~ 1 year!
10
Twitter graph: 20X difference with 32 machines!
11
12
[SOSP’13]
13
Existing computational model
[Figure: example graph with vertices 1 to 6]
14
Activate vertex
[Figure: example graph with vertices 1 to 6]
15
Scatter updates
[Figure: example graph with vertices 1 to 6]
16
Gather updates
[Figure: example graph with vertices 1 to 6]
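The three steps named on these slides (activate, scatter, gather) can be sketched as a small BFS. This is only an illustration of the vertex-centric model, not code from the talk: the function name `bfs_scatter_gather`, the per-vertex index `out_edges`, and the choice of BFS are assumptions, and the edge list is the four-edge example that appears on the later slides.

```python
from collections import defaultdict

def bfs_scatter_gather(edges, root):
    """Vertex-centric BFS: activate vertices, scatter updates, gather updates."""
    # Per-vertex index of out-edges. Following it means random access
    # into the edge list, one lookup per active vertex.
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    level = {root: 0}
    active = [root]                       # activate the root vertex
    while active:
        # Scatter: every active vertex sends an update along its out-edges.
        updates = [(dst, level[src] + 1)
                   for src in active
                   for dst in out_edges[src]]
        # Gather: each destination applies the updates addressed to it
        # and becomes active if its state changed.
        active = []
        for dst, new_level in updates:
            if dst not in level:
                level[dst] = new_level
                active.append(dst)
    return level

# Hypothetical input: the four edges shown on the example slides.
edges = [(1, 5), (1, 6), (6, 2), (6, 4)]
levels = bfs_scatter_gather(edges, root=1)
```

Vertex 3 never receives an update in this input, so it does not appear in `levels`.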
17
Vertices: 1 2 3 4 5 6
Edges: 1 → 5, 1 → 6, 6 → 2, 6 → 4
[Figure: the example graph next to its vertex and edge lists]
18
19
SEEK
[Figure: example graph with vertices 1 to 6; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4]
20
Scan entire edge list
SCAN
[Figure: example graph with vertices 1 to 6; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4]
21
Use only necessary edges
SCAN
[Figure: example graph with vertices 1 to 6; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4]
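Scanning the entire edge list and using only the necessary edges can be sketched as an edge-centric loop. This is a minimal sketch, assuming BFS as the algorithm and a plain Python list as the edge stream; the names `bfs_edge_centric` and `edges_scanned` are illustrative and not from the talk.

```python
def bfs_edge_centric(edges, num_vertices, root):
    """BFS that streams the whole edge list once per iteration (no per-vertex index)."""
    level = [None] * (num_vertices + 1)   # vertex state; index 0 is unused
    level[root] = 0
    current = 0
    edges_scanned = 0
    changed = True
    while changed:
        changed = False
        updates = []
        # Scatter: sequential scan over every edge, in whatever order it is stored.
        for src, dst in edges:
            edges_scanned += 1
            if level[src] == current:     # use only edges whose source is active
                updates.append((dst, current + 1))
        # Gather: apply the updates to the vertex state.
        for dst, new_level in updates:
            if level[dst] is None:
                level[dst] = new_level
                changed = True
        current += 1
    return level, edges_scanned

edges = [(1, 5), (1, 6), (6, 2), (6, 4)]
level, edges_scanned = bfs_edge_centric(edges, num_vertices=6, root=1)
# The result does not depend on the order of the edge list.
level_reversed, _ = bfs_edge_centric(list(reversed(edges)), num_vertices=6, root=1)
```

The loop reads all four edges in each of its three iterations, more than the vertex-centric version touches, but every read is sequential.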
22
Winning tradeoff!
23
24
Order oblivious
SCAN
[Figure: example graph with vertices 1 to 6; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4]
25
Edges: SCAN
Vertices: SEEK
[Figure: example graph; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4; vertex list 1 2 3 4 5 6]
26
Edges: SCAN
Vertices: SEEK
Seeking in RAM is free!
How can we fit vertices in RAM?
[Figure: example graph; edge list 1 → 5, 1 → 6, 6 → 2, 6 → 4; vertex list 1 2 3 4 5 6]
27
Fits in RAM
[Figure: vertices 1 to 6 split into parts that fit in RAM; edges 1 → 5, 1 → 6, 6 → 2, 6 → 4, 2 → 3, 3 → 5]
28
Load in RAM
SCAN
[Figure: vertices 1 to 6 split into parts that fit in RAM; edges 1 → 5, 1 → 6, 6 → 2, 6 → 4, 2 → 3, 3 → 5]
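One way to make the vertices fit in RAM is to split them into partitions and keep a separate edge list per partition, so that each partition's vertices are loaded once and its edges are scanned. The sketch below assumes contiguous vertex ranges and assumes edges are bucketed by their source vertex; the slides do not show the actual partitioning rule, and `make_streaming_partitions` is an illustrative name.

```python
def make_streaming_partitions(edges, num_vertices, num_partitions):
    """Split vertices 1..num_vertices into ranges and bucket edges by source partition."""
    size = -(-num_vertices // num_partitions)      # ceiling division

    def partition_of(vertex):
        return (vertex - 1) // size

    vertex_sets = [
        list(range(p * size + 1, min((p + 1) * size, num_vertices) + 1))
        for p in range(num_partitions)
    ]
    edge_lists = [[] for _ in range(num_partitions)]
    for src, dst in edges:                          # one sequential pass over the edges
        edge_lists[partition_of(src)].append((src, dst))
    return vertex_sets, edge_lists

edges = [(1, 5), (1, 6), (6, 2), (6, 4), (2, 3), (3, 5)]
vertex_sets, edge_lists = make_streaming_partitions(edges, num_vertices=6, num_partitions=2)

for vertices, partition_edges in zip(vertex_sets, edge_lists):
    in_ram = set(vertices)                          # load this partition's vertices in RAM
    for src, dst in partition_edges:                # scan this partition's edges
        assert src in in_ram                        # source lookups never leave RAM
```

The number of partitions would be chosen so that one partition's vertex state fits in memory; two partitions are used here only to keep the example small.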
29
30
31
(OSDI’12)
32
[Chart: time in seconds (axis 750 to 3000) for Netflix/ALS, Twitter/Pagerank and RMAT27/WCC; series: GraphChi (Sharding), X-Stream (Total time)]
33
34
35
BFS
[Chart: time in seconds (axis 0.1 to 100.0) vs. CPUs (1, 2, 4, 8, 16); series: Ligra, X-Stream]
36
BFS
[Chart: time in seconds (axis 0.1 to 1000.0) vs. CPUs (1, 2, 4, 8, 16); series: Ligra, X-Stream, Ligra (setup)]
[Chart: systems placed on an Edges axis marked 10 billion, 100 billion, 1 trillion]
Powergraph OSDI’12
Ligra PPoPP’13
X-Stream SOSP’13, 1 machine
Pregel SIGMOD’10, 300 machines
How do we get further? Scale out
37
38
Graph partitioning is hard to get right
39
[Figure labels: SP, SP; Red, Blue]
40
[Figure labels: SP; IDLE, IDLE; Red, Blue]
41
Stripe data across all disks
Allow any machine to access any disk
✔ Balance capacity
✔ Balance bandwidth
[Figure labels: SP, SP, SP, SP; Red, Blue]
42
Stripe data across all disks
Allow any machine to access any disk
Flat Storage Box
[Figure labels: SP, SP, SP, SP; Red, Blue]
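Striping data across all disks can be illustrated with a placement function that spreads consecutive blocks over every disk. Round-robin placement is an assumption made here for simplicity; the slides do not say which placement policy is used, and `stripe_blocks` is an illustrative name.

```python
from collections import Counter

def stripe_blocks(num_blocks, num_disks):
    """Round-robin striping: block i is placed on disk i mod num_disks."""
    return [block % num_disks for block in range(num_blocks)]

# Hypothetical sizes: 1000 blocks striped over 4 disks.
placement = stripe_blocks(num_blocks=1000, num_disks=4)
blocks_per_disk = Counter(placement)
```

Every disk ends up holding the same share of the data, so capacity is balanced, and a sequential read of the file draws on all disks at once, so bandwidth is balanced as well.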
43
44
Flat Storage Box
[Figure labels: SP, SP, SP, SP; Red, Blue]
45
Flat Storage Box
Using only half the available bandwidth
[Figure labels: SP, SP; Red; IDLE, IDLE]
46
47
Scan
Vertices
Scatter/Gather
48
Scan Edges
Vertices
Scatter
49
Scan edges from the Flat Storage Box
[Figure: machine 1 and machine 2, each with its own Vertices, running Scatter]
50
Scan updates from the Flat Storage Box
[Figure: machine 1 and machine 2, each with its own Vertices, running Gather]
51
Application of updates is commutative
Merge vertices
No need to go to disk
[Figure: Vertices on machine 1 and Vertices on machine 2]
52
SlipStream graph algorithms = X-Stream graph algorithms + Merge function
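To make the role of the merge function concrete, here is a minimal sketch in which two machines gather disjoint sets of updates into private copies of the vertex state and the copies are then merged. It assumes a BFS-style algorithm whose update rule is "keep the minimum level"; `gather` and `merge` are illustrative names, not the SlipStream API.

```python
def gather(levels, updates):
    """Apply (vertex, level) updates to one machine's copy, keeping the minimum."""
    for vertex, value in updates:
        if vertex not in levels or value < levels[vertex]:
            levels[vertex] = value
    return levels

def merge(copy_a, copy_b):
    """Combine two machines' vertex copies with the same minimum rule."""
    merged = dict(copy_a)
    for vertex, value in copy_b.items():
        if vertex not in merged or value < merged[vertex]:
            merged[vertex] = value
    return merged

# Hypothetical updates, split between two machines in arbitrary fashion.
updates = [(5, 1), (6, 1), (2, 2), (4, 2), (5, 4)]
machine1 = gather({1: 0}, updates[:2])
machine2 = gather({1: 0}, updates[2:])
merged = merge(machine1, machine2)

# One machine applying every update reaches the same state.
single = gather({1: 0}, updates)
```

Because taking the minimum is commutative and associative, the merged state matches the single-machine state regardless of which machine saw which update.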
53
SP SP
Flat Storage Box Red
54
SP SP
Flat Storage Box Red Copy
55
Flat Storage Box
✔ Back to full bandwidth
[Figure labels: SP, SP; Red, Red]
56
Flat Storage Box Compute Box
57
58
59
Store Block X
60
Where is block X?
Need a location service f: (file, block) → (machine, offset)
61
Store block of updates
62
Give me any block of updates
Streaming is order oblivious!
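Because streaming is order oblivious, a writer can hand a block to any storage server and a reader can accept any block, so no block-to-location map is needed. The toy class below sketches that idea under stated assumptions: `FlatStorage`, `store_block` and `any_block` are hypothetical names, the servers are in-memory lists, and random selection stands in for whatever policy the real system uses.

```python
import random

class FlatStorage:
    """Toy stand-in for a set of storage servers (not the SlipStream implementation)."""

    def __init__(self, num_servers, seed=0):
        self.servers = [[] for _ in range(num_servers)]
        self.rng = random.Random(seed)

    def store_block(self, block):
        # Any server will do, so pick one at random; nothing records where it went.
        self.rng.choice(self.servers).append(block)

    def any_block(self):
        # "Give me any block": probe servers in random order until one has data.
        for server in self.rng.sample(self.servers, len(self.servers)):
            if server:
                return server.pop()
        return None                      # no blocks left on any server

storage = FlatStorage(num_servers=4)
for i in range(10):
    storage.store_block(f"updates-{i}")

drained = []
while True:
    block = storage.any_block()
    if block is None:
        break
    drained.append(block)
```

The blocks come back in a different order than they were written, which is acceptable only because the consumer does not care about order.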
63
64
[Figure labels: SP, SP, SP, SP; Red, Blue; rand() = 1, rand() = 1]
65
66
67
Machine: 32 cores, 32 GB RAM, 200 GB SSD, 2 TB 5200 RPM disk
Rack: machines 1 to 32
Network: 10 GigE, full bisection
68
69
[Chart: normalized wall time (axis 1 to 4) vs. machines (1, 2, 4, 8, 16, 32); series: PR, BFS, SCC, WCC, BP, MCST, Cond., MIS, SPMV, SSSP]
32X problem size at 2.7X cost
70
[Chart: normalized wall time (axis 1 to 4) vs. machines (1, 2, 4, 8, 16, 32); series: PR, BFS, SCC, WCC, BP, MCST, Cond., MIS, SPMV, SSSP]
32X problem size at 2.7X cost
Collisions, Engineering, Loss of sequentiality: 0.5X, 1X, 0.5X
71
72
Metric       Value
Wall Time    2d 9h
MTEPS        5
I/O          282 TB
BW           1.53 GB/s
Don’t need supercomputers or very large clusters
73
Metric       Value
Wall Time    2d 9h
MTEPS        5
I/O          282 TB
BW           1.53 GB/s
Direct results from unordered edge list
74
WCC/RMAT/128M vertices 2B edges/2 machines
System        RAM       Pre-process    Run
Powergraph    128 GB    1271s          103s
SlipStream    32 GB     X              1854s
Preprocessing your data for locality can take a lot of time!
75
[Chart: systems placed on an Edges axis marked 10 billion, 100 billion, 1 trillion]
Powergraph OSDI’12
Ligra PPoPP’13
X-Stream SOSP’13, 1 machine
Pregel SIGMOD’10, 300 machines
SlipStream, 32 machines
How do we get further? Buy more disks :)
76
77