Time-Evolving Graph Processing at Scale
Anand Iyer#, Li Erran Li+, Tathagata Das*, Ion Stoica#*
#UC Berkeley +Uber Technologies *Databricks
Dynamically evolving graphs are prevalent in many domains
– Social networks (e.g., Twitter, Facebook)
– Communication networks (e.g., cellular networks)
– Internet-of-Things
– Product recommendations
– Network troubleshooting
– Real-time ad placement
– GraphX, Giraph, PowerGraph, GraphLab, GraphChi, Chaos, …
– Some specialized systems exist (e.g., Kineograph, Chronos), but they are not generic enough
[Figure: example graph over vertices a to f, shown as three snapshots at t1, t2, t3]
GraphTau represents time-evolving graphs as a series of consistent graph snapshots
E.g. PageRank: pause iterating, update snapshot, continue iterating
[Figure: PageRank values on a six-vertex graph (A to F) before and after a snapshot transition. Each pair (X, Y), e.g. (0.977, 0.968), gives X = PageRank after 10 iterations and Y = PageRank after 23 iterations. After 11 iterations on graph 2, both converge to 3-digit precision.]
Represents a series of Graph[V,E] snapshots where V = vertices, E = edges
[Figure: GraphStream[V,E] as a sequence of Graph[V,E] snapshots at T = 1, T = 2, T = 3, T = 4]
class GraphStream {
  def transform(func: Graph => Graph): GraphStream
}
func: user-provided function for bulk operations; allows aggregations over vertices and edges
transform: applies func to each snapshot graph in a GraphStream
[Figure: func applied to each snapshot (T = 1 to T = 4) of the original GraphStream, producing the transformed GraphStream]
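The per-snapshot behavior of transform can be sketched in plain Scala. This is only an illustration under stated assumptions: Graph and GraphStream below are hypothetical in-memory stand-ins (a finite vector of snapshots), not the GraphTau or GraphX types, and dropSelfLoops is an invented example of a bulk operation.

```scala
// Hypothetical stand-ins for illustration; not the GraphTau or GraphX types.
object TransformSketch {
  final case class Graph(vertices: Set[String], edges: Set[(String, String)])

  // Assumption: a GraphStream is modeled as a finite, ordered sequence of snapshots.
  final case class GraphStream(snapshots: Vector[Graph]) {
    // Apply func to every snapshot independently.
    def transform(func: Graph => Graph): GraphStream =
      GraphStream(snapshots.map(func))
  }

  // Invented example of a bulk operation: drop self-loops from one snapshot.
  def dropSelfLoops(g: Graph): Graph =
    g.copy(edges = g.edges.filter { case (src, dst) => src != dst })

  val stream: GraphStream = GraphStream(Vector(
    Graph(Set("a", "b"), Set(("a", "b"), ("a", "a"))),
    Graph(Set("a", "b", "c"), Set(("a", "b"), ("b", "c"), ("c", "c")))
  ))

  // Each snapshot is cleaned on its own; the snapshot order is preserved.
  val cleaned: GraphStream = stream.transform(dropSelfLoops)
}
```

Because func sees one snapshot at a time, any existing single-graph routine can be reused unchanged on a stream.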
[Figure: original GraphStream (snapshots at T = 1 to T = 4) and the resulting windowed GraphStream]
class GraphStream {
  def mergeWindows(aggregationFuncs, windowLength, slidingInterval): GraphStream
}
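One possible reading of mergeWindows is sketched below. The slide does not give the parameter types, so everything here is an assumption: a single binary aggregate function stands in for aggregationFuncs, windowLength and slidingInterval are counted in snapshots, and Graph and GraphStream are hypothetical in-memory stand-ins.

```scala
// Hypothetical stand-ins for illustration; not the GraphTau or GraphX types.
object WindowSketch {
  final case class Graph(vertices: Set[String], edges: Set[(String, String)])

  // Assumed aggregation: merge two snapshots by set union.
  def union(a: Graph, b: Graph): Graph =
    Graph(a.vertices ++ b.vertices, a.edges ++ b.edges)

  final case class GraphStream(snapshots: Vector[Graph]) {
    // Assumption: window length and sliding interval are counted in snapshots.
    def mergeWindows(aggregate: (Graph, Graph) => Graph,
                     windowLength: Int,
                     slidingInterval: Int): GraphStream =
      GraphStream(
        snapshots
          .sliding(windowLength, slidingInterval) // consecutive groups of snapshots
          .map(window => window.reduce(aggregate)) // collapse each group into one graph
          .toVector
      )
  }

  private def edge(src: String, dst: String): Graph =
    Graph(Set(src, dst), Set((src, dst)))

  val stream: GraphStream =
    GraphStream(Vector(edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "e")))

  // Windows of two snapshots that advance by one snapshot: (1,2), (2,3), (3,4).
  val windowed: GraphStream =
    stream.mergeWindows(union, windowLength = 2, slidingInterval = 1)
}
```

Note that Scala's sliding can emit a shorter final group when the snapshot count does not line up with the interval; a real implementation would need to decide how to treat such partial windows.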
Apply the Pregel iterationFunc until the next snapshot is available (T = 1)
class GraphStream {
  def StreamingBSP(..., iterationFunc, ...): GraphStream
}
Combine previous results with the new snapshot and continue iterating (T = 2, T = 3)
Continue until convergence
Faster convergence than running PageRank from scratch on every snapshot
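The pause, update, continue pattern can be illustrated with a small warm-started PageRank in plain Scala. This is a sketch, not the StreamingBSP implementation: the adjacency-map representation, the step and converge helpers, the damping constants, and the choice of 5 iterations before the second snapshot "arrives" are all assumptions made for the example. It shows only that resuming from earlier ranks reaches the same fixed point as a cold start; it does not measure speedup.

```scala
// Illustrative only; names and constants are assumptions, not GraphTau's API.
object StreamingBspSketch {
  type Edges = Map[String, Seq[String]] // vertex -> out-neighbours
  type Ranks = Map[String, Double]

  // One synchronous PageRank step (reset 0.15, damping 0.85).
  def step(edges: Edges, ranks: Ranks): Ranks = {
    val contributions: Seq[(String, Double)] =
      for {
        (src, dsts) <- edges.toSeq
        dst <- dsts
      } yield (dst, ranks(src) / dsts.size)
    val incoming: Map[String, Double] =
      contributions.groupBy(_._1).map { case (v, cs) => (v, cs.map(_._2).sum) }
    ranks.map { case (v, _) => (v, 0.15 + 0.85 * incoming.getOrElse(v, 0.0)) }
  }

  // Iterate until the largest per-vertex change drops below tol.
  def converge(edges: Edges, start: Ranks, tol: Double = 1e-9): (Ranks, Int) = {
    var ranks = start
    var iterations = 0
    var delta = Double.MaxValue
    while (delta > tol && iterations < 1000) {
      val next = step(edges, ranks)
      delta = next.map { case (v, r) => math.abs(r - ranks(v)) }.max
      ranks = next
      iterations += 1
    }
    (ranks, iterations)
  }

  // Every vertex mentioned in the snapshot starts at rank 1.0.
  def uniform(edges: Edges): Ranks =
    (edges.keys ++ edges.values.flatten).map(v => (v, 1.0)).toMap

  val g1: Edges = Map("a" -> Seq("b", "c"), "b" -> Seq("c"), "c" -> Seq("a"))
  val g2: Edges = g1 + ("d" -> Seq("a")) // snapshot 2 adds vertex d and edge d -> a

  // Iterate on snapshot 1 only until snapshot 2 arrives (assumed here: 5 steps).
  val partial: Ranks = (1 to 5).foldLeft(uniform(g1))((r, _) => step(g1, r))

  // Carry over the partial ranks; the new vertex d starts at the default 1.0.
  val warmStart: Ranks = uniform(g2) ++ partial
  val (warm, warmIters) = converge(g2, warmStart)
  val (cold, coldIters) = converge(g2, uniform(g2))
}
```

Comparing warmIters with coldIters on realistic snapshots is the kind of measurement the convergence claim above refers to; on a toy graph like this one the difference is not meaningful.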
class GraphStream {
  def updateLocalState(stateUpdateFunc, initialState): LocalStateStream
}
[Figure: stateUpdateFunc applied at each snapshot (T = 1, T = 2, T = 3) of the GraphStream, starting from initialState]
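A minimal way to picture updateLocalState is as a running fold over snapshots. The sketch below assumes a signature the slide does not spell out: stateUpdateFunc takes the previous state and the current snapshot, and LocalStateStream holds one state per snapshot. Graph, GraphStream, and LocalStateStream are hypothetical in-memory stand-ins.

```scala
// Hypothetical stand-ins for illustration; not the GraphTau or GraphX types.
object LocalStateSketch {
  final case class Graph(vertices: Set[String], edges: Set[(String, String)])

  // Assumption: one state value is emitted per snapshot.
  final case class LocalStateStream[S](states: Vector[S])

  final case class GraphStream(snapshots: Vector[Graph]) {
    // Assumed signature: (previous state, current snapshot) => new state.
    def updateLocalState[S](stateUpdateFunc: (S, Graph) => S,
                            initialState: S): LocalStateStream[S] =
      // scanLeft yields initialState followed by each updated state; drop the seed.
      LocalStateStream(snapshots.scanLeft(initialState)(stateUpdateFunc).tail)
  }

  val stream: GraphStream = GraphStream(Vector(
    Graph(Set("a", "b"), Set(("a", "b"))),
    Graph(Set("a", "b", "c"), Set(("a", "b"), ("b", "c"))),
    Graph(Set("b", "c"), Set(("b", "c")))
  ))

  // Example state: every vertex observed so far, even if later removed.
  val seen: LocalStateStream[Set[String]] =
    stream.updateLocalState[Set[String]]((acc, g) => acc ++ g.vertices, Set.empty)
}
```

Here vertex a disappears in the third snapshot but remains in the state, which is the kind of cross-snapshot memory a per-snapshot transform cannot provide.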
abstraction
– PageRank
– Connected Components
– Twitter follow graph: 41M vertices, ~1.5B edges
– Live LTE network: 2M vertices, variable edges
when the graph is streamed in parts
GraphX on the whole graph could not converge!
GraphTau converged fast when 20% of the graph was streamed at a time
Smaller batches lead to faster convergence
connected components
Re-implemented on general system GraphTau
connected components on whole window of snapshots
[Figure: Analysis Time (s) versus Window Size (m) for Strawman, GraphTau, and CellIQ]
GraphTau achieved performance comparable to the specialized system, without domain-specific optimizations
– Consistent & fault-tolerant snapshot generation
– Co-ordinate snapshotting and computation
– Sliding window operations
– Mix data and graph parallel computations