UNDERSTANDING DISTRIBUTED DATAFLOW SYSTEMS
OUTPUT EXPLANATION AND PERFORMANCE ANALYSIS
John Liagouris, liagos@inf.ethz.ch
Google, 3 May 2017
PART I: Why is this record in the output of my distributed dataflow?
▸ Concise explanations of individual outputs
▸ On-demand output reproduction
PART II: Why is my distributed dataflow slow?
▸ Bottleneck detection
▸ Critical path analysis
COLLABORATORS
Vasiliki Kalavri, Ralf Sager, Andrea Lattuada, Desislava Dimitrova, Zaheer Chothia, Sebastian Wicki, Frank McSherry, Moritz Hoffmann, Timothy Roscoe
THE BIG PICTURE: UNDERSTANDING THE DATACENTER
[Diagram: Enterprise Datacenter, event logs, Strymon]
‣ The volume of datacenter logs is huge
‣ Keeping archives is not a viable solution
‣ We can process logs online
Strymon is a novel system able to:
‣ Perform deep analytics on thousands of distributed streams of event logs in parallel
‣ Explain its outputs interactively
IDEAS IN STRYMON CAN BE GENERALIZED
▸ for dataflow systems: iterative analytics, streaming analytics
▸ and different execution models: synchronous vs asynchronous, shared-nothing vs shared-memory
[Diagram: input stream, worker 1, worker 2, output stream]
TIMELY DATAFLOW
D. Murray, F. McSherry, M. Isard, R. Isaacs, P. Barham, M. Abadi. Naiad: A Timely Dataflow System. In SOSP, 2013.
▸ A streaming framework for data-parallel computations
▸ Cyclic dataflows
▸ Logical timestamps (epochs)
▸ Asynchronous execution
▸ Low latency
DIFFERENTIAL DATAFLOW
F. McSherry, D. Murray, R. Isaacs, M. Isard. Differential Dataflow. In CIDR, 2013.
▸ A high-level API on top of Timely Dataflow
▸ Incremental computation
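The idea of incremental computation can be illustrated with a small plain-Python sketch. It is not the Differential Dataflow API: the function name `apply_diffs` and the encoding of changes as `(key, diff)` pairs are assumptions made for illustration only. The sketch maintains per-key counts and, for each batch of input changes, emits only the output records that are retracted or added, instead of recomputing every count.

```python
def apply_diffs(counts, diffs):
    """Update per-key counts in place from (key, diff) input changes.

    Returns the resulting output changes as ((key, count), +1 or -1) pairs:
    -1 retracts an old output record, +1 adds a new one.
    """
    output_changes = []
    for key, diff in diffs:
        old = counts.get(key, 0)
        new = old + diff
        if new == old:
            continue  # a zero diff changes nothing
        if old:
            output_changes.append(((key, old), -1))
        if new:
            counts[key] = new
            output_changes.append(((key, new), +1))
        else:
            counts.pop(key, None)  # the count dropped to zero: forget the key
    return output_changes


counts = {}
first = apply_diffs(counts, [("a", +1), ("b", +1), ("a", +1)])
second = apply_diffs(counts, [("b", -1)])
print(first)
print(second)
print(counts)
```

The second call touches only the key `b`; the count for `a` is left alone, which is the point of working with differences.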
PART I: Why is this record in the output of my distributed dataflow?
EXPLANATIONS IN DATABASES
[Diagram: COMPUTATION through steps 1, 2, 3; PROVENANCE]
THE PROBLEM: OUTPUT EXPLANATION
OUTPUT: {App 115 344} {VM 233 -22} {App 100 55} {VM 333 -124} …
INPUT: {A 115 344} {F 233 122} {W 100 -95} {V 30 23} …
THIS RECORD LOOKS WRONG!
Output explanation: A subset of the input that is sufficient to reproduce the selected subset of the output
ANNOTATION-BASED TECHNIQUES
[Diagram: metadata propagation through steps 1, 2, 3]
▸ Fast
▸ Explode in size

INVERSION-BASED TECHNIQUES
[Diagram: steps 1', 2', 3']
▸ Small memory footprint
▸ Not generally applicable

BACKWARD TRACING
[Diagram: dependencies between steps 1, 2, 3]
▸ Small memory footprint
▸ Generally applicable
▸ Fast
PROBLEM 1: TOO MUCH INFORMATION
Use Case: Graph Reachability
[Diagram: a graph with nodes 1, 2, 3, 4, 5]
WHY IS (1,3) IN THE OUTPUT?
▸ Record (1,3) appears in the result
▸ Naive backward tracing returns as an explanation all edges of the graph
▸ A shortest path suffices
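The observation that a shortest path suffices can be sketched in Python with a breadth-first search. The edge list below is hypothetical (the edges of the slide's graph are not recoverable from the transcript), and the function name `shortest_path_explanation` is an assumption for illustration; this is not how the system itself computes explanations.

```python
from collections import deque


def shortest_path_explanation(edges, source, target):
    """Return the edges of one shortest directed path, or None if unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    # Breadth-first search, remembering which node first reached each node.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    if target not in parent:
        return None

    # Walk the parent links back from the target to recover the path edges.
    path = []
    node = target
    while parent[node] is not None:
        path.append((parent[node], node))
        node = parent[node]
    path.reverse()
    return path


# Hypothetical edges over the nodes 1..5; not taken from the slide.
EDGES = [(1, 2), (2, 3), (1, 4), (4, 5), (5, 3), (2, 5)]
explanation = shortest_path_explanation(EDGES, 1, 3)
print(explanation)
```

For this edge list the explanation of (1,3) contains two edges, whereas returning every edge of the graph would contain six.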
PROBLEM 2: NOT ENOUGH INFORMATION
Use Case: Word Set Difference
Doc A: THE QUICK BROWN FOX … → (doc A, 3 unique words)
Doc B: THE LAZY DOG … → (doc B, 2 unique words)
WHY ARE ONLY 3 WORDS UNIQUE TO DOCUMENT A?
▸ Record (doc A, 3 unique words) appears in the result
▸ Naive backward tracing returns as an explanation only the words of doc A
▸ We also need the words of doc B to reproduce the record (doc A, 3 unique words)
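The problem can be reproduced with a few lines of Python. This is a sketch under assumptions: the documents are cut down to the words visible on the slide (the elided rest is ignored), and `unique_word_counts` is a hypothetical helper, not part of any system discussed here. Replaying the computation on doc A alone yields a different record from the one being explained.

```python
def unique_word_counts(docs):
    """For each document, count the words that appear in no other document."""
    owners = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            owners.setdefault(word, set()).add(doc_id)

    counts = {doc_id: 0 for doc_id in docs}
    for ids in owners.values():
        if len(ids) == 1:  # the word occurs in exactly one document
            counts[next(iter(ids))] += 1
    return counts


# Only the words visible on the slide; the remainder of each document is unknown.
DOCS = {"A": "THE QUICK BROWN FOX", "B": "THE LAZY DOG"}

full_run = unique_word_counts(DOCS)
replay_on_a_only = unique_word_counts({"A": DOCS["A"]})
print(full_run)
print(replay_on_a_only)
```

With both documents the count for A is 3; with doc A alone, THE is no longer shared and the count becomes 4, so the words of doc A are not a sufficient explanation.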
CAN WE SOLVE BOTH PROBLEMS?
Yes! Given that the system is able to:
▸ Keep track of the exact point in the computation at which a data record was produced
▸ Detect divergent records when replaying the computation on a subset of the input
We exploit the main features of Differential Dataflow
EXPLANATIONS WITH DIFFERENTIAL DATAFLOW
Original dataflow: INPUT, Op A, Op B, Op C, OUTPUT
Explanation dataflow: INPUT, Join, Join, Join, OUTPUT
▸ Augment the original dataflow with a shadow dataflow
ITERATIVE BACKWARD TRACING
Original dataflow: INPUT, Op A, Op B, Op C, OUTPUT
Explanation dataflow: EXPL QUERY, Join, Join, Join
▸ Trace backwards
▸ Replay and compare
▸ Trace divergent records backwards, i.e. records that differ between the two runs, such as (k2, v') versus (k2, v'') when (k1, v) is the same in both
▸ Replay again (for the new records) and compare
▸ Repeat until a fix-point
EXAMPLE: EXPLAINING OUTPUTS OF WORD SET DIFFERENCE
Input: doc A: THE QUICK BROWN FOX … ; doc B: THE LAZY DOG …
▸ MAP: (THE, A) (BROWN, A) (FOX, A) ; (THE, B) (LAZY, B) (DOG, B)
▸ GROUP: (THE, [A,B]) (BROWN, A) (FOX, A) (LAZY, B) (DOG, B)
▸ FILTER: (BROWN, A) (FOX, A) (LAZY, B) (DOG, B)
▸ GROUP: (A, 3) (B, 2)
The final build of the slide additionally marks the record (THE, A).
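The iterative backward tracing procedure can be tried on this example with a small Python sketch. Everything here is an assumption made for illustration: records are `(word, doc)` pairs built from the words visible on the slide, the naive trace is modeled as "all input records of the selected document", and the names `group_by_word`, `count_unique`, and `explain` are hypothetical. It does not use the Differential Dataflow API and is not the authors' implementation.

```python
def group_by_word(inputs):
    """First GROUP stage: map each word to the sorted tuple of documents containing it."""
    groups = {}
    for word, doc in inputs:
        groups.setdefault(word, set()).add(doc)
    return {word: tuple(sorted(docs)) for word, docs in groups.items()}


def count_unique(groups):
    """FILTER and second GROUP: per document, count the words found in that document only."""
    counts = {}
    for docs in groups.values():
        if len(docs) == 1:
            counts[docs[0]] = counts.get(docs[0], 0) + 1
    return counts


def explain(full_input, doc):
    """Grow an input subset until replaying it reproduces the original groups it touches."""
    original_groups = group_by_word(full_input)
    # Naive backward trace (assumed model): every input record of the selected document.
    explanation = {record for record in full_input if record[1] == doc}
    while True:
        # Replay on the current subset and compare intermediate records.
        replay_groups = group_by_word(explanation)
        divergent = [w for w, docs in replay_groups.items() if docs != original_groups[w]]
        # Trace the original versions of the divergent records back to the input.
        extra = {(w, d) for w in divergent for d in original_groups[w]} - explanation
        if not extra:
            return explanation  # fix-point: the replay no longer diverges
        explanation |= extra


INPUT = {
    ("THE", "A"), ("QUICK", "A"), ("BROWN", "A"), ("FOX", "A"),
    ("THE", "B"), ("LAZY", "B"), ("DOG", "B"),
}
explanation = explain(INPUT, "A")
print(sorted(explanation))
print(count_unique(group_by_word(explanation)))
```

In this toy run the first replay diverges on THE, which is grouped as (A,) instead of (A, B); tracing the original record back adds (THE, B), after which the replay produces the same count for doc A as the full run.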
RESULTS: EXPLAINING CONNECTED COMPONENTS
▸ Dataset: A subset of the Twitter graph with 1B edges
▸ Algorithm: Label propagation
▸ Output: Records of the form (A,B) denoting that nodes A and B belong to the same connected component
▸ System used: Differential Dataflow
▸ Machine used: Intel Xeon E5-4640 at 2.4GHz with 32 cores and 500 GB of RAM
More results: Z. Chothia, J. Liagouris, F. McSherry, T. Roscoe. Explaining Outputs in Modern Data Analytics. PVLDB 9(12):1137-1148, 2016.
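For readers unfamiliar with label propagation, a minimal single-machine sketch in Python follows. It is an illustration only: the edge list is hypothetical, the function names are assumptions, and it says nothing about how the experiment was implemented in Differential Dataflow. Every node starts with its own id as label, and labels are lowered along edges until nothing changes.

```python
def connected_components(edges):
    """Label propagation: each node ends up with the smallest node id in its component."""
    labels = {}
    for u, v in edges:
        labels.setdefault(u, u)
        labels.setdefault(v, v)

    changed = True
    while changed:
        changed = False
        for u, v in edges:
            low = min(labels[u], labels[v])
            if labels[u] != low:
                labels[u] = low
                changed = True
            if labels[v] != low:
                labels[v] = low
                changed = True
    return labels


def same_component_pairs(labels):
    """Output records of the form (A, B): A and B share a component, with A < B."""
    nodes = sorted(labels)
    return [
        (a, b)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if labels[a] == labels[b]
    ]


# Hypothetical undirected edges; not taken from the Twitter dataset.
EDGES = [(1, 2), (2, 3), (4, 5)]
labels = connected_components(EDGES)
pairs = same_component_pairs(labels)
print(labels)
print(pairs)
```

Each pair in `pairs` is the kind of output record one might ask the system to explain.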
EXPLAINING CONNECTED COMPONENTS
[Figure: results on the Twitter graph]
PART II: Why is my distributed dataflow slow?
DISTRIBUTED DATAFLOWS
[Diagram: client, scheduler, Apache Flink (W1), Naiad (W1)]