SLIDE 1

Clarinet: WAN-Aware Optimization for Analytics Queries

Raajay Viswanathan, Ganesh Ananthanarayanan, Aditya Akella

SLIDE 2

Overview

  • Web apps hosted on multiple DCs → Low-latency access to end-users
  • Need efficient methods to analyze data located in multiple data centers
SLIDE 3

Centralized Aggregation is Wasteful

Intra-data center Analytics Framework

SELECT * … FROM … WHERE … ;

  • Available WAN bandwidth is limited → Aggregation latency overhead
  • WAN links are expensive → High data transfer cost

[Figure: measured pairwise bandwidth between EC2 regions; directional WAN links sorted by bandwidth range from roughly 450 Mbps down to 20 Mbps]

SLIDE 4

Geo-distributed Analytics

Analytics framework: treat the sites as One Logical Datacenter

Distributed Storage Layer + Distributed Execution Layer

Query Optimizer: SELECT * … FROM … WHERE … ; → Multi-stage parallelizable jobs

Geo-distributed execution requires WAN-aware optimization

Iridium [SIGCOMM 15], Geode [NSDI 15] → Clarinet: 2.7x reduction in query runtime

SLIDE 5

WAN-Aware Query Optimization

T1, T2, T3: Tables storing click logs (200 GB each), located in DC1, DC2, DC3; inter-DC WAN links of 80 Gbps, 40 Gbps, and 100 Gbps

QUERY
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
  AND T1.device == T2.device == T3.device == "mobile";

[Figure: three candidate join trees; each plan first applies a σ_mobile filter to every 200 GB table, then joins]

  • Plan A: join all three filtered tables in one step — plan running time 41 s (a 40 s transfer followed by a 1 s transfer)
  • Plan B: (T2 ⋈ T1) ⋈ T3 with a 12 GB intermediate result — plan running time 20.96 s; chosen by a network-agnostic query optimizer, which minimizes intermediate data
  • Plan C: (T1 ⋈ T3) ⋈ T2 with a 16 GB intermediate result — plan running time 17.6 s

(Plan running times assume the WAN is the only bottleneck.)

Takeaway: a WAN-aware query optimizer uses network transfer durations to choose query plans — see the arithmetic sketch below.
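To make that arithmetic concrete, here is a minimal sketch (mine, not the talk's) that models a plan's WAN time as a sequence of stages, each bounded by its slowest transfer; the 200 GB-over-40 Gbps and 10 GB-over-80 Gbps figures are one plausible reading of Plan A's 40 s + 1 s breakdown.

```python
# Hedged sketch: WAN-only plan-time estimation.
# A stage is a list of parallel (size_gb, bw_gbps) transfers; stages run
# sequentially, so the plan time is the sum of per-stage maxima.

def transfer_seconds(size_gb: float, bw_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a bw_gbps link."""
    return size_gb * 8 / bw_gbps  # GB -> gigabits, then divide by Gbps

def plan_seconds(stages) -> float:
    return sum(max(transfer_seconds(s, b) for s, b in stage)
               for stage in stages)

# 200 GB over the 40 Gbps link, then a 10 GB result over 80 Gbps:
plan_a = [[(200, 40)], [(10, 80)]]
print(plan_seconds(plan_a))  # 41.0, matching the slide
```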

SLIDE 6

Outline

  • 1. Motivation
  • 2. Challenges in choosing query plans based on WAN transfer durations
  • 3. Solution
       • Single query
       • Multiple simultaneous queries
  • 4. Experimental Evaluation

SLIDE 7

Other factors also affect query plan run time

Same setup: T1, T2, T3 (200 GB each) in DC1, DC2, DC3; inter-DC WAN links of 80 Gbps, 40 Gbps, and 100 Gbps

[Figure: a two-table MapReduce job — MAP: SELECT over T1 and T2 (200 GB each), REDUCE: JOIN]

  • Task placement: with reduce tasks placed in a single DC the join takes 20 s; with tasks placed uniformly across DC1 and DC2 it takes 10 s (sketch below)
  • Re-evaluating the plans with this placement: 1. Plan A: 41 s; 2. Plan B: 20.96 s → 20.5 s; 3. Plan C: 17.6 s → 11.2 s
  • Schedule of network transfers: a link may already be used by a high-priority application

Choose query plan based on: 1. Best available task placements; 2. Schedule of network transfers
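A rough sketch of the placement arithmetic (assumptions mine: the shuffle is bottlenecked by a single 80 Gbps inter-DC link whose two directions carry traffic in parallel):

```python
# Hedged sketch: how reduce-task placement changes shuffle time.

def shuffle_seconds(table_gb: float, bw_gbps: float, frac_remote: float) -> float:
    """frac_remote is the fraction of reduce tasks running in the other DC,
    i.e. the fraction of the local table that must cross the link."""
    return table_gb * frac_remote * 8 / bw_gbps

# All reduce tasks in one DC: the full 200 GB remote table crosses the link.
print(shuffle_seconds(200, 80, 1.0))  # 20.0 s

# Tasks split evenly: each side ships half its table, and the two link
# directions run in parallel, so the job sees only a 100 GB transfer.
print(shuffle_seconds(200, 80, 0.5))  # 10.0 s
```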

SLIDE 8

Joint plan selection, placement and scheduling

SELECT * FROM … WHERE … ;
  ↓
Query Optimizer: multiple query plans (join orders) per query
  ↓
Logical plan to physical plan: assign parallelism for each stage
  ↓
Clarinet: network-aware task placement and scheduling for each query plan; choose the plan with the smallest run time for execution

Clarinet binds a query to a plan lower in the stack

SLIDE 9

Network-aware placement and scheduling

[Figure: physical plan for the 3-table query — three SELECT stages over T1, T2, T3 feeding two JOIN stages]

  • Task placement decided greedily, one stage at a time
       • Minimize per-stage run time (see the placement sketch below)
  • Scheduling of network transfers
       • Determines start times of inter-DC network transfers
       • Formulated as a Binary Integer Linear Program
       • Factors in transfer dependencies
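A minimal sketch of the greedy, stage-at-a-time placement idea (a coarse simplification of mine — Clarinet's actual transfer scheduling is the BILP above): enumerate a few candidate splits of a stage's tasks across two DCs and keep the split that minimizes that stage's transfer time.

```python
# Hedged sketch: greedy per-stage placement. A placement assigns a
# fraction of the stage's tasks to each DC; data resident in `src` and
# destined for tasks in `dst` must cross the (src, dst) link.

def stage_seconds(inputs, placement, bw):
    """inputs: {dc: gigabytes}; placement: {dc: task fraction};
    bw: {(src, dst): gbps}. Returns the slowest inter-DC transfer."""
    worst = 0.0
    for src, size in inputs.items():
        for dst, frac in placement.items():
            if src != dst and size * frac > 0:
                worst = max(worst, size * frac * 8 / bw[(src, dst)])
    return worst

def greedy_place(inputs, bw, dcs, steps=4):
    """Try coarse task splits between two DCs, keep the fastest."""
    best = None
    for f in (i / steps for i in range(steps + 1)):
        placement = {dcs[0]: f, dcs[1]: 1 - f}
        t = stage_seconds(inputs, placement, bw)
        if best is None or t < best[0]:
            best = (t, placement)
    return best

bw = {("DC1", "DC2"): 80, ("DC2", "DC1"): 80}
print(greedy_place({"DC1": 200, "DC2": 200}, bw, ["DC1", "DC2"]))
# -> (10.0, {'DC1': 0.5, 'DC2': 0.5}), matching the 20 s vs 10 s example
```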

SLIDE 10

How to extend the late-binding strategy to multiple queries?

SLIDE 11

Queries affect each others' run time

Same setup: T1, T2, T3 in DC1, DC2, DC3 with inter-DC WAN links of 80 Gbps, 40 Gbps, and 100 Gbps
QUERY 1: SELECT … device == "mobile" …;    QUERY 2: SELECT … genre == "pc" …;

[Figure: the two queries' join trees — σ_mobile filters for Query 1, σ_pc filters for Query 2]

  • Same query plan (Plan C) for both Query 1 and Query 2 → contention on shared links increases query run time
  • Different query plans — Plan C for Query 1, Plan B for Query 2 → no contention on network links

Choosing execution plans jointly for multiple queries improves performance — a back-of-the-envelope sketch follows.
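A back-of-the-envelope sketch (the fair-sharing model is my assumption, not a measurement from the talk): two plans whose dominant transfers share one link each get half its bandwidth, while distinct join orders can keep the transfers on disjoint links.

```python
# Hedged sketch: link contention under fair sharing.

def completion_seconds(size_gb: float, bw_gbps: float, sharers: int = 1) -> float:
    return size_gb * 8 / (bw_gbps / sharers)

# Both queries push a 200 GB transfer over the same 40 Gbps link:
print(completion_seconds(200, 40, sharers=2))  # 80.0 s each
# Different plans route the transfers over disjoint links:
print(completion_seconds(200, 40))             # 40.0 s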

SLIDE 12

Iterative Shortest Job First

Clarinet sits behind the per-query optimizers: QUERY A, QUERY B, QUERY C → QO → Clarinet

  • Best combination → minimize average completion time
  • Computationally intractable
  • Iterative Shortest Job First (SJF) scheduling heuristic:
       1. Pick the shortest physical query plan in each iteration
       2. Reserve bandwidth for it to guarantee its completion time

[Figure: iteration timelines over Link 1 and Link 2 — Iter 1 sees plan durations 10, 12, 18, 5, 8, 20, 30 and picks the 5 s plan, reserving bandwidth for its transfer B1; Iter 2 re-estimates the remaining plans as 15, 17, 18, 25, 30 and schedules transfers A1 and A2]

A code sketch of the heuristic follows.
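A compact sketch of the iterative-SJF loop (heavily simplified, with assumptions of mine: each plan is a single set of transfers, reservations claim whole links exclusively, and inter-stage dependencies are ignored):

```python
# Hedged sketch: iterative shortest-job-first plan selection with
# whole-link reservations. A plan is {link: gigabits to move}; its
# duration is its slowest transfer at full link bandwidth.

def iterative_sjf(queries, bw_gbps):
    """queries: {name: [plan, ...]} -> {name: (plan, start, finish)}."""
    free_at = {link: 0.0 for link in bw_gbps}   # when each link frees up
    remaining = dict(queries)
    schedule = {}
    while remaining:
        best = None
        for name, plans in remaining.items():
            for plan in plans:
                start = max(free_at[link] for link in plan)
                dur = max(g / bw_gbps[link] for link, g in plan.items())
                if best is None or start + dur < best[3]:
                    best = (name, plan, start, start + dur)
        name, plan, start, finish = best
        schedule[name] = (plan, start, finish)
        for link in plan:                        # reserve the plan's links
            free_at[link] = finish
        del remaining[name]
    return schedule

queries = {
    "A": [{"L1": 800, "L2": 400}, {"L1": 1200}],  # two candidate plans
    "B": [{"L2": 200}],
}
print(iterative_sjf(queries, {"L1": 80, "L2": 80}))
# B finishes at 2.5 s; A's first plan then runs on both links, 2.5-12.5 s
```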

SLIDE 13

Avoid fragmentation and improve completion time

  • SJF with bandwidth reservation leads to bandwidth fragmentation

[Figure: timelines over Link 1 and Link 2, scheduled in SJF order — B1 (10 s) holds Link 2 while A1 (12 s) runs on Link 1, so A2 only finishes at 22 s; dominant transfers execute sequentially, with extended idling. An alternate schedule with the same query plans finishes both jobs earlier]

Re-arranging transfers, even when it deviates from the SJF schedule, can help

SLIDE 14

k-Shortest Jobs First Heuristic

[Figure: offline schedule of transfers across Link 1, Link 2, … Link n over time]

  • Identify transfers of the k shortest yet-incomplete jobs
  • Relax their transfer schedule → start as soon as the link is free and the task is available
  • Best k: from prior observations or through offline simulations

A sketch of the relaxation rule follows.
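One way to picture the relaxation (a sketch under a simplified model of my own, not the paper's scheduler): transfers of the k shortest unfinished jobs become work-conserving, while every other transfer keeps its reserved start time.

```python
# Hedged sketch: start-time rule for k-shortest-jobs-first relaxation.

def ksjf_start(job, link, reserved_start, k, job_rank,
               link_free_at, inputs_ready_at):
    """job_rank orders jobs by remaining duration (0 = shortest)."""
    if job_rank[job] < k:
        # Relaxed: start as soon as the link is idle and inputs exist.
        return max(link_free_at[link], inputs_ready_at[job])
    # Otherwise honor the offline reserved start time.
    return max(reserved_start, link_free_at[link])

rank = {"A": 0, "B": 1, "C": 2}   # A is the shortest incomplete job
print(ksjf_start("A", "L1", reserved_start=9, k=2, job_rank=rank,
                 link_free_at={"L1": 4}, inputs_ready_at={"A": 3}))  # 4
```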

SLIDE 15

Clarinet Implementation

Batch of queries (QUERY 1, QUERY 2, QUERY 3) → existing query optimizers (QO) → Clarinet → execution framework

  • Modified Hive to generate multiple plans
       • QOs control the set of generated plans
       • Existing optimizations are applied: Select push-down, partition pruning
  • Modified Tez's DAGScheduler to enforce Clarinet's schedule
  • Handles online query arrivals, with fairness guarantees

SLIDE 16

Evaluation

Compare Clarinet with the following GDA approaches:

  • 1. Hive: WAN-agnostic task placement + scheduling
  • 2. Hive + Iridium: WAN-aware task placement across DCs
  • 3. Hive + Reducers in single DC: distributed filtering + central aggregation

Setup: Geo-Distributed Analytics stack across 10 EC2 regions
Workload: 30 batches of 12 randomly chosen TPC-DS queries
SLIDE 17

Evaluation: Reduction in average completion time

GDA Approach                    Average Gains vs. Hive
Clarinet                        2.7x
Hive + Iridium                  1.5x
Hive + Reducers in single DC    0.6x

Clarinet chooses a different plan for 75% of queries

[Figure: CDFs over WAN links sorted by bandwidth, from a single batch of 12 queries — the WAN bandwidth distribution vs. the Hive and Clarinet bytes distributions]

SLIDE 18

Evaluation: Optimization overhead

  • 1. Generate multiple query plans: up to 64 plans in less than 5 s
  • 2. Iterative multi-query plan selection: max. 15 s for batches with 12 queries

Insignificant w.r.t. query running times (on the order of tens of minutes)

SLIDE 19

Summary

  • WAN-awareness in the QO + cross-layer optimization across the Query Optimizer, Distributed Execution Layer, and Distributed Storage Layer (Clarinet)
  • Presented a scalable way to implement multi-query optimization with minimal overhead
  • 2.7x reduction in average completion time