Information-Agnostic Flow Scheduling for Commodity Data Centers




SLIDE 1

Information-Agnostic Flow Scheduling for Commodity Data Centers

Wei Bai, Li Chen, Kai Chen, Dongsu Han (KAIST), Chen Tian (NJU), Hao Wang
Sing Group @ Hong Kong University of Science and Technology

USENIX NSDI 2015, Oakland, USA

SLIDE 2

Data Center Transport

  • Cloud applications

– Desire low latency for short messages

  • Goal: Minimize flow completion time (FCT)

– Many flow scheduling proposals…

SLIDE 3

The State-of-the-art Solutions

  • PDQ [SIGCOMM’12]
  • pFabric [SIGCOMM’13]
  • PASE [SIGCOMM’14]

All assume prior knowledge of flow size information to approximate ideal preemptive Shortest Job First (SJF) with customized network elements

SLIDE 4

The State-of-the-art Solutions

Prior knowledge of flow size information: not feasible for some applications

SLIDE 5

The State-of-the-art Solutions

Customized network elements: hard to deploy in practice

SLIDE 6

Question

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

SLIDE 7

Design Goal 1

Information-agnostic: do not assume that a priori flow size information is available from the applications

SLIDE 8

Design Goal 2

FCT minimization: minimize the average and tail FCTs of short flows without adversely affecting the FCTs of large flows

SLIDE 9

Design Goal 3

Readily deployable: work with existing commodity switches and remain compatible with legacy network stacks

SLIDE 10

Question

Without prior knowledge of flow size information, how to minimize FCT in commodity data centers?

Our answer: PIAS

SLIDE 11

PIAS’S DESIGN

SLIDE 12

Design Rationale

  • PIAS performs Multi-Level Feedback Queue (MLFQ) scheduling to emulate Shortest Job First

[Figure: K priority queues, from Priority 1 (high) to Priority K (low)]

SLIDE 13

Design Rationale

  • PIAS performs Multi-Level Feedback Queue (MLFQ) scheduling to emulate Shortest Job First

[Figure: K priority queues, Priority 1 to Priority K]

SLIDE 14

Simple Example Illustrating PIAS

Congestion

SLIDE 15

Simple Example Illustrating PIAS

Flow 1 with 10 packets and flow 2 with 2 packets arrive

[Figure: four queues, Priority 1 to Priority 4]

SLIDE 16

Simple Example Illustrating PIAS

Flows 1 and 2 transmit simultaneously

SLIDE 17

Simple Example Illustrating PIAS

Flow 2 finishes while flow 1 is demoted to priority 2

SLIDE 18

Simple Example Illustrating PIAS

Flow 3 with 2 packets arrives

SLIDE 19

Simple Example Illustrating PIAS

Flows 3 and 1 transmit simultaneously

SLIDE 20

Simple Example Illustrating PIAS

Flow 3 finishes while flow 1 is demoted to priority 3

SLIDE 21

Simple Example Illustrating PIAS

Flow 4 with 2 packets arrives

SLIDE 22

Simple Example Illustrating PIAS

Flows 4 and 1 transmit simultaneously

SLIDE 23

Simple Example Illustrating PIAS

Flow 4 finishes while flow 1 is demoted to priority 4

SLIDE 24

Simple Example Illustrating PIAS

Eventually, flow 1 finishes in priority 4

With MLFQ, PIAS can emulate Shortest Job First without prior knowledge of flow size information
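The walkthrough above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the PIAS implementation: it models a single bottleneck link that sends one packet per time slot, uses strict priority across queues with round-robin inside a queue, and picks hypothetical demotion thresholds of 2, 4, and 6 packets. The arrival slots and the names `simulate_mlfq`, `flows`, and `thresholds` are illustrative only.

```python
def simulate_mlfq(flows, thresholds):
    """Simulate MLFQ on one link that sends one packet per time slot.

    flows: dict mapping flow name -> (arrival_slot, size_in_packets), size >= 1
    thresholds: ascending packet counts; crossing one demotes the flow a level
    Returns a dict mapping flow name -> slot at which the flow finished.
    """
    sent = {name: 0 for name in flows}          # packets sent so far per flow
    last_served = {name: -1 for name in flows}  # used for round-robin ties
    finish = {}

    def priority(name):
        # 0 is the highest priority; each crossed threshold demotes by one.
        return sum(sent[name] >= t for t in thresholds)

    slot = 0
    while len(finish) < len(flows):
        active = [n for n, (arrival, _) in flows.items()
                  if arrival <= slot and n not in finish]
        if active:
            # Strict priority first, then the least recently served flow.
            chosen = min(active, key=lambda n: (priority(n), last_served[n]))
            sent[chosen] += 1
            last_served[chosen] = slot
            if sent[chosen] == flows[chosen][1]:
                finish[chosen] = slot + 1
        slot += 1
    return finish


# Hypothetical arrivals: one 10-packet flow and three 2-packet flows.
example = {"f1": (0, 10), "f2": (0, 2), "f3": (4, 2), "f4": (8, 2)}
print(simulate_mlfq(example, thresholds=[2, 4, 6]))
```

Under these assumptions the 2-packet flows finish at slots 4, 6, and 10, and the 10-packet flow finishes last at slot 16, even though no flow size was given to the scheduler in advance.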

SLIDE 25

How to implement?

[Figure: K priority queues, Priority 1 to Priority K]

  • Strict priority queueing on switches
  • Packet tagging as a shim layer at end hosts
  • $K$ priorities: $P_i$ ($1 \le i \le K$)
  • $K-1$ demotion thresholds: $\alpha_j$ ($1 \le j \le K-1$)
  • The threshold to demote priority from $P_{j-1}$ to $P_j$ is $\alpha_{j-1}$
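As a rough illustration of how a list of demotion thresholds maps a flow's bytes sent to a priority, the sketch below uses a binary search over the sorted thresholds. The function name `priority_for`, the example threshold values, and the choice to demote as soon as bytes sent reach a threshold (rather than exceed it) are assumptions made for illustration.

```python
from bisect import bisect_right


def priority_for(bytes_sent, thresholds):
    """Return the 1-based priority (1 is highest) for a flow.

    thresholds must be sorted in ascending order. With n thresholds there
    are n + 1 priorities; each threshold reached demotes the flow one level.
    """
    return bisect_right(thresholds, bytes_sent) + 1


# Hypothetical thresholds in bytes, giving three priorities.
THRESHOLDS = [20_000, 1_000_000]
for sent in (0, 19_999, 20_000, 5_000_000):
    print(sent, "->", priority_for(sent, THRESHOLDS))
```

A new flow starts in the highest priority and only moves down as it sends more bytes, so no flow size has to be known when the first packet is tagged.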

SLIDE 26

How to implement?

[Figure: K priority queues, Priority 1 to Priority K]

  • Strict priority queueing on switches
  • Packet tagging as a shim layer at end hosts

SLIDE 27

How to implement?

[Figure: K priority queues, Priority 1 to Priority K]

  • Strict priority queueing on switches
  • Packet tagging as a shim layer at end hosts

SLIDE 28

How to implement?

[Figure: K priority queues, Priority 1 to Priority K]

  • Strict priority queueing on switches
  • Packet tagging as a shim layer at end hosts

SLIDE 29

Determine Thresholds

  • Thresholds depend on:

– Flow size distribution
– Traffic load

  • Solution:

– Solve an FCT minimization problem to calculate demotion thresholds

  • Problem:

– Traffic is highly dynamic

Traffic variations -> Mismatched thresholds

SLIDE 30

Impact of Mismatches

[Figure: a high-priority and a low-priority queue; flows of 10MB, 10MB, and 20KB]

SLIDE 31

Impact of Mismatches

  • When the threshold is perfect (20KB)

SLIDE 32

Impact of Mismatches

  • When the threshold is too small (10KB)

Increased latency for short flows

SLIDE 33

Impact of Mismatches

  • When the threshold is too large (1MB)

Increased latency for short flows

Leverage ECN to keep buffer occupancy low

SLIDE 34

Handle Mismatches

  • When the threshold is too small (10KB)

If we enable ECN, ECN can keep latency low

SLIDE 35

Handle Mismatches

  • When the threshold is too large (1MB)

If we enable ECN, ECN can keep latency low
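To make the switch side concrete, here is a minimal sketch of one output port that combines strict priority queueing with ECN marking. It is an illustration under assumptions, not a description of any particular switch: packets are plain dictionaries, marking is decided at enqueue time from the instantaneous occupancy of the whole port, and the `Port` class and `ecn_threshold` parameter are hypothetical names.

```python
from collections import deque


class Port:
    """One switch output port with strict priority queues and ECN marking."""

    def __init__(self, num_priorities, ecn_threshold):
        self.queues = [deque() for _ in range(num_priorities)]
        self.ecn_threshold = ecn_threshold  # in packets, for the whole port

    def occupancy(self):
        # Total number of packets buffered across all priority queues.
        return sum(len(q) for q in self.queues)

    def enqueue(self, packet):
        # Mark Congestion Experienced when the port buffer is at or above
        # the threshold, then place the packet in the queue for its tag.
        if self.occupancy() >= self.ecn_threshold:
            packet["ce"] = True
        self.queues[packet["priority"] - 1].append(packet)

    def dequeue(self):
        # Strict priority: always serve the highest non-empty queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


port = Port(num_priorities=2, ecn_threshold=2)
port.enqueue({"id": "long-1", "priority": 2, "ce": False})
port.enqueue({"id": "long-2", "priority": 2, "ce": False})
port.enqueue({"id": "short-1", "priority": 1, "ce": False})
print(port.dequeue())  # the priority 1 packet leaves first
```

Because marking keeps the buffer short, a flow that sits in the wrong queue after a threshold mismatch waits behind fewer packets than it would with a full buffer.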

SLIDE 36

PIAS in 1 Slide

  • PIAS packet tagging

– Maintain flow states and mark packets with priority

  • PIAS switches

– Enable strict priority queueing and ECN

  • PIAS rate control

– Employ Data Center TCP to react to ECN
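For the rate control piece, the following sketch shows the window update that DCTCP applies once per window of data: it keeps a running estimate of the fraction of ECN-marked packets and cuts the congestion window in proportion to it. The function name `dctcp_update` and its argument layout are assumptions for illustration; the gain `g = 1/16` is the value suggested in the DCTCP paper.

```python
def dctcp_update(alpha, cwnd, marked, total, g=1 / 16):
    """Apply one DCTCP update at the end of a window of data.

    alpha:  running estimate of the fraction of marked packets
    cwnd:   congestion window in packets
    marked: ACKs in this window that carried an ECN echo
    total:  all ACKs in this window
    """
    fraction = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        # Reduce in proportion to congestion: a light cut when few packets
        # were marked, up to halving when every packet was marked.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd


alpha, cwnd = 0.0, 100.0
alpha, cwnd = dctcp_update(alpha, cwnd, marked=8, total=16)
print(alpha, cwnd)
```

A sender that sees only occasional marks backs off slightly, which is how queues stay short without sacrificing throughput for large flows.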

SLIDE 37

Testbed Experiments

  • PIAS prototype

– http://sing.cse.ust.hk/projects/PIAS

  • Testbed Setup

– A Gigabit Pronto-3295 switch
– 16 Dell servers

  • Benchmarks

– Web search (DCTCP paper)
– Data mining (VL2 paper)
– Memcached

SLIDE 38

Small Flows (<100KB)

[Figure panels: Web Search, Data Mining]

Compared to DCTCP, PIAS reduces the average FCT of small flows by up to 47% and 45% in the web search and data mining workloads, respectively

SLIDE 39

NS2 Simulation Setup

  • Topology

– 144-host leaf-spine fabric with 10G/40G links

  • Workloads

– Web search (DCTCP paper)
– Data mining (VL2 paper)

  • Schemes

– Information-agnostic: PIAS, DCTCP and L2DCT
– Information-aware: pFabric

SLIDE 40

Overall Performance

[Figure panels: Web Search, Data Mining]

PIAS has an obvious advantage over DCTCP and L2DCT in both workloads.

SLIDE 41

Small Flows (<100KB)

Simulations confirm testbed experiment results

[Figure panels: Web Search, Data Mining]

40%-50% improvement

SLIDE 42

Comparison with pFabric

PIAS has only a 4.9% performance gap to pFabric for small flows in the data mining workload

[Figure panels: Web Search, Data Mining]

SLIDE 43

Conclusion

  • PIAS: practical and effective

– Information-agnostic: does not assume flow information from applications
– FCT minimization: enforces Multi-Level Feedback Queue scheduling
– Readily deployable: uses commodity switches and legacy network stacks

SLIDE 44

Thanks!

SLIDE 45

Starvation

  • Measurement

– 5000 flows, 5.7 million MTU-sized packets
– 200 timeouts, 31 cases of two consecutive timeouts

  • Solutions

– Per-port ECN pushes back high priority flows when many low priority flows get starved
– Treat a long-term starved flow as a new flow

SLIDE 46

Persistent Connections

  • Solution: periodically reset flow states based on more behaviors of traffic

– When a flow idles for some time, reset the bytes sent of this flow to 0
– Define a flow as packets demarcated by incoming packets with payload within a single connection
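The idle-reset rule can be sketched as a small end-host tagger that keeps per-flow state. This is a hypothetical illustration rather than the PIAS prototype code: the `FlowTagger` class, the `idle_timeout` parameter, and the threshold values are all assumptions, and time is passed in explicitly to keep the example deterministic.

```python
from bisect import bisect_right


class FlowTagger:
    """Track bytes sent per flow and return a priority tag for each packet."""

    def __init__(self, thresholds, idle_timeout):
        self.thresholds = thresholds      # ascending demotion thresholds, bytes
        self.idle_timeout = idle_timeout  # seconds of silence before a reset
        self.flows = {}                   # flow_id -> [bytes_sent, last_seen]

    def tag(self, flow_id, payload_len, now):
        state = self.flows.setdefault(flow_id, [0, now])
        if now - state[1] > self.idle_timeout:
            # The connection was idle: treat what follows as a new flow.
            state[0] = 0
        priority = bisect_right(self.thresholds, state[0]) + 1
        state[0] += payload_len
        state[1] = now
        return priority


tagger = FlowTagger(thresholds=[3000], idle_timeout=0.5)
for t in (0.0, 0.1, 0.2, 1.0):
    print(t, tagger.tag("conn-1", 1500, now=t))
```

In this run the third packet is demoted because 3000 bytes have already been sent, and the packet at time 1.0 returns to the highest priority because the connection was idle for longer than the timeout.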
