

SLIDE 1

AC⚡DC TCP: Virtual Congestion Control Enforcement for Datacenter Networks

Keqiang He, Eric Rozner, Kanak Agarwal, Yu Gu, Wes Felter, John Carter, Aditya Akella

1

SLIDE 2

Datacenter Network Congestion Control

  • Congestion is not rare in datacenter networks [Singh, SIGCOMM’15]
  • Tail latency is huge
  • 99.9th-percentile latency is orders of magnitude higher than the median [Mogul, HotOS’15]
  • Queueing latency is the major contributor [Jang, SIGCOMM’15]
  • New datacenter TCP congestion control schemes have been proposed
  • E.g., DCTCP, TIMELY, DCQCN, TCP-Bolt, ICTCP, etc

2

SLIDE 3

But We Cannot Control VM TCP Stacks

  • In multi-tenant datacenters, admins cannot control VM TCP stacks
  • Because VMs are set up and managed by different entities

3

[Figure: a multi-tenant server. The provider manages the virtualization infrastructure (servers, storage, networking); Tenant 1, Tenant 2, and Tenant 3 VMs each run their own TCP/IP stack.]

As a result, outdated, inefficient, or misconfigured TCP stacks may be running inside the VMs. This leads to two main problems.

SLIDE 4

Problem #1: Large Queueing Latency

4

[Figure: sender → switch queue → receiver, with packets queueing at the switch]

With no queueing, TCP RTT is around 60 to 200 microseconds; with packet queueing, TCP RTT can reach tens of milliseconds.

SLIDE 5

Problem #2: TCP Unfairness

  • ECN and non-ECN coexistence problem [Judd, NSDI’15]
  • Non-ECN: e.g., CUBIC
  • ECN: e.g., DCTCP

5

SLIDE 6

Problem #2: TCP Unfairness (cont.)

  • Different congestion control algorithms lead to unfairness

6

[Figure: dumbbell topology, senders → receivers; 5 flows with different CC algorithms congest a 10G link]

CC: Congestion Control

SLIDE 7

AC⚡DC TCP: Administrator Control over Data Center TCP

7

Implements TCP congestion control in the virtual switch. Ensures VM TCP stacks cannot impact the network.

SLIDE 8

AC⚡DC: High Level View

8

[Figure: high-level view of a server. Virtual machines (Apps on an OS) attach via vNICs to the vSwitch; AC/DC sits in the vSwitch data path at both sender and receiver, applying uniform per-flow CC and exchanging per-flow CC feedback across the datacenter network, alongside a separate control plane.]

Case study: DCTCP CC in the vSwitch

SLIDE 9

AC⚡DC Benefits

  • No modifications to VMs or hardware
  • Low latency provided by state-of-the-art CC algorithms
  • Improved TCP fairness, supporting both ECN and non-ECN flows
  • Enforce per-flow differentiation via congestion control, e.g.,
  • East-west and north-south flows can use different CCs (web server)
  • Give higher priority to “mission-critical” traffic (backend VM)

9

SLIDE 10

AC⚡DC Design

  • Obtaining Congestion Control State
  • DCTCP Congestion Control in the vSwitch
  • Enforcing Congestion Control
  • Per-flow Differentiation via Congestion Control

10

SLIDE 11

Obtaining Congestion Control State

  • Per-flow connection tracking
  • All traffic goes through the virtual switch
  • We can reconstruct CC state by monitoring all the packets of a connection
  • Maintain per-flow congestion control variables
  • E.g., CC-related sequence numbers, dupack counter, etc.

11

[Diagram: packet → flow classification → updating CC variables]
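The slides do not show AC/DC's actual data structures, so here is only a minimal C sketch of the kind of per-flow state a vSwitch could keep for this (all type, field, and function names are illustrative assumptions, not AC/DC's real code):

    /* Hypothetical per-flow congestion-control state tracked by the vSwitch. */
    #include <stdint.h>

    struct flow_key {                  /* 5-tuple used for flow classification */
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    struct flow_cc_state {
        struct flow_key key;
        uint32_t snd_una;              /* highest cumulatively ACKed sequence    */
        uint32_t snd_nxt;              /* highest sequence sent by the VM        */
        uint32_t wnd;                  /* window computed by the vSwitch, bytes  */
        uint32_t dupack_cnt;           /* duplicate-ACK counter                  */
        uint32_t ecn_bytes;            /* CE-marked bytes seen this RTT          */
        uint32_t total_bytes;          /* all bytes ACKed this RTT               */
        uint32_t alpha;                /* DCTCP congestion estimate, fixed point */
    };

    /* Invoked for every ACK of a tracked connection after flow classification. */
    static void update_cc_vars(struct flow_cc_state *f, uint32_t ack_seq, int is_dupack)
    {
        if (is_dupack) {
            f->dupack_cnt++;                 /* count duplicate ACKs for loss detection  */
        } else if (ack_seq > f->snd_una) {   /* new data ACKed (wraparound ignored here) */
            f->snd_una = ack_seq;
            f->dupack_cnt = 0;
        }
    }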

SLIDE 12

DCTCP Congestion Control in the vSwitch

  • Universal ECN marking
  • Get ECN feedback

12

SLIDE 13

Universal ECN Marking

  • Why?
  • Not all VMs run ECN-Capable Transports (ECT) like DCTCP
  • Universal ECN Marking
  • All packets entering the fabric are marked ECN-capable (ECT) by the virtual switch
  • Solves the ECN and non-ECN coexistence problem

13
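As a rough sketch of what universal ECN marking involves at the packet level: setting the ECN codepoint (RFC 3168) in the IPv4 TOS byte to ECT(0) for every packet the vSwitch sends into the fabric. The helper name and header slice below are assumptions for illustration, not AC⚡DC's code:

    #include <stdint.h>

    #define ECN_MASK  0x03u   /* low two bits of the TOS byte carry the ECN codepoint */
    #define ECN_ECT0  0x02u   /* ECT(0): ECN-Capable Transport */

    struct ipv4_hdr_slice {
        uint8_t version_ihl;
        uint8_t tos;          /* DSCP (high 6 bits) + ECN (low 2 bits) */
        /* ... remaining IPv4 header fields omitted ... */
    };

    /* Mark a packet as ECN-capable regardless of what the VM's stack set. */
    static void mark_ect0(struct ipv4_hdr_slice *ip)
    {
        ip->tos = (uint8_t)((ip->tos & ~ECN_MASK) | ECN_ECT0);
        /* In a real datapath, the IPv4 header checksum must also be updated
         * (incrementally or by recomputation) after changing the TOS byte. */
    }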

SLIDE 14

Get ECN Feedback

14

[Figure: sender side → congested switch → receiver side; packets arrive with Congestion Experienced (CE) marked]

We need a way to carry the congestion information back.

SLIDE 15

Get ECN Feedback

15

[Figure: AC/DC modules at sender and receiver with a congested switch in between; packets arrive at the receiver with Congestion Experienced (CE) marked]

Congestion feedback is encoded as 8 bytes: {ECN_bytes, Total_bytes}, piggybacked on an existing TCP ACK (PACK).
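A minimal sketch of that 8-byte encoding. How the feedback is actually carried inside the PACK (e.g., in which header field or option) is not spelled out here, so only the serialization of {ECN_bytes, Total_bytes} is shown; the names are illustrative:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    struct ecn_feedback {
        uint32_t ecn_bytes;    /* bytes received with CE marked since last feedback */
        uint32_t total_bytes;  /* all bytes received since last feedback            */
    };

    /* Serialize the feedback into 8 bytes in network byte order, ready to be
     * piggybacked on an outgoing ACK. */
    static void encode_feedback(const struct ecn_feedback *fb, uint8_t buf[8])
    {
        uint32_t ecn   = htonl(fb->ecn_bytes);
        uint32_t total = htonl(fb->total_bytes);
        memcpy(buf,     &ecn,   sizeof(ecn));
        memcpy(buf + 4, &total, sizeof(total));
    }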

SLIDE 16

DCTCP Congestion Control in the vSwitch

16

[Flowchart: DCTCP congestion control law in the vSwitch]

Incoming ACK → extract CC info if it is a PACK → update connection-tracking variables → update ⍺ once every RTT. On loss, set ⍺ = max_alpha. On congestion or loss, if the window was not already cut in the last RTT, set wnd = wnd × (1 − ⍺/2); otherwise run tcp_cong_avoid(). AC/DC then enforces CC on the flow and sends the ACK to the VM.
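A compact C sketch of this control law, assuming ⍺ is kept in fixed point and the ECN/total byte counters come from the piggybacked feedback. The scaling, gain, and helper names are assumptions for illustration, not the actual AC/DC code:

    #include <stdint.h>

    #define ALPHA_SCALE   1024u  /* fixed-point representation of 1.0 */
    #define DCTCP_G_SHIFT 4      /* EWMA gain g = 1/16 (a common DCTCP choice) */

    struct dctcp_flow {
        uint32_t wnd;            /* window enforced on the flow, in bytes      */
        uint32_t alpha;          /* congestion estimate, scaled by ALPHA_SCALE */
        uint32_t ecn_bytes;      /* CE-marked bytes ACKed this RTT             */
        uint32_t total_bytes;    /* all bytes ACKed this RTT                   */
        int      cut_this_rtt;   /* window already reduced in the last RTT?    */
    };

    /* Once per RTT: update alpha from the observed marking fraction. */
    static void dctcp_update_alpha(struct dctcp_flow *f)
    {
        uint32_t frac = f->total_bytes ?
            (uint32_t)(((uint64_t)f->ecn_bytes * ALPHA_SCALE) / f->total_bytes) : 0;
        /* alpha = (1 - g) * alpha + g * frac */
        f->alpha = f->alpha - (f->alpha >> DCTCP_G_SHIFT) + (frac >> DCTCP_G_SHIFT);
        f->ecn_bytes = f->total_bytes = 0;
        f->cut_this_rtt = 0;
    }

    /* On each incoming ACK, after connection-tracking variables are updated. */
    static void dctcp_on_ack(struct dctcp_flow *f, int congestion, int loss, uint32_t mss)
    {
        if (loss)
            f->alpha = ALPHA_SCALE;                 /* alpha = max_alpha */
        if ((congestion || loss) && !f->cut_this_rtt) {
            /* wnd = wnd * (1 - alpha/2), cut at most once per RTT */
            f->wnd -= (uint32_t)(((uint64_t)f->wnd * f->alpha) / (2 * ALPHA_SCALE));
            f->cut_this_rtt = 1;
        } else if (!congestion && !loss) {
            f->wnd += mss;                          /* stand-in for tcp_cong_avoid() */
        }
    }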

SLIDE 17

Enforcing Congestion Control

  • TCP sends min(CWND, RWND)
  • CWND is the congestion window (congestion control)
  • RWND is the receiver’s advertised window (flow control)
  • AC⚡DC reuses RWND for congestion control purposes (a sketch of the window rewrite follows below)
  • VMs with unaltered TCP stacks will naturally follow our enforcement
  • Non-conforming flows can be policed by dropping any excess packets not allowed by the calculated congestion window
  • Loss has to be recovered end-to-end; this incentivizes tenants to respect standards

17
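A rough sketch of what the enforcement could look like when an ACK is handed up to the sending VM: the receive-window field is overwritten with min(advertised RWND, window computed by the vSwitch CC). The window-scale handling and all names below are assumptions for illustration only:

    #include <stdint.h>
    #include <arpa/inet.h>

    struct tcp_hdr_slice {
        /* ... earlier TCP header fields omitted ... */
        uint16_t window;   /* advertised receive window, network byte order */
    };

    /* Rewrite RWND in an ACK destined for the sending VM so that an unmodified
     * stack, which sends min(CWND, RWND), naturally obeys the enforced window. */
    static void enforce_window(struct tcp_hdr_slice *tcp,
                               uint32_t cc_wnd_bytes,  /* window from vSwitch CC        */
                               uint8_t  wscale)        /* negotiated window-scale shift */
    {
        uint32_t advertised = (uint32_t)ntohs(tcp->window) << wscale;
        uint32_t enforced   = advertised < cc_wnd_bytes ? advertised : cc_wnd_bytes;

        tcp->window = htons((uint16_t)(enforced >> wscale));
        /* The TCP checksum must then be fixed up; with NIC checksum offload the
         * hardware recomputes it, so no software recalculation is needed here. */
    }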

SLIDE 18

Control Law for Per-flow Differentiation

18

DCTCP:       rwnd = rwnd × (1 − ⍺/2)
AC⚡DC TCP:  rwnd = rwnd × (1 − (⍺ − ⍺·β/2))

When β is close to 1, it becomes DCTCP. When β is close to 0, it backs off aggressively. Larger β for higher-priority traffic.
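A tiny numeric illustration of this knob (floating point and the function name are used purely for clarity here, not how a datapath would implement it):

    #include <stdio.h>

    /* rwnd = rwnd * (1 - (alpha - alpha*beta/2)); beta in [0, 1] is the priority knob. */
    static double acdc_backoff(double rwnd, double alpha, double beta)
    {
        return rwnd * (1.0 - (alpha - alpha * beta / 2.0));
    }

    int main(void)
    {
        double rwnd = 100000.0, alpha = 0.5;
        printf("beta=1.0 (DCTCP-like): %.0f bytes\n", acdc_backoff(rwnd, alpha, 1.0)); /* 75000 */
        printf("beta=0.0 (aggressive): %.0f bytes\n", acdc_backoff(rwnd, alpha, 0.0)); /* 50000 */
        return 0;
    }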

SLIDE 19

Implementation

  • Prototype implementation in Open vSwitch kernel datapath
  • ~1200 LoC added
  • Our design leverages available techniques to improve performance
  • RCU-enabled hash tables to perform connection tracking
  • AC⚡DC manipulates TCP segments, instead of MTU-sized packets
  • AC⚡DC leverages NIC checksumming so the TCP checksum does not have to be recomputed after header fields are modified

19

[Figure: VM1 and VM2 TCP/IP stacks above the hypervisor's AC⚡DC module, above the NIC. AC⚡DC manipulates TCP segments (TSO); the NIC recalculates the TCP checksum.]

SLIDE 20

Evaluation

  • Testbed: 17 servers (6-core, 60 GB memory), six 10 Gbps switches
  • Microbenchmark topologies

20

[Figure: incast topology (many senders → one receiver) and dumbbell topology (senders → receivers)]

SLIDE 21

Evaluation

  • Macrobenchmark topology
  • Metrics: TCP RTT, loss rate, Flow Completion Time (FCT)

21

17 servers attached to a 10G switch.

SLIDE 22

Experiment Setting (three schemes compared)

  • CUBIC
  • CUBIC stack on top of standard OVS
  • DCTCP
  • DCTCP stack on top of standard OVS
  • AC⚡DC
  • CUBIC/Reno/Vegas/HighSpeed/Illinois stacks on top of AC⚡DC

22

[Figure: three configurations. CUBIC: VMs run CUBIC over standard OVS. DCTCP: VMs run DCTCP over standard OVS. AC⚡DC: VMs run any stack over AC⚡DC in the hypervisor.]

SLIDE 23

Tracking Window Size

23

Running a DCTCP stack on top of AC⚡DC, which only outputs the calculated RWND without enforcing it: AC⚡DC closely tracks the window size of DCTCP. [Dumbbell topology: senders → receivers]

SLIDE 24

Convergence

24

[Convergence plots: CUBIC, DCTCP, AC/DC]

AC/DC has convergence properties comparable to DCTCP and better than CUBIC. [Dumbbell topology: senders → receivers]

SLIDE 25

AC⚡DC improves fairness when VMs use different CCs

25

[Figure: fairness of competing flows under standard OVS vs. AC⚡DC; dumbbell topology, senders → receivers]

SLIDE 26

Overhead (CPU and Memory)

26

[Plot: sender-side CPU overhead]

Less than 1% additional CPU overhead compared with the baseline. Each connection uses 320 bytes to maintain CC variables (10k connections use 3.2 MB). [Dumbbell topology: senders → receivers]

SLIDE 27

TCP Incast RTT & drop rate

27

[Plots: 50th-percentile RTT, 99.9th-percentile RTT, packet drop rate]

AC⚡DC tracks the performance of DCTCP closely. [Incast topology: senders → receiver]

SLIDE 28

Flow completion time with trace-driven workloads

28

[Plots: web-search workload (from DCTCP) and data-mining workload (from CONGA)]

AC⚡DC obtains the same performance as DCTCP and reduces FCT by 36%–76% compared with default CUBIC. [17 servers attached to a 10G switch]

SLIDE 29

Summary

  • AC⚡DC allows administrators to regain control over arbitrary tenant TCP stacks by enforcing congestion control in the virtual switch

  • AC⚡DC requires no changes to VMs or network hardware
  • AC⚡DC is scalable, lightweight (< 1% CPU overhead) and flexible

29

SLIDE 30

Thanks!

30

SLIDE 31

Backup Slides

31

SLIDE 32

Related Work

  • DCTCP
  • ECN-based congestion control for DCNs
  • TIMELY
  • Latency-based congestion control for DCNs
  • Accurate latency measurements provided by precise NIC timestamps
  • vCC
  • vCC and AC⚡DC are closely related works developed by two independent teams

32

SLIDE 33

ECN and non-ECN Coexistence

33

[Figure: switch configured with WRED/ECN, serving both ECN and non-ECN traffic]

When queue occupancy exceeds the marking threshold, non-ECN packets are dropped.

SLIDE 34

IPSec

  • AC⚡DC is not able to inspect the TCP headers for IPSec traffic
  • It may instead perform approximate rate limiting based on congestion feedback information

34