iLab X Transport Layer


slide-1
SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

iLab X Transport Layer

Dominik Scholz scholz@net.in.tum.de

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

SoSe 2019

slide-2
SLIDE 2

Outline

  • Transport Layer
  • UDP
  • TCP
  • Other Transport Layer Protocols

1/39

slide-4
SLIDE 4

Transport Layer

[Figure: two hosts, each running app 1 and app 2 on top of TCP/UDP, IP, and an Ethernet or WLAN driver, connected through a router between an Ethernet and a wireless LAN. Peer layers talk to each other via the application protocol, transport protocol, IP protocol, and Ethernet/WLAN protocol.]

3/39

slide-5
SLIDE 5

Ports

  • purpose: transport layer multiplexing / demultiplexing
  • 16-bit number (0..65535)
  • address applications on a host

Client/Server communication

  • client-side: usually random choice from [1024..65535]
  • server-side: well known port numbers

Well-known port numbers

  • HTTP/HTTPS: TCP port 80/443
  • SSH: TCP port 22
  • DNS: UDP and TCP port 53

see: http://www.iana.org/assignments/port-numbers
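
Port-based demultiplexing can be sketched as a table lookup. The following is only an illustration: the `listeners` table and the `demultiplex` function are hypothetical names, not part of any real network stack.

```python
# Hypothetical demultiplexing table: (protocol, destination port) -> application.
listeners = {
    ("tcp", 80): "web server",
    ("tcp", 22): "ssh daemon",
    ("udp", 53): "dns resolver",
    ("tcp", 53): "dns resolver",
}


def demultiplex(protocol: str, dst_port: int) -> str:
    """Return the application that should receive a segment, if any."""
    if not 0 <= dst_port <= 0xFFFF:  # ports are 16-bit numbers
        raise ValueError("port out of range")
    return listeners.get((protocol, dst_port), "no listener")


print(demultiplex("tcp", 22))  # ssh daemon
print(demultiplex("udp", 80))  # no listener
```

The protocol is part of the key because TCP port 53 and UDP port 53 are separate endpoints.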

4/39

slide-6
SLIDE 6

Sockets

application layer API to networking functionality usually offered by the OS network stack

Message Orientation

sender                    receiver
send(“Hi Bob!”)           recv() -> “Hi Bob!”
send(“How are you?”)      recv() -> “How are you?”

Stream Orientation

sender                    receiver (possible outcome)
send(“Hi Bob!”)           recv() -> “”
send(“How are you?”)      recv() -> “Hi Bob!How are you?”
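
Because a stream does not preserve message boundaries, applications usually add their own framing. A minimal sketch, assuming a 4-byte length prefix (one common choice; `frame` and `deframe` are illustrative names):

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


def deframe(stream: bytes) -> list[bytes]:
    """Split a received byte stream back into the complete messages it contains."""
    messages, pos = [], 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, pos)
        if pos + 4 + length > len(stream):
            break  # incomplete message: the rest has not arrived yet
        messages.append(stream[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return messages


# Both messages arrive concatenated, as in the stream example above.
stream = frame(b"Hi Bob!") + frame(b"How are you?")
print(deframe(stream))
```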

5/39

slide-7
SLIDE 7

Transport Protocol Implementations

User Datagram Protocol (UDP)

  • unreliable
  • lightweight

Transmission Control Protocol (TCP)

  • reliable
  • connection oriented
  • sending-rate limitation

Other

  • Stream Control Transmission Protocol (SCTP)
  • Multipath TCP (MPTCP)
  • Quick UDP Internet Connections (QUIC)

6/39

slide-8
SLIDE 8

Outline

  • Transport Layer
  • UDP
  • TCP
  • Other Transport Layer Protocols

7/39

slide-10
SLIDE 10

User Datagram Protocol (UDP)

Header (8 bytes, four 16-bit fields): source port, destination port, length, checksum

Functions

  • port multiplexing / demultiplexing
  • error checking
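
The two functions can be illustrated by building the header by hand. This is a simplified sketch: the real UDP checksum also covers an IP pseudo-header, which is left out here, and `internet_checksum` and `build_udp_header` are illustrative names.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement of the ones'-complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack source port, destination port, length and checksum (pseudo-header omitted)."""
    length = 8 + len(payload)  # header plus payload, in bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


header = build_udp_header(12345, 53, b"query")
print(struct.unpack("!HHHH", header))
```

A receiver that runs the same checksum over header and payload obtains 0 if no bit error is detected.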

Example Applications

  • DNS (port 53)
  • RIP (port 520)
  • media streaming / realtime communication

Why is UDP used for these applications?

8/39

slide-11
SLIDE 11

UDP Summary

Characteristics

  • simple and lightweight
  • unreliable
  • message-oriented
  • stateless
  • good choice for time-critical applications
  • supports unidirectional communication

Problems

  • unlimited sending rate may overload the network/receiver

9/39

slide-12
SLIDE 12

Outline

  • Transport Layer
  • UDP
  • TCP
  • Other Transport Layer Protocols

10/39

slide-13
SLIDE 13

Transmission Control Protocol (TCP)

Functions

  • port multiplexing / demultiplexing
  • error checking
  • reliable and ordered delivery
  • stream-orientation
  • control of sending-rate (avoid overloading the network or the receiver)

Applications

  • most protocols that require reliability: HTTP(S), SMTP, etc.

11/39

slide-14
SLIDE 14

Background: Reliable Data Transfer

How does the sender know whether a packet was successfully transferred?

  • requires feedback from the receiver
  • requires identification of packets

Sender → Receiver: segment X, segment Y
Receiver → Sender: ACK segment X, ACK segment Y

12/39

slide-16
SLIDE 16

Reliable Data Transfer in TCP

Sequence Number (SEQ)

  • indicates the first data byte of a segment
  • increased with every byte of payload sent
  • initial SEQ is exchanged during connection establishment

Sender → Receiver: SEQ=5035, SEQ=6059
Receiver → Sender: SEQ=12 ACK=6059, SEQ=12 ACK=7083

What is the size of the segments?

13/39

slide-17
SLIDE 17

Reliable Data Transfer in TCP (contd.)

Acknowledgement Number (ACK)

  • gives the next sequence number that the receiver is expecting
  • also acknowledges all smaller sequence numbers

Sender → Receiver: SEQ=5035, SEQ=6059
Receiver → Sender: SEQ=12 ACK=6059, SEQ=12 ACK=7083
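
The relation between SEQ, ACK and payload size can be worked through with the numbers from the exchange above. A small sketch (the function names are illustrative; sequence numbers wrap at 2^32):

```python
def expected_ack(seq: int, payload_len: int) -> int:
    """ACK number a receiver sends after getting `payload_len` bytes starting at `seq`."""
    return (seq + payload_len) % 2**32


def segment_sizes(seqs: list[int], final_ack: int) -> list[int]:
    """Payload size of each segment, derived from consecutive sequence numbers."""
    boundaries = list(seqs) + [final_ack]
    return [(nxt - cur) % 2**32 for cur, nxt in zip(boundaries, boundaries[1:])]


print(segment_sizes([5035, 6059], final_ack=7083))  # [1024, 1024]
print(expected_ack(5035, 1024))                     # 6059
```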

14/39

slide-18
SLIDE 18

Retransmission after Timeout

  • timeout at the sender triggers retransmission

Sender → Receiver: SEQ=1, SEQ=2
Receiver → Sender: ACK=2
(timeout at the sender)
Sender → Receiver: SEQ=2 (retransmission)

15/39

slide-19
SLIDE 19

Fast Retransmit

  • sender retransmits a segment after receiving three duplicate ACKs

Sender → Receiver: SEQ=1, SEQ=2 (lost), SEQ=3, SEQ=4, SEQ=5
Receiver → Sender: ACK=2, ACK=2, ACK=2, ACK=2 (3 duplicate ACKs)
Sender → Receiver: SEQ=2 (retransmission)
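
Duplicate-ACK counting at the sender might look like the following sketch. It is a simplification that only tracks the ACK numbers seen; `fast_retransmit` is an illustrative name.

```python
def fast_retransmit(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the sequence numbers to retransmit, one for each time
    `threshold` duplicate ACKs for the same number have been seen."""
    last_ack, duplicates = None, 0
    to_retransmit = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                to_retransmit.append(ack)  # the receiver is still waiting for this segment
        else:
            last_ack, duplicates = ack, 0  # new ACK number: reset the counter
    return to_retransmit


print(fast_retransmit([2, 2, 2, 2]))  # [2]
```

The first ACK=2 is the regular acknowledgement of segment 1; only the three that follow count as duplicates.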

16/39

slide-21
SLIDE 21

Connection Establishment

3-way-handshake

  • establish initial sequence numbers and window sizes
  • out-of-band TCP injection: http://arxiv.org/abs/1602.07128
  • negotiate options
  • vulnerable to SYN-flood attacks → SYN cookies, TCPCT

Client → Server: [SYN] SEQ=7
Server → Client: [SYN, ACK] SEQ=13 ACK=8
Client → Server: [ACK] SEQ=8 ACK=14
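
The sequence and acknowledgement numbers of the handshake follow from the two initial sequence numbers, because a SYN consumes one sequence number. A sketch using plain dictionaries as stand-ins for segments (an illustrative representation, not a real API):

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN,ACK",
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % 2**32,  # the SYN counts as one byte
    }
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % 2**32,
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=7, server_isn=13):
    print(segment)
```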

17/39

slide-22
SLIDE 22

Connection Teardown

4-way-handshake

  • each side needs to terminate the connection
    → half-closed connections possible
  • initiator waits for a timeout before closing the connection

Initiator → Receiver: [FIN]
Receiver → Initiator: [ACK]
Receiver → Initiator: [FIN]
Initiator → Receiver: [ACK]
(timeout at the initiator)

18/39

slide-23
SLIDE 23

TCP header

Header fields, in order: source port, destination port, sequence number, acknowledgement number, hdr len, resvd, flags (URG, ACK, PSH, RST, SYN, FIN), window size, checksum, urgent pointer, [options]

  • up to 40 Bytes of header options, e.g. Window Scale, Selective Acknowledgment (SACK)
  • header length: 20 – 60 Bytes
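
The fixed 20-byte part of the header can be decoded with `struct`. A sketch, assuming the classic layout in which the header length sits in the upper 4 bits of byte 12 (counted in 32-bit words) and the six flags in the lower bits of byte 13; `parse_tcp_header` is an illustrative name.

```python
import struct

FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields and slice out the options."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4  # header length is given in 32-bit words
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [name for name, bit in FLAG_BITS.items() if flag_byte & bit],
        "window": window, "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len],
    }


# A hand-built SYN segment without options (header length 5 words = 20 bytes).
syn = struct.pack("!HHIIBBHHH", 50000, 80, 7, 0, 5 << 4, 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```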

19/39

slide-24
SLIDE 24

Limiting the Sending-rate

Why?

  • avoid overloading the receiver → flow control
  • avoid overloading the network → congestion control

Sending Window

  • specifies the amount of unacknowledged data that the sender is allowed to send
  • is equal to the max. number of bytes in transit
  • sending_window = min(receive_window, cwnd)
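
As a small worked example of the formula, the amount a sender may still transmit is the sending window minus the data already in transit. The function name and the byte values below are illustrative.

```python
def usable_window(receive_window: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still send without exceeding the sending window."""
    sending_window = min(receive_window, cwnd)
    return max(0, sending_window - bytes_in_flight)


print(usable_window(receive_window=65535, cwnd=14600, bytes_in_flight=10000))  # 4600
print(usable_window(receive_window=4000, cwnd=14600, bytes_in_flight=10000))   # 0
```

In the first call the congestion window is the limit, in the second the receiver's window.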

20/39

slide-25
SLIDE 25

Flow Control

Flow Control

  • prohibits overloading the receiver
  • receiver announces the current size of the receive_window to the sender in the TCP header window size field

  • limited by the buffer size at the receiver

21/39

slide-28
SLIDE 28

Background: Network Congestion

  • segments get lost due to full buffers in routers
  • retransmissions may even amplify a congestion
  • self-clocking creates an equilibrium at the max. sending-rate:

Jacobson, Van. "Congestion avoidance and control." ACM SIGCOMM Computer Communication Review, 1988.

22/39

slide-29
SLIDE 29

Congestion Control

Principles

  • basic assumption: packet loss is only caused by congestion
  • end-host driven: no support from the network necessary

Two phases

  • Slow Start starts a connection: gradually increase the amount of data in transit until reaching the equilibrium
  • Congestion Avoidance tries to keep the equilibrium state and reacts to changes on the link

State

  • current size of the congestion window (cwnd)
  • slow start threshold (ssthresh) defines transition between phases

23/39

slide-35
SLIDE 35

Congestion Control: Slow Start Phase

  • initialization: cwnd = 10 ∗ MSS, ssthresh
  • when receiving an ACK: cwnd = cwnd + 1 MSS
  • continue until cwnd reaches ssthresh or a packet is lost

[Figure: cwnd [MSS] (axis 20 to 80) over time [RTT] (axis 1 to 5), with ssthresh marked]
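
Adding one MSS per ACK doubles cwnd every RTT, since a full window of segments is acknowledged within one round trip. A sketch of that growth, assuming no loss, one ACK per segment, and a hypothetical ssthresh of 80 MSS:

```python
def slow_start(initial_cwnd: int = 10, ssthresh: int = 80, rtts: int = 5) -> list[int]:
    """cwnd (in MSS) at the start of each RTT during slow start."""
    cwnd, history = initial_cwnd, []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd >= ssthresh:
            break  # slow start ends, congestion avoidance takes over
        acks = cwnd  # one ACK for each segment sent in this RTT
        cwnd = min(cwnd + acks, ssthresh)  # +1 MSS per ACK, capped at ssthresh here
    return history


print(slow_start())  # [10, 20, 40, 80]
```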

24/39

slide-39
SLIDE 39

Congestion Control: Congestion Avoidance Phase

  • when receiving an ACK: increase cwnd using a cubic function
  • slow growth around Wmax enhances stability
  • fast growth away from Wmax increases bandwidth utilization

[Figure: cwnd [MSS] (axis 80 to 160) over time [RTT] (axis 1 to 5), with ssthresh and Wmax marked]
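
The cubic growth can be sketched with the window function from RFC 8312, W(t) = C ∗ (t − K)^3 + Wmax. This is an assumption-laden illustration: C = 0.4 is the RFC's recommended constant, while the decrease factor 0.8 is taken from this lecture's packet-loss rule and differs from the RFC's default.

```python
C = 0.4     # scaling constant (recommended value in RFC 8312)
BETA = 0.8  # window is reduced to BETA * Wmax after a loss (assumed, from this lecture)


def cubic_k(w_max: float) -> float:
    """Time (in seconds) the function needs to climb back to Wmax."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """Congestion window (in MSS) t seconds after the last loss event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100.0
for t in (0.0, 2.0, cubic_k(w_max), 6.0):
    print(f"t={t:.2f}s  cwnd={cubic_window(t, w_max):.1f} MSS")
```

The curve is flat near t = K, where cwnd is close to Wmax, and steep on both sides of it.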

25/39

slide-41
SLIDE 41

Congestion Control: Packet Loss

  • timeout: assumption: the network is congested
    → go to slow start
      ssthresh = 0.8 ∗ last_cwnd
      cwnd = 10 ∗ MSS
  • 3 duplicate ACKs: assumption: only a segment was lost
    → continue congestion avoidance
      ssthresh = 0.8 ∗ last_cwnd
      cwnd = ssthresh + 3 MSS
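
The two reactions can be written down directly. A sketch that counts windows in units of MSS; the function names and the returned phase labels are illustrative.

```python
MSS = 1  # windows are counted in units of MSS


def on_timeout(last_cwnd: float) -> tuple[float, float, str]:
    """Timeout: assume the network is congested and restart with slow start."""
    ssthresh = 0.8 * last_cwnd
    cwnd = 10 * MSS
    return cwnd, ssthresh, "slow start"


def on_triple_duplicate_ack(last_cwnd: float) -> tuple[float, float, str]:
    """3 duplicate ACKs: assume a single lost segment, stay in congestion avoidance."""
    ssthresh = 0.8 * last_cwnd
    cwnd = ssthresh + 3 * MSS
    return cwnd, ssthresh, "congestion avoidance"


print(on_timeout(100))
print(on_triple_duplicate_ack(100))
```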

26/39

slide-42
SLIDE 42

TCP CUBIC

27/39

slide-46
SLIDE 46

A Word of Caution

TCP Congestion Control details differ

  • RFC 2001 (1997), RFC 2581, RFC 5681 (2009): standard
  • CUBIC: original paper [1], RFC 8312
  • Lecture: concepts
  • Linux 3.x: optimized/adapted implementation
  • Linux 4.x: further improvements

[1] Ha et al. "CUBIC: a new TCP-friendly high-speed TCP variant." ACM SIGOPS Operating Systems Review, 2008.

28/39

slide-47
SLIDE 47

TCP CUBIC – Problems

  • congestion indicated only by packet loss
  • keeps the buffers full

Problems

  • vulnerable to random packet loss
  • high latency

29/39

slide-48
SLIDE 48

TCP BBR

  • Bottleneck-Bandwidth and RTT
  • developed by Google, published late 2016
  • available since Linux 4.9
  • two congestion estimators
      • estimated RTT
      • estimated Bottleneck-Bandwidth

→ Bandwidth-Delay-Product (BDP)

30/39
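
How the two estimates combine into the BDP can be sketched as follows. This is a strong simplification: it takes the maximum observed delivery rate as bottleneck bandwidth and the minimum observed RTT, and it ignores the time windows over which BBR applies these filters. All names and sample values are illustrative.

```python
def estimate_bdp(delivery_rates_bps: list[float], rtt_samples_s: list[float]) -> float:
    """Bandwidth-delay product in bytes from rate and RTT samples."""
    bottleneck_bw = max(delivery_rates_bps)  # best rate seen: bottleneck estimate
    min_rtt = min(rtt_samples_s)             # smallest RTT seen: delay without queueing
    return bottleneck_bw * min_rtt / 8       # bits -> bytes


rates = [8e6, 16e6, 12e6]  # bit/s
rtts = [0.18, 0.15, 0.20]  # seconds
print(estimate_bdp(rates, rtts))
```

Keeping roughly one BDP of data in flight fills the bottleneck link without building up a queue.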

slide-49
SLIDE 49

TCP BBR vs. CUBIC

Neal Cardwell et al. "BBR Congestion Control" IETF 97: Seoul. Nov 2016

31/39

slide-52
SLIDE 52

TCP BBR – Problems

  • Young and immature algorithm
  • Actively researched

Problems [2, 3]

  • RTT unfairness
  • Bottleneck overestimation (inter-flow unfairness)
  • Inter-protocol unfairness
  • Inter-flow synchronization

BBR 2.0 announced at IETF 100 in 2017
BBR 2.0 first details presented at IETF 102 in 2018

[2] M. Hock et al. "Experimental Evaluation of BBR Congestion Control." ICNP 2017.
[3] D. Scholz et al. "Towards a Deeper Understanding of TCP BBR Congestion Control." IFIP Networking 2018.

33/39

slide-53
SLIDE 53

TCP Options

Window Scaling

  • default window size max. 65 KB (16-bit field)
  • example: 16 Mbit/s, 150 ms RTT, bandwidth-delay product: 16 Mbit/s ∗ 0.15 s = 2,400 kbit = 300 KB
  • solution: window scaling allows increasing the window size up to 1 GB (the 16-bit value is shifted left by at most 14 bits)
  • window scaling is negotiated during the TCP handshake
  • problem remains: sequence numbers (32-bit) still limit the amount of unacknowledged data
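
The example calculation and the resulting scale factor can be reproduced with a few lines. A sketch, assuming the maximum shift of 14 bits defined for the Window Scale option; the function names are illustrative.

```python
def bandwidth_delay_product(bits_per_second: float, rtt_seconds: float) -> float:
    """Bytes that must be in transit to keep the link busy."""
    return bits_per_second * rtt_seconds / 8


def required_window_shift(window_bytes: float, max_shift: int = 14) -> int:
    """Smallest shift so that the scaled 16-bit window covers `window_bytes`."""
    shift = 0
    while (0xFFFF << shift) < window_bytes and shift < max_shift:
        shift += 1
    return shift


bdp = bandwidth_delay_product(16_000_000, 0.15)  # 16 Mbit/s, 150 ms RTT
print(bdp, required_window_shift(bdp))
```

For the 300 KB example a shift of 3 is enough, because 65535 ∗ 8 is about 524 KB.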

Selective Acknowledgements (SACK)

  • allow the receiver to acknowledge ranges of segments
  • avoid unnecessary retransmissions compared to cumulative ACKs
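
With SACK the sender can work out exactly which byte ranges are missing. A sketch that ignores sequence number wrap-around; `missing_ranges` and the example numbers are illustrative.

```python
def missing_ranges(cumulative_ack: int, sack_blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Byte ranges [start, end) that the receiver has not reported as received."""
    gaps, next_needed = [], cumulative_ack
    for start, end in sorted(sack_blocks):
        if start > next_needed:
            gaps.append((next_needed, start))  # hole before this SACK block
        next_needed = max(next_needed, end)
    return gaps


# The receiver has everything up to 2000, plus 3000..4000 and 5000..6000.
print(missing_ranges(2000, [(3000, 4000), (5000, 6000)]))  # [(2000, 3000), (4000, 5000)]
```

With cumulative ACKs alone the sender would only learn that byte 2000 is missing and might resend data the receiver already holds.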

34/39

slide-54
SLIDE 54

TCP Summary

Characteristics

  • complex
  • reliable → head-of-line blocking
  • stream-oriented
  • sending-rate adaptation

Problems

  • vulnerable to resource exploitation
  • congestion control may be too restrictive, e.g. wireless networks

35/39

slide-55
SLIDE 55

Outline

  • Transport Layer
  • UDP
  • TCP
  • Other Transport Layer Protocols

36/39

slide-56
SLIDE 56

Stream Control Transmission Protocol (SCTP)

first standardized in 2000 by RFC 2960

  • TCP/UDP hybrid: reliable, optional ordering, message-oriented
  • permits reliable, unordered delivery
  • other features: multihoming, 4-way-handshake, etc.

Problems:

  • requires changes in application implementations
  • lack of support in middleboxes (firewalls, NATs, etc.)

37/39

slide-57
SLIDE 57

Multipath TCP (MPTCP)

  • standardized in 2013 by RFC 6824
  • can use multiple interfaces/links simultaneously
  • goal: improve resource utilization, throughput and reliability
  • mimics standard TCP, even offers a fallback mode

38/39

slide-58
SLIDE 58

Quick UDP Internet Connections (QUIC)

  • developed by Google, implemented in Chrome Browser, released in 2013
  • UDP-based protocol that implements reliability, congestion control, multiple streams, encryption etc.
  • goal: reduced latency (compared to TCP + TLS)
  • mimics UDP (middlebox support)

39/39