Chair of Network Architectures and Services Department of Informatics Technical University of Munich
iLab X Transport Layer
Dominik Scholz, scholz@net.in.tum.de
Outline
- Transport Layer
- UDP
- TCP
- Other Transport Layer Protocols
Transport Layer
[Figure: two hosts, each running app 1 and app 2 over TCP/UDP, IP, and an Ethernet or WLAN driver, connected through a router (IP over Ethernet and WLAN drivers); the application and transport protocols run end to end, while the IP, Ethernet, and WLAN protocols operate hop by hop]
Ports
- purpose: transport layer multiplexing / demultiplexing
- 16-bit number (0..65535)
- addresses applications on a host
Client/Server communication
- client side: usually a randomly chosen (ephemeral) port from [1024..65535]
- server side: well-known port numbers
Well-known port numbers
- HTTP/HTTPS: TCP port 80/443
- SSH: TCP port 22
- DNS: UDP and TCP port 53
see: http://www.iana.org/assignments/port-numbers
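A minimal sketch of port demultiplexing, assuming a host with a loopback interface: two hypothetical "applications" bind UDP sockets to different ports, and the OS network stack delivers each datagram to the socket whose port matches the destination port. Binding to port 0 asks the OS for an unused port, and the unbound client socket gets an ephemeral port on its first `sendto()`, similar to the client-side choice above; the actual port range is OS-specific (on Linux it is configured via `net.ipv4.ip_local_port_range`). The function name `demux_demo` is illustrative only.

```python
import socket

def demux_demo():
    """Send one datagram to each of two local 'applications' and collect what each receives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as app1, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as app2, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        # Port 0: let the OS assign a free port to each application socket.
        app1.bind(("127.0.0.1", 0))
        app2.bind(("127.0.0.1", 0))
        app1.settimeout(2.0)
        app2.settimeout(2.0)

        # The destination port selects the receiving socket (demultiplexing).
        client.sendto(b"for app 1", app1.getsockname())
        client.sendto(b"for app 2", app2.getsockname())

        return {
            "app1_port": app1.getsockname()[1],
            "app2_port": app2.getsockname()[1],
            # sendto() implicitly bound the client socket to an ephemeral port.
            "client_port": client.getsockname()[1],
            "app1_got": app1.recv(1024),
            "app2_got": app2.recv(1024),
        }

if __name__ == "__main__":
    print(demux_demo())
```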
Sockets
application-layer API to the networking functionality, usually provided by the OS network stack
Message Orientation
- sender: send(“Hi Bob!”), receiver: recv() -> “Hi Bob!”
- sender: send(“How are you?”), receiver: recv() -> “How are you?”
Stream Orientation
(possible outcome)
- sender: send(“Hi Bob!”), receiver: recv() -> no data yet
- sender: send(“How are you?”), receiver: recv() -> “Hi Bob!How are you?”
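The difference between the two orientations can be observed with the standard socket API. This sketch assumes a POSIX system and uses local `AF_UNIX` socket pairs as stand-ins: `SOCK_DGRAM` preserves message boundaries like UDP, `SOCK_STREAM` delivers a byte stream like TCP. The helper names are hypothetical. How the stream bytes are split across `recv()` calls is not guaranteed, so the stream variant only relies on the concatenated result.

```python
import socket

MESSAGES = [b"Hi Bob!", b"How are you?"]

def datagram_reads():
    """Message orientation: every recv() returns exactly one sent message."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with sender, receiver:
        for msg in MESSAGES:
            sender.send(msg)
        return [receiver.recv(1024) for _ in MESSAGES]

def stream_reads():
    """Stream orientation: recv() returns whatever bytes are available; boundaries are lost."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver:
        for msg in MESSAGES:
            sender.sendall(msg)
        sender.shutdown(socket.SHUT_WR)  # signal the end of the stream
        chunks = []
        while chunk := receiver.recv(1024):
            chunks.append(chunk)
        return chunks

if __name__ == "__main__":
    print(datagram_reads())
    print(stream_reads())
```

The stream variant may return the data in one or several chunks, so a stream-based application has to define its own message framing (e.g. a length prefix).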
Transport Protocol Implementations
User Datagram Protocol (UDP)
- unreliable
- lightweight
Transmission Control Protocol (TCP)
- reliable
- connection oriented
- sending-rate limitation
Other
- Stream Control Transmission Protocol (SCTP)
- Multipath TCP (MPTCP)
- Quick UDP Internet Connections (QUIC)
User Datagram Protocol (UDP)
Header (8 Bytes; two 32-bit rows, bits 0-15 | 16-31):
- source port | destination port
- length | checksum
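A sketch of how the 8-byte header and its checksum can be built, assuming IPv4: per RFC 768 the checksum is the 16-bit one's complement of the one's complement sum over an IPv4 pseudo-header (source and destination address, protocol 17, UDP length), the UDP header with a zeroed checksum field, and the payload; the summation follows RFC 1071. The function names and the addresses used in the check are illustrative assumptions.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # IPv4 pseudo-header: src addr, dst addr, zero byte, protocol (17 = UDP), UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)  # header (8 bytes) + payload
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum zeroed
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if checksum == 0:
        checksum = 0xFFFF  # a transmitted 0 would mean "no checksum" in UDP over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def verify_udp_datagram(src_ip, dst_ip, datagram: bytes) -> bool:
    # The checksum over pseudo-header + datagram (incl. checksum field) is 0 if intact.
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram) == 0
```

A single flipped bit in the payload makes `verify_udp_datagram` return `False`, which is the error checking listed below; UDP does not retransmit, it only lets the receiver discard damaged datagrams.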
Functions
- port multiplexing / demultiplexing
- error checking
Example Applications
- DNS (port 53)
- RIP (port 520)
- media streaming / realtime communication
Why is UDP used for these applications?
UDP Summary
Characteristics
- simple and lightweight
- unreliable
- message-oriented
- stateless
- good choice for time-critical applications
- supports unidirectional communication
Problems
- unlimited sending rate may overload the network/receiver
Transmission Control Protocol (TCP)
Functions
- port multiplexing / demultiplexing
- error checking
- reliable and ordered delivery
- stream-orientation
- control of sending-rate (avoid overloading the network or the receiver)
Applications
- most protocols that require reliable delivery, e.g. HTTP(S), SMTP
Background: Reliable Data Transfer
How does the sender know whether a packet was successfully transferred?
- requires feedback from the receiver
- requires identification of packets
[Figure: the sender transmits segment X and segment Y; the receiver answers with ACK segment X and ACK segment Y]
Reliable Data Transfer in TCP
Sequence Number (SEQ)
- indicates the number of the first data byte in the segment
- increases by one for every payload byte sent
- initial SEQ is exchanged during connection establishment
[Figure: the sender transmits segments with SEQ=5035 and SEQ=6059; the receiver (SEQ=12) replies with ACK=6059 and ACK=7083]
What is the size of the segments?
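The question can be answered from the numbers in the figure: the size of a segment is the difference between its sequence number and the next one, and the last segment is bounded by the ACK that follows it (7083 - 6059). A small helper, assuming no retransmissions or gaps; `segment_sizes` is a hypothetical name, and the modulo accounts for the 32-bit wraparound of sequence numbers.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bit and wrap around

def segment_sizes(seqs, next_ack):
    """Payload sizes from consecutive sequence numbers and the ACK following the last segment."""
    boundaries = list(seqs) + [next_ack]
    return [(b - a) % SEQ_SPACE for a, b in zip(boundaries, boundaries[1:])]

print(segment_sizes([5035, 6059], 7083))  # [1024, 1024]
```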
Reliable Data Transfer in TCP (contd.)
Acknowledgement Number (ACK)
- gives the next sequence number that the receiver is expecting
- cumulative: also acknowledges all bytes with smaller sequence numbers
[Figure: the receiver acknowledges the segment with SEQ=5035 with ACK=6059 and the segment with SEQ=6059 with ACK=7083]
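How a receiver derives the cumulative ACK number can be sketched as follows. This is a simplified model, not a real TCP implementation: the class name is hypothetical, sequence-number wraparound and partially overlapping segments are ignored, and out-of-order segments are buffered by their sequence number until the gap before them is filled.

```python
class CumulativeAckReceiver:
    """Receiver-side ACK generation with cumulative ACKs (simplified)."""

    def __init__(self, initial_seq: int):
        self.next_expected = initial_seq  # value sent in the ACK field
        self.out_of_order = {}            # seq -> payload length of buffered segments

    def on_segment(self, seq: int, length: int) -> int:
        if seq == self.next_expected:
            self.next_expected += length
            # Buffered segments that are now contiguous are acknowledged as well.
            while self.next_expected in self.out_of_order:
                self.next_expected += self.out_of_order.pop(self.next_expected)
        elif seq > self.next_expected:
            # Gap: buffer the segment and keep acknowledging the start of the gap.
            self.out_of_order[seq] = length
        # seq < next_expected: already acknowledged data, nothing to store.
        return self.next_expected
```

With the values from the figure, `on_segment(5035, 1024)` returns 6059 and `on_segment(6059, 1024)` returns 7083. If the second segment arrives first, the receiver keeps answering with ACK=5035 until the first one arrives and then jumps directly to 7083.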
Retransmission after Timeout
- timeout at the sender triggers retransmission
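[Figure: the sender transmits SEQ=1 and SEQ=2; SEQ=2 is lost, so the receiver only answers with ACK=2; after a timeout the sender retransmits SEQ=2]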
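The slide leaves open how the timeout value is chosen. One common approach is the estimator specified in RFC 6298 (not part of the slide itself), sketched below under that assumption: a smoothed RTT (SRTT) and an RTT variation (RTTVAR) are updated with every RTT sample, the timeout is RTO = SRTT + max(G, 4 ∗ RTTVAR), and the RTO is doubled after each timeout. The class name and parameter defaults are illustrative; the 1 s minimum is the RFC's recommendation, while Linux, for example, uses a lower minimum of 200 ms.

```python
class RtoEstimator:
    """Retransmission timeout estimation following RFC 6298 (times in seconds)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity  # clock granularity G
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any measurement

    def _clamp(self, value):
        return min(max(value, self.min_rto), self.max_rto)

    def on_rtt_sample(self, r):
        if self.srtt is None:  # first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:  # RTTVAR is updated with the old SRTT, then SRTT itself
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = self._clamp(self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        # Exponential backoff: double the RTO after each timeout.
        self.rto = self._clamp(self.rto * 2)
        return self.rto
```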
Fast Retransmit
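- the sender retransmits a segment after receiving three duplicate ACKs, without waiting for the timeout
[Figure: the sender transmits SEQ=1 to SEQ=5; SEQ=2 is lost, so the receiver answers each of the other segments with ACK=2; after three duplicate ACKs (ACK=2) the sender retransmits SEQ=2]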
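On the sender side, fast retransmit only requires counting duplicate ACKs. A minimal sketch with a hypothetical class name, ignoring sequence-number wraparound; in the figure the first ACK=2 is a regular ACK and the following three are the duplicates that trigger the retransmission.

```python
DUP_ACK_THRESHOLD = 3  # duplicate ACKs that trigger a fast retransmit

class FastRetransmitDetector:
    """Sender-side duplicate-ACK counting (simplified)."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the sequence number to retransmit, or None."""
        if self.last_ack is not None and ack < self.last_ack:
            return None  # stale ACK, ignore
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                return ack  # retransmit the segment the receiver is missing
        else:
            self.last_ack = ack  # new data acknowledged: reset the counter
            self.dup_count = 0
        return None
```

Feeding the ACKs from the figure, `[d.on_ack(a) for a in [2, 2, 2, 2]]` yields `[None, None, None, 2]`: the third duplicate triggers the retransmission of SEQ=2.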
Connection Establishment
3-way-handshake
- establish initial sequence numbers and window sizes
- out-of-band TCP injection: http://arxiv.org/abs/1602.07128
- negotiate options
- vulnerable to SYN-flood attacks → SYN cookies, TCPCT
[Figure: client → server [SYN] SEQ=7; server → client [SYN, ACK] SEQ=13 ACK=8; client → server [ACK] SEQ=8 ACK=14]
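The numbers in the handshake follow from one rule: SYN (like FIN) consumes one sequence number, so each side acknowledges the other's initial sequence number (ISN) plus one. A sketch reproducing the figure; the `Segment` type and function name are hypothetical, and the `secrets` call at the end only illustrates that real stacks choose ISNs that are hard to guess (actual ISN generation, e.g. per RFC 6528, differs).

```python
import secrets
from dataclasses import dataclass
from typing import FrozenSet, Optional

SEQ_SPACE = 2 ** 32

@dataclass(frozen=True)
class Segment:
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int] = None

def three_way_handshake(client_isn: int, server_isn: int):
    # SYN consumes one sequence number, hence the "+ 1" in both ACKs.
    syn = Segment(frozenset({"SYN"}), client_isn)
    syn_ack = Segment(frozenset({"SYN", "ACK"}), server_isn, (syn.seq + 1) % SEQ_SPACE)
    ack = Segment(frozenset({"ACK"}), syn_ack.ack, (syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]

# Values from the figure (client ISN 7, server ISN 13):
for seg in three_way_handshake(7, 13):
    print(sorted(seg.flags), seg.seq, seg.ack)

# Illustration only: an unpredictable ISN.
random_isn = secrets.randbelow(SEQ_SPACE)
```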
Connection Teardown
4-way-handshake
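- each side has to close its direction of the connection separately
→ half-closed connections are possible (one side has stopped sending, the other can still send)
- the initiator waits for a timeout (TIME_WAIT, 2 ∗ maximum segment lifetime) before finally closing the connection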
[Figure: initiator → receiver [FIN]; receiver → initiator [ACK]; receiver → initiator [FIN]; initiator → receiver [ACK]; the initiator then waits for a timeout]
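A half-closed connection can be produced with the standard socket API: `shutdown(SHUT_WR)` makes the stack send a FIN for one direction only, the peer's `recv()` then returns `b""` (end of stream), and the peer can still send data back before closing its own direction. A sketch over the loopback interface; the function name is hypothetical, and the TIME_WAIT timeout of the initiator happens inside the OS and is not visible here.

```python
import socket

def half_close_demo():
    """Client sends a request and closes its sending direction; the server still replies."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(5)
                client.sendall(b"request")
                client.shutdown(socket.SHUT_WR)  # client sends [FIN]: no more data from this side

                request = b""
                while chunk := conn.recv(1024):  # b"" signals the client's FIN
                    request += chunk

                conn.sendall(b"response")        # half-closed: the server can still send
                conn.shutdown(socket.SHUT_WR)    # server sends its own [FIN]

                response = b""
                while chunk := client.recv(1024):
                    response += chunk
    return request, response

if __name__ == "__main__":
    print(half_close_demo())
```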
TCP header
[Figure: TCP header, 32 bits per row: source port (16 bit) | destination port (16 bit); sequence number (32 bit); acknowledgement number (32 bit); header length (4 bit) | reserved | flags URG, ACK, PSH, RST, SYN, FIN | window size (16 bit); checksum (16 bit) | urgent pointer (16 bit); [options]]
- up to 40 Bytes of header options, e.g. Window Scale, Selective Acknowledgment (SACK)
- header length: 20 – 60 Bytes
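Parsing the fixed 20-byte part of the header shows why the header length ranges from 20 to 60 Bytes: the 4-bit header length field counts 32-bit words (5 to 15). A sketch with a hypothetical function name; it decodes only the six flags shown in the figure and returns the options as raw bytes. The example header is made up and carries an MSS option (kind 2, length 4, value 1460).

```python
import struct

# Flag bits in the low byte of the 16-bit "header length / reserved / flags" word
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(data: bytes) -> dict:
    if len(data) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    header_len = (offset_flags >> 12) * 4  # header length is given in 32-bit words
    if header_len < 20 or header_len > len(data):
        raise ValueError(f"invalid header length {header_len}")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "options": data[20:header_len],  # raw option bytes, not decoded here
    }

# A SYN with header length 6 words (24 bytes) and an MSS option of 1460 bytes
syn = struct.pack("!HHIIHHHH", 50000, 80, 7, 0, (6 << 12) | 0x02, 65535, 0, 0) + b"\x02\x04\x05\xb4"
print(parse_tcp_header(syn))
```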
Limiting the Sending-rate
Why?
- avoid overloading the receiver → flow control
- avoid overloading the network → congestion control
Sending Window
- specifies the amount of unacknowledged data that the sender is allowed to send
- equals the maximum number of bytes in transit
- sending_window = min(receive_window, cwnd)
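How much the sender may transmit at a given moment follows from the formula above minus the data already in flight. A sketch with a hypothetical function name; `snd_una` (oldest unacknowledged byte) and `snd_nxt` (next byte to send) follow the usual TCP variable names, and wraparound is ignored.

```python
def usable_window(rwnd: int, cwnd: int, snd_una: int, snd_nxt: int) -> int:
    """Bytes the sender may still transmit right now (simplified, no wraparound).

    rwnd:    receive window announced by the receiver (flow control)
    cwnd:    congestion window (congestion control)
    snd_una: oldest unacknowledged sequence number
    snd_nxt: next sequence number to be sent
    """
    sending_window = min(rwnd, cwnd)
    in_flight = snd_nxt - snd_una  # sent but not yet acknowledged
    return max(0, sending_window - in_flight)
```

For example, with rwnd = 65535, cwnd = 14600 and 10000 bytes in flight, the sender may send another 4600 bytes; if the receiver shrinks rwnd to 8000, the sender has to wait for ACKs.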
Flow Control
- prohibits overloading the receiver
- the receiver announces the current size of its receive_window to the sender in the window size field of the TCP header
- limited by the buffer size at the receiver
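On the receiver side, the announced window can be modeled as the free space in the receive buffer: it shrinks when data arrives and grows when the application reads. A simplified sketch with a hypothetical class name; real stacks additionally apply window scaling and avoid announcing very small windows (silly window syndrome avoidance).

```python
class ReceiveBuffer:
    """Receiver-side flow control: the announced window is the free buffer space."""

    def __init__(self, size: int):
        self.size = size
        self.buffered = 0  # bytes received but not yet read by the application

    def advertised_window(self) -> int:
        return self.size - self.buffered

    def on_data(self, nbytes: int) -> int:
        if nbytes > self.advertised_window():
            raise ValueError("sender exceeded the announced window")
        self.buffered += nbytes
        return self.advertised_window()

    def app_read(self, nbytes: int) -> int:
        self.buffered -= min(nbytes, self.buffered)
        return self.advertised_window()
```

With a 65535-byte buffer, receiving 40000 bytes reduces the announced window to 25535; once the buffer is full the window is 0 and the sender stops until the application reads data.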
Background: Network Congestion
- segments get lost due to full buffers in routers
- retransmissions may even amplify congestion
- self-clocking (a new segment is sent only when an ACK signals that one has left the network) creates an equilibrium at the maximum sending rate
Jacobson, Van. "Congestion avoidance and control." ACM SIGCOMM Computer Communication Review, 1988.
Congestion Control
Principles
- basic assumption: packet loss is only caused by congestion
- end-host driven: no support from the network necessary
Two phases
- Slow Start starts a connection: it gradually increases the amount of data in transit until the equilibrium is reached
- Congestion Avoidance tries to keep the equilibrium state and reacts to changes on the link
State
- current size of the congestion window (cwnd)
- slow start threshold (ssthresh) defines the transition between the phases
Congestion Control: Slow Start Phase
- initialization: cwnd = 10 ∗ MSS, ssthresh = ∞
- when receiving an ACK: cwnd = cwnd + 1 MSS
→ every segment in flight returns an ACK, so (without delayed ACKs) cwnd doubles every RTT
- slow start ends when cwnd reaches ssthresh or a packet loss occurs
[Plot: cwnd [MSS] over time [RTT]; exponential growth up to ssthresh]
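The per-ACK rule translates into exponential growth per RTT. A sketch under simplifying assumptions: one ACK per segment (no delayed ACKs), no loss, a hypothetical ssthresh of 80 MSS, and window sizes counted in MSS. Once ssthresh is reached, this sketch simply holds cwnd, where real TCP would switch to congestion avoidance.

```python
def slow_start(initial_cwnd=10, ssthresh=80, rtts=4):
    """cwnd (in MSS) at the start of each RTT during slow start (simplified)."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rtts):
        acks = cwnd  # every segment sent in this RTT is acknowledged
        for _ in range(acks):
            if cwnd >= ssthresh:
                break  # slow start ends here
            cwnd += 1  # +1 MSS per ACK
        trace.append(cwnd)
    return trace

print(slow_start())  # [10, 20, 40, 80, 80]
```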
Congestion Control: Congestion Avoidance Phase
- when receiving an ACK: increase cwnd using a cubic function
- slow growth around Wmax (cwnd before the last packet loss) enhances stability
- fast growth away from Wmax increases bandwidth utilization
[Plot: cwnd [MSS] over time [RTT], starting at ssthresh; cubic growth that flattens out around Wmax]
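The cubic function used by CUBIC (original paper and RFC 8312) is W(t) = C ∗ (t - K)^3 + Wmax, where t is the time since the last window reduction and K = ∛(Wmax ∗ (1 - β) / C) is the time at which W reaches Wmax again. The sketch below uses β = 0.8 to match the reduction to 0.8 ∗ last_cwnd in this lecture; RFC 8312 specifies β = 0.7 and C = 0.4, and Linux uses a scaled integer implementation. The function name is hypothetical.

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.8) -> float:
    """CUBIC window W(t) in MSS, t seconds after the last window reduction.

    w_max: cwnd just before the last reduction
    beta:  multiplicative decrease factor (window after loss = beta * w_max)
    """
    k = (w_max * (1 - beta) / c) ** (1 / 3)  # time at which W(t) reaches w_max again
    return c * (t - k) ** 3 + w_max

if __name__ == "__main__":
    for t in range(8):
        print(t, round(cubic_window(t, w_max=100), 1))
```

Starting at 0.8 ∗ Wmax, the window grows quickly, flattens around t = K (stability near Wmax), and then accelerates again to probe for more bandwidth.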
Congestion Control: Packet Loss
- timeout: assumption: the network is congested
→ go to slow start: ssthresh = 0.8 ∗ last_cwnd, cwnd = 10 ∗ MSS
- 3 duplicate ACKs: assumption: only a single segment was lost
→ continue with congestion avoidance: ssthresh = 0.8 ∗ last_cwnd, cwnd = ssthresh + 3 MSS
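The two loss reactions written as functions, following the simplified rules of this lecture with windows counted in MSS; the function names are hypothetical. The "+ 3 MSS" accounts for the three segments whose arrival triggered the duplicate ACKs and which have therefore left the network. Standards and implementations differ in details; for example, RFC 5681 resets cwnd to a single segment (the loss window) after a timeout.

```python
MSS = 1  # window sizes are counted in MSS units here

def on_timeout(last_cwnd: float) -> tuple:
    """Timeout: assume congestion and restart with slow start."""
    ssthresh = 0.8 * last_cwnd
    cwnd = 10 * MSS
    return ssthresh, cwnd

def on_triple_dup_ack(last_cwnd: float) -> tuple:
    """3 duplicate ACKs: assume a single lost segment and stay in congestion avoidance."""
    ssthresh = 0.8 * last_cwnd
    cwnd = ssthresh + 3 * MSS  # 3 duplicate ACKs = 3 segments that left the network
    return ssthresh, cwnd
```

For last_cwnd = 100 MSS, a timeout yields ssthresh = 80 and cwnd = 10, while three duplicate ACKs yield ssthresh = 80 and cwnd = 83.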
TCP CUBIC
A Word of Caution
TCP Congestion Control details differ
- RFC 2001 (1997), RFC 2581 (1999), RFC 5681 (2009): standard
- CUBIC: original paper¹, RFC 8312
- Lecture: concepts
- Linux 3.x: optimized/adapted implementation
- Linux 4.x: further improvements
¹ Ha et al. "CUBIC: a new TCP-friendly high-speed TCP variant." ACM SIGOPS Operating Systems Review, 2008.
TCP CUBIC – Problems
- congestion indicated only by packet loss
- keeps the buffers full
Problems
- vulnerable to random packet loss
- high latency
TCP BBR
- Bottleneck-Bandwidth and RTT
- developed by Google, published late 2016
- available since Linux 4.9
- two congestion estimators
- estimated RTT
- estimated Bottleneck-Bandwidth
→ Bandwidth-Delay-Product (BDP)
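The two estimators can be sketched as a windowed maximum over delivery-rate samples (bottleneck bandwidth) and a windowed minimum over RTT samples (RTT without queuing), whose product is the BDP. This is a strongly simplified model with a hypothetical class name: BBR itself filters bandwidth over roughly ten RTTs and the minimum RTT over about ten seconds, whereas this sketch uses fixed sample counts, and it omits BBR's pacing and probing phases.

```python
from collections import deque

class BbrModel:
    """Simplified BBR path model: BDP = bottleneck bandwidth x minimum RTT."""

    def __init__(self, bw_window: int = 10, rtt_window: int = 100):
        self.bw_samples = deque(maxlen=bw_window)    # delivery rates [bytes/s]
        self.rtt_samples = deque(maxlen=rtt_window)  # RTT samples [s]

    def on_ack(self, delivery_rate: float, rtt: float) -> None:
        self.bw_samples.append(delivery_rate)
        self.rtt_samples.append(rtt)

    def bottleneck_bw(self) -> float:
        # highest recently observed delivery rate
        return max(self.bw_samples)

    def min_rtt(self) -> float:
        # lowest recent RTT approximates the delay without queuing
        return min(self.rtt_samples)

    def bdp(self) -> float:
        # bytes that fit "in the pipe" without building a queue
        return self.bottleneck_bw() * self.min_rtt()
```

With samples (1 MB/s, 50 ms), (2 MB/s, 40 ms) and (1.5 MB/s, 60 ms), the model estimates 2 MB/s ∗ 0.04 s = 80,000 bytes. The max filter also explains one of the problems listed later: short bursts that appear faster than the bottleneck lead to overestimation.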
TCP BBR vs. CUBIC
Neal Cardwell et al. "BBR Congestion Control." IETF 97, Seoul, Nov 2016.
TCP BBR – Problems
- Young and immature algorithm
- Actively researched
Problems²,³
- RTT unfairness
- Bottleneck overestimation (inter-flow unfairness)
- Inter-protocol unfairness
- Inter-flow synchronization
BBR 2.0 announced at IETF 100 in 2017, first details presented at IETF 102 in 2018
² M. Hock et al. "Experimental Evaluation of BBR Congestion Control." ICNP 2017.
³ D. Scholz et al. "Towards a Deeper Understanding of TCP BBR Congestion Control." IFIP Networking 2018.
TCP Options
Window Scaling
- without scaling, the window is at most 65,535 Bytes (16-bit field)
- example: 16 Mbit/s, 150 ms RTT, bandwidth-delay product:
16 Mbit/s ∗ 0.15 s = 2,400 kbit = 300 KB
→ with a 65 KB window the throughput is limited to 65,535 Bytes / 0.15 s ≈ 3.5 Mbit/s
- solution: window scaling allows increasing the window size up to about 1 GB (65,535 ∗ 2^14 Bytes)
- window scaling is negotiated during the TCP handshake
- problem remains: sequence numbers (32 bit) still limit the amount of unacknowledged data
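The example can be turned into a small calculation: the window-scale option announces a shift count, and the effective window is the 16-bit field value shifted left by that count. RFC 7323 limits the shift count to 14, which bounds the window at 65,535 ∗ 2^14 Bytes (about 1 GB) and keeps it below half of the 32-bit sequence space. A sketch with a hypothetical function name that finds the smallest shift for a given bandwidth-delay product.

```python
import math

MAX_WINDOW_FIELD = 65535  # 16-bit window field
MAX_SHIFT = 14            # largest shift count permitted by RFC 7323

def required_window_scale(rate_bit_s: float, rtt_s: float) -> int:
    """Smallest window-scale shift so that the window can hold one bandwidth-delay product."""
    bdp_bytes = math.ceil(rate_bit_s * rtt_s / 8)
    shift = 0
    while (MAX_WINDOW_FIELD << shift) < bdp_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("BDP exceeds the maximum scaled window")
    return shift

# Example from the slide: 16 Mbit/s and 150 ms RTT give a BDP of 300,000 bytes;
# 65535 * 2**3 = 524,280 bytes is the first scaled window that is large enough.
print(required_window_scale(16e6, 0.15))  # 3
```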
Selective Acknowledgements (SACK)
- allow the receiver to acknowledge non-contiguous ranges of received data
- avoid unnecessary retransmissions compared to cumulative ACKs
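How a receiver could derive SACK blocks from what it has received beyond the cumulative ACK, sketched under simplifying assumptions: ranges are given as (left edge, right edge) with the right edge being the sequence number after the last received byte, as in the SACK option; overlapping or adjacent ranges are merged; wraparound is ignored. RFC 2018 requires the first block to describe the most recently received data, whereas this sketch simply sorts by sequence number. At most 4 blocks fit into the 40 Bytes of option space (3 when timestamps are also used). The function name is hypothetical.

```python
def sack_blocks(cum_ack, received, max_blocks=4):
    """Merge received byte ranges above the cumulative ACK into SACK blocks (simplified)."""
    merged = []
    for start, end in sorted(received):
        if end <= cum_ack:
            continue  # already covered by the cumulative ACK
        start = max(start, cum_ack)
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend previous block
        else:
            merged.append((start, end))
    return merged[:max_blocks]
```

For a cumulative ACK of 1000 and received ranges (2000, 3000), (3000, 4000), (5000, 6000), the receiver reports the blocks (2000, 4000) and (5000, 6000), so the sender only needs to retransmit the bytes 1000 to 1999 and 4000 to 4999.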
TCP Summary
Characteristics
- complex
- reliable → head-of-line blocking
- stream-oriented
- sending-rate adaptation
Problems
- vulnerable to resource exploitation
- congestion control may be too restrictive, e.g. in wireless networks
Stream Control Transmission Protocol (SCTP)
first standardized in 2000 by RFC 2960
- TCP/UDP hybrid: reliable, optional ordering, message-oriented
- permits reliable, unordered delivery
- other features: multihoming, 4-way-handshake, etc.
Problems:
- requires changes in application implementations
- lack of support in middleboxes (firewalls, NATs, etc.)
Multipath TCP (MPTCP)
- specified in 2013 by RFC 6824 (experimental)
- can use multiple interfaces/links simultaneously
- goal: improve resource utilization, throughput and reliability
- mimics standard TCP on the wire and even offers a fallback to regular TCP
Quick UDP Internet Connections (QUIC)
- developed by Google, implemented in the Chrome browser, released in 2013
- UDP-based protocol that implements reliability, congestion control, multiple streams, encryption etc.
- goal: reduced latency (compared to TCP + TLS)
- appears as ordinary UDP traffic (middlebox support)