SIGCOMM '05 1
On the Predictability of Large Transfer TCP Throughput
Qi He Constantine Dovrolis Mostafa Ammar
College of Computing Georgia Institute of Technology
Outline
- TCP throughput prediction: problem statement and motivation
- Formula-based (FB) prediction: a formula-based predictor; types of FB prediction errors; experimental evaluation
- History-based (HB) prediction: typical history-based predictors; dealing with outliers and level shifts; experimental evaluation
- What makes some paths less predictable than others?
Goal: predict the throughput of a bulk TCP transfer on a given path.
Applications: server selection, overlay/multi-homed routing, load balancing, grid computing, P2P downloading.
Formula Based (FB)
  Basis: analytical model for TCP throughput
  Inputs: estimates of the path's RTT and loss rate
  Advantage: no previous transfers required
  Issue: prediction accuracy?

History Based (HB)
  Basis: time series forecasting theory
  Inputs: history of previous TCP transfers on the same path
  Advantage: prediction based on actual TCP transfers
  Issue: prediction accuracy?
FB can be significantly inaccurate, especially for congestion-limited flows.
HB is quite accurate even with simple linear predictors and sporadic previous samples.
Focus on cause-effect relations, rather than black-box evaluation.
Factors examined: load, degree of multiplexing, receiver window, transfer frequency.
Next: Formula-based (FB) prediction
We use the PFTK model by Padhye et al. (SIGCOMM '98):

R = \min\left( \frac{M}{T\sqrt{2bp/3} + T_0\,\min(1,\,3\sqrt{3bp/8})\,p\,(1+32p^2)},\ \frac{W}{T} \right)

T, p: RTT and loss rate experienced during the flow
M: path MTU (Maximum Transmission Unit)
W: TCP maximum congestion window
T0: TCP retransmission timeout
b: segments acknowledged per new ACK
In practice, T' and p' are typically measured before the transfer by periodic probing (e.g., ping), and A' is an available-bandwidth estimate. The FB predictor is then:

\hat{R} = \min\left( \frac{M}{T'\sqrt{2bp'/3} + T_0\,\min(1,\,3\sqrt{3bp'/8})\,p'\,(1+32p'^2)},\ \frac{W}{T'} \right), \quad \text{if } p' > 0

\hat{R} = \min\left( \frac{W}{T'},\ A' \right), \quad \text{if } p' = 0
T’, p’ T, p Underestimate or
Adaptive and bursty TCP sampling vs. non-adaptive periodic sampling Overestimate throughput Additional load of the target flow may increase T, p Effect Issue Temporal: before flow during flow Sampling: periodic probing TCP “sampling”
One measurement epoch:
- Available bandwidth A' (pathload, 20s-60s)
- RTT T' and loss rate p' (ping, 60s)
- TCP throughput R, with RTT Te and loss rate pe measured during the transfer (ping and iperf, 60s)

Tools: iperf for TCP transfers, pathload for available bandwidth, ping (interval: 100 ms, packet size: 41 bytes) for RTT and loss rate.
Each trace consists of 150 consecutive epochs. We used 35 Internet paths, with 7 traces per path; hosts in the US, Europe, and Korea.
SIGCOMM '05 14
Relative prediction error:

E = \frac{\hat{R} - R}{\min(\hat{R}, R)}

This metric treats over- and underestimation symmetrically: \hat{R} = R/w and \hat{R} = wR both give |E| = w - 1; e.g., \hat{R} = R/2 and \hat{R} = 2R both give |E| = 1.

Aggregate error over a trace:

RMSRE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} E_i^2}
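Both metrics are one-liners; a small sketch (function names are illustrative):

```python
from math import sqrt

def relative_error(pred, actual):
    """Symmetric relative prediction error: E = (R_hat - R) / min(R_hat, R)."""
    return (pred - actual) / min(pred, actual)

def rmsre(preds, actuals):
    """Root-mean-square relative error over a trace of predictions."""
    errors = [relative_error(p, a) for p, a in zip(preds, actuals)]
    return sqrt(sum(e * e for e in errors) / len(errors))
```

Note that overestimating by 2x and underestimating by 2x both yield |E| = 1, which is what makes RMSRE a fair aggregate across over- and underestimating predictors.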
SIGCOMM '05 15
Results on lossy paths:
- Dominance of overestimation errors (E > 0)
- Overestimation by more than 100% (E > 1) for 40% of the measurements
- Prevalent occurrences of T' < T and p' < p

[Figure: CDF of the relative prediction error E on lossy paths]
SIGCOMM '05 16
[Figure: CDFs of the RTT increase (ms) and loss rate increase observed during the target flow]
SIGCOMM '05 17
When the prediction instead uses ping RTT and loss rate measured during the target flow, prediction errors are still significant, but overestimation and underestimation are almost symmetric.
[Figure: CDFs of the relative prediction error, with RTT/loss rate measured during vs. prior to the TCP flow]
SIGCOMM '05 18
Large errors are more common on lower-throughput paths. Explanation: in a congested path, a slight load increase causes a large loss rate increase.
SIGCOMM '05 19
[Figure: RMSRE (log scale) per path, for W = 20 KB (window-limited) vs. W = 1 MB (congestion-limited) transfers]
Throughput is more predictable for window-limited TCP flows. Explanation: window-limited flows do not saturate the path's bottleneck.
SIGCOMM '05 20
Next: History-based (HB) prediction
SIGCOMM '05 21
Typical history-based predictors:

Moving Average (MA) over the last k samples:

\hat{X}_{n+1} = \frac{1}{k} \sum_{i=n-k+1}^{n} X_i

Exponentially Weighted Moving Average (EWMA):

\hat{X}_{i+1} = \alpha\,\hat{X}_i + (1-\alpha)\,X_i

Non-seasonal Holt-Winters (HW), an EWMA variation that captures the trend of the time series:

\hat{X}^f_{i+1} = \hat{X}^s_i + \hat{X}^t_i
\hat{X}^s_i = \alpha\,X_i + (1-\alpha)\,\hat{X}^f_i
\hat{X}^t_i = \beta\,(\hat{X}^s_i - \hat{X}^s_{i-1}) + (1-\beta)\,\hat{X}^t_{i-1}
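The three predictors can be sketched in a few lines of Python. Function names and the choice to fold the whole history into a one-step-ahead forecast are illustrative, not from the paper:

```python
def moving_average(history, k):
    """MA: mean of the last k throughput samples."""
    window = history[-k:]
    return sum(window) / len(window)

def ewma(history, alpha):
    """EWMA: X_hat_{i+1} = alpha * X_hat_i + (1 - alpha) * X_i."""
    pred = history[0]
    for x in history:
        pred = alpha * pred + (1 - alpha) * x
    return pred

def holt_winters(history, alpha, beta):
    """Non-seasonal Holt-Winters: smoothed level plus trend."""
    level, trend = history[0], 0.0
    for x in history[1:]:
        prev_level = level
        # update the smoothed level against the previous forecast (level + trend)
        level = alpha * x + (1 - alpha) * (level + trend)
        # update the trend estimate from the change in level
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend  # forecast for the next epoch
```

With a stationary throughput series all three converge to the series mean; HW additionally tracks a linear drift, which MA and EWMA lag behind.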
Why are level shifts (LS) and outliers (OL) undesirable?
They cause large prediction errors and differences among predictors, and they complicate the analysis of HB predictability.
Dealing with LS and OL is more important than choosing among predictors.
Actions: ignore outliers; restart the predictor upon a level shift.
[Figure: TCP throughput (Mbps) per measurement epoch on the UTAH-LULEA path, showing outliers and level shifts]
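The slides do not spell out how outliers and level shifts are detected, so the sketch below uses hypothetical multiplicative thresholds (`ol_factor` and `ls_run` are invented parameters, not from the paper): a single sample deviating strongly from the current prediction is ignored as an outlier, while several consecutive deviating samples are treated as a level shift and restart the predictor.

```python
def predict_with_lso(history, base_predictor, ol_factor=3.0, ls_run=3):
    """Wrap a base predictor with outlier (OL) and level-shift (LS) handling.

    Assumes strictly positive throughput samples. A sample deviating from
    the current prediction by more than ol_factor (in either direction) is
    suspicious: a short run of such samples is ignored as outliers, while
    ls_run consecutive ones signal a level shift and restart the predictor
    from the post-shift samples.
    """
    kept = [history[0]]
    run = []  # consecutive deviating samples (level-shift candidates)
    for x in history[1:]:
        pred = base_predictor(kept)
        if pred > 0 and (x / pred > ol_factor or pred / x > ol_factor):
            run.append(x)
            if len(run) >= ls_run:
                kept = run[:]  # level shift: restart from recent samples
                run = []
            # else: treat as outlier and ignore the sample
        else:
            run = []
            kept.append(x)
    return base_predictor(kept)
```

A spike of one epoch leaves the prediction untouched, while a sustained jump makes the predictor forget the pre-shift history, which is the behavior the slide's "ignore OL, restart upon LS" actions call for.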
HB prediction is much more accurate than FB prediction: 90% of traces have RMSRE < 0.4 (with LS/OL detection).
With LS/OL detection, the choice of predictor and of predictor parameters makes little difference.

[Figure: CDF of RMSRE (log scale) for Holt-Winters variants: 0.8-HW-LSO, 0.8-HW, 0.4-HW]
A longer sampling interval does not degrade accuracy significantly.
Even with a single transfer every 24 minutes, the RMSRE is below 0.4 in 75% of the traces.

[Figure: CDFs of RMSRE (log scale) for sampling intervals of 3, 6, 24, and 45 minutes]
Next: What makes some paths less predictable than others?
Factors examined:
- Link utilization
- Degree of statistical multiplexing

Approach: analyze the Coefficient of Variation (CoV) of the marginal distribution of TCP throughput.

[Figure: RMSRE vs. CoV of TCP throughput]
Model: Processor Sharing server of capacity C with Poisson session arrivals.
Flow arrival rate: λ; average flow size: θ.

Offered load: \rho = \lambda\theta / C
Per-flow rate: r(N) = C / N
Distribution of the number of active sessions: \pi(N) = \rho^N (1 - \rho)

CoV of the per-session throughput r(N):

CoV(r(N)) = \sqrt{ \frac{\rho\, L(2,\rho)}{(1-\rho)\,\log^2(1-\rho)} - 1 }

where L(2,\rho) = \sum_{k \ge 1} \rho^k / k^2 is the dilogarithm.
CoV of the per-session throughput increases with the offered load ρ.
So, the relative prediction error increases with offered load.
[Figure: CoV of the per-session bandwidth share C/N vs. offered load on a 50 Mbps link (Processor Sharing)]
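The CoV expression can be evaluated numerically by summing the dilogarithm series directly; this sketch (function names are illustrative) reproduces the slide's conclusion that CoV grows with offered load:

```python
from math import log

def dilog(rho, terms=200):
    """L(2, rho) = sum_{k>=1} rho^k / k^2 (the dilogarithm), by direct series.

    Converges quickly for 0 < rho < 1, so a few hundred terms suffice.
    """
    return sum(rho ** k / k ** 2 for k in range(1, terms + 1))

def session_cov(rho):
    """CoV of per-session throughput in the Processor Sharing model."""
    return (rho * dilog(rho) / ((1 - rho) * log(1 - rho) ** 2) - 1) ** 0.5
```

For instance, session_cov rises from roughly 0.24 at ρ = 0.2 to roughly 0.46 at ρ = 0.5 and roughly 0.81 at ρ = 0.8, matching the increasing trend in the figure above.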
Consider the available bandwidth A at a non-congested Processor Sharing server of capacity C.
Traffic model: N homogeneous flows with rate limit r; flow arrival rate λ (Poisson); average flow size θ.
Conclusion: provided that utilization remains constant, the CoV of the available bandwidth decreases as the number of flows increases.
So, we expect lower prediction error as the number of flows increases.
Conclusions:
- FB prediction can be significantly inaccurate; the main reason is the loss rate and RTT increase caused by the target flow itself.
- HB prediction is quite accurate, even with very simple predictors and sporadic previous transfers.
- Hardest-to-predict paths: a heavily utilized bottleneck link loaded with just a few flows.