Influence of Recovery Time on TCP Behaviour - Chris Develder, Didier Colle et al. - PowerPoint PPT Presentation




SLIDE 1

Influence of Recovery Time on TCP Behaviour

Chris Develder Didier Colle Pim Van Heuven Steven Van den Berghe Mario Pickavet Piet Demeester

SLIDE 2

Introduction

· Network recovery: backup paths to recover traffic lost due to network failures
· Many questions remain to be answered:

  • How fast should this happen? Is fast protection better, or isn't it desirable?
  • How does e.g. TCP react to protection switches?

SLIDE 3

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 4

Experiment set-up

· Two sets of TCP flows:

– A→B: the "(protection) switched flows"
– C→D: the "fixed flows"

· MPLS paths and pre-established backup paths

– to be able to influence exact timing
– protection switch: "manually"

[Topology figure: access nodes A, B, C, D; LSRs 4-11; working path A-B, backup path A-B, working path C-D; access and backbone links]

SLIDE 5

Experiment set-up

· Simulation scenario:

– start of TCP sources: random
– [0, 10 s[: link up
– [10, 20 s[: link down; protection switch after delay 0/50/1000 ms
– [20, 30 s[: link up again

[Topology figure, as on slide 4]
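The timed scenario above can be sketched as a simple event schedule. This is a toy sketch, not the authors' simulation scripts; the event names and millisecond representation are illustrative assumptions:

```python
# Toy sketch of the failure scenario: link up for [0, 10 s[, down for
# [10, 20 s[ with a protection switch after a configurable delay, and
# up again for [20, 30 s[. Event names are assumptions for illustration.

def build_schedule(switch_delay_ms: int):
    """Return time-ordered (time_ms, event) pairs for one 30 s run."""
    events = [
        (10_000, "link down"),
        (10_000 + switch_delay_ms, "protection switch to backup path"),
        (20_000, "link up again"),
    ]
    return sorted(events)

# The three delays studied in these slides: 0, 50 and 1000 ms.
schedules = {d: build_schedule(d) for d in (0, 50, 1000)}
```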

SLIDE 6

Experiment set-up

· FYI: TCP NewReno mechanisms (RFC 2582)

  • slow start (cwnd ≤ ssthresh):

– increase cwnd by 1 per ACK
– after a timeout: set ssthresh = cwnd/2; cwnd = 1

  • congestion avoidance (cwnd > ssthresh):

– once cwnd reaches ssthresh: linear increase of cwnd

  • fast retransmit, fast recovery:

– on packet loss (three duplicate ACKs): retransmit; ssthresh = cwnd/2; cwnd = ssthresh + 3

  • NewReno: extends fast recovery and fast retransmit:

– for each extra duplicate ACK: cwnd++; stay in fast recovery
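The mechanisms listed above can be sketched as a small state machine. This is a minimal illustration of the slide's summary, not the simulator used in the experiments; the class and method names are assumptions:

```python
# Minimal sketch of TCP NewReno congestion-window bookkeeping,
# following the rules summarised on this slide (RFC 2582 style).

class CwndState:
    def __init__(self):
        self.cwnd = 1.0        # congestion window (segments)
        self.ssthresh = 64.0   # slow-start threshold (segments)
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_ack(self):
        """New (non-duplicate) ACK received."""
        self.dup_acks = 0
        self.in_fast_recovery = False
        if self.cwnd <= self.ssthresh:
            self.cwnd += 1.0               # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: ~+1 per RTT

    def on_dup_ack(self):
        """Duplicate ACK received."""
        self.dup_acks += 1
        if self.dup_acks == 3:             # fast retransmit + fast recovery
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.in_fast_recovery = True
        elif self.in_fast_recovery:        # NewReno: inflate per extra dup ACK
            self.cwnd += 1.0

    def on_timeout(self):
        """Retransmission timeout: fall back to slow start."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.in_fast_recovery = False
        self.dup_acks = 0
```

The timeout branch (cwnd reset to 1) is what sends a flow back into slow start; that distinction matters for the "more detailed analysis" later in the deck.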

SLIDE 7

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 8

Qualitative discussion — what will happen?

· When a failure occurs:

– switched flows join fixed ones
– backbone link will become bottleneck
– due to overload, packet losses will occur
– TCP will react by backing off

SLIDE 9

Qualitative discussion — what will happen?

· Influence of protection switch delay:

– no delay:

  • immediate buffer overflow on bottleneck backbone link
  • both fixed and switched flows are heavily affected

– small delay:

  • switched flows have backed off somewhat when joining the fixed ones
  • fixed flows are less affected

– large delay:

  • switched flows fall back to zero
  • rather smooth transition of bottleneck from access to backbone

SLIDE 10

Qualitative discussion — simulation parameters

· Simulation parameters:

– number of TCP NewReno sources: 5 fixed, 5 switched
– access bandwidth: 8 Mbit/s
– backbone bandwidth: 10 Mbit/s
– propagation delay: 10 ms/link

  • this results in an RTT of 100-150 ms (+20 ms in case of a protection switch)

– queue size: 50 packets
– max. TCP window size set at 30
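As a sanity check on the quoted RTT range, the propagation part follows directly from the per-link delay. The hop counts below are an assumption read off the topology figure, not stated on the slide:

```python
# Back-of-the-envelope check of the quoted RTT: with 10 ms propagation
# delay per link, a path of 5-7 links gives a base RTT of 100-140 ms;
# queueing on the bottleneck accounts for the rest of the 100-150 ms
# range. Hop counts here are illustrative assumptions.

LINK_DELAY_MS = 10.0  # propagation delay per link (slide parameter)

def propagation_rtt_ms(hops_one_way: int) -> float:
    """Round-trip propagation delay for a path of `hops_one_way` links."""
    return 2 * hops_one_way * LINK_DELAY_MS

# A backup path one link longer each way adds 2 * 10 ms = 20 ms,
# matching the "+20 ms in case of a protection switch" above.
```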

SLIDE 11

Qualitative discussion — bandwidth and queues

· No protection switching delay (0 ms)

[Figure: bandwidth occupation and queue occupation for A-B and C-D]

  • before failure: access links are the bottleneck (link filled for 80%; queue empty)
  • during failure: bottleneck shifts to the backbone (link gets filled for 100%; immediate queue overflow!); bandwidth of the fixed flows drops due to losses in the backbone; bandwidth of the switched flows drops seriously, and recovery is rather slow; oscillations due to TCP behaviour
  • after failure: access links are the bottleneck again (queues in the access are being filled again, which is slow)

SLIDE 12

Qualitative discussion — bandwidth and queues

· Small protection switching delay (50 ms)

[Figure: bandwidth occupation and queue occupation for A-B and C-D]

  • before failure: access links are the bottleneck (link filled for 80%; queue empty)
  • during failure: bottleneck shifts to the backbone (link gets filled for 100%, but NO immediate queue overflow); bandwidth of the fixed flows drops only after a certain delay; bandwidth of the switched flows drops less, and recovery apparently is faster; oscillations due to TCP behaviour
  • after failure: access links are the bottleneck again (queues in the access are being filled again)

SLIDE 13

Qualitative discussion — bandwidth and queues

· Large protection switching delay (1000 ms)

[Figure: bandwidth occupation and queue occupation for A-B and C-D]

  • before failure: access links are the bottleneck (link filled for 80%; queue empty)
  • during failure: bottleneck shifts to the backbone (link gets filled for 100% only after the delay; NO immediate queue overflow: very gradual shift of the bottleneck); bandwidth of the fixed flows drops only after a rather long delay; bandwidth of the switched flows drops to zero, followed by a very gradual (slow!) recovery
  • after failure: access links are the bottleneck again (queues in the access are being filled again)

SLIDE 14

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 15

TCP goodput

· Previous slides showed throughput, window size evolution and queue occupation:

– this taught us something about what happens,
– but it isn't obvious to decide from these graphs what is best

· So: what matters to end user?

– the end user of TCP only cares about how long it takes to transfer a file, access a webpage, etc.
– what matters is GOODPUT: the number of bytes successfully transported end-to-end per second
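The goodput metric described above can be computed from a delivery trace. The trace format below (a list of (timestamp, sequence number, size) delivery events) is an assumed illustration, not the authors' tooling:

```python
# Sketch of the goodput metric: bytes successfully delivered end-to-end
# per second. Retransmitted copies of a segment must not be counted
# twice, which is what distinguishes goodput from raw throughput.

def goodput_bps(deliveries, t_start, t_end):
    """Goodput in bytes/s over [t_start, t_end): each sequence number
    is counted once, so retransmissions do not inflate the result."""
    seen = set()
    total_bytes = 0
    for t, seq, size in deliveries:
        if t_start <= t < t_end and seq not in seen:
            seen.add(seq)
            total_bytes += size
    return total_bytes / (t_end - t_start)
```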

SLIDE 16

TCP goodput

· Goodput evolution for different delays per flow category:

[Figure: goodput vs time (s) for switched and fixed flows, delays 0 ms, 50 ms and 1000 ms]

no delay:
  • switched flows lose significantly
  • fixed flows show a drop too

50 ms delay:
  • switched flows lose as much as for delay 0, but
  • the drop in goodput for the fixed flows is smaller

1000 ms delay:
  • switched flows lose a lot more and recover more slowly
  • the drop in goodput for the fixed flows is less (of course)

SLIDE 17

TCP goodput

· Goodput evolution for different delays over aggregate of all flows:

[Figure: aggregate goodput vs time for delays 0 ms, 50 ms and 1000 ms; inset: total goodput during the first second after the failure, 0 ms vs 50 ms]

  • The difference between the three cases is limited to the first seconds after the failure
  • For the first second, the 50 ms case has 28.72% better total goodput than the 0 ms case
SLIDE 18

TCP goodput

· Preliminary conclusion:

– extremely fast protection switching is not a must
– it is better to have a certain delay than none at all,
– but finding the optimal value doesn't appear to be simple (dependent on the round trip time of the TCP flows, and also on the traffic load)

SLIDE 19

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 20

More detailed analysis

· Main cause for better goodput with delay 50 ms:

  • delay 0 ms: TCP sources suffering multiple packet losses recover slowly if they stay in the fast retransmit & fast recovery phase ⇒ only one packet per round trip time (RTT) is transmitted
  • delay 50 ms: some TCP flows fall back to slow start (due to a timeout) ⇒ this gives better goodput! (more than one packet/RTT)
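The two recovery modes contrasted above can be compared with a toy model. This is an assumption-laden sketch, not the paper's simulation: fast recovery (without SACK, with multiple losses) retransmits one packet per RTT, while slow start grows the window exponentially:

```python
# Toy comparison of recovery speed after multiple losses:
# - fast recovery without SACK: one retransmission per RTT
# - timeout-triggered slow start: cwnd starts at 1 and doubles per RTT

def rtts_to_send(packets: int, mode: str) -> int:
    """RTTs needed to (re)transmit `packets` packets in the given mode."""
    if mode == "fast_recovery":
        return packets                 # 1 packet per RTT
    rtts, cwnd, sent = 0, 1, 0
    while sent < packets:              # slow start: exponential growth
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# Recovering 20 lost packets: 20 RTTs lingering in fast recovery,
# versus only a handful of RTTs after falling back to slow start.
```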

SLIDE 21

More detailed analysis

· Illustration by packet traces

  • horizontal X-axis: time (s)
  • vertical Y-axis: sequence number of packet or ACK
  • markers: packet sent, ACK received, packet dropped, ACK dropped
  • how it works: a packet is sent, the ACK is received, a new packet is sent

[Figure: packet traces for three switched flows and three fixed flows]

SLIDE 22

More detailed analysis

· Illustration by packet traces

[Figure: packet traces for switched and fixed flows, delay 0 ms]

Delay 0 ms:

  • at the time of the link failure: losses of packets that are being transported (switched flows only)
  • almost immediately after the failure: buffer overflow on the bottleneck link (affects ALL flows)
  • TCP algorithm: duplicate ACKs cause the source to go into fast retransmit & fast recovery; only 1 packet is retransmitted per RTT
  • next buffer overflows: same applies, but fewer packets per source are lost
SLIDE 23

More detailed analysis

· Illustration by packet traces

[Figure: packet traces for switched and fixed flows, delay 50 ms]

Delay 50 ms:

  • no immediate buffer overflow
  • some sources time out and fall back to slow start ⇒ faster recovery!
  • fixed flows are not affected until the first buffer overflow
  • overall faster recovery

SLIDE 24

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 25

Finding the best delay

· Previous slides:

– indication of the importance of the delay for goodput
– "special" circumstances: same RTT for all TCP flows, all TCP sources originated at the same node

· Therefore:

– mixture of different RTTs
– different source nodes for different flows

SLIDE 26

Finding the best delay

· Experiment set- up:

– propagation delay:

  • first access link: random in [1ms,100ms[
  • all other links: 1ms

– number of sources: 10 fixed, 10 switched

· Scenario (times in s):

– TCP sources randomly start in [0.1, 2.1]
– [0, 5[: link up; [5, 10[: link down; [10, 15[: link up

[Topology figure: fixed sources F0...F9 and switched sources S0...S9, access nodes A, B, C, D, working and backup paths, access and backbone links]

SLIDE 27

Finding the best delay

· Analysis:

– 240 different runs (other random seeds)
– distribution of f(x) = Good(x)/Good(0), where Good(x) = total goodput over all flows during the first 1.5 seconds after the link failure, for a protection switch delay of x milliseconds
– interpretation of f(x):

  • if f(x) > 100%, then a delay of x results in better goodput than no delay at all
  • if f(x) < 100%, then a delay of x results in worse goodput than no delay at all
  • e.g. f(x) = 110% means a delay of x gives 10% more goodput than no delay at all
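The f(x) interpretation above amounts to a simple ratio per random seed. The goodput numbers below are invented for illustration; in the study, Good(x) is measured in the simulator:

```python
# Sketch of the f(x) = Good(x)/Good(0) analysis: goodput with a
# protection switch delay of x ms, relative to no delay, for the
# same random seed. The example figures are hypothetical.

def f_ratio(good_x: float, good_0: float) -> float:
    """Goodput for delay x relative to delay 0, as a percentage."""
    return 100.0 * good_x / good_0

# One hypothetical run: total goodput (kbyte) over the first 1.5 s
# after the failure, for the same seed.
good = {0: 180.0, 50: 201.4}
ratio = f_ratio(good[50], good[0])   # > 100 means the delay helped
```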

SLIDE 28

Finding the best delay

· Analysis: distribution of f(x) = Good(x)/Good(0)

[Figure: histogram with fitted curves of f(x) for delays 0, 50, 250, 500 and 1000 ms; access load = 90% of backbone; TCP NewReno]

  • X-axis: f(x): goodput compared to the goodput for delay 0 ms (same random seed)
  • Y-axis: P[f(x)]: probability of finding f(x) (histogram)
  • all delays result in better goodput than no delay at all: delay 50 ms: 11.89%; delay 250 ms: 7.55%; delay 500 ms: 6.91%; delay 1000 ms: 3.98%

SLIDE 29

Outline

· Experiment set-up
· Qualitative discussion
· TCP goodput
· More detailed analysis
· Finding the "best" delay
· Conclusion

SLIDE 30

Conclusion

· Conclusions:

  • We have studied the effect of recovery on TCP flows
  • From simulation results, we have inferred that the recovery time doesn't necessarily need to be as small as possible
  • For TCP traffic, introducing a protection switch delay may be useful

· Future work:

  • Pursue the detailed analysis of simulation results; e.g. look at what happens after link recovery
  • Extend the investigation to other (larger, more complex) topologies.

SLIDE 31


Thanks for your attention… Please feel free to ask any questions you might have!