Rossi - Pagani, A.A. 2003-2004, SRM

Reliable Multicast - topics

- critical applications may require some guarantees about the delivery of messages to the group members
  - financial transactions, monitoring and management of industrial plants, file transfer, conference...
- reliable multicast in real systems: SRM protocol
- what does "reliable multicast" really mean?
  - formal problem definition; hierarchy of problems
- how do failures affect reliable transmission?
  - definition of (hierarchy of) failure models
- example algorithms to solve the reliable multicast problem

Reliable Multicast - introduction

- TCP supplies a reliable e2e unicast transport service
  - more than reliable: connection-oriented!
- unsuitable for multicast: heterogeneous recipients
- recipients may join/leave the session at different times
  - membership monitoring for connection opening/closing
  - decide whether joining receivers should start receiving data from the beginning of the transmission or not
    - virtual synchrony: 1 tick at every group membership change
- (network) failures could affect several (neighboring) recipients at one time

Reliable Multicast - introduction

- recipients could receive different sets of messages and have different requirements for congestion control
  - ACK implosion at the sender
  - monitoring of the reception state for each recipient: n windows
  - estimation of the round-trip delay for each recipient
- hence: appropriate original protocols are needed
- receiver-driven approach
  - no ACKs: receivers ask for lost messages
    - under the assumption that losses are not frequent!

Scalable Reliable Multicast

- compliance with/exploitation of the TCP/IP stack
- minimal: "eventual delivery of all the data to all the group members"
  - more complex problems (namely, ordering) left to upper layers
  - Warning! Which processes are group members?!
- parametrized: performance can be optimized depending on the application communication pattern and semantics
  - adaptive algorithm for unknown topology or changing membership
- no knowledge of the group membership or the source's identity

System model

- sources and recipients belong to the same group G
- naming of the data units (persistent)
  - no wrap-around problem as with unit numbering
- applications such that operations are idempotent
  - reception of duplicate msgs doesn't jeopardize the application /* duplicate filtering easy to add */
- IP multicast available; clocks synchronized via NTP
  - symmetric paths assumed to estimate the round-trip time

SRM: communication pattern

- let d_XY be the e2e delay between two nodes X and Y
- rcvr A detects a lost msg m generated by the source S
  - set random request timer tq_A in [C1 d_SA, (C1+C2) d_SA]
  - if (req received for m from C before the timer expiration)
    - then suppress your own request; tq_A = 2 tq_A; wait for the reply
    - else multicast request and wait (2 tq_A) for the reply
- node B has received m and receives a repair request for m from A
  - set random repair timer in [D1 d_AB, (D1+D2) d_AB]
  - if (repair received for m from other node) then suppress repair
  - else multicast repair; ignore further requests for 3 d_SB
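The timer rules above can be sketched in a few lines of Python. This is a minimal illustration, not the SRM reference implementation: the function names and the use of a uniform draw from `random.Random` are assumptions made here, and delays are plain floats in arbitrary time units.

```python
import random

def request_interval(c1, c2, d_sa):
    """Interval [C1*d_SA, (C1+C2)*d_SA] from which receiver A draws tq_A."""
    return (c1 * d_sa, (c1 + c2) * d_sa)

def repair_interval(d1, d2, d_ab):
    """Interval [D1*d_AB, (D1+D2)*d_AB] from which node B draws its repair timer."""
    return (d1 * d_ab, (d1 + d2) * d_ab)

def draw_timer(interval, rng):
    """Draw a timer value uniformly from the interval (assumed distribution)."""
    low, high = interval
    return rng.uniform(low, high)

def resolve_request_timer(tq, heard_request):
    """Return (send_request, wait_for_reply).

    If a request for the same message was heard before expiration, the node
    suppresses its own request; either way it then waits 2*tq for the reply.
    """
    return (not heard_request, 2 * tq)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
tq_a = draw_timer(request_interval(2, 2, 3.0), rng)
send, wait = resolve_request_timer(tq_a, heard_request=True)
```

With C2 = 0 the interval collapses to a single point, which is the deterministic case used for the bus topology.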

SRM: discussion

- lost msgs (last ACK) detected by exchanging session msgs
  - periodic state report (as in RTCP), also used to estimate the e2e delay
- wait before sending request: duplicate suppression
  - if other reqs were multicast earlier, the request timer is increased to reduce the duplication probability
  - same mechanism to suppress duplicate repairs
- every node can repair the loss: load distribution
- successive requests temporarily ignored to overcome the network transmission delay (request sent while repair is on the way)
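Here is a small sketch of how a session message can reveal losses, including the loss of the most recent message, which a gap in later data would never expose. It assumes, purely for illustration, that data units from one source are identified by consecutive integers starting at 1 and that a session message reports the highest identifier seen; the slides only say that data units have persistent names. The function name `missing_units` is hypothetical.

```python
def missing_units(received, reported_highest):
    """Return the identifiers up to reported_highest that were not received.

    received: set of integer identifiers already delivered locally
    reported_highest: highest identifier announced in a session message
    """
    return [n for n in range(1, reported_highest + 1) if n not in received]

# A receiver holding 1..3 learns from a session message that 5 exists.
to_request = missing_units({1, 2, 3}, 5)
```

Each identifier in `to_request` would then get its own request timer, set as in the communication pattern.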

SRM: discussion

- duplicate suppression reduces communication overhead
  - the longer a node waits before sending a req, the more efficient the suppression
- a long wait negatively affects repair promptness
- C1, C2, D1, D2 values affect the network performance
  - high C1: longer wait before repair; high C2: lower probability of duplicate requests /* the same for D1, D2 */
  - for regular topologies, optimal values can be found
  - in the sequel we assume a uniform topology: all links with delay 1

Optimization for bus topology: deterministic suppression

- C1=D1=1; C2=D2=0: all duplicate reqs/repairs suppressed
- 1st node A after the loss point sends req at t + d_SA
- 1st node before the loss point replies at t + d_SA + 2
- R_k is repaired at t+k+2+d_SA rather than at t+2d_SA+3k

[Figure: bus topology with the source (src) and nodes 1 to 8, loss point marked X; time labels t, t+1, ..., t+5; events: loss detected, req, repair]
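The bus timings can be checked with a short calculation. The sketch below assumes unit link delays as stated earlier, and reads k as the hop distance of receiver R_k from the repairing node. That reading is an interpretation of the slide, and the function name is hypothetical.

```python
def bus_repair_timeline(t, d_sa, k, c1=1, d1=1, link_delay=1):
    """Return (request_sent, repair_sent, repaired_at) on a bus with C2 = D2 = 0.

    t: time at which node A (first node after the loss point) detects the loss
    d_sa: delay from the source S to A
    k: hops from the repairing node to the receiver of interest (assumption)
    """
    request_sent = t + c1 * d_sa
    # The request travels one link upstream; the repairer is adjacent to A,
    # so its deterministic repair timer is D1 * link_delay.
    repair_sent = request_sent + link_delay + d1 * link_delay
    repaired_at = repair_sent + k * link_delay
    return request_sent, repair_sent, repaired_at

timeline = bus_repair_timeline(t=0, d_sa=3, k=4)
```

For t = 0, d_SA = 3 and k = 4 this gives a request at 3, a repair at 5 = t + d_SA + 2, and delivery at 9 = t + k + 2 + d_SA, matching the expressions on the slide.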

Optimization for star topology: probabilistic suppression

- for all X, Y: d_XY = 2 = d
- C1=D1=0; C2=D2 >= 1
- all nodes notice the loss at t
- 1st req sent at t+x: all reqs scheduled in [t+x+d, t+C2 d] are suppressed
- #reqs scheduled in [t1, t2] is (G-1) (t2-t1)/(C2 d) /* uniform */
- 1st req scheduled at d C2/(G-1)
- only the reqs scheduled in [d C2/(G-1), d C2/(G-1) + d] are sent
- #sent requests: 1 + (G-2)/C2, since
  (G-2) [d C2/(G-1) + d - d C2/(G-1)]/(d C2) = (G-2) d/(d C2) = (G-2)/C2
  - the higher C2, the lower the # of duplicates, and the higher the repair delay

[Figure: star topology with the source (src) and nodes 1 to 6, loss point marked X]
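The star-topology estimate is easy to tabulate. The sketch below only evaluates the two closed-form expressions from the slide. The function names are hypothetical, and G is taken to be the group size used in those expressions.

```python
def first_request_time(d, c2, g):
    """Expected time of the first request after the loss: d*C2/(G-1)."""
    return d * c2 / (g - 1)

def expected_sent_requests(c2, g):
    """Expected number of requests actually sent: 1 + (G-2)/C2."""
    return 1 + (g - 2) / c2

# Doubling C2 halves the expected number of duplicate requests
# and doubles the expected wait before the first request.
few_dups = expected_sent_requests(c2=8, g=10)
more_dups = expected_sent_requests(c2=4, g=10)
```

This shows the tradeoff stated in the last bullet: a larger C2 reduces duplicates but delays the first request, and hence the repair.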

Optimization for tree topology

- intermediate between bus and star
- tq_A in [t+dC1, t+dC1+dC2]
- downstream node B such that d_AB=j receives A's req at t+dC1+dC2+j at most
- downstream node B detects the loss at t+j; tq_B expires no earlier than t+j+(d+j)C1
- req of a downstream node is suppressed when t+dC1+dU[C2]+j <= t+j+(d+j)C1+(d+j)U[C2]
- req suppressed if dC2/C1 <= j
- the smaller C2/C1, the higher the # of suppressed reqs

[Figure: tree topology with the source S and nodes A, B, C]
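The sufficient condition dC2/C1 <= j follows from comparing the latest possible arrival of A's request, t+dC1+dC2+j, with the earliest possible expiration of B's timer, t+j+(d+j)C1. A minimal sketch of that check follows. Here d is read as A's delay from the source, and the function names are hypothetical.

```python
def latest_request_arrival(t, d, j, c1, c2):
    """Latest time at which A's request can reach B, j hops downstream."""
    return t + d * c1 + d * c2 + j

def earliest_timer_expiry(t, d, j, c1):
    """Earliest time at which B's own request timer can expire."""
    return t + j + (d + j) * c1

def always_suppressed(d, j, c1, c2):
    """True if B's request is suppressed for every random draw.

    Equivalent to d*C2/C1 <= j, written without division so C1 = 0 is safe.
    """
    return d * c2 <= j * c1
```

For example, with d = 2 and C1 = C2 = 2, a node at j = 4 is always suppressed while a node at j = 1 may still send its own request.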

Adaptive algorithm

- if the topology is unknown, it is difficult to estimate the optimal C1, C2
- IDEA: if high # duplicate reqs then increase the timer interval; if low # duplicate reqs then decrease the timer interval /* to increase repair promptness */
- nodes close to both the loss point and the source should have lower C1 and C2 than other recipients
- dynamic adaptation makes it possible to track both traffic congestion and group membership dynamics
- parameters updated upon request timer expiration or reset

Adaptive algorithm: variables

- request_period = time between two successive tq settings
- ave_req_del = average delay between timer set and reset
- # duplicates estimated via an exponentially weighted average: ave_dup_req = (1-α) ave_dup_req + α #_dup_req
- ave_dup_req = average # duplicate reqs between two successive timer settings
- AveDup, AveDelay = upper bounds on the # of duplicates and the repair delay
- request from A carries d_SA
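The exponentially weighted average of duplicate requests can be written as a one-line update. The sketch below is illustrative only; the function name and the sample value of α are assumptions, since the slides do not fix α.

```python
def update_ave_dup_req(ave_dup_req, n_dup_req, alpha):
    """ave_dup_req = (1 - alpha) * ave_dup_req + alpha * n_dup_req."""
    return (1 - alpha) * ave_dup_req + alpha * n_dup_req

# With alpha = 0.25 (hypothetical), a burst of 6 duplicates moves
# a running average of 2.0 a quarter of the way toward 6.
new_average = update_ave_dup_req(2.0, 6, alpha=0.25)
```

A small α makes the estimate smooth but slow to react; a large α tracks recent request periods more closely.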

Adaptive algorithm: pseudocode

update ave_req_del; update ave_dup_req
if (sent request) decrease C1
if (received req from recipients farther from the src than the current node) decrease C2
else if (ave_dup_req > AveDup) increase both C1 and /* above all! */ C2
else if (ave_dup_req < AveDup-ε)
    if (ave_req_del > AveDelay) decrease C2
    if (ave_dup_req < α) decrease C1
else increase C1 /* AveDup-ε <= ave_dup_req <= AveDup */
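The pseudocode can be turned into a runnable sketch once step sizes are chosen. The slides do not give them, so the single `step` parameter, the doubled step for C2 in the "above all" branch, and the floor at 0 are all placeholders introduced here. The `Stats` record and the `adapt` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    ave_dup_req: float            # running average of duplicate requests
    ave_req_del: float            # running average of request delay
    sent_request: bool            # this node sent a request in the last period
    heard_farther_request: bool   # a request came from a node farther from the src

def adapt(c1, c2, s, ave_dup, ave_delay, eps, alpha, step=0.5):
    """Apply one round of the adaptive rules and return the new (C1, C2)."""
    if s.sent_request:
        c1 -= step
    if s.heard_farther_request:
        c2 -= step
    elif s.ave_dup_req > ave_dup:
        c1 += step
        c2 += 2 * step            # placeholder for "above all" C2
    elif s.ave_dup_req < ave_dup - eps:
        if s.ave_req_del > ave_delay:
            c2 -= step
        if s.ave_dup_req < alpha:
            c1 -= step
    else:
        c1 += step                # AveDup - eps <= ave_dup_req <= AveDup
    return max(c1, 0.0), max(c2, 0.0)  # floor at 0 is an assumption

too_many = adapt(2.0, 2.0, Stats(3.0, 1.0, False, False),
                 ave_dup=1.0, ave_delay=2.0, eps=0.5, alpha=0.25)
```

Larger steps react faster but make the oscillations raised in the discussion more likely.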

Adaptive algorithm: discussion

- problem: how much should C1 and C2 be increased or decreased? /* oscillations */
- experiments show that the adaptive algorithm decreases the # of duplicate repairs w.r.t. the non-adaptive algorithm, but has a more variable repair delay (competitive w.r.t. TCP)
- the choice of AveDup and AveDelay characterizes the tradeoff between duplicate suppression and repair promptness, depending on the application semantics
- problem: how should timers be set in the presence of multiple failures?

Concluding remarks

- parameter optimization may be a problem
- example usage:
  - BGP: reliable distribution of the routing information
    - SRM avoids establishing and maintaining O(n^2) connections
  - news distribution, web mirrors: delay insensitive
    - optimization w.r.t. duplicate suppression
- applications available that make use of SRM (e.g.

