
Rossi − Pagani A.A. 2003−2004 SRM

Reliable Multicast − topics

  • critical applications may require guarantees about the delivery of messages to the group members

financial transactions, monitoring and management of industrial plants, file transfer, conference...

  • reliable multicast in real systems: the SRM protocol
  • what does "reliable multicast" really mean?

formal problem definition; hierarchy of problems

  • how do failures affect reliable transmission?

definition of a hierarchy of failure models

  • example algorithms to solve the reliable multicast problem

Reliable Multicast − introduction

  • TCP supplies a reliable e2e unicast transport service

more than reliable: connection−oriented!

  • unsuitable for multicast: heterogeneous recipients
  • recipients may join/leave the session at different times

membership monitoring for connection opening/closing

decide whether joining receivers should start receiving data from the beginning of the transmission or not

virtual synchrony: 1 tick every group membership change

  • (network) failures could affect several (neighboring) recipients at one time

Reliable Multicast − introduction

  • recipients could receive different sets of messages and have different requirements for congestion control

ACK implosion at the sender

monitoring of the reception state for each recipient: n windows

estimation of the round−trip delay for each recipient

  • hence: appropriate original protocols are needed
  • receiver−driven approach

no ACKs: receivers ask for lost messages

under the assumption that losses are not frequent!
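The receiver−driven approach can be illustrated with a minimal sketch. It assumes, purely for illustration, that each source numbers its messages 1, 2, 3, ...; the `Receiver` class and `missing_messages` helper are hypothetical names, not part of any protocol specification. The receiver sends nothing while data arrives in order and reports only the gaps it would ask repairs for.

```python
from typing import List, Set


def missing_messages(received: Set[int], highest_seen: int) -> List[int]:
    """Return the sequence numbers in 1..highest_seen that never arrived."""
    return [seq for seq in range(1, highest_seen + 1) if seq not in received]


class Receiver:
    """Tracks the reception state locally; no ACK is ever sent to the source."""

    def __init__(self) -> None:
        self.received: Set[int] = set()
        self.highest_seen = 0

    def on_data(self, seq: int) -> List[int]:
        """Record message seq and return the gaps that call for a repair request."""
        self.received.add(seq)
        self.highest_seen = max(self.highest_seen, seq)
        return missing_messages(self.received, self.highest_seen)
```

A gap is only visible once a later message arrives, so the loss of the last message of a burst has to be detected by other means.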

Scalable Reliable Multicast

  • compliance with/exploitation of the TCP/IP stack
  • minimal: "eventual delivery of all the data to all the group members"

more complex problems (namely, ordering) left to upper layers

Warning! Which processes are group members?!

  • parametrized: performance can be optimized depending on the application communication pattern and semantics

adaptive algorithm for unknown topology or changing membership

  • no knowledge of the group membership or the source’s identity

System model

  • sources and recipients belong to the same group G
  • naming of the data units (persistent)

no wrap−around problem, unlike with sequence numbering of the units

  • applications such that operations are idempotent

reception of duplicate msgs doesn’t jeopardize the application /* duplicate filtering easy to add */

  • IP multicast available; clocks synchronized via NTP

symmetric paths assumed to estimate the round−trip time

SRM: communication pattern

let d_XY be the e2e delay between two nodes X and Y

receiver A detects a lost msg m generated by the source S:

  set random request timer tq_A in [C1 d_SA, (C1+C2) d_SA]

  if (request for m received from another node C before the timer expires)

    then suppress your own request; tq_A = 2 tq_A; wait for the reply

    else multicast the request and wait (2 tq_A) for the reply

node B that has received m and receives a repair request for m from A:

  set random repair timer in [D1 d_AB, (D1+D2) d_AB]

  if (repair for m received from another node before the timer expires) then suppress the repair

  else multicast the repair; ignore further requests for m for 3 d_SB
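The two timer rules can be sketched as follows. This is an illustration under assumptions, not the protocol's reference code: the function names and the `backoff` counter are hypothetical, and the doubling "tq_A = 2 tq_A" is modeled by scaling the whole interval by 2 for each backoff round.

```python
import random


def request_timer(c1: float, c2: float, d_sa: float, backoff: int = 0,
                  rng: random.Random = random) -> float:
    """Draw a request timer uniformly in 2^backoff * [C1*d_SA, (C1+C2)*d_SA]."""
    scale = 2 ** backoff  # one doubling per request heard from another node
    return scale * rng.uniform(c1 * d_sa, (c1 + c2) * d_sa)


def repair_timer(d1: float, d2: float, d_ab: float,
                 rng: random.Random = random) -> float:
    """Draw a repair timer uniformly in [D1*d_AB, (D1+D2)*d_AB]."""
    return rng.uniform(d1 * d_ab, (d1 + d2) * d_ab)
```

With C2 = 0 (or D2 = 0) the interval collapses to a single point and the timer becomes deterministic, which is the case used for the bus topology.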

SRM: discussion

  • lost msgs (last ACK) detected by exchanging session msgs

periodic state reports (as in RTCP), also used to estimate the e2e delay

  • wait before sending a request: duplicate suppression

if other requests are multicast earlier, the request timer is increased to reduce the probability of duplicates

the same mechanism suppresses duplicate repairs

  • every node can repair the loss: load distribution
  • successive requests are temporarily ignored to account for the network transmission delay (a request may be sent while the repair is on its way)
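One common way to estimate the e2e delay from periodic reports is a timestamp echo, sketched below. The slides do not give the exact procedure, so the scheme and the field names (`t1`, `delta`, `t4`) are assumptions: A stamps its session message with t1, B echoes t1 together with the time delta it held the message before replying, and A halves what remains of the round trip, relying on the symmetric−path assumption.

```python
def one_way_delay(t1: float, delta: float, t4: float) -> float:
    """Estimate the one-way delay between A and B from a session-message echo.

    t1:    time A sent its session message (A's clock)
    delta: time B waited between receiving t1 and echoing it (B's clock)
    t4:    time A received B's session message (A's clock)
    """
    round_trip = (t4 - t1) - delta  # time actually spent on the network
    if round_trip < 0:
        raise ValueError("inconsistent timestamps")
    return round_trip / 2.0  # symmetric paths assumed
```

Only differences of timestamps taken on the same clock appear in the estimate.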

SRM: discussion

  • duplicate suppression reduces the communication overhead

the longer a node waits before sending a request, the more effective the suppression

  • a long wait negatively affects repair promptness
  • the values of C1, C2, D1, D2 affect the network performance

high C1: longer wait before the repair; high C2: lower probability of duplicate requests /* the same for D1, D2 */

for regular topologies, optimal values can be found

in what follows we assume a uniform topology: all links have delay 1

Optimization for bus topology: deterministic suppression

C1=D1=1; C2=D2=0: all duplicate requests/repairs are suppressed

the 1st node A after the loss point sends its request at t + d_SA

the 1st node before the loss point replies at t + d_SA + 2

R_k receives the repair at t+k+2+d_SA rather than at t+2d_SA+3k

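[Figure: bus topology with nodes 1 to 8 and the source; X marks the loss point; the timeline shows the message leaving the source at t, the loss being detected, then the request and the repair]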
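The bus timings can be checked with a short calculation. The sketch assumes unit link delays and reads R_k as the k−th node after the loss point, so that R_1 is A itself; this reading is an assumption, chosen because it reproduces the expression t+k+2+d_SA. The function name is hypothetical.

```python
def bus_recovery_times(t: int, d_sa: int, k: int) -> dict:
    """Timeline of one loss on a bus with unit link delays, C1=D1=1, C2=D2=0.

    t:    time at which A, the first node after the loss point, detects the loss
    d_sa: number of hops between the source S and A
    k:    index of the downstream node R_k (assumption: R_1 is A)
    """
    request_sent = t + d_sa                # request timer is exactly C1 * d_SA
    request_heard = request_sent + 1       # one hop back across the loss point
    repair_sent = request_heard + 1        # repair timer is exactly D1 * 1
    repair_at_a = repair_sent + 1          # one hop forward to A
    repair_at_rk = repair_at_a + (k - 1)   # k-1 more hops down to R_k
    return {
        "request_sent": request_sent,
        "repair_sent": repair_sent,
        "repair_at_rk": repair_at_rk,
    }
```

Because both timers are deterministic and grow with distance, the node closest to the loss point always fires first, and every other request or repair is cancelled before its timer expires.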

Optimization for star topology: probabilistic suppression

for all X, Y: d_XY = 2 = d

C1=D1=0; C2=D2 >= 1

all nodes notice the loss at t

the 1st request is sent at t+x: all the requests scheduled in [t+x+d, t+C2 d] are suppressed

# requests scheduled in [t1, t2] is (G−1) (t2−t1)/(C2 d) /* uniform */

the 1st request is expected at t + d C2/(G−1)

only the requests scheduled in [t + d C2/(G−1), t + d C2/(G−1) + d] are sent

# sent requests: 1 + (G−2)/C2, since

(G−2) * [d C2/(G−1) + d − d C2/(G−1)]/(d C2) = (G−2) * (d/(d C2)) = (G−2)/C2

the higher C2, the lower the # of duplicates, and the higher the repair delay

[Figure: star topology with the source (src) and nodes 1 to 6; X marks the loss point]
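The estimate 1 + (G−2)/C2 can be compared against a small Monte Carlo experiment. The sketch below assumes that the G−1 receivers draw independent uniform timers in [0, C2 d] and that a request is sent unless the first request has already been heard, that is, unless the timer expires at least d after the first one. Function names, the number of trials and the seed are illustrative choices.

```python
import random


def estimated_requests(g: int, c2: float) -> float:
    """Estimate from the analysis above: 1 + (G-2)/C2."""
    return 1 + (g - 2) / c2


def simulate_requests(g: int, c2: float, d: float = 2.0,
                      trials: int = 2000, seed: int = 0) -> float:
    """Average number of requests actually multicast after one loss."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        timers = [rng.uniform(0.0, c2 * d) for _ in range(g - 1)]
        first = min(timers)
        # A request is sent if its timer fires before the first request arrives.
        total += sum(1 for x in timers if x < first + d)
    return total / trials
```

The simulated average should stay close to the estimate and fall as C2 grows, at the price of a longer expected wait before the first request.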

Optimization for tree topology

  • intermediate between bus and star
  • tq_A in [t+dC1, t+dC1+dC2]
  • a downstream node B such that d_AB=j receives A’s request at t+dC1+dC2+j at the latest
  • the downstream node B detects the loss at t+j; tq_B expires no earlier than t+j+(d+j)C1
  • the request of a downstream node is suppressed when t+dC1+dU[C2]+j <= t+j+(d+j)C1+(d+j)U[C2]
  • the request is suppressed if dC2/C1 <= j
  • the smaller C2/C1, the higher the # of suppressed requests

[Figure: tree topology with the source S and nodes A, B, C]
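The condition dC2/C1 <= j follows from comparing the worst case for A (its timer at the upper bound) with the best case for B (its timer at the lower bound). A minimal sketch of that comparison, with times measured from t and a hypothetical function name:

```python
def request_always_suppressed(d: int, j: int, c1: int, c2: int) -> bool:
    """True if B's request is suppressed whatever the random timer draws are.

    d: delay between the source and A; j: delay between A and B.
    """
    latest_arrival_at_b = d * c1 + d * c2 + j  # A's timer at its maximum, plus j
    earliest_expiry_at_b = j + (d + j) * c1    # B detects at t+j, timer at its minimum
    return latest_arrival_at_b <= earliest_expiry_at_b
```

Cancelling the common terms d C1 and j leaves d C2 <= j C1, which is the condition on j given above. Nodes closer to A than d C2/C1 may still send a duplicate, depending on the random draws.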

Adaptive algorithm

  • if the topology is unknown, it is difficult to estimate the optimal C1, C2
  • IDEA: if high # duplicate reqs then increase the timer interval

if low # duplicate reqs then decrease the timer interval /* to increase repair promptness */

nodes close to both the loss point and the source should have lower C1 and C2 than other recipients

  • dynamic adaptation makes it possible to track both traffic congestion and group membership dynamics
  • parameters updated upon request timer expiration or reset

Adaptive algorithm: variables

  • request_period = time between two successive tq settings
  • ave_req_del = average delay between timer set and reset
  • ave_dup_req = average # duplicate reqs between two successive timer settings
  • # duplicates estimated via an exponentially weighted average: ave_dup_req = (1−α) ave_dup_req + α #_dup_req
  • AveDup, AveDelay = upper bounds on the # of duplicates and the repair delay
  • request from A carries d_SA
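The exponentially weighted average is a one−line update. In the sketch below the default α = 0.25 is an assumed value chosen for illustration, and the function name is hypothetical.

```python
def update_ave_dup_req(ave_dup_req: float, dup_req: int, alpha: float = 0.25) -> float:
    """ave_dup_req = (1 - alpha) * ave_dup_req + alpha * #_dup_req."""
    return (1 - alpha) * ave_dup_req + alpha * dup_req
```

A larger α makes the average follow the latest request period more closely; a smaller α smooths out isolated bursts of duplicates.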

Adaptive algorithm: pseudocode

  • update ave_req_del; update ave_dup_req
  • if (sent request) decrease C1
  • if (received req from recipients farther from the src than the current node) decrease C2
  • else if (ave_dup_req > AveDup) increase both C1 and /* above all! */ C2
  • else if (ave_dup_req < AveDup−ε)

if (ave_req_del > AveDelay) decrease C2

if (ave_dup_req < α) decrease C1

  • else increase C1 /* AveDup−ε <= ave_dup_req <= AveDup */
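The adjustment rules can be written out as a function. The sketch follows the branch structure literally; the step sizes `small` and `large`, the default bounds, and the clamping at zero are assumptions, since the slides leave open how much C1 and C2 should change. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    c1: float
    c2: float


def adapt(p: Params, sent_request: bool, heard_farther_requester: bool,
          ave_dup_req: float, ave_req_del: float,
          ave_dup_bound: float = 1.0, ave_delay_bound: float = 1.0,
          eps: float = 0.1, alpha: float = 0.25,
          small: float = 0.1, large: float = 0.5) -> Params:
    """Return updated (C1, C2); the averages are assumed to be up to date."""
    c1, c2 = p.c1, p.c2
    if sent_request:
        c1 -= small                      # this node fired first: be more prompt
    if heard_farther_requester:
        c2 -= small
    elif ave_dup_req > ave_dup_bound:    # too many duplicates: widen the interval
        c1 += small
        c2 += large                      # "above all" C2
    elif ave_dup_req < ave_dup_bound - eps:
        if ave_req_del > ave_delay_bound:
            c2 -= small
        if ave_dup_req < alpha:
            c1 -= small
    else:                                # AveDup - eps <= ave_dup_req <= AveDup
        c1 += small
    return Params(max(c1, 0.0), max(c2, 0.0))
```

Increasing C2 faster than it is decreased biases the parameters toward fewer duplicates; other step sizes shift the balance toward promptness.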

Adaptive algorithm: discussion

  • problem: by how much should C1 and C2 be increased or decreased? /* oscillations */
  • experiments show that the adaptive algorithm decreases the # duplicate repairs w.r.t. the non−adaptive algorithm, but has a more variable repair delay (competitive w.r.t. TCP)
  • the choice of AveDup and AveDelay makes it possible to tune the tradeoff between duplicate suppression and repair promptness, depending on the application semantics
  • problem: how should the timers be set in case of multiple failures?

Concluding remarks

  • parameter optimization may be a problem
  • example usage:

BGP: reliable distribution of the routing information

SRM avoids establishing and maintaining O(n^2) connections

news distribution, web mirrors: delay insensitive

  • optimization w.r.t. duplicate suppression
  • applications are available that make use of SRM (e.g. whiteboard)