
Static Worksharing Strategies for Heterogeneous Computers with Unrecoverable Failures

Anne Benoit, Yves Robert, Arnold Rosenberg and Frédéric Vivien

École Normale Supérieure de Lyon, France
Anne.Benoit@ens-lyon.fr
http://graal.ens-lyon.fr/~abenoit

HeteroPar’2009, August 25

Anne.Benoit@ens-lyon.fr August 25, 2009 Worksharing with Unrecoverable Interruptions 1/ 25


Problem

  • Large divisible computational workload
  • Single-round distribution, one-port model
  • Assemblage of p different-speed computers
  • Unrecoverable interruptions
  • A priori knowledge of risk (failure probability)
  • Goal: maximize the expected amount of work done


Related work

  • Landmark paper by Bhatt, Chung, Leighton & Rosenberg on cycle stealing
  • Hardware failures
  • Fault-tolerant computing (hence scheduling) becomes unavoidable
  • Well, the same story has been told for a very long time!


Cycle-stealing scenario

  • Big job of size W to execute during the weekend
  • Enroll p computers $P_1$ to $P_p$
  • Assign a load fraction to each $P_i$
  • How to compute these load fractions? How to order communications?
  • Risk increases linearly with time
  • Machines reclaimed at 8am on Monday with probability 1


Outline

1. Technical framework
2. Homogeneous computers, with communication costs
3. Heterogeneous computers, no communication costs
4. Heterogeneous computers, with communication costs
5. Conclusion


1. Technical framework


Interruption model

$dPr = \kappa\,dt$ for $t \in [0, 1/\kappa]$, and $dPr = 0$ otherwise

$Pr(w) = \min\left\{1, \int_0^w \kappa\,dt\right\} = \min\{1, \kappa w\}$

Goal: maximize expected work production
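As a minimal sketch of this risk model, the Python below evaluates $Pr(w)$ and the expected contribution of a single chunk, assuming that a chunk whose processing is interrupted contributes nothing (the interruption is unrecoverable). The function names are hypothetical, not taken from the authors' work.

```python
def interruption_probability(w: float, kappa: float) -> float:
    """Pr(w) = min{1, kappa * w}: probability that an interruption has occurred by time w."""
    if w < 0 or kappa < 0:
        raise ValueError("time and risk rate must be non-negative")
    return min(1.0, kappa * w)


def expected_chunk_work(size: float, finish_time: float, kappa: float) -> float:
    """Expected work of a chunk that completes at finish_time.

    An interruption before completion is unrecoverable, so the chunk
    counts fully with probability 1 - Pr and not at all otherwise.
    """
    return size * (1.0 - interruption_probability(finish_time, kappa))


if __name__ == "__main__":
    kappa = 0.5                                    # hypothetical rate: risk reaches 1 at t = 1/kappa = 2
    print(interruption_probability(1.0, kappa))    # linear regime
    print(interruption_probability(3.0, kappa))    # past 1/kappa: interruption is certain
    print(expected_chunk_work(4.0, 1.0, kappa))
```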


Rules of the game

  • Single-round, no overlap, one-port communications
  • Homogeneous network
  • Different-speed computers
  • Failure rate per unit-load communication: $z = \kappa / bw$
  • Failure rate per unit-load computation by computer $P_i$: $x_i = \kappa / speed_i$
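Under these rules, computer $P_i$ finishes once all chunks up to and including its own have been sent and it has computed its own chunk, so its failure probability is $\min\{1, z C_i + x_i a_i\}$, where $a_i$ is its chunk and $C_i = a_1 + \cdots + a_i$. A minimal Python sketch of this bookkeeping follows; the function names and sample values are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def failure_rates(kappa: float, bw: float, speeds: Sequence[float]) -> Tuple[float, List[float]]:
    """z = kappa / bw (per unit load sent); x_i = kappa / speed_i (per unit load computed)."""
    return kappa / bw, [kappa / s for s in speeds]


def expected_total_work(chunks: Sequence[float], z: float, xs: Sequence[float]) -> float:
    """Expected work when chunk a_i goes to P_i, in order, in a single one-port round.

    P_i finishes at C_i / bw + a_i / speed_i, with C_i = a_1 + ... + a_i,
    so its failure probability is min{1, z * C_i + x_i * a_i}.
    """
    total = 0.0
    for a, c, x in zip(chunks, accumulate(chunks), xs):
        total += a * (1.0 - min(1.0, z * c + x * a))
    return total


if __name__ == "__main__":
    z, xs = failure_rates(kappa=1.0, bw=4.0, speeds=[2.0, 4.0])  # hypothetical platform
    print(z, xs)
    print(expected_total_work([0.5, 0.5], z, xs))
```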


With two computers (1/2)

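[Figure: timeline with $P_1$: communication $zY$, computation $x_1 Y$; $P_2$: communication $z(W - Y)$, computation $x_2(W - Y)$]

First send $P_1$ a chunk of size $Y$: $E_1 = Y\,(1 - (z + x_1)Y)$

Then send $P_2$ the remaining load (of size $W - Y$): $E_2 = (W - Y)\,(1 - (zW + x_2(W - Y)))$

Total expectation: $E(Y) = E_1 + E_2$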
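The two-computer expectation can be coded directly from $E_1$ and $E_2$. The sketch below (hypothetical helper names) also locates the best split $Y$ by a plain grid search, which is only a numerical cross-check of the closed form for $Y^{(opt)}$, not the authors' method; it assumes $W \le 1/(z + \max(x_1, x_2))$ so that no failure probability exceeds 1.

```python
def two_computer_expectation(Y: float, W: float, z: float, x1: float, x2: float) -> float:
    """E(Y) = E1 + E2: P1 receives Y first, then P2 receives the remaining W - Y.

    No min{1, .} clipping: assumes W <= 1 / (z + max(x1, x2)).
    """
    e1 = Y * (1 - (z + x1) * Y)                  # P1: risk z*Y (sending) + x1*Y (computing)
    e2 = (W - Y) * (1 - (z * W + x2 * (W - Y)))  # P2 starts only after all of W is sent
    return e1 + e2


def best_split_by_grid(W: float, z: float, x1: float, x2: float, steps: int = 1000) -> float:
    """Grid search over Y in [0, W]; E(Y) is a concave quadratic, so the result is within W/steps of the optimum."""
    candidates = (W * k / steps for k in range(steps + 1))
    return max(candidates, key=lambda Y: two_computer_expectation(Y, W, z, x1, x2))


if __name__ == "__main__":
    W, z, x1, x2 = 0.2, 1.0, 1.0, 2.0   # hypothetical values with W <= 1 / (z + max(x1, x2))
    Y = best_split_by_grid(W, z, x1, x2)
    print(Y, two_computer_expectation(Y, W, z, x1, x2))
```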


With two computers (2/2)

$E(Y) = Y\,(1 - (z + x_1)Y) + (W - Y)\,(1 - (zW + x_2(W - Y)))$

$E(Y) = W - (z + x_2)W^2 - (z + x_1 + x_2)Y^2 + (z + 2x_2)WY$

$Y^{(opt)} = \frac{z + 2x_2}{2(z + x_1 + x_2)} W$

$E_{opt}(W, 2) = E(Y^{(opt)}) = W - \frac{4x_1x_2 + 4(x_1 + x_2)z + 3z^2}{4(x_1 + x_2 + z)} W^2$

Symmetric in $x_1$ and $x_2$ ⇒ the ordering of the communications has no impact
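The closed forms for $Y^{(opt)}$ and $E_{opt}(W, 2)$ can be cross-checked numerically. The sketch below (hypothetical names and values) compares the closed form with a direct evaluation of $E(Y)$ at $Y^{(opt)}$, then swaps $x_1$ and $x_2$ to illustrate the symmetry.

```python
def expectation(Y, W, z, x1, x2):
    """E(Y) for two computers, P1 served first (assumes no failure probability exceeds 1)."""
    return Y * (1 - (z + x1) * Y) + (W - Y) * (1 - (z * W + x2 * (W - Y)))


def optimal_two_computer(W, z, x1, x2):
    """Return (Y_opt, E_opt) from the closed-form expressions."""
    y_opt = (z + 2 * x2) * W / (2 * (z + x1 + x2))
    e_opt = W - (4 * x1 * x2 + 4 * (x1 + x2) * z + 3 * z ** 2) / (4 * (x1 + x2 + z)) * W ** 2
    return y_opt, e_opt


if __name__ == "__main__":
    W, z = 0.2, 1.0                                     # hypothetical values
    y, e = optimal_two_computer(W, z, 1.0, 2.0)
    print(y, e, expectation(y, W, z, 1.0, 2.0))        # closed form vs direct evaluation
    print(optimal_two_computer(W, z, 2.0, 1.0)[1])     # same E_opt with x1 and x2 swapped
```

Swapping $x_1$ and $x_2$ changes $Y^{(opt)}$ (who gets the larger share) but not $E_{opt}$.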


Extra rule: distribute the entire load

  • Total load W is small enough that we distribute it entirely
  • Quite reasonable, but with a dramatic impact on the solution

Definition. Distrib(p): compute $E_{opt}(W, p)$, the optimal value of the expected total amount of work done when distributing the entire workload $W \le \frac{1}{z + \max_i x_i}$ to the p remote computers.


A sufficient condition

Proposition. If $W \le \frac{1}{z + \max_i x_i}$, there is a non-zero probability that the last computer does not fail before or during its computation.

Proof.
  • The last computer $P_i$ can start computing at time-step $Y/bw$, where $Y \le W$ is the total load sent to all preceding computers.
  • Introducing idle times cannot improve the solution: failure risk grows with time.
  • Then $P_i$ needs $V/speed_i$ time-steps to execute its own chunk of size $V$, where $Y + V \le W$.
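The slide stops before the final inequality. Assuming $z > 0$ and $x_i > 0$ (positive $\kappa$, finite bandwidth and speed), the risk reached by the last computer $P_i$ when it completes can be bounded using $Y + V \le W$:

```latex
\kappa\left(\frac{Y}{bw} + \frac{V}{speed_i}\right)
  = zY + x_i V
  \le \max(z, x_i)\,(Y + V)
  \le \max(z, x_i)\,W
  < (z + x_i)\,W
  \le \left(z + \max_j x_j\right) W
  \le 1
```

so the interruption probability stays strictly below 1 and $P_i$ completes its chunk with non-zero probability.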


2. Homogeneous computers, with communication costs


Optimal solution

Theorem. When $x_i = x$ for all $i$ (identical speeds), the optimal solution to Distrib(p) is obtained with same-size chunks (hence of size $W/p$), and

$E_{opt}(W, p) = W - \frac{(p+1)z + 2x}{2p} W^2$

Closed-form formula
Proof by induction
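A quick numerical illustration of the theorem follows, as a sketch that assumes $W \le 1/(z + x)$ so that no failure probability is truncated at 1. It evaluates equal chunks with the general expected-work formula ($P_i$'s risk is $z C_i + x a_i$, with $C_i$ the load sent so far) and compares the result with the closed form; an unequal split of the same $W$ should do no better. Names and values are hypothetical.

```python
from itertools import accumulate


def expected_total_work(chunks, z, x):
    """Expected work on identical computers; P_i's risk is min{1, z * C_i + x * a_i}."""
    return sum(a * (1 - min(1.0, z * c + x * a)) for a, c in zip(chunks, accumulate(chunks)))


def homogeneous_optimum(W, p, z, x):
    """Equal chunks W/p and the closed form E_opt = W - f_p W^2, f_p = ((p+1)z + 2x) / (2p)."""
    f_p = ((p + 1) * z + 2 * x) / (2 * p)
    return [W / p] * p, W - f_p * W ** 2


if __name__ == "__main__":
    W, p, z, x = 0.2, 3, 1.0, 1.0          # hypothetical values, W <= 1 / (z + x)
    chunks, e_opt = homogeneous_optimum(W, p, z, x)
    print(e_opt, expected_total_work(chunks, z, x))
    print(expected_total_work([0.07, 0.065, 0.065], z, x))  # an unequal split of the same W
```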


Proof (1/2)

Let $f_p = \frac{(p+1)z + 2x}{2p}$.

We prove by induction on p that $E_{opt}(W, p) = W - f_p W^2$, with same-size chunks.

Case p = 1: $f_1 = z + x$ and $E_{opt}(W, 1) = W(1 - (z + x)W)$. OK.

From n to n + 1 computers:
  • the chunk sent to $P_{n+1}$ has size $W - Y$;
  • by induction, $E_{opt}(Y, n) = Y(1 - f_n Y)$, with chunk sizes $Y/n$;
  • for n + 1 computers, we have

$E(Y) = Y(1 - f_n Y) + (W - Y)(1 - zW - x(W - Y))$


Proof (2/2)

$E(Y) = W - (z + x)W^2 - (f_n + x)Y^2 + (z + 2x)WY$

$Y^{(opt)} = \frac{z + 2x}{2(f_n + x)} W$

$E_{opt}(W, n+1) = E(Y^{(opt)}) = W - \alpha W^2$, where $\alpha = z + x - \frac{(z + 2x)^2}{4(f_n + x)}$

By induction, $f_n + x = \frac{(n+1)z + 2x}{2n} + x = \frac{(n+1)(z + 2x)}{2n}$

Finally, $\alpha = z + x - \frac{n(z + 2x)}{2(n+1)} = \frac{(n+2)z + 2x}{2(n+1)} = f_{n+1}$

$Y^{(opt)} = \frac{n}{n+1} W$, with chunk sizes $\frac{Y^{(opt)}}{n} = \frac{W}{n+1}$
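The algebra of the induction step can be spot-checked exactly with rational arithmetic. The sketch below (hypothetical helper names, using Python's `fractions` module) confirms for a few sample values that $\alpha = f_{n+1}$ and $Y^{(opt)}/W = n/(n+1)$; it complements the proof rather than replacing it.

```python
from fractions import Fraction


def f(p, z, x):
    """f_p = ((p + 1) z + 2x) / (2p)."""
    return ((p + 1) * z + 2 * x) / (2 * p)


def induction_step_holds(n, z, x):
    """Check alpha == f_{n+1} and Y_opt / W == n / (n + 1) for one (n, z, x), exactly."""
    fn = f(n, z, x)
    alpha = z + x - (z + 2 * x) ** 2 / (4 * (fn + x))
    y_ratio = (z + 2 * x) / (2 * (fn + x))  # Y_opt / W
    return alpha == f(n + 1, z, x) and y_ratio == Fraction(n, n + 1)


if __name__ == "__main__":
    samples = [(Fraction(1), Fraction(1)), (Fraction(1, 3), Fraction(5, 2)), (Fraction(2), Fraction(1, 7))]
    print(all(induction_step_holds(n, z, x) for n in range(1, 20) for z, x in samples))
```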


3. Heterogeneous computers, no communication costs


Symmetric functions

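Definition. Given $n \ge 1$, for $0 \le i \le n$, $\sigma^{(n)}_i$ denotes the i-th symmetric function of $x_1, x_2, \ldots, x_n$:

$\sigma^{(n)}_i = \sum_{1 \le j_1 < j_2 < \cdots < j_i \le n} \; \prod_{k=1}^{i} x_{j_k}$

By convention, $\sigma^{(n)}_0 = 1$.

For instance, with $n = 3$: $\sigma^{(3)}_1 = x_1 + x_2 + x_3$, $\sigma^{(3)}_2 = x_1x_2 + x_1x_3 + x_2x_3$ and $\sigma^{(3)}_3 = x_1x_2x_3$.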
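The symmetric functions are the coefficients of $\prod_k (1 + x_k t)$, so all of $\sigma^{(n)}_0, \ldots, \sigma^{(n)}_n$ can be computed in $O(n^2)$ operations by multiplying in one factor at a time. A minimal sketch (the function name is hypothetical):

```python
from typing import List, Sequence


def symmetric_functions(xs: Sequence[float]) -> List[float]:
    """Return [sigma_0, sigma_1, ..., sigma_n] for x_1, ..., x_n.

    Multiplying the polynomial by (1 + x t) maps coefficient c_i to
    c_i + x * c_{i-1}, i.e. sigma^(k)_i = sigma^(k-1)_i + x_k * sigma^(k-1)_{i-1}.
    """
    sigma = [1]  # sigma_0 = 1 by convention (empty product)
    for x in xs:
        sigma = [a + x * b for a, b in zip(sigma + [0], [0] + sigma)]
    return sigma


if __name__ == "__main__":
    print(symmetric_functions([1, 2, 3]))  # [1, 6, 11, 6]
```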


Optimal solution

Theorem. When $z = 0$ (no communication cost), the optimal solution to Distrib(p) is to send $P_i$ a chunk of size

$\frac{\prod_{k \ne i} x_k}{\sigma^{(p)}_{p-1}} W$,

and $E_{opt}(W, p) = W - \frac{\sigma^{(p)}_p}{\sigma^{(p)}_{p-1}} W^2$

Closed-form formula
Proof by induction
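A sketch of the $z = 0$ solution with hypothetical names: it computes the chunk sizes and $E_{opt}$ from the symmetric functions, then compares $E_{opt}$ with a direct evaluation of $W - \sum_i x_i a_i^2$, which is the expected work when there is no communication cost and all failure probabilities stay below 1. Since $\prod_{k \ne i} x_k = \sigma^{(p)}_p / x_i$, the chunks are proportional to $1/x_i$, i.e. to $speed_i$.

```python
from math import prod
from typing import List, Sequence, Tuple


def symmetric_functions(xs: Sequence[float]) -> List[float]:
    """[sigma_0, ..., sigma_n] as the coefficients of prod_k (1 + x_k t)."""
    sigma = [1.0]
    for x in xs:
        sigma = [a + x * b for a, b in zip(sigma + [0.0], [0.0] + sigma)]
    return sigma


def no_communication_optimum(W: float, xs: Sequence[float]) -> Tuple[List[float], float]:
    """Chunks prod_{k != i} x_k / sigma_{p-1} * W and E_opt = W - sigma_p / sigma_{p-1} * W^2."""
    xs = list(xs)
    p = len(xs)
    sigma = symmetric_functions(xs)
    chunks = [prod(xs[:i] + xs[i + 1:]) / sigma[p - 1] * W for i in range(p)]
    return chunks, W - sigma[p] / sigma[p - 1] * W ** 2


if __name__ == "__main__":
    W, xs = 0.1, [1.0, 2.0, 3.0]          # hypothetical values, z = 0
    chunks, e_opt = no_communication_optimum(W, xs)
    direct = W - sum(x * a * a for x, a in zip(xs, chunks))
    print(chunks, e_opt, direct)
```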


4. Heterogeneous computers, with communication costs


Optimal solution (1/2)

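Theorem. When using the ordering $P_1, P_2, \ldots, P_p$, the optimal solution is to send $P_i$ a chunk of size $\alpha_{i,p} W$, and $E_{opt}(W, p) = W - f_p W^2$, where, for $p \ge 1$,

$f_p = \frac{\sum_{i=0}^{p} \lambda_i \sigma^{(p)}_{p-i} z^i}{\sum_{i=0}^{p-1} \lambda_i \sigma^{(p)}_{p-i-1} z^i}$, with $\lambda_i = \frac{4(1+i)}{2^i}$

$\alpha_{1,1} = 1$, and for $p \ge 2$:

$\alpha_{p,p} = \frac{2f_{p-1} - z}{2(f_{p-1} + x_p)}$

$\alpha_{i,p} = \frac{2f_{i-1} - z}{2(f_{i-1} + x_i)} \left(1 - \sum_{j=i+1}^{p} \alpha_{j,p}\right)$ for $p > i \ge 2$

$\alpha_{1,p} = 1 - \sum_{j=2}^{p} \alpha_{j,p}$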
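The theorem translates into a short procedure. The sketch below (hypothetical names) computes $f_1, \ldots, f_p$ by the same one-variable optimization as in the homogeneous proof, namely $f_p = z + x_p - \frac{(z + 2x_p)^2}{4(f_{p-1} + x_p)}$ with $f_1 = z + x_1$ (our derivation, not spelled out on the slide), then fills in $\alpha_{p,p}, \ldots, \alpha_{1,p}$ backwards, and also evaluates the closed form for $f_p$ with $\lambda_i = 4(1+i)/2^i$ for comparison.

```python
from typing import List, Sequence, Tuple


def symmetric_functions(xs: Sequence[float]) -> List[float]:
    """[sigma_0, ..., sigma_n] as the coefficients of prod_k (1 + x_k t)."""
    sigma = [1.0]
    for x in xs:
        sigma = [a + x * b for a, b in zip(sigma + [0.0], [0.0] + sigma)]
    return sigma


def f_closed_form(z: float, xs: Sequence[float]) -> float:
    """f_p = sum_{i=0}^{p} lam_i sigma_{p-i} z^i / sum_{i=0}^{p-1} lam_i sigma_{p-i-1} z^i."""
    p = len(xs)
    sigma = symmetric_functions(xs)
    lam = [4 * (1 + i) / 2 ** i for i in range(p + 1)]
    num = sum(lam[i] * sigma[p - i] * z ** i for i in range(p + 1))
    den = sum(lam[i] * sigma[p - i - 1] * z ** i for i in range(p))
    return num / den


def optimal_fractions(z: float, xs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (alphas, fs) with alphas[i-1] = alpha_{i,p} and fs[i-1] = f_i, for the order given."""
    fs = [z + xs[0]]                                   # f_1 = z + x_1
    for x in xs[1:]:
        f_prev = fs[-1]
        fs.append(z + x - (z + 2 * x) ** 2 / (4 * (f_prev + x)))
    p = len(xs)
    alphas = [0.0] * p
    remaining = 1.0                                    # fraction of W shared by P_1, ..., P_i
    for i in range(p, 1, -1):                          # i = p, p-1, ..., 2 (1-based)
        f_prev, x_i = fs[i - 2], xs[i - 1]
        alphas[i - 1] = (2 * f_prev - z) / (2 * (f_prev + x_i)) * remaining
        remaining -= alphas[i - 1]
    alphas[0] = remaining                              # alpha_{1,p} = 1 - sum of the others
    return alphas, fs


if __name__ == "__main__":
    z, xs = 1.0, [1.0, 2.0, 3.0]                       # hypothetical values
    alphas, fs = optimal_fractions(z, xs)
    print(alphas, fs[-1], f_closed_form(z, xs))
```

Filling in the fractions backwards mirrors the induction: the last computer's share is fixed first, and the remainder is split among the earlier ones.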


Optimal solution (2/2)

Theorem. In the general case, the optimal solution to Distrib(p) does not depend upon the ordering of the communications from the master.

  • Easy algorithm, but no closed-form formula
  • Quite complicated proof (still by induction)
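The order-independence claim is easy to probe numerically. This sketch (hypothetical names and values) assumes the forward recursion $f_1 = z + x_1$, $f_k = z + x_k - \frac{(z + 2x_k)^2}{4(f_{k-1} + x_k)}$, obtained by the same one-variable optimization as in the homogeneous proof, and evaluates it for every communication order.

```python
from itertools import permutations
from typing import Sequence


def f_by_recursion(z: float, xs: Sequence[float]) -> float:
    """f_p for the order xs: f_1 = z + x_1, f_k = z + x_k - (z + 2 x_k)^2 / (4 (f_{k-1} + x_k))."""
    f = z + xs[0]
    for x in xs[1:]:
        f = z + x - (z + 2 * x) ** 2 / (4 * (f + x))
    return f


def ordering_spread(z: float, xs: Sequence[float]) -> float:
    """Largest difference in f_p (hence in E_opt = W - f_p W^2) over all communication orders."""
    values = [f_by_recursion(z, order) for order in permutations(xs)]
    return max(values) - min(values)


if __name__ == "__main__":
    print(ordering_spread(0.5, [0.3, 1.7, 2.2, 4.0]))   # hypothetical values; zero up to rounding
```

One way to see why the order cannot matter (an observation about this recursion, not the authors' proof): each step is the map $f \mapsto \frac{4(z + x)f - z^2}{4(f + x)}$, and two such maps with different $x$ commute, since the product of their coefficient matrices is symmetric in the two $x$ values.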


Conclusion

  • First extension of the master-slave divisible load approach to unrecoverable failures
  • Nice set of results, similar to the classical setting
  • Turned out more difficult than expected (or ?)
  • Tractability of the case with different link bandwidths?


Perspectives

  • Resources with different risk functions (different owner categories?)
  • Case with different speeds, different link bandwidths and different risk functions
  • Combine with replication strategies
  • Combine with multi-round techniques
  • Comparison with dynamic approaches
