Numerically Stable Binary Gradient Coding, Neophytos Charalambides



slide-1
SLIDE 1

Numerically Stable Binary Gradient Coding

Neophytos Charalambides Hessam Mahdavifar Alfred Hero

Department of Electrical Engineering and Computer Science, University of Michigan

June, 2020

1 / 21

slide-2
SLIDE 2

Outline for section 1

Introduction and Motivation
Gradient Coding
Problem Setup
Binary Scheme
Allocation to Heterogeneous Workers

2 / 21

slide-3
SLIDE 3

Issues and Motivation

Introduction and Motivation

Machine Learning Today: Curse of Dimensionality
◮ Large Datasets: many samples
◮ Complex Datasets: large dimension
◮ Problems become intractable

Use distributed methods
◮ Distribute smaller computation assignments
◮ Multiple servers complete various tasks

Drawbacks of Distributed Synchronous Computations
◮ Requires all servers to respond: communication overhead
◮ What if stragglers are present?
◮ Stragglers: servers that are delayed or non-responsive

3 / 21

slide-4
SLIDE 4

Gradient Coding¹

Introduction and Motivation

  • 1. Speed up distributed computation for gradient methods
  • 2. Mitigate stragglers

¹ R. Tandon et al. “Gradient Coding: Avoiding Stragglers in Synchronous Gradient Descent”. In: stat 1050 (2017), p. 8.

4 / 21

slide-5
SLIDE 5

Benefits of our Binary Scheme

Introduction and Motivation

Few schemes deal with exact recovery

Common issues with current exact recovery schemes

  • 1. construct and search through a decoding matrix¹ $A^T \in \mathbb{R}^{\binom{n}{s} \times n}$
  • 2. storage issue, and further delay
  • 3. work over $\mathbb{R}$ and $\mathbb{C}$: further numerical instability
  • 4. have a strict assumption that $(s + 1) \mid n$

Our scheme

  • 1. faster online decoding
  • 2. only deal with {0, 1} encodings — view as “task assignments”
  • 3. ... this makes encoding and decoding numerically stable
  • 4. works for any pair s, n
  • 5. ... extend our construction to work for heterogeneous workers also

5 / 21

slide-6
SLIDE 6

Outline for section 2

Introduction and Motivation
Gradient Coding
Problem Setup
Binary Scheme
Allocation to Heterogeneous Workers

6 / 21

slide-7
SLIDE 7

Distributed Gradient Descent

Gradient Coding

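◮ Dataset $D = \{(x_i, y_i)\}_{i=1}^{N} \subset \mathbb{R}^p \times \mathbb{R}$, or $X \in \mathbb{R}^{N \times p}$; $y \in \mathbb{R}^N$

◮ Partition $D = \bigcup_{j=1}^{k} D_j$, s.t. $D_i \cap D_j = \emptyset$ for $i \neq j$ and $|D_j| = N/k$

◮ Partial gradients $g_j$: gradient on $D_j$

◮ Minimize the loss $L(D; \theta) = \sum_{j=1}^{k} \ell(D_j; \theta)$

◮ Gradient descent updates: $\theta^{(t+1)} = \theta^{(t)} - \alpha_t g^{(t)}$

◮ $g^{(t)} = \nabla_\theta L\big(D; \theta^{(t)}\big) = \sum_{j=1}^{k} \underbrace{\nabla_\theta \ell\big(D_j; \theta^{(t)}\big)}_{g_j^{(t)}} = \sum_{j=1}^{k} g_j^{(t)}$

◮ additive structure allows $g^{(t)}$ to be computed in parallel!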
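
The additive structure can be checked numerically. The sketch below is only an illustration: the squared-error loss, the toy dataset, and the helper names `partial_gradient` and `split` are assumptions, not part of the slides. It computes the partial gradients on k = 2 disjoint partitions and compares their sum with the gradient on the full dataset.

```python
# Illustrative assumption: l(D_j; theta) = sum over (x, y) in D_j of (x . theta - y)^2.

def partial_gradient(part, theta):
    """Gradient of the squared-error loss over one partition D_j."""
    grad = [0.0] * len(theta)
    for x, y in part:
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        for d, xi in enumerate(x):
            grad[d] += 2.0 * residual * xi
    return grad

def split(data, k):
    """Split the dataset into k disjoint partitions of equal size N/k."""
    size = len(data) // k
    return [data[j * size:(j + 1) * size] for j in range(k)]

# Toy dataset with N = 4 samples in R^2 x R (made up for this example).
data = [([1.0, 2.0], 3.0), ([0.0, 1.0], 1.0), ([2.0, 0.0], 2.0), ([1.0, 1.0], 0.0)]
theta = [0.5, -0.5]

parts = split(data, k=2)
partials = [partial_gradient(p, theta) for p in parts]  # g_1, g_2 (computable in parallel)
g_sum = [sum(col) for col in zip(*partials)]            # g_1 + g_2
g_full = partial_gradient(data, theta)                  # gradient on all of D
```

Each call to `partial_gradient` touches only its own partition, so the k calls could run on k different workers.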

7 / 21

slide-8
SLIDE 8

Synchronous Distributed Computation

Gradient Coding

◮ Execute gradient descent distributively
◮ Need all workers to respond

Figure: Need all responses — g = g1 + g2 + g3

8 / 21

slide-9
SLIDE 9

Table of Contents

Introduction and Motivation
Gradient Coding
Problem Setup
Binary Scheme
Allocation to Heterogeneous Workers

9 / 21

slide-10
SLIDE 10

General Setup

Problem Setup

10 / 21

slide-11
SLIDE 11

Encoding matrix

Problem Setup

◮ Rows: workers $\{W_i\}_{i=1}^{n}$

◮ $b_i$ = encoding vector for $W_i$
◮ Columns: partitions $\{D_j\}_{j=1}^{k}$

  • 1. nonzero entries: assigned partitions
  • 2. redundancy in assigned $D_j$'s

◮ Stragglers ≡ erasing rows of $B$
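
As a rough illustration of this setup, the sketch below uses a hypothetical toy instance (n = k = 4, s = 1, with a made-up binary B and made-up partial gradients). Each worker sends the sum of its assigned partial gradients, a straggler corresponds to a missing row, and the full gradient is still recovered from two surviving workers whose rows add up to the all-ones vector.

```python
# Hypothetical toy instance: n = k = 4 workers/partitions, tolerating s = 1 straggler.
B = [
    [1, 1, 0, 0],  # W_0 is assigned D_0 and D_1
    [1, 1, 0, 0],  # W_1 replicates W_0's assignment (redundancy)
    [0, 0, 1, 1],  # W_2 is assigned D_2 and D_3
    [0, 0, 1, 1],  # W_3 replicates W_2's assignment
]
partial_grads = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.0]]  # g_0..g_3 in R^2

def worker_message(b_row, grads):
    """Coded message of one worker: sum of the partial gradients it was assigned."""
    dim = len(grads[0])
    return [sum(b * g[d] for b, g in zip(b_row, grads)) for d in range(dim)]

messages = {i: worker_message(row, partial_grads) for i, row in enumerate(B)}

# A straggler erases its row of B: its message never arrives.
stragglers = {1}
received = {i: m for i, m in messages.items() if i not in stragglers}

# Rows 0 and 2 of B sum to the all-ones vector, so their messages sum to g.
decoded = [a + b for a, b in zip(received[0], received[2])]
full = [sum(g[d] for g in partial_grads) for d in range(2)]
```

Because the decoding coefficients are all 0 or 1, recovery here is a plain sum with no matrix inversion.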

11 / 21

slide-12
SLIDE 12

Table of Contents

Introduction and Motivation
Gradient Coding
Problem Setup
Binary Scheme
Allocation to Heterogeneous Workers

12 / 21

slide-13
SLIDE 13

Example of our Binary Scheme

Binary Scheme

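$n = k = 11$, $s = 3 \implies r \equiv 3 \bmod (s + 1)$
$r$ workers for $B_1$, and $(s + 1 - r)$ for $B_2$

$$B_1 = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix} \in \{0, 1\}^{9 \times 11}$$

$$B_2 = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \in \{0, 1\}^{2 \times 11}$$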

13 / 21

slide-14
SLIDE 14

Example — Encoding and Decoding

Binary Scheme

Decoding: only take received workers of same color
Example: $a_{\{2,6,10\}}^T B = \mathbf{1}_{1 \times 11}$

$$B = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}$$

$$a_I \in \big\{\, e_1 + e_5 + e_9,\ e_2 + e_6 + e_{10},\ e_3 + e_7 + e_{11},\ e_4 + e_8 \,\big\}$$

where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^{11}$.
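
The example can be checked mechanically. In the sketch below, the row supports in `ROW_SPANS` are an assumption: they are the column ranges that Algorithms 1 and 2 of the additional slides produce for n = k = 11, s = 3, since the extracted slide only preserves the ones. The helper `decode_row` is hypothetical. The code verifies that the rows of each congruence class mod 4 sum to the all-ones row.

```python
N_COLS = 11

# Assumed supports: (first column, last column) of the ones in each row,
# 1-indexed and inclusive, as produced by Algorithms 1 and 2 for n = 11, s = 3.
ROW_SPANS = [
    (1, 4), (1, 4), (1, 4), (1, 6),     # workers 1-4  (block 1)
    (5, 8), (5, 8), (5, 8), (7, 11),    # workers 5-8  (block 2)
    (9, 11), (9, 11), (9, 11),          # workers 9-11 (block 3)
]
B = [[1 if lo <= c <= hi else 0 for c in range(1, N_COLS + 1)]
     for lo, hi in ROW_SPANS]

def decode_row(workers):
    """Compute a_I^T B for the 0/1 decoding vector supported on `workers` (1-indexed)."""
    return [sum(B[w - 1][c] for w in workers) for c in range(N_COLS)]

# Workers that share a residue mod (s + 1) = 4 (the same "color" on the slide).
classes = [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8)]
results = {cls: decode_row(cls) for cls in classes}
```

Every entry of `results` is the all-ones row, so any fully received class recovers the full gradient by summation.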

14 / 21

slide-15
SLIDE 15

Main Idea of Our Binary Scheme

Binary Scheme

◮ Have $B$ as sparse as possible $\implies \mathrm{nnzr}(B) = k \cdot (s + 1)$
◮ Work with congruence classes $(\bmod\ s + 1)$

◮ superposition of rows of each class results in $\mathbf{1}_{1 \times k}$

◮ Allocate tasks s.t. $\|b_i\|_0 \simeq \|b_j\|_0$ for all $i, j \in \{1, \cdots, n\}$, while satisfying the above two constraints
◮ Formally, construct $B$ that is a solution to

$$\min_{B \in \mathbb{N}^{n \times k}} \left\{ \sum_{i=1}^{n} \Big| \|b_i\|_0 - (s + 1) \cdot k/n \Big| \right\} \quad \text{s.t.} \quad \mathrm{nnzr}(B) = k \cdot (s + 1)$$

◮ Intuition: $B$ is close to being block diagonal
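
To make the objective concrete, the sketch below evaluates it on row weights only. Two things are assumptions here: the objective is read as a sum of absolute deviations from the average load (s + 1)·k/n, and the weight list is the one the construction yields for n = k = 11, s = 3. The function name `load_imbalance` is hypothetical.

```python
from fractions import Fraction

def load_imbalance(row_weights, s, k):
    """Sum over workers of | ||b_i||_0 - (s + 1) * k / n | (assumed form of the objective)."""
    n = len(row_weights)
    target = Fraction((s + 1) * k, n)  # average number of partitions per worker
    return sum(abs(w - target) for w in row_weights)

# Assumed row weights ||b_i||_0 for the n = k = 11, s = 3 example.
weights = [4, 4, 4, 6, 4, 4, 4, 5, 3, 3, 3]

total_nonzeros = sum(weights)                  # should equal k * (s + 1)
imbalance = load_imbalance(weights, s=3, k=11)
```

The constraint fixes the total number of nonzeros, so the objective only measures how evenly those nonzeros are spread over the workers.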

15 / 21

slide-16
SLIDE 16

Construction and Decoding

Binary Scheme

◮ Congruence classes $C_1 = \{[i]\}_{i=0}^{r-1}$ and $C_2 = \{[i]\}_{i=r}^{s}$:

  • 1. $r \equiv n \bmod (s + 1)$
  • 2. respectively identically
  • 3. within each $C_1$, $C_2$, cardinalities do not differ by more than one
  • 4. construct $B_1$ and $B_2$

◮ $B$ = aggregation of $B_1$ and $B_2$
◮ Decoding: By the pigeonhole principle, for any $f$ workers, at least one complete residue system is present
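
The pigeonhole argument can be brute-forced for small cases. The sketch below assumes f = n − s responding workers, 0-indexed workers, and that worker w belongs to class w mod (s + 1); the function names are hypothetical. With only s stragglers and s + 1 classes, at least one class must be untouched.

```python
from itertools import combinations

def surviving_classes(s, stragglers):
    """Residues i mod (s + 1) with no straggler among their workers (0-indexed workers)."""
    hit = {w % (s + 1) for w in stragglers}
    return [i for i in range(s + 1) if i not in hit]

def always_decodable(n, s):
    """Check every possible set of s stragglers among n workers."""
    return all(surviving_classes(s, S) for S in combinations(range(n), s))

example = surviving_classes(3, stragglers=(0, 5, 10))  # residues 0, 1, 2 are hit
ok_small = always_decodable(11, 3)                     # 165 straggler patterns
```

The check needs n ≥ s + 1 so that every residue class contains at least one worker.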

16 / 21

slide-17
SLIDE 17

Larger Example: $n = k = 165$ and $s = 15 \implies r = 5$

Binary Scheme

Do not want a lot of redundancy — close to block diagonal

17 / 21

slide-18
SLIDE 18

Outline for section 3

Introduction and Motivation
Gradient Coding
Problem Setup
Binary Scheme
Allocation to Heterogeneous Workers

18 / 21

slide-19
SLIDE 19

Setup a Linear System

Allocation to Heterogeneous workers

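◮ Assume two groups of different machines $T_1$, $T_2$, s.t. $t_i = \mathbb{E}[\text{time for } T_i \text{ to compute } g_j]$ and $t_1 \neq t_2$
◮ Goal: Want the same expected time for each worker
◮ Let $|J_{T_i}|$ = # of partitions allocated to each of $T_i$'s workers
◮ Let $|T_i| = \tau_i$ and $\tau_1 = \frac{\alpha}{\beta} \cdot \tau_2$

Solve the linear system:

  • 1. $t_1 \cdot |J_{T_1}| = t_2 \cdot |J_{T_2}|$
  • 2. $|J_{T_1}| \cdot \tau_1 + |J_{T_2}| \cdot \tau_2 = (s + 1) \cdot k$
  • 3. $\tau_2 = \frac{\beta}{\alpha} \cdot \tau_1$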
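
A minimal sketch of solving this system follows, assuming the group sizes τ1 and τ2 are given directly (so equation 3 only fixes their ratio). The function name `allocate` and the numeric values are made up for illustration. Substituting equation 1 into equation 2 gives |J_T1| = (s+1)·k·t2 / (t2·τ1 + t1·τ2) and |J_T2| = (s+1)·k·t1 / (t2·τ1 + t1·τ2).

```python
from fractions import Fraction

def allocate(t1, t2, tau1, tau2, s, k):
    """Solve t1*J1 = t2*J2 and J1*tau1 + J2*tau2 = (s + 1)*k for the per-worker loads."""
    total = (s + 1) * k              # total number of assignments, nnz of B
    denom = t2 * tau1 + t1 * tau2
    j1 = Fraction(total * t2, denom)  # partitions per worker in group T_1
    j2 = Fraction(total * t1, denom)  # partitions per worker in group T_2
    return j1, j2

# Made-up example: T_2 machines are twice as slow, six machines in each group.
j1, j2 = allocate(t1=1, t2=2, tau1=6, tau2=6, s=2, k=12)
```

Exact rationals are used so that a non-integer solution shows up as a fraction; how such a solution should be rounded is not covered by this sketch.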

19 / 21

slide-20
SLIDE 20

Main Takeaways of Our Scheme

◮ Gave a simple gradient coding scheme
◮ Faster online decoding
◮ Numerically stable in encoding and decoding
◮ Works for any pair $s, n$
◮ Extended it to accommodate heterogeneous workers also

20 / 21

slide-21
SLIDE 21

Thank you for your attention!

slide-22
SLIDE 22

Outline for section 4

Additional Slides
Details of the constructions
Explicit Algorithms

22 / 21

slide-23
SLIDE 23

Idea Behind Binary Scheme

Details of the constructions

◮ When $(s + 1) \mid n$ and $k = n$: $B$ is block diagonal
◮ assign to each worker $\ell = \frac{n}{s+1}$ partitions in a repeated sense

◮ For $(s + 1) \nmid n$, each worker in blocks of $(s + 1)$ rows corresponds to a distinct congruence class (c.c.) $\bmod\ (s + 1)$
◮ When any $f$ workers send their computations, at least one congruence class is met in every block (pigeonhole)
◮ $\exists\, i \in \mathbb{Z}/(s + 1)$ s.t. $\big(i + j(s + 1)\big) \in I$, for all $j = 0, 1, \cdots, \ell - 1$

◮ the received workers “always form a coset”
◮ Decoding: select any such $i$, and sum the vectors received by the workers of the c.c. $i$: $a^T = \sum_{j=0}^{\ell-1} e_{i+j(s+1)}^T$
◮ Want “even” number of assignments: homogeneous servers
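
As a short worked check of the coset decoding in the block-diagonal case $(s + 1) \mid n$, $k = n$: the derivation below assumes 0-indexed rows and columns, with block $j$ of $B$ occupying rows and columns $j(s+1), \ldots, (j+1)(s+1) - 1$. This indexing convention is an assumption made for the sketch.

```latex
\begin{align*}
a^T B
  &= \sum_{j=0}^{\ell-1} e_{i+j(s+1)}^T B
   = \sum_{j=0}^{\ell-1} b_{i+j(s+1)}^T \\
  &= \sum_{j=0}^{\ell-1}
     \bigl(\mathbf{0}_{1\times j(s+1)},\ \mathbf{1}_{1\times (s+1)},\ \mathbf{0}_{1\times (n-(j+1)(s+1))}\bigr)
   = \mathbf{1}_{1\times n}.
\end{align*}
```

Row $i + j(s+1)$ lies in block $j$, so the $\ell$ selected rows have disjoint supports that together cover all $n = \ell(s+1)$ columns.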

23 / 21

slide-24
SLIDE 24

Binary Scheme when (s + 1) ∤ n

Details of the constructions

◮ Determine the integer parameters

  ◮ $n = \ell \cdot (s + 1) + r$, $0 \leq r < s + 1$
  ◮ $r = t \cdot \ell + q$, $0 \leq q < \ell$
  ◮ $n = \lambda \cdot (\ell + 1) + \tilde{r}$, $0 \leq \tilde{r} < \ell + 1$

◮ Define: $C_1 := \{[i]_{s+1}\}_{i=0}^{r-1}$ and $C_2 := \{[i]_{s+1}\}_{i=r}^{s}$

◮ workers of $C_1$ lie in all $(\ell + 1)$ blocks, and workers of $C_2$ lie in the first $\ell$

◮ $C_1$ load: $\{s + 1, s\}$ if $\ell + r > s$, o.w. $\{\lambda + 1, \lambda\}$
◮ $C_2$ load: $\{s + t + 2, s + t + 1\}$ if $q > 0$, o.w. all have $s + t + 1$
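
The three divisions and the resulting loads can be sketched with `divmod`. The function names `scheme_parameters` and `class_loads` are hypothetical, and the number of workers that receive the larger load in each class is taken from the loop bounds of Algorithms 1 and 2.

```python
def scheme_parameters(n, s):
    """Integer parameters of the scheme, obtained from three Euclidean divisions."""
    if not 0 < s < n:
        raise ValueError("need 0 < s < n")
    l, r = divmod(n, s + 1)          # n = l*(s+1) + r,       0 <= r < s+1
    t, q = divmod(r, l)              # r = t*l + q,           0 <= q < l
    lam, r_tilde = divmod(n, l + 1)  # n = lam*(l+1) + r_tilde, 0 <= r_tilde < l+1
    return {"l": l, "r": r, "t": t, "q": q, "lam": lam, "r_tilde": r_tilde}

def class_loads(n, s):
    """Per-worker loads within one class of C1 and one class of C2 (C1 is empty if r = 0)."""
    p = scheme_parameters(n, s)
    l, r, t, q = p["l"], p["r"], p["t"], p["q"]
    if l + r > s:
        c1 = [s + 1] * (l + r - s) + [s] * (s + 1 - r)
    else:
        c1 = [p["lam"] + 1] * p["r_tilde"] + [p["lam"]] * (l + 1 - p["r_tilde"])
    c2 = [s + t + 2] * q + [s + t + 1] * (l - q)
    return c1, c2

params_small = scheme_parameters(11, 3)
loads_small = class_loads(11, 3)
params_large = scheme_parameters(165, 15)
```

In both branches the loads of a class add up to n, which is what lets the rows of a class partition the n columns.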

24 / 21

slide-25
SLIDE 25

Encoding C1

Explicit Algorithms

Algorithm 1: Determining $\tilde{B}_{C_1}$

Input: number of workers $n$ and stragglers $s$, where $s < n$ are both positive integers
Output: encoding matrix $\tilde{B}_{C_1} \in \{0, 1\}^{n \times n}$   ⊲ for simplicity, we assume $n = k$
$\tilde{B}_{C_1} \leftarrow 0_{n \times n}$
use the division algorithm to get the parameters: $n = \ell \cdot (s + 1) + r$, $r = t \cdot \ell + q$, $n = \lambda \cdot (\ell + 1) + \tilde{r}$
denote the set of classes $C_1 = \{[i]_{s+1}\}_{i=0}^{r-1}$
for $i \in C_1$ do
    if $\ell + r > s$ then
        for $j = 1$ to $\ell + r - s$ do
            $\tilde{B}_{C_1}\big[(j - 1)(s + 1) + i,\ (j - 1)(s + 1) + 1 : j(s + 1)\big] = \mathbf{1}_{s+1}$
        end
        for $j = \ell + r - s + 1$ to $\ell + 1$ do
            $\tilde{B}_{C_1}\big[(j - 1)(s + 1) + i,\ (j - 1)s + (\ell + r - s) + 1 : (j - 1)s + \ell + r\big] = \mathbf{1}_{s}$
        end
    else if $\ell + r \leq s$ then
        for $j = 1$ to $\tilde{r}$ do
            $\tilde{B}_{C_1}\big[(j - 1)(s + 1) + i,\ (j - 1)(\lambda + 1) + 1 : j(\lambda + 1)\big] = \mathbf{1}_{\lambda+1}$
        end
        for $j = \tilde{r} + 1$ to $\ell + 1$ do
            $\tilde{B}_{C_1}\big[(j - 1)(s + 1) + i,\ (j - 1)\lambda + \tilde{r} + 1 : (j - 1)\lambda + \tilde{r} + \lambda\big] = \mathbf{1}_{\lambda}$
        end
    end
end
return $\tilde{B}_{C_1}$
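
A minimal Python sketch of Algorithm 1 follows, under two stated assumptions: rows, columns, and blocks are 0-indexed, and a running column pointer replaces the explicit column-range formulas (the two are equivalent because consecutive blocks of a class occupy consecutive column ranges). The function name `encode_c1` is hypothetical, and the parameters are taken as n = ℓ(s+1) + r and n = λ(ℓ+1) + r̃.

```python
def encode_c1(n, s):
    """Rows of the encoding matrix for the classes in C1 = {0, ..., r-1} (k = n assumed)."""
    l, r = divmod(n, s + 1)          # n = l*(s+1) + r
    lam, r_tilde = divmod(n, l + 1)  # n = lam*(l+1) + r_tilde
    B = [[0] * n for _ in range(n)]
    for i in range(r):               # each class i in C1 has one worker in all l+1 blocks
        col = 0
        for j in range(l + 1):       # 0-indexed block number
            if l + r > s:
                width = s + 1 if j < l + r - s else s
            else:
                width = lam + 1 if j < r_tilde else lam
            row = j * (s + 1) + i    # 0-indexed worker of class i in block j
            for c in range(col, col + width):
                B[row][c] = 1
            col += width             # the next block starts where this one ended
    return B

B_c1 = encode_c1(11, 3)
```

Rows belonging to classes in C2 are left as zero rows; they are filled in by the companion construction for C2.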

25 / 21

slide-26
SLIDE 26

Encoding C2

Explicit Algorithms

Algorithm 2: Determining $\tilde{B}_{C_2}$

Input: number of workers $n$ and stragglers $s$, where $s < n$ are both positive integers
Output: encoding matrix $\tilde{B}_{C_2} \in \{0, 1\}^{n \times n}$   ⊲ for simplicity, we assume $n = k$
$\tilde{B}_{C_2} \leftarrow 0_{n \times n}$
use the division algorithm to get the parameters: $n = \ell \cdot (s + 1) + r$, $r = t \cdot \ell + q$, $n = \lambda \cdot (\ell + 1) + \tilde{r}$
denote the set of classes $C_2 = \{[i]_{s+1}\}_{i=r}^{s}$
for $i \in C_2$ do
    if $q = 0$ then
        for $j = 1$ to $\ell$ do
            $\tilde{B}_{C_2}\big[(j - 1)(s + 1) + i,\ (j - 1)(s + t + 1) + 1 : j(s + t + 1)\big] = \mathbf{1}_{s+t+1}$
        end
    else if $q > 0$ then
        for $j = 1$ to $q$ do
            $\tilde{B}_{C_2}\big[(j - 1)(s + 1) + i,\ (j - 1)(s + t + 2) + 1 : j(s + t + 2)\big] = \mathbf{1}_{s+t+2}$
        end
        for $j = q + 1$ to $\ell$ do
            $\tilde{B}_{C_2}\big[(j - 1)(s + 1) + i,\ (j - 1)(s + t + 1) + q + 1 : j(s + t + 1) + q\big] = \mathbf{1}_{s+t+1}$
        end
    end
end
return $\tilde{B}_{C_2}$

26 / 21
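
Algorithm 2 can be sketched the same way. The assumptions are 0-indexed rows, columns, and blocks, a running column pointer in place of the explicit column formulas, and a block width of s + t + 2 for the first q blocks (which is what the load sets on the earlier slide require). The function name `encode_c2` is hypothetical.

```python
def encode_c2(n, s):
    """Rows of the encoding matrix for the classes in C2 = {r, ..., s} (k = n assumed)."""
    l, r = divmod(n, s + 1)  # n = l*(s+1) + r
    t, q = divmod(r, l)      # r = t*l + q
    B = [[0] * n for _ in range(n)]
    for i in range(r, s + 1):        # each class i in C2 has one worker in the first l blocks
        col = 0
        for j in range(l):           # 0-indexed block number
            # the first q workers of the class take one extra partition
            width = s + t + 2 if j < q else s + t + 1
            row = j * (s + 1) + i    # 0-indexed worker of class i in block j
            for c in range(col, col + width):
                B[row][c] = 1
            col += width
    return B

B_c2 = encode_c2(11, 3)
```

When q = 0 every block has width s + t + 1, so the single loop covers both branches of the pseudocode.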

slide-27
SLIDE 27

Decoding Vector

Explicit Algorithms

Algorithm 3: Determining $a_I$

Input: received indicator-vector $\mathrm{rec}_I$
Output: decoding vector $a_I$
if $r = 0$ then
    for $i = 0$ to $s$ do
        if $(\mathrm{rec}_I)_i = 1$ then
            $l \leftarrow i$
            if $\mathrm{supp}(a_l) \subseteq \mathrm{supp}(\mathrm{rec}_I)$ then
                $a \leftarrow a_l$
                break
            end
        end
    end
else if $r > 0$ then
    run the above for-loop for $i = r$ to $s$ and then for $i = 0$ to $r - 1$
end
return $a_I \leftarrow a$
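
A minimal sketch of this decoding search is given below, assuming 0-indexed workers where worker w belongs to class w mod (s + 1) and `received` is a 0/1 list of length n. The function name `decoding_vector` and the `None` return for undecodable patterns are additions for this sketch. Classes r, ..., s are tried first, as in the pseudocode.

```python
def decoding_vector(received, n, s):
    """Return the 0/1 indicator of a fully received congruence class, or None if there is none."""
    r = n % (s + 1)
    # Search order from the pseudocode: i = r..s first, then i = 0..r-1 (empty when r = 0).
    order = list(range(r, s + 1)) + list(range(r))
    for i in order:
        members = range(i, n, s + 1)   # all workers of class i
        if all(received[w] for w in members):
            a = [0] * n
            for w in members:
                a[w] = 1
            return a
    return None

n, s = 11, 3
received = [1] * n
for w in (0, 3, 5):                    # made-up straggler pattern
    received[w] = 0
a = decoding_vector(received, n, s)
```

Applying `a` to the received messages is a plain sum over the selected workers, so no floating-point coefficients enter the decoding step.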

27 / 21