Numerically Stable Binary Gradient Coding
Neophytos Charalambides Hessam Mahdavifar Alfred Hero
Department of Electrical Engineering and Computer Science, University of Michigan
June, 2020
1 / 21
Outline
◮ Introduction and Motivation
◮ Gradient Coding
◮ Problem Setup
◮ Binary Scheme
◮ Allocation to Heterogeneous Workers
Introduction and Motivation
Machine Learning Today: Curse of Dimensionality
◮ Large datasets: many samples
◮ Complex datasets: large dimension
◮ Problems become intractable
Use distributed methods
◮ Distribute smaller computation assignments
◮ Multiple servers complete various tasks
Drawbacks of Distributed Synchronous Computations
◮ Requires all servers to respond: communication overhead
◮ What if stragglers are present?
◮ Stragglers: servers that are delayed or non-responsive
Introduction and Motivation
1. R. Tandon et al. “Gradient Coding: Avoiding Stragglers in Synchronous Gradient Descent”. In: stat 1050 (2017), p. 8.
Introduction and Motivation
◮ Few schemes deal with exact recovery
◮ Common issues with current exact recovery schemes (the one surviving fragment refers to a matrix of size $\binom{n}{s} \times n$)
◮ Our scheme
Gradient Coding
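◮ Dataset $D = \{(x_i, y_i)\}_{i=1}^{N} \subset \mathbb{R}^p \times \mathbb{R}$, or $X \in \mathbb{R}^{N \times p}$; $y \in \mathbb{R}^N$
◮ Partition $D = \bigcup_{j=1}^{k} D_j$, s.t. $D_i \cap D_j = \emptyset$ for $i \neq j$ and $|D_j| = \frac{N}{k}$
◮ Partial gradients $g_j$: gradient on $D_j$
◮ Minimize the loss $L(D; \theta) = \sum_{j=1}^{k} \ell(D_j; \theta)$
◮ Gradient descent updates: $\theta^{(t+1)} = \theta^{(t)} - \alpha_t g^{(t)}$
◮ $g^{(t)} = \nabla_\theta L(D; \theta^{(t)}) = \sum_{j=1}^{k} \nabla_\theta \ell(D_j; \theta^{(t)}) = \sum_{j=1}^{k} g_j^{(t)}$
◮ Additive structure allows $g^{(t)}$ to be computed in parallel!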
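The additive structure can be checked on a toy problem. The sketch below assumes a squared-error loss, which the slides do not specify, and uses made-up data; the helper names `partial_gradient` and `partition` are illustrative only.

```python
# Assumption: squared-error loss l(D_j; theta) = sum over (x, y) in D_j of (x.theta - y)^2.

def partial_gradient(block, theta):
    """Gradient of the squared-error loss over one block of samples."""
    grad = [0.0] * len(theta)
    for x, y in block:
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        for d, xi in enumerate(x):
            grad[d] += 2.0 * residual * xi
    return grad

def partition(dataset, k):
    """Split the dataset into k disjoint blocks of equal size N/k."""
    size = len(dataset) // k
    return [dataset[j * size:(j + 1) * size] for j in range(k)]

# Made-up dataset with N = 4 samples in R^2 x R.
dataset = [([1.0, 2.0], 3.0), ([0.0, 1.0], 1.0), ([2.0, 0.0], 2.0), ([1.0, 1.0], 1.0)]
theta = [1.0, -1.0]

blocks = partition(dataset, k=2)
partials = [partial_gradient(b, theta) for b in blocks]   # g_1, g_2
g_sum = [sum(col) for col in zip(*partials)]              # g_1 + g_2
g_full = partial_gradient(dataset, theta)                 # gradient on all of D
```

Each `partial_gradient` call touches only its own block, so the calls could run on separate workers and be summed afterwards.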
Gradient Coding
◮ Execute gradient descent distributively
◮ Need all workers to respond
Figure: Need all responses, $g = g_1 + g_2 + g_3$
Problem Setup
◮ Rows: workers $\{W_i\}_{i=1}^{n}$
◮ $b_i$ = encoding vector for $W_i$
◮ Columns: partitions $\{D_j\}_{j=1}^{k}$
◮ Stragglers ≡ erasing rows of $B$
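A small illustration of this setup, as a sketch rather than the authors' construction: the matrix `B`, the partial gradients, and the helper names below are made up for the example, with $n = k = 4$ and $s = 1$. A decoding vector `a` supported on the received workers must satisfy $a^T B = \mathbf{1}_{1 \times k}$.

```python
# Illustrative encoding matrix: rows are workers, columns are partitions.
B = [
    [1, 1, 0, 0],  # W_1 sends g_1 + g_2
    [1, 1, 0, 0],  # W_2 sends g_1 + g_2
    [0, 0, 1, 1],  # W_3 sends g_3 + g_4
    [0, 0, 1, 1],  # W_4 sends g_3 + g_4
]
partial_grads = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.0]]  # made-up g_j in R^2

def encode(b_i, grads):
    """Worker W_i returns the linear combination sum_j b_i[j] * g_j."""
    return [sum(b * g[d] for b, g in zip(b_i, grads)) for d in range(len(grads[0]))]

def row_combination(a, B):
    """Compute the row vector a^T B."""
    return [sum(a[i] * B[i][j] for i in range(len(B))) for j in range(len(B[0]))]

straggler = 1  # W_2 (0-indexed row 1) never responds, so its row is erased
received = {i: encode(B[i], partial_grads) for i in range(len(B)) if i != straggler}

a = [1, 0, 1, 0]  # decoding vector, zero on the straggler
assert all(a[i] == 0 for i in range(len(B)) if i not in received)
assert row_combination(a, B) == [1, 1, 1, 1]

# Recover the full gradient g = g_1 + g_2 + g_3 + g_4 from received messages only.
g = [sum(a[i] * received[i][d] for i in received) for d in range(2)]
```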
Binary Scheme
$n = k = 11$, $s = 3 \implies r = 3$, since $n \equiv r \bmod (s+1)$
In each block of $s+1$ rows: $r$ workers for $B_1$, and $(s+1-r)$ for $B_2$
$B_1 \in \{0,1\}^{9 \times 11}$ and $B_2$ are binary matrices; the positions of their entries did not survive in this transcript.
Binary Scheme
Decoding: only take received workers of same color
Example: $a_{\{2,6,10\}}^T B = \mathbf{1}_{1 \times 11}$
$B \in \{0,1\}^{11 \times 11}$ (entry positions not recoverable from this transcript); $a_I$ is one of four binary vectors, with 3, 3, 3 and 2 nonzero entries respectively
Binary Scheme
◮ Have $B$ as sparse as possible $\implies \mathrm{nnzr}(B) = k \cdot (s+1)$
◮ Work with congruence classes $\bmod\ (s+1)$
◮ Superposition of rows of each class results in $\mathbf{1}_{1 \times k}$
◮ Allocate tasks s.t. $\|b_i\|_0 \simeq \|b_j\|_0$ for all $i, j \in \{1, \cdots, n\}$, while satisfying the above two constraints
◮ Formally, construct $B$ as a solution to a minimization over $B \in \mathbb{N}^{n \times k}$
◮ Intuition: $B$ is close to being block diagonal
Binary Scheme
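◮ Congruence classes $C_1 = \{[i]\}_{i=0}^{r-1}$ and $C_2 = \{[i]\}_{i=r}^{s}$
◮ $B$ = aggregation of $B_1$ and $B_2$
◮ Decoding: by the pigeonhole principle, for any $f = n - s$ responding workers, at least one complete residue system is present, i.e. some class has all of its workers among the responders, since $s$ stragglers can touch at most $s$ of the $s+1$ classes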
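The pigeonhole claim can be verified by brute force for the running example $n = 11$, $s = 3$. This is only a sanity check under the assumption that workers are indexed $0, \dots, n-1$ and that worker $w$ belongs to class $w \bmod (s+1)$; the function name is illustrative.

```python
from itertools import combinations

def surviving_classes(n, s, stragglers):
    """Residues i mod (s+1) whose workers are all outside the straggler set."""
    hit = {w % (s + 1) for w in stragglers}
    return [i for i in range(s + 1) if i not in hit]

n, s = 11, 3
# Smallest number of fully received classes over every straggler set of size s.
worst = min(len(surviving_classes(n, s, S)) for S in combinations(range(n), s))
```

Here `worst` is at least 1, so a decodable class exists for every straggler pattern of size $s$.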
Binary Scheme
Do not want a lot of redundancy — close to block diagonal
Allocation to Heterogeneous Workers
◮ Assume two groups of different machines $T_1$, $T_2$, s.t. $t_i = \mathbb{E}[\text{time for } T_i \text{ to compute } g_j]$ and $t_1 \neq t_2$
◮ Goal: Want the same expected time for each worker
◮ Let $|J_{T_i}|$ = # of partitions allocated to $T_i$'s workers
◮ Let $|T_i| = \tau_i$ and $\tau_1 = \frac{\alpha}{\beta} \cdot \tau_2$
Solve the linear system:
α · τ1
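The linear system itself is not legible here, so the following is only one plausible reading of the balancing goal, not the authors' formulation. It assumes (i) equal expected time per worker, $t_1 |J_{T_1}| = t_2 |J_{T_2}|$, and (ii) a total of $k(s+1)$ assignments, $\tau_1 |J_{T_1}| + \tau_2 |J_{T_2}| = k(s+1)$, taken from the sparsity target $\mathrm{nnzr}(B) = k \cdot (s+1)$. The function name and the numbers are hypothetical.

```python
from fractions import Fraction

def balanced_loads(t1, t2, tau1, tau2, k, s):
    """Solve  t1*J1 = t2*J2  and  tau1*J1 + tau2*J2 = k*(s+1)  exactly.

    Assumed system (see the lead-in); J1, J2 are per-worker loads.
    """
    total = Fraction(k * (s + 1))
    t1, t2 = Fraction(t1), Fraction(t2)
    # Substitute J2 = (t1/t2) * J1 into the second equation.
    j1 = total / (tau1 + tau2 * t1 / t2)
    j2 = j1 * t1 / t2
    return j1, j2

# Hypothetical instance: the T_1 machines are twice as fast as the T_2 machines.
j1, j2 = balanced_loads(t1=1, t2=2, tau1=4, tau2=8, k=12, s=3)
```

The solution is generally fractional, so a real allocation would still need a rounding step.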
◮ Gave a simple gradient coding scheme
◮ Faster online decoding
◮ Numerically stable in encoding and decoding
◮ Works for any pair $s$, $n$
◮ Also extended it to accommodate heterogeneous workers
Additional Slides
◮ Details of the constructions
◮ Explicit Algorithms
Details of the constructions
◮ When $(s+1) \mid n$ and $k = n$: $B$ is block diagonal, with $\ell = \frac{n}{s+1}$ blocks
◮ For $(s+1) \nmid n$, each worker in blocks of $(s+1)$ rows corresponds to a distinct congruence class (c.c.) $\bmod\ (s+1)$
◮ When any $f$ workers send their computations, at least one congruence class is met in every block (pigeonhole)
◮ $\exists\, i \in \mathbb{Z}/(s+1)$ s.t. all workers of class $i$ are received
◮ the received workers “always form a coset”
◮ Decoding: select any such $i$, and sum the vectors received from the workers of the c.c. $i$: $a^T = \sum_{j=0}^{\ell-1} e_{i+j(s+1)}$
◮ Want “even” number of assignments (homogeneous servers)
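For the divisible case, the block-diagonal matrix and the decoding vector can be written down directly. This is a minimal sketch assuming 0-indexed workers and all-ones diagonal blocks of size $(s+1) \times (s+1)$; the function names are illustrative.

```python
def block_diagonal_B(n, s):
    """B for (s+1) | n and k = n: n/(s+1) all-ones blocks of size (s+1) x (s+1)."""
    if n % (s + 1) != 0:
        raise ValueError("this sketch requires (s + 1) to divide n")
    return [[1 if row // (s + 1) == col // (s + 1) else 0 for col in range(n)]
            for row in range(n)]

def decoding_vector(n, s, i):
    """a = sum_{j=0}^{l-1} e_{i + j(s+1)}: the indicator of congruence class i."""
    return [1 if w % (s + 1) == i else 0 for w in range(n)]

n, s = 6, 2
B = block_diagonal_B(n, s)
a = decoding_vector(n, s, i=1)
# a^T B should be the all-ones row vector of length k = n.
aTB = [sum(a[w] * B[w][c] for w in range(n)) for c in range(n)]
```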
Details of the constructions
◮ Determine the integer parameters
◮ $n = \ell \cdot (s+1) + r$, $0 \le r < s+1$
◮ $r = t \cdot \ell + q$, $0 \le q < \ell$
◮ $n = \lambda \cdot (\ell+1) + \tilde{r}$, $0 \le \tilde{r} < \ell+1$
◮ Define: $C_1 := \{[i]_{s+1}\}_{i=0}^{r-1}$ and $C_2 := \{[i]_{s+1}\}_{i=r}^{s}$
◮ workers of $C_1$ lie in all $(\ell+1)$ blocks, and workers of $C_2$ lie in the first $\ell$
◮ $C_1$ load: $\{s+1, s\}$ if $\ell + r > s$, o.w. $\{\lambda+1, \lambda\}$
◮ $C_2$ load: $\{s+t+2, s+t+1\}$ if $q > 0$, o.w. all have $s+t+1$
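The parameters and loads above can be computed with integer division. In this sketch the number of workers receiving the larger load in each class ($\ell + r - s$, $\tilde{r}$ and $q$ respectively) is taken from the loop bounds of the explicit algorithms; pairing those counts with the larger load is an assumption, chosen so that each class covers all $n$ partitions exactly once.

```python
def parameters(n, s):
    """Integer parameters of the construction, for 0 < s < n."""
    l, r = divmod(n, s + 1)          # n = l*(s+1) + r
    t, q = divmod(r, l)              # r = t*l + q
    lam, r_tilde = divmod(n, l + 1)  # n = lam*(l+1) + r_tilde
    return l, r, t, q, lam, r_tilde

def class_loads(n, s):
    """Per-worker loads for one class of C1 (l+1 workers) and one of C2 (l workers)."""
    l, r, t, q, lam, r_tilde = parameters(n, s)
    if l + r > s:
        big = l + r - s  # assumed number of workers with load s+1
        c1 = [s + 1] * big + [s] * (l + 1 - big)
    else:
        c1 = [lam + 1] * r_tilde + [lam] * (l + 1 - r_tilde)
    c2 = [s + t + 2] * q + [s + t + 1] * (l - q)
    return c1, c2

c1, c2 = class_loads(11, 3)  # the running example n = 11, s = 3
```

When $r = 0$ the set $C_1$ is empty, so only the `c2` loads are used.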
Explicit Algorithms
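Algorithm 1: Determining $\tilde{B}_{C_1}$
Input: number of workers $n$ and stragglers $s$, where $s < n$ are both positive integers
Output: encoding matrix $\tilde{B}_{C_1} \in \{0,1\}^{n \times n}$ ⊲ for simplicity, we assume $n = k$
$\tilde{B}_{C_1} \leftarrow 0_{n \times n}$
use the division algorithm to get the parameters: $n = \ell \cdot (s+1) + r$, $r = t \cdot \ell + q$, $n = \lambda \cdot (\ell+1) + \tilde{r}$
denote the set of classes $C_1 = \{[i]_{s+1}\}_{i=0}^{r-1}$
for $i \in C_1$ do
    if $\ell + r > s$ then
        for $j = 1$ to $\ell + r - s$ do
            set the entries of $\tilde{B}_{C_1}$ for the class-$i$ worker of block $j$ ⊲ load $s+1$
        end
        for $j = \ell + r - s + 1$ to $\ell + 1$ do
            set the entries of $\tilde{B}_{C_1}$ for the class-$i$ worker of block $j$ ⊲ load $s$
        end
    else if $\ell + r \le s$ then
        for $j = 1$ to $\tilde{r}$ do
            set the entries of $\tilde{B}_{C_1}$ for the class-$i$ worker of block $j$ ⊲ load $\lambda + 1$
        end
        for $j = \tilde{r} + 1$ to $\ell + 1$ do
            set the entries of $\tilde{B}_{C_1}$ in columns $(j-1)\lambda + \tilde{r} + 1 : (j-1)\lambda + \tilde{r} + \lambda$ for the class-$i$ worker of block $j$ ⊲ load $\lambda$
        end
    end
end
return $\tilde{B}_{C_1}$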
Explicit Algorithms
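Algorithm 2: Determining $\tilde{B}_{C_2}$
Input: number of workers $n$ and stragglers $s$, where $s < n$ are both positive integers
Output: encoding matrix $\tilde{B}_{C_2} \in \{0,1\}^{n \times n}$ ⊲ for simplicity, we assume $n = k$
$\tilde{B}_{C_2} \leftarrow 0_{n \times n}$
use the division algorithm to get the parameters: $n = \ell \cdot (s+1) + r$, $r = t \cdot \ell + q$, $n = \lambda \cdot (\ell+1) + \tilde{r}$
denote the set of classes $C_2 = \{[i]_{s+1}\}_{i=r}^{s}$
for $i \in C_2$ do
    if $q = 0$ then
        for $j = 1$ to $\ell$ do
            set the entries of $\tilde{B}_{C_2}$ for the class-$i$ worker of block $j$ ⊲ load $s+t+1$
        end
    else if $q > 0$ then
        for $j = 1$ to $q$ do
            set the entries of $\tilde{B}_{C_2}$ for the class-$i$ worker of block $j$ ⊲ load $s+t+2$
        end
        for $j = q + 1$ to $\ell$ do
            set the entries of $\tilde{B}_{C_2}$ for the class-$i$ worker of block $j$ ⊲ load $s+t+1$
        end
    end
end
return $\tilde{B}_{C_2}$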
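Algorithms 1 and 2 can be combined into one runnable sketch that builds the full matrix for $k = n$. Most of the column index expressions are not legible in this transcript, so the code assumes each worker of a class receives the next consecutive range of columns, which keeps $B$ close to block diagonal. It also assumes the worker of class $i$ in block $j$ is row $j(s+1) + i$ (0-indexed). It should be read as an illustration of the load pattern, not as the authors' exact construction.

```python
def binary_scheme(n, s):
    """Sketch of a binary encoding matrix B in {0,1}^{n x n}, for 0 < s < n."""
    l, r = divmod(n, s + 1)
    t, q = divmod(r, l)
    lam, r_tilde = divmod(n, l + 1)
    B = [[0] * n for _ in range(n)]
    for i in range(s + 1):                  # congruence class mod (s+1)
        if i < r:                           # class in C1: l + 1 workers
            if l + r > s:
                big = l + r - s
                loads = [s + 1] * big + [s] * (l + 1 - big)
            else:
                loads = [lam + 1] * r_tilde + [lam] * (l + 1 - r_tilde)
        else:                               # class in C2: l workers
            loads = [s + t + 2] * q + [s + t + 1] * (l - q)
        start = 0
        for j, load in enumerate(loads):    # j indexes the block of s + 1 rows
            row = j * (s + 1) + i
            for col in range(start, start + load):  # assumed consecutive columns
                B[row][col] = 1
            start += load
    return B

B = binary_scheme(11, 3)
row_weights = [sum(row) for row in B]
```

For $n = 11$, $s = 3$ the rows of each class sum to the all-ones vector, and the total number of nonzeros is $k \cdot (s+1) = 44$.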
Explicit Algorithms
Algorithm 3: Determining $a_I$
Input: received indicator-vector $\mathrm{rec}_I$
Output: decoding vector $a_I$
if $r = 0$ then
    for $i = 0$ to $s$ do
        if $(\mathrm{rec}_I)_i = 1$ then
            $l \leftarrow i$
            if $\mathrm{supp}(a_l) \subseteq \mathrm{supp}(\mathrm{rec}_I)$ then
                $a \leftarrow a_l$
                break
            end
        end
    end
else if $r > 0$ then
    run the above for-loop for $i = r$ to $s$ and then for $i = 0$ to $r - 1$
end
return $a_I \leftarrow a$
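A compact Python rendering of this decoding step is sketched below. It assumes 0-indexed workers with worker $w$ in class $w \bmod (s+1)$, and it folds the two checks of the pseudocode into a single support test. The function name and the `None` return for undecodable inputs are choices made for this example.

```python
def find_decoding_vector(n, s, received):
    """Return the 0/1 indicator of one fully received class, or None if none exists.

    `received` is a list of n booleans (the indicator vector rec_I).
    """
    r = n % (s + 1)
    # Classes r..s (C2) are tried first, then 0..r-1 (C1); for r = 0 this is 0..s.
    order = list(range(r, s + 1)) + list(range(r))
    for i in order:
        a_i = [1 if w % (s + 1) == i else 0 for w in range(n)]
        # supp(a_i) must be contained in supp(rec_I).
        if all(received[w] for w in range(n) if a_i[w]):
            return a_i
    return None

n, s = 11, 3
stragglers = {0, 5, 10}
received = [w not in stragglers for w in range(n)]
a = find_decoding_vector(n, s, received)
```

Decoding is a scan over at most $s + 1$ classes followed by a plain sum of the selected workers' messages, so no matrix inversion or real-valued coefficients are involved.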