Communication trade-offs for synchronized distributed SGD with large step size
Aymeric DIEULEVEUT
EPFL, MLO
17 November 2017. Joint work with Kumar Kshitij Patel.
Outline
1. Stochastic gradient descent - supervised machine learning -
[Equation residue. Recoverable fragments: $k$-indexed terms in $\|g_k(\theta_k)\|^2$ and $\|g_k(\theta_\star)\|^2$.]
... $\frac{1}{\mu k}$, telescopic sum + Jensen:
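As an illustration of the $1/(\mu k)$ schedule mentioned above, here is a minimal sketch, assuming that fragment denotes the step size of plain SGD. The one-dimensional quadratic objective, the Gaussian gradient noise, and the names `sgd_decaying_step`, `theta0`, `theta_star`, `sigma` are all hypothetical choices made for this example, not taken from the slides. The function returns both the last iterate and the average of the iterates, the latter being the quantity to which a Jensen-type argument is usually applied.

```python
import random


def sgd_decaying_step(theta0, theta_star, mu, sigma, n_steps, seed=0):
    """Run SGD with step size 1/(mu*k) on F(theta) = mu/2 * (theta - theta_star)^2.

    The stochastic gradient is the true gradient plus N(0, sigma^2) noise
    (an assumption made for this sketch).
    Returns (last iterate, average of iterates theta_1..theta_n).
    """
    rng = random.Random(seed)
    theta = theta0
    running_sum = 0.0
    for k in range(1, n_steps + 1):
        # Unbiased noisy gradient of the quadratic objective.
        grad = mu * (theta - theta_star) + rng.gauss(0.0, sigma)
        # Decaying step size eta_k = 1 / (mu * k).
        theta -= grad / (mu * k)
        running_sum += theta
    return theta, running_sum / n_steps


if __name__ == "__main__":
    last, avg = sgd_decaying_step(5.0, 1.0, 1.0, 1.0, 10_000, seed=1)
    print(last, avg)
```

On this particular quadratic the last iterate equals $\theta_\star$ minus the running mean of the noise divided by $\mu$, so its error shrinks roughly like $1/\sqrt{k}$; that closed form is specific to the toy problem.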
[Equation residue. Recoverable fragments, with indices $t = 1, \dots, C$, $p$ (with $P$) and $k$ (with $N^t$):]
$\theta^t_{p,k-1} - \theta^t_{p,k}$
$g^t_{p,k}(\theta^t_{p,k-1}) - F'(\theta^t_{p,k-1})$
$\dots(\theta^t_{p,k-1}) - F''(\theta_\star)(\theta^t_{p,k-1} - \theta_\star)$
$\dots \|\theta^t_{p,k} - \theta_\star\|^2$
[Equation residue. Recoverable fragments: $\dots_t - \theta_\star$, $\theta^t_{p,k} - \theta_\star$, $C$, $F''(\theta_\star)$.]
[Equation residue. Recoverable fragments: $\dots_{t-1} - \theta_\star$, $\theta^t_{p,k} - \theta_\star$, $\|\theta_0 - \theta_\star\|^2$, $N^{t-1}$.]
... $\frac{1}{\mu \eta P}$, then the second-order moment of $\theta^t_{p,k}$ admits the same upper bound as the mini-batch iterate $\hat{\theta}^{\mathrm{MB}}_{\dots}$
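To make the comparison with the mini-batch iterate concrete, here is a minimal sketch of synchronized SGD with periodic averaging. It assumes, based on the title and the surviving indices, that $P$ workers each run $N^t$ local steps with constant step size $\eta$ during phase $t$ and then average, for $C$ phases in total. The quadratic objective, the Gaussian noise, and the names `local_sgd` and `local_steps` are hypothetical and only serve the illustration.

```python
import random


def local_sgd(theta0, theta_star, mu, sigma, eta, n_workers, local_steps, seed=0):
    """Synchronized local SGD on F(theta) = mu/2 * (theta - theta_star)^2.

    local_steps is a list [N^1, ..., N^C]: phase t runs N^t local steps on
    each of the n_workers workers, followed by one averaging (communication).
    Gradient noise is N(0, sigma^2), an assumption made for this sketch.
    """
    rng = random.Random(seed)
    theta_bar = theta0
    for n_t in local_steps:
        workers = []
        for _ in range(n_workers):
            # Every worker restarts from the last averaged iterate.
            theta = theta_bar
            for _ in range(n_t):
                grad = mu * (theta - theta_star) + rng.gauss(0.0, sigma)
                theta -= eta * grad  # constant step size
            workers.append(theta)
        # Communication step: average the workers' iterates.
        theta_bar = sum(workers) / n_workers
    return theta_bar


if __name__ == "__main__":
    # Same total number of local steps, different communication patterns.
    print(local_sgd(5.0, 1.0, 1.0, 1.0, 0.1, 10, [1] * 100))  # average after every step
    print(local_sgd(5.0, 1.0, 1.0, 1.0, 0.1, 10, [10] * 10))  # 10 communications
    print(local_sgd(5.0, 1.0, 1.0, 1.0, 0.1, 10, [100]))      # one-shot averaging
```

Setting every $N^t = 1$ makes each phase a single mini-batch step of size $P$, since all workers start from the same point and their one-step updates are averaged. Longer phases trade communication for local drift, which is the trade-off the title refers to.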