Fast Algorithms for Distributed Optimization over Time-varying Graphs

Angelia Nedich (Angelia.Nedich@asu.edu)
School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe

Collaborative work with Wei (Wilbur) Shi and Alexander Olshevsky

DIMACS Workshop on Distributed Optimization, Information Processing, and Learning, Rutgers University
Motivating example: Wireless Sensor Networks (WSN)*

*An article by Mahendra Bhatia, https://www.linkedin.com/pulse/internet-things-part-7-wireless-sensor-networks-mahendra-bhatia
The problem: agents 1, …, m cooperatively

minimize f(x) = ∑_{i=1}^m f_i(x) over x ∈ R^n,

where each f_i is convex and differentiable† and is known only to agent i.

†For the sake of discussion; convex and nondifferentiable f_i will also work
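To make the setup concrete, here is a minimal sketch of such a separable objective (the least-squares split, data sizes, and names below are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3   # number of agents, decision dimension (illustrative)

# Agent i privately holds data (A_i, b_i); its local objective is
#   f_i(x) = 0.5 * ||A_i x - b_i||^2
A = [rng.standard_normal((10, n)) for _ in range(m)]
b = [rng.standard_normal(10) for _ in range(m)]

def f(x):
    """Global objective f(x) = sum_i f_i(x); no single agent can evaluate it alone."""
    return sum(0.5 * np.linalg.norm(A[i] @ x - b[i]) ** 2 for i in range(m))

# Centralized reference solution, obtained by stacking all agents' data
x_star, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
print("f(x*) =", f(x_star))
```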
The distributed (sub)gradient method‡: each agent i keeps an estimate x_i(t) and updates

x_i(t+1) = ∑_{j=1}^m w_ij x_j(t) − α_t ∇f_i(x_i(t)),

mixing with its neighbors through doubly stochastic weights w_ij. For step sizes satisfying ∑_{t=0}^∞ α_t = +∞ and ∑_{t=0}^∞ α_t² < ∞, one can show that

lim_{t→∞} x_i(t) = x∗ for every agent i.

The rate is of the order of O(ln t/√t); when f(x) = ∑_{i=1}^m f_i(x) is strongly convex, the rate is of the order of O(ln t/t).

‡AN and A. Ozdaglar, 2009
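A minimal runnable sketch of this diminishing-step-size scheme (the ring topology, quadratic objectives, and constants are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, T = 8, 2, 2000   # agents, dimension, iterations (illustrative)

# Illustrative local objectives f_i(x) = 0.5*||x - c_i||^2;
# their sum is minimized at the mean of the c_i
c = rng.standard_normal((m, n))
x_star = c.mean(axis=0)

# Doubly stochastic mixing weights on a fixed ring (a common simple choice)
W = np.eye(m) * 0.5
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

x = np.zeros((m, n))   # row i is agent i's estimate x_i(t)
for t in range(1, T + 1):
    alpha = 1.0 / t    # satisfies sum alpha_t = +inf, sum alpha_t^2 < inf
    x = W @ x - alpha * (x - c)   # consensus step minus local gradient step

print("max node error:", np.abs(x - x_star).max())
```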
Directed graphs: constructing doubly stochastic weights in a distributed way is hard§, which motivates methods that use only column-stochastic weights, such as push-sum¶.

§B. Gharesifard and J. Cortés, "Distributed strategies for generating weight-balanced and doubly stochastic digraphs," European Journal of Control, 18(6), 539–557, 2012
¶D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," in Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 482–491, 2003; F. Bénézit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, "Weighted gossip: distributed averaging using non-doubly stochastic matrices," in Proceedings of the 2010 IEEE International Symposium on Information Theory, 2010
Key fact: for a column-stochastic matrix A whose graph is strongly connected and aperiodic, lim_{t→∞} A^t = π1′, where π > 0 is the stochastic vector satisfying Aπ = π.
Consequences: with x(t+1) = A x(t) and y(t+1) = A y(t), started from y(0) = 1,

lim_{t→∞} x(t) = lim_{t→∞} A^t x(0) = π1′x(0), i.e., lim_{t→∞} x_i(t) = (1′x(0)) π_i,
lim_{t→∞} y_i(t) = (1′y(0)) π_i = m π_i,

so the ratios z_i(t) = x_i(t)/y_i(t) satisfy

lim_{t→∞} z_i(t) = (1′x(0)) π_i / (m π_i) = (1/m) ∑_{j=1}^m x_j(0),

i.e., every node recovers the exact average even though A is not doubly stochastic.
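A quick numeric check of these limits (the random 4-node column-stochastic matrix below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4

# Random column-stochastic A on a complete graph (columns sum to 1)
A = rng.random((m, m))
A /= A.sum(axis=0, keepdims=True)

# pi: the stochastic right eigenvector, A pi = pi (eigenvalue 1)
vals, vecs = np.linalg.eig(A)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

x = rng.standard_normal(m)   # x(0)
y = np.ones(m)               # y(0) = 1
x_t, y_t = x.copy(), y.copy()
for _ in range(200):
    x_t, y_t = A @ x_t, A @ y_t

print(np.allclose(x_t, pi * x.sum()))      # x_i(t) -> (1'x(0)) pi_i
print(np.allclose(y_t, pi * m))            # y_i(t) -> m pi_i
print(np.allclose(x_t / y_t, x.mean()))    # z_i(t) -> average of the x_j(0)
```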
Push-sum†: each node i maintains x_i(t), y_i(t) with y_i(0) = 1 and updates

x_i(t+1) = ∑_{j ∈ N_i^in ∪ {i}} x_j(t) / (d_j + 1),
y_i(t+1) = ∑_{j ∈ N_i^in ∪ {i}} y_j(t) / (d_j + 1),
z_i(t+1) = x_i(t+1) / y_i(t+1),

where N_i^in is the set of in-neighbors of node i and d_j is the out-degree of node j.

†D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," in Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 482–491, Oct. 2003; F. Bénézit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, "Weighted gossip: distributed averaging using non-doubly stochastic matrices," in Proceedings of the 2010 IEEE International Symposium on Information Theory, Jun. 2010
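A minimal sketch of these updates on a small fixed digraph (the edge set is an arbitrary illustration):

```python
import numpy as np

m = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # directed edges (j -> i), illustrative

out_deg = [sum(1 for (j, _) in edges if j == v) for v in range(m)]
in_nbrs = [[j for (j, i) in edges if i == v] for v in range(m)]

def mix(u):
    # One push-sum mixing step: node j pushes u_j/(d_j + 1) to each
    # out-neighbor and keeps one share for itself
    return np.array([sum(u[j] / (out_deg[j] + 1) for j in in_nbrs[i] + [i])
                     for i in range(m)])

rng = np.random.default_rng(3)
x = rng.standard_normal(m)   # values to be averaged
avg = x.mean()
y = np.ones(m)               # y(0) = 1
for t in range(100):
    x, y = mix(x), mix(y)

print(np.allclose(x / y, avg))   # z_i = x_i/y_i -> exact average at every node
```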
Time-varying graphs: the same updates work with time-varying in-neighbor sets and out-degrees,

x_i(t+1) = ∑_{j ∈ N_i^in(t) ∪ {i}} x_j(t) / (d_j(t) + 1),
y_i(t+1) = ∑_{j ∈ N_i^in(t) ∪ {i}} y_j(t) / (d_j(t) + 1).
In matrix form, x(t+1) = A(t)x(t), where A_ij(t) = 1/(d_j(t) + 1) if j ∈ N_i^in(t) ∪ {i} and 0 otherwise. Each A(t) is column-stochastic, so the total mass is preserved: ∑_{i=1}^m x_i(t) = ∑_{i=1}^m x_i(0) for all t.
Subgradient-Push: to minimize ∑_{i=1}^m f_i(z) over z ∈ R^n, interleave push-sum with (sub)gradient steps:

w_i(t+1) = ∑_{j ∈ N_i^in(t) ∪ {i}} x_j(t) / (d_j(t) + 1),
y_i(t+1) = ∑_{j ∈ N_i^in(t) ∪ {i}} y_j(t) / (d_j(t) + 1),
z_i(t+1) = w_i(t+1) / y_i(t+1),
x_i(t+1) = w_i(t+1) − α(t+1) ∇f_i(z_i(t+1)).
Convergence††: if the step sizes satisfy ∑_{t=0}^∞ α(t) = ∞ and ∑_{t=0}^∞ α(t)² < ∞, then every z_i(t) converges to a minimizer of ∑_{i=1}^m f_i(z) over z ∈ R^n.

∗∗We note that we make use here of the assumption that node i knows its out-degree d_i(t).
††AN and Olshevsky, "Distributed Optimization over Time-varying Directed Graphs," IEEE Transactions on Automatic Control, 60(3), 601–615, 2015; AN and Olshevsky, "Stochastic Gradient-Push for Strongly Convex Functions on Time-Varying Directed Graphs," arXiv, 2015
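A minimal sketch of Subgradient-Push with diminishing steps on a fixed digraph (the topology and the scalar quadratic objectives are illustrative assumptions):

```python
import numpy as np

m, T = 4, 3000
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # directed (j -> i), illustrative
out_deg = [sum(1 for (j, _) in edges if j == v) for v in range(m)]
in_nbrs = [[j for (j, i) in edges if i == v] for v in range(m)]

def mix(u):
    # Push-sum mixing with weights 1/(d_j + 1); the graph is held fixed here
    return np.array([sum(u[j] / (out_deg[j] + 1) for j in in_nbrs[i] + [i])
                     for i in range(m)])

rng = np.random.default_rng(4)
c = rng.standard_normal(m)   # f_i(z) = 0.5*(z - c_i)^2, so argmin of the sum is mean(c)

x = np.zeros(m)
y = np.ones(m)
for t in range(1, T + 1):
    w = mix(x)
    y = mix(y)
    z = w / y                      # de-biased estimates z_i(t+1)
    x = w - (1.0 / t) * (z - c)    # gradient step with alpha(t) ~ 1/t

print("max error:", np.abs(z - c.mean()).max())   # z_i(t) -> argmin of the sum
```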
When f(x) = ∑_{i=1}^m f_i(x) is strongly convex and α(t) ∼ 1/t, the rate of Subgradient-Push is of the order O(ln t / t).
Idea (gradient tracking): in addition to its solution estimate, each node i maintains an estimate y_i(t) of the average gradient (1/m) ∑_{i=1}^m ∇f_i(z(t)). If every y_i(t) equaled the true average gradient, a constant step size would reproduce the centralized step z(t+1) = z(t) − α (1/m) ∑_{i=1}^m ∇f_i(z(t)).
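A small sketch of the tracking recursion and its key invariant, namely that the mean of the trackers always equals the mean of the current local gradients (the ring weights and scalar quadratics are illustrative assumptions):

```python
import numpy as np

m, T = 6, 50
rng = np.random.default_rng(5)
c = rng.standard_normal(m)
grad = lambda x: x - c   # entry i is grad f_i(x_i) for f_i(x) = 0.5*(x - c_i)^2

# Doubly stochastic ring weights (illustrative)
W = np.eye(m) * 0.5
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

x = rng.standard_normal(m)
y = grad(x)                             # y_i(0) = grad f_i(x_i(0))
for k in range(T):
    x_new = W @ x - 0.1 * y
    y = W @ y + grad(x_new) - grad(x)   # tracking recursion
    x = x_new
    # Invariant: mean of trackers == mean of current gradients (W is doubly stochastic)
    assert np.isclose(y.mean(), grad(x).mean())

print("invariant held for all k")
```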
The aim is to match the centralized gradient method on f(x) = ∑_{i=1}^m f_i(x) while each node evaluates only its own ∇f_i; the tracked estimate stands in for the unavailable average gradient (1/m) ∑_{i=1}^m ∇f_i(z(t)).
Reformulation: each agent i keeps its own copy x_i ∈ R^n, stacked as the rows 1, 2, …, m of a matrix X ∈ R^{m×n} (row i holds x_i′):

minimize ∑_{i=1}^m f_i(x_i) subject to x_i′ = x_j′, ∀ i ≠ j.
DIGing: with y_i(0) = ∇f_i(x_i(0)), each node updates

x_i(k+1) = ∑_{j ∈ N_i^in(k)} W_ij(k) x_j(k) − α y_i(k),
y_i(k+1) = ∑_{j ∈ N_i^in(k)} W_ij(k) y_j(k) + ∇f_i(x_i(k+1)) − ∇f_i(x_i(k)).
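A minimal sketch of DIGing with a constant step size on a fixed ring (the weights, quadratic objectives, and α are illustrative assumptions; the method itself allows time-varying W(k)):

```python
import numpy as np

m, n, T = 8, 2, 1000
rng = np.random.default_rng(6)
c = rng.standard_normal((m, n))    # f_i(x) = 0.5*||x - c_i||^2 (illustrative)
grad = lambda x: x - c             # row i is grad f_i(x_i)
x_star = c.mean(axis=0)

# Doubly stochastic ring weights, held fixed here for simplicity
W = np.eye(m) * 0.5
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

alpha = 0.1                        # illustrative constant step size
x = np.zeros((m, n))
y = grad(x)                        # y_i(0) = grad f_i(x_i(0))
for k in range(T):
    x_new = W @ x - alpha * y      # mix, then move against the tracked gradient
    y = W @ y + grad(x_new) - grad(x)
    x = x_new

print("max node error:", np.abs(x - x_star).max())   # geometric convergence to x*
```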
Assumptions: each ∇f_i is Lipschitz continuous with constant L_i; the average function (1/m) ∑_{i=1}^m f_i is strongly convex with a coefficient µ̄; and the graph sequence is B-connected, i.e., the union of the graphs over t = k, …, k + B − 1 is connected for every k.
Theorem (explicit geometric rate): with L = max_i L_i and strong-convexity coefficient µ̄, there is an explicit bound ᾱ on the constant step size, depending on B, the connectivity constants δ and J1, µ̄, and L (through quantities such as 1 + (1 − δ²)J1 − δ^{J1} and µ̄ J1(J1 + 1)²), such that for every α ∈ (0, ᾱ] the DIGing iterates converge R-linearly (geometrically) to the minimizer, with the rate estimate holding for all k ≥ B − 1.
Push-DIGing (directed graphs): replace the doubly stochastic W(k) with column-stochastic weights built from out-degrees, C_ij(k) = 1/(1 + d_j^out(k)) when (j, i) ∈ E_k, where d_j^out(k) is the out-degree of node j at time k, combined with push-sum-style ratio corrections.
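A small sketch of building such a column-stochastic matrix from a directed edge list (self-loops included by convention; the edge set is illustrative):

```python
import numpy as np

m = 5
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]   # (j, i): j sends to i, illustrative

out_deg = np.zeros(m, dtype=int)
for j, _ in edges:
    out_deg[j] += 1

C = np.zeros((m, m))
for j, i in edges:
    C[i, j] = 1.0 / (1 + out_deg[j])   # C_ij = 1/(1 + d_j^out) on edges
for j in range(m):
    C[j, j] = 1.0 / (1 + out_deg[j])   # each node keeps one share for itself

print(C.sum(axis=0))   # every column sums to 1: C is column-stochastic
```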
For undirected time-varying graphs, a standard choice is Metropolis-type weights: W_iℓ(k) = 1/(1 + max{d_i(k), d_ℓ(k)}) for ℓ ∈ N_i(k), and W_ii(k) = 1 − ∑_{ℓ ∈ N_i(k)} W_iℓ(k), which makes each W(k) doubly stochastic.
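A sketch of this construction on an arbitrary undirected graph, checking double stochasticity (the edge list is an illustrative assumption):

```python
import numpy as np

m = 5
und_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]   # illustrative undirected graph

nbrs = [set() for _ in range(m)]
for a, b in und_edges:
    nbrs[a].add(b)
    nbrs[b].add(a)
deg = [len(nbrs[i]) for i in range(m)]

W = np.zeros((m, m))
for i in range(m):
    for l in nbrs[i]:
        W[i, l] = 1.0 / (1 + max(deg[i], deg[l]))   # Metropolis weight
    W[i, i] = 1.0 - W[i].sum()                      # diagonal completes the row sum to 1

# Off-diagonal symmetry plus unit row sums makes W doubly stochastic
print(np.allclose(W.sum(axis=0), 1.0), np.allclose(W.sum(axis=1), 1.0))
```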
Simulations: performance is measured by a Frobenius-norm residual, ‖X(k) − 1(x∗)′‖_F / ‖X(0) − 1(x∗)′‖_F, plotted over the iterations 0 ≤ k ≤ K.
[Figure: residual vs. iteration k (k up to 2500; residual from 10^0 down to 10^-10). Curves: Gradient-Push, α(k) = 4.3/k^{1/2}; DIGing, α = 0.39; DIGing-ATC, α = 0.47; Push-DIGing, α = 0.26.]
[Figure: residual vs. iteration k (k up to 2500; residual from 10^0 down to 10^-10). Curves: Gradient-Push, α(k) = 10/k^{1/2}; DIGing, α = 0.37; DIGing-ATC, α = 0.89; Push-DIGing, α = 1.2.]
[Figure: residual vs. iteration k (k up to 6000; residual from 10^0 down to 10^-10). Curves: Gradient-Push, α(k) = 2.6/k^{1/2}; Push-DIGing, α = 0.12; Gradient-Push, α(k) = 3.1/k^{1/2}; Push-DIGing, α = 0.14.]