Improved Convergence for ℓ∞ and ℓ1 Regression via Iteratively Reweighted Least Squares
Alina Ene, Adrian Vladu
IRLS Method
Basic primitive: $\min \sum_i r_i x_i^2$ subject to $Ax = b$
➜ solution given by one linear system solve: $x = R^{-1} A^\top (A R^{-1} A^\top)^{-1} b$, where $R = \mathrm{diag}(r)$
“Hard” problem: $\min \|x\|_p$ subject to $Ax = b$, for $p \in \{1, \infty\}$
➜ equivalent to linear programming
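The basic primitive has the closed form $x = R^{-1} A^\top (A R^{-1} A^\top)^{-1} b$ with $R = \mathrm{diag}(r)$. The NumPy sketch below evaluates it with a single linear solve. It assumes all $r_i > 0$ and that A has full row rank, conditions the slide leaves implicit. The function name `weighted_least_squares` is hypothetical.

```python
import numpy as np

def weighted_least_squares(A, b, r):
    """Minimize sum_i r_i * x_i**2 subject to A @ x == b.

    Assumes every r_i > 0 and A has full row rank, so A R^{-1} A^T is invertible.
    Uses x = R^{-1} A^T (A R^{-1} A^T)^{-1} b with R = diag(r), computed with one
    linear solve instead of explicit inverses.
    """
    r_inv = 1.0 / r                    # diagonal of R^{-1}
    M = (A * r_inv) @ A.T              # A R^{-1} A^T: column j of A scaled by 1/r_j
    y = np.linalg.solve(M, b)          # the single linear system solve
    return r_inv * (A.T @ y)           # x = R^{-1} A^T y

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))        # 3 constraints, 6 unknowns
b = rng.standard_normal(3)
r = rng.uniform(0.5, 2.0, size=6)      # arbitrary positive weights
x = weighted_least_squares(A, b, r)
print(np.allclose(A @ x, b))           # True: x satisfies Ax = b
```

Each IRLS iteration calls this one solve with new weights r. On graphs, $A R^{-1} A^\top$ is a weighted Laplacian, which is why the cost is quoted in linear system solves.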
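One direction of "equivalent to linear programming" is the textbook reformulation. Each hard problem becomes an LP once auxiliary variables ($t$ for the ∞-norm, $s_i$ for the absolute values) are added. This block does not show the converse, that general LPs reduce to these problems.

```latex
% p = infinity: minimize a single bound t on every |x_i|
\min_{x,\,t}\ t \quad \text{s.t.}\quad Ax = b,\quad -t \le x_i \le t \ \ \forall i
% p = 1: minimize the sum of per-coordinate bounds s_i >= |x_i|
\min_{x,\,s}\ \sum_i s_i \quad \text{s.t.}\quad Ax = b,\quad -s_i \le x_i \le s_i \ \ \forall i
```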
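Benchmark: Optimization on Graphs
Maximum flow: $\min \|x\|_\infty$ subject to $Ax = b$
➜ minimize the congestion of flow x
➜ boundary condition: x routes the demand from s to t
[Figure: graph with source s and sink t; the flow shown has value .5 on each of six edges]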
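The sketch below shows how $Ax = b$ encodes "x routes the demand from s to t". A is taken to be the vertex-edge incidence matrix and b has +1 at s and −1 at t. With unit capacities, minimizing congestion for one unit of demand is maximum flow in disguise: if the maximum flow value is F, the optimal congestion is 1/F. The graph and the helper `incidence_matrix` are hypothetical illustrations, not the graph in the figure.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Column e = (u, v) has +1 at u and -1 at v, so (A x)_u = outflow(u) - inflow(u)."""
    A = np.zeros((n, len(edges)))
    for e, (u, v) in enumerate(edges):
        A[u, e], A[v, e] = 1.0, -1.0
    return A

# Hypothetical graph: two parallel s-t paths, s-u-t and s-v-t (max flow value F = 2).
s, u, v, t = range(4)
A = incidence_matrix(4, [(s, u), (u, t), (s, v), (v, t)])
b = np.zeros(4)
b[s], b[t] = 1.0, -1.0                     # route one unit of demand from s to t

one_path = np.array([1.0, 1.0, 0.0, 0.0])  # everything on s-u-t
split = np.array([0.5, 0.5, 0.5, 0.5])     # half on each path
for x in (one_path, split):
    assert np.allclose(A @ x, b)           # both satisfy the boundary condition
    print(np.max(np.abs(x)))               # congestion ||x||_inf
```

This prints 1.0 and then 0.5. The split flow attains the optimal congestion 1/F = 0.5.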
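Minimum cost flow: $\min \|x\|_1$ subject to $Ax = b$
➜ minimize the cost of flow x
➜ boundary condition: x routes the demand from the +1 vertices to the −1 vertices
[Figure: two vertices with demand +1, two with demand −1, and flow value 1 on each of four edges]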
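The ℓ1 objective can be illustrated the same way. The sketch assumes unit edge costs, so ‖x‖1 is the total cost of the flow, and uses a hypothetical graph with two +1 and two −1 vertices as in the figure. Both candidate flows satisfy $Ax = b$, but the detour costs more.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Column e = (u, v) has +1 at u and -1 at v, so (A x)_u = outflow(u) - inflow(u)."""
    A = np.zeros((n, len(edges)))
    for e, (u, v) in enumerate(edges):
        A[u, e], A[v, e] = 1.0, -1.0
    return A

# Hypothetical graph: sources p, q (demand +1) and sinks y, z (demand -1).
p, q, y, z = range(4)
A = incidence_matrix(4, [(p, y), (q, z), (p, q), (q, y)])
b = np.array([1.0, 1.0, -1.0, -1.0])

direct = np.array([1.0, 1.0, 0.0, 0.0])   # p -> y and q -> z
detour = np.array([0.0, 1.0, 1.0, 1.0])   # p -> q -> y and q -> z
for x in (direct, detour):
    assert np.allclose(A @ x, b)          # both route the +1 demand to the -1 vertices
    print(np.abs(x).sum())                # cost ||x||_1 with unit edge costs
```

This prints 2.0 and then 3.0. With general edge costs c_i, the objective would become $\sum_i c_i |x_i|$, which goes beyond what the slide states.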
Max flow: $\min \|x\|_\infty$ subject to $Ax = b$; min cost flow: $\min \|x\|_1$ subject to $Ax = b$ (m = number of edges, i.e. entries of x; n = number of vertices)
Q: Are these problems really that hard?
First-order methods (gradient descent)
➜ running time strongly depends on the matrix structure
➜ in general, takes time at least $\Omega(m^{1.5}/\mathrm{poly}(\varepsilon))$
Second-order methods (Newton's method, IRLS)
➜ interior point method: $\tilde O(m^{1/2})$ linear system solves
➜ can be made $\tilde O(n^{1/2})$ with a lot of work [Lee-Sidford ’14]
“Hybrid” method
➜ [Christiano-Kelner-Madry-Spielman-Teng, STOC ’11]: $\tilde O(m^{1/3}/\varepsilon^{11/3})$ linear system solves
➜ ~30 pages of description and proofs for a complicated method
This work
Natural IRLS method runs in $\tilde O(m^{1/3}/\varepsilon^{2/3} + 1/\varepsilon^2)$ iterations, no matter what the structure of the underlying matrix is
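IRLS for $\min \|x\|_\infty$ subject to $Ax = b$ (example: route one unit from s to t; goal: $\|x\|_\infty \le \mathrm{OPT}$)
➜ Guess the OPT value (.5 in the example)
➜ Initialize $r = \mathbf{1}$
➜ Solve the least squares problem $\min \sum_i r_i x_i^2$ subject to $Ax = b$
➜ Update r: $r_i \leftarrow r_i \cdot \max\{(x_i/\mathrm{OPT})^2, 1\}$, then solve again
[Figure, first iteration: with r = 1 the least-squares flow is .6 .4 .2 .4 .4 .4 .6; the two edges carrying .6 > OPT have r raised to 1.44, the rest stay at 1]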
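The loop above can be sketched in NumPy. This is only the update rule as stated on the slide, not the full algorithm from the paper. The stopping rule $\|x\|_\infty \le (1+\varepsilon)\,\mathrm{OPT}$, the iteration budget, and all function names are assumptions of this sketch. The example graph is hypothetical. Vertex t's row is dropped from A because it equals minus the sum of the other rows, which leaves A with full row rank.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Column e = (u, v) has +1 at u and -1 at v."""
    A = np.zeros((n, len(edges)))
    for e, (u, v) in enumerate(edges):
        A[u, e], A[v, e] = 1.0, -1.0
    return A

def weighted_least_squares(A, b, r):
    """argmin sum_i r_i x_i^2 s.t. A x = b, via one solve with A R^{-1} A^T."""
    r_inv = 1.0 / r
    y = np.linalg.solve((A * r_inv) @ A.T, b)
    return r_inv * (A.T @ y)

def irls_linf(A, b, opt, eps=0.01, max_iters=100):
    """IRLS for min ||x||_inf s.t. A x = b, given a guess `opt` for the optimum.

    Update as on the slide: r_i <- r_i * max((x_i / opt)^2, 1).
    Assumed stopping rule (not from the slide): ||x||_inf <= (1 + eps) * opt.
    """
    r = np.ones(A.shape[1])                        # initialize r = 1
    for it in range(1, max_iters + 1):
        x = weighted_least_squares(A, b, r)        # solve the least squares problem
        if np.max(np.abs(x)) <= (1 + eps) * opt:   # congestion close enough to the guess
            return x, r, it
        r = r * np.maximum((x / opt) ** 2, 1.0)    # update r; only overloaded edges grow
    raise RuntimeError("no (1 + eps)-approximate solution; the OPT guess may be too low")

# Hypothetical graph: path s-u-t (2 edges) in parallel with path s-v-w-t (3 edges).
s, u, v, w, t = range(5)
A = incidence_matrix(5, [(s, u), (u, t), (s, v), (v, w), (w, t)])
demand = np.zeros(5)
demand[s], demand[t] = 1.0, -1.0                   # route one unit from s to t
A_red, d_red = np.delete(A, t, axis=0), np.delete(demand, t)
x, r, iters = irls_linf(A_red, d_red, opt=0.5)
print(np.round(x, 3), np.round(r, 3), iters)
```

With r = 1, the first solve splits the unit as 0.6 on the short path and 0.4 on the long one. This is the electrical flow for path resistances 2 and 3, so the two short-path edges get r = 1.44, much like the slide's first iteration. The second solve puts about 0.510 on the short path and the third about 0.5001, so the loop stops after three solves with r ≈ 1.499 on the short path.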
[Figure, second iteration: the least-squares flow becomes .55 .44 .11 .44 .44 .44 .55, and the r values of the two heavily loaded edges grow from 1.44 to 1.75]
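The figure's numbers follow from the update rule $r_i \leftarrow r_i \cdot \max\{(x_i/\mathrm{OPT})^2, 1\}$ with the guess OPT = .5. The worked arithmetic below uses the displayed flow values. The 1.75 shown on the slide presumably comes from the unrounded flow behind the displayed .55.

```latex
\begin{aligned}
\text{iteration 1:}\quad & r_i = 1 \cdot \max\{(0.6/0.5)^2,\,1\} = 1.44 && (x_i = 0.6 > \mathrm{OPT})\\
& r_i = 1 \cdot \max\{(0.4/0.5)^2,\,1\} = \max\{0.64,\,1\} = 1 && (x_i = 0.4 \le \mathrm{OPT})\\
\text{iteration 2:}\quad & r_i = 1.44 \cdot \max\{(0.55/0.5)^2,\,1\} = 1.44 \cdot 1.21 \approx 1.74 && (x_i = 0.55)
\end{aligned}
```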
[Figure, later iterations: the r values of the two heavily loaded edges reach 2, and every displayed flow value equals .5, the guessed OPT]
Nonstandard Optimization Primitive
➜ Objective function is $\max_{r \ge 0} \min_{Ax = b} \frac{\sum_i r_i x_i^2}{\sum_i r_i}$
➜ Similar analysis to packing/covering LP [Young ’01]
➜ ℓ1 version is a type of “slime mold dynamics” [Straszak-Vishnoi ’16, ’17]
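Why does this objective fit the ℓ∞ problem? For any r and any feasible $x^*$, $\min_{Ax=b} \sum_i r_i x_i^2 / \sum_i r_i \le \sum_i r_i (x^*_i)^2 / \sum_i r_i \le \|x^*\|_\infty^2$. So every r certifies the lower bound $\sqrt{\text{value}} \le \mathrm{OPT}$, and a standard minimax argument makes the maximum over r equal to $\mathrm{OPT}^2$.

The sketch below checks this numerically. It restricts to r > 0 (the slide allows r ≥ 0) and uses a hypothetical two-path graph whose OPT is 0.5. The name `primitive_value` is hypothetical.

```python
import numpy as np

def primitive_value(A, b, r):
    """min over A x = b of sum_i r_i x_i^2 / sum_i r_i (assumes all r_i > 0)."""
    r_inv = 1.0 / r
    y = np.linalg.solve((A * r_inv) @ A.T, b)
    x = r_inv * (A.T @ y)                    # weighted least squares minimizer
    return np.dot(r, x ** 2) / np.sum(r)

# Hypothetical graph: path s-u-t in parallel with path s-v-w-t; vertex t's row dropped.
# Rows: s, u, v, w. Columns: (s,u), (u,t), (s,v), (v,w), (w,t).
A = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, -1.0, 1.0]])
b = np.array([1.0, 0.0, 0.0, 0.0])           # one unit out of s (t's row is implied)
opt = 0.5                                    # min congestion: split the unit evenly

for r in (np.ones(5), np.array([1.5, 1.5, 1.0, 1.0, 1.0])):
    value = primitive_value(A, b, r)
    print(np.sqrt(value), "<=", opt)         # the certified lower bound never exceeds OPT
```

For r = 1 the bound is about 0.49 (value 0.24). For the reweighted r, which balances the two paths, it reaches 0.5 up to rounding, so that r attains the maximum.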