Regression via Iteratively Reweighted Least Squares, Alina Ene (PowerPoint presentation)

SLIDE 1

Improved Convergence for ℓ∞ and ℓ1 Regression via Iteratively Reweighted Least Squares

Alina Ene, Adrian Vladu

SLIDE 5

IRLS Method

Basic primitive: $\min \sum_i r_i x_i^2$ subject to $Ax = b$ *
➜ solution given by one linear system solve: $x = R^{-1} A^\top (A R^{-1} A^\top)^{-1} b$

“Hard” problem: $\min \|x\|_p$ subject to $Ax = b$ **
➜ equivalent to linear programming

* $R = \mathrm{diag}(r)$
** $p \in \{1, \infty\}$
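A minimal sketch of the basic primitive in standard-library Python, assuming A has full row rank and every $r_i > 0$: `weighted_least_squares` evaluates $x = R^{-1}A^\top(AR^{-1}A^\top)^{-1}b$ with a single Gaussian-elimination solve. The helper names and the small A, b, r below are illustrative assumptions, not taken from the talk.

```python
def solve(M, y):
    """Solve the square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def weighted_least_squares(A, b, r):
    """Minimize sum_i r_i x_i^2 subject to A x = b via x = R^-1 A^T (A R^-1 A^T)^-1 b."""
    n, m = len(A), len(A[0])
    # M = A R^-1 A^T is n x n; one solve M y = b gives the multipliers y.
    M = [[sum(A[i][e] * A[j][e] / r[e] for e in range(m)) for j in range(n)]
         for i in range(n)]
    y = solve(M, b)
    # Map back: x = R^-1 A^T y.
    return [sum(A[i][e] * y[i] for i in range(n)) / r[e] for e in range(m)]


A = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 0.0]]
b = [3.0, 1.0]
r = [1.0, 2.0, 4.0]
x = weighted_least_squares(A, b, r)
print("x  =", x)
print("Ax =", [sum(a * xi for a, xi in zip(row, x)) for row in A])
```

Any direction v with Av = 0, such as v = (1, 1, -2) here, should not decrease $\sum_i r_i x_i^2$, which is a quick check that the closed form really is the constrained minimizer.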

SLIDE 13

Benchmark: Optimization on Graphs

Maximum flow: $\min \|x\|_\infty$ subject to $Ax = b$
➜ minimize congestion of flow x
➜ boundary condition: x routes demand from s to t

[Figure: graph from s to t; the flow shown carries .5 on each labeled edge]
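To make the boundary condition concrete, the sketch below assumes A is the node-edge incidence matrix (+1 at an edge's tail, -1 at its head) and b puts +1 at s and -1 at t, so Ax = b means x routes one unit from s to t and $\|x\|_\infty$ is its congestion. The seven-edge graph (paths s-a-b-t and s-c-d-t plus a cross edge a-d) is a hypothetical stand-in for the figure, chosen because the flow values in the walkthrough slides can be reproduced on it; the unit demand is likewise an assumption.

```python
# Hypothetical graph standing in for the figure.
NODES = ["s", "a", "b", "c", "d", "t"]
EDGES = [("s", "a"), ("a", "b"), ("a", "d"), ("b", "t"),
         ("s", "c"), ("c", "d"), ("d", "t")]


def incidence_matrix(nodes, edges):
    """Node-edge incidence matrix: +1 at the tail of each edge, -1 at its head."""
    A = [[0.0] * len(edges) for _ in nodes]
    for e, (u, v) in enumerate(edges):
        A[nodes.index(u)][e] = 1.0
        A[nodes.index(v)][e] = -1.0
    return A


def routes_demand(A, x, b, tol=1e-9):
    """Boundary condition A x = b: net outflow at every node equals its demand."""
    return all(abs(sum(a * xe for a, xe in zip(row, x)) - bv) < tol
               for row, bv in zip(A, b))


A = incidence_matrix(NODES, EDGES)
b = [1.0, 0.0, 0.0, 0.0, 0.0, -1.0]           # one unit out of s, one unit into t

split = [0.5, 0.5, 0.0, 0.5, 0.5, 0.5, 0.5]   # half along each outer path
single = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # everything along s-a-b-t
for name, x in [("split", split), ("single", single)]:
    congestion = max(abs(xe) for xe in x)     # ||x||_inf
    print(name, routes_demand(A, x, b), congestion)
```

Both flows satisfy Ax = b, but splitting evenly halves the congestion (.5 versus 1.0), in line with the .5 values in the figure.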

SLIDE 17

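Benchmark: Optimization on Graphs

Minimum cost flow: $\min \|x\|_1$ subject to $Ax = b$
➜ minimize cost of flow x
➜ boundary condition: x routes demand from +1 to -1

[Figure: graph with demands +1, +1 at two nodes and -1, -1 at two others; the flow shown carries 1 on each of four edges]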
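For the ℓ1 version, a sketch assuming unit cost per edge, so the cost of a flow is $\|x\|_1 = \sum_i |x_i|$. The five-node graph (sources p, q with demand +1, sinks u, w with demand -1, and a hub h) is hypothetical, not the graph in the figure.

```python
# Hypothetical graph: sources p, q have demand +1; sinks u, w have demand -1.
NODES = ["p", "q", "h", "u", "w"]
EDGES = [("p", "u"), ("q", "w"), ("p", "h"), ("q", "h"), ("h", "u"), ("h", "w")]
b = [1.0, 1.0, 0.0, -1.0, -1.0]


def net_outflow(x):
    """Compute A x for the incidence matrix of EDGES (+1 at tail, -1 at head)."""
    out = [0.0] * len(NODES)
    for (tail, head), xe in zip(EDGES, x):
        out[NODES.index(tail)] += xe
        out[NODES.index(head)] -= xe
    return out


def cost(x):
    """Unit cost per edge, so the cost of a flow is ||x||_1."""
    return sum(abs(xe) for xe in x)


direct = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # p -> u and q -> w
via_hub = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # both units detour through h
for name, x in [("direct", direct), ("via_hub", via_hub)]:
    print(name, net_outflow(x) == b, cost(x))
```

Both routings meet the boundary condition; minimizing $\|x\|_1$ prefers the direct one (cost 2 versus 4).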

SLIDE 22

Benchmark: Optimization on Graphs

Max flow: $\min \|x\|_\infty$ subject to $Ax = b$; min cost flow: $\min \|x\|_1$ subject to $Ax = b$ (m edges, n vertices)

Q: Are these problems really that hard?

First-order methods (gradient descent)
➜ running time strongly depends on matrix structure
➜ in general, takes time at least $\Omega(m^{1.5}/\mathrm{poly}(\varepsilon))$

Second-order methods (Newton's method, IRLS)
➜ interior point method: $\tilde{O}(m^{1/2})$ linear system solves
➜ can be made $\tilde{O}(n^{1/2})$ with a lot of work [Lee-Sidford ’14]

“Hybrid” method
➜ [Christiano-Kelner-Madry-Spielman-Teng, STOC ’11] $\tilde{O}(m^{1/3}/\varepsilon^{11/3})$ linear system solves
➜ ~30 pages of description and proofs for a complicated method

SLIDE 24

This work

Natural IRLS method runs in $\tilde{O}(m^{1/3}/\varepsilon^{2/3} + 1/\varepsilon^2)$ iterations*

* no matter what the structure of the underlying matrix is
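A rough way to compare the bounds quoted on these slides: the sketch evaluates only their leading terms, dropping constants and the polylogarithmic factors hidden by $\tilde{O}$, so the numbers indicate scaling rather than actual iteration counts. The function name and the chosen (m, ε) values are illustrative.

```python
def iteration_bounds(m, eps):
    """Leading terms only (constants and polylog factors dropped, an illustrative simplification)."""
    return {
        "ipm": m ** 0.5,                                        # interior point, m^(1/2)
        "ckmst": m ** (1 / 3) / eps ** (11 / 3),                # hybrid method, m^(1/3)/eps^(11/3)
        "irls": m ** (1 / 3) / eps ** (2 / 3) + 1 / eps ** 2,   # this work
    }


for m in (10**6, 10**9):
    for eps in (0.1, 0.01):
        print(m, eps, {k: round(v) for k, v in iteration_bounds(m, eps).items()})
```

With these leading terms, at ε = .1 the IRLS count is well below $m^{1/2}$ for both sizes, while at ε = .01 the $1/\varepsilon^2$ term puts it above $m^{1/2}$ for m = 10^6 and roughly level with it for m = 10^9.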

SLIDE 28

$\min \|x\|_\infty$ subject to $Ax = b$ (flow from s to t), aiming for $\|x\|_\infty \le \mathrm{OPT}$
➜ Guess OPT value (.5)
➜ Initialize r = 1

[Figure: s-t graph; edge weights r = 1, 1, 1, 1, 1, 1]

SLIDE 29

➜ Solve least squares problem: $\min \sum_i r_i x_i^2$ subject to $Ax = b$

[Figure: with r = 1 on every edge, the least squares flow has edge values .6, .4, .2, .4, .4, .4, .6]
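On a graph, the least squares step is an electrical-flow computation: with resistances r, $x = R^{-1}A^\top(AR^{-1}A^\top)^{-1}b$. The sketch below assumes the walkthrough graph is the seven-edge bridge graph (paths s-a-b-t and s-c-d-t plus a cross edge a-d) carrying one unit from s to t; under that assumption, which was chosen because it matches the figure's numbers, unit weights give exactly the flows .6, .4, .2, .4, .4, .4, .6. Row t of A is dropped so the system has full rank; the edge order and helper names are hypothetical.

```python
def solve(M, y):
    """Gaussian elimination with partial pivoting for the square system M z = y."""
    n = len(M)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


# Assumed walkthrough graph (hypothetical): paths s-a-b-t and s-c-d-t plus cross edge a-d.
NODES = ["s", "a", "b", "c", "d", "t"]
EDGES = [("s", "a"), ("a", "b"), ("a", "d"), ("b", "t"),
         ("s", "c"), ("c", "d"), ("d", "t")]


def least_squares_flow(r, demand=1.0):
    """argmin sum_i r_i x_i^2 s.t. A x = b: the electrical s-t flow with resistances r."""
    # The rows of an incidence matrix sum to zero, so dropping t's row loses no
    # constraint and leaves full row rank (the graph is connected).
    rows = [v for v in NODES if v != "t"]
    A = [[(1.0 if u == v else (-1.0 if w == v else 0.0)) for (u, w) in EDGES]
         for v in rows]
    b = [demand if v == "s" else 0.0 for v in rows]
    n, m = len(rows), len(EDGES)
    M = [[sum(A[i][e] * A[j][e] / r[e] for e in range(m)) for j in range(n)]
         for i in range(n)]
    y = solve(M, b)                       # y acts as node potentials
    return [sum(A[i][e] * y[i] for i in range(n)) / r[e] for e in range(m)]


x = least_squares_flow([1.0] * len(EDGES))      # initialize r = 1
print([round(xe, 2) for xe in x])  # [0.6, 0.4, 0.2, 0.4, 0.4, 0.4, 0.6]
```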

SLIDE 30

➜ Update r: $r_i \leftarrow r_i \cdot \max\{(x_i/\mathrm{OPT})^2, 1\}$

[Figure: flows .6, .4, .2, .4, .4, .4, .6; updated edge weights r = 1, 1, 1.44, 1, 1, 1.44]
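The reweighting step is elementwise arithmetic. Here is a sketch with the slide's numbers (OPT = .5, flows listed in the order they appear in the figure residue; `update_weights` is a hypothetical name): only the two edges above OPT are reweighted, to (.6/.5)^2 = 1.44.

```python
def update_weights(r, x, opt):
    """Reweighting rule from the slide: r_i <- r_i * max{(x_i / OPT)^2, 1}.

    Edges with |x_i| <= OPT keep their weight; overloaded edges get heavier."""
    return [ri * max((xi / opt) ** 2, 1.0) for ri, xi in zip(r, x)]


OPT = 0.5                                   # guessed OPT value
x = [0.6, 0.4, 0.2, 0.4, 0.4, 0.4, 0.6]     # flows after the first least squares solve
r = update_weights([1.0] * len(x), x, OPT)  # start from r = 1
print([round(ri, 2) for ri in r])  # [1.44, 1.0, 1.0, 1.0, 1.0, 1.0, 1.44]
```

Because of the max, weights never decrease. Applying the rule once more with the rounded flow .55 gives 1.44 × (.55/.5)^2 ≈ 1.74, close to the 1.75 on slide 34; the small gap is consistent with the slide's flows being rounded.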

SLIDE 33

➜ Solve least squares problem again with the updated r

[Figure: edge weights r = 1, 1, 1.44, 1, 1, 1.44; new flows .55, .44, .11, .44, .44, .44, .55]

SLIDE 34

➜ Update r again

[Figure: flows .55, .44, .11, .44, .44, .44, .55; updated edge weights r = 1, 1, 1.75, 1, 1, 1.75]

SLIDE 38

[Figure: final state; edge weights r = 1 on four edges and 2 on two edges; flow .5 on each of six edges, matching the guessed OPT (.5)]
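Putting the two steps together, here is a sketch of the whole loop on the same assumed seven-edge graph (a hypothetical reconstruction of the figure, unit demand from s to t, fixed OPT = .5, and a fixed iteration count instead of a stopping rule). It reproduces the trajectory in the slides: the two outer edges' weights go from 1 to 1.44 to about 1.75 and converge to 2, while the six outer edges settle at flow .5 and the cross edge empties.

```python
def solve(M, y):
    """Gaussian elimination with partial pivoting for the square system M z = y."""
    n = len(M)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


# Assumed walkthrough graph (hypothetical reconstruction of the figure).
NODES = ["s", "a", "b", "c", "d", "t"]
EDGES = [("s", "a"), ("a", "b"), ("a", "d"), ("b", "t"),
         ("s", "c"), ("c", "d"), ("d", "t")]


def least_squares_flow(r):
    """Electrical flow of one unit from s to t: argmin sum r_i x_i^2 s.t. A x = b."""
    rows = [v for v in NODES if v != "t"]          # ground t: drop its redundant row
    A = [[(1.0 if u == v else (-1.0 if w == v else 0.0)) for (u, w) in EDGES]
         for v in rows]
    b = [1.0 if v == "s" else 0.0 for v in rows]
    n, m = len(rows), len(EDGES)
    M = [[sum(A[i][e] * A[j][e] / r[e] for e in range(m)) for j in range(n)]
         for i in range(n)]
    y = solve(M, b)
    return [sum(A[i][e] * y[i] for i in range(n)) / r[e] for e in range(m)]


def irls_linf(opt, iters=60):
    """Hypothetical driver: least squares solve, then reweight, for a fixed number of rounds."""
    r = [1.0] * len(EDGES)                         # initialize r = 1
    history = []                                   # (flow, weights used for that flow)
    for _ in range(iters):
        x = least_squares_flow(r)
        history.append((x, r))
        r = [ri * max((xi / opt) ** 2, 1.0) for ri, xi in zip(r, x)]
    return x, r, history


x, r, history = irls_linf(0.5)                     # guessed OPT value .5
for k, (xk, rk) in enumerate(history[:3]):
    print(k, [round(v, 2) for v in rk], [round(v, 2) for v in xk])
print("final r:", [round(v, 3) for v in r])
print("final x:", [round(v, 3) for v in x])
```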

SLIDE 43

Nonstandard Optimization Primitive

➜ Objective function is $\max_{r \ge 0} \min_{Ax = b} \sum_i r_i x_i^2 / \sum_i r_i$
➜ Similar analysis to packing/covering LP [Young ’01]
➜ ℓ1 version is a type of “slime mold dynamics” [Straszak-Vishnoi ’16, ’17]
➜ Any insights for new optimization methods?
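A sketch of why this max-min objective certifies optimality for the ℓ∞ problem: for any fixed weights r > 0, $\min_{Ax=b} \sum_i r_i x_i^2 / \sum_i r_i$ is at most $\mathrm{OPT}^2$, since an optimal x with $|x_i| \le \mathrm{OPT}$ is feasible for the inner problem. The code evaluates this lower bound on an assumed seven-edge graph (a hypothetical reconstruction of the walkthrough figure, OPT = .5, so $\mathrm{OPT}^2$ = .25) for three illustrative weight vectors: uniform, the end state of the walkthrough, and weight concentrated on the two edges leaving s. Tiny positive weights stand in for zeros because the closed form divides by $r_i$; the graph and helper names are assumptions.

```python
def solve(M, y):
    """Gaussian elimination with partial pivoting for the square system M z = y."""
    n = len(M)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


# Assumed graph (hypothetical): paths s-a-b-t and s-c-d-t plus cross edge a-d.
NODES = ["s", "a", "b", "c", "d", "t"]
EDGES = [("s", "a"), ("a", "b"), ("a", "d"), ("b", "t"),
         ("s", "c"), ("c", "d"), ("d", "t")]


def least_squares_flow(r):
    """Electrical flow of one unit from s to t with resistances r (t's row dropped)."""
    rows = [v for v in NODES if v != "t"]
    A = [[(1.0 if u == v else (-1.0 if w == v else 0.0)) for (u, w) in EDGES]
         for v in rows]
    b = [1.0 if v == "s" else 0.0 for v in rows]
    n, m = len(rows), len(EDGES)
    M = [[sum(A[i][e] * A[j][e] / r[e] for e in range(m)) for j in range(n)]
         for i in range(n)]
    y = solve(M, b)
    return [sum(A[i][e] * y[i] for i in range(n)) / r[e] for e in range(m)]


def dual_value(r):
    """Inner value min_{Ax=b} sum_i r_i x_i^2 / sum_i r_i for fixed weights r > 0."""
    x = least_squares_flow(r)
    return sum(ri * xi * xi for ri, xi in zip(r, x)) / sum(r)


TINY = 1e-6                                         # stands in for zero weight
uniform = [1.0] * 7
irls_end = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]      # end state of the walkthrough
s_cut = [1.0, TINY, TINY, TINY, 1.0, TINY, TINY]    # weight on edges s-a and s-c
for name, w in [("uniform", uniform), ("irls_end", irls_end), ("s_cut", s_cut)]:
    print(name, round(dual_value(w), 4))            # each is a lower bound on OPT^2
```

Uniform weights certify .2 and the walkthrough's end-state weights 2/9 ≈ .222, while concentrating weight on the two edges leaving s pushes the bound to about .25, which equals $\mathrm{OPT}^2$.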
