SLIDE 1

Improved Convergence for ℓ∞ and ℓ1 Regression via Iteratively Reweighted Least Squares
Alina Ene, Adrian Vladu

SLIDE 2

IRLS Method

Basic primitive: min ∑ rᵢxᵢ²  s.t.  Ax = b

SLIDE 3

IRLS Method

Basic primitive: min ∑ rᵢxᵢ²  s.t.  Ax = b

Solution given by one linear system solve:

x = R⁻¹Aᵀ(AR⁻¹Aᵀ)⁻¹b, where R = diag(r)
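A minimal sketch of this primitive in Python/NumPy, assuming a dense constraint matrix A with full row rank (a real implementation would use a fast or structured solver for the system AR⁻¹Aᵀy = b):

import numpy as np

def weighted_least_squares(A, b, r):
    # Solve min sum_i r_i x_i^2 subject to Ax = b.
    # The optimality conditions give x = R^{-1} A^T (A R^{-1} A^T)^{-1} b
    # with R = diag(r), i.e. a single linear system solve in A R^{-1} A^T.
    Rinv_At = A.T / r[:, None]           # R^{-1} A^T: divide row i of A^T by r_i
    y = np.linalg.solve(A @ Rinv_At, b)  # solve (A R^{-1} A^T) y = b
    return Rinv_At @ y                   # x = R^{-1} A^T y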

SLIDE 4

IRLS Method

Basic primitive: min ∑ rᵢxᵢ²  s.t.  Ax = b, solved by x = R⁻¹Aᵀ(AR⁻¹Aᵀ)⁻¹b with R = diag(r)

"Hard" problem: min |x|ₚ  s.t.  Ax = b, for p ∈ {1, ∞}; equivalent to linear programming
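To make the LP equivalence concrete, here is a sketch of the ℓ∞ case on top of SciPy's generic LP solver: minimize an auxiliary variable t subject to −t ≤ xᵢ ≤ t and Ax = b. The reduction is exact, but solving the LP generically is precisely the expensive route this talk tries to avoid.

import numpy as np
from scipy.optimize import linprog

def linf_regression_lp(A, b):
    # min |x|_inf s.t. Ax = b, as the LP: min t s.t. Ax = b, -t <= x_i <= t.
    n, m = A.shape
    c = np.concatenate([np.zeros(m), [1.0]])   # objective: minimize t
    # Inequalities x_i - t <= 0 and -x_i - t <= 0.
    A_ub = np.block([[ np.eye(m), -np.ones((m, 1))],
                     [-np.eye(m), -np.ones((m, 1))]])
    b_ub = np.zeros(2 * m)
    A_eq = np.hstack([A, np.zeros((n, 1))])    # equality constraints Ax = b
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * m + [(0, None)])  # x free, t >= 0
    return res.x[:m], res.x[-1]                # optimal x and its |x|_inf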


SLIDE 10

Benchmark: Optimization on Graphs

min |x|∞  s.t.  Ax = b: minimize the congestion of a flow x, with the boundary condition that x routes the demand from s to t. This is Maximum Flow.

[Figure: s–t network; the optimal flow puts .5 units on each edge]
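As a concrete instance, a sketch in Python/NumPy of how such a flow problem becomes min |x|∞ s.t. Ax = b; the small graph here is a made-up example, not the network from the slides.

import numpy as np

# Hypothetical 4-node network with source s = 0 and sink t = 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)

# Vertex-edge incidence matrix: A[v, e] = +1 if edge e leaves v, -1 if it enters v.
A = np.zeros((n, m))
for e, (u, v) in enumerate(edges):
    A[u, e], A[v, e] = 1.0, -1.0

# Demand vector: route one unit of flow from s to t.
b = np.zeros(n)
b[0], b[3] = 1.0, -1.0

# Any x with Ax = b routes the demand; min |x|_inf is the least-congested
# routing, and 1/OPT is the maximum s-t flow under unit capacities.
# (Drop one redundant row of A and b before using the least squares primitive
# above, since an incidence matrix of a connected graph has rank n - 1.)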


SLIDE 14

Benchmark: Optimization on Graphs

min |x|1  s.t.  Ax = b: minimize the cost of a flow x, with the boundary condition that x routes the demand from the +1 nodes to the −1 nodes. This is Minimum Cost Flow.

[Figure: network with demands +1 and −1 at the terminals and unit costs on the edges]
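The ℓ1 analogue of the LP reduction sketched earlier: split x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0, so |x|₁ becomes a linear objective. Again a sketch on top of SciPy's generic solver, not the method of this talk.

import numpy as np
from scipy.optimize import linprog

def l1_regression_lp(A, b):
    # min |x|_1 s.t. Ax = b, via the split x = xp - xn with xp, xn >= 0:
    #   min 1'xp + 1'xn  s.t.  A xp - A xn = b.
    m = A.shape[1]
    res = linprog(np.ones(2 * m), A_eq=np.hstack([A, -A]), b_eq=b)
    return res.x[:m] - res.x[m:]   # linprog's default bounds are already >= 0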


SLIDE 18

Benchmark: Optimization on Graphs

min |x|∞  s.t.  Ax = b  ↔  max flow
min |x|1  s.t.  Ax = b  ↔  min cost flow

Q: Are these problems really that hard?

First order methods (gradient descent)
➜ running time strongly depends on matrix structure
➜ in general, takes time at least Ω(m^1.5/poly(ε))

Second order methods (Newton's method, IRLS)
➜ interior point method: Õ(m^1/2) linear system solves
➜ can be made Õ(n^1/2) with a lot of work [Lee-Sidford '14]

"Hybrid" method
➜ [Christiano-Kelner-Madry-Spielman-Teng, STOC '11]: Õ(m^1/3/ε^11/3) linear system solves
➜ ~30 pages of description and proofs for a complicated method


SLIDE 23

This work

Natural IRLS method runs in Õ(m^1/3/ε^2/3 + 1/ε^2) iterations, no matter what the structure of the underlying matrix is.


SLIDE 25

This work

Example: min |x|∞  s.t.  Ax = b on an s–t flow network, targeting |x|∞ ≤ OPT.

Guess OPT value (here .5)
Initialize r = 1
Solve least squares problem: min ∑ rᵢxᵢ²  s.t.  Ax = b
Update r: rᵢ ← rᵢ · max{(xᵢ/OPT)², 1}
Repeat the solve and update steps

[Figure: on the example network, the least squares flows evolve from (.6, .4, .2, .4, .4, .4, .6) to (.55, .44, .11, .44, .44, .44, .55) and converge to the optimal flow (.5, .5, .5, .5, .5, .5), while the weights of the congested edges grow from 1 to 1.44, then 1.75, and finally 2]
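A minimal, self-contained sketch of this loop in Python/NumPy; the fixed iteration count and the assumption that A has full row rank are simplifications, and in practice the guessed OPT value would come from an outer search.

import numpy as np

def irls_linf(A, b, opt_guess, num_iters=100):
    # IRLS for min |x|_inf s.t. Ax = b, given a guess of the optimal value.
    m = A.shape[1]
    r = np.ones(m)                               # initialize r = 1
    for _ in range(num_iters):
        # Solve the least squares problem min sum_i r_i x_i^2 s.t. Ax = b.
        Rinv_At = A.T / r[:, None]               # R^{-1} A^T with R = diag(r)
        x = Rinv_At @ np.linalg.solve(A @ Rinv_At, b)
        # Update r: boost the weight of every coordinate exceeding the guess.
        r *= np.maximum((x / opt_guess) ** 2, 1.0)
    return x

Coordinates with |xᵢ| > OPT have their weight multiplied by more than 1, making them expensive in the next solve; coordinates already within the guess keep their weight. This is what pushes the congested edges in the example toward .5.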


SLIDE 39

Nonstandard Optimization Primitive

➜ Objective function is max_{r≥0} min_{Ax=b} ∑ rᵢxᵢ² / ∑ rᵢ
➜ Similar analysis to packing/covering LPs [Young '01]
➜ ℓ1 version is a type of "slime mold dynamics" [Straszak-Vishnoi '16, '17]
➜ Any insights for new optimization methods?
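To make the saddle point concrete, a small sketch evaluating the inner minimum for fixed weights r, using the same least squares primitive as before. Maximizing this normalized energy over r ≥ 0 recovers min |x|∞² by minimax duality, and the IRLS weight update can be read as an ascent step on r; that reading is an interpretation, not a statement from the slides.

import numpy as np

def normalized_energy(A, b, r):
    # Inner problem: min_{Ax=b} sum_i r_i x_i^2, normalized by sum_i r_i.
    Rinv_At = A.T / r[:, None]                   # R^{-1} A^T with R = diag(r)
    x = Rinv_At @ np.linalg.solve(A @ Rinv_At, b)
    return (r * x ** 2).sum() / r.sum()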


SLIDE 44

Thank You! More details at poster #208