Recent advances on the acceleration of first-order methods in convex optimization

Juan PEYPOUQUET, Universidad Técnica Federico Santa María

Second Workshop on Algorithms and Dynamics for Games and Optimization Santiago, January 25, 2016


Content

• Basic first-order descent methods
• Nesterov's acceleration
• Dynamic interpretation
• Damped Inertial Gradient System (DIGS)
• Properties of DIGS trajectories and accelerated algorithms
• A first-order variant bearing second-order information in time and space


BASIC DESCENT METHODS


Basic (first-order) descent methods

Steepest descent dynamics:

\dot{x}(t) = -\nabla\varphi(x(t)), \quad x(0) = x_0.

[Figure: a trajectory x(t) starting at x_0, following -\nabla\varphi(x_0), approaching the solution set S.]

Along a trajectory, the function value decreases:

\frac{d}{dt}\varphi(x(t)) = \langle \nabla\varphi(x(t)), \dot{x}(t)\rangle = -\|\nabla\varphi(x(t))\|^2 = -\|\dot{x}(t)\|^2.


Explicit discretization → gradient method (Cauchy 1847):

\frac{x_{k+1} - x_k}{\lambda} = -\nabla\varphi(x_k) \iff x_{k+1} = x_k - \lambda\nabla\varphi(x_k).

Implicit discretization → proximal method (Martinet 1970):

\frac{z_{k+1} - z_k}{\lambda} = -\nabla\varphi(z_{k+1}) \iff z_{k+1} + \lambda\nabla\varphi(z_{k+1}) = z_k.
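As an illustration (not part of the slides), the two discretizations can be compared numerically on a simple quadratic \varphi(x) = \tfrac{1}{2}x^\top A x, for which the implicit (proximal) equation has a closed-form solution; the matrix A and step size below are illustrative choices:

```python
import numpy as np

# Explicit gradient step vs. implicit proximal step for phi(x) = 0.5 x^T A x,
# where grad(phi)(x) = A x. The implicit equation z_{k+1} + lam*A z_{k+1} = z_k
# can be solved exactly as z_{k+1} = (I + lam*A)^{-1} z_k.
A = np.array([[1.0, 0.0], [0.0, 1000.0]])  # ill-conditioned quadratic (illustrative)
lam = 1e-3  # step size (illustrative)

def gradient_step(x):
    # x_{k+1} = x_k - lam * grad(phi)(x_k)
    return x - lam * (A @ x)

def proximal_step(z):
    # z_{k+1} solves z_{k+1} + lam * grad(phi)(z_{k+1}) = z_k
    return np.linalg.solve(np.eye(2) + lam * A, z)

x = np.array([1.0, 1.0])
z = np.array([1.0, 1.0])
for _ in range(1000):
    x, z = gradient_step(x), proximal_step(z)
print(np.linalg.norm(x), np.linalg.norm(z))  # both shrink toward the minimizer 0
```

Note that the gradient step is cheap but only stable for lam below a threshold set by the largest eigenvalue of A, while the proximal step requires a linear solve but remains stable for any lam > 0, mirroring the pros and cons discussed below.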


[Figure: starting from z_k = x_k, the gradient step x_{k+1} = x_k - \lambda\nabla\varphi(x_k) and the proximal step z_{k+1} + \lambda\nabla\varphi(z_{k+1}) = z_k both move toward the solution set S.]


Pros and cons

Gradient method
+ Lower computational cost per iteration (explicit formula), easy implementation
− Convergence depends strongly on the regularity of the function (typically \varphi \in C^{1,1}) and on the step sizes

Proximal point algorithm
+ More stability, convergence certificate for a larger class of functions (\nabla\varphi \to \partial\varphi), independent of the step size
− Higher computational cost per iteration (implicit formula), often requires inexact computation

Combining smooth and nonsmooth functions

Problem: \min\{\Phi(x) := F(x) + G(x) : x \in H\}, where F is not smooth but G is.

Forward-Backward method (x_k \to x_{k+1/2} \to x_{k+1}):

x_{k+1} + \lambda\partial F(x_{k+1}) \ni x_{k+1/2} = x_k - \lambda\nabla G(x_k),

that is, x_{k+1} = \mathrm{Prox}_{\lambda F} \circ \mathrm{Grad}_{\lambda G}(x_k).
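A minimal sketch of the forward-backward method for the "\ell^1 + \ell^2" case introduced below, \Phi(x) = \mu\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2 (this is ISTA): the forward step is a gradient step on G(x) = \tfrac{1}{2}\|Ax - b\|^2, and the backward step is the proximal operator of F(x) = \mu\|x\|_1, i.e. soft thresholding. The data A, b and the parameter mu are random illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
mu = 0.1
lam = 1.0 / np.linalg.norm(A, 2) ** 2  # step size <= 1/L, with L = ||A||^2

def soft_threshold(v, t):
    # Prox of t*||.||_1: shrink each coordinate toward 0 by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def Phi(x):
    return mu * np.abs(x).sum() + 0.5 * np.linalg.norm(A @ x - b) ** 2

x = np.zeros(50)
vals = [Phi(x)]
for _ in range(200):
    x_half = x - lam * A.T @ (A @ x - b)  # forward (gradient) step on G
    x = soft_threshold(x_half, lam * mu)  # backward (proximal) step on F
    vals.append(Phi(x))
print(vals[0], vals[-1])  # the objective decreases along the iterations
```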


• Gradient projection: Goldstein 1964, Levitin-Polyak 1966, with F = \delta_C
• General setting: Lions-Mercier 1979, Passty 1979
• Iterative Shrinkage-Thresholding Algorithm (ISTA): Daubechies-Defrise-DeMol 2004, Combettes-Wajs 2005, for "\ell^1 + \ell^2" minimization:

\Phi(x) = F(x) + G(x) = \mu\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2.


Convergence of the forward-backward method

Theorem. Let \Phi = F + G, where F is closed and convex, and G is convex with \nabla G L-Lipschitz. Assume \Phi has minimizers, and let (x_k) be obtained by the FB method with \lambda \le 1/L. Then:

• As k \to \infty, (x_k) converges (weakly, in general) to a minimizer of \Phi; and
• \Phi(x_k) - \min\Phi = O(k^{-1}): there is C > 0 such that \Phi(x_k) - \min\Phi \le \frac{C}{k}.


Convergence of ISTA

Let \Phi : \mathbb{R}^N \to \mathbb{R} be defined by \Phi(x) = \|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2. Local linear convergence results have been found recently, as well as theoretical convergence rates.

Theorem (Bolte-Nguyen-Peypouquet-Suter 2015). Let (x_k) be obtained by the FB method with step size \lambda. Then, there is an explicit constant d such that

\Phi(x_k) - \min\Phi \le \frac{\Phi(x_0) - \min\Phi}{(1 + d\lambda)^{2k}}.


NESTEROV’S ACCELERATION


Acceleration

The main idea is the following. Instead of stepping directly from one iterate to the next (x_{k-1} \to x_k \to x_{k+1}), better try: extrapolate from x_k along the previous displacement to an auxiliary point y_k, and take the descent step from y_k to obtain x_{k+1}; then y_{k+1}, x_{k+2}, and so on.

[Figure: iterates x_{k-1}, x_k, extrapolated points y_k, y_{k+1}, and iterates x_{k+1}, x_{k+2}, approaching the solution set S.]


Some remarks

• Convergence and its rate are sensitive to the choice of y_k.
• This simple procedure (Nesterov 1983) can take the theoretical worst-case convergence rate for the values from the typical O(1/k) down to O(1/k^2).
• No convergence proof for the iterates x_k.
• Current common practice is y_k = x_k + \left(1 - \frac{3}{k}\right)(x_k - x_{k-1}).
• Keynote example in image processing: FISTA (Beck-Teboulle 2009)
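A sketch of the accelerated scheme under the common-practice choice y_k = x_k + (1 - 3/k)(x_k - x_{k-1}), applied to the same "\ell^1 + \ell^2" problem as before; the data A, b, mu and the iteration budget are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 80))
b = rng.standard_normal(30)
mu = 0.1
lam = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def Phi(x):
    return mu * np.abs(x).sum() + 0.5 * np.linalg.norm(A @ x - b) ** 2

def run(accelerated, iters=300):
    x_prev = x = np.zeros(80)
    for k in range(1, iters + 1):
        # plain ISTA uses y = x; the accelerated variant extrapolates first
        y = x + (1 - 3 / k) * (x - x_prev) if accelerated else x
        x_prev, x = x, prox_l1(y - lam * A.T @ (A @ y - b), lam * mu)
    return Phi(x)

print(run(False), run(True))  # final objective values: ISTA vs. accelerated
```

On typical random instances the accelerated run reaches a visibly lower objective value within the same iteration budget, consistent with the O(1/k^2) versus O(1/k) rates above.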


ISTA & FISTA

General case:
• FB: values O(k^{-1}), convergent sequence.
• AFB: values O(k^{-2}).

\ell^1 + \ell^2 minimization:
• ISTA: values O(Q^k), convergent sequence (proved).
• FISTA: values O(\tilde{Q}^k) (observed, not proved), always strictly faster than ISTA, convergent sequence (observed, not proved).


Long-standing questions

• Is \Phi(x_k) - \min\Phi = O(\tilde{Q}^k) true for FISTA (\ell^1 + \ell^2)?
• Is AFB always strictly faster than FB? What about FISTA and ISTA?
• Is \Phi(x_k) - \min\Phi = O(k^{-2}) optimal for AFB (in general)?
• Are AFB sequences convergent? What about FISTA?


DYNAMIC INTERPRETATION


Discretization of DIGS

A finite-difference discretization of (DIGS)

\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \partial F(x(t)) + \nabla G(x(t)) \ni 0

gives

\frac{1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\alpha}{kh^2}(x_k - x_{k-1}) + \partial F(x_{k+1}) + \nabla G(y_k) \ni 0,

where y_k (specified later) is related to the segment [x_{k-1}, x_k].


Rewriting

\frac{1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\alpha}{kh^2}(x_k - x_{k-1}) + \partial F(x_{k+1}) + \nabla G(y_k) \ni 0

with \lambda = h^2, we obtain

x_{k+1} + \lambda\partial F(x_{k+1}) \ni x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1}) - \lambda\nabla G(y_k).

Thus, if we set y_k = x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1}), we obtain

x_{k+1} + \lambda\partial F(x_{k+1}) \ni y_k - \lambda\nabla G(y_k).


Therefore, a finite-difference discretization of

\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \partial F(x(t)) + \nabla G(x(t)) \ni 0

naturally yields

y_k = x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1}), \qquad x_{k+1} = \mathrm{Prox}_{\lambda F} \circ \mathrm{Grad}_{\lambda G}(y_k).

Construction due to Su-Boyd-Candès 2014.
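The discretized scheme above can be sketched directly with \alpha left as a parameter (\alpha = 3 recovers the common practice mentioned earlier). Here F = \mu\|\cdot\|_1 so \mathrm{Prox}_{\lambda F} is soft thresholding, and G = \tfrac{1}{2}\|Ax - b\|^2; the data are random illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((25, 60))
b = rng.standard_normal(25)
mu = 0.1
lam = 1.0 / np.linalg.norm(A, 2) ** 2  # lam = h^2 plays the role of the step size

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def Phi(x):
    return mu * np.abs(x).sum() + 0.5 * np.linalg.norm(A @ x - b) ** 2

def inertial_fb(alpha, iters=300):
    x_prev = x = np.zeros(60)
    for k in range(1, iters + 1):
        y = x + (1 - alpha / k) * (x - x_prev)               # extrapolation
        x_prev, x = x, prox_l1(y - lam * A.T @ (A @ y - b),  # Prox_{lam F} o Grad_{lam G}
                               lam * mu)
    return Phi(x)

print(inertial_fb(3.0), inertial_fb(4.0))  # final values for two damping parameters
```

The choice \alpha > 3 is the regime in which, per the theorems below, the iterates themselves are guaranteed to converge.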


PROPERTIES OF DIGS TRAJECTORIES


Basic properties

Theorem (Attouch-Chbani-Peypouquet-Redont 2015). If \alpha > 0, then:

• \lim_{t\to+\infty} \Phi(x(t)) = \inf(\Phi) \in \mathbb{R} \cup \{-\infty\}.
• Every weak limit point of x(t), as t \to \infty, minimizes \Phi.
• Either \Phi has minimizers and all trajectories are bounded, or it does not and all trajectories diverge to +\infty in norm.
• If \Phi is bounded from below, then \lim_{t\to+\infty} \|\dot{x}(t)\| = 0.


Rate of convergence

Theorem (Su-Boyd-Candès 2014). If \alpha \ge 3 and \Phi has minimizers, then every solution satisfies

\Phi(x(t)) - \min(\Phi) \le \frac{C}{t^2},

where C depends on \alpha and the initial data.


The exponent 2 is sharp. More precisely, we have the following:

Theorem (ACPR). For each p > 2, there is \Phi such that \Phi has minimizers and every solution satisfies

\Phi(x(t)) - \min(\Phi) = \frac{C}{t^p}.


If \Phi is strongly convex, convergence is arbitrarily fast as \alpha grows.

Theorem (ACPR). Let \Phi be strongly convex and let x^* be its unique minimizer. Every solution satisfies

\Phi(x(t)) - \min(\Phi) \le \frac{C}{t^{2\alpha/3}} \quad\text{and}\quad \|x(t) - x^*\| \le \frac{D}{t^{\alpha/3}},

where C and D depend on \alpha, the strong convexity parameter, and the initial data.


Convergence of the solutions

Theorem (ACPR, May). If \alpha > 3 and \Phi has minimizers, then:

• x(t) converges weakly, as t \to +\infty, to a minimizer of \Phi. Convergence is strong if either \Phi is uniformly convex, \mathrm{int}(\mathrm{Argmin}(\Phi)) \ne \emptyset, or \Phi is even.
• \|\dot{x}(t)\| = o(t^{-1}).
• \Phi(x(t)) - \min(\Phi) = o(t^{-2}).


PROPERTIES OF ACCELERATED ALGORITHMS


Back to accelerated algorithms

Recall that

y_k = x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1}), \qquad x_{k+1} = \mathrm{Prox}_{\lambda F} \circ \mathrm{Grad}_{\lambda G}(y_k).

Theorem (ACPR). If \alpha > 0, then:

• \lim_{k\to+\infty} \Phi(x_k) = \inf(\Phi); and
• every weak limit point of x_k, as k \to +\infty, minimizes \Phi.


Theorem (ACPR). If \alpha \ge 3 and \Phi has minimizers, then \Phi(x_k) - \min\Phi = O(k^{-2}) and \|x_k - x_{k-1}\| = O(k^{-1}).


Theorem (ACPR, AP). If \alpha > 3 and \Phi has minimizers, then:

• x_k converges weakly, as k \to +\infty, to a minimizer of \Phi. Strong convergence holds if \Phi is even, uniformly convex, or if \mathrm{Argmin}(\Phi) has nonempty interior.
• \|x_k - x_{k-1}\| = o(k^{-1}).
• \Phi(x_k) - \min\Phi = o(k^{-2}).


slide-68
SLIDE 68


A simple example

We consider the function Φ(x1, x2) = ½(x1² + 1000 x2²). We show the behavior of a solution to

    ẍ(t) + (α/t) ẋ(t) + ∇Φ(x(t)) = 0

on the interval [1, 20] with α = 3.1.
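This trajectory is easy to reproduce numerically. The slides do not specify the initial data, so the initial state x(1) = (1, 1), ẋ(1) = 0 below is our own illustrative choice.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Phi(x1, x2) = 0.5*(x1^2 + 1000*x2^2), so grad Phi(x) = (x1, 1000*x2).
H = np.array([1.0, 1000.0])
alpha = 3.1

def digs(t, z):
    """(DIGS) x''(t) + (alpha/t) x'(t) + grad Phi(x(t)) = 0,
    rewritten first-order with z = (x, v), v = x'."""
    x, v = z[:2], z[2:]
    return np.concatenate([v, -(alpha / t) * v - H * x])

# Integrate on [1, 20] from the (assumed) initial state x(1)=(1,1), x'(1)=0.
sol = solve_ivp(digs, (1.0, 20.0), [1.0, 1.0, 0.0, 0.0],
                rtol=1e-8, atol=1e-10)
phi = lambda x: 0.5 * np.sum(H * x ** 2)
```

Plotting phi along `sol.t` reproduces the oscillatory decay of the function values shown on the next slides.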
slide-69
SLIDE 69


Function values

slide-70
SLIDE 70


Trajectory

slide-71
SLIDE 71


CAN WE DO BETTER?

slide-72
SLIDE 72


Idea: Newton / Levenberg-Marquardt

Pros: fast; compensates for the effect of ill-conditioning.
Cons: requires higher regularity (to compute and invert the Hessian); costly to implement.


slide-74
SLIDE 74


NDIGS

(NDIGS)    ẍ(t) + (α/t) ẋ(t) + β∇²Φ(x(t)) ẋ(t) + ∇Φ(x(t)) = 0.

Seems much more complicated, but:

Proposition (APR 2015). System (NDIGS) is equivalent to

    ẋ(t) + β∇Φ(x(t)) − (1/β − α/t) x(t) + (1/β) y(t) = 0
    ẏ(t) − (1/β − α/t + αβ/t²) x(t) + (1/β) y(t) = 0.
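The equivalence can be checked numerically on the quadratic test function used elsewhere in the talk. The initial state below is our own choice; y(1) is then read off from the first equation of the system so that both formulations start from the same data.

```python
import numpy as np
from scipy.integrate import solve_ivp

H = np.array([1.0, 1000.0])   # Hessian of Phi(x) = 0.5*(x1^2 + 1000*x2^2)
alpha, beta = 3.1, 1.0

def second_order(t, z):
    """(NDIGS): x'' + (alpha/t) x' + beta*Hess(Phi) x' + grad Phi(x) = 0."""
    x, v = z[:2], z[2:]
    return np.concatenate([v, -(alpha / t) * v - beta * H * v - H * x])

def first_order(t, z):
    """The equivalent (x, y) system from the proposition."""
    x, y = z[:2], z[2:]
    dx = -beta * H * x + (1 / beta - alpha / t) * x - y / beta
    dy = (1 / beta - alpha / t + alpha * beta / t ** 2) * x - y / beta
    return np.concatenate([dx, dy])

x0, v0 = np.array([1.0, 1.0]), np.zeros(2)
# y(1) is determined by the first equation evaluated at t = 1:
y0 = beta * ((1 / beta - alpha) * x0 - v0 - beta * H * x0)
kw = dict(rtol=1e-9, atol=1e-12, dense_output=True)
s2 = solve_ivp(second_order, (1.0, 20.0), np.concatenate([x0, v0]), **kw)
s1 = solve_ivp(first_order, (1.0, 20.0), np.concatenate([x0, y0]), **kw)
ts = np.linspace(1.0, 20.0, 200)
gap = np.max(np.abs(s1.sol(ts)[:2] - s2.sol(ts)[:2]))
```

Within integration tolerance, the x-components of the two systems coincide along the whole trajectory, as the proposition asserts.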


slide-77
SLIDE 77


Nonsmooth functions

Using the variable Z = (x, y), this reads

    Ż(t) + ∇G(Z(t)) + D(t, Z(t)) = 0,

where G(Z) = βΦ(x) and D is a regular linear perturbation. So, for nondifferentiable Φ, we can consider

(NDIGS′)    Ż(t) + ∂G(Z(t)) + D(t, Z(t)) ∋ 0.


slide-79
SLIDE 79


Convergence results

Theorem (APR). Let Φ be closed and convex, and let β > 0. All the conclusions obtained for the solutions of (DIGS) also hold for the solutions of (NDIGS′). Moreover, lim_{t→∞} ∥∇Φ(x(t))∥ = 0, and if ∇Φ is locally Lipschitz-continuous, then lim_{t→∞} ∥ẍ(t)∥ = 0.


slide-82
SLIDE 82


A simple example

We consider the function Φ(x1, x2) = ½(x1² + 1000 x2²). We show the behavior of a solution to

    ẍ(t) + (α/t) ẋ(t) + β∇²Φ(x(t)) ẋ(t) + ∇Φ(x(t)) = 0

on the interval [1, 20] with α = 3.1 and β = 1.
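One way to see what the Hessian-driven term buys is to integrate the quadratic example with and without it and compare averaged function values near the end of the interval. The initial state x(1) = (1, 1), ẋ(1) = 0 is again our own illustrative choice.

```python
import numpy as np
from scipy.integrate import solve_ivp

H = np.array([1.0, 1000.0])   # Hessian of Phi(x) = 0.5*(x1^2 + 1000*x2^2)
alpha, beta = 3.1, 1.0
phi = lambda x: 0.5 * np.sum(H * x ** 2)

def rhs(t, z, hessian_damping):
    """x'' + (alpha/t) x' [+ beta*Hess(Phi) x'] + grad Phi(x) = 0."""
    x, v = z[:2], z[2:]
    a = -(alpha / t) * v - H * x
    if hessian_damping:
        a = a - beta * H * v      # extra beta * Hess(Phi) x' damping term
    return np.concatenate([v, a])

z0 = [1.0, 1.0, 0.0, 0.0]         # assumed initial state x(1)=(1,1), x'(1)=0
kw = dict(rtol=1e-9, atol=1e-12, dense_output=True)
digs  = solve_ivp(rhs, (1.0, 20.0), z0, args=(False,), **kw)
ndigs = solve_ivp(rhs, (1.0, 20.0), z0, args=(True,),  **kw)
ts = np.linspace(19.0, 20.0, 200) # average over the tail of the interval
mean_digs  = np.mean([phi(digs.sol(t)[:2])  for t in ts])
mean_ndigs = np.mean([phi(ndigs.sol(t)[:2]) for t in ts])
```

The Hessian-driven damping suppresses the fast oscillations in the ill-conditioned direction, so the tail values with the extra term sit far below those of plain (DIGS).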
slide-83
SLIDE 83


Function values

slide-84
SLIDE 84


Trajectory

slide-85
SLIDE 85


Algorithmic implementation

Several discretizations are possible, giving different iterative algorithms.

Conjecture (work in progress). An appropriate discretization defines an algorithm with the same convergence properties as the continuous-time system (NDIGS′).
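To make the idea concrete, here is one plausible explicit finite-difference discretization of (NDIGS) for smooth Φ; this is purely our sketch, not the scheme the conjecture refers to. Replacing ẍ and ẋ by differences with step h, and the Hessian term ∇²Φ(x)ẋ by the gradient difference (∇Φ(xk) − ∇Φ(xk−1))/h, gives xk+1 = xk + (1 − α/k)(xk − xk−1) − βh(∇Φ(xk) − ∇Φ(xk−1)) − h²∇Φ(xk).

```python
import numpy as np

# Hypothetical test problem: Phi(x) = 0.5*(x1^2 + 1000*x2^2).
H = np.array([1.0, 1000.0])
grad = lambda x: H * x
phi = lambda x: 0.5 * np.sum(H * x ** 2)
alpha, beta, h = 3.1, 1.0, 1e-3    # h is an illustrative step size

x_prev = x = np.array([1.0, 1.0])
g_prev = grad(x_prev)
for k in range(1, 20001):          # t = k*h sweeps (0, 20]
    g = grad(x)
    # Explicit step: inertia, gradient-difference (Hessian) damping,
    # and a plain gradient term.
    x_next = (x + (1.0 - alpha / k) * (x - x_prev)
              - beta * h * (g - g_prev) - h ** 2 * g)
    x_prev, g_prev, x = x, g, x_next
```

On this quadratic the iterates decay to the minimizer without the large oscillations of the non-Hessian scheme, mirroring the continuous-time behavior; whether such a scheme inherits the full convergence theory is exactly the open question on this slide.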
