SLIDE 1

Inexact variable metric proximal gradient methods with line-search for convex and nonconvex optimization

Silvia Bonettini
Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università di Modena e Reggio Emilia
Optimization Algorithms and Software for Inverse problemS (www.oasis.unimore.it)

Computational Methods for Inverse Problems in Imaging, Como, 16-18 July 2018

SLIDE 2

Collaborators and main references

Joint works with:
  • Marco Prato, Università di Modena e Reggio Emilia
  • Federica Porta, Simone Rebegoldi, Valeria Ruggiero, Università di Ferrara
  • Ignace Loris, Université Libre de Bruxelles

Main references:
  • S. B., I. Loris, F. Porta, M. Prato (2016), Variable metric inexact line-search based methods for nonsmooth optimization, SIAM J. Optim., 26(2), 891-921.
  • S. B., F. Porta, V. Ruggiero (2016), A variable metric forward-backward method with extrapolation, SIAM J. Sci. Comput., 38(4), A2558-A2584.
  • S. B., I. Loris, F. Porta, M. Prato, S. Rebegoldi (2017), On the convergence of a line-search based proximal-gradient method for nonconvex optimization, Inverse Probl., 33(5), 055005.
  • S. B., S. Rebegoldi, V. Ruggiero (2018), Inertial variable metric techniques for the inexact forward-backward algorithm, submitted.

SLIDE 3

A general nonsmooth problem

Several optimization problems arising from the Bayesian approach to inverse problems have the following structure:

    min_{x ∈ R^n} f(x) ≡ f_0(x) + f_1(x),

where:
  • f_0(x) is continuously differentiable, possibly nonconvex (usually expressing some kind of data discrepancy);
  • f_1(x) is convex, possibly nondifferentiable (usually expressing regularization).

Goal: develop a numerical optimization algorithm producing a good approximation of the solution of the minimization problem in few, cheap iterations.

SLIDE 4

The class of proximal gradient methods

Proximal gradient methods, also known as forward-backward methods, exploit the smoothness of f_0 and the convexity of f_1 in the problem

    min_{x ∈ R^n} f(x) ≡ f_0(x) + f_1(x).

Definition (Proximal gradient method). Any first order method based on the following two operations:
  • Explicit Forward/Gradient step: computation of the gradient ∇f_0(x);
  • Implicit Backward/Proximal step: computation of the proximity (or resolvent) operator

    prox_{f_1}(z) = argmin_{x ∈ R^n} f_1(x) + (1/2)‖x − z‖².

Example: if Ω ⊂ R^n is a closed convex set, we can define the indicator function

    ι_Ω(x) = 0 if x ∈ Ω, +∞ otherwise

⇒ prox_{ι_Ω}(z) = Π_Ω(z), the orthogonal projection onto Ω.

NB: gradient projection methods are special instances of proximal gradient methods.
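For concreteness, here is a minimal Python sketch (an illustration, not from the slides) of two proximity operators that admit a closed form; the second shows that the prox of an indicator function is exactly the orthogonal projection:

```python
import numpy as np

def prox_l1(z, t):
    # prox of t*||.||_1: componentwise soft-thresholding (closed form)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_nonneg(z):
    # prox of the indicator of the nonnegative orthant {x >= 0}:
    # the orthogonal projection onto the constraint set
    return np.maximum(z, 0.0)
```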

SLIDE 5

A basic forward-backward scheme

    z^(k) = x^(k) − α_k ∇f_0(x^(k))      ← Forward step
    y^(k) = prox_{α_k f_1}(z^(k))        ← Backward step
    d^(k) = y^(k) − x^(k)
    x^(k+1) = x^(k) + λ_k d^(k)

NB: in standard convergence analysis, the steplength parameters α_k, λ_k ∈ R_{>0} are related to the Lipschitz constant L of ∇f_0(x) [Combettes-Wajs 2006], [Combettes, Vũ, 2014], requiring that α_k and/or λ_k ≤ C/L.

A motivating problem: nonnegative image restoration from Poisson data,

    min_{x ∈ R^n}  KL(Hx, g) + ρ‖∇x‖ + ι_{R^n_{≥0}}(x),

with f_0(x) = KL(Hx, g) and f_1(x) = ρ‖∇x‖ + ι_{R^n_{≥0}}(x), where

    KL(t, g) = Σ_{i=1}^n ( g_i log(g_i/t_i) + t_i − g_i ).

Here either ∇f_0 is not Lipschitz or L is very large, and prox_{f_1} is not available in closed form.
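A minimal sketch of this scheme, assuming hypothetical callables grad_f0(x) for ∇f_0 and prox_f1(z, t) for prox_{t f_1}(z):

```python
def forward_backward(x, grad_f0, prox_f1, alpha, lam, n_iter=100):
    # Basic forward-backward scheme with fixed steplengths alpha and lam.
    for _ in range(n_iter):
        z = x - alpha * grad_f0(x)   # forward (gradient) step
        y = prox_f1(z, alpha)        # backward (proximal) step
        x = x + lam * (y - x)        # step along the direction d = y - x
    return x
```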

SLIDE 6

A line–search approach

We propose to compute λ_k with a line-search approach, starting from 1 and backtracking until a sufficient decrease of the objective function is obtained.

Generalized Armijo rule [Tseng, Yun, 2009], [Porta, Loris, 2015], [B. et al., 2016]:

    f(x^(k) + λ_k d^(k)) ≤ f(x^(k)) + β λ_k h^(k)(y^(k)),

where β ∈ (0, 1) and

    h^(k)(y) = ∇f_0(x^(k))^T (y − x^(k)) + (1/(2α_k)) ‖y − x^(k)‖² + f_1(y) − f_1(x^(k)).

NB1: we have y^(k) = prox_{α_k f_1}(x^(k) − α_k ∇f_0(x^(k))) = argmin_{y ∈ R^n} h^(k)(y). Since h^(k)(y^(k)) < 0, we obtain a monotone decrease of the objective function.
NB2: for f_1 ≡ 0, dropping the quadratic term we recover the standard Armijo rule for smooth optimization.

Pros: no need for any Lipschitz assumption; adaptive selection of λ_k (no user-provided parameter); no assumptions on α_k, which only needs to be bounded above and away from zero.
Cons: requires the evaluation of the objective f at each backtracking step (usually 1-2 per outer iteration).
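A sketch of the backtracking loop implementing this rule; the names are illustrative, f evaluates the full objective f_0 + f_1, and h_yk holds the (negative) value h^(k)(y^(k)):

```python
def armijo_steplength(f, x, d, h_yk, beta=1e-4, delta=0.5, max_backtracks=20):
    # Generalized Armijo rule: accept lam as soon as
    # f(x + lam*d) <= f(x) + beta*lam*h^(k)(y^(k)), starting from lam = 1.
    fx = f(x)
    lam = 1.0
    for _ in range(max_backtracks):
        if f(x + lam * d) <= fx + beta * lam * h_yk:
            break
        lam *= delta  # backtrack
    return lam
```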

SLIDE 7

Inexact computation of the proximity operator (1)

Basic idea

Compute an approximation ỹ^(k) of y^(k) by applying an iterative optimization method to the minimization problem defining the proximity operator,

    ỹ^(k) ≃ y^(k) = argmin_{y ∈ R^n} h^(k)(y),

with increasing accuracy as k increases. This results in a two-loop algorithm, and the question now is: how do we stop the inner iterations so as to preserve the convergence of the iterates {x^(k)} to a solution?

We need to define a criterion measuring the accuracy of the approximate proximity operator computation. Crucial properties of this criterion:
  • it has to preserve the convergence properties of the whole scheme;
  • it must be based on computable quantities.

Borrowing the ideas in [Salzo, Villa, 2012], [Villa et al., 2013], replace the exact optimality condition 0 ∈ ∂h^(k)(y^(k)) with the ε-subdifferential condition 0 ∈ ∂_{ε_k} h^(k)(ỹ^(k)).

SLIDE 8

Inexact computation of the proximity operator (2)

A well-defined primal-dual procedure

Assume that f_1(x) = g(Ax), with A ∈ R^{m×n} (easy generalization to f_1(x) = Σ_{i=1}^p g_i(A_i x)). The dual problem of the proximity operator computation is

    min_{x ∈ R^n} h^(k)(x) = max_{v ∈ R^m} Ψ^(k)(v) ≡ −(1/(2α_k)) ‖α_k A^T v − z^(k)‖² − g*(v) + C_k,

where g* is the Fenchel convex conjugate of g. If v^(k) = argmax Ψ^(k)(v), then y^(k) = z^(k) − α_k A^T v^(k). Compute ỹ^(k) as follows:
  • apply a maximization method to the dual problem, generating a dual sequence {v^(k,ℓ)}_{ℓ∈N} converging to v^(k);
  • compute the corresponding primal sequence {ỹ^(k,ℓ)}_{ℓ∈N} with the formula ỹ^(k,ℓ) = z^(k) − α_k A^T v^(k,ℓ);
  • stop the inner iterations when h^(k)(ỹ^(k,ℓ)) − Ψ^(k)(v^(k,ℓ)) ≤ ε_k, where

    ε_k = C/k^q with q > 1 (prefixed sequence choice), or ε_k = η h^(k)(ỹ^(k,ℓ)) with η ∈ (0, 1] (adaptive choice).
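A minimal sketch of this inner loop, assuming the hypothetical callables prox_gstar (prox of the conjugate g*), Psi and h (the dual and primal objectives above) are available; the dual update is one proximal gradient ascent step on Ψ^(k):

```python
import numpy as np

def inexact_prox(z, alpha, A, prox_gstar, Psi, h, eps_k, max_inner=500):
    # Primal-dual computation of an approximate proximal point.
    v = np.zeros(A.shape[0])                         # dual variable in R^m
    tau = 1.0 / (alpha * np.linalg.norm(A, 2) ** 2)  # ascent stepsize
    y = z - alpha * A.T @ v
    for _ in range(max_inner):
        v = prox_gstar(v + tau * (A @ y), tau)  # proximal gradient ascent on Psi
        y = z - alpha * A.T @ v                 # primal point from dual iterate
        if h(y) - Psi(v) <= eps_k:              # duality-gap stopping rule
            break
    return y
```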

SLIDE 9

Introducing Scaling

Add a new parameter: a symmetric positive definite (s.p.d.) scaling matrix D_k, which determines a different metric at each iterate by replacing ‖x‖² with ‖x‖²_{D_k} = x^T D_k x.

Variable Metric Inexact Line-Search Algorithm (VMILA):

    z^(k) = x^(k) − α_k D_k^{-1} ∇f_0(x^(k))       ← Scaled forward step
    ỹ^(k) ≈ prox^{D_k}_{α_k f_1}(z^(k)) ≡ y^(k)    ← Scaled inexact backward step
    d^(k) = ỹ^(k) − x^(k)
    x^(k+1) = x^(k) + λ_k d^(k)                     ← Armijo-like line-search
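One VMILA iteration, sketched for a diagonal metric D_k = diag(d_diag); inexact_prox_scaled and linesearch stand for the inexact backward step and the Armijo-like rule above (hypothetical helper names):

```python
def vmila_iteration(x, grad_f0, d_diag, alpha, inexact_prox_scaled, linesearch):
    z = x - alpha * grad_f0(x) / d_diag              # scaled forward step
    y_tilde = inexact_prox_scaled(z, alpha, d_diag)  # scaled inexact backward step
    lam = linesearch(x, y_tilde)                     # Armijo-like line-search
    return x + lam * (y_tilde - x)
```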

SLIDE 10

Summary of convergence results about VMILA

VMILA: λ_k chosen by line-search + inexact computation of the proximal point with increasing accuracy + α_k bounded.

Convex case. Assumption: D_k → I as k → ∞, at a rate like C/k^p with p > 1.
  • Convergence to a minimizer (without Lipschitz assumptions on ∇f_0(x)).
  • Convergence rate f(x^(k)) − f* = O(1/k) (proved under Lipschitz assumptions on ∇f_0(x)).

Nonconvex case. Assumption: D_k has bounded eigenvalues.
  • Every accumulation point of {x^(k)}_{k∈N} is a stationary point.
  • If f satisfies the Kurdyka-Łojasiewicz property and ∇f_0 is locally Lipschitz, then {x^(k)}_{k∈N} converges to a stationary point (with exact proximal point computation).

A block-coordinate version of VMILA is proposed in [B., Prato, Rebegoldi, 2018, to appear].

NB: α_k and D_k are only required to be bounded ⇒ they can be used to implement some acceleration strategy.

SLIDE 11

Metric selection - A Majorization-Minimization approach

No theoretical results (same rate and lower complexity bound as nonscaled methods) and no general recipe for selecting D_k. The freedom in choosing D_k and α_k, however, can be exploited to accelerate practical performance, and problem-dependent strategies for D_k give good numerical results.

Majorization-Minimization idea. Define D_k such that

    x^(k) − D_k^{-1} ∇f_0(x^(k)) = argmin_{x ∈ R^n} F(x, x^(k)),

where F(x, x^(k)) is a (not necessarily quadratic) auxiliary function for f_0, i.e.

    F(x^(k), x^(k)) = f_0(x^(k))  and  F(x, x^(k)) ≥ f_0(x) ∀x ∈ R^n,

so that, with x̄ = argmin_x F(x, x^(k)),

    f_0(x̄) ≤ F(x̄, x^(k)) ≤ F(x^(k), x^(k)) = f_0(x^(k)).

[Figure: f_0(x) and a majorizing auxiliary function F(x, x^(k)) touching at x^(k), with minimizer x̄.]

This produces a diagonal D_k whose elements are obtained from the components of ∇f_0(x^(k)) in several relevant cases (discrepancy functions for Gaussian, Poisson, Cauchy, multiplicative noise, ...) [Yang, Oja, 2011], [Chouzenoux, Pesquet, 2016]. The convergence condition D_k → I can be fulfilled by squeezing the elements of the diagonal matrix D_k to 1 as k increases.
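A minimal sketch of the squeezing device (the thresholds C and p are assumptions; the vector s would come from one of the problem-dependent MM recipes above):

```python
import numpy as np

def squeezed_diagonal(s, k, C=1e10, p=2.0):
    # Clip the diagonal of D_k into [1/bound, bound] with bound = 1 + C/k^p,
    # p > 1, so that D_k -> I as k grows.
    bound = 1.0 + C / k**p
    return np.clip(s, 1.0 / bound, bound)
```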

SLIDE 12

Stepsize selection - Barzilai-Borwein rules

Given D_k, we would like to choose α_k such that (1/α_k) D_k ≃ ∇²f_0(x^(k)), mimicking the Taylor equality

    ∇f_0(x + d) = ∇f_0(x) + ∫_0^1 ∇²f_0(x + td) d dt,

that is, with s^(k−1) = x^(k) − x^(k−1) and w^(k−1) = ∇f_0(x^(k)) − ∇f_0(x^(k−1)),

    w^(k−1) ≃ (1/α_k) D_k s^(k−1).

This leads to the two scaled Barzilai-Borwein (BB) rules

    α_k^{BB1} = argmin_α ‖(1/α) D_k s^(k−1) − w^(k−1)‖ = ‖D_k s^(k−1)‖² / (s^(k−1)T D_k w^(k−1)),

    α_k^{BB2} = argmin_α ‖s^(k−1) − α D_k^{-1} w^(k−1)‖ = (s^(k−1)T D_k^{-1} w^(k−1)) / ‖D_k^{-1} w^(k−1)‖².

Good results are obtained when the two values are alternated following an adaptive switching rule and projected onto a given interval [α_min, α_max], with 0 < α_min < α_max. Recent developments in steplength selection rules: Ritz values [Fletcher, 2012].
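A sketch of the two scaled BB values for a diagonal D_k = diag(d_diag), projected onto [α_min, α_max] (illustrative names; no switching rule included):

```python
import numpy as np

def bb_steplengths(s, w, d_diag, a_min=1e-5, a_max=1e5):
    # s = x^(k) - x^(k-1), w = grad f0(x^(k)) - grad f0(x^(k-1)).
    Ds = d_diag * s
    Dinv_w = w / d_diag
    bb1 = np.dot(Ds, Ds) / np.dot(Ds, w)              # ||D s||^2 / (s^T D w)
    bb2 = np.dot(s, Dinv_w) / np.dot(Dinv_w, Dinv_w)  # (s^T D^-1 w) / ||D^-1 w||^2
    return np.clip(bb1, a_min, a_max), np.clip(bb2, a_min, a_max)
```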

SLIDE 13

Numerical results

VMILA has been tested on a variety of convex and nonconvex image restoration problems. The numerical comparisons show that its performance is comparable with that of state-of-the-art methods such as the Chambolle-Pock (CP) method, preconditioned CP, ADMM, PIDSplit+, iPiano, VMFB, FISTA, ... Below, an illustration of the effect of a well-tailored choice of α_k, D_k: nonnegative image deconvolution in the presence of Poisson noise with smooth TV regularization.

[Figure: relative optimality gap (f(x^(k)) − f*)/f* vs. iteration number (up to 300) for GP, GP-BB, SGP, SGP-BB and FISTA, on two test problems; reconstructions x^(300) by SGP shown next to the true images x*.]

SLIDE 14

Two acceleration techniques: Extrapolation

FISTA iteration [Beck, Teboulle, 2008]:

    x̄^(k) = x^(k) + γ_k (x^(k) − x^(k−1))   ← Extrapolation step
    z^(k) = x̄^(k) − α_k ∇f_0(x̄^(k))        ← Forward step
    x^(k+1) = prox_{α_k f_1}(z^(k))          ← Backward step

It applies when f_0 and f_1 are both convex and ∇f_0 is L-Lipschitz continuous. α_k can be chosen as α_k = α ≤ 1/L, or with a backtracking procedure starting from α_{k−1} and guaranteeing that

    f_0(x^(k+1)) ≤ f_0(x̄^(k)) + ∇f_0(x̄^(k))^T (x^(k+1) − x̄^(k)) + (1/(2α_k)) ‖x^(k+1) − x̄^(k)‖².

The extrapolation parameter is chosen so that γ_k → 1 as k → ∞, e.g.

    γ_k = (k − 1)/(k + a),  a ≥ 2.

Convergence rate f(x^(k)) − f* = O(1/k²) with a ≥ 2, and f(x^(k)) − f* = o(1/k²) with a > 2 [Attouch-Peypouquet, 2016]. Convergence of the iterates to a minimizer with a > 2 [Chambolle-Dossal, 2015].
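A minimal sketch of this iteration with γ_k = (k − 1)/(k + a) and a fixed steplength α ≤ 1/L (hypothetical callables grad_f0 and prox_f1, as before):

```python
def fista(x0, grad_f0, prox_f1, alpha, a=3.0, n_iter=300):
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iter + 1):
        gamma = (k - 1.0) / (k + a)
        x_bar = x + gamma * (x - x_prev)    # extrapolation step
        z = x_bar - alpha * grad_f0(x_bar)  # forward step
        x_prev, x = x, prox_f1(z, alpha)    # backward step
    return x
```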

SLIDE 15

Combining extrapolation with scaling and inexact computation of the proximity operator

[B., Porta, Ruggiero, 2016], [B., Rebegoldi, Ruggiero, 2018, submitted]

Scaled, inexact FISTA-like method:

    x̄^(k) = Π^{D_k}_{dom(f)}(x^(k) + γ_k (x^(k) − x^(k−1)))   ← Extrapolation step with scaled projection
    z^(k) = x̄^(k) − α_k D_k^{-1} ∇f_0(x̄^(k))                  ← Scaled forward step
    x^(k+1) ≃ prox^{D_k}_{α_k f_1}(z^(k))                       ← Scaled inexact backward step

Main features: FISTA-like extrapolation + projection + line-search computation of α_k + inexact computation of the proximity operator + D_k → I as k → ∞.

Convex case (∇f_0(x) Lipschitz continuous): convergence rate f(x^(k)) − f* = O(1/k²) with a ≥ 2, f(x^(k)) − f* = o(1/k²) with a > 2, and convergence of the iterates to a minimizer with a > 2.
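One iteration of the scaled, inexact variant, sketched with hypothetical helpers project_dom (the D_k-scaled projection onto dom(f)) and inexact_prox_scaled (the inexact scaled prox):

```python
def scaled_fista_step(x, x_prev, gamma, grad_f0, d_diag, alpha,
                      project_dom, inexact_prox_scaled):
    x_bar = project_dom(x + gamma * (x - x_prev), d_diag)  # scaled projection
    z = x_bar - alpha * grad_f0(x_bar) / d_diag            # scaled forward step
    return inexact_prox_scaled(z, alpha, d_diag)           # inexact backward step
```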

SLIDE 16

Numerical results

Nonnegative image deconvolution with Total Variation regularization from Poisson data. Nonscaled (blue) vs. scaled (red, black): the scaled inexact FISTA-like method compared with the state-of-the-art inexact FISTA-like method.

[Figure: relative error on the objective function, (f(x^(k)) − f*)/f*, vs. time (s), for the nonscaled method and for scaled variants with parameters t1 ∈ {10^7, 10^10} and t2 ∈ {2, 3, 4}, on two test problems.]

SLIDE 17

Conclusions and perspectives

Line-search based algorithms allow the implementation of FB methods without knowledge of the Lipschitz constant, and even on non-Lipschitz problems.
  To do: study a less expensive line-search strategy for the inexact FISTA, to avoid the computation of a new approximate proximal point at each trial step.

The approximation of the proximity operator with an inner loop can be performed in such a way that the basic convergence properties of FB algorithms are not affected.
  To do: investigate other implementable criteria, especially for nonconvex problems.

Variable metric techniques can be regarded as acceleration techniques, improving the behaviour of FB methods especially in the first iterations.
  To do: gain better insight into the Majorization-Minimization techniques.

Codes available at www.oasis.unimore.it
