An Optimal Affine Invariant Smooth Minimization Algorithm.
Alexandre d’Aspremont, CNRS & ENS. Joint work with Cristobal Guzman & Martin Jaggi. Support from ERC SIPA.
Alex d’Aspremont ADGO, Santiago, Feb. 2016. 1/22
Here, $f(x)$ is convex and smooth. Assume $Q \subset \mathbb{R}^n$ is compact, convex and simple.
The function $f(x)$ is self-concordant, i.e. $|f'''(x)| \le 2 f''(x)^{3/2}$, and the set $Q$ has a self-concordant barrier $g(x)$. The barrier problem is
$\min_x \; h(x) \triangleq f(x) + t\, g(x)$
Empirically valid, up to constants. Independent of the dimension $n$. Affine invariant.
Form the Hessian. Solve the Newton (or KKT) system $\nabla^2 f(x)\, \Delta x_{nt} = -\nabla f(x)$.
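A minimal sketch of the two operations named on this slide: form the Hessian, then solve the Newton system for the step. The test function $f(x) = c^T x - \sum_i \log x_i$, the helper names `solve_linear` and `newton_step`, and the use of undamped (pure) Newton steps are illustrative assumptions, not taken from the talk.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [M | b], copied so the inputs are left untouched.
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def newton_step(grad, hess, x):
    """Return dx solving hess(x) dx = -grad(x)."""
    g = grad(x)
    H = hess(x)
    return solve_linear(H, [-gi for gi in g])


# Hypothetical test function: f(x) = c^T x - sum_i log(x_i), minimized at x_i = 1 / c_i.
c = [1.0, 2.0]


def grad(x):
    return [ci - 1.0 / xi for ci, xi in zip(c, x)]


def hess(x):
    n = len(x)
    return [[(1.0 / x[i] ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]


x = [0.5, 0.3]
for _ in range(10):
    dx = newton_step(grad, hess, x)
    x = [xi + di for xi, di in zip(x, dx)]
print(x)
```

The starting point is chosen close enough to the minimizer for pure Newton steps to converge; a practical implementation would add a line search.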
Identical Newton steps, with $\Delta x_{nt} = A\, \Delta y_{nt}$. Identical complexity bounds $375\,(h(x_0) - h^*) + 6$, since $h^* = \hat{h}^*$.
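The identity between Newton steps can be checked numerically. The sketch below assumes the change of variables $x = Ay$ with a nonsingular matrix $A$, so that the transformed function has gradient $A^T \nabla f(Ay)$ and Hessian $A^T \nabla^2 f(Ay) A$. The function $f$, the matrix $A$ and the point $y$ are arbitrary choices made for illustration.

```python
import math


def solve2(M, b):
    """Solve a 2x2 linear system M z = b with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]


def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]


# Illustrative function f(x) = exp(x1 + x2) + x1^2 + x2^2.
def grad_f(x):
    e = math.exp(x[0] + x[1])
    return [e + 2 * x[0], e + 2 * x[1]]


def hess_f(x):
    e = math.exp(x[0] + x[1])
    return [[e + 2, e], [e, e + 2]]


A = [[2.0, 1.0], [0.0, 1.0]]  # nonsingular change of variables x = A y
y = [0.1, -0.3]
x = matvec(A, y)

# Newton step in the original variable x.
g, H = grad_f(x), hess_f(x)
dx = solve2(H, [-t for t in g])

# Newton step in the transformed variable y, for y -> f(A y).
At = transpose(A)
g_hat = matvec(At, g)
H_hat = matmul(matmul(At, H), A)
dy = solve2(H_hat, [-t for t in g_hat])

A_dy = matvec(A, dy)
print(dx, A_dy)
```

Up to rounding, `A_dy` and `dx` coincide, which is the first claim on the slide.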
Newton’s method (and derivatives) solve all reasonably large problems. Beyond a certain scale, second order information is out of reach.
The curvature $C_f$ is defined through a supremum over $s, x \in M$, $\alpha \in [0, 1]$ and $y = x + \alpha(s - x)$.
$C_f$ is affine invariant but the bound is suboptimal in $\epsilon$. If $f(x)$ has a Lipschitz gradient, the lower bound is $O(1/\sqrt{\epsilon})$.
Choose a norm $\|\cdot\|$. $\nabla f(x)$ is Lipschitz with constant $L$ w.r.t. $\|\cdot\|$.
Choose a prox function $d(x)$ for the set $Q$, strongly convex with parameter $\sigma > 0$ w.r.t. $\|\cdot\|$.
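1: for $k = 0, \ldots, N$ do
2: Compute $\nabla f(x_k)$
3: Compute $y_k = \operatorname{argmin}_{y \in Q} \{ \langle \nabla f(x_k), y - x_k \rangle + \tfrac{1}{2} L \|y - x_k\|^2 \}$
4: Compute $z_k = \operatorname{argmin}_{x \in Q} \{ \sum_{i=0}^{k} \alpha_i [f(x_i) + \langle \nabla f(x_i), x - x_i \rangle] + \tfrac{L}{\sigma} d(x) \}$
5: Set $x_{k+1} = \tau_k z_k + (1 - \tau_k) y_k$
6: end for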
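The loop above can be sketched in a few lines for the Euclidean setting. Everything specific here is an assumption made for illustration: the set $Q$ is the Euclidean unit ball, the prox is $d(x) = \|x - x_0\|_2^2/2$ (so $\sigma = 1$), the weights are $\alpha_k = (k+1)/2$ and $\tau_k = 2/(k+3)$ as in Nesterov's smooth minimization scheme, and the test objective is a hypothetical quadratic. With these choices both argmin steps reduce to projections.

```python
import math


def project_ball(v, radius=1.0):
    """Euclidean projection onto the ball of the given radius centered at 0."""
    nrm = math.sqrt(sum(t * t for t in v))
    if nrm <= radius:
        return list(v)
    return [radius * t / nrm for t in v]


def smooth_minimize(grad, L, x0, n_iters, project, sigma=1.0):
    """Sketch of the loop with d(x) = ||x - x0||^2 / 2 (assumed Euclidean prox)."""
    x = list(x0)
    y = list(x0)
    g_sum = [0.0] * len(x0)  # running sum of alpha_i * grad f(x_i)
    for k in range(n_iters + 1):
        g = grad(x)
        # Step 3: gradient step followed by projection.
        y = project([xi - gi / L for xi, gi in zip(x, g)])
        # Step 4: minimize the weighted linear model plus (L / sigma) * d(x).
        alpha = (k + 1) / 2.0
        g_sum = [s + alpha * gi for s, gi in zip(g_sum, g)]
        z = project([ci - (sigma / L) * s for ci, s in zip(x0, g_sum)])
        # Step 5: convex combination of z_k and y_k.
        tau = 2.0 / (k + 3)
        x = [tau * zi + (1.0 - tau) * yi for zi, yi in zip(z, y)]
    return y


# Hypothetical test problem: a quadratic whose minimizer lies inside the ball.
x_star = [0.3, -0.2]


def f(x):
    return 0.5 * (x[0] - 0.3) ** 2 + 5.0 * (x[1] + 0.2) ** 2


def grad(x):
    return [x[0] - 0.3, 10.0 * (x[1] + 0.2)]


L, N = 10.0, 50
x0 = [0.0, 0.0]
y_N = smooth_minimize(grad, L, x0, N, project_ball)

# Standard guarantee for this scheme: f(y_N) - f* <= 4 L d(x*) / (sigma (N+1)(N+2)).
d_star = 0.5 * sum((a - b) ** 2 for a, b in zip(x_star, x0))
bound = 4.0 * L * d_star / ((N + 1) * (N + 2))
print(f(y_N), bound)
```

Other norms and prox functions change only how steps 3 and 4 are solved; the outer loop stays the same.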
$\min_{\{1^T x = 1,\, x \ge 0\}} \; \max_{\{1^T y = 1,\, y \ge 0\}} \; x^T A y$
Euclidean prox. Pick $\|\cdot\|_2$ and $d(x) = \|x\|_2^2/2$, after regularization, the
Entropy prox. Pick $\|\cdot\|_1$ and $d(x) = \sum_i x_i \log x_i + \log n$, the bound becomes
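To make the two prox choices concrete, the sketch below evaluates both functions on the simplex and shows the closed-form minimizer that the entropy prox yields: minimizing $\langle s, x \rangle + c\, d(x)$ over the simplex gives a softmax of $-s/c$. The function names and the sample vector `s` are illustrative assumptions.

```python
import math


def entropy_prox(x):
    """d(x) = sum_i x_i log x_i + log n on the simplex, with 0 log 0 = 0."""
    n = len(x)
    return sum(xi * math.log(xi) for xi in x if xi > 0.0) + math.log(n)


def euclidean_prox(x):
    """d(x) = ||x||_2^2 / 2."""
    return 0.5 * sum(xi * xi for xi in x)


def entropy_prox_step(s, scale):
    """argmin over the simplex of <s, x> + scale * d(x): a softmax of -s / scale."""
    shift = max(-si / scale for si in s)  # subtract the max for numerical stability
    w = [math.exp(-si / scale - shift) for si in s]
    total = sum(w)
    return [wi / total for wi in w]


n = 4
uniform = [1.0 / n] * n
vertex = [1.0, 0.0, 0.0, 0.0]
step = entropy_prox_step([0.2, -0.1, 0.4, 0.0], scale=1.0)

print(entropy_prox(uniform), entropy_prox(vertex))      # smallest and largest values
print(euclidean_prox(uniform), euclidean_prox(vertex))
print(step)
```

The entropy prox ranges from 0 at the uniform distribution to $\log n$ at a vertex, so its size on the simplex grows only logarithmically with the dimension.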
The dual norm satisfies $\|x\|_Q^* = \|x\|_{Q^\circ}$, where $Q^\circ$ is the polar of $Q$.
The prox $p(x)^2/2$ has a Lipschitz continuous gradient w.r.t. the norm $p(x)$,
The norm p(x) satisfies
The Lipschitz constant $L_{\alpha Q}$ satisfies $\alpha^2 L_Q \le L_{\alpha Q}$. The smoothness term $D_Q$ remains unchanged. Given our choice of norm (hence $L_Q$), $L_Q D_Q$ is the best possible bound.
The regularity constant decreases on a subspace $F$, i.e. $D_{Q \cap F} \le D_Q$. From $D$ regular spaces $(E_i, \|\cdot\|)$, we can construct a $2D + 2$ regular product
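Choosing $\|\cdot\|_1$ as the norm and $d(x) = \log n + \sum_{i=1}^n x_i \log x_i$ as the prox
Symmetrizing the simplex into the $\ell_1$ ball. The space $(\mathbb{R}^n, \|\cdot\|_\infty)$ is $2 \log n$ regular, with prox $\|\cdot\|_\alpha^2/2$, where $\alpha = 2 \log n/(2 \log n - 1)$, and our complexity bound is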
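A short numerical check helps explain the exponent $\alpha = 2 \log n/(2 \log n - 1)$. Its conjugate exponent is exactly $2 \log n$, and for that exponent the standard inequality $\|x\|_\infty \le \|x\|_q \le n^{1/q} \|x\|_\infty$ has a constant $n^{1/q} = \sqrt{e}$ that does not depend on $n$. The function names and the value of `n` below are illustrative.

```python
import math


def smoothing_exponent(n):
    """alpha = 2 log n / (2 log n - 1); requires n >= 2 so the denominator is positive."""
    two_log_n = 2.0 * math.log(n)
    return two_log_n / (two_log_n - 1.0)


def conjugate_exponent(alpha):
    """Exponent q with 1 / alpha + 1 / q = 1."""
    return alpha / (alpha - 1.0)


n = 1000
alpha = smoothing_exponent(n)
q = conjugate_exponent(alpha)
gap = n ** (1.0 / q)  # worst-case ratio between the l_q and l_infinity norms

print(alpha, q, gap)
```

So the $\ell_\alpha$ norm stays within a factor $\sqrt{e}$ of the $\ell_1$ norm, and its dual within the same factor of the $\ell_\infty$ norm, while $\alpha > 1$.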
The parameter $L_Q$ satisfies
$\|\nabla f(x) - \nabla f(y)\|_Q^* \le L_Q \|x - y\|_Q$,
For $\ell_p$ spaces with $p \in [2, \infty]$, the unit balls $B_p$ have low regularity constants,
Optimizing over cubes is harder.
Affine invariance does not imply that this complexity bound is tight... In fact, the worst choice of norm and prox yields a bound in $L\, d(x^\star)/\sigma$.
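A bound proportional to $L\, d(x^\star)/\sigma$ translates directly into an iteration count. The sketch below assumes the standard guarantee for Nesterov's smooth minimization scheme, $f(y_N) - f^\star \le 4 L\, d(x^\star) / (\sigma (N+1)(N+2))$, which is not written out on the slide. The numerical values are hypothetical.

```python
import math


def guarantee(L, d_star, sigma, N):
    """Assumed accuracy guarantee after N iterations."""
    return 4.0 * L * d_star / (sigma * (N + 1) * (N + 2))


def iterations_needed(L, d_star, sigma, eps):
    """Smallest convenient N with N^2 >= 4 L d(x*) / (sigma eps), which implies guarantee <= eps."""
    return math.ceil(math.sqrt(4.0 * L * d_star / (sigma * eps)))


N = iterations_needed(L=10.0, d_star=0.065, sigma=1.0, eps=1e-3)
print(N, guarantee(10.0, 0.065, 1.0, N))
```

The count scales as $\sqrt{L\, d(x^\star)/(\sigma \epsilon)}$, so the choice of norm and prox only enters through the ratio $L\, d(x^\star)/\sigma$.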
When $p \in [2, \infty]$, we have $D_p = n^{(p-2)/p}$. When $p \in [1, 2]$, Juditsky et al. [2009, Ex. 3.2] show a bound indexed by $2 \le \rho < \frac{p}{p-1}$, with exponent $\frac{2}{\rho} - \frac{2(p-1)}{p}$.
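For $p \in [2, \infty]$ the constant above is easy to tabulate. The sketch below reads the slide's expression as $D_p = n^{(p-2)/p}$ and treats $p = \infty$ as the limiting exponent 1. The function name `d_p` and the sample dimension are illustrative.

```python
import math


def d_p(n, p):
    """D_p = n^((p - 2) / p) for p in [2, inf], as read off the slide."""
    if p < 2:
        raise ValueError("this expression is only stated for p in [2, inf]")
    if math.isinf(p):
        return float(n)  # the exponent (p - 2) / p tends to 1 as p grows
    return float(n) ** ((p - 2.0) / p)


n = 16
for p in (2, 4, math.inf):
    print(p, d_p(n, p))
```

The constant equals 1 in the Euclidean case $p = 2$ and grows to $n$ as $p$ tends to infinity.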
Affine invariant complexity bound for the optimal algorithm [Nesterov, 1983]
Matches (up to polylog terms) best known lower bounds on ℓp-balls.
Optimality of product $L_Q D_Q$ in the general case? Matches curvature $C_f$? Best norm choice for non-symmetric sets $Q$? Systematic, tractable procedure for smoothing $Q$?
References
Alexandre d’Aspremont, C. Guzman, and Martin Jaggi. An optimal affine invariant smooth minimization algorithm. arXiv preprint arXiv:1301.0465, 2013.
C. Guzman and A. Nemirovski. On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization. arXiv:1307.5001, 2013.
2008.
Optimization, 19(4):1574–1609, 2009.
372–376, 1983.
Mathematics, Philadelphia, 1994.