Low Rank Approximation, Lecture 9
Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
Manifold optimization
General setting: Aim at solving the optimization problem
min_{X∈Mr} f(X),
where Mr is a manifold of rank-r matrices or tensors. Goal: Modify classical optimization algorithms (line search, Newton, quasi-Newton, ...) to produce iterates that stay on Mr. Advantages over ALS:
◮ No need to solve subproblems, at least for first-order methods;
◮ Can draw on concepts from classical smooth optimization (line search strategies, convergence analysis, ...).
Two valuable resources:
◮ Absil/Mahony/Sepulchre: Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. Available from https://press.princeton.edu/absil
◮ Manopt, a Matlab toolbox for optimization on manifolds: https://manopt.org/
For open sets U ⊂ M, V ⊂ R^d, a chart is a bijective function ϕ : U → V. An atlas of M into R^d is a collection of charts (Uα, ϕα) such that:
◮ ∪α Uα = M;
◮ for any α, β with Uα ∩ Uβ ≠ ∅, the change of coordinates ϕβ ◦ ϕα^{−1} : R^d → R^d is smooth (C∞) on its domain ϕα(Uα ∩ Uβ).
Illustration taken from Wikipedia.
In the following, we assume that the atlas is maximal. A proper definition of a smooth manifold M needs further properties (the topology induced by the maximal atlas is Hausdorff and second-countable); see [Lee'2003] and [Absil et al.'2008]. Properties of M:
◮ finite-dimensional vector spaces are always manifolds;
◮ d = dimension of M;
◮ M does not need to be connected;
◮ a function f : M → R is differentiable at a point x ∈ M if and only if f ◦ ϕ^{−1} : ϕ(U) ⊂ R^d → R is differentiable at ϕ(x) for some chart (U, ϕ) with x ∈ U.
Lemma
Let M be a smooth manifold and N ⊂ M an open subset. Then N is a smooth manifold (of equal dimension). Proof: Given an atlas for M, obtain an atlas for N by selecting charts (U, ϕ) with U ⊂ N. Example: GL(n, R), the set of real invertible n × n matrices, is a smooth manifold.
EFY. Show that R^{m×n}_*, the set of real m × n matrices of full rank min{m, n}, is a smooth manifold.
EFY. Show that the set of n × n symmetric positive definite matrices is a smooth manifold.
Two main classes of matrix manifolds:
◮ embedded submanifolds of R^{m×n};
Example: Stiefel manifold of orthonormal bases.
◮ quotient manifolds;
Example: Grassmann manifold R^{m×n}_* / GL(n, R).
Will focus on embedded submanifolds (much easier to work with).
Let M1, M2 be smooth manifolds and F : M1 → M2. Let x ∈ M1 and y = F(x) ∈ M2. Choose charts ϕ1, ϕ2 around x, y. Then the coordinate representation of F is given by F̂ := ϕ2 ◦ F ◦ ϕ1^{−1} : R^{d1} → R^{d2}.
◮ F is called smooth if F̂ is smooth (that is, C∞).
◮ The rank of F at x ∈ M1 is defined as the rank of DF̂(ϕ1(x)) (the Jacobian of F̂ at ϕ1(x)).
◮ F is called an immersion if its rank equals d1 at every x ∈ M1.
◮ F is called a submersion if its rank equals d2 at every x ∈ M1.
Subset N ⊂ M is called an embedded submanifold of dimension k in M if for each point p ∈ N there is a chart (U, ϕ) in M such that all elements of U ∩ N are obtained by varying first k coordinates only. (See Chapter 5 of [Lee’2003] for more details.)
Theorem
Let M, N be smooth manifolds and let F : M → N be a smooth map with constant rank ℓ. Then each level set F −1(y) := {x ∈ M : F(x) = y} is a closed embedded submanifold of codimension ℓ in M. Corollaries:
◮ If F : M → N is a submersion, then each level set is a closed embedded submanifold of codimension equal to the dimension of N.
◮ In fact, by the open submanifold lemma, one only needs to check the full rank condition of a submersion for points in the level set (replace M by the open set on which F has full rank).
For m ≥ n, consider the set of all m × n matrices with orthonormal columns:
St(m, n) := {X ∈ R^{m×n} : X^T X = In}.
Corollary
St(m, n) is an embedded submanifold of R^{m×n}. Proof: Define F : R^{m×n} → symm(n) as F : X ↦ X^T X, where symm(n) denotes the set of n × n symmetric matrices. At X ∈ St(m, n), consider the Jacobian DF(X) : H ↦ X^T H + H^T X. Given symmetric Y ∈ R^{n×n}, set H = XY/2. Then DF(X)[H] = Y; thus DF(X) is surjective.
EFY. What is the dimension of the Stiefel manifold?
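A quick numerical check of the surjectivity argument in the proof above (a minimal NumPy sketch; dimensions and helper names are my own choices, not from the slides):

```python
import numpy as np

m, n = 6, 3
X, _ = np.linalg.qr(np.random.randn(m, n))   # a point on St(m, n)

def DF(X, H):
    """Differential of F(X) = X^T X applied to H."""
    return X.T @ H + H.T @ X

# For an arbitrary symmetric Y, the choice H = X Y / 2 gives DF(X)[H] = Y,
# i.e., DF(X) maps onto symm(n), as claimed in the proof.
Y = np.random.randn(n, n); Y = (Y + Y.T) / 2
H = X @ Y / 2
print(np.allclose(DF(X, H), Y))   # True
```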
Locality of definition of embedded submanifolds implies the following lemma (Lemma 5.5 in [Lee’2003]).
Lemma
Let N be subset of smooth manifold M. Suppose every point p ∈ N has a neighborhood U ⊂ M such that U ∩ N is an embedded submanifold of U. Then N is an embedded submanifold of M.
Theorem
Given m ≥ n, the set Mk = {A ∈ Rm×n : rank(A) = k} is an embedded submanifold of Rm×n for every 0 ≤ k ≤ n.
Proof. Choose an arbitrary A0 ∈ Mk. After a suitable permutation, we may assume w.l.o.g. that
A0 = [ A11  A12 ; A21  A22 ],  where A11 ∈ R^{k×k} is invertible.
This property remains true in an open neighborhood U ⊂ R^{m×n} of A0. Factorize A ∈ U as
A = [ I  0 ; A21 A11^{−1}  I ] [ A11  0 ; 0  A22 − A21 A11^{−1} A12 ] [ I  A11^{−1} A12 ; 0  I ].
Define F : U → R^{(m−k)×(n−k)} as F : A ↦ A22 − A21 A11^{−1} A12. Then
F^{−1}(0) = U ∩ Mk.
For arbitrary Y ∈ R^{(m−k)×(n−k)}, choosing the perturbation H with blocks H11 = H12 = H21 = 0 and H22 = Y gives DF(A)[H] = Y. Thus, F is a submersion. In turn, U ∩ Mk is an embedded submanifold of U. By the lemma, Mk is an embedded submanifold of R^{m×n}.
EFY. What is the dimension of Mk?
EFY. Is Mk connected?
EFY. Prove that the set of symmetric rank-k matrices is an embedded submanifold of R^{n×n}. Is this manifold connected?
In the following, much of the discussion is restricted to submanifolds M embedded in a vector space V with inner product ⟨·, ·⟩ and induced norm ‖·‖. Given a smooth curve γ : R → M with x = γ(0), we call γ′(0) ∈ V a tangent vector at x. The tangent space TxM ⊂ V is the set of all tangent vectors at x.
Lemma
TxM is a subspace of V.
Proof: Consider curves γ1, γ2 such that x = γ1(0) = γ2(0) and γ1′(0) = v1, γ2′(0) = v2. To show that αv1 + βv2 for α, β ∈ R is again a tangent vector, consider a chart (U, ϕ) around x such that ϕ(x) = 0. Define γ(t) = ϕ^{−1}(αϕ(γ1(t)) + βϕ(γ2(t))) for t sufficiently close to 0. Then γ(0) = x and γ′(0) = αv1 + βv2.
EFY. Prove that the dimension of TxM equals the dimension of M using a coordinate chart.
Application of the definition to the Stiefel manifold. Let γ(t) = X + tY + O(t²) be a smooth curve with X ∈ St(m, n). To ensure that γ(t) ∈ St(m, n), we require
In = γ(t)^T γ(t) = (X + tY)^T (X + tY) + O(t²) = In + t(X^T Y + Y^T X) + O(t²).
Thus, X^T Y + Y^T X = 0 characterizes the tangent space:
TX St(m, n) = {Y ∈ R^{m×n} : X^T Y = −Y^T X} = {XW + X⊥W⊥ : W ∈ R^{n×n}, W = −W^T, W⊥ ∈ R^{(m−n)×n}},
where the columns of X⊥ form a basis of span(X)⊥.
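A quick numerical check of this characterization (a minimal NumPy sketch; dimensions and names are my own choices):

```python
import numpy as np

m, n = 6, 3
X, _ = np.linalg.qr(np.random.randn(m, n))                  # X ∈ St(m, n)
X_perp = np.linalg.svd(X, full_matrices=True)[0][:, n:]     # orthonormal basis of span(X)^⊥

W = np.random.randn(n, n); W = W - W.T                      # skew-symmetric W
W_perp = np.random.randn(m - n, n)                          # arbitrary W⊥

Y = X @ W + X_perp @ W_perp                                 # candidate tangent vector
print(np.allclose(X.T @ Y + Y.T @ X, 0))                    # True: Y ∈ T_X St(m, n)
```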
When M is defined (at least locally) as the level set of a constant-rank function F : V → R^N, we have TxM = ker(DF(x)).
Proof: Consider a smooth curve γ on M with γ(0) = x and γ′(0) = v. Then, by the chain rule,
DF(x)[v] = DF(x)[γ′(0)] = ∂/∂t F(γ(t)) |_{t=0} = 0,
because F is constant on M. Thus, TxM ⊂ ker(DF(x)), which completes the proof by counting dimensions.
Recall that Mk was obtained as the level set of the local submersion F : A ↦ A22 − A21 A11^{−1} A12. Given A ∈ Mk, consider the SVD A = [U U⊥] Σ [V V⊥]^T. We have DF(Σ)[H] = H22, so H is in the kernel if and only if H22 = 0. In terms of A this implies
TA Mk = ker(DF(A)) = [U U⊥] [ R^{k×k}  R^{k×(n−k)} ; R^{(m−k)×k}  0 ] [V V⊥]^T
      = {UMV^T + Up V^T + U Vp^T : M ∈ R^{k×k}, Up^T U = Vp^T V = 0}.
EFY. Compute the tangent space for the embedded submanifold of rank-k symmetric matrices.
For a submanifold M embedded in a vector space V: the inner product ⟨·, ·⟩ of V induces a Riemannian metric on the manifold.¹ The (Riemannian) gradient of a smooth f : M → R at x ∈ M is defined as the unique element grad f(x) ∈ TxM that satisfies
⟨grad f(x), ξ⟩ = Df(x)[ξ]  for all ξ ∈ TxM.
EFY. Prove that the Riemannian gradient satisfies the steepest ascent property
grad f(x) / ‖grad f(x)‖ = arg max_{ξ∈TxM, ‖ξ‖=1} Df(x)[ξ].
¹ In general, for a Riemannian manifold one needs an inner product on TxM that varies smoothly with respect to x.
For a submanifold M embedded in a vector space V: the (Euclidean) gradient of f in V admits the decomposition
∇f(x) = Px ∇f(x) + Px⊥ ∇f(x),
where Px, Px⊥ are the orthogonal projections onto TxM and Tx⊥M. For every ξ ∈ TxM we have
Df(x)[ξ] = ⟨∇f(x), ξ⟩ = ⟨Px ∇f(x), ξ⟩ + ⟨Px⊥ ∇f(x), ξ⟩ = ⟨Px ∇f(x), ξ⟩,
where we used that Px⊥ ∇f(x) ⊥ ξ. Hence,
grad f(x) = Px ∇f(x).
The Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space.
Example: Given a symmetric n × n matrix A, consider
min_{X∈St(n,k)} trace(X^T A X) =: f(X).
Study the first-order perturbation
trace((X + H)^T A (X + H)) − trace(X^T A X) = trace(H^T A X) + trace(X^T A H) + O(‖H‖²) = 2⟨H, AX⟩ + O(‖H‖²).
Hence, the Euclidean gradient at X is given by ∇f(X) = 2AX. Note that skew(W) = (W − W^T)/2 is the orthogonal projection onto skew-symmetric matrices. Thus, PX(Z) = (I − XX^T)Z + X · skew(X^T Z) and
grad f(X) = PX(∇f(X)) = 2(I − XX^T)AX + 2X · skew(X^T A X) = 2(AX − X X^T A X).
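In code, the projection and gradient formulas above look as follows (a minimal NumPy sketch with my own helper names):

```python
import numpy as np

n, k = 8, 3
A = np.random.randn(n, n); A = (A + A.T) / 2        # symmetric A
X, _ = np.linalg.qr(np.random.randn(n, k))          # X ∈ St(n, k)

def proj_stiefel(X, Z):
    """Orthogonal projection onto T_X St(n, k): (I - X X^T) Z + X skew(X^T Z)."""
    skew = lambda W: (W - W.T) / 2
    return Z - X @ (X.T @ Z) + X @ skew(X.T @ Z)

egrad = 2 * A @ X                                   # Euclidean gradient of f(X) = trace(X^T A X)
rgrad = proj_stiefel(X, egrad)                      # Riemannian gradient

# Agrees with the closed form 2(AX - X X^T A X) derived above.
print(np.allclose(rgrad, 2 * (A @ X - X @ (X.T @ A @ X))))   # True
```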
Example: For A ∈ Mk consider the SVD A = UΣV^T with Σ ∈ R^{k×k}. Define the orthogonal projections onto span(U), span(V), and their complements:
PU = UU^T, PU⊥ = I − UU^T, PV = VV^T, PV⊥ = I − VV^T.
Recall that
TA Mk = {UMV^T + Up V^T + U Vp^T : M ∈ R^{k×k}, Up^T U = Vp^T V = 0}.
The three terms of the sum are orthogonal to each other and can thus be considered separately. The orthogonal projection onto TA Mk is given by
PA(Z) = PU Z PV + PU⊥ Z PV + PU Z PV⊥.
EFY. Compute the Riemannian gradient of f(A) = ‖A − B‖²_F on Mk for given B ∈ R^{m×n}.
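The projection PA can be applied without forming the m × m and n × n projectors explicitly; a small NumPy sketch under the naming above (U, V from a thin SVD, names my own):

```python
import numpy as np

m, n, k = 8, 6, 2
A = np.random.randn(m, k) @ np.random.randn(k, n)    # a rank-k matrix
Z = np.random.randn(m, n)                            # arbitrary direction

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, V = U[:, :k], Vt[:k, :].T

def proj_tangent(U, V, Z):
    """P_A(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥ = U U^T Z + Z V V^T - U U^T Z V V^T."""
    UtZ = U.T @ Z
    ZV = Z @ V
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T

P = proj_tangent(U, V, Z)
print(np.allclose(proj_tangent(U, V, P), P))          # projection is idempotent
```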
Consider
min_{x∈R^N} f(x).
Line search is an optimization algorithm of the form x_{j+1} = xj + αj ηj with search direction ηj and step size αj > 0.
◮ First-order optimal choice of ηj: ηj = −∇f(xj), i.e., gradient descent. Motivation for other choices: faster local convergence (Newton-type methods), exact gradient computation too expensive, ...
Gradient-related search directions: ⟨ηj, ∇f(xj)⟩ < δ < 0 for all j.
◮ Exact line search chooses
αj = arg min_α f(xj + αηj).
Only in exceptional cases is this a simple optimization problem, e.g., admitting a closed-form solution.
EFY. Derive the closed-form solution for exact line search applied to min_{x∈R^n} (1/2) x^T A x + b^T x for symmetric positive definite A ∈ R^{n×n} and b ∈ R^n.
◮ Alternative: Armijo rule. Let β ∈ (0, 1) (typically β = 1/2) and c ∈ (0, 1) (e.g., c = 10^{−4}) be fixed parameters. Determine the largest αj ∈ {1, β, β², β³, ...} such that
f(xj + αj ηj) − f(xj) ≤ c αj ∇f(xj)^T ηj
holds. (This is always possible when ηj is a descent direction, i.e., when ⟨ηj, ∇f(xj)⟩ < 0.) More details in [J. Nocedal and S. J. Wright. Numerical Optimization. Second edition. Springer Series in Operations Research and Financial Engineering. Springer, 2006].
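A minimal sketch of the Armijo backtracking rule in NumPy (function and parameter names, and the quadratic test problem, are my own choices):

```python
import numpy as np

def armijo_step(f, grad_fx, x, eta, beta=0.5, c=1e-4, max_backtracks=50):
    """Largest alpha in {1, beta, beta^2, ...} with f(x + alpha*eta) - f(x) <= c*alpha*<grad f(x), eta>."""
    fx = f(x)
    slope = c * np.dot(grad_fx, eta)     # negative for descent directions
    alpha = 1.0
    for _ in range(max_backtracks):
        if f(x + alpha * eta) - fx <= alpha * slope:
            return alpha
        alpha *= beta
    return alpha

# Example: quadratic f(x) = 1/2 x^T A x + b^T x with the gradient descent direction.
A = np.diag([1.0, 10.0]); b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x + b @ x
x = np.array([3.0, 3.0]); g = A @ x + b
print(armijo_step(f, g, x, -g))
```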
Now consider
min_{x∈M} f(x).
Cannot use the line search x_{j+1} = xj + αj ηj, simply because addition is not well defined on M. Idea: Search along a smooth curve γ(α) ∈ M with γ(0) = xj and γ′(0) = ηj ∈ Txj M. Step in the direction xj + αηj ∈ xj + Txj M and go back to the manifold via a retraction.
Tangent bundle: TM := ∪_{x∈M} TxM.
Definition
A mapping R : TM → M is called a retraction on M if for every X0 ∈ M there exists a neighborhood U around (X0, 0) ∈ TM such that:
1. R is defined and smooth on U;
2. R(x, 0) = x for all (x, 0) ∈ U;
3. DRx(0)[ξ] = ξ for all ξ ∈ TxM with (x, 0) ∈ U.
Will write Rx = R(x, ·) : TxM → M in the following. Intuition behind the definition: Property 2 = the retraction does nothing to elements on the manifold. Property 3 = the retraction preserves the direction of curves. Equivalent characterization: For every tangent vector ξ ∈ TxM, the curve γ : α ↦ Rx(αξ) satisfies γ′(0) = ξ.
EFY. What is a retraction for the manifold of invertible n × n matrices (trick question)?
Exponential maps are the most natural choice of retraction from a theoretical point of view but often too expensive/too cumbersome to compute. In practice, for matrix manifolds, retractions are often built from matrix decompositions and metric projections.
Example St(n, k): Given Y ∈ R^{n×k}_* (i.e., rank(Y) = k), the economy-sized QR decomposition Y = XR, with X^T X = Ik and R upper triangular, is unique provided that the diagonal elements of R are positive. This defines a diffeomorphism
φ : St(n, k) × triu+(k) → R^{n×k}_*,  φ : (X, R) ↦ XR,
where triu+(k) denotes the set of k × k upper triangular matrices with positive diagonal. Note that
dim St(n, k) + dim triu+(k) = dim R^{n×k}_*.
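In code, the retraction induced by this decomposition (made precise by the lemma below) simply takes the Q factor of X + η; a minimal NumPy sketch (helper names are mine; the sign fix enforces the positive-diagonal convention above):

```python
import numpy as np

def qr_retraction(X, eta):
    """QR-based retraction on St(n, k): orthonormal factor of X + eta with positive-diagonal R."""
    Q, R = np.linalg.qr(X + eta)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

n, k = 7, 3
X, _ = np.linalg.qr(np.random.randn(n, k))
W = np.random.randn(k, k); W = W - W.T
X_perp = np.linalg.svd(X, full_matrices=True)[0][:, k:]
eta = X @ W + X_perp @ np.random.randn(n - k, k)      # tangent vector at X

Y = qr_retraction(X, 0.3 * eta)
print(np.allclose(Y.T @ Y, np.eye(k)))                 # True: Y ∈ St(n, k)
```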
Abstract setting: Let M be embedded submanifold of vector space V and N smooth manifold such that dim(M) + dim(N) = dim(V). Assume there is diffeomorphism φ : M × N → V∗ : (x, y) → φ(x, y) for some open subset V∗ of V. Moreover, assume ∃ neutral element id ∈ N such that φ(x, id) = x for all x ∈ M.
Lemma
Under above assumptions, Rx(η) := π1(φ−1(x + η)) is a retraction on M, where π1 is projection onto first component: π1(x, y) = x.
Proof of lemma. Need to verify three properties of retraction. Property 1: Immediately follows from assumptions that Rx(ξ) is defined and smooth for all ξ in a neighborhood of 0 ∈ TxM. Property 2: Rx(0) := π1(φ−1(x)) = π1(x, id) = x. Property 3: Differentiating x = π1 ◦ φ−1(φ(x, id)) we obtain for any ξ ∈ TxM that ξ = D(π1 ◦ φ−1)[Dφ(x, id)[ξ, 0]] = D(π1 ◦ φ−1)(x)[ξ] = DRx(0)[ξ].
For z ∈ V sufficiently close to M, the metric projection is well defined:
PM(z) := arg min_{x∈M} ‖z − x‖.
Corollary (Lewis/Malick’2008)
The map Rx(η) := PM(x + η) defines a retraction. Examples of retractions based on metric projection:
◮ For St(n, k), the polar factor Y(Y^T Y)^{−1/2} of Y ∈ R^{n×k}_* defines a retraction.
◮ For the rank-k matrix manifold Mk, the best rank-k approximation Tk defines a retraction.
There are other choices; see [Absil/Oseledets'2015: Low-rank retractions: a survey and new results].
EFY. For all examples discussed so far, develop algorithms that efficiently realize the retraction by exploiting the structure of x + η.
EFY. Find a retraction for the manifold of symmetric rank-k matrices.
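A naive sketch of the metric-projection retraction on Mk via a full SVD (the EFY above asks for an efficient variant that exploits the rank-2k structure of X + η; names are my own):

```python
import numpy as np

def svd_retraction(X, eta, k):
    """Metric-projection retraction on M_k: best rank-k approximation of X + eta.
    Naive O(mn*min(m, n)) version; an efficient one works with factored X and tangent eta."""
    U, s, Vt = np.linalg.svd(X + eta, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

m, n, k = 8, 6, 2
X = np.random.randn(m, k) @ np.random.randn(k, n)      # a point on M_k
eta = 0.1 * np.random.randn(m, n)                      # a perturbation
Y = svd_retraction(X, eta, k)
print(np.linalg.matrix_rank(Y))                        # k
```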
Line search on a manifold: x_{j+1} = Rxj(αj ηj) with gradient-related search directions, that is, lim sup_{j→∞} ⟨grad f(xj), ηj⟩ < 0. Canonical choice: ηj = −grad f(xj).
Extension of the Armijo rule: Let β ∈ (0, 1) and c ∈ (0, 1) (e.g., c = 10^{−4}) be fixed parameters. Determine the largest αj ∈ {1, β, β², β³, ...} such that
f(Rxj(αj ηj)) − f(xj) ≤ c αj ⟨grad f(xj), ηj⟩   (1)
holds.
EFY. Show that the Armijo condition (1) can always be satisfied for sufficiently small αj.
1: for j = 0, 1, 2, ... do
2:   Pick ηj ∈ Txj M such that the sequence {ηj} is gradient-related.
3:   Choose αj ∈ {1, β, β², β³, ...} such that the Armijo condition is satisfied.
4:   Set x_{j+1} = Rxj(αj ηj).
5: end for
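For concreteness, a self-contained NumPy sketch of this scheme on the Stiefel manifold, minimizing f(X) = trace(X^T A X) with the QR-based retraction and the Armijo rule (dimensions, parameters, and helper names are my own choices, not from the slides):

```python
import numpy as np

def retract(X, eta):                 # QR-based retraction on St(n, k)
    Q, R = np.linalg.qr(X + eta)
    return Q * np.sign(np.diag(R))

def rgrad(A, X):                     # Riemannian gradient of f(X) = trace(X^T A X)
    return 2 * (A @ X - X @ (X.T @ A @ X))

n, k, beta, c = 50, 3, 0.5, 1e-4
A = np.random.randn(n, n); A = (A + A.T) / 2
X, _ = np.linalg.qr(np.random.randn(n, k))
f = lambda X: np.trace(X.T @ A @ X)

for j in range(200):
    eta = -rgrad(A, X)               # steepest descent direction (gradient-related)
    alpha = 1.0
    while f(retract(X, alpha * eta)) - f(X) > -c * alpha * np.sum(eta * eta):
        alpha *= beta                # Armijo backtracking on the manifold
    X = retract(X, alpha * eta)

# f(X) should approach the sum of the k smallest eigenvalues of A.
print(f(X), np.sum(np.sort(np.linalg.eigvalsh(A))[:k]))
```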
Convergence theory in Section 4.3 of [Absil’2008]. We call x∗ ∈ M a critical point of f if grad f(x∗) = 0.
Theorem
Every accumulation point of {xj} is a critical point of cost function f. More can be said if manifold (or at least level set) is compact.
Corollary
Assume that L = {x ∈ M : f(x) ≤ f(x0)} is compact. Then ‖grad f(xj)‖ → 0 as j → ∞. Note that Mk is not compact and it is not clear a priori whether L is compact.
Matrix completion: Given the sampled entries PΩ A of a matrix A, which matrices can we recover? Here
PΩ : R^{m×n} → R^{m×n},  (PΩ X)_{ij} = Xij if (i, j) ∈ Ω, and 0 else.
Applications: image reconstruction, image inpainting, Netflix problem.
Low-rank matrix completion:
min_X rank(X),  X ∈ R^{m×n},  subject to PΩ X = PΩ A.
Low-rank matrix completion (NP-hard):
min_X rank(X),  X ∈ R^{m×n},  subject to PΩ X = PΩ A.
Nuclear norm relaxation (convex, but expensive):
min_X ‖X‖∗ = Σi σi(X),  X ∈ R^{m×n},  subject to PΩ X = PΩ A.
Robust low-rank completion (assume the rank is known):
min_X (1/2) ‖PΩ X − PΩ A‖²_F,  X ∈ R^{m×n},  subject to rank(X) = k.
Huge body of work! Overview: http://perception.csl.illinois.edu/matrix-rank/
minimize f(X) := (1/2) ‖PΩ(X − A)‖²_F  subject to X ∈ Mk := {X ∈ R^{m×n} : rank(X) = k},
where (PΩ X)_{ij} = Xij if (i, j) ∈ Ω and 0 if (i, j) ∉ Ω. The Riemannian gradient is given by
grad f(X) = P_{TX Mk}(PΩ X − PΩ A).
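A dense NumPy toy sketch of this gradient (names are mine; an efficient implementation keeps X in factored form UΣV^T and the residual PΩ(X − A) as a sparse matrix):

```python
import numpy as np

def riemannian_grad(U, S, Vt, A, mask):
    """grad f(X) = P_{T_X M_k}(P_Omega(X - A)) for X = U diag(S) Vt (dense sketch)."""
    X = (U * S) @ Vt
    R = mask * (X - A)                     # Euclidean gradient: residual on the sampled entries
    V = Vt.T
    UtR, RV = U.T @ R, R @ V
    return U @ UtR + RV @ V.T - U @ (UtR @ V) @ V.T   # projection onto the tangent space

m, n, k = 20, 15, 3
A = np.random.randn(m, k) @ np.random.randn(k, n)      # rank-k ground truth
mask = (np.random.rand(m, n) < 0.5).astype(float)      # sampling set Omega
X0 = np.random.randn(m, k) @ np.random.randn(k, n)     # rank-k initial guess
U, S, Vt = np.linalg.svd(X0, full_matrices=False)
Uk, Vk = U[:, :k], Vt[:k, :].T
G = riemannian_grad(Uk, S[:k], Vt[:k, :], A, mask)

tmp = G - Uk @ (Uk.T @ G)                              # P_U^⊥ G
print(np.allclose(tmp - (tmp @ Vk) @ Vk.T, 0))          # P_U^⊥ G P_V^⊥ = 0, so G is tangent
```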
Input: Initial guess X0 ∈ Mk.
η0 ← −grad f(X0)
α0 ← argmin_α f(X0 + αη0)
X1 ← RX0(α0 η0)
for i = 1, 2, ... do
  Compute gradient: ξi ← grad f(Xi)
  Conjugate direction by PR+ updating rule: ηi ← −ξi + βi TXi−1→Xi(ηi−1)
  Initial step size from linearized line search: αi ← argmin_α f(Xi + αηi)
  Armijo backtracking for sufficient decrease: find the smallest integer m ≥ 0 such that
    f(Xi) − f(RXi(2^{−m} αi ηi)) ≥ −10^{−4} ⟨ξi, 2^{−m} αi ηi⟩
  Obtain next iterate: Xi+1 ← RXi(2^{−m} αi ηi)
end for
Cost per iteration: O((m + n)k² + |Ω|k) ops.
The conjugate gradient method requires combining gradients from subsequent iterates:
grad f(X) ∈ TX Mk, grad f(Y) ∈ TY Mk  ⇒  grad f(X) + grad f(Y) ∈ ???
Can be addressed by vector transport:
TX→Y : TX Mk → TY Mk,  TX→Y(ξ) = P_{TY Mk}(ξ).
Can be implemented in O((m + n)k²) ops.
◮ Comparison to LMAFit [Wen/Yin/Zhang'2010], http://lmafit.blogs.rice.edu/.
◮ Oversampling factor OS = |Ω|/(k(2n − k)).
◮ Purely academic example A = AL AR^T with AL, AR = randn.
[Figure: relative residual vs. iteration; convergence curves for k = 40, OS = 3, and n = 1000, 2000, 4000, 8000, 16000, 32000.]
◮ Dashed lines: LMAFit. Solid lines: Nonlinear CG.
◮ time(1 iteration of Nonlinear CG) ≈ 2 × time(1 iteration of LMAFit).
[Figure: relative residual vs. iteration; convergence curves for n = 8000, OS = 3, and k = 10, 20, 30, 40, 50, 60.]
◮ Dashed lines: LMAFit. Solid lines: Nonlinear CG.
◮ time(1 iteration of Nonlinear CG) ≈ 2 × time(1 iteration of LMAFit).
◮ Comparison to LMAFit [Wen/Yin/Zhang'2010], http://lmafit.blogs.rice.edu/.
◮ Oversampling factor OS = |Ω|/(k(2n − k)) = 8.
◮ The 8 000 × 8 000 matrix A is obtained from evaluating f(x, y) = 1/(1 + |x − y|²).
◮ Hom: Start with k = 1 and subsequently increase k, using
previous result as initial guess.
Low-rank tensor completion:
min_X rank(X),  X ∈ R^{n1×n2×...×nd},  subject to PΩ X = PΩ A.
Applications:
◮ completion of multidimensional data, e.g. hyperspectral images, CT scans;
◮ compression of multivariate functions with singularities;
◮ ...
Manifold of tensors of fixed multilinear rank k = (k1, ..., kd):
Mk := {X ∈ R^{n1×...×nd} : X has multilinear rank k},
dim(Mk) = k1 ⋯ kd + Σ_{j=1}^d (nj kj − kj²).
◮ Mk is a smooth manifold. Discussed for more general formats in [Holtz/Rohwedder/Schneider'2012], [Uschmajew/Vandereycken'2012].
◮ Riemannian with the metric induced by the standard inner product ⟨X, Y⟩ = ⟨X(1), Y(1)⟩ (sum of element-wise products).
Manifold structure used in
◮ dynamical low-rank approximation
[Koch/Lubich’2010], [Arnold/Jahnke’2012], [Lubich/Rohwedder/Schneider/Vandereycken’2012], [Khoromskij/Oseledets/Schneider’2012], . . .
◮ best multilinear approximation [Eldén/Savas’2009], [Ishteva/Absil/Van
Huffel/De Lathauwer’2011], [Curtef/Dirr/Helmke’2012]
Every ξ in the tangent space TX Mk at X = C ×1 U ×2 V ×3 W can be written as
ξ = S ×1 U ×2 V ×3 W + C ×1 U⊥ ×2 V ×3 W + C ×1 U ×2 V⊥ ×3 W + C ×1 U ×2 V ×3 W⊥
for some S ∈ R^{k1×k2×k3} and U⊥ ∈ R^{n1×k1} with U⊥^T U = 0, etc.
Again, we obtain the Riemannian gradient of the objective function f(X) := (1/2) ‖PΩ X − PΩ A‖²_F by projecting the Euclidean gradient onto the tangent space:
grad f(X) = P_{TX Mk}(PΩ X − PΩ A).
Candidate for a retraction: the metric projection
RX(ξ) = PX(X + ξ) = arg min_{Z∈Mk} ‖X + ξ − Z‖.
No closed-form solution available.
◮ Replaced by HOSVD truncation.
◮ Seems to work fine.
◮ HOSVD truncation is a retraction [K./Steinlechner/Vandereycken'13].
Input: Initial guess X0 ∈ Mk.
η0 ← −grad f(X0)
α0 ← argmin_α f(X0 + αη0)
X1 ← RX0(α0 η0)
for i = 1, 2, ... do
  Compute gradient: ξi ← grad f(Xi)
  Conjugate direction by PR+ updating rule: ηi ← −ξi + βi TXi−1→Xi(ηi−1)
  Initial step size from linearized line search: αi ← argmin_α f(Xi + αηi)
  Armijo backtracking for sufficient decrease: find the smallest integer m ≥ 0 such that
    f(Xi) − f(RXi(2^{−m} αi ηi)) ≥ −10^{−4} ⟨ξi, 2^{−m} αi ηi⟩
  Obtain next iterate: Xi+1 ← RXi(2^{−m} αi ηi)
end for
Cost per iteration: O(n k^d + |Ω| k^{d−1}) ops.
199 × 199 × 150 tensor from the scaled CT data set "INCISIX" (taken from the OSIRIX MRI/CT data base, www.osirix-viewer.com/datasets/).
[Figure: slice of the original tensor; HOSVD approximation of rank 21; sampled tensor (6.7%); low-rank completion of rank 21.]
Compares very well with existing results w.r.t. low-rank recovery and speed, e.g., [Gandy/Recht/Yamada'2011].
Set of photographs (204 × 268 px) taken across a large range of wavelengths [Foster et al.'2004], stacked into a tensor of size 204 × 268 × 33.
[Figure: 16th slice of the original hyperspectral image tensor (size 204 × 268 × 33) with 10% of the entries, and 16th slice of the completed tensor of final rank k = (50, 50, 6).]
Here: Only 10% of entries known; [Signoretti et al.'2011] use 50%.
Matrix case: O(n · log^β n) samples suffice! [Candès/Tao'2009]
⇒ Completion of a tensor by applying matrix completion to a matricization: O(n² log n) samples. Gives an upper bound!
Tensor case: Certainly |Ω| ≪ O(n²). In all cases of convergence: exact reconstruction. Conjecture: |Ω| = O(n · log^β n).
[Figure: log(smallest |Ω| needed to converge) vs. log(n), with linear fit y = 1.2·x + 4 and reference lines O(n), O(n log n), O(n²).]
[Figure: Noisy completion, n = 100, k = (4, 5, 6): relative residual vs. iteration.]
◮ Random 100 × 100 × 100 tensor of multilinear rank (4, 5, 6)
perturbed by white noise.
◮ Upon convergence reconstruction up to noise level.
◮ Only discussed first-order methods. Fine for well-conditioned
problems but slow convergence for ill-conditioned problems.
◮ Second-order methods (Newton-like) require the Riemannian Hessian: painful to derive, and:
◮ not of much help for well-conditioned problems (low-rank matrix
completion).
◮ linearized equations hard to solve efficiently for low-rank matrix and
tensor manifolds.
◮ Low-rank matrices/tensors can also be treated via quotient manifolds of products of factor matrices. Requires a careful choice of metric to stay robust with respect to small singular values σk [Ngo/Saad'2012], [Kasai/Mishra, ICML '2016].
◮ Lots of open problems concerning convergence analysis of
low-rank Riemannian optimization!