Low-Rank Tensor Techniques for High-Dimensional Problems
Daniel Kressner CADMOS Chair for Numerical Algorithms and HPC MATHICSE, EPFL
Contents
◮ What is a tensor?
◮ Applications
◮ Matrices and low rank
◮ CP and Tucker
◮ Hierarchical Tucker
◮ Algorithms based on low-rank tensors
◮ Conclusions
◮ Vectors, matrices, and tensors
◮ Basic calculus with tensors
◮ Vectorization and matricization
◮ µ-mode matrix products
◮ Two classes of tensor problems
◮ scalar = tensor of order 0
◮ (column) vector = tensor of order 1
◮ matrix = tensor of order 2
◮ tensor of order 3 = n1 × n2 × n3 array of n1n2n3 numbers
A d-th order tensor X of size n1 × n2 × · · · × nd is a d-dimensional array with entries X_{i1,i2,...,id}, iµ ∈ {1, . . . , nµ} for µ = 1, . . . , d. In the following, the entries of X are real (for simplicity): X ∈ R^{n1×n2×···×nd}. Multi-index notation: I = {1, . . . , n1} × {1, . . . , n2} × · · · × {1, . . . , nd}. Then i ∈ I is a tuple of d indices, i = (i1, i2, . . . , id), which allows us to write the entries of X as X_i for i ∈ I.
A matrix has a natural interpretation as a linear operator A : R^n → R^m, x ↦ A · x. There is no such (unique and natural) interpretation for tensors! ⇒ fundamental difficulty in defining a meaningful general notion of eigenvalues and singular values of tensors.
Curse of dimensionality. Example: A tensor of order d = 30 with n1 = n2 = · · · = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!¹ For d ≫ 1: cannot afford to store the tensor explicitly (in terms of its entries).
¹ Global data storage calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.
◮ Addition of two equal-sized tensors X, Y:
  Z = X + Y ⇔ Z_i = X_i + Y_i ∀i ∈ I.
◮ Multiplication by a scalar α ∈ R:
  Z = αX ⇔ Z_i = αX_i ∀i ∈ I.
⇒ vector space structure.
◮ Inner product of two equal-sized tensors X, Y:
  ⟨X, Y⟩ := Σ_{i∈I} X_i Y_i.  Induced norm: ‖X‖ := (Σ_{i∈I} X_i²)^{1/2}.
For a 2nd-order tensor (= matrix) this corresponds to the Frobenius norm.
A tensor X of size n1 × n2 × · · · × nd has n1 · n2 · · · nd entries ⇒ many ways to stack the entries in a (loooong) column vector. One possible choice: the vectorization of X is denoted by vec(X), where vec : R^{n1×n2×···×nd} → R^{n1·n2···nd} stacks the entries of a tensor in reverse lexicographical order into a long column vector.
Remark: For d = 2, this is the usual way matrices are vectorized:
  A = [a11 a12; a21 a22; a31 a32]  ⇒  vec(A) = (a11, a21, a31, a12, a22, a32)ᵀ.
Example: d = 3, n1 = 3, n2 = 2, n3 = 3:
  vec(X) = (x111, x211, x311, x121, x221, x321, x112, . . . , x123, x223, x323)ᵀ.
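In MATLAB, this vectorization coincides with the built-in column-major stacking; a minimal sketch:

  % vec(X) is MATLAB's colon operation (equivalently reshape(X, [], 1)):
  X = reshape(1:18, [3 2 3]);   % a small 3 x 2 x 3 tensor
  v = X(:);                     % vec(X) as defined above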
◮ A matrix has two modes (column mode and row mode).
◮ A dth-order tensor X has d modes (µ = 1, µ = 2, . . . , µ = d).
Let us fix all but one mode, e.g., µ = 1: then X(:, i2, i3, . . . , id) (abusing MATLAB notation) is a vector of length n1 for each choice of i2, . . . , id. ⇒ View the tensor X as a bunch of column vectors.
Stack these vectors into an n1 × (n2 · · · nd) matrix:
  X ∈ R^{n1×n2×···×nd}  ⇒  X^(1) ∈ R^{n1×(n2n3···nd)}.
For µ = 1, . . . , d, the µ-mode matricization of X is the matrix X^(µ) ∈ R^{nµ×(n1···nµ−1nµ+1···nd)} with entries
  X^(µ)_{iµ,(i1,...,iµ−1,iµ+1,...,id)} = X_i ∀i ∈ I.
In MATLAB: a = rand(2,3,4,5);
◮ 1-mode matricization:
reshape(a,2,3*4*5)
◮ 2-mode matricization:
b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
◮ 3-mode matricization:
b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
◮ 4-mode matricization:
b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)
For a matrix A ∈ R^{n1×n2}: A^(1) = A, A^(2) = Aᵀ.
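The pattern above generalizes to any mode; a minimal helper (a sketch, not part of MATLAB or the Tensor Toolbox) could look as follows:

  function M = matricize(x, mu)
  % mu-mode matricization via permute/reshape, following the pattern above
  n = size(x); d = ndims(x);
  M = reshape(permute(x, [mu, 1:mu-1, mu+1:d]), n(mu), prod(n)/n(mu));
  end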
Consider the 1-mode matricization X^(1) ∈ R^{n1×(n2···nd)}: it seems to make sense to multiply an m × n1 matrix A from the left:
  Y^(1) := A X^(1) ∈ R^{m×(n2···nd)}.
Can rearrange Y^(1) back into an m × n2 × · · · × nd tensor Y. This is called the 1-mode matrix multiplication:
  Y = A ◦1 X  ⇔  Y^(1) = A X^(1).
More formally (and more ugly):
  Y_{i1,i2,...,id} = Σ_{k=1}^{n1} a_{i1,k} X_{k,i2,...,id}.
General definition of the µ-mode matrix product with A ∈ R^{m×nµ}:
  Y = A ◦µ X  ⇔  Y^(µ) = A X^(µ).
More formally (and more ugly):
  Y_{i1,...,id} = Σ_{k=1}^{nµ} a_{iµ,k} X_{i1,...,iµ−1,k,iµ+1,...,id}.
For matrices:
◮ 1-mode multiplication = multiplication from the left:
  Y = A ◦1 X = A X.
◮ 2-mode multiplication = transposed multiplication from the right:
  Y = A ◦2 X = X Aᵀ.
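A sketch of the µ-mode matrix product in plain MATLAB (a hypothetical helper; the Tensor Toolbox provides this operation as ttm):

  function y = modemult(A, x, mu)
  % y = A applied in mode mu of x: matricize, multiply from the left, fold back
  n = size(x); d = ndims(x); p = [mu, 1:mu-1, mu+1:d];
  y = A * reshape(permute(x, p), n(mu), prod(n)/n(mu));
  n(mu) = size(A, 1);
  y = ipermute(reshape(y, n(p)), p);
  end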
For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as
  B ⊗ A := [b11·A · · · b1ℓ·A; . . . ; bk1·A · · · bkℓ·A] ∈ R^{km×ℓn}.
Most important properties (for our purposes):
◮ (B ⊗ A)(D ⊗ C) = (BD) ⊗ (AC),
◮ (B ⊗ A)ᵀ = Bᵀ ⊗ Aᵀ,
◮ vec(A X Bᵀ) = (B ⊗ A) vec(X).
By definition, vec(X) = vec(X^(1)). Consequently, also vec(A ◦1 X) = vec(A X^(1)).
Vectorized version of the 1-mode matrix product:
  vec(A ◦1 X) = (I_{n2···nd} ⊗ A) vec(X) = (I_{nd} ⊗ · · · ⊗ I_{n2} ⊗ A) vec(X).
Relation between the µ-mode matrix product and a matrix-vector product:
  vec(A ◦µ X) = (I_{nd} ⊗ · · · ⊗ I_{nµ+1} ⊗ A ⊗ I_{nµ−1} ⊗ · · · ⊗ I_{n1}) vec(X).
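A quick numerical sanity check of the last identity for µ = 2 (assuming the modemult sketch above):

  x = randn(3,4,5); A = randn(6,4);
  y1 = reshape(modemult(A, x, 2), [], 1);
  y2 = kron(eye(5), kron(A, eye(3))) * x(:);
  norm(y1 - y2)    % should be of the order of machine precision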
Class 1: function-related tensors
Consider a function u(ξ1, . . . , ξd) ∈ R in d variables ξ1, . . . , ξd. A tensor U ∈ R^{n1×···×nd} represents a discretization of u:
◮ U contains function values of u evaluated on a grid; or
◮ U contains coefficients of a truncated expansion in tensorized basis functions:
  u(ξ1, . . . , ξd) ≈ Σ_{i∈I} U_i φ_{i1}(ξ1) φ_{i2}(ξ2) · · · φ_{id}(ξd).
Typical setting:
◮ U only given implicitly, e.g., as the solution of a discretized PDE;
◮ seek approximations to U with very low storage and tolerable accuracy;
◮ d may become very large.
Focus of this lecture is on function-related tensors!
Discretization of a function in d variables ξ1, . . . , ξd ∈ [0, 1]: the number of function values grows exponentially with d.
Ideal situation: the function f is separable: f(ξ1, ξ2, . . . , ξd) = f1(ξ1) f2(ξ2) · · · fd(ξd).
⇒ the discretized f is the Kronecker product of the discretized factors fj:
  O(n^d) memory reduces to O(dn) memory.
Of course: exact separability is rarely satisfied in practice.
Class 2: data-related tensors
Tensor U ∈ R^{n1×···×nd} contains multi-dimensional data.
Example 1: U_{2011,3,2} denotes the number of papers published in 2011 by author 3 in mathematical journal 2.
Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.
Typical setting:
◮ entries of U given explicitly (at least partially). ◮ extraction of dominant features from U. ◮ usually moderate values for d.
◮ Tensor X ∈ R^{n1×···×nd} is a d-dimensional array.
◮ Various ways of reshaping the entries of a tensor X into a vector or matrix.
◮ µ-mode matrix multiplication can be expressed with Kronecker products.
Further reading:
◮ T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
Software:
◮ MATLAB offers basic functionality to work with d-dimensional arrays.
◮ MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
◮ High-dimensional elliptic PDEs
◮ High-dimensional PDE-eigenvalue problems
◮ Quantum many-body problems
◮ Stochastic Automata Networks
◮ Further applications
◮ Consider
  −∆u = f in Ω = [0, 1]³, u|∂Ω = 0.
◮ Discretize on a tensor grid. Uniform grid for simplicity:
  ξ^(j)_µ = jh, h = 1/(n + 1), for µ = 1, 2, 3.
◮ Approximate solution tensor U ∈ R^{n×n×n}:
  U_{i1,i2,i3} ≈ u(ξ^(i1)_1, ξ^(i2)_2, ξ^(i3)_3).
◮ Discretization of the 1D Laplace operator:
  −∂xx ≈ tridiag(−1, 2, −1) =: A.
◮ Application in each coordinate direction:
  −∂ξ1ξ1 u(ξ1, ξ2, ξ3) ≈ A ◦1 U,  −∂ξ2ξ2 u(ξ1, ξ2, ξ3) ≈ A ◦2 U,  −∂ξ3ξ3 u(ξ1, ξ2, ξ3) ≈ A ◦3 U.
◮ Hence,
  −∆u ≈ A ◦1 U + A ◦2 U + A ◦3 U,
or, in vectorized form with u = vec(U),
  −∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.
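For illustration, the 3D Laplacian in Kronecker form can be assembled directly (a sketch; the 1/h² scaling is omitted, as on the slide):

  n = 10; e = ones(n,1);
  A = spdiags([-e 2*e -e], -1:1, n, n);   % 1D stencil tridiag(-1,2,-1)
  I = speye(n);
  L = kron(I, kron(I, A)) + kron(I, kron(A, I)) + kron(A, kron(I, I));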
Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0, for Ω = [0, 1]³ takes the form
  (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u = f.
Similar structure for finite element discretization with tensorized FEs:
  V ⊗ W ⊗ Z = { Σ_{i,j,k} α_{ijk} v_i(ξ1) w_j(ξ2) z_k(ξ3) : α_{ijk} ∈ R },
  V = span{v1(ξ1), . . . , vn(ξ1)}, W = span{w1(ξ2), . . . , wn(ξ2)}, Z = span{z1(ξ3), . . . , zn(ξ3)}.
Galerkin discretization:
  (K_V ⊗ M_W ⊗ M_Z + M_V ⊗ K_W ⊗ M_Z + M_V ⊗ M_W ⊗ K_Z) u = f,
with 1D mass/stiffness matrices M_V, M_W, M_Z, K_V, K_W, K_Z.
Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0, for Ω = [0, 1]^d takes the form
  ( Σ_{µ=1}^d I ⊗ · · · ⊗ I ⊗ A ⊗ I ⊗ · · · ⊗ I ) u = f,  with A in the µ-th factor.
To obtain such Kronecker structure in general, one needs:
◮ a tensorized domain;
◮ a highly structured grid;
◮ coefficients that can be written/approximated as a sum of separable functions.
PDE-eigenvalue problem:
  ∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d,  u(ξ) = 0 on ∂Ω.
Assumption: Potential represented as
  V(ξ) = Σ_{j=1}^s V_j^(1)(ξ1) V_j^(2)(ξ2) · · · V_j^(d)(ξd).
Finite difference discretization:
  A u = (A_L + A_V) u = λ u,
with
  A_L = Σ_{µ=1}^d I ⊗ · · · ⊗ I ⊗ A_L ⊗ I ⊗ · · · ⊗ I  (A_L in the µ-th factor),
  A_V = Σ_{j=1}^s A_{V,j}^(d) ⊗ · · · ⊗ A_{V,j}^(2) ⊗ A_{V,j}^(1).
◮ spin-1/2 particles: proton, neutron, electron, and quark
◮ two states: spin-up, spin-down
◮ quantum state of each spin represented by a vector in C² (spinor)
◮ quantum state of a system of d spins represented by a vector in C^(2^d)
◮ quantum mechanical operators expressed in terms of the Pauli matrices
  Px = [0 1; 1 0],  Py = [0 −i; i 0],  Pz = [1 0; 0 −1]
◮ spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities; each term describes a physical (inter)action of spins
◮ interaction of spins described by a graph
◮ Goal: Compute the ground state of the spin Hamiltonian.
Example: 1d chain of 5 spins with periodic boundary conditions
Hamiltonian describing pairwise interaction between nearest neighbors: H = Pz ⊗ Pz ⊗ I ⊗ I ⊗ I + I ⊗ Pz ⊗ Pz ⊗ I ⊗ I + I ⊗ I ⊗ Pz ⊗ Pz ⊗ I + I ⊗ I ⊗ I ⊗ Pz ⊗ Pz + Pz ⊗ I ⊗ I ⊗ I ⊗ Pz
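A sketch of how such a Hamiltonian can be assembled in MATLAB (d = 5 spins as above):

  Pz = sparse([1 0; 0 -1]); I2 = speye(2);
  d = 5; H = sparse(2^d, 2^d);
  for p = 1:d                                % one Pz-Pz coupling per edge of the cycle
      ops = repmat({I2}, 1, d);
      ops{p} = Pz; ops{mod(p, d) + 1} = Pz;  % periodic boundary: spin d couples to spin 1
      term = 1;
      for k = 1:d, term = kron(term, ops{k}); end
      H = H + term;
  end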
◮ Ising (ZZ) model for a 1d chain of d spins with open boundary conditions:
  H = Σ_{p=1}^{d−1} I ⊗ · · · ⊗ I ⊗ Pz ⊗ Pz ⊗ I ⊗ · · · ⊗ I
      + λ Σ_{p=1}^{d} I ⊗ · · · ⊗ I ⊗ Px ⊗ I ⊗ · · · ⊗ I,
  where λ = ratio between the strength of the magnetic field and the pairwise interactions.
◮ 1d Heisenberg (XY) model
◮ Current research: 2d models.
◮ More details in:
  Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
  Schollwöck: The density-matrix renormalization group in the age of matrix product states.
◮ 3 stochastic automata A1, A2, A3, each having 3 states.
◮ Vector x_t^(i) ∈ R³ describes the probabilities of states (1), (2), (3) in A_i at time t.
◮ No coupling between automata ⇒ local transition x_t^(i) → x_{t+1}^(i) described by a Markov chain:
  x_{t+1}^(i) = E_i x_t^(i),
  with a stochastic matrix E_i.
◮ Stationary distribution of A_i = Perron vector of E_i (eigenvector for eigenvalue 1).
◮ 3 stochastic automata A1, A2, A3, each having 3 states.
◮ Coupling between automata ⇒ local transition x_t^(i) → x_{t+1}^(i) no longer described by a Markov chain.
◮ Need to consider all possible combinations of states in (A1, A2, A3):
  (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2), . . .
◮ Vector x_t ∈ R^{3³} (or tensor X(t) ∈ R^{3×3×3}) describes the probabilities of these combined states.
◮ Transition x_t → x_{t+1} described by a Markov chain:
  x_{t+1} = E x_t, with a large stochastic matrix E.
◮ Oversimplified example:
  E = I ⊗ I ⊗ E1 + I ⊗ E2 ⊗ I + E3 ⊗ I ⊗ I + I ⊗ E21 ⊗ E12 + E32 ⊗ E23 ⊗ I.
◮ Goal: Compute the stationary distribution = Perron vector of E.
◮ More details in:
  Stewart: Introduction to the Numerical Solution of Markov Chains.
  Buchholz: Product Form Approximations for Communicating Markov Processes.
Other applications in scientific computing featuring low-rank tensor concepts:
◮ Boltzmann equation [Ibragimov/Rjasanow’2009].
◮ Dynamical systems [Koch/Lubich’2009].
◮ Parabolic PDEs [Andreev/Tobler’2011], [Khoromskij’2009].
◮ Stochastic PDEs [Khoromskij/Schwab’2010], [Matthies/Zander’2011], [Kressner/Tobler’2011], [Ballani/Grasedyck/Kluge’2011], . . .
◮ Electronic structure calculation [Chinnamsetty et al.’2007], [Flad et al.’2009], [Khoromskij/Khoromskaja’2009], [Limpanuparb/Gill’2009], [Benedikt et al.’2011], [Mohlenkamp’2011], . . .
◮ Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit’2011].
◮ . . .
◮ Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structure.
◮ For many problems of practical interest: explicit storage / computation of the solution is infeasible.
◮ Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
◮ Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, . . .). This has advantages and disadvantages.
◮ Singular value decomposition
◮ Separability and low rank
◮ Separability by polynomial interpolation
◮ Separability by exponential sums
◮ Low rank of snapshot matrices
Setting: Matrix X ∈ R^{n×m}, with m and n too large to compute/store X explicitly.
Idea: Replace X by R Sᵀ with R ∈ R^{n×r}, S ∈ R^{m×r} and r ≪ m, n.
  Memory: nm for X vs. nr + rm for R Sᵀ.
Best achievable error:
  min { ‖X − Y‖₂ : rank(Y) ≤ k } = σ_{k+1},
with the singular values σ1 ≥ σ2 ≥ · · · ≥ σ_{min{m,n}} of X.
SVD: Let X ∈ R^{n×m} and k = min{m, n}. Then there exist orthonormal matrices U = [u1, . . . , uk] and V = [v1, . . . , vk] such that
  X = U Σ Vᵀ,  Σ = diag(σ1, σ2, . . . , σk).
Choose r ≤ k and partition
  X = [U1, U2] · diag(Σ1, Σ2) · [V1, V2]ᵀ = U1 Σ1 V1ᵀ + U2 Σ2 V2ᵀ,  with R := U1 Σ1, S := V1.
Then ‖X − R Sᵀ‖₂ = ‖Σ2‖₂ = σ_{r+1}.
⇒ Good low-rank approximation if the singular values decay sufficiently fast.
Also: span(X) ≈ span(R), span(Xᵀ) ≈ span(Sᵀ).
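In MATLAB, the truncation step is a few lines (a minimal sketch):

  X = gallery('lotkin', 200);        % any matrix with decaying singular values
  r = 10;
  [U, S, V] = svd(X);
  R = U(:,1:r) * S(1:r,1:r); Sf = V(:,1:r);
  norm(X - R*Sf')                    % equals sigma_{r+1}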
◮ Bivariate function f(x, y).
◮ Function values on a tensor grid [x1, . . . , xm] × [y1, . . . , yn]:
  F = [ f(xi, yj) ]_{i=1,...,m; j=1,...,n} ∈ R^{m×n}.
Basic but crucial observation:
  f(x, y) = g(x) h(y)  ⇒  F = [g(x1); . . . ; g(xm)] · [h(y1), . . . , h(yn)].
⇒ Separability implies rank 1.
Approximation by a sum of separable functions:
  f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) + error.
Define Fr = [ fr(xi, yj) ]_{i,j} with fr(x, y) := g1(x) h1(y) + · · · + gr(x) hr(y).
Then Fr has rank ≤ r and
  ‖F − Fr‖_F ≤ √(mn) × error.
⇒ Semi-separable approximation implies low-rank approximation.
Solution of the approximation problem
  f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) + error
is not trivial; the gj, hj can be chosen arbitrarily!
General construction by polynomial interpolation:
  I_y[f](x, y) = Σ_{j=1}^r f(x, θj) Lj(y)
with Lagrange polynomials Lj of degree r − 1 on [xmin, xmax].
  I_x[I_y[f]](x, y) = Σ_{i=1}^r Σ_{j=1}^r f(ξi, θj) Li(x) Lj(y) ≙ Σ_{j=1}^r L̃_{j,x}(x) L̃_{j,y}(y),
where the matrix [f(ξi, θj)]_{i,j} is “diagonalized” by an SVD.
error ≤ ‖f − I_x[I_y[f]]‖∞ = ‖f − I_x[f] + I_x[f] − I_x[I_y[f]]‖∞ ≤ ‖f − I_x[f]‖∞ + ‖I_x‖∞ ‖f − I_y[f]‖∞,
with Lebesgue constant ‖I_x‖∞ ∼ log r when using Chebyshev interpolation nodes. The polynomial interpolation error bound is typically much too pessimistic, but:
◮ Lebesgue constants hit hard in high dimensions: (log r)^{d−1}.
◮ Severe theoretical barriers for general smooth multivariate functions:
  Novak/Woźniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.
Consider f(x, y) = 1/(x + y), x, y ∈ [α, β], 0 < α < β. Apply numerical quadrature to
  1/z = ∫₀^∞ e^{−tz} dt ≈ Σ_{j=1}^r ωj e^{−γj z} + error.
Inserting z = x + y:
  1/(x + y) = Σ_{j=1}^r ωj e^{−γj(x+y)} + error = Σ_{j=1}^r ωj e^{−γj x} e^{−γj y} + error.
Choice of nodes γj > 0 and weights ωj > 0 as in [Stenger’93, Braess’86, Braess/Hackbusch’05]:
  error ≤ (8/|α|) exp(−rπ² / log(8β/α)).
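The resulting singular value decay is easy to observe numerically (a quick sketch):

  x = linspace(0.1, 1, 200)'; y = x;     % alpha = 0.1, beta = 1
  F = 1 ./ (x + y');                     % F_ij = 1/(x_i + y_j)
  semilogy(svd(F), 'o')                  % near-straight line: exponential decay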
◮ Consider the more general case of a function f(x, y) := g(x + y).
◮ Approximation of g(z) with z := x + y by an exponential sum
  g(z) ≈ Σ_{j=1}^r ωj exp(γj z)   (1)
for some coefficients γj, ωj ∈ R.
◮ (1) gives a semi-separable approximation for f:
  f(x, y) = g(x + y) ≈ Σ_{j=1}^r ωj exp(γj(x + y)) = Σ_{j=1}^r ωj exp(γj x) exp(γj y).
◮ Naturally extends to arbitrarily many variables.
◮ Problem: (1) is a nontrivial approximation problem [Braess’1986], [Hackbusch’2006], . . .
Vector-valued function x(α) : [αmin, αmax] → R^n. Sampling at α1, . . . , αm ∈ [αmin, αmax] gives the snapshot matrix
  X = [x(α1), x(α2), . . . , x(αm)].
Example: stationary heat equation with piecewise constant heat conductivity σ(x, α):
  −∇ · (σ(x, α)∇u) = f in Ω = [−1, 1]², u = 0 on ∂Ω,
◮ σ(baking tray) = 1
◮ σ(cookie) = 1 + α
◮ undetermined parameter α ∈ [αmin, αmax].
[Figure: FE mesh — # vertices: 455, # elements: 825, # edges: 1279]
Standard FE discretization results in the linearly parameter-dependent linear system
  (A0 + αA1) x(α) = b.
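A hedged sketch of the snapshot experiment; A0, A1, b below are stand-ins for the assembled FE matrices, which are not given here:

  n = 371; m = 101;
  A0 = gallery('tridiag', n);            % placeholder for the stiffness part
  A1 = spdiags(rand(n,1), 0, n, n);      % placeholder for the cookie contribution
  b = ones(n,1);
  alpha = linspace(0, 10, m);
  X = zeros(n, m);
  for j = 1:m
      X(:,j) = (A0 + alpha(j)*A1) \ b;   % one snapshot per parameter value
  end
  semilogy(svd(X), 'o')                  % rapid singular value decay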
◮ 1 Cookie: n = 371, m = 101.
[Plot: log10(singular values of the snapshot matrix) — rapid decay]
◮ Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.
Polynomial approximation: x(α) = x0 + αx1 + α²x2 + · · · + α^{k−1}x_{k−1} + error.
Approximation error:
◮ Assume b(·), A(·) analytic ⇒ x(·) analytic.
◮ Then error ≲ ρ^{−k}, where ρ > 1 depends on the domain of analyticity of A, b.
(Proof: direct extension of the classical result for scalar-valued functions.)
Polynomial approximation: x(α) = x0 + αx1 + α²x2 + · · · + α^{k−1}x_{k−1} + error.
Snapshot matrix:
  X = [x0, x1, . . . , x_{k−1}] · [1 1 · · · 1; α1 α2 · · · αm; . . . ; α1^{k−1} α2^{k−1} · · · αm^{k−1}] + error
    = matrix of rank k + error
⇒ σ_{k+1}(X) ≲ ρ^{−k}.
Remark: Trivially extends to the piecewise analytic case.
Example: Consider the smallest singular value σ(z) and corresponding right singular vector v(z) of B(z) = A − izI for z ∈ [−1, 1].
◮ σ(z) only Lipschitz continuous, but piecewise analytic.
◮ v(z) discontinuous, but piecewise analytic.
◮ A = 2 × 2 block diagonal randn, n = 400.
◮ Snapshot matrix of singular vectors: X = [v(z1), . . . , v(zm)].
[Plots: σ(z) over z ∈ [−1, 1]; singular values of X on a log scale]
Need strong singular value decay for good low-rank approximations. For function-related matrices/tensors: strong link to semi-separable approximations. Smoothness seems to be important... at least somehow.
◮ Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
◮ Unfortunately, smoothness is not sufficient for higher-order tensors.
◮ Need to impose stronger regularity as the dimension/order d increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].
◮ CP
◮ Tucker
◮ Higher-order SVD
◮ Tensor networks
◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ One possibility, motivated by the matrix case
  X = a1 b1ᵀ + a2 b2ᵀ + · · · + aR bRᵀ
⇒ vectorization
  vec(X) = b1 ⊗ a1 + b2 ⊗ a2 + · · · + bR ⊗ aR.
Canonical Polyadic (CP) decomposition of a tensor X ∈ R^{n1×n2×n3} defined via
  vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR
for vectors aj ∈ R^{n1}, bj ∈ R^{n2}, cj ∈ R^{n3}.
CP directly corresponds to semi-separable approximation.
Tensor rank of X = minimal possible R.
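A minimal sketch: assembling a third-order CP tensor from its factor vectors:

  n = [4 5 6]; R = 3;
  A = randn(n(1), R); B = randn(n(2), R); C = randn(n(3), R);
  x = zeros(prod(n), 1);
  for j = 1:R
      x = x + kron(C(:,j), kron(B(:,j), A(:,j)));   % c_j ⊗ b_j ⊗ a_j
  end
  X = reshape(x, n);       % the CP tensor as a 4 x 5 x 6 array, tensor rank <= R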
Illustration of the CP decomposition
  vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR.
[Figure: X as a sum of R rank-one tensors with factors aj, bj, cj]
◮ CP decomposition offers low data complexity; for constant R: linear complexity in d.
◮ For matrices:
  ◮ rank r is upper semi-continuous ⇒ closedness property: a sequence of rank-r matrices can only converge to a matrix of rank ≤ r;
  ◮ best low-rank approximation possible by successive rank-1 approximations;
  ◮ robust black-box algorithms/software available (svd, Lanczos).
For tensors of order d ≥ 3:
◮ tensor rank R is not upper semi-continuous ⇒ lack of closedness;
◮ successive rank-1 approximations fail;
◮ all algorithms based on optimization techniques (ALS, Gauss-Newton).
Picture taken from [Kolda/Bader’2009].
◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ Alternative possibility, motivated by the SVD
  A = U · Σ · Vᵀ, U ∈ R^{n1×r}, V ∈ R^{n2×r}, Σ ∈ R^{r×r}
⇒ vectorization
  vec(A) = (V ⊗ U) vec(Σ).
Ignore the diagonal structure of Σ and call it C.
Tucker decomposition of a tensor X ∈ R^{n1×n2×n3} defined via
  vec(X) = (W ⊗ V ⊗ U) vec(C),
with U ∈ R^{n1×r1}, V ∈ R^{n2×r2}, W ∈ R^{n3×r3}, and core tensor C ∈ R^{r1×r2×r3}.
In terms of µ-mode matrix products: X = U ◦1 V ◦2 W ◦3 C =: (U, V, W) ◦ C.
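A minimal sketch: forming a Tucker tensor from its core and factors:

  n = [4 5 6]; r = [2 3 2];
  U = randn(n(1), r(1)); V = randn(n(2), r(2)); W = randn(n(3), r(3));
  C = randn(r);
  X = reshape(kron(W, kron(V, U)) * C(:), n);   % vec(X) = (W ⊗ V ⊗ U) vec(C)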
Illustration of the Tucker decomposition X = (U, V, W) ◦ C.
[Figure: core tensor C multiplied by the factor matrices U, V, W in the three modes]
Consider all three matricizations:
  X^(1) = U · C^(1) · (W ⊗ V)ᵀ,  X^(2) = V · C^(2) · (W ⊗ U)ᵀ,  X^(3) = W · C^(3) · (V ⊗ U)ᵀ.
These are low-rank decompositions:
  rank(X^(1)) ≤ r1,  rank(X^(2)) ≤ r2,  rank(X^(3)) ≤ r3.
Multilinear rank of a tensor X ∈ R^{n1×n2×n3} defined by the tuple (r1, r2, r3), with rµ = rank(X^(µ)).
Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).
1. Compute SVDs X^(µ) = Uµ Σµ Vµᵀ for µ = 1, 2, 3.
2. Truncate: Ũµ := Uµ(:, 1 : rµ) for µ = 1, 2, 3.
3. Form the core tensor: vec(C) := (Ũ3 ⊗ Ũ2 ⊗ Ũ1)ᵀ vec(X).
Truncated tensor produced by the HOSVD [De Lathauwer/De Moor/Vandewalle’2000]:
  vec(X̃) := (Ũ3 ⊗ Ũ2 ⊗ Ũ1) vec(C).
Remark: X̃ is an orthogonal projection of X: X̃ = (Ũ1Ũ1ᵀ) ◦1 (Ũ2Ũ2ᵀ) ◦2 (Ũ3Ũ3ᵀ) ◦3 X.
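A hedged sketch of the truncated HOSVD for d = 3, using the matricize and modemult helpers sketched earlier:

  function [C, U] = hosvd3(X, r)
  % Truncated HOSVD: dominant subspace per mode, then projected core
  U = cell(1, 3);
  for mu = 1:3
      [Umu, ~] = svd(matricize(X, mu), 'econ');
      U{mu} = Umu(:, 1:r(mu));          % r(mu) dominant left singular vectors
  end
  C = X;
  for mu = 1:3
      C = modemult(U{mu}', C, mu);      % core C = (U1', U2', U3') ∘ X
  end
  end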
The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition
  ‖X − X̃‖ ≤ √d ‖X − X_best‖,
where X_best is the best approximation of X with multilinear ranks (r1, . . . , rd).
Proof (d = 3):
  ‖X − X̃‖² = ‖X − (π1 ◦ π2 ◦ π3)X‖²
            = ‖X − π1X‖² + ‖π1X − (π1 ◦ π2)X‖² + ‖(π1 ◦ π2)X − (π1 ◦ π2 ◦ π3)X‖²
            ≤ ‖X − π1X‖² + ‖X − π2X‖² + ‖X − π3X‖².
Using ‖X − πµX‖ ≤ ‖X − X_best‖ for µ = 1, 2, 3 leads to ‖X − X̃‖² ≤ 3 · ‖X − X_best‖².
Best approximation: see [Kolda/Bader’09].
For general tensors:
◮ multilinear rank r is upper semi-continuous ⇒ closedness property;
◮ HOSVD: simple and robust algorithm to obtain a quasi-optimal low-rank approximation;
◮ quasi-optimality good enough for most applications in scientific computing;
◮ robust black-box algorithms/software available (e.g., Tensor Toolbox).
But: storage of the core tensor ∼ r^d ⇒ curse of dimensionality.
Tensor network = undirected graph with:
◮ each node is a tensor;
◮ each outgoing (free) edge is a mode;
◮ each connecting edge represents a contraction; example:
  Z_{i1,i2,i3,i4} = Σ_{j=1}^r X_{i1,i2,j} Y_{j,i3,i4}.
[Figure: two nodes X and Y joined by one contracted edge, with free edges for the remaining modes]
◮ number of free edges = order of the tensor represented by the entire network.
Researchers on quantum many-body problems think² in terms of tensor networks!
² and dream
Examples: [Figure: five tensor networks] (i) vector; (ii) matrix; (iii) matrix-matrix multiplication; (iv) Tucker decomposition; (v) hierarchical Tucker decomposition.
◮ Intro of the Hierarchical Tucker Decomposition (HTD)
◮ MATLAB toolbox htucker
◮ Basic operations: µ-mode matrix multiplication, addition, . . .
◮ Advanced operations: inner product, elementwise multiplication, . . .
◮ CP offers low data complexity but difficult truncation;
◮ Tucker offers simple truncation but high data complexity.
Recently developed formats:
◮ Matrix Product States (MPS),
◮ TT decomposition,
◮ Hierarchical Tucker decomposition (HTD).
These aim to offer a compromise between CP and Tucker. Focus in this lecture: HTD.
◮ L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
◮ W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
◮ D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.
Recall: µ-mode matricization of a tensor X: X^(µ) ∈ R^{nµ×(n1···nµ−1nµ+1···nd)}, µ = 1, . . . , d.
It is getting ugly... General matricization for a mode decomposition {1, . . . , d} = t ∪ s, t = {t1, . . . , tk}, s = {s1, . . . , s_{d−k}}:
  X^(t) ∈ R^{(n_{t1}···n_{tk})×(n_{s1}···n_{s_{d−k}})} with X^(t)_{(i_{t1},...,i_{tk}),(i_{s1},...,i_{s_{d−k}})} := X_{i1,...,id}.
[Figure: matricizations X^(1) and X^(1,2) of a tensor X]
Singular value decomposition: X^(t) = Ut Σt Usᵀ.
Column spaces are nested: t = t1 ∪ t2
  ⇒ span(Ut) ⊂ span(Ut2 ⊗ Ut1)
  ⇒ ∃ Bt : Ut = (Ut2 ⊗ Ut1) Bt.
Size of Ut: Ut ∈ R^{n_{t1}···n_{tk}×rt} with rt = rank(X^(t)).
For d = 4:
  U12 = (U2 ⊗ U1) B12,  U34 = (U4 ⊗ U3) B34,
  vec(X) = X^(1234) = (U34 ⊗ U12) B1234
  ⇒ vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.
Tree structure for d = 4:
[Figure: binary dimension tree with leaves U1 (n1 × r1), U2 (n2 × r2), U3 (n3 × r3), U4 (n4 × r4), transfer matrices B12 (r1r2 × r12), B34 (r3r4 × r34), and root B1234 (r12r34 × 1)]
Reshape the transfer matrices into transfer tensors:
  B12 ∈ R^{r1r2×r12} ⇒ B12 ∈ R^{r1×r2×r12},
  B34 ∈ R^{r3r4×r34} ⇒ B34 ∈ R^{r3×r4×r34},
  B1234 ∈ R^{r12r34×1} ⇒ B1234 ∈ R^{r12×r34}.
[Figure: dimension tree with nodes U1, U2, U3, U4, B12, B34, B1234]
◮ Often, U1, U2, U3, U4 are orthonormal. This is advantageous but not required.
◮ Storage requirements for general d:
  O(dnr) + O(dr³), where r = max{rt}, n = max{nµ}.
x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7, with a balanced dimension tree.
x = htensor([4 5 6 7], 'TT') constructs a zero htensor with a degenerate (linear, TT-style) dimension tree.
x = htensor({U1, U2, U3}) constructs an htensor from a tensor in CP decomposition, X(i1, i2, i3) = Σ_j U1(i1, j) U2(i2, j) U3(i3, j).
x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with random ranks and random entries.
x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with all entries one.
...
Example: x is an htensor of order 4.
x(1, 3, 4, 2) returns an entry of X.
x(1, 3, :, :) returns a slice of X as an htensor.
full(x) returns the full tensor represented by x (use with care).
disp_tree(htenrandn([5 4 6 3])) displays the tree structure/ranks:
ans is an htensor of size 5 x 4 x 6 x 3
[tree listing of the nodes with their dimensions and ranks]
spy(x) displays spy plots of Ut, Bt, on the dimension tree. change_root(x, i) switches root node.
plot_sv(x) plots the singular values of the corresponding matricizations in the dimension tree of a tensor X.
Example: singular value tree of the solution to an elliptic PDE with 4 parameters.
Remark: Singular values are computed from Gramians.
Application of a matrix A ∈ R^{m×nµ} to mode µ of X ∈ R^{n1×···×nd}:
  Y = A ◦µ X ⇔ Y^(µ) = A X^(µ).
Nearly trivial if X is in H-Tucker format:
  A ◦µ X = A ◦µ ((U1, . . . , Ud) ◦ C) = (U1, . . . , Uµ−1, A Uµ, Uµ+1, . . . , Ud) ◦ C.
◮ Almost no operations required.
◮ Ranks stay the same.
◮ Orthogonality destroyed.
ttm(x, A, 2) applies matrix A to htensor x in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4]) applies A, B, C in modes 2, 3, 4.
y = ttm(x, @(x)(fft(x)), 2) applies an FFT in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4], 'h') successively applies the matrices Aᵀ, Bᵀ, Cᵀ in modes 2, 3, 4.
Addition of two matrices in low-rank format:
  A = U1 ΣA U2ᵀ, B = V1 ΣB V2ᵀ
  ⇒ A + B = [U1, V1] · diag(ΣA, ΣB) · [U2, V2]ᵀ.
◮ No operations required.
◮ Rank increases.
◮ Orthogonality destroyed.
Addition of four tensors X1, X2, X3, X4 in H-Tucker format: X1 + X2 + X3 + X4.
Proceed as in the matrix case by embedding the factors in larger matrices.
◮ No operations required.
◮ H-Tucker rank increases.
◮ Orthogonality destroyed.
Command in htucker: x1 + x2 + x3 + x4
[Figure: dimension tree of the sum X1 + X2 + X3 + X4 — the leaf matrices U_t^[1], . . . , U_t^[4] are concatenated, while the transfer tensors B_t^[1], . . . , B_t^[4] are embedded block diagonally]
Any tensor X in H-Tucker format can be orthogonalized, in the sense that all factors in the dimension tree, except for the root node, have orthonormal columns.
Example: vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.
Step 1: QR decompositions Ut = Qt Rt ⇒
  vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(B̃34 ⊗ B̃12) B1234, with B̃34 := (R4 ⊗ R3) B34, B̃12 := (R2 ⊗ R1) B12.
Step 2: QR decompositions B̃34 = Q34 R34, B̃12 = Q12 R12 ⇒
  vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(Q34 ⊗ Q12) B̃1234, with B̃1234 := (R34 ⊗ R12) B1234.
Command in htucker: x = orthog(x)
Inner product of two tensors X, Y ∈ R^{n1×···×nd}:
  ⟨X, Y⟩ = ⟨vec(X), vec(Y)⟩ = Σ_{i1=1}^{n1} · · · Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.
Can be performed efficiently in H-Tucker, provided that X, Y have compatible dimension trees. Example for two tensors of order 4:
  ⟨X, Y⟩ = (B^x_1234)ᵀ (B^x_34 ⊗ B^x_12)ᵀ (U^x_4 ⊗ U^x_3 ⊗ U^x_2 ⊗ U^x_1)ᵀ (U^y_4 ⊗ U^y_3 ⊗ U^y_2 ⊗ U^y_1)(B^y_34 ⊗ B^y_12) B^y_1234.
Norm: After X has been orthogonalized: ‖X‖ = ‖B_{12···d}‖_F.
Possibly the most accurate way to compute the norm. Used in norm(x).
Recursion over the dimension tree:
  (U^x_t)ᵀ U^y_t = (B^x_t)ᵀ ((U^x_{t2})ᵀ U^y_{t2} ⊗ (U^x_{t1})ᵀ U^y_{t1}) B^y_t.
◮ htucker command: innerprod(x,y)
◮ Overall cost: O(dnr²) + O(dr⁴).
Gramians: For X^(t) = Ut Vtᵀ,
  X^(t) (X^(t))ᵀ = Ut (Vtᵀ Vt) Utᵀ =: Ut Gt Utᵀ.
If Ut is orthonormal, an SVD of X^(t) can be read off from an eigenvalue decomposition of the Gramian Gt.
Implemented in htucker command gramians(x).
◮ Truncation
◮ Combined addition + truncation
◮ Elementwise multiplication
◮ Elementwise reciprocal
Let X ∈ R^{n1×n2×···×nd} be explicitly given.
◮ For each tree node t, let Wt contain the rt dominant left singular vectors of X^(t), and define the projection
  πt X = Wt Wtᵀ ◦t X ⇔ (πt X)^(t) = Wt Wtᵀ X^(t).
◮ Truncated tensor:
  X̃ := ( Π_{t∈T_L} πt ) · · · ( Π_{t∈T_1} πt ) X,
  where Tℓ contains all nodes on level ℓ.
◮ [Grasedyck’2010]:
  ‖X − X̃‖ ≤ √(2d − 3) ‖X − X_best‖.
  Proof similar as for the HOSVD.
Example:
  vec(X̃) = (W4W4ᵀ ⊗ W3W3ᵀ ⊗ W2W2ᵀ ⊗ W1W1ᵀ)(W34W34ᵀ ⊗ W12W12ᵀ) vec(X)
          = (W4 ⊗ W3 ⊗ W2 ⊗ W1) ([W4ᵀ ⊗ W3ᵀ]W34 ⊗ [W2ᵀ ⊗ W1ᵀ]W12) ([W34ᵀ ⊗ W12ᵀ] vec(X)).
The option max_rank takes precedence over rel_eps and abs_eps.
xt = htensor.truncate_rtl(x, opts) returns the truncated tensor X̃ of a multidimensional array.
Remark: There is also a significantly faster htensor.truncate_ltr (proceeds successively from leaves to root), for which the same error bound holds [Tobler’10].
Let X ∈ R^{n1×n2×···×nd} be in H-Tucker format and orthogonalized.
◮ Compute the left singular vectors of X^(t) = Ut Vtᵀ from the eigenvectors of
  X^(t) (X^(t))ᵀ = Ut (Vtᵀ Vt) Utᵀ = Ut Gt Utᵀ,
  with the reduced Gramian Gt. If St contains the rt dominant eigenvectors of Gt ⇒ Wt = Ut St.
◮ Traverse the tree from root to leaves. In each step, the projections St Stᵀ are absorbed into the transfer tensors: the transfer tensor Bt is contracted with Stᵀ, and the parent's transfer tensor B_tp with St.
◮ In htucker: truncate(x,opts). Complexity O(dnr² + dr⁴).
Sum of more than two tensors: Y = X1 + X2 + · · · + Xs.
Two possibilities to incorporate the truncation operator T:
◮ Option 1: form the full sum first, then truncate once: Y = T(X1 + X2 + · · · + Xs).
◮ Option 2: truncate successively: Y = T(· · · T(T(X1 + X2) + X3) · · · + Xs).
Option 2 is usually significantly cheaper but may suffer from severe cancellation. Artificial example: X1, X2, X3 ∈ R^{101×101×101} truncated tensor grid discretizations of the summands of
  f(x1, x2, x3) = tan(x1 + x2 + x3) + (x1 + x2 + x3)^{−1} − tan(x1 + x2 + x3).
Error(Option 1) ≈ 10^{−7}. Error(Option 2) ≈ 1.3. What is wrong with Option 1?
[Figure: dimension tree of the sum X1 + X2 + X3 + X4, with concatenated leaf matrices and block diagonal transfer tensors]
◮ Orthogonalization (needed before truncation) destroys the block diagonal structure.
◮ Complexity O(dns²r² + ds⁴r⁴) for s summands.
Idea: A new variant delays orthogonalization to keep the block diagonal structure in the transfer tensors as long as possible. Reduces O(dns²r² + ds⁴r⁴) to O(dns²r² + ds²r⁴ + ds³r³).
[Plot: runtime vs. number of summands s for standard truncation, truncation of the sum, and successive truncation, with reference lines O(s⁴), O(s²), O(s)]
◮ htucker command: add_truncate({x1, x2, x3, x4}, opts).
Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices A = U1 ΣA U2ᵀ, B = V1 ΣB V2ᵀ:
  A ⋆ B = (U1 ⊙ V1)(ΣA ⊗ ΣB)(U2 ⊙ V2)ᵀ,
with the row-wise Khatri-Rao product
  C ⊙ D = [c1ᵀ; . . . ; cnᵀ] ⊙ [d1ᵀ; . . . ; dnᵀ] := [c1ᵀ ⊗ d1ᵀ; . . . ; cnᵀ ⊗ dnᵀ].
◮ Orthogonality destroyed.
◮ Rank increases significantly.
But: the singular value decay of ΣA ⊗ ΣB may become significantly stronger ⇒ additional opportunities for truncation.
Elementwise multiplication of two tensors X, Y in H-Tucker format:
◮ Row-wise Khatri-Rao product of the leaf matrices.
◮ “Kronecker product” of the non-leaf transfer tensors.
◮ Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.
Commands in htucker: x.*y (without truncation), x.ˆ2 (without truncation), elem_mult( x, y, opt ) (with truncation).
Goal: Compute the reciprocal of each entry in a tensor X.
Basic idea: The Newton–Schulz iteration
  y0 = 1,  y_{i+1} = y_i + y_i (1 − x · y_i),   (2)
converges to 1/x for 0 < x < 2. Apply (2) simultaneously to all entries.
Code snippet of elem_reciprocal( x, opt ) in htucker:
all_ones = htenones(size(x));
y = all_ones;                        % Y_0 = tensor of all ones
for it=1:maxit
  xy = elem_mult( x, y );            % X .* Y_i
  xy = truncate( all_ones - xy );    % 1 - X .* Y_i, truncated
  xy = elem_mult( xy, y );           % Y_i .* (1 - X .* Y_i)
  y = truncate( y + xy );            % Y_{i+1} = Y_i + ..., truncated
end
See also [Oseledets et al. 2009].
Example: (x1 + x2 + x3 + x4)^{−1} with xi ∈ [10^{−3}, 1].
c = laplace_core(4);
U = [ones(100, 1), linspace(1e-3, 1, 100)'];
x = ttm(c, {U, U, U, U});
inv_x = elem_reciprocal(x, opts);
[Plots: convergence of ‖X ⋆ Y_k − 1‖/‖1‖ over the iterations; singular value tree upon convergence]
◮ HTD offers a good compromise between CP and Tucker.
◮ Algorithms often quite technical but conceptually simple.
◮ Computational complexity ∼ d but often ∼ r⁴: curse of dimensionality ⇒ curse of rank?
◮ Important to keep in mind: unless d is tiny, the tensor X can/should never be formed explicitly. All operations need to be performed implicitly in HTD. This can pose severe problems even for seemingly simple operations: min(X), max(X), abs(X), 1./X, . . .
◮ Inexact LOBPCG
◮ ALS / MALS
◮ In many practical situations, the tensor X is given implicitly, e.g., as the solution to a linear system A(X) = B, an eigenvalue problem A(X) = λX, a nonlinear system, an ODE, . . .
Two main strategies to use low-rank tensor techniques:
Strategy 1: Apply a standard iterative method with repeated low-rank truncation of the iterates (⇒ inexact CG):
◮ Straightforward to derive and implement (based, e.g., on htucker).
◮ Hard to analyze the impact of nonnegligible truncations on accuracy and convergence.
◮ Intermediate rank growth may result in excessive computing times and/or harm accuracy and convergence.
Strategy 2: Iteratively optimize with respect to the individual factors of the low-rank format:
◮ Works well in practice.
◮ Convergence theory not well understood.
◮ Not straightforward to implement.
Goal: Compute the smallest eigenvalue of
  ∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d,  u(ξ) = 0 on ∂Ω.
Assumption: Potential represented as
  V(ξ) = Σ_{j=1}^s V_j^(1)(ξ1) V_j^(2)(ξ2) · · · V_j^(d)(ξd).
Finite difference discretization:
  A u = (A_L + A_V) u = λ u,
with
  A_L = Σ_{µ=1}^d I ⊗ · · · ⊗ I ⊗ A_L ⊗ I ⊗ · · · ⊗ I  (A_L in the µ-th factor),
  A_V = Σ_{j=1}^s A_{V,j}^(d) ⊗ · · · ⊗ A_{V,j}^(2) ⊗ A_{V,j}^(1).
LOBPCG with block size 1 [Knyazev’01] for computing the smallest eigenvalue of Ax = λx, A symmetric:
  λ0 = ⟨x0, x0⟩_A, p0 = 0
  for k = 0, 1, . . . (until converged) do
    rk = B^{−1}(A xk − λk xk)
    U = [xk, rk, pk]
    Ã = Uᵀ A U, M̃ = Uᵀ U
    Find the eigenpair (λ_{k+1}, y), with ‖y‖₂ = 1, for the smallest eigenvalue of Ã − λM̃.
    p_{k+1} = y2 · rk + y3 · pk
    x_{k+1} = y1 · xk + p_{k+1}
    x_{k+1} ← x_{k+1}/‖x_{k+1}‖₂
  end for
  Return (λmin, x) = (λ_{k+1}, x_{k+1}).
Truncated LOBPCG with block size 1 for computing the smallest eigenvalue of A(X) = λX, A symmetric, X a tensor:
  λ0 = ⟨X0, X0⟩_A, P0 = 0 · X
  for k = 0, 1, . . . (until converged) do
    Rk = B^{−1}(A(Xk) − λk Xk), Rk ← T(Rk)
    U1 = Xk, U2 = Rk, U3 = Pk
    Ãij = ⟨Ui, Uj⟩_A, M̃ij = ⟨Ui, Uj⟩
    Find the eigenpair (λ_{k+1}, y), with ‖y‖₂ = 1, for the smallest eigenvalue of Ã − λM̃.
    P_{k+1} = y2 · Rk + y3 · Pk,  P_{k+1} ← T(P_{k+1})
    X_{k+1} = y1 · Xk + P_{k+1},  X_{k+1} ← T(X_{k+1})
    X_{k+1} ← X_{k+1}/‖X_{k+1}‖
  end for
  Return (λmin, X) = (λ_{k+1}, X_{k+1}).
T = truncation to hierarchical low rank.
Orthogonalization: In standard LOBPCG, orthogonalization of U is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as ranks would grow and truncation would destroy orthogonality.
Truncation: Xk, Rk, Pk are truncated in every step. Moreover, the application of A(·) and of the preconditioner B^{−1}(·) may involve truncation during the application of these operators.
Inner product: The reduced matrix Ã is very sensitive to truncation in A(·). The computation of Ãij = ⟨Ui, Uj⟩_A must be exact.
PDE-eigenvalue problem with Ω = [0, π]^d and sine potential
  V(ξ) = q · Σ_{i=1}^d sin(ξi)
for some constant q > 0. We choose d = 10, n = 128.
Preconditioner [Grasedyck 2004]:
  A_L^{−1} = ∫₀^∞ exp(−tA_L) dt ≈ Σ_{j=1}^M ωj exp(−αj A_L^(d)) ⊗ · · · ⊗ exp(−αj A_L^(1)) =: B^{−1},
for a certain, optimized and tabulated choice of coefficients αj, ωj > 0. We choose M = 10.
q = 1: [Plots: residual vs. iteration and maximal rank vs. iteration, for truncation tolerances eps = 1e−2, 1e−4, 1e−8]
q = 1000: [Plots: residual vs. iteration and maximal rank vs. iteration, for truncation tolerances eps = 1e−2, 1e−4]
Originally from computational quantum physics [Schollwöck 2011], recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].
Goal:
  min { ⟨X, A(X)⟩ / ⟨X, X⟩ : X ∈ H-Tucker }.
ALS: Optimize the factor at a single node t of the dimension tree to minimize the Rayleigh quotient ⟨X, A(X)⟩/⟨X, X⟩. This is done for all nodes (a sweep), and sweeps are continued until convergence.
Sketch: Fixing all factors except the one at node t expresses vec(X) = Ut y, with a fixed basis matrix Ut and the free parameters y of that factor
  ⇒ min { yᵀ(Utᵀ A Ut) y / yᵀ(Utᵀ Ut) y : y ∈ R^{r_{tl} r_{tr} r_t}, y ≠ 0 }.
Consider A = Ad ⊗ · · · ⊗ A1 (other operators can be treated similarly). Compute
  Ãt := Utᵀ A Ut = Ât ⊗ Ãtr ⊗ Ãtl,
where
  Ãtl = Utlᵀ (⊗_{i∈tl} Ai) Utl,  Ãtr = Utrᵀ (⊗_{i∈tr} Ai) Utr,  Ât = Vtᵀ (⊗_{i∉t} Ai) Vt.
Additionally,
  M̃t := Utᵀ Ut = Vtᵀ Vt ⊗ Utrᵀ Utr ⊗ Utlᵀ Utl = Mt ⊗ Mtr ⊗ Mtl.
[Figure: dimension tree for d = 8 with operators A1, . . . , A8 at the leaves and reduced matrices Ã12, Ã34, Â1234]
Method (MALS):
◮ Select an edge of the tensor network.
◮ Combine the tensors at the adjacent nodes to form a higher-order tensor.
◮ Set this tensor to minimize the Rayleigh quotient.
◮ Use a low-rank approximation to split the new combined tensor into two tensors at the adjacent nodes of the selected edge.
PDE-eigenvalue problem with Ω = [0, π]^d and sine potential
  V(ξ) = q · Σ_{i=1}^d sin(ξi)
for some constant q > 0. Choose d = 10, n = 128, q = 1000.
Preconditioner [Grasedyck 2004]:
  A_L^{−1} = ∫₀^∞ exp(−tA_L) dt ≈ Σ_{j=1}^M ωj exp(−αj A_L^(d)) ⊗ · · · ⊗ exp(−αj A_L^(1)) =: B^{−1},
for a certain, optimized choice of coefficients αj, ωj > 0. We choose M = 10.
ALS: [Plots: eigenvalue error, residual, and iteration count vs. execution time; hierarchical ranks 40]
MALS: [Plots: eigenvalue error, residual, and rank vs. execution time; maximal hierarchical rank 30]
◮ Scientific computing with low-rank tensors is a rapidly evolving field and highly technical.
◮ Precise scope of applications far from clear; many applications remain to be explored. More analysis and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, . . .) needed.
Some current trends:
◮ Tensorization of vectors + low rank (discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtyshnikov, . . .
◮ Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, . . .
◮ Robust low rank (Candès et al.) for tensors ⇒ suitable way of dealing with singularities?
◮ . . .
Acknowledgments: Presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).