

slide-1
SLIDE 1

Low-Rank Tensor Techniques for High-Dimensional Problems

Daniel Kressner CADMOS Chair for Numerical Algorithms and HPC MATHICSE, EPFL

1

slide-2
SLIDE 2

Contents

◮ What is a tensor? ◮ Applications ◮ Matrices and low rank ◮ CP and Tucker ◮ Hierarchical Tucker ◮ Algorithms based on low-rank tensors ◮ Conclusions

2

slide-3
SLIDE 3

What is a tensor?

◮ Vectors, matrices, and tensors ◮ Basic calculus with tensors ◮ Vectorization and matricization ◮ µ-mode matrix products ◮ Two classes of tensor problems

3

slide-4
SLIDE 4

Vectors, matrices, and tensors

Vector Matrix Tensor

◮ scalar = tensor of order 0 ◮ (column) vector = tensor of order 1 ◮ matrix = tensor of order 2 ◮ tensor of order 3

= n1n2n3 numbers arranged in n1 × n2 × n3 array

4

slide-5
SLIDE 5

Tensors of arbitrary order

A d-th order tensor X of size n1 × n2 × · · · × nd is a d-dimensional array with entries Xi1,i2,...,id, iµ ∈ {1, . . . , nµ} for µ = 1, . . . , d. In the following, entries of X are real (for simplicity) X ∈ Rn1×n2×···×nd. Multi-index notation: I = {1, . . . , n1} × {1, . . . , n2} × · · · × {1, . . . , nd}. Then i ∈ I is a tuple of d indices: i = (i1, i2, . . . , id). Allows to write entries of X as Xi for i ∈ I.

5

slide-6
SLIDE 6

Two important points

  • 1. A matrix A ∈ R^{m×n} has a natural interpretation as a linear operator in terms of matrix-vector multiplications:

A : R^n → R^m, A : x → A · x. There is no such (unique and natural) interpretation for tensors! ⇒ fundamental difficulty to define a meaningful general notion of eigenvalues and singular values of tensors.

  • 2. Number of entries in a tensor grows exponentially with d

⇒ Curse of dimensionality. Example: A tensor of order 30 with n1 = n2 = · · · = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!1 For d ≫ 1: Cannot afford to store the tensor explicitly (in terms of its entries).

1Global data storage calculated at 295 exabyte, see

http://www.bbc.co.uk/news/technology-12419672.

6

slide-7
SLIDE 7

Basic calculus

◮ Addition of two equal-sized tensors X, Y:

Z = X + Y ⇔ Zi = Xi + Yi ∀i ∈ I.

◮ Scalar product with α ∈ R:

Z = αX ⇔ Zi = αXi ∀i ∈ I. vector space structure.

◮ Inner product of two equal-sized tensors X, Y:

⟨X, Y⟩ := Σ_{i∈I} X_i Y_i.

Induced norm: ‖X‖ := ( Σ_{i∈I} X_i^2 )^{1/2}. For a 2nd order tensor (= matrix) this corresponds to the Frobenius norm.
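Not from the slides: a minimal MATLAB illustration of the inner product and induced norm via vectorization (array sizes chosen arbitrarily).

X = randn(3,4,5); Y = randn(3,4,5);
ip = X(:)' * Y(:);    % <X, Y>: sum of entrywise products
nrm = norm(X(:));     % ||X|| = sqrt(<X, X>); Frobenius norm for d = 2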

7

slide-8
SLIDE 8

Vectorization

A tensor X of size n1 × n2 × · · · × nd has n1 · n2 · · · nd entries ⇒ many ways to stack the entries into a (loooong) column vector. One possible choice: The vectorization of X is denoted by vec(X), where vec : R^{n1×n2×···×nd} → R^{n1·n2···nd} stacks the entries of a tensor in reverse lexicographical order into a long column vector. Remark: For d = 2, this is the usual way matrices are vectorized:

A = [ a11 a12 ; a21 a22 ; a31 a32 ]  ⇒  vec(A) = ( a11, a21, a31, a12, a22, a32 )^T
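A minimal MATLAB illustration (not from the slides): X(:) stacks the entries in column-major order, i.e. with the first index varying fastest, which coincides with the usual matrix vectorization for d = 2.

X = reshape(1:12, [3 2 2]);   % a small 3 x 2 x 2 tensor
v = X(:);                     % vec(X) as a column vector of length 12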

8

slide-9
SLIDE 9

Vectorization

Example: d = 3, n1 = 3, n2 = 2, n3 = 3.

vec(X) = ( x111, x112, x113, x121, . . . , x321, x322, x323 )^T

9

slide-10
SLIDE 10

Matricization

◮ A matrix has two modes (column mode and row mode). ◮ A dth-order tensor X has d modes (µ = 1, µ = 2, . . ., µ = d).

Let us fix all but one mode, e.g., µ = 1: Then X(:, i2, i3, . . . , id) (abuse of MATLAB notation) is a vector of length n1 for each choice of i2, . . . , id. View tensor X as a bunch of column vectors:

10

slide-11
SLIDE 11

Matricization

Stack the vectors into an n1 × (n2 · · · nd) matrix: X ∈ R^{n1×n2×···×nd} ⇒ X^{(1)} ∈ R^{n1×(n2n3···nd)}. For µ = 1, . . . , d, the µ-mode matricization of X is the matrix X^{(µ)} ∈ R^{nµ×(n1···nµ−1 nµ+1···nd)} with entries

(X^{(µ)})_{iµ, (i1,...,iµ−1,iµ+1,...,id)} = X_i  ∀i ∈ I.

11

slide-12
SLIDE 12

Matricization

In MATLAB: a = rand(2,3,4,5);

◮ 1-mode matricization:

reshape(a,2,3*4*5)

◮ 2-mode matricization:

b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)

◮ 3-mode matricization:

b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)

◮ 4-mode matricization:

b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)

For a matrix A ∈ R^{n1×n2}: A^{(1)} = A, A^{(2)} = A^T.

12

slide-13
SLIDE 13

µ-mode matrix products

Consider the 1-mode matricization X^{(1)} ∈ R^{n1×(n2···nd)}: Seems to make sense to multiply an m × n1 matrix A from the left: Y^{(1)} := A X^{(1)} ∈ R^{m×(n2···nd)}. Can rearrange Y^{(1)} back into an m × n2 × · · · × nd tensor Y. This is called 1-mode matrix multiplication:

Y = A ◦1 X ⇔ Y^{(1)} = A X^{(1)}

More formally (and more ugly): Y_{i1,i2,...,id} = Σ_{k=1}^{n1} a_{i1,k} X_{k,i2,...,id}.

13

slide-14
SLIDE 14

µ-mode matrix products

General definition of the µ-mode matrix product with A ∈ R^{m×nµ}:

Y = A ◦µ X ⇔ Y^{(µ)} = A X^{(µ)}.

More formally (and more ugly): Y_{i1,i2,...,id} = Σ_{k=1}^{nµ} a_{iµ,k} X_{i1,...,iµ−1,k,iµ+1,...,id}.

For matrices:

◮ 1-mode multiplication = multiplication from the left: Y = A ◦1 X = A X.
◮ 2-mode multiplication = transposed multiplication from the right: Y = A ◦2 X = X A^T.
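A sketch (not from the slides) of a generic µ-mode matrix product in MATLAB, using the permute/reshape matricization shown on the previous slides; the function name mmprod is made up for illustration.

function Y = mmprod(X, A, mu)
% Y = A o_mu X, i.e. Y^(mu) = A * X^(mu)
sz = size(X); d = ndims(X);
perm = [mu, 1:mu-1, mu+1:d];
Xmu = reshape(permute(X, perm), sz(mu), []);   % mu-mode matricization
Ymu = A * Xmu;                                 % multiply in mode mu
szp = sz(perm); szp(1) = size(A, 1);           % mode mu now has length m
Y = ipermute(reshape(Ymu, szp), perm);         % fold back into a tensor
end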

14

slide-15
SLIDE 15

Kronecker product

For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as

B ⊗ A := [ b11 A · · · b1ℓ A ; . . . ; bk1 A · · · bkℓ A ] ∈ R^{km×ℓn}.

Most important properties (for our purposes):

  • 1. vec(A X) = (I ⊗ A) vec(X).
  • 2. vec(X A^T) = (A ⊗ I) vec(X).
  • 3. (B ⊗ A)(D ⊗ C) = (BD ⊗ AC).
  • 4. Im ⊗ In = Imn.
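A quick numerical check (not from the slides) of properties 1 and 2, with arbitrarily chosen sizes; both residuals are at rounding-error level.

A = randn(4,3); B = randn(6,5); X = randn(3,5);
e1 = norm(reshape(A*X,  [], 1) - kron(eye(5), A)*X(:))   % vec(A X)   vs (I ⊗ A) vec(X)
e2 = norm(reshape(X*B', [], 1) - kron(B, eye(3))*X(:))   % vec(X B^T) vs (B ⊗ I) vec(X)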

15

slide-16
SLIDE 16

µ-mode matrix products and vectorization

By definition, vec(X) = vec(X^{(1)}). Consequently, also vec(A ◦1 X) = vec(A X^{(1)}). Vectorized version of the 1-mode matrix product:

vec(A ◦1 X) = (I_{n2···nd} ⊗ A) vec(X) = (I_{nd} ⊗ · · · ⊗ I_{n2} ⊗ A) vec(X).

Relation between µ-mode matrix product and matrix-vector product:

vec(A ◦µ X) = (I_{nd} ⊗ · · · ⊗ I_{nµ+1} ⊗ A ⊗ I_{nµ−1} ⊗ · · · ⊗ I_{n1}) vec(X)

16

slide-17
SLIDE 17

Two classes of tensor problems

Class 1: function-related tensors. Consider a function u(ξ1, . . . , ξd) ∈ R in d variables ξ1, . . . , ξd. A tensor U ∈ R^{n1×···×nd} represents a discretization of u:

◮ U contains function values of u evaluated on a grid; or
◮ U contains coefficients of a truncated expansion in tensorized basis functions: u(ξ1, . . . , ξd) ≈ Σ_{i∈I} U_i φ_{i1}(ξ1) φ_{i2}(ξ2) · · · φ_{id}(ξd).

Typical setting:

◮ U only given implicitly, e.g., as the solution of a discretized PDE;
◮ seek approximations to U with very low storage and tolerable accuracy;
◮ d may become very large.

Focus of this lecture on function-related tensors!

17

slide-18
SLIDE 18

Discretization of a function in d variables ξ1, . . . , ξd ∈ [0, 1]: the number of function values grows exponentially with d.

18

slide-19
SLIDE 19

Separability helps

Ideal situation: the function f is separable: f(ξ1, ξ2, . . . , ξd) = f1(ξ1) f2(ξ2) · · · fd(ξd). The discretized f is then the Kronecker product of the discretized factors fj: O(n^d) memory is reduced to O(dn) memory. Of course: exact separability is rarely satisfied in practice.

19

slide-20
SLIDE 20

Two classes of tensor problems

Class 2: data-related tensors. Tensor U ∈ R^{n1×···×nd} contains multi-dimensional data. Example 1: U_{2011,3,2} denotes the number of papers published in 2011 by author 3 in mathematical journal 2. Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor. Typical setting:

◮ entries of U given explicitly (at least partially);
◮ extraction of dominant features from U;
◮ usually moderate values of d.

20

slide-21
SLIDE 21

Summary

◮ A tensor X ∈ R^{n1×···×nd} is a d-dimensional array.
◮ Various ways of reshaping the entries of a tensor X into a vector or matrix.
◮ µ-mode matrix multiplication can be expressed with Kronecker products.

Further reading:

◮ T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

Software:

◮ MATLAB offers basic functionality to work with d-dimensional arrays.
◮ MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

21

slide-22
SLIDE 22

Applications in scientific computing

◮ High-dimensional elliptic PDEs ◮ High-dimensional PDE-eigenvalue problems ◮ Quantum many-body problems ◮ Stochastic Automata Networks ◮ further applications

22

slide-23
SLIDE 23

High-dimensional elliptic PDEs: 3D model problem

◮ Consider −∆u = f in Ω, u|∂Ω = 0, on the unit cube Ω = [0, 1]^3.
◮ Discretize on a tensor grid. Uniform grid for simplicity: ξ_µ^{(j)} = jh, h = 1/(n + 1), for µ = 1, 2, 3.
◮ Approximate solution tensor U ∈ R^{n×n×n}:

U_{i1,i2,i3} ≈ u( ξ_1^{(i1)}, ξ_2^{(i2)}, ξ_3^{(i3)} ).

23

slide-24
SLIDE 24

High-dimensional elliptic PDEs: 3D model problem

◮ Discretization of the 1D Laplace operator:

−∂xx ≈ tridiag(−1, 2, −1) =: A.

◮ Application in each coordinate direction:

−∂ξ1ξ1 u(ξ1, ξ2, ξ3) ≈ A ◦1 U, −∂ξ2ξ2 u(ξ1, ξ2, ξ3) ≈ A ◦2 U, −∂ξ3ξ3 u(ξ1, ξ2, ξ3) ≈ A ◦3 U.

◮ Hence,

−∆u ≈ A ◦1 U + A ◦2 U + A ◦3 U

or in vectorized form with u = vec(U):

−∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.
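A short MATLAB sketch (not from the slides) assembling this Kronecker-sum structure with sparse matrices; the grid size n is an arbitrary choice and the 1/h^2 scaling is omitted as above.

n = 50; e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % 1D second-difference matrix tridiag(-1,2,-1)
I = speye(n);
A3 = kron(I, kron(I, A)) + kron(I, kron(A, I)) + kron(A, kron(I, I));   % I⊗I⊗A + I⊗A⊗I + A⊗I⊗I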

24

slide-25
SLIDE 25

High-dimensional elliptic PDEs: 3D model problem

Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]^3 takes the form

(I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u = f.

Similar structure for a finite element discretization with tensorized FEs:

V ⊗ W ⊗ Z = { Σ_{i,j,k} α_{ijk} v_i(ξ1) w_j(ξ2) z_k(ξ3) : α_{ijk} ∈ R }

with V = {v1(ξ1), . . . , vn(ξ1)}, W = {w1(ξ2), . . . , wn(ξ2)}, Z = {z1(ξ3), . . . , zn(ξ3)}.

Galerkin discretization:

(K_V ⊗ M_W ⊗ M_Z + M_V ⊗ K_W ⊗ M_Z + M_V ⊗ M_W ⊗ K_Z) u = f,

with 1D mass/stiffness matrices M_V, M_W, M_Z, K_V, K_W, K_Z.

25

slide-26
SLIDE 26

High-dimensional elliptic PDEs: Arbitrary dimensions

Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]^d takes the form

( Σ_{j=1}^{d} I ⊗ · · · ⊗ I ⊗ A ⊗ I ⊗ · · · ⊗ I ) u = f,

with A appearing in the j-th term's j-th position. To obtain such Kronecker structure in general:

◮ tensorized domain;
◮ highly structured grid;
◮ coefficients that can be written/approximated as a sum of separable functions.

26

slide-27
SLIDE 27

High-dimensional PDE-eigenvalue problems

PDE-eigenvalue problem

∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d, u(ξ) = 0 on ∂Ω.

Assumption: Potential represented as

V(ξ) = Σ_{j=1}^{s} V_j^{(1)}(ξ1) V_j^{(2)}(ξ2) · · · V_j^{(d)}(ξd).

⇒ finite difference discretization

A u = (A_L + A_V) u = λ u,

with

A_L = Σ_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ · · · ⊗ I (j−1 times),
A_V = Σ_{j=1}^{s} A_{V,j}^{(d)} ⊗ · · · ⊗ A_{V,j}^{(2)} ⊗ A_{V,j}^{(1)}.

27

slide-28
SLIDE 28

Quantum many-body problems

◮ spin-1/2 particles: proton, neutron, electron, and quark.
◮ two states: spin-up, spin-down
◮ quantum state of each spin represented by a vector in C^2 (spinor)
◮ quantum state of a system of d spins represented by a vector in C^{2^d}
◮ quantum mechanical operators expressed in terms of Pauli matrices

P_x = [ 0 1 ; 1 0 ],  P_y = [ 0 −i ; i 0 ],  P_z = [ 1 0 ; 0 −1 ].

◮ spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities; each term describes a physical (inter)action of spins
◮ interaction of spins described by a graph
◮ Goal: Compute the ground state of the spin Hamiltonian.

28

slide-29
SLIDE 29

Quantum many-body problems

Example: 1d chain of 5 spins with periodic boundary conditions.

Hamiltonian describing pairwise interaction between nearest neighbors:

H = Pz ⊗ Pz ⊗ I ⊗ I ⊗ I + I ⊗ Pz ⊗ Pz ⊗ I ⊗ I + I ⊗ I ⊗ Pz ⊗ Pz ⊗ I + I ⊗ I ⊗ I ⊗ Pz ⊗ Pz + Pz ⊗ I ⊗ I ⊗ I ⊗ Pz
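A sketch (not from the slides) of how such a Hamiltonian can be built with kron in MATLAB; this forms the full 2^d × 2^d matrix and is only feasible for small d.

d = 5; Pz = sparse([1 0; 0 -1]); I2 = speye(2);
H = sparse(2^d, 2^d);
for k = 1:d
    ops = repmat({I2}, 1, d);
    ops{k} = Pz; ops{mod(k, d) + 1} = Pz;     % Pz on neighbouring sites k and k+1 (periodic)
    term = 1;
    for j = 1:d, term = kron(term, ops{j}); end
    H = H + term;
end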

29

slide-30
SLIDE 30

Quantum many-body problems

◮ Ising (ZZ) model for a 1d chain of d spins with open boundary conditions:

H = Σ_{k=1}^{d−1} I ⊗ · · · ⊗ I ⊗ Pz ⊗ Pz ⊗ I ⊗ · · · ⊗ I + λ Σ_{k=1}^{d} I ⊗ · · · ⊗ I ⊗ Px ⊗ I ⊗ · · · ⊗ I

λ = ratio between the strength of the magnetic field and the pairwise interactions

◮ 1d Heisenberg (XY) model
◮ Current research: 2d models.
◮ More details in:

Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
Schollwöck: The density-matrix renormalization group in the age of matrix product states.

30

slide-31
SLIDE 31

Stochastic Automata Networks (SANs)

◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ The vector x_t^{(i)} ∈ R^3 describes the probabilities of the states (1), (2), (3) in Ai at time t.
◮ No coupling between the automata ⇒ local transition x_t^{(i)} → x_{t+1}^{(i)} described by a Markov chain:

x_{t+1}^{(i)} = E_i x_t^{(i)},

with a stochastic matrix E_i.
◮ Stationary distribution of Ai = Perron vector of E_i (eigenvector for eigenvalue 1).

31

slide-32
SLIDE 32

Stochastic Automata Networks (SANs)

◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ Coupling between the automata ⇒ local transition x_t^{(i)} → x_{t+1}^{(i)} not described by a Markov chain.
◮ Need to consider all possible combinations of states in (A1, A2, A3): (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2), . . .
◮ The vector x_t ∈ R^{3^3} (or tensor X(t) ∈ R^{3×3×3}) describes the probabilities of the combined states.

32

slide-33
SLIDE 33

Stochastic Automata Networks (SANs)

◮ Transition x_t → x_{t+1} described by a Markov chain:

x_{t+1} = E x_t, with a large stochastic matrix E.

◮ Oversimplified example:

E = I ⊗ I ⊗ E1 + I ⊗ E2 ⊗ I + E3 ⊗ I ⊗ I (local transitions)
  + I ⊗ E21 ⊗ E12 (interaction between A1, A2)
  + E32 ⊗ E23 ⊗ I (interaction between A2, A3)

◮ Goal: Compute the stationary distribution = Perron vector of E.
◮ More details in:

Stewart: Introduction to the Numerical Solution of Markov Chains. Chapter 9.
Buchholz: Product Form Approximations for Communicating Markov Processes.

33

slide-34
SLIDE 34

Further applications

Other applications in scientific computing featuring low-rank tensor concepts:

◮ Boltzmann equation [Ibragimov/Rjasanow'2009].
◮ Dynamical systems [Koch/Lubich'2009].
◮ Parabolic PDEs [Andreev/Tobler'2011], [Khoromskij'2009].
◮ Stochastic PDEs [Khoromskij/Schwab'2010], [Matthies/Zander'2011], [Kressner/Tobler'2011], [Ballani/Grasedyck/Kluge'2011], . . .
◮ Electronic structure calculation [Chinnamsetty et al.'2007], [Flad et al.'2009], [Khoromskij/Khoromskaja'2009], [Limpanuparb/Gill'2009], [Benedikt et al.'2011], [Mohlenkamp'2011], . . .
◮ Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit'2011].
◮ . . .

34

slide-35
SLIDE 35

Summary

◮ Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structure.
◮ For many problems of practical interest: explicit storage / computation of the solution is infeasible.
◮ Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
◮ Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, . . .). Has advantages and disadvantages.

35

slide-36
SLIDE 36

Approximate low-rank matrices

◮ Singular value decomposition ◮ Separability and low rank ◮ Separability by polynomial interpolation ◮ Separability by exponential sums ◮ Low rank of snapshot matrices

36

slide-37
SLIDE 37

Low-rank approximation

Setting: Matrix X ∈ R^{n×m}, m and n too large to compute/store X explicitly. Idea: Replace X by R S^T with R ∈ R^{n×r}, S ∈ R^{m×r} and r ≪ m, n.

          X            R S^T
Memory:   nm           nr + rm
Cost:     ops(m, n)    ops(m, n) × r/min{m, n} (?)

min { ‖X − R S^T‖_2 : R ∈ R^{n×r}, S ∈ R^{m×r} } = σ_{r+1},

with singular values σ1 ≥ σ2 ≥ · · · ≥ σ_{min{m,n}} of X.

37

slide-38
SLIDE 38

Construction from singular value decomposition

SVD: Let X ∈ R^{n×m} and k = min{m, n}. Then there exist matrices with orthonormal columns

U = [ u1, u2, . . . , uk ] ∈ R^{n×k},  V = [ v1, v2, . . . , vk ] ∈ R^{m×k},

such that X = U Σ V^T, Σ = diag(σ1, σ2, . . . , σk). Choose r ≤ k and partition

X = [ U1, U2 ] diag(Σ1, Σ2) [ V1, V2 ]^T = U1 Σ1 (=: R) · V1^T (=: S^T) + U2 Σ2 V2^T.

Then ‖X − R S^T‖_2 = ‖Σ2‖_2 = σ_{r+1}. ⇒ Good low-rank approximation if the singular values decay sufficiently fast. Also: span(X) ≈ span(R), span(X^T) ≈ span(S).
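A small MATLAB illustration (not from the slides) of this construction, with arbitrarily chosen sizes:

X = randn(200, 150); r = 10;
[U, S, V] = svd(X, 'econ');
R  = U(:, 1:r) * S(1:r, 1:r);   % R   = U1 * Sigma1
St = V(:, 1:r)';                % S^T = V1^T
err = norm(X - R*St);           % equals sigma_{r+1}(X) = S(r+1, r+1)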

38

slide-39
SLIDE 39

Discretization of bivariate function

◮ Bivariate function: f(x, y) : [xmin, xmax] × [ymin, ymax] → R.
◮ Function values on a tensor grid [x1, . . . , xm] × [y1, . . . , yn]:

F = [ f(xi, yj) ]_{i=1,...,m; j=1,...,n}.

Basic but crucial observation: f(x, y) = g(x) h(y) ⇒

F = [ g(xi) h(yj) ]_{i,j} = ( g(x1), . . . , g(xm) )^T ( h(y1), . . . , h(yn) ).

⇒ Separability implies rank 1.

39

slide-40
SLIDE 40

Separability and low rank

Approximation by a sum of separable functions

f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) (=: fr(x, y)) + error.

Define Fr = [ fr(xi, yj) ]_{i=1,...,m; j=1,...,n}. Then Fr has rank ≤ r and ‖F − Fr‖_F ≤ √(mn) × error.

⇒ σ_{r+1}(F) ≤ ‖F − Fr‖_2 ≤ ‖F − Fr‖_F ≤ √(mn) × error.

⇒ Semi-separable approximation implies low-rank approximation.

40

slide-41
SLIDE 41

Semi-separable approximation by polynomials

Solution of the approximation problem

f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) + error

is not trivial; the gj, hj can be chosen arbitrarily! General construction by polynomial interpolation:

  • 1. Lagrange interpolation of f(x, y) in the y-coordinate:

I_y[f](x, y) = Σ_{j=1}^{r} f(x, θj) Lj(y)

with Lagrange polynomials Lj of degree r − 1 on [ymin, ymax].

  • 2. Interpolation of I_y[f] in the x-coordinate:

I_x[I_y[f]](x, y) = Σ_{i,j=1}^{r} f(ξi, θj) Li(x) Lj(y),

a sum of r separable terms once the coefficient matrix [ f(ξi, θj) ]_{i,j} is “diagonalized” by an SVD.

41

slide-42
SLIDE 42

Semi-separable approximation by polynomials

error ≤ ‖f − I_x[I_y[f]]‖_∞ = ‖f − I_x[f] + I_x[f] − I_x[I_y[f]]‖_∞ ≤ ‖f − I_x[f]‖_∞ + ‖I_x‖_∞ ‖f − I_y[f]‖_∞,

with Lebesgue constant ‖I_x‖_∞ ∼ log r when using Chebyshev interpolation nodes. Polynomial interpolation error bounds are typically much too pessimistic.

◮ Lebesgue constants hit hard in high dimensions: (log r)^{d−1}.
◮ Severe theoretical barriers for general smooth multivariate functions:

E. Novak and H. Woźniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.

42

slide-43
SLIDE 43

Semi-separable approximation of 1/(x + y)

Consider f(x, y) = 1/(x + y), x, y ∈ [α, β], 0 < α < β. Apply numerical quadrature:

1/z = ∫_0^∞ e^{−tz} dt = Σ_{j=1}^{r} ωj e^{−γj z} + error.

Inserting z = x + y:

1/(x + y) = Σ_{j=1}^{r} ωj e^{−γj (x+y)} + error = Σ_{j=1}^{r} ωj e^{−γj x} e^{−γj y} + error.

Choice of nodes γj > 0 and weights ωj > 0 as in [Stenger'93, Braess'86, Braess/Hackbusch'05] ⇒

error ≤ (8/|α|) exp( −r π^2 / log(8β/α) ).

slide-44
SLIDE 44

Semi-separable approximation by exponential sums

◮ Consider the more general case of a function f(x, y) := g(x + y).
◮ Approximation of g(z) with z := x + y by an exponential sum

g(z) ≈ Σ_{j=1}^{r} ωj exp(γj z)    (1)

for some coefficients γj, ωj ∈ R.
◮ (1) gives a semi-separable approximation for f:

f(x, y) = g(x + y) ≈ Σ_{j=1}^{r} ωj exp(γj (x + y)) = Σ_{j=1}^{r} ωj exp(γj x) exp(γj y).

◮ Naturally extends to arbitrarily many variables.
◮ Problem: (1) is a nontrivial approximation problem [Braess'1986], [Hackbusch'2006], . . .

44

slide-45
SLIDE 45

Low-rank approximation of snapshot matrices

Vector-valued function x(α) : [αmin, αmax] → R^n. Sampling at α1, . . . , αm ∈ [αmin, αmax] gives the snapshot matrix

X = [ x(α1), x(α2), . . . , x(αm) ].

45
slide-46
SLIDE 46

Example: Baking 1 cookie

Stationary heat equation with piecewise constant heat conductivity σ(x, α):

−∇ · (σ(x, α) ∇u) = f in Ω = [−1, 1]^2, u = 0 on ∂Ω,

◮ σ(baking tray) = 1
◮ σ(cookie) = 1 + α
◮ Undetermined parameter α ∈ [αmin, αmax].

[FE mesh: # vertices: 455, # elements: 825, # edges: 1279]

Standard FE discretization results in the linearly parameter-dependent linear system

(A0 + α A1) x(α) = b.

46

slide-47
SLIDE 47

Singular value decay – observation

◮ 1 cookie: n = 371, m = 101.

[Plot: log10 of the singular values of the snapshot matrix.]

◮ Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.

47

slide-48
SLIDE 48

Singular value decay – explanation

Polynomial approximation: x(α) = x0 + α x1 + α^2 x2 + · · · + α^{k−1} x_{k−1} + error. Approximation error:

◮ Assume b(·), A(·) analytic ⇒ x(·) analytic.
◮ Then error ≲ ρ^{−k}, where ρ > 1 depends on the domain of analyticity of A, b.

(Proof: Direct extension of a classical result for scalar-valued functions.)

48

slide-49
SLIDE 49

Singular value decay – explanation

Polynomial approximation: x(α) = x0 + α x1 + α^2 x2 + · · · + α^{k−1} x_{k−1} + error. Snapshot matrix:

X = [ x(α1), x(α2), . . . , x(αm) ] = [ x0, x1, . . . , x_{k−1} ] · [ 1 1 . . . 1 ; α1 α2 . . . αm ; . . . ; α1^{k−1} α2^{k−1} . . . αm^{k−1} ] + error

= matrix of rank k + error ⇒ σ_{k+1}(X) ≤ error ≲ ρ^{−k}.

Remark: Trivially extends to the piecewise analytic case.

49

slide-50
SLIDE 50

Singular value decay – pw analytic case

Example: Consider the smallest singular value σ(z) and the corresponding right singular vector v(z) of B(z) = A − izI for z ∈ [−1, 1].

◮ σ(z) only Lipschitz continuous, but piecewise analytic.
◮ v(z) discontinuous, but piecewise analytic.
◮ A = 2 × 2 block diagonal randn, n = 400.
◮ Snapshot matrix of singular vectors: X = [ v(z1), v(z2), . . . , v(z100) ] for equidistant samples zj ∈ [−1, 1].

[Plots: σ(z) for z ∈ [−1, 1]; singular values of X.]

50

slide-51
SLIDE 51

Summary

Need strong singular value decay for good low-rank approximations. For function-related matrices/tensors: strong link to semi-separable approximations. Smoothness seems to be important... at least somehow.

◮ Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
◮ Unfortunately, smoothness is not sufficient for higher-order tensors.
◮ Need to impose stronger regularity as the dimension/order d increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].

51

slide-52
SLIDE 52

Low-rank tensors: CP and Tucker

◮ CP ◮ Tucker ◮ Higher-order SVD ◮ Tensor networks

52

slide-53
SLIDE 53

CP decomposition

◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ One possibility motivated by

X = [ a1, a2, . . . , aR ] [ b1, b2, . . . , bR ]^T = a1 b1^T + a2 b2^T + · · · + aR bR^T.

⇒ vectorization: vec(X) = b1 ⊗ a1 + b2 ⊗ a2 + · · · + bR ⊗ aR.

Canonical Polyadic decomposition of a tensor X ∈ R^{n1×n2×n3} defined via

vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR

for vectors aj ∈ R^{n1}, bj ∈ R^{n2}, cj ∈ R^{n3}. CP directly corresponds to semi-separable approximation. Tensor rank of X = minimal possible R.
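A small MATLAB sketch (not from the slides) forming the full tensor from given CP factors via the vectorized formula above; only sensible for small sizes.

n1 = 4; n2 = 5; n3 = 6; R = 3;
A = randn(n1, R); B = randn(n2, R); C = randn(n3, R);
x = zeros(n1*n2*n3, 1);
for j = 1:R
    x = x + kron(C(:,j), kron(B(:,j), A(:,j)));   % c_j ⊗ b_j ⊗ a_j
end
X = reshape(x, [n1 n2 n3]);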

53

slide-54
SLIDE 54

CP decomposition

Illustration of CP decomposition vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR.


54

slide-55
SLIDE 55

CP decomposition

◮ The CP decomposition offers low data complexity; for constant R: linear complexity in d.
◮ For matrices:
  ◮ rank r is upper semi-continuous ⇒ closedness property: a sequence of rank-r matrices can only converge to a matrix of rank ≤ r.
  ◮ best low-rank approximation possible by successive rank-1 approximations.
  ◮ Robust black-box algorithms/software available (svd, Lanczos).
◮ For tensors of order d ≥ 3:
  ◮ tensor rank R is not upper semi-continuous ⇒ lack of closedness
  ◮ successive rank-1 approximations fail
  ◮ all algorithms based on optimization techniques (ALS, Gauss-Newton)

Picture taken from [Kolda/Bader'2009].

55

slide-56
SLIDE 56

Tucker decomposition

◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ Alternative possibility motivated by

A = U · Σ · V^T, U ∈ R^{n1×r}, V ∈ R^{n2×r}, Σ ∈ R^{r×r}.

⇒ vectorization: vec(X) = (V ⊗ U) · vec(Σ). Ignore the diagonal structure of Σ and call it C.

Tucker decomposition of a tensor X ∈ R^{n1×n2×n3} defined via

vec(X) = (W ⊗ V ⊗ U) · vec(C)

with U ∈ R^{n1×r1}, V ∈ R^{n2×r2}, W ∈ R^{n3×r3}, and core tensor C ∈ R^{r1×r2×r3}. In terms of µ-mode matrix products:

X = U ◦1 V ◦2 W ◦3 C =: (U, V, W) ◦ C.

56

slide-57
SLIDE 57

Tucker decomposition

Illustration of the Tucker decomposition X = (U, V, W) ◦ C. [Figure: core tensor C with factor matrices U, V, W.]

57

slide-58
SLIDE 58

Tucker decomposition

Consider all three matricizations:

X^{(1)} = U · C^{(1)} · (W ⊗ V)^T,  X^{(2)} = V · C^{(2)} · (W ⊗ U)^T,  X^{(3)} = W · C^{(3)} · (V ⊗ U)^T.

These are low-rank decompositions:

rank(X^{(1)}) ≤ r1, rank(X^{(2)}) ≤ r2, rank(X^{(3)}) ≤ r3.

Multilinear rank of a tensor X ∈ R^{n1×n2×n3} defined by the tuple (r1, r2, r3) with ri = rank(X^{(i)}).

58

slide-59
SLIDE 59

Higher-order SVD (HOSVD)

Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).

  • 1. Calculate SVDs of the matricizations: X^{(µ)} = Uµ Σµ Vµ^T for µ = 1, 2, 3.
  • 2. Truncate the basis matrices: Uµ := Uµ(:, 1 : rµ) for µ = 1, 2, 3.
  • 3. Form the core tensor: vec(C) := (U3^T ⊗ U2^T ⊗ U1^T) · vec(X).

Truncated tensor produced by the HOSVD [Lathauwer/De Moor/Vandewalle'2000]:

vec(X̃) := (U3 ⊗ U2 ⊗ U1) · vec(C).

Remark: Orthogonal projection X̃ := (π1 ◦ π2 ◦ π3) X with πµ X := Uµ Uµ^T ◦µ X.
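A compact MATLAB sketch (not from the slides) of the three HOSVD steps for a 3rd-order tensor; it forms an explicit Kronecker product for the core tensor, so it is only meant for small sizes. The function name hosvd3 is made up for illustration.

function [C, U] = hosvd3(X, r)
% HOSVD with prescribed multilinear rank r = [r1 r2 r3]
n = size(X); U = cell(1, 3);
for mu = 1:3
    Xmu = reshape(permute(X, [mu, setdiff(1:3, mu)]), n(mu), []);   % mu-mode matricization
    [Umu, ~, ~] = svd(Xmu, 'econ');
    U{mu} = Umu(:, 1:r(mu));                                        % truncated basis matrix
end
c = kron(U{3}, kron(U{2}, U{1}))' * X(:);   % vec(C) = (U3^T ⊗ U2^T ⊗ U1^T) vec(X)
C = reshape(c, r);
end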

59

slide-60
SLIDE 60

Higher-order SVD (HOSVD)

The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition

‖X − X̃‖ ≤ √d ‖X − X_best‖,

where X_best is the best approximation of X with multilinear ranks (r1, . . . , rd). Proof (d = 3):

‖X − X̃‖^2 = ‖X − (π1 ◦ π2 ◦ π3) X‖^2
           = ‖X − π1 X‖^2 + ‖π1 X − (π1 ◦ π2) X‖^2 + ‖(π1 ◦ π2) X − (π1 ◦ π2 ◦ π3) X‖^2
           ≤ ‖X − π1 X‖^2 + ‖X − π2 X‖^2 + ‖X − π3 X‖^2.

Using ‖X − πµ X‖ ≤ ‖X − X_best‖ for µ = 1, 2, 3 leads to ‖X − X̃‖^2 ≤ 3 ‖X − X_best‖^2.

Best approximation: See [Kolda/Bader'09].

60

slide-61
SLIDE 61

Tucker decomposition – Summary

For general tensors:

◮ multilinear rank r is upper semi-continuous ⇒ closedness property.
◮ HOSVD – simple and robust algorithm to obtain a quasi-optimal low-rank approximation.
◮ quasi-optimality good enough for most applications in scientific computing.
◮ robust black-box algorithms/software available (e.g., Tensor Toolbox).

Drawback: Storage of the core tensor ∼ r^d ⇒ curse of dimensionality.

61

slide-62
SLIDE 62

Tensor network diagrams

Tensor network = undirected graph with:

◮ each node is a tensor;
◮ each outgoing edge is a mode;
◮ each connecting edge represents a contraction; example:

Z_{i1,i2,i3,i4} = Σ_{j=1}^{r} X_{i1,i2,j} Y_{j,i3,i4}.

◮ number of free edges = order of the tensor represented by the entire network

Researchers on quantum many-body problems think2 in terms of tensor networks!

2and dream

62

slide-63
SLIDE 63

Tensor network diagrams

Examples: (i) vector; (ii) matrix; (iii) matrix-matrix multiplication; (iv) Tucker decomposition; (v) hierarchical Tucker decomposition. [Figure: the corresponding tensor network diagrams.]

63

slide-64
SLIDE 64

Low-rank tensors: Hierarchical Tucker

◮ Intro of the Hierarchical Tucker Decomposition (HTD)
◮ MATLAB toolbox htucker
◮ Basic operations: µ-mode matrix multiplication, addition, . . .
◮ Advanced operations: inner product, elementwise multiplication, . . .

64

slide-65
SLIDE 65

Introduction

◮ CP offers low data complexity but difficult truncation;
◮ Tucker offers simple truncation but high data complexity.

Recently developed formats:

◮ Matrix Product State (MPS),
◮ TT decomposition,
◮ Hierarchical Tucker decomposition (HTD).

Aim to offer a compromise between CP and Tucker. Focus in this lecture: HTD.

◮ L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
◮ W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
◮ D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.

65

slide-66
SLIDE 66

More general matricizations

Recall: µ-mode matricization for a tensor X:

X^{(µ)} ∈ R^{nµ×(n1···nµ−1 nµ+1···nd)}, µ = 1, . . . , d.

It is getting ugly... General matricization for a mode decomposition {1, . . . , d} = t ∪ s:

X^{(t)} ∈ R^{(n_{t1}···n_{tk}) × (n_{s1}···n_{s_{d−k}})} with (X^{(t)})_{(i_{t1},...,i_{tk}),(i_{s1},...,i_{s_{d−k}})} := X_{i1,...,id}.

[Figure: the matricizations X^{(1)} and X^{(1,2)} of X.]

66

slide-67
SLIDE 67

Hierarchical construction

Singular value decomposition: X^{(t)} = Ut Σt Us^T.

Column spaces are nested: t = t1 ∪ t2 ⇒ span(Ut) ⊂ span(U_{t2} ⊗ U_{t1}) ⇒ ∃ Bt : Ut = (U_{t2} ⊗ U_{t1}) Bt.

Size of Ut: Ut ∈ R^{n_{t1}···n_{tk} × rt} with rt = rank(X^{(t)}).

For d = 4:

U12 = (U2 ⊗ U1) B12, U34 = (U4 ⊗ U3) B34, vec(X) = X^{(1,2,3,4)} = (U34 ⊗ U12) B1234

⇒ vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.
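A small MATLAB sketch (not from the slides) assembling vec(X) from H-Tucker factors in this d = 4 example; all sizes are arbitrary and the explicit Kronecker products restrict this to small problems.

n = [4 5 6 7]; r = [2 3 2 3]; r12 = 3; r34 = 2;
U1 = randn(n(1), r(1)); U2 = randn(n(2), r(2)); U3 = randn(n(3), r(3)); U4 = randn(n(4), r(4));
B12 = randn(r(1)*r(2), r12); B34 = randn(r(3)*r(4), r34); B1234 = randn(r12*r34, 1);
x = kron(U4, kron(U3, kron(U2, U1))) * kron(B34, B12) * B1234;   % vec(X)
X = reshape(x, n);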

67

slide-68
SLIDE 68

Dimension tree

Tree structure for d = 4:

Leaves U1 (n1 × r1), U2 (n2 × r2), U3 (n3 × r3), U4 (n4 × r4); transfer matrices B12 (r1r2 × r12), B34 (r3r4 × r34), B1234 (r12r34 × 1).

Reshape: B12 ∈ R^{r1r2×r12} ⇒ B12 ∈ R^{r1×r2×r12}, B34 ∈ R^{r3r4×r34} ⇒ B34 ∈ R^{r3×r4×r34}, B1234 ∈ R^{r12r34×1} ⇒ B1234 ∈ R^{r12×r34}.

68

slide-69
SLIDE 69

Dimension tree

[Figure: dimension tree with leaves U1, U2, U3, U4 and transfer tensors B12, B34, B1234.]

◮ Often, U1, U2, U3, U4 are orthonormal. This is advantageous but not required.
◮ Storage requirements for general d: O(dnr) + O(dr^3), where r = max{rt}, n = max{nµ}.

69

slide-70
SLIDE 70

Constructors for MATLAB class htensor

x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7 with a balanced dimension tree.
x = htensor([4 5 6 7], 'TT') constructs a zero htensor of size 4 × 5 × 6 × 7 with a TT-style dimension tree.
x = htensor({U1, U2, U3}) constructs an htensor from a tensor in CP decomposition, X(i1, i2, i3) = Σ_j U1(i1, j) U2(i2, j) U3(i3, j).
x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7 with random ranks and random entries.
x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7 with all entries one.
...

70

slide-71
SLIDE 71

Basic functionality for MATLAB class htensor

Example: x is an htensor of order 4.
x(1, 3, 4, 2) returns an entry of X.
x(1, 3, :, :) returns a slice of X as an htensor.
full(x) returns the full tensor represented by X (use with care).
disp_tree(htenrand([5 4 6 3])) returns the tree structure/ranks:

ans is an htensor of size 5 x 4 x 6 x 3 1-4 1; 6 3 1 1-2 2; 3 4 6 1 4; 5 3 2 5; 4 4 3-4 3; 3 3 3 3 6; 6 3 4 7; 3 3

spy(x) displays spy plots of Ut, Bt on the dimension tree.
change_root(x, i) switches the root node.

71

slide-72
SLIDE 72

Singular value tree

plot_sv(x) plots the singular values of the corresponding matricizations in the dimension tree of a tensor X. Example: singular value tree of the solution to an elliptic PDE with 4 parameters.

[Figure: singular value plots at the tree nodes for dimensions {1, 2}, {3, 4, 5}, {1}, {2}, {3}, {4, 5}, {4}, {5}.]

Remark: Singular values are computed from Gramians.

72

slide-73
SLIDE 73

Basic ops: µ-mode matrix multiplication

Application of a matrix A ∈ R^{m×nµ} to mode µ of X ∈ R^{n1×···×nd}:

Y = A ◦µ X ⇔ Y^{(µ)} = A X^{(µ)}.

Nearly trivial if X is in H-Tucker format:

A ◦µ X = A ◦µ ((U1, . . . , Ud) ◦ C) = (U1, . . . , Uµ−1, A Uµ, Uµ+1, . . . , Ud) ◦ C

◮ Almost no operations required.
◮ Ranks stay the same.
◮ Orthogonality destroyed.

ttm(x, A, 2) applies the matrix A to the htensor X in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4]) applies A, B, C in modes 2, 3, 4.
y = ttm(x, @(x)(fft(x)), 2) applies an FFT in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4], 'h') successively applies the matrices A^T, B^T, C^T in modes 2, 3, 4.

73

slide-74
SLIDE 74

Addition of low-rank matrices

Addition of two matrices in low-rank format: A = U1 ΣA U2^T, B = V1 ΣB V2^T ⇒

A + B = [ U1 V1 ] · diag(ΣA, ΣB) · [ U2 V2 ]^T

◮ No operations required.
◮ Rank increases.
◮ Orthogonality destroyed.

74

slide-75
SLIDE 75

Addition of low-rank tensors

Addition of four tensors X1, X2, X3, X4 in H-Tucker format: X1 + X2 + X3 + X4. Proceed as in matrix case by embedding factors in larger matrices.

◮ No operations required. ◮ H-Tucker rank increases. ◮ Orthogonality destroyed.

Command in htucker: x1 + x2 + x3 + x4

75

slide-76
SLIDE 76

[Figure: factors of the sum X1 + X2 + X3 + X4 in H-Tucker format: the leaf matrices are concatenations Ut = [Ut^[1], Ut^[2], Ut^[3], Ut^[4]] and the transfer tensors Bt have block(-diagonal) structure.]

76

slide-77
SLIDE 77

Orthogonalization

Any tensor X in H-Tucker format can be orthogonalized in the sense that all factors in the dimension tree, except for the root node, contain orthonormal columns.

Example: vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.

Step 1: QR decompositions Ut = Qt Rt ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(B̃34 ⊗ B̃12) B1234 with B̃34 := (R4 ⊗ R3) B34, B̃12 := (R2 ⊗ R1) B12.

Step 2: QR decompositions B̃34 = Q34 R34, B̃12 = Q12 R12 ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(Q34 ⊗ Q12) B̃1234 with B̃1234 := (R34 ⊗ R12) B1234.

Computational requirements for general d: O(dnr^2) + O(dr^4).

Command in htucker: x = orthog(x)

77

slide-78
SLIDE 78

Norms and inner products

Inner product of two tensors X, Y ∈ R^{n1×···×nd}:

⟨X, Y⟩ = ⟨vec(X), vec(Y)⟩ = Σ_{i1=1}^{n1} · · · Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

Can be performed efficiently in H-Tucker, provided that X and Y have compatible dimension trees. Example for two tensors of order 4:

⟨X, Y⟩ = (B^x_{1234})^T (B^x_{34} ⊗ B^x_{12})^T (U^x_4 ⊗ U^x_3 ⊗ U^x_2 ⊗ U^x_1)^T (U^y_4 ⊗ U^y_3 ⊗ U^y_2 ⊗ U^y_1)(B^y_{34} ⊗ B^y_{12}) B^y_{1234}.

Norm: After X has been orthogonalized:

‖X‖ = √⟨X, X⟩ = ‖B^x_{12···d}‖_F.

Possibly the most accurate way to compute the norm. Used in norm(x).

78

slide-79
SLIDE 79

Computation of inner products

⟨X, Y⟩ = Σ_{i1=1}^{n1} · · · Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

79

slide-80
SLIDE 80

Computation of inner products

80

slide-81
SLIDE 81

Computation of inner products

81

slide-82
SLIDE 82

Computation of inner products

82

slide-83
SLIDE 83

Computation of inner products

83

slide-84
SLIDE 84

Computation of inner products – contraction step

Contraction step at a node t with children t1, t2:

(U^x_t)^T U^y_t = (B^x_t)^T ( (U^x_{t2})^T U^y_{t2} ⊗ (U^x_{t1})^T U^y_{t1} ) B^y_t.

◮ htucker command: innerprod(x,y)
◮ Overall cost: O(dnr^2) + O(dr^4).

84

slide-85
SLIDE 85

Reduced Gramians in H-Tucker

X^{(t)} = Ut Vt^T ⇒ X^{(t)} (X^{(t)})^T = Ut (Vt^T Vt) Ut^T =: Ut Gt Ut^T,

with the reduced Gramian Gt := Vt^T Vt. If Ut is orthonormal ⇒ the singular values of X^{(t)} are the square roots of the eigenvalues of Gt (used in plot_sv).
85

slide-86
SLIDE 86

Reduced Gramians in H-Tucker

86

slide-87
SLIDE 87

Reduced Gramians in H-Tucker

87

slide-88
SLIDE 88

Reduced Gramians in H-Tucker

88

slide-89
SLIDE 89

Reduced Gramians in H-Tucker

89

slide-90
SLIDE 90

Reduced Gramians in H-Tucker

90

slide-91
SLIDE 91

Reduced Gramians in H-Tucker

Implemented in htucker command gramians(x).

91

slide-92
SLIDE 92

Advanced operations

◮ Truncation ◮ Combined addition + truncation ◮ Elementwise multiplication ◮ Elementwise reciprocal

92

slide-93
SLIDE 93

Truncation of explicit tensor

Let X ∈ R^{n1×n2×···×nd} be explicitly given.

◮ For each tree node t, let Wt contain the rt dominant left singular vectors of X^{(t)} and define the projection

πt X = Wt Wt^T ◦t X ⇔ (πt X)^{(t)} = Wt Wt^T X^{(t)}.

◮ Truncated tensor:

X̃ := ( Π_{t∈TL} πt ) · · · ( Π_{t∈T1} πt ) X,

where Tℓ contains all nodes on level ℓ.

◮ [Grasedyck'2010]: ‖X − X̃‖ ≤ √(2d − 3) ‖X − X_best‖. Proof similar as for the HOSVD.

93

slide-94
SLIDE 94

Truncation of explicit tensor

Example:

vec(X̃) = (W4 W4^T ⊗ W3 W3^T ⊗ W2 W2^T ⊗ W1 W1^T)(W34 W34^T ⊗ W12 W12^T) vec(X)
        = (W4 ⊗ W3 ⊗ W2 ⊗ W1) · ( [W4^T ⊗ W3^T] W34 (=: B34) ⊗ [W2^T ⊗ W1^T] W12 (=: B12) ) · ( [W34^T ⊗ W12^T] vec(X) ) (=: B1234).

Truncation options:

  • opts.max_rank = 10: maximal rank at truncation.
  • opts.rel_eps = 1e-6: maximal relative truncation error.
  • opts.abs_eps = 1e-6: maximal absolute truncation error.

The condition max_rank takes precedence over rel_eps and abs_eps. xt = htensor.truncate_rtl(x, opts) returns the truncated tensor X̃ of a multidimensional array. Remark: There is also a significantly faster htensor.truncate_ltr (proceeds successively from the leaves to the root), for which the same error bound holds [Tobler'10].
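A hypothetical usage sketch (not from the slides) of these options for an existing htensor x:

opts.max_rank = 10;
opts.rel_eps  = 1e-6;
opts.abs_eps  = 1e-6;
xt = truncate(x, opts);   % truncation of an htensor, see the next slide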

94

slide-95
SLIDE 95

Truncation of H-Tucker tensor

Let X ∈ R^{n1×n2×···×nd} be in H-Tucker format and orthogonalized.

◮ Compute the left singular vectors of X^{(t)} = Ut Vt^T from the eigenvectors of

X^{(t)} (X^{(t)})^T = Ut (Vt^T Vt) Ut^T = Ut Gt Ut^T,

with the reduced Gramian Gt. If St contains the rt dominant eigenvectors of Gt ⇒ Wt = Ut St.

◮ Traverse the tree from the root to the leaves. In each step, the factor at node t is truncated via St and the projection St^T is absorbed into the transfer tensor of the parent node (St^T ◦ Btp, St ◦ Bt).

◮ In htucker: truncate(x, opts). Complexity O(dnr^2 + dr^4).

95

slide-96
SLIDE 96

Combined addition + truncation

Sum of more than two tensors: Y = X1 + X2 + · · · + Xs. Two possibilities to incorporate the truncation operator T:

  • 1. Y ≈ T(X1 + X2 + X3 + · · · + Xs)
  • 2. Y ≈ T(· · · T(T(X1 + X2) + X3) + · · · + Xs)

Option 2 is usually significantly cheaper but may suffer from severe cancellation. Artificial example: X1, X2, X3 ∈ R^{101×101×101} truncated tensor grid discretizations of the summands of

f(x1, x2, x3) = tan(x1 + x2 + x3) + (x1 + x2 + x3)^{−1} − tan(x1 + x2 + x3).

Error(Option 1) ≈ 10^{−7}. Error(Option 2) ≈ 1.3. What is wrong with Option 1?

96

slide-97
SLIDE 97

Combined addition + truncation

[Figure: factors of the sum X1 + X2 + X3 + X4 in H-Tucker format: concatenated leaf matrices and block-structured transfer tensors.]

◮ Orthogonalization (needed before truncation) destroys the block diagonal structure.
◮ Complexity O(dns^2 r^2 + ds^4 r^4) for s summands.

97

slide-98
SLIDE 98

Combined addition + truncation

Idea: A new variant delays orthogonalization to keep the block diagonal structure in the transfer tensors as long as possible. Reduces O(dns^2 r^2 + ds^4 r^4) to O(dns^2 r^2 + ds^2 r^4 + ds^3 r^3).

[Plot: runtime vs. number of summands for standard truncation, truncation of the sum, and successive truncation, with O(s^4), O(s^2), O(s) reference lines.]

◮ htucker command: add_truncate(x1 x2 x3 x4, opts).

98

slide-99
SLIDE 99

Elementwise multiplication

Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices A = U1 ΣA U2^T, B = V1 ΣB V2^T:

A ⋆ B = (U1 ⊙ V1)(ΣA ⊗ ΣB)(U2 ⊙ V2)^T,

with the row-wise Khatri-Rao product

C ⊙ D = [ c1^T ; . . . ; cn^T ] ⊙ [ d1^T ; . . . ; dn^T ] = [ c1^T ⊗ d1^T ; . . . ; cn^T ⊗ dn^T ].

◮ Orthogonality destroyed.
◮ Rank increases significantly.

But: the singular value decay of ΣA ⊗ ΣB may become significantly stronger ⇒ additional opportunities for truncation.
99

slide-100
SLIDE 100

Elementwise multiplication

Elementwise multiplication of two tensors X, Y in H-Tucker format:

◮ Row-wise Khatri-Rao product of the leaf matrices.
◮ “Kronecker product” of the non-leaf transfer tensors.
◮ Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.

Commands in htucker:
x.*y (without truncation)
x.^2 (without truncation)
elem_mult( x, y, opt ) (with truncation)

100

slide-101
SLIDE 101

Elementwise reciprocal

Goal: Compute the reciprocal of each entry of a tensor X. Basic idea: the Newton-Schultz iteration

y0 = 1, y_{i+1} = y_i + y_i (1 − x y_i),    (2)

converges to 1/x for 0 < x < 2. Apply (2) simultaneously to all entries. Code snippet of elem_reciprocal( x, opt ) in htucker:

all_ones = htenones(size(x));
y = all_ones;
for it = 1:maxit
  xy = elem_mult( x, y );
  xy = truncate( all_ones - xy );
  xy = elem_mult( xy, y );
  y  = truncate( y + xy );
end

See also [Oseledets et al. 2009].

101

slide-102
SLIDE 102

Elementwise reciprocal

Example: (x1 + x2 + x3 + x4)^{−1} with xi ∈ [10^{−3}, 1].

c = laplace_core(4); U = [ones(100, 1), linspace(1e-3, 1, 100)’]; x = ttm(c, {U, U, U, U}); inv_x = elem_reciprocal(x, opts);

[Plots: convergence of ‖X ⋆ Yk − 1‖ / ‖1‖ over the iterations; singular value tree upon convergence.]

102

slide-103
SLIDE 103

Summary

◮ HTD offers a good compromise between CP and Tucker.
◮ Algorithms often quite technical but conceptually simple.
◮ Computational complexity ∼ d but often ∼ r^4: curse of dimensionality ⇒ curse of rank?
◮ Important to keep in mind: Unless d is tiny, the tensor X can/should never be formed explicitly. All operations need to be performed implicitly in HTD. This can pose severe problems even for seemingly simple operations: min(X), max(X), abs(X), 1./X, . . .

103

slide-104
SLIDE 104

104

slide-105
SLIDE 105

Algorithms based on low-rank tensors

◮ Inexact LOBPCG ◮ ALS / MALS

105

slide-106
SLIDE 106

Strategies for solving tensor equations

◮ In many practical situations, a tensor X is given implicitly as the solution to a linear system A(X) = B, an eigenvalue problem A(X) = λX, a nonlinear system, an ODE, . . .

Two main strategies to use low-rank tensor techniques:

  • 1. Combine an existing iterative solver (e.g., CG, LOBPCG, GMRES) with repeated low-rank truncation of the iterates (⇒ inexact CG).
    ◮ Straightforward to derive and implement (based, e.g., on htucker).
    ◮ Hard to analyze the impact of nonnegligible truncations on accuracy and convergence.
    ◮ Intermediate rank growth may result in excessive computing times and/or harm accuracy + convergence.
  • 2. Formulate an optimization problem, constrain it to low-rank tensors, iteratively optimize with respect to the individual factors of the low-rank format.
    ◮ Works well in practice.
    ◮ Convergence theory not well understood.
    ◮ Not straightforward to implement.

106

slide-107
SLIDE 107

Example: PDE-eigenvalue problem

Goal: Compute the smallest eigenvalue for

∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d, u(ξ) = 0 on ∂Ω.

Assumption: Potential represented as

V(ξ) = Σ_{j=1}^{s} V_j^{(1)}(ξ1) V_j^{(2)}(ξ2) · · · V_j^{(d)}(ξd).

⇒ finite difference discretization

A u = (A_L + A_V) u = λ u,

with

A_L = Σ_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ · · · ⊗ I (j−1 times),
A_V = Σ_{j=1}^{s} A_{V,j}^{(d)} ⊗ · · · ⊗ A_{V,j}^{(2)} ⊗ A_{V,j}^{(1)}.

107

slide-108
SLIDE 108

LOBPCG method

LOBPCG with block size 1 [Knyazev'01] for computing the smallest eigenvalue of Ax = λx, A symmetric.

λ0 = ⟨x0, x0⟩_A, p0 = 0
for k = 0, 1, . . . (until converged) do
  r_k = B^{−1}(A x_k − λ_k x_k)
  U = [ x_k, r_k, p_k ]
  Ã = U^T A U, M̃ = U^T U
  Find the eigenpair (λ_{k+1}, y), with ‖y‖_2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λ M̃.
  p_{k+1} = y_2 · r_k + y_3 · p_k
  x_{k+1} = y_1 · x_k + p_{k+1}
  x_{k+1} ← x_{k+1} / ‖x_{k+1}‖_2
end for
Return (λmin, x) = (λ_{k+1}, x_{k+1}).
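A minimal MATLAB sketch (not from the slides) of this pseudocode for the matrix case; A is assumed symmetric, Binv is an assumed function handle applying the preconditioner B^{-1} (use @(v) v for B = I), and no safeguards against ill-conditioned bases are included.

function [lam, x] = lobpcg1(A, Binv, x0, maxit)
x = x0 / norm(x0); p = []; lam = x' * (A * x);
for k = 1:maxit
    r = Binv(A*x - lam*x);
    U = [x, r, p];                            % p is empty in the first iteration
    Ah = U' * (A * U); Mh = U' * U;
    [Y, D] = eig((Ah + Ah')/2, (Mh + Mh')/2, 'chol');
    [lam, idx] = min(diag(D));
    y = Y(:, idx); y = y / norm(y);
    p = U(:, 2:end) * y(2:end);               % p_{k+1} = y2*r_k + y3*p_k
    x = U(:, 1) * y(1) + p;                   % x_{k+1} = y1*x_k + p_{k+1}
    x = x / norm(x);
end
end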

108

slide-109
SLIDE 109

Tensor low-rank LOBPCG

Truncated LOBPCG with block size 1 for computing the smallest eigenvalue of A(X) = λX, A symmetric, X a tensor.

λ0 = ⟨X0, X0⟩_A, P0 = 0 · X
for k = 0, 1, . . . (until converged) do
  R_k = B^{−1}(A(X_k) − λ_k X_k), R_k ← T(R_k)
  U1 = X_k, U2 = R_k, U3 = P_k
  Ã_{ij} = ⟨U_i, U_j⟩_A, M̃_{ij} = ⟨U_i, U_j⟩
  Find the eigenpair (λ_{k+1}, y), with ‖y‖_2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λ M̃.
  P_{k+1} = y_2 · R_k + y_3 · P_k, P_{k+1} ← T(P_{k+1})
  X_{k+1} = y_1 · X_k + P_{k+1}, X_{k+1} ← T(X_{k+1})
  X_{k+1} ← X_{k+1} / √⟨X_{k+1}, X_{k+1}⟩
end for
Return (λmin, X) = (λ_{k+1}, X_{k+1}).

T = truncation to hierarchical low rank

109

slide-110
SLIDE 110

Implementation details

Orthogonalization: In standard LOBPCG, orthogonalization of U is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as the ranks would grow and truncation would destroy orthogonality.

Truncation: X_k, R_k, P_k are truncated in every step. Moreover, the application of A(·) and of the preconditioner B^{−1}(·) may involve truncation during the application of these operators.

Inner product: The reduced matrix Ã is very sensitive to truncation in A(·). The computation of Ã_{i,j} = ⟨U_i, U_j⟩_A must be exact.

110

slide-111
SLIDE 111

Numerical Experiments - Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

V(ξ) = q · Σ_{i=1}^{d} sin(ξi)

for some constant q > 0. We choose d = 10, n = 128. Preconditioner [Grasedyck 2004]:

A_L^{−1} = ∫_0^∞ exp(−t A_L) dt ≈ Σ_{j=−M}^{M} ωj exp(−αj A_L^{(d)}) ⊗ · · · ⊗ exp(−αj A_L^{(1)}) =: B^{−1},

for a certain, optimized and tabulated choice of coefficients αj, ωj > 0. We choose M = 10.

111

slide-112
SLIDE 112

Numerical Experiments - Sine potential

q = 1: [Plots: residual vs. iterations and maximal rank vs. iterations for truncation tolerances eps 1e−2, 1e−4, 1e−8.]

q = 1000: [Plots: residual vs. iterations and maximal rank vs. iterations for truncation tolerances eps 1e−2, 1e−4.]

112

slide-113
SLIDE 113

ALS

Originally from computational quantum physics [Schollwöck 2011], recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].

Goal: min { ⟨X, A(X)⟩ / ⟨X, X⟩ : X ∈ H-Tucker((rt)_{t∈T}), X ≠ 0 }.

Method: Choose one node t, fix all other nodes, and set the new tensor at node t to minimize the Rayleigh quotient ⟨X, A(X)⟩ / ⟨X, X⟩. This is done for all nodes (a sweep), and sweeps are continued until convergence.

Sketch:

X^{(t)} = Ut Vt^T = (U_{tr} ⊗ U_{tl}) Bt Vt^T,   vec(X) = (Vt ⊗ U_{tr} ⊗ U_{tl}) vec(Bt) =: 𝒰_t vec(Bt).

⇒ min { y^T (𝒰_t^T A 𝒰_t) y / y^T (𝒰_t^T 𝒰_t) y : y ∈ R^{r_{tl} r_{tr} r_t}, y ≠ 0 }.

113

slide-114
SLIDE 114

Computation of reduced matrices

Consider A = Ad ⊗ · · · ⊗ A1 (other operators can be treated similarly). Compute

Ã_t := 𝒰_t^T A 𝒰_t = (Vt ⊗ U_{tr} ⊗ U_{tl})^T A (Vt ⊗ U_{tr} ⊗ U_{tl}) = Â_t ⊗ Ã_{tr} ⊗ Ã_{tl},

where

Ã_{tl} = U_{tl}^T ( ⊗_{i∈tl} A_i ) U_{tl},  Ã_{tr} = U_{tr}^T ( ⊗_{i∈tr} A_i ) U_{tr},  Â_t = Vt^T ( ⊗_{i∉t} A_i ) Vt.

Additionally

M̃_t := 𝒰_t^T 𝒰_t = Vt^T Vt ⊗ U_{tr}^T U_{tr} ⊗ U_{tl}^T U_{tl} = Mt ⊗ M_{tr} ⊗ M_{tl}.

114

slide-115
SLIDE 115

Computation of reduced matrices

[Figure: dimension tree with operator factors A1, . . . , A8 at the leaves and reduced matrices Ã12, Ã34, Â1234 at the inner nodes.]

115

slide-116
SLIDE 116

MALS

Method:

◮ Select an edge of the tensor network.
◮ Combine the tensors at the adjacent nodes to form a higher-order tensor.
◮ Set this tensor to minimize the Rayleigh quotient.
◮ Use a low-rank approximation to split the new combined tensor into two tensors at the adjacent nodes of the selected edge.

116

slide-117
SLIDE 117

MALS - Illustration

117

slide-118
SLIDE 118

Numerical Experiments – Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

V(ξ) = q · Σ_{i=1}^{d} sin(ξi)

for some constant q > 0. Choose d = 10, n = 128, q = 1000. Preconditioner [Grasedyck 2004]:

A_L^{−1} = ∫_0^∞ exp(−t A_L) dt ≈ Σ_{j=−M}^{M} ωj exp(−αj A_L^{(d)}) ⊗ · · · ⊗ exp(−αj A_L^{(1)}) =: B^{−1},

for a certain, optimized choice of coefficients αj, ωj > 0. We choose M = 10.

118

slide-119
SLIDE 119

Numerical Experiments – Sine potential

ALS: [Plots: eigenvalue error, residual, and iteration count vs. execution time.] Hierarchical ranks 40.

MALS: [Plots: eigenvalue error, residual, rank, and iteration count vs. execution time.] Maximal hierarchical rank 30.

119

slide-120
SLIDE 120

Conclusions and Outlook

120

slide-121
SLIDE 121

Conclusions and Outlook

◮ Scientific computing with low-rank tensors: a rapidly evolving field, and highly technical.
◮ Precise scope of applications far from clear; many applications remain to be explored. More analysis and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, . . .) needed.

Some current trends:

◮ Tensorization of vectors + low rank (discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtishnikov, . . .
◮ Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, . . .
◮ Robust low rank (Candes et al.) for tensors ⇒ a suitable way of dealing with singularities?
◮ . . .

Acknowledgments: Presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).

121