

slide-1
SLIDE 1

Low-Rank Tensor Techniques for High-Dimensional Problems

Daniel Kressner CADMOS Chair for Numerical Algorithms and HPC MATHICSE, EPFL

1

slide-2
SLIDE 2

Contents

◮ What is a tensor? ◮ Applications ◮ Matrices and low rank ◮ CP and Tucker ◮ Hierarchical Tucker ◮ Algorithms based on low-rank tensors ◮ Conclusions

2

slide-3
SLIDE 3

What is a tensor?

◮ Vectors, matrices, and tensors ◮ Basic calculus with tensors ◮ Vectorization and matricization ◮ µ-mode matrix products ◮ Two classes of tensor problems

3

slide-4
SLIDE 4

Vectors, matrices, and tensors

Vector Matrix Tensor

◮ scalar = tensor of order 0 ◮ (column) vector = tensor of order 1 ◮ matrix = tensor of order 2 ◮ tensor of order 3

= n1n2n3 numbers arranged in n1 × n2 × n3 array

4

slide-5
SLIDE 5

Tensors of arbitrary order

A d-th order tensor X of size n1 × n2 × · · · × nd is a d-dimensional array with entries Xi1,i2,...,id, iµ ∈ {1, . . . , nµ} for µ = 1, . . . , d. In the following, entries of X are real (for simplicity) X ∈ Rn1×n2×···×nd. Multi-index notation: I = {1, . . . , n1} × {1, . . . , n2} × · · · × {1, . . . , nd}. Then i ∈ I is a tuple of d indices: i = (i1, i2, . . . , id). Allows to write entries of X as Xi for i ∈ I.

5

slide-6
SLIDE 6

Two important points

  • 1. A matrix A ∈ R^{m×n} has a natural interpretation as a linear operator in terms of matrix-vector multiplications:

A : R^n → R^m, A : x → A · x. There is no such (unique and natural) interpretation for tensors! ⇒ fundamental difficulty to define a meaningful general notion of eigenvalues and singular values of tensors.

  • 2. Number of entries in a tensor grows exponentially with d

⇒ Curse of dimensionality. Example: A tensor of order 30 with n1 = n2 = · · · = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!1 For d ≫ 1: Cannot afford to store the tensor explicitly (in terms of its entries).

1Global data storage calculated at 295 exabyte, see

http://www.bbc.co.uk/news/technology-12419672.

6

slide-7
SLIDE 7

Basic calculus

◮ Addition of two equal-sized tensors X, Y:

Z = X + Y ⇔ Zi = Xi + Yi ∀i ∈ I.

◮ Scalar product with α ∈ R:

Z = αX ⇔ Zi = αXi ∀i ∈ I. vector space structure.

◮ Inner product of two equal-sized tensors X, Y:

⟨X, Y⟩ := Σ_{i∈I} X_i Y_i.

Induced norm: ‖X‖ := ( Σ_{i∈I} X_i^2 )^{1/2}. For a 2nd order tensor (= matrix) this corresponds to the Frobenius norm.
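Not from the slides: a minimal MATLAB illustration of the inner product and induced norm via vectorization (array sizes chosen arbitrarily).

X = randn(3,4,5); Y = randn(3,4,5);
ip = X(:)' * Y(:);    % <X, Y>: sum of entrywise products
nrm = norm(X(:));     % ||X|| = sqrt(<X, X>); Frobenius norm for d = 2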

7

slide-8
SLIDE 8

Vectorization

A tensor X of size n1 × n2 × · · · × nd has n1 · n2 · · · nd entries ⇒ many ways to stack the entries into a (loooong) column vector. One possible choice: The vectorization of X is denoted by vec(X), where vec : R^{n1×n2×···×nd} → R^{n1·n2···nd} stacks the entries of a tensor in reverse lexicographical order into a long column vector. Remark: For d = 2, this is the usual way matrices are vectorized:

A = [ a11 a12 ; a21 a22 ; a31 a32 ]  ⇒  vec(A) = ( a11, a21, a31, a12, a22, a32 )^T
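A minimal MATLAB illustration (not from the slides): X(:) stacks the entries in column-major order, i.e. with the first index varying fastest, which coincides with the usual matrix vectorization for d = 2.

X = reshape(1:12, [3 2 2]);   % a small 3 x 2 x 2 tensor
v = X(:);                     % vec(X) as a column vector of length 12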

8

slide-9
SLIDE 9

Vectorization

Example: d = 3, n1 = 3, n2 = 2, n3 = 3.

vec(X) = ( x111, x112, x113, x121, . . . , x321, x322, x323 )^T

9

slide-10
SLIDE 10

Matricization

◮ A matrix has two modes (column mode and row mode). ◮ A dth-order tensor X has d modes (µ = 1, µ = 2, . . ., µ = d).

Let us fix all but one mode, e.g., µ = 1: Then X(:, i2, i3, . . . , id) (abuse of MATLAB notation) is a vector of length n1 for each choice of i2, . . . , id. View tensor X as a bunch of column vectors:

10

slide-11
SLIDE 11

Matricization

Stack the vectors into an n1 × (n2 · · · nd) matrix: X ∈ R^{n1×n2×···×nd} ⇒ X^{(1)} ∈ R^{n1×(n2n3···nd)}. For µ = 1, . . . , d, the µ-mode matricization of X is the matrix X^{(µ)} ∈ R^{nµ×(n1···nµ−1 nµ+1···nd)} with entries

(X^{(µ)})_{iµ, (i1,...,iµ−1,iµ+1,...,id)} = X_i  ∀i ∈ I.

11

slide-12
SLIDE 12

Matricization

In MATLAB: a = rand(2,3,4,5);

◮ 1-mode matricization:

reshape(a,2,3*4*5)

◮ 2-mode matricization:

b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)

◮ 3-mode matricization:

b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)

◮ 4-mode matricization:

b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)

For a matrix A ∈ R^{n1×n2}: A^{(1)} = A, A^{(2)} = A^T.

12

slide-13
SLIDE 13

µ-mode matrix products

Consider the 1-mode matricization X^{(1)} ∈ R^{n1×(n2···nd)}: Seems to make sense to multiply an m × n1 matrix A from the left: Y^{(1)} := A X^{(1)} ∈ R^{m×(n2···nd)}. Can rearrange Y^{(1)} back into an m × n2 × · · · × nd tensor Y. This is called 1-mode matrix multiplication:

Y = A ◦1 X ⇔ Y^{(1)} = A X^{(1)}

More formally (and more ugly): Y_{i1,i2,...,id} = Σ_{k=1}^{n1} a_{i1,k} X_{k,i2,...,id}.

13

slide-14
SLIDE 14

µ-mode matrix products

General definition of the µ-mode matrix product with A ∈ R^{m×nµ}:

Y = A ◦µ X ⇔ Y^{(µ)} = A X^{(µ)}.

More formally (and more ugly): Y_{i1,i2,...,id} = Σ_{k=1}^{nµ} a_{iµ,k} X_{i1,...,iµ−1,k,iµ+1,...,id}.

For matrices:

◮ 1-mode multiplication = multiplication from the left: Y = A ◦1 X = A X.
◮ 2-mode multiplication = transposed multiplication from the right: Y = A ◦2 X = X A^T.
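A sketch (not from the slides) of a generic µ-mode matrix product in MATLAB, using the permute/reshape matricization shown on the previous slides; the function name mmprod is made up for illustration.

function Y = mmprod(X, A, mu)
% Y = A o_mu X, i.e. Y^(mu) = A * X^(mu)
sz = size(X); d = ndims(X);
perm = [mu, 1:mu-1, mu+1:d];
Xmu = reshape(permute(X, perm), sz(mu), []);   % mu-mode matricization
Ymu = A * Xmu;                                 % multiply in mode mu
szp = sz(perm); szp(1) = size(A, 1);           % mode mu now has length m
Y = ipermute(reshape(Ymu, szp), perm);         % fold back into a tensor
end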

14

slide-15
SLIDE 15

Kronecker product

For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as

B ⊗ A := [ b11 A · · · b1ℓ A ; . . . ; bk1 A · · · bkℓ A ] ∈ R^{km×ℓn}.

Most important properties (for our purposes):

  • 1. vec(A X) = (I ⊗ A) vec(X).
  • 2. vec(X A^T) = (A ⊗ I) vec(X).
  • 3. (B ⊗ A)(D ⊗ C) = (BD ⊗ AC).
  • 4. Im ⊗ In = Imn.
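A quick numerical check (not from the slides) of properties 1 and 2, with arbitrarily chosen sizes; both residuals are at rounding-error level.

A = randn(4,3); B = randn(6,5); X = randn(3,5);
e1 = norm(reshape(A*X,  [], 1) - kron(eye(5), A)*X(:))   % vec(A X)   vs (I ⊗ A) vec(X)
e2 = norm(reshape(X*B', [], 1) - kron(B, eye(3))*X(:))   % vec(X B^T) vs (B ⊗ I) vec(X)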

15

slide-16
SLIDE 16

µ-mode matrix products and vectorization

By definition, vec(X) = vec(X^{(1)}). Consequently, also vec(A ◦1 X) = vec(A X^{(1)}). Vectorized version of the 1-mode matrix product:

vec(A ◦1 X) = (I_{n2···nd} ⊗ A) vec(X) = (I_{nd} ⊗ · · · ⊗ I_{n2} ⊗ A) vec(X).

Relation between µ-mode matrix product and matrix-vector product:

vec(A ◦µ X) = (I_{nd} ⊗ · · · ⊗ I_{nµ+1} ⊗ A ⊗ I_{nµ−1} ⊗ · · · ⊗ I_{n1}) vec(X)

16

slide-17
SLIDE 17

Two classes of tensor problems

Class 1: function-related tensors. Consider a function u(ξ1, . . . , ξd) ∈ R in d variables ξ1, . . . , ξd. A tensor U ∈ R^{n1×···×nd} represents a discretization of u:

◮ U contains function values of u evaluated on a grid; or
◮ U contains coefficients of a truncated expansion in tensorized basis functions: u(ξ1, . . . , ξd) ≈ Σ_{i∈I} U_i φ_{i1}(ξ1) φ_{i2}(ξ2) · · · φ_{id}(ξd).

Typical setting:

◮ U only given implicitly, e.g., as the solution of a discretized PDE;
◮ seek approximations to U with very low storage and tolerable accuracy;
◮ d may become very large.

Focus of this lecture on function-related tensors!

17

slide-18
SLIDE 18

Discretization of a function in d variables ξ1, . . . , ξd ∈ [0, 1]: the number of function values grows exponentially with d.

18

slide-19
SLIDE 19

Separability helps

Ideal situation: the function f is separable: f(ξ1, ξ2, . . . , ξd) = f1(ξ1) f2(ξ2) · · · fd(ξd). The discretized f is then the Kronecker product of the discretized factors fj: O(n^d) memory is reduced to O(dn) memory. Of course: exact separability is rarely satisfied in practice.

19

slide-20
SLIDE 20

Two classes of tensor problems

Class 2: data-related tensors. Tensor U ∈ R^{n1×···×nd} contains multi-dimensional data. Example 1: U_{2011,3,2} denotes the number of papers published in 2011 by author 3 in mathematical journal 2. Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor. Typical setting:

◮ entries of U given explicitly (at least partially);
◮ extraction of dominant features from U;
◮ usually moderate values of d.

20

slide-21
SLIDE 21

Summary

◮ A tensor X ∈ R^{n1×···×nd} is a d-dimensional array.
◮ Various ways of reshaping the entries of a tensor X into a vector or matrix.
◮ µ-mode matrix multiplication can be expressed with Kronecker products.

Further reading:

◮ T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

Software:

◮ MATLAB offers basic functionality to work with d-dimensional arrays.
◮ MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

21

slide-22
SLIDE 22

Applications in scientific computing

◮ High-dimensional elliptic PDEs ◮ High-dimensional PDE-eigenvalue problems ◮ Quantum many-body problems ◮ Stochastic Automata Networks ◮ further applications

22

slide-23
SLIDE 23

High-dimensional elliptic PDEs: 3D model problem

◮ Consider −∆u = f in Ω, u|∂Ω = 0, on the unit cube Ω = [0, 1]^3.
◮ Discretize on a tensor grid. Uniform grid for simplicity: ξ_µ^{(j)} = jh, h = 1/(n + 1), for µ = 1, 2, 3.
◮ Approximate solution tensor U ∈ R^{n×n×n}:

U_{i1,i2,i3} ≈ u( ξ_1^{(i1)}, ξ_2^{(i2)}, ξ_3^{(i3)} ).

23

slide-24
SLIDE 24

High-dimensional elliptic PDEs: 3D model problem

◮ Discretization of the 1D Laplace operator:

−∂xx ≈ tridiag(−1, 2, −1) =: A.

◮ Application in each coordinate direction:

−∂ξ1ξ1 u(ξ1, ξ2, ξ3) ≈ A ◦1 U, −∂ξ2ξ2 u(ξ1, ξ2, ξ3) ≈ A ◦2 U, −∂ξ3ξ3 u(ξ1, ξ2, ξ3) ≈ A ◦3 U.

◮ Hence,

−∆u ≈ A ◦1 U + A ◦2 U + A ◦3 U

or in vectorized form with u = vec(U):

−∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.
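A short MATLAB sketch (not from the slides) assembling this Kronecker-sum structure with sparse matrices; the grid size n is an arbitrary choice and the 1/h^2 scaling is omitted as above.

n = 50; e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % 1D second-difference matrix tridiag(-1,2,-1)
I = speye(n);
A3 = kron(I, kron(I, A)) + kron(I, kron(A, I)) + kron(A, kron(I, I));   % I⊗I⊗A + I⊗A⊗I + A⊗I⊗I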

24

slide-25
SLIDE 25

High-dimensional elliptic PDEs: 3D model problem

Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]^3 takes the form

(I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u = f.

Similar structure for a finite element discretization with tensorized FEs:

V ⊗ W ⊗ Z = { Σ_{i,j,k} α_{ijk} v_i(ξ1) w_j(ξ2) z_k(ξ3) : α_{ijk} ∈ R }

with V = {v1(ξ1), . . . , vn(ξ1)}, W = {w1(ξ2), . . . , wn(ξ2)}, Z = {z1(ξ3), . . . , zn(ξ3)}.

Galerkin discretization:

(K_V ⊗ M_W ⊗ M_Z + M_V ⊗ K_W ⊗ M_Z + M_V ⊗ M_W ⊗ K_Z) u = f,

with 1D mass/stiffness matrices M_V, M_W, M_Z, K_V, K_W, K_Z.

25

slide-26
SLIDE 26

High-dimensional elliptic PDEs: Arbitrary dimensions

Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]^d takes the form

( Σ_{j=1}^{d} I ⊗ · · · ⊗ I ⊗ A ⊗ I ⊗ · · · ⊗ I ) u = f,

with A appearing in the j-th term's j-th position. To obtain such Kronecker structure in general:

◮ tensorized domain;
◮ highly structured grid;
◮ coefficients that can be written/approximated as a sum of separable functions.

26

slide-27
SLIDE 27

High-dimensional PDE-eigenvalue problems

PDE-eigenvalue problem

∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d, u(ξ) = 0 on ∂Ω.

Assumption: Potential represented as

V(ξ) = Σ_{j=1}^{s} V_j^{(1)}(ξ1) V_j^{(2)}(ξ2) · · · V_j^{(d)}(ξd).

⇒ finite difference discretization

A u = (A_L + A_V) u = λ u,

with

A_L = Σ_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ · · · ⊗ I (j−1 times),
A_V = Σ_{j=1}^{s} A_{V,j}^{(d)} ⊗ · · · ⊗ A_{V,j}^{(2)} ⊗ A_{V,j}^{(1)}.

27

slide-28
SLIDE 28

Quantum many-body problems

◮ spin-1/2 particles: proton, neutron, electron, and quark.
◮ two states: spin-up, spin-down
◮ quantum state of each spin represented by a vector in C^2 (spinor)
◮ quantum state of a system of d spins represented by a vector in C^{2^d}
◮ quantum mechanical operators expressed in terms of Pauli matrices

P_x = [ 0 1 ; 1 0 ],  P_y = [ 0 −i ; i 0 ],  P_z = [ 1 0 ; 0 −1 ].

◮ spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities; each term describes a physical (inter)action of spins
◮ interaction of spins described by a graph
◮ Goal: Compute the ground state of the spin Hamiltonian.

28

slide-29
SLIDE 29

Quantum many-body problems

Example: 1d chain of 5 spins with periodic boundary conditions.

Hamiltonian describing pairwise interaction between nearest neighbors:

H = Pz ⊗ Pz ⊗ I ⊗ I ⊗ I + I ⊗ Pz ⊗ Pz ⊗ I ⊗ I + I ⊗ I ⊗ Pz ⊗ Pz ⊗ I + I ⊗ I ⊗ I ⊗ Pz ⊗ Pz + Pz ⊗ I ⊗ I ⊗ I ⊗ Pz
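A sketch (not from the slides) of how such a Hamiltonian can be built with kron in MATLAB; this forms the full 2^d × 2^d matrix and is only feasible for small d.

d = 5; Pz = sparse([1 0; 0 -1]); I2 = speye(2);
H = sparse(2^d, 2^d);
for k = 1:d
    ops = repmat({I2}, 1, d);
    ops{k} = Pz; ops{mod(k, d) + 1} = Pz;     % Pz on neighbouring sites k and k+1 (periodic)
    term = 1;
    for j = 1:d, term = kron(term, ops{j}); end
    H = H + term;
end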

29

slide-30
SLIDE 30

Quantum many-body problems

◮ Ising (ZZ) model for a 1d chain of d spins with open boundary conditions:

H = Σ_{k=1}^{d−1} I ⊗ · · · ⊗ I ⊗ Pz ⊗ Pz ⊗ I ⊗ · · · ⊗ I + λ Σ_{k=1}^{d} I ⊗ · · · ⊗ I ⊗ Px ⊗ I ⊗ · · · ⊗ I

λ = ratio between the strength of the magnetic field and the pairwise interactions

◮ 1d Heisenberg (XY) model
◮ Current research: 2d models.
◮ More details in:

Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
Schollwöck: The density-matrix renormalization group in the age of matrix product states.

30

slide-31
SLIDE 31

Stochastic Automata Networks (SANs)

◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ The vector x_t^{(i)} ∈ R^3 describes the probabilities of the states (1), (2), (3) in Ai at time t.
◮ No coupling between the automata ⇒ local transition x_t^{(i)} → x_{t+1}^{(i)} described by a Markov chain:

x_{t+1}^{(i)} = E_i x_t^{(i)},

with a stochastic matrix E_i.
◮ Stationary distribution of Ai = Perron vector of E_i (eigenvector for eigenvalue 1).

31

slide-32
SLIDE 32

Stochastic Automata Networks (SANs)

◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ Coupling between the automata ⇒ local transition x_t^{(i)} → x_{t+1}^{(i)} not described by a Markov chain.
◮ Need to consider all possible combinations of states in (A1, A2, A3): (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2), . . .
◮ The vector x_t ∈ R^{3^3} (or tensor X(t) ∈ R^{3×3×3}) describes the probabilities of the combined states.

32

slide-33
SLIDE 33

Stochastic Automata Networks (SANs)

◮ Transition x_t → x_{t+1} described by a Markov chain:

x_{t+1} = E x_t, with a large stochastic matrix E.

◮ Oversimplified example:

E = I ⊗ I ⊗ E1 + I ⊗ E2 ⊗ I + E3 ⊗ I ⊗ I (local transitions)
  + I ⊗ E21 ⊗ E12 (interaction between A1, A2)
  + E32 ⊗ E23 ⊗ I (interaction between A2, A3)

◮ Goal: Compute the stationary distribution = Perron vector of E.
◮ More details in:

Stewart: Introduction to the Numerical Solution of Markov Chains. Chapter 9.
Buchholz: Product Form Approximations for Communicating Markov Processes.

33

slide-34
SLIDE 34

Further applications

Other applications in scientific computing featuring low-rank tensor concepts:

◮ Boltzmann equation [Ibragimov/Rjasanow'2009].
◮ Dynamical systems [Koch/Lubich'2009].
◮ Parabolic PDEs [Andreev/Tobler'2011], [Khoromskij'2009].
◮ Stochastic PDEs [Khoromskij/Schwab'2010], [Matthies/Zander'2011], [Kressner/Tobler'2011], [Ballani/Grasedyck/Kluge'2011], . . .
◮ Electronic structure calculation [Chinnamsetty et al.'2007], [Flad et al.'2009], [Khoromskij/Khoromskaja'2009], [Limpanuparb/Gill'2009], [Benedikt et al.'2011], [Mohlenkamp'2011], . . .
◮ Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit'2011].
◮ . . .

34

slide-35
SLIDE 35

Summary

◮ Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structure.
◮ For many problems of practical interest: explicit storage / computation of the solution is infeasible.
◮ Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
◮ Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, . . .). Has advantages and disadvantages.

35

slide-36
SLIDE 36

Approximate low-rank matrices

◮ Singular value decomposition ◮ Separability and low rank ◮ Separability by polynomial interpolation ◮ Separability by exponential sums ◮ Low rank of snapshot matrices

36

slide-37
SLIDE 37

Low-rank approximation

Setting: Matrix X ∈ R^{n×m}, m and n too large to compute/store X explicitly. Idea: Replace X by R S^T with R ∈ R^{n×r}, S ∈ R^{m×r} and r ≪ m, n.

          X            R S^T
Memory:   nm           nr + rm
Cost:     ops(m, n)    ops(m, n) × r/min{m, n} (?)

min { ‖X − R S^T‖_2 : R ∈ R^{n×r}, S ∈ R^{m×r} } = σ_{r+1},

with singular values σ1 ≥ σ2 ≥ · · · ≥ σ_{min{m,n}} of X.

37

slide-38
SLIDE 38

Construction from singular value decomposition

SVD: Let X ∈ R^{n×m} and k = min{m, n}. Then there exist matrices with orthonormal columns

U = [ u1, u2, . . . , uk ] ∈ R^{n×k},  V = [ v1, v2, . . . , vk ] ∈ R^{m×k},

such that X = U Σ V^T, Σ = diag(σ1, σ2, . . . , σk). Choose r ≤ k and partition

X = [ U1, U2 ] diag(Σ1, Σ2) [ V1, V2 ]^T = U1 Σ1 (=: R) · V1^T (=: S^T) + U2 Σ2 V2^T.

Then ‖X − R S^T‖_2 = ‖Σ2‖_2 = σ_{r+1}. ⇒ Good low-rank approximation if the singular values decay sufficiently fast. Also: span(X) ≈ span(R), span(X^T) ≈ span(S).
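A small MATLAB illustration (not from the slides) of this construction, with arbitrarily chosen sizes:

X = randn(200, 150); r = 10;
[U, S, V] = svd(X, 'econ');
R  = U(:, 1:r) * S(1:r, 1:r);   % R   = U1 * Sigma1
St = V(:, 1:r)';                % S^T = V1^T
err = norm(X - R*St);           % equals sigma_{r+1}(X) = S(r+1, r+1)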

38

slide-39
SLIDE 39

Discretization of bivariate function

◮ Bivariate function: f(x, y) : [xmin, xmax] × [ymin, ymax] → R.
◮ Function values on a tensor grid [x1, . . . , xm] × [y1, . . . , yn]:

F = [ f(xi, yj) ]_{i=1,...,m; j=1,...,n}.

Basic but crucial observation: f(x, y) = g(x) h(y) ⇒

F = [ g(xi) h(yj) ]_{i,j} = ( g(x1), . . . , g(xm) )^T ( h(y1), . . . , h(yn) ).

⇒ Separability implies rank 1.

39

slide-40
SLIDE 40

Separability and low rank

Approximation by a sum of separable functions

f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) (=: fr(x, y)) + error.

Define Fr = [ fr(xi, yj) ]_{i=1,...,m; j=1,...,n}. Then Fr has rank ≤ r and ‖F − Fr‖_F ≤ √(mn) × error.

⇒ σ_{r+1}(F) ≤ ‖F − Fr‖_2 ≤ ‖F − Fr‖_F ≤ √(mn) × error.

⇒ Semi-separable approximation implies low-rank approximation.

40

slide-41
SLIDE 41

Semi-separable approximation by polynomials

Solution of the approximation problem

f(x, y) = g1(x) h1(y) + · · · + gr(x) hr(y) + error

is not trivial; the gj, hj can be chosen arbitrarily! General construction by polynomial interpolation:

  • 1. Lagrange interpolation of f(x, y) in the y-coordinate:

I_y[f](x, y) = Σ_{j=1}^{r} f(x, θj) Lj(y)

with Lagrange polynomials Lj of degree r − 1 on [ymin, ymax].

  • 2. Interpolation of I_y[f] in the x-coordinate:

I_x[I_y[f]](x, y) = Σ_{i,j=1}^{r} f(ξi, θj) Li(x) Lj(y),

a sum of r separable terms once the coefficient matrix [ f(ξi, θj) ]_{i,j} is “diagonalized” by an SVD.

41

slide-42
SLIDE 42

Semi-separable approximation by polynomials

error ≤ ‖f − I_x[I_y[f]]‖_∞ = ‖f − I_x[f] + I_x[f] − I_x[I_y[f]]‖_∞ ≤ ‖f − I_x[f]‖_∞ + ‖I_x‖_∞ ‖f − I_y[f]‖_∞,

with Lebesgue constant ‖I_x‖_∞ ∼ log r when using Chebyshev interpolation nodes. Polynomial interpolation error bounds are typically much too pessimistic.

◮ Lebesgue constants hit hard in high dimensions: (log r)^{d−1}.
◮ Severe theoretical barriers for general smooth multivariate functions:

E. Novak and H. Woźniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.

42

slide-43
SLIDE 43

Semi-separable approximation of 1/(x + y)

Consider f(x, y) = 1/(x + y), x, y ∈ [α, β], 0 < α < β. Apply numerical quadrature:

1/z = ∫_0^∞ e^{−tz} dt = Σ_{j=1}^{r} ωj e^{−γj z} + error.

Inserting z = x + y:

1/(x + y) = Σ_{j=1}^{r} ωj e^{−γj (x+y)} + error = Σ_{j=1}^{r} ωj e^{−γj x} e^{−γj y} + error.

Choice of nodes γj > 0 and weights ωj > 0 as in [Stenger'93, Braess'86, Braess/Hackbusch'05] ⇒

error ≤ (8/|α|) exp( −r π^2 / log(8β/α) ).

slide-44
SLIDE 44

Semi-separable approximation by exponential sums

◮ Consider the more general case of a function f(x, y) := g(x + y).
◮ Approximation of g(z) with z := x + y by an exponential sum

g(z) ≈ Σ_{j=1}^{r} ωj exp(γj z)    (1)

for some coefficients γj, ωj ∈ R.
◮ (1) gives a semi-separable approximation for f:

f(x, y) = g(x + y) ≈ Σ_{j=1}^{r} ωj exp(γj (x + y)) = Σ_{j=1}^{r} ωj exp(γj x) exp(γj y).

◮ Naturally extends to arbitrarily many variables.
◮ Problem: (1) is a nontrivial approximation problem [Braess'1986], [Hackbusch'2006], . . .

44

slide-45
SLIDE 45

Low-rank approximation of snapshot matrices

Vector-valued function x(α) : [αmin, αmax] → R^n. Sampling at α1, . . . , αm ∈ [αmin, αmax] gives the snapshot matrix

X = [ x(α1), x(α2), . . . , x(αm) ].

45
slide-46
SLIDE 46

Example: Baking 1 cookie

Stationary heat equation with piecewise constant heat conductivity σ(x, α):

−∇ · (σ(x, α) ∇u) = f in Ω = [−1, 1]^2, u = 0 on ∂Ω,

◮ σ(baking tray) = 1
◮ σ(cookie) = 1 + α
◮ Undetermined parameter α ∈ [αmin, αmax].

[FE mesh: # vertices: 455, # elements: 825, # edges: 1279]

Standard FE discretization results in the linearly parameter-dependent linear system

(A0 + α A1) x(α) = b.

46

slide-47
SLIDE 47

Singular value decay – observation

◮ 1 cookie: n = 371, m = 101.

[Plot: log10 of the singular values of the snapshot matrix.]

◮ Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.

47

slide-48
SLIDE 48

Singular value decay – explanation

Polynomial approximation: x(α) = x0 + α x1 + α^2 x2 + · · · + α^{k−1} x_{k−1} + error. Approximation error:

◮ Assume b(·), A(·) analytic ⇒ x(·) analytic.
◮ Then error ≲ ρ^{−k}, where ρ > 1 depends on the domain of analyticity of A, b.

(Proof: Direct extension of a classical result for scalar-valued functions.)

48

slide-49
SLIDE 49

Singular value decay – explanation

Polynomial approximation: x(α) = x0 + α x1 + α^2 x2 + · · · + α^{k−1} x_{k−1} + error. Snapshot matrix:

X = [ x(α1), x(α2), . . . , x(αm) ] = [ x0, x1, . . . , x_{k−1} ] · [ 1 1 . . . 1 ; α1 α2 . . . αm ; . . . ; α1^{k−1} α2^{k−1} . . . αm^{k−1} ] + error

= matrix of rank k + error ⇒ σ_{k+1}(X) ≤ error ≲ ρ^{−k}.

Remark: Trivially extends to the piecewise analytic case.

49

slide-50
SLIDE 50

Singular value decay – pw analytic case

Example: Consider the smallest singular value σ(z) and the corresponding right singular vector v(z) of B(z) = A − izI for z ∈ [−1, 1].

◮ σ(z) only Lipschitz continuous, but piecewise analytic.
◮ v(z) discontinuous, but piecewise analytic.
◮ A = 2 × 2 block diagonal randn, n = 400.
◮ Snapshot matrix of singular vectors: X = [ v(z1), v(z2), . . . , v(z100) ] for equidistant samples zj ∈ [−1, 1].

[Plots: σ(z) for z ∈ [−1, 1]; singular values of X.]

50

slide-51
SLIDE 51

Summary

Need strong singular value decay for good low-rank approximations. For function-related matrices/tensors: strong link to semi-separable approximations. Smoothness seems to be important... at least somehow.

◮ Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
◮ Unfortunately, smoothness is not sufficient for higher-order tensors.
◮ Need to impose stronger regularity as the dimension/order d increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].

51

slide-52
SLIDE 52

Low-rank tensors: CP and Tucker

◮ CP ◮ Tucker ◮ Higher-order SVD ◮ Tensor networks

52

slide-53
SLIDE 53

CP decomposition

◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ One possibility motivated by

X = [ a1, a2, . . . , aR ] [ b1, b2, . . . , bR ]^T = a1 b1^T + a2 b2^T + · · · + aR bR^T.

⇒ vectorization: vec(X) = b1 ⊗ a1 + b2 ⊗ a2 + · · · + bR ⊗ aR.

Canonical Polyadic decomposition of a tensor X ∈ R^{n1×n2×n3} defined via

vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR

for vectors aj ∈ R^{n1}, bj ∈ R^{n2}, cj ∈ R^{n3}. CP directly corresponds to semi-separable approximation. Tensor rank of X = minimal possible R.
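A small MATLAB sketch (not from the slides) forming the full tensor from given CP factors via the vectorized formula above; only sensible for small sizes.

n1 = 4; n2 = 5; n3 = 6; R = 3;
A = randn(n1, R); B = randn(n2, R); C = randn(n3, R);
x = zeros(n1*n2*n3, 1);
for j = 1:R
    x = x + kron(C(:,j), kron(B(:,j), A(:,j)));   % c_j ⊗ b_j ⊗ a_j
end
X = reshape(x, [n1 n2 n3]);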

53

slide-54
SLIDE 54

CP decomposition

Illustration of CP decomposition vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR.


54

slide-55
SLIDE 55

CP decomposition

◮ The CP decomposition offers low data complexity; for constant R: linear complexity in d.
◮ For matrices:
  ◮ rank r is upper semi-continuous ⇒ closedness property: a sequence of rank-r matrices can only converge to a matrix of rank ≤ r.
  ◮ best low-rank approximation possible by successive rank-1 approximations.
  ◮ Robust black-box algorithms/software available (svd, Lanczos).
◮ For tensors of order d ≥ 3:
  ◮ tensor rank R is not upper semi-continuous ⇒ lack of closedness
  ◮ successive rank-1 approximations fail
  ◮ all algorithms based on optimization techniques (ALS, Gauss-Newton)

Picture taken from [Kolda/Bader'2009].

55

slide-56
SLIDE 56

Tucker decomposition

◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ Alternative possibility motivated by

A = U · Σ · V^T, U ∈ R^{n1×r}, V ∈ R^{n2×r}, Σ ∈ R^{r×r}.

⇒ vectorization: vec(X) = (V ⊗ U) · vec(Σ). Ignore the diagonal structure of Σ and call it C.

Tucker decomposition of a tensor X ∈ R^{n1×n2×n3} defined via

vec(X) = (W ⊗ V ⊗ U) · vec(C)

with U ∈ R^{n1×r1}, V ∈ R^{n2×r2}, W ∈ R^{n3×r3}, and core tensor C ∈ R^{r1×r2×r3}. In terms of µ-mode matrix products:

X = U ◦1 V ◦2 W ◦3 C =: (U, V, W) ◦ C.

56

slide-57
SLIDE 57

Tucker decomposition

Illustration of the Tucker decomposition X = (U, V, W) ◦ C. [Figure: core tensor C with factor matrices U, V, W.]

57

slide-58
SLIDE 58

Tucker decomposition

Consider all three matricizations:

X^{(1)} = U · C^{(1)} · (W ⊗ V)^T,  X^{(2)} = V · C^{(2)} · (W ⊗ U)^T,  X^{(3)} = W · C^{(3)} · (V ⊗ U)^T.

These are low-rank decompositions:

rank(X^{(1)}) ≤ r1, rank(X^{(2)}) ≤ r2, rank(X^{(3)}) ≤ r3.

Multilinear rank of a tensor X ∈ R^{n1×n2×n3} defined by the tuple (r1, r2, r3) with ri = rank(X^{(i)}).

58

slide-59
SLIDE 59

Higher-order SVD (HOSVD)

Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).

  • 1. Calculate SVDs of the matricizations: X^{(µ)} = Uµ Σµ Vµ^T for µ = 1, 2, 3.
  • 2. Truncate the basis matrices: Uµ := Uµ(:, 1 : rµ) for µ = 1, 2, 3.
  • 3. Form the core tensor: vec(C) := (U3^T ⊗ U2^T ⊗ U1^T) · vec(X).

Truncated tensor produced by the HOSVD [Lathauwer/De Moor/Vandewalle'2000]:

vec(X̃) := (U3 ⊗ U2 ⊗ U1) · vec(C).

Remark: Orthogonal projection X̃ := (π1 ◦ π2 ◦ π3) X with πµ X := Uµ Uµ^T ◦µ X.
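A compact MATLAB sketch (not from the slides) of the three HOSVD steps for a 3rd-order tensor; it forms an explicit Kronecker product for the core tensor, so it is only meant for small sizes. The function name hosvd3 is made up for illustration.

function [C, U] = hosvd3(X, r)
% HOSVD with prescribed multilinear rank r = [r1 r2 r3]
n = size(X); U = cell(1, 3);
for mu = 1:3
    Xmu = reshape(permute(X, [mu, setdiff(1:3, mu)]), n(mu), []);   % mu-mode matricization
    [Umu, ~, ~] = svd(Xmu, 'econ');
    U{mu} = Umu(:, 1:r(mu));                                        % truncated basis matrix
end
c = kron(U{3}, kron(U{2}, U{1}))' * X(:);   % vec(C) = (U3^T ⊗ U2^T ⊗ U1^T) vec(X)
C = reshape(c, r);
end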

59

slide-60
SLIDE 60

Higher-order SVD (HOSVD)

The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition

‖X − X̃‖ ≤ √d ‖X − X_best‖,

where X_best is the best approximation of X with multilinear ranks (r1, . . . , rd). Proof (d = 3):

‖X − X̃‖^2 = ‖X − (π1 ◦ π2 ◦ π3) X‖^2
           = ‖X − π1 X‖^2 + ‖π1 X − (π1 ◦ π2) X‖^2 + ‖(π1 ◦ π2) X − (π1 ◦ π2 ◦ π3) X‖^2
           ≤ ‖X − π1 X‖^2 + ‖X − π2 X‖^2 + ‖X − π3 X‖^2.

Using ‖X − πµ X‖ ≤ ‖X − X_best‖ for µ = 1, 2, 3 leads to ‖X − X̃‖^2 ≤ 3 ‖X − X_best‖^2.

Best approximation: See [Kolda/Bader'09].

60

slide-61
SLIDE 61

Tucker decomposition – Summary

For general tensors:

◮ multilinear rank r is upper semi-continuous ⇒ closedness property.
◮ HOSVD – simple and robust algorithm to obtain a quasi-optimal low-rank approximation.
◮ quasi-optimality good enough for most applications in scientific computing.
◮ robust black-box algorithms/software available (e.g., Tensor Toolbox).

Drawback: Storage of the core tensor ∼ r^d ⇒ curse of dimensionality.

61

slide-62
SLIDE 62

Tensor network diagrams

Tensor network = undirected graph with:

◮ each node is a tensor;
◮ each outgoing edge is a mode;
◮ each connecting edge represents a contraction; example:

Z_{i1,i2,i3,i4} = Σ_{j=1}^{r} X_{i1,i2,j} Y_{j,i3,i4}.

◮ number of free edges = order of the tensor represented by the entire network

Researchers on quantum many-body problems think2 in terms of tensor networks!

2and dream

62

slide-63
SLIDE 63

Tensor network diagrams

Examples: (i) vector; (ii) matrix; (iii) matrix-matrix multiplication; (iv) Tucker decomposition; (v) hierarchical Tucker decomposition. [Figure: the corresponding tensor network diagrams.]

63

slide-64
SLIDE 64

Low-rank tensors: Hierarchical Tucker

◮ Intro of the Hierarchical Tucker Decomposition (HTD)
◮ MATLAB toolbox htucker
◮ Basic operations: µ-mode matrix multiplication, addition, . . .
◮ Advanced operations: inner product, elementwise multiplication, . . .

64

slide-65
SLIDE 65

Introduction

◮ CP offers low data complexity but difficult truncation;
◮ Tucker offers simple truncation but high data complexity.

Recently developed formats:

◮ Matrix Product State (MPS),
◮ TT decomposition,
◮ Hierarchical Tucker decomposition (HTD).

Aim to offer a compromise between CP and Tucker. Focus in this lecture: HTD.

◮ L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
◮ W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
◮ D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.

65

slide-66
SLIDE 66

More general matricizations

Recall: µ-mode matricization for a tensor X:

X^{(µ)} ∈ R^{nµ×(n1···nµ−1 nµ+1···nd)}, µ = 1, . . . , d.

It is getting ugly... General matricization for a mode decomposition {1, . . . , d} = t ∪ s:

X^{(t)} ∈ R^{(n_{t1}···n_{tk}) × (n_{s1}···n_{s_{d−k}})} with (X^{(t)})_{(i_{t1},...,i_{tk}),(i_{s1},...,i_{s_{d−k}})} := X_{i1,...,id}.

[Figure: the matricizations X^{(1)} and X^{(1,2)} of X.]

66

slide-67
SLIDE 67

Hierarchical construction

Singular value decomposition: X^{(t)} = Ut Σt Us^T.

Column spaces are nested: t = t1 ∪ t2 ⇒ span(Ut) ⊂ span(U_{t2} ⊗ U_{t1}) ⇒ ∃ Bt : Ut = (U_{t2} ⊗ U_{t1}) Bt.

Size of Ut: Ut ∈ R^{n_{t1}···n_{tk} × rt} with rt = rank(X^{(t)}).

For d = 4:

U12 = (U2 ⊗ U1) B12, U34 = (U4 ⊗ U3) B34, vec(X) = X^{(1,2,3,4)} = (U34 ⊗ U12) B1234

⇒ vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.
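A small MATLAB sketch (not from the slides) assembling vec(X) from H-Tucker factors in this d = 4 example; all sizes are arbitrary and the explicit Kronecker products restrict this to small problems.

n = [4 5 6 7]; r = [2 3 2 3]; r12 = 3; r34 = 2;
U1 = randn(n(1), r(1)); U2 = randn(n(2), r(2)); U3 = randn(n(3), r(3)); U4 = randn(n(4), r(4));
B12 = randn(r(1)*r(2), r12); B34 = randn(r(3)*r(4), r34); B1234 = randn(r12*r34, 1);
x = kron(U4, kron(U3, kron(U2, U1))) * kron(B34, B12) * B1234;   % vec(X)
X = reshape(x, n);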

67

slide-68
SLIDE 68

Dimension tree

Tree structure for d = 4:

Leaves U1 (n1 × r1), U2 (n2 × r2), U3 (n3 × r3), U4 (n4 × r4); transfer matrices B12 (r1r2 × r12), B34 (r3r4 × r34), B1234 (r12r34 × 1).

Reshape: B12 ∈ R^{r1r2×r12} ⇒ B12 ∈ R^{r1×r2×r12}, B34 ∈ R^{r3r4×r34} ⇒ B34 ∈ R^{r3×r4×r34}, B1234 ∈ R^{r12r34×1} ⇒ B1234 ∈ R^{r12×r34}.

68

slide-69
SLIDE 69

Dimension tree

[Figure: dimension tree with leaves U1, U2, U3, U4 and transfer tensors B12, B34, B1234.]

◮ Often, U1, U2, U3, U4 are orthonormal. This is advantageous but not required.
◮ Storage requirements for general d: O(dnr) + O(dr^3), where r = max{rt}, n = max{nµ}.

69

slide-70
SLIDE 70

Constructors for MATLAB class htensor

x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7 with a balanced dimension tree.
x = htensor([4 5 6 7], 'TT') constructs a zero htensor of size 4 × 5 × 6 × 7 with a TT-style dimension tree.
x = htensor({U1, U2, U3}) constructs an htensor from a tensor in CP decomposition, X(i1, i2, i3) = Σ_j U1(i1, j) U2(i2, j) U3(i3, j).
x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7 with random ranks and random entries.
x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7 with all entries one.
...

70

slide-71
SLIDE 71

Basic functionality for MATLAB class htensor

Example: x is an htensor of order 4.
x(1, 3, 4, 2) returns an entry of X.
x(1, 3, :, :) returns a slice of X as an htensor.
full(x) returns the full tensor represented by X (use with care).
disp_tree(htenrand([5 4 6 3])) returns the tree structure/ranks:

ans is an htensor of size 5 x 4 x 6 x 3 1-4 1; 6 3 1 1-2 2; 3 4 6 1 4; 5 3 2 5; 4 4 3-4 3; 3 3 3 3 6; 6 3 4 7; 3 3

spy(x) displays spy plots of Ut, Bt on the dimension tree.
change_root(x, i) switches the root node.

71

slide-72
SLIDE 72

Singular value tree

plot_sv(x) plots the singular values of the corresponding matricizations in the dimension tree of a tensor X. Example: singular value tree of the solution to an elliptic PDE with 4 parameters.

[Figure: singular value plots at the tree nodes for dimensions {1, 2}, {3, 4, 5}, {1}, {2}, {3}, {4, 5}, {4}, {5}.]

Remark: Singular values are computed from Gramians.

72

slide-73
SLIDE 73

Basic ops: µ-mode matrix multiplication

Application of a matrix A ∈ R^{m×nµ} to mode µ of X ∈ R^{n1×···×nd}:

Y = A ◦µ X ⇔ Y^{(µ)} = A X^{(µ)}.

Nearly trivial if X is in H-Tucker format:

A ◦µ X = A ◦µ ((U1, . . . , Ud) ◦ C) = (U1, . . . , Uµ−1, A Uµ, Uµ+1, . . . , Ud) ◦ C

◮ Almost no operations required.
◮ Ranks stay the same.
◮ Orthogonality destroyed.

ttm(x, A, 2) applies the matrix A to the htensor X in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4]) applies A, B, C in modes 2, 3, 4.
y = ttm(x, @(x)(fft(x)), 2) applies an FFT in mode 2.
y = ttm(x, {A, B, C}, [2, 3, 4], 'h') successively applies the matrices A^T, B^T, C^T in modes 2, 3, 4.

73

slide-74
SLIDE 74

Addition of low-rank matrices

Addition of two matrices in low-rank format: A = U1 ΣA U2^T, B = V1 ΣB V2^T ⇒

A + B = [ U1 V1 ] · diag(ΣA, ΣB) · [ U2 V2 ]^T

◮ No operations required.
◮ Rank increases.
◮ Orthogonality destroyed.

74

slide-75
SLIDE 75

Addition of low-rank tensors

Addition of four tensors X1, X2, X3, X4 in H-Tucker format: X1 + X2 + X3 + X4. Proceed as in matrix case by embedding factors in larger matrices.

◮ No operations required. ◮ H-Tucker rank increases. ◮ Orthogonality destroyed.

Command in htucker: x1 + x2 + x3 + x4

75

slide-76
SLIDE 76

[Figure: factors of the sum X1 + X2 + X3 + X4 in H-Tucker format: the leaf matrices are concatenations Ut = [Ut^[1], Ut^[2], Ut^[3], Ut^[4]] and the transfer tensors Bt have block(-diagonal) structure.]

76

slide-77
SLIDE 77

Orthogonalization

Any tensor X in H-Tucker format can be orthogonalized in the sense that all factors in the dimension tree, except for the root node, contain orthonormal columns.

Example: vec(X) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12) B1234.

Step 1: QR decompositions Ut = Qt Rt ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(B̃34 ⊗ B̃12) B1234 with B̃34 := (R4 ⊗ R3) B34, B̃12 := (R2 ⊗ R1) B12.

Step 2: QR decompositions B̃34 = Q34 R34, B̃12 = Q12 R12 ⇒

vec(X) = (Q4 ⊗ Q3 ⊗ Q2 ⊗ Q1)(Q34 ⊗ Q12) B̃1234 with B̃1234 := (R34 ⊗ R12) B1234.

Computational requirements for general d: O(dnr^2) + O(dr^4).

Command in htucker: x = orthog(x)

77

slide-78
SLIDE 78

Norms and inner products

Inner product of two tensors X, Y ∈ R^{n1×···×nd}:

⟨X, Y⟩ = ⟨vec(X), vec(Y)⟩ = Σ_{i1=1}^{n1} · · · Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

Can be performed efficiently in H-Tucker, provided that X and Y have compatible dimension trees. Example for two tensors of order 4:

⟨X, Y⟩ = (B^x_{1234})^T (B^x_{34} ⊗ B^x_{12})^T (U^x_4 ⊗ U^x_3 ⊗ U^x_2 ⊗ U^x_1)^T (U^y_4 ⊗ U^y_3 ⊗ U^y_2 ⊗ U^y_1)(B^y_{34} ⊗ B^y_{12}) B^y_{1234}.

Norm: After X has been orthogonalized:

‖X‖ = √⟨X, X⟩ = ‖B^x_{12···d}‖_F.

Possibly the most accurate way to compute the norm. Used in norm(x).

78

slide-79
SLIDE 79

Computation of inner products

⟨X, Y⟩ = Σ_{i1=1}^{n1} · · · Σ_{id=1}^{nd} X_{i1,...,id} Y_{i1,...,id}.

79

slide-80
SLIDE 80

Computation of inner products

80

slide-81
SLIDE 81

Computation of inner products

81

slide-82
SLIDE 82

Computation of inner products

82

slide-83
SLIDE 83

Computation of inner products

83

slide-84
SLIDE 84

Computation of inner products – contraction step

Contraction step at a node t with children t1, t2:

(U^x_t)^T U^y_t = (B^x_t)^T ( (U^x_{t2})^T U^y_{t2} ⊗ (U^x_{t1})^T U^y_{t1} ) B^y_t.

◮ htucker command: innerprod(x,y)
◮ Overall cost: O(dnr^2) + O(dr^4).

84

slide-85
SLIDE 85

Reduced Gramians in H-Tucker

X^{(t)} = Ut Vt^T ⇒ X^{(t)} (X^{(t)})^T = Ut (Vt^T Vt) Ut^T =: Ut Gt Ut^T,

with the reduced Gramian Gt := Vt^T Vt. If Ut is orthonormal ⇒ the singular values of X^{(t)} are the square roots of the eigenvalues of Gt (used in plot_sv).
85

slide-86
SLIDE 86

Reduced Gramians in H-Tucker

86

slide-87
SLIDE 87

Reduced Gramians in H-Tucker

87

slide-88
SLIDE 88

Reduced Gramians in H-Tucker

88

slide-89
SLIDE 89

Reduced Gramians in H-Tucker

89

slide-90
SLIDE 90

Reduced Gramians in H-Tucker

90

slide-91
SLIDE 91

Reduced Gramians in H-Tucker

Implemented in htucker command gramians(x).

91

slide-92
SLIDE 92

Advanced operations

◮ Truncation ◮ Combined addition + truncation ◮ Elementwise multiplication ◮ Elementwise reciprocal

92

slide-93
SLIDE 93

Truncation of explicit tensor

Let X ∈ R^{n1×n2×···×nd} be explicitly given.

◮ For each tree node t, let Wt contain the rt dominant left singular vectors of X^{(t)} and define the projection

πt X = Wt Wt^T ◦t X ⇔ (πt X)^{(t)} = Wt Wt^T X^{(t)}.

◮ Truncated tensor:

X̃ := ( Π_{t∈TL} πt ) · · · ( Π_{t∈T1} πt ) X,

where Tℓ contains all nodes on level ℓ.

◮ [Grasedyck'2010]: ‖X − X̃‖ ≤ √(2d − 3) ‖X − X_best‖. Proof similar as for the HOSVD.

93

slide-94
SLIDE 94

Truncation of explicit tensor

Example:

vec(X̃) = (W4 W4^T ⊗ W3 W3^T ⊗ W2 W2^T ⊗ W1 W1^T)(W34 W34^T ⊗ W12 W12^T) vec(X)
        = (W4 ⊗ W3 ⊗ W2 ⊗ W1) · ( [W4^T ⊗ W3^T] W34 (=: B34) ⊗ [W2^T ⊗ W1^T] W12 (=: B12) ) · ( [W34^T ⊗ W12^T] vec(X) ) (=: B1234).

Truncation options:

  • opts.max_rank = 10: maximal rank at truncation.
  • opts.rel_eps = 1e-6: maximal relative truncation error.
  • opts.abs_eps = 1e-6: maximal absolute truncation error.

The condition max_rank takes precedence over rel_eps and abs_eps. xt = htensor.truncate_rtl(x, opts) returns the truncated tensor X̃ of a multidimensional array. Remark: There is also a significantly faster htensor.truncate_ltr (proceeds successively from the leaves to the root), for which the same error bound holds [Tobler'10].
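A hypothetical usage sketch (not from the slides) of these options for an existing htensor x:

opts.max_rank = 10;
opts.rel_eps  = 1e-6;
opts.abs_eps  = 1e-6;
xt = truncate(x, opts);   % truncation of an htensor, see the next slide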

94

slide-95
SLIDE 95

Truncation of H-Tucker tensor

Let X ∈ R^{n1×n2×···×nd} be in H-Tucker format and orthogonalized.

◮ Compute the left singular vectors of X^{(t)} = Ut Vt^T from the eigenvectors of

X^{(t)} (X^{(t)})^T = Ut (Vt^T Vt) Ut^T = Ut Gt Ut^T,

with the reduced Gramian Gt. If St contains the rt dominant eigenvectors of Gt ⇒ Wt = Ut St.

◮ Traverse the tree from the root to the leaves. In each step, the factor at node t is truncated via St and the projection St^T is absorbed into the transfer tensor of the parent node (St^T ◦ Btp, St ◦ Bt).

◮ In htucker: truncate(x, opts). Complexity O(dnr^2 + dr^4).

95

slide-96
SLIDE 96

Combined addition + truncation

Sum of more than two tensors: Y = X1 + X2 + · · · + Xs. Two possibilities to incorporate the truncation operator T:

  • 1. Y ≈ T(X1 + X2 + X3 + · · · + Xs)
  • 2. Y ≈ T(· · · T(T(X1 + X2) + X3) + · · · + Xs)

Option 2 is usually significantly cheaper but may suffer from severe cancellation. Artificial example: X1, X2, X3 ∈ R^{101×101×101} truncated tensor grid discretizations of the summands of

f(x1, x2, x3) = tan(x1 + x2 + x3) + (x1 + x2 + x3)^{−1} − tan(x1 + x2 + x3).

Error(Option 1) ≈ 10^{−7}. Error(Option 2) ≈ 1.3. What is wrong with Option 1?

96

slide-97
SLIDE 97

Combined addition + truncation

[Figure: factors of the sum X1 + X2 + X3 + X4 in H-Tucker format: concatenated leaf matrices and block-structured transfer tensors.]

◮ Orthogonalization (needed before truncation) destroys the block diagonal structure.
◮ Complexity O(dns^2 r^2 + ds^4 r^4) for s summands.

97

slide-98
SLIDE 98

Combined addition + truncation

Idea: A new variant delays orthogonalization to keep the block diagonal structure in the transfer tensors as long as possible. Reduces O(dns^2 r^2 + ds^4 r^4) to O(dns^2 r^2 + ds^2 r^4 + ds^3 r^3).

[Plot: runtime vs. number of summands for standard truncation, truncation of the sum, and successive truncation, with O(s^4), O(s^2), O(s) reference lines.]

◮ htucker command: add_truncate(x1 x2 x3 x4, opts).

98

slide-99
SLIDE 99

Elementwise multiplication

Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices A = U1 ΣA U2^T, B = V1 ΣB V2^T:

A ⋆ B = (U1 ⊙ V1)(ΣA ⊗ ΣB)(U2 ⊙ V2)^T,

with the row-wise Khatri-Rao product

C ⊙ D = [ c1^T ; . . . ; cn^T ] ⊙ [ d1^T ; . . . ; dn^T ] = [ c1^T ⊗ d1^T ; . . . ; cn^T ⊗ dn^T ].

◮ Orthogonality destroyed.
◮ Rank increases significantly.

But: the singular value decay of ΣA ⊗ ΣB may become significantly stronger ⇒ additional opportunities for truncation.
99

slide-100
SLIDE 100

Elementwise multiplication

Elementwise multiplication of two tensors X, Y in H-Tucker format:

◮ Row-wise Khatri-Rao product of the leaf matrices.
◮ “Kronecker product” of the non-leaf transfer tensors.
◮ Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.

Commands in htucker:
x.*y (without truncation)
x.^2 (without truncation)
elem_mult( x, y, opt ) (with truncation)

100

slide-101
SLIDE 101

Elementwise reciprocal

Goal: Compute the reciprocal of each entry of a tensor X. Basic idea: the Newton-Schultz iteration

y0 = 1, y_{i+1} = y_i + y_i (1 − x y_i),    (2)

converges to 1/x for 0 < x < 2. Apply (2) simultaneously to all entries. Code snippet of elem_reciprocal( x, opt ) in htucker:

all_ones = htenones(size(x));
y = all_ones;
for it = 1:maxit
  xy = elem_mult( x, y );
  xy = truncate( all_ones - xy );
  xy = elem_mult( xy, y );
  y  = truncate( y + xy );
end

See also [Oseledets et al. 2009].

101

slide-102
SLIDE 102

Elementwise reciprocal

Example: (x1 + x2 + x3 + x4)^{−1} with xi ∈ [10^{−3}, 1].

c = laplace_core(4); U = [ones(100, 1), linspace(1e-3, 1, 100)’]; x = ttm(c, {U, U, U, U}); inv_x = elem_reciprocal(x, opts);

[Plots: convergence of ‖X ⋆ Yk − 1‖ / ‖1‖ over the iterations; singular value tree upon convergence.]

102

slide-103
SLIDE 103

Summary

◮ HTD offers a good compromise between CP and Tucker.
◮ Algorithms often quite technical but conceptually simple.
◮ Computational complexity ∼ d but often ∼ r^4: curse of dimensionality ⇒ curse of rank?
◮ Important to keep in mind: Unless d is tiny, the tensor X can/should never be formed explicitly. All operations need to be performed implicitly in HTD. This can pose severe problems even for seemingly simple operations: min(X), max(X), abs(X), 1./X, . . .

103

slide-104
SLIDE 104

104

slide-105
SLIDE 105

Algorithms based on low-rank tensors

◮ Inexact LOBPCG ◮ ALS / MALS

105

slide-106
SLIDE 106

Strategies for solving tensor equations

◮ In many practical situations, a tensor X is given implicitly as the solution to a linear system A(X) = B, an eigenvalue problem A(X) = λX, a nonlinear system, an ODE, . . .

Two main strategies to use low-rank tensor techniques:

  • 1. Combine an existing iterative solver (e.g., CG, LOBPCG, GMRES) with repeated low-rank truncation of the iterates (⇒ inexact CG).
    ◮ Straightforward to derive and implement (based, e.g., on htucker).
    ◮ Hard to analyze the impact of nonnegligible truncations on accuracy and convergence.
    ◮ Intermediate rank growth may result in excessive computing times and/or harm accuracy + convergence.
  • 2. Formulate an optimization problem, constrain it to low-rank tensors, iteratively optimize with respect to the individual factors of the low-rank format.
    ◮ Works well in practice.
    ◮ Convergence theory not well understood.
    ◮ Not straightforward to implement.

106

slide-107
SLIDE 107

Example: PDE-eigenvalue problem

Goal: Compute the smallest eigenvalue for

∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d, u(ξ) = 0 on ∂Ω.

Assumption: Potential represented as

V(ξ) = Σ_{j=1}^{s} V_j^{(1)}(ξ1) V_j^{(2)}(ξ2) · · · V_j^{(d)}(ξd).

⇒ finite difference discretization

A u = (A_L + A_V) u = λ u,

with

A_L = Σ_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ A_L ⊗ I ⊗ · · · ⊗ I (j−1 times),
A_V = Σ_{j=1}^{s} A_{V,j}^{(d)} ⊗ · · · ⊗ A_{V,j}^{(2)} ⊗ A_{V,j}^{(1)}.

107

slide-108
SLIDE 108

LOBPCG method

LOBPCG with block size 1 [Knyazev'01] for computing the smallest eigenvalue of Ax = λx, A symmetric.

λ0 = ⟨x0, x0⟩_A, p0 = 0
for k = 0, 1, . . . (until converged) do
  r_k = B^{−1}(A x_k − λ_k x_k)
  U = [ x_k, r_k, p_k ]
  Ã = U^T A U, M̃ = U^T U
  Find the eigenpair (λ_{k+1}, y), with ‖y‖_2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λ M̃.
  p_{k+1} = y_2 · r_k + y_3 · p_k
  x_{k+1} = y_1 · x_k + p_{k+1}
  x_{k+1} ← x_{k+1} / ‖x_{k+1}‖_2
end for
Return (λmin, x) = (λ_{k+1}, x_{k+1}).
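A minimal MATLAB sketch (not from the slides) of this pseudocode for the matrix case; A is assumed symmetric, Binv is an assumed function handle applying the preconditioner B^{-1} (use @(v) v for B = I), and no safeguards against ill-conditioned bases are included.

function [lam, x] = lobpcg1(A, Binv, x0, maxit)
x = x0 / norm(x0); p = []; lam = x' * (A * x);
for k = 1:maxit
    r = Binv(A*x - lam*x);
    U = [x, r, p];                            % p is empty in the first iteration
    Ah = U' * (A * U); Mh = U' * U;
    [Y, D] = eig((Ah + Ah')/2, (Mh + Mh')/2, 'chol');
    [lam, idx] = min(diag(D));
    y = Y(:, idx); y = y / norm(y);
    p = U(:, 2:end) * y(2:end);               % p_{k+1} = y2*r_k + y3*p_k
    x = U(:, 1) * y(1) + p;                   % x_{k+1} = y1*x_k + p_{k+1}
    x = x / norm(x);
end
end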

108

slide-109
SLIDE 109

Tensor low-rank LOBPCG

Truncated LOBPCG with block size 1 for computing the smallest eigenvalue of A(X) = λX, A symmetric, X a tensor.

λ0 = ⟨X0, X0⟩_A, P0 = 0 · X
for k = 0, 1, . . . (until converged) do
  R_k = B^{−1}(A(X_k) − λ_k X_k), R_k ← T(R_k)
  U1 = X_k, U2 = R_k, U3 = P_k
  Ã_{ij} = ⟨U_i, U_j⟩_A, M̃_{ij} = ⟨U_i, U_j⟩
  Find the eigenpair (λ_{k+1}, y), with ‖y‖_2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λ M̃.
  P_{k+1} = y_2 · R_k + y_3 · P_k, P_{k+1} ← T(P_{k+1})
  X_{k+1} = y_1 · X_k + P_{k+1}, X_{k+1} ← T(X_{k+1})
  X_{k+1} ← X_{k+1} / √⟨X_{k+1}, X_{k+1}⟩
end for
Return (λmin, X) = (λ_{k+1}, X_{k+1}).

T = truncation to hierarchical low rank

109

slide-110
SLIDE 110

Implementation details

Orthogonalization: In standard LOBPCG, orthogonalization of U is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as the ranks would grow and truncation would destroy orthogonality.

Truncation: X_k, R_k, P_k are truncated in every step. Moreover, the application of A(·) and of the preconditioner B^{−1}(·) may involve truncation during the application of these operators.

Inner product: The reduced matrix Ã is very sensitive to truncation in A(·). The computation of Ã_{i,j} = ⟨U_i, U_j⟩_A must be exact.

110

slide-111
SLIDE 111

Numerical Experiments - Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

V(ξ) = q · Σ_{i=1}^{d} sin(ξi)

for some constant q > 0. We choose d = 10, n = 128. Preconditioner [Grasedyck 2004]:

A_L^{−1} = ∫_0^∞ exp(−t A_L) dt ≈ Σ_{j=−M}^{M} ωj exp(−αj A_L^{(d)}) ⊗ · · · ⊗ exp(−αj A_L^{(1)}) =: B^{−1},

for a certain, optimized and tabulated choice of coefficients αj, ωj > 0. We choose M = 10.

111

slide-112
SLIDE 112

Numerical Experiments - Sine potential

q = 1: [Plots: residual vs. iterations and maximal rank vs. iterations for truncation tolerances eps 1e−2, 1e−4, 1e−8.]

q = 1000: [Plots: residual vs. iterations and maximal rank vs. iterations for truncation tolerances eps 1e−2, 1e−4.]

112

slide-113
SLIDE 113

ALS

Originally from computational quantum physics [Schollwöck 2011], recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].

Goal: min { ⟨X, A(X)⟩ / ⟨X, X⟩ : X ∈ H-Tucker((rt)_{t∈T}), X ≠ 0 }.

Method: Choose one node t, fix all other nodes, and set the new tensor at node t to minimize the Rayleigh quotient ⟨X, A(X)⟩ / ⟨X, X⟩. This is done for all nodes (a sweep), and sweeps are continued until convergence.

Sketch:

X^{(t)} = Ut Vt^T = (U_{tr} ⊗ U_{tl}) Bt Vt^T,   vec(X) = (Vt ⊗ U_{tr} ⊗ U_{tl}) vec(Bt) =: 𝒰_t vec(Bt).

⇒ min { y^T (𝒰_t^T A 𝒰_t) y / y^T (𝒰_t^T 𝒰_t) y : y ∈ R^{r_{tl} r_{tr} r_t}, y ≠ 0 }.

113

slide-114
SLIDE 114

Computation of reduced matrices

Consider A = Ad ⊗ · · · ⊗ A1 (other operators can be treated similarly). Compute

Ã_t := 𝒰_t^T A 𝒰_t = (Vt ⊗ U_{tr} ⊗ U_{tl})^T A (Vt ⊗ U_{tr} ⊗ U_{tl}) = Â_t ⊗ Ã_{tr} ⊗ Ã_{tl},

where

Ã_{tl} = U_{tl}^T ( ⊗_{i∈tl} A_i ) U_{tl},  Ã_{tr} = U_{tr}^T ( ⊗_{i∈tr} A_i ) U_{tr},  Â_t = Vt^T ( ⊗_{i∉t} A_i ) Vt.

Additionally

M̃_t := 𝒰_t^T 𝒰_t = Vt^T Vt ⊗ U_{tr}^T U_{tr} ⊗ U_{tl}^T U_{tl} = Mt ⊗ M_{tr} ⊗ M_{tl}.

114

slide-115
SLIDE 115

Computation of reduced matrices

[Figure: dimension tree with operator factors A1, . . . , A8 at the leaves and reduced matrices Ã12, Ã34, Â1234 at the inner nodes.]

115

slide-116
SLIDE 116

MALS

Method:

◮ Select an edge of the tensor network.
◮ Combine the tensors at the adjacent nodes to form a higher-order tensor.
◮ Set this tensor to minimize the Rayleigh quotient.
◮ Use a low-rank approximation to split the new combined tensor into two tensors at the adjacent nodes of the selected edge.

116

slide-117
SLIDE 117

MALS - Illustration

117

slide-118
SLIDE 118

Numerical Experiments – Sine potential

PDE-eigenvalue problem with Ω = [0, π]^d and sine potential

V(ξ) = q · Σ_{i=1}^{d} sin(ξi)

for some constant q > 0. Choose d = 10, n = 128, q = 1000. Preconditioner [Grasedyck 2004]:

A_L^{−1} = ∫_0^∞ exp(−t A_L) dt ≈ Σ_{j=−M}^{M} ωj exp(−αj A_L^{(d)}) ⊗ · · · ⊗ exp(−αj A_L^{(1)}) =: B^{−1},

for a certain, optimized choice of coefficients αj, ωj > 0. We choose M = 10.

118

slide-119
SLIDE 119

Numerical Experiments – Sine potential

ALS: [Plots: eigenvalue error, residual, and iteration count vs. execution time.] Hierarchical ranks 40.

MALS: [Plots: eigenvalue error, residual, rank, and iteration count vs. execution time.] Maximal hierarchical rank 30.

119

slide-120
SLIDE 120

Conclusions and Outlook

120

slide-121
SLIDE 121

Conclusions and Outlook

◮ Scientific computing with low-rank tensors: a rapidly evolving field, and highly technical.
◮ Precise scope of applications far from clear; many applications remain to be explored. More analysis and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, . . .) needed.

Some current trends:

◮ Tensorization of vectors + low rank (discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtishnikov, . . .
◮ Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, . . .
◮ Robust low rank (Candes et al.) for tensors ⇒ a suitable way of dealing with singularities?
◮ . . .

Acknowledgments: Presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).

121