Stiefel Manifolds and their Applications - Pierre-Antoine Absil - PowerPoint PPT Presentation



SLIDE 1

Stiefel Manifolds and their Applications

Pierre-Antoine Absil (UCLouvain)

CESAME seminar, 22 September 2009

SLIDE 2

Structure

◮ Definition and visualization
◮ A glimpse of applications
◮ Geometry of the Stiefel manifolds
◮ Applications

SLIDE 3

Collaborations

◮ Chris Baker (Sandia)
◮ Thomas Cason (UCLouvain)
◮ Kyle Gallivan (Florida State University)
◮ Damien Laurent (UCLouvain)
◮ Rob Mahony (Australian National University)
◮ Chafik Samir (U Clermont-Ferrand)
◮ Rodolphe Sepulchre (U of Liège)
◮ Fabian Theis (TU Munich)
◮ Paul Van Dooren (UCLouvain)
◮ ...

SLIDE 4

Definition

Stiefel manifold: Definition

The (compact) Stiefel manifold Vn,p is the set of all p-tuples (x1, . . . , xp) of orthonormal vectors in Rn.

If we turn p-tuples into n × p matrices, (x1, . . . , xp) → [x1 · · · xp], the definition becomes

Vn,p = {X ∈ Rn×p : XᵀX = Ip}.
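
The membership condition XᵀX = Ip is immediate to check numerically. A minimal numpy sketch; sampling a point via a thin QR factorization is this sketch's own choice, not something the slide prescribes:

```python
import numpy as np

# A convenient way to produce a point on V_{n,p}: take the Q factor of
# a thin QR decomposition of a random n-by-p matrix.
rng = np.random.default_rng(0)
n, p = 5, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # X has orthonormal columns

# Membership test from the definition: X^T X = I_p.
err = np.linalg.norm(X.T @ X - np.eye(p))
print(err < 1e-12)  # prints True
```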

SLIDE 5

Definition

Visualization: an element of V3,2

SLIDE 6

Definition

Stiefel manifold: (very unfaithful) artist view

[figure: artist's view of Vn,p]

SLIDE 7

Definition

Stiefel manifold: optimization problems

[figure: a cost function f : Vn,p → R]

SLIDE 8

Definition

Stiefel manifold: optimization algorithms

[figure: iterates x on Vn,p for a cost function f : Vn,p → R]

SLIDE 9

Definition

Stiefel manifold: Extensions

◮ Recall: Real case:
  Vp(Rn) = {X ∈ Rn×p : XᵀX = Ip} =: Vn,p.
◮ Complex case:
  Vp(Cn) = {X ∈ Cn×p : XᴴX = Ip}.
◮ Quaternion case:
  Vp(Hn) = {X ∈ Hn×p : X*X = Ip}.
◮ If M is a Riemannian manifold, one can define
  Vp(TM) = {(ξ1, . . . , ξp) | ∃x ∈ M : ξi ∈ TxM, ⟨ξi, ξj⟩ = δij}.

SLIDE 10

Definition

Stiefel manifold: Particular cases

◮ Recall: Real case:
  Vp(Rn) = {X ∈ Rn×p : XᵀX = Ip} =: Vn,p.
◮ p = 1: the sphere
  Vn,1 = {x ∈ Rn : xᵀx = 1}.
◮ p = n: the orthogonal group
  Vn,n = On = {X ∈ Rn×n : XᵀX = In}.

SLIDE 11

Definition

Notation

◮ E. Stiefel (1935): Vn,m (compact), V*n,m (noncompact).
◮ I. M. James (1976): On,k (compact) Stiefel manifold, O*n,k noncompact Stiefel manifold; Vn,k in the real case, Wn,k in the complex case, Xn,k in the quaternion case.
◮ Helmke & Moore (1994): St(k, n) compact Stiefel manifold, ST(k, n) noncompact Stiefel manifold.
◮ Edelman, Arias & Smith (1998): Vn,p.
◮ Bridges & Reich (2001): Vk(Rn).
◮ Bloch et al. (2006): V(n, N) = {Q ∈ Rn×N : QQᵀ = In}.

SLIDE 12

Glimpse of applications

A glimpse of applications

◮ Principal component analysis
◮ Lyapunov exponents of a dynamical system
◮ Procrustes problem
◮ Blind source separation - soft dimension reduction

SLIDE 13

Geometry

Geometry

◮ Dimension
◮ Tangent spaces
◮ Projection onto tangent spaces
◮ Geodesics

SLIDE 14

Geometry

Stiefel manifold: dimension

Dimension of Vn,p:

◮ 1st vector: one unit-norm constraint: n − 1 DOF.
◮ 2nd vector: unit-norm and orthogonal to the 1st: n − 2 DOF.
◮ ...
◮ pth vector: n − p DOF.

Total: dim(Vn,p) = pn − (1 + 2 + · · · + p) = pn − p(p + 1)/2 = p(n − p) + p(p − 1)/2.
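
The count above can be sanity-checked against the tangent-space parametrization XΩ + X⊥K used later in the talk; `stiefel_dim` is a hypothetical helper name for this sketch:

```python
# Dimension formula from the slide: dim V_{n,p} = pn - p(p+1)/2.
def stiefel_dim(n, p):
    return p * n - p * (p + 1) // 2

# Cross-check: a tangent vector X*Omega + Xperp*K has p(p-1)/2 free
# entries in the skew-symmetric Omega and (n-p)*p entries in K.
for n, p in [(5, 2), (7, 3), (4, 4)]:
    skew_dof = p * (p - 1) // 2
    k_dof = (n - p) * p
    assert stiefel_dim(n, p) == skew_dof + k_dof

print(stiefel_dim(5, 2))  # prints 7
```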

SLIDE 15

Geometry

Stiefel manifold: tangent space

[figure: a curve Y(t) on Vn,p through X, with tangent vector Ẏ(0) in TXVn,p]

SLIDE 16

Geometry

Stiefel manifold: tangent space

Let X ∈ Vn,p and let Y(t) be a curve on Vn,p with Y(0) = X. Then Ẏ(0) is a tangent vector to Vn,p at X, and the set of all such vectors is the tangent space to Vn,p at X. We have

Y(t)ᵀY(t) = Ip for all t
⇒ (d/dt)(Y(t)ᵀY(t)) = 0 for all t
⇒ Ẏ(0)ᵀY(0) + Y(0)ᵀẎ(0) = 0
⇒ XᵀẎ(0) is skew-symmetric
⇒ Ẏ(0) = XΩ + X⊥K, Ωᵀ = −Ω.

Hence TXVn,p = {XΩ + X⊥K : Ωᵀ = −Ω, K ∈ R(n−p)×p}.
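
A small numerical check of this parametrization (the variable names are this sketch's own):

```python
import numpy as np

# Build a tangent vector Ydot = X*Omega + Xperp*K as on the slide and
# verify the characterization "X^T Ydot is skew-symmetric".
rng = np.random.default_rng(1)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
# Complete X to an orthonormal basis of R^n; the last n-p columns span
# the orthogonal complement of span(X).
Xfull, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, n - p))]))
Xperp = Xfull[:, p:]

A = rng.standard_normal((p, p))
Omega = A - A.T                          # p-by-p skew-symmetric
K = rng.standard_normal((n - p, p))
Ydot = X @ Omega + Xperp @ K             # element of T_X V_{n,p}

M = X.T @ Ydot                           # equals Omega, hence skew
print(np.allclose(M, -M.T))              # prints True
```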

SLIDE 17

Geometry

Stiefel manifold: projection onto the tangent space

[figure: an ambient point Z and its projection PTXVn,p(Z) onto the tangent space TXVn,p at X]

SLIDE 18

Geometry

Stiefel manifold: projection onto the tangent space

◮ Tangent space: TXVn,p = {XΩ + X⊥K : Ωᵀ = −Ω, K ∈ R(n−p)×p}.
◮ Normal space: NXVn,p = {XS : Sᵀ = S}.
◮ Projection onto the tangent space:
  PTXVn,p(Z) = Z − X sym(XᵀZ) = (I − XXᵀ)Z + X skew(XᵀZ),
  where sym(M) = (M + Mᵀ)/2 and skew(M) = (M − Mᵀ)/2.
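
The projection formula can be sketched and tested directly; the checks below (tangency, equality of the two expressions, idempotence) are this sketch's own:

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def skew(M):
    return (M - M.T) / 2

# Projection onto T_X V_{n,p} as on the slide.
def proj_tangent(X, Z):
    return Z - X @ sym(X.T @ Z)

rng = np.random.default_rng(2)
n, p = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Z = rng.standard_normal((n, p))
P = proj_tangent(X, Z)

# The result is tangent: X^T P is skew-symmetric.
print(np.allclose(X.T @ P, -(X.T @ P).T))                       # prints True
# The two expressions on the slide agree.
print(np.allclose(P, (np.eye(n) - X @ X.T) @ Z + X @ skew(X.T @ Z)))  # prints True
# Idempotence: projecting twice changes nothing.
print(np.allclose(proj_tangent(X, P), P))                       # prints True
```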

SLIDE 19

Geometry

Stiefel manifold: geodesics

[figure: a geodesic on Vn,p emanating from X]

SLIDE 20

Geometry

Stiefel manifold: geodesics

A curve X(t) on Vn,p is a geodesic if, for all t, Ẍ(t) ∈ NX(t)Vn,p. Ross Lippert showed that

X(t) = [X(0) Ẋ(0)] exp( t [ A −S ; Ip A ] ) I2p,p exp(−tA),

where A := X(0)ᵀẊ(0), S := Ẋ(0)ᵀẊ(0), and I2p,p denotes the first p columns of I2p.
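
Assuming SciPy is available, this formula can be evaluated with two matrix exponentials; `stiefel_geodesic` is a hypothetical helper name for this sketch:

```python
import numpy as np
from scipy.linalg import expm

# Numerical sketch of Lippert's geodesic formula, with
# A = X(0)^T Xdot(0) and S = Xdot(0)^T Xdot(0).
def stiefel_geodesic(X0, Xdot0, t):
    n, p = X0.shape
    A = X0.T @ Xdot0
    S = Xdot0.T @ Xdot0
    top = np.hstack([A, -S])
    bot = np.hstack([np.eye(p), A])
    M = expm(t * np.vstack([top, bot]))      # 2p-by-2p exponential
    I2pp = np.eye(2 * p, p)                  # first p columns of I_{2p}
    return np.hstack([X0, Xdot0]) @ M @ I2pp @ expm(-t * A)

rng = np.random.default_rng(3)
n, p = 5, 2
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))
# A tangent direction at X0: project a random matrix onto T_{X0} V_{n,p}.
Z = rng.standard_normal((n, p))
Xdot0 = Z - X0 @ (X0.T @ Z + Z.T @ X0) / 2

Xt = stiefel_geodesic(X0, Xdot0, 0.7)
print(np.linalg.norm(Xt.T @ Xt - np.eye(p)) < 1e-10)  # stays on V_{n,p}
```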

SLIDE 21

Geometry

Stiefel manifold: quotient geodesics

Bijection between Vn,p and On/On−p:

Vn,p ∋ X ↔ {U = [X X⊥] : UᵀU = In} ∈ On/On−p.

Quotient geodesics: If

U(t) = U(0) exp( t [ A −Bᵀ ; B 0 ] ), with Aᵀ = −A,

then U:,1:p(t) ∈ Vn,p follows a quotient geodesic.
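
A quick check, assuming SciPy, that this recipe stays on Vn,p (the specific sizes and the skew completion of the block matrix are this sketch's own):

```python
import numpy as np
from scipy.linalg import expm

# Sketch of the quotient-geodesic recipe: move U(t) = U(0) expm(t*W)
# with W = [[A, -B^T], [B, 0]], A skew, and read off the first p columns.
rng = np.random.default_rng(4)
n, p = 5, 2
U0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # U(0) in O_n
A0 = rng.standard_normal((p, p))
A = A0 - A0.T                                       # p-by-p skew
B = rng.standard_normal((n - p, p))
W = np.block([[A, -B.T], [B, np.zeros((n - p, n - p))]])  # n-by-n skew

Ut = U0 @ expm(0.5 * W)        # expm of a skew matrix is orthogonal
Xt = Ut[:, :p]                 # U_{:,1:p}(t)
print(np.linalg.norm(Xt.T @ Xt - np.eye(p)) < 1e-12)  # prints True
```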

SLIDE 22

Applications

Applications

◮ Principal component analysis
◮ Lyapunov exponents of a dynamical system
◮ Procrustes problem
◮ Blind source separation - soft dimension reduction

SLIDE 23

Applications

Principal component analysis

◮ Let A = Aᵀ ∈ Rn×n.
◮ Goal: Compute the p dominant eigenvectors of A.
◮ Principle: Let N = diag(p, p − 1, . . . , 1) and solve
  max_{XᵀX = Ip} tr(XᵀAXN).
  The columns of X are the p dominant eigenvectors of A.
◮ A basic method: Steepest descent on Vn,p.
◮ Let f : Rn×p → R : X ↦ tr(XᵀAXN).
◮ We have (1/2) grad f(X) = AXN.
◮ Thus (1/2) grad f|Vn,p(X) = PTXVn,p(AXN) = AXN − X sym(XᵀAXN), where sym(Z) := (Z + Zᵀ)/2.
◮ Basic algorithm: Follow Ẋ = grad f|Vn,p(X).
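
A discretized sketch of this gradient flow; the step size, the QR retraction, and the test matrix with a known spectrum are all this sketch's own choices (the slide only states the continuous flow):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 2
# Symmetric A with a known, well-separated spectrum so the answer can
# be checked: eigenvalues n, n-1, ..., 1 with eigenvectors Qe.
Qe, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qe @ np.diag(np.arange(n, 0, -1.0)) @ Qe.T
N = np.diag(np.arange(p, 0, -1.0))                  # N = diag(p, ..., 1)

X, _ = np.linalg.qr(rng.standard_normal((n, p)))
for _ in range(3000):
    G = A @ X @ N                                   # (1/2) grad f(X)
    grad = G - X @ (X.T @ G + G.T @ X) / 2          # project onto T_X V_{n,p}
    X, _ = np.linalg.qr(X + 0.01 * grad)            # ascent step + QR retraction

# The iterate should span the dominant eigenspace of A; compare the
# orthogonal projectors onto the two subspaces.
lead = Qe[:, :p]
print(np.linalg.norm(X @ X.T - lead @ lead.T) < 1e-6)  # subspace recovered
```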


SLIDE 29

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Ref: T. Bridges and S. Reich, Computing Lyapunov exponents on a Stiefel manifold, Physica D 156, pp. 219–238, 2001.
◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.


SLIDE 31

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: Consider the evolution of an infinitesimal ball of perturbed initial conditions. The ball becomes distorted into an infinitesimal ellipsoid. If the length δk(t) of the kth principal axis evolves as δk(t) ≈ δk(0) exp(λk t), then λk is the kth Lyapunov exponent of the system along the nominal trajectory. The mean Lyapunov exponents are given by
  λk = lim_{t→∞} (1/t) log(δk(t)/δk(0)).

SLIDE 32

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: δk(t) ≈ δk(0) exp(λk t).
◮ Challenge 1: Compute just a few Lyapunov exponents ❀ work with p-frames (noncompact Stiefel manifold).
◮ Perturbed system:
  Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)). (1)


SLIDE 35

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: δk(t) ≈ δk(0) exp(λk t).
◮ Perturbed system:
  Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)). (1)
◮ Challenge 2: Perform continuous orthogonalization to prevent the columns of Z from converging to the 1st Lyapunov vector ❀ Vn,p.
◮ Method: Follow the evolution of Q(t) in the thin QR decomposition Z(t) = Q(t)R(t).


SLIDE 37

Applications

Computing Lyapunov exponents: details of method on Stiefel

◮ Perturbed system: Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)).
◮ Principle: track Q(t) in Z(t) = Q(t)R(t).
◮ We have the formula
  Q̇ = (I − QQᵀ)ŻR⁻¹ + QS, with Si,j = (QᵀŻR⁻¹)i,j if i > j, 0 if i = j, −(QᵀŻR⁻¹)j,i if i < j.
◮ Hence
  Q̇ = (I − QQᵀ)A(t)Q + QS, with Si,j = (QᵀA(t)Q)i,j if i > j, 0 if i = j, −(QᵀA(t)Q)j,i if i < j.
◮ This is a dynamical system on the Stiefel manifold Vn,p. It can be rewritten as Q̇ = A(t)Q − QT, where T is upper triangular.
◮ For almost all initial Q(0), the p columns of Q(t) converge to the p leading Lyapunov vectors for the given trajectory of the given system.


SLIDE 40

Applications

Computing Lyapunov exponents: details of method on Stiefel

◮ Perturbed system: Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)).
◮ Principle: track Q(t) in Z(t) = Q(t)R(t).
◮ Hence
  Q̇ = (I − QQᵀ)A(t)Q + QS, with Si,j = (QᵀA(t)Q)i,j if i > j, 0 if i = j, −(QᵀA(t)Q)j,i if i < j.
◮ This is a dynamical system on the Stiefel manifold Vn,p. It can be rewritten as Q̇ = A(t)Q − QT, where T is upper triangular.
◮ For almost all initial Q(0), the p columns of Q(t) converge to the p leading Lyapunov vectors for the given trajectory of the given system. The corresponding Lyapunov exponents can be computed as
  λj = lim_{t→∞} (1/t) ∫₀ᵗ Tjj(s) ds, j = 1, . . . , p.
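
A classical discrete-time variant of the same QR idea can be sketched in a few lines: re-orthogonalize at finite intervals instead of integrating the Q̇ equation (a simplification of this sketch, not the continuous method of the slides). For a constant diagonal A the exponents are known, so the result can be checked:

```python
import numpy as np
from scipy.linalg import expm

# Constant linear system x' = A x with A symmetric: the Lyapunov
# exponents are the eigenvalues of A, here 0.5, -0.2, -1.0.
A = np.diag([0.5, -0.2, -1.0])
n, p = 3, 2
h = 0.1                        # re-orthogonalization interval
Phi = expm(h * A)              # exact flow map over one interval

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
acc = np.zeros(p)
steps = 5000
for _ in range(steps):
    Z = Phi @ Q                # propagate the p-frame
    Q, R = np.linalg.qr(Z)     # discrete QR re-orthogonalization
    acc += np.log(np.abs(np.diag(R)))   # discrete analogue of T_jj

lyap = acc / (steps * h)
print(lyap)                    # close to the true exponents 0.5 and -0.2
```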


SLIDE 42

Applications

Procrustes problem on the Stiefel manifold

◮ Ref: Lars Eldén and Haesun Park, A Procrustes problem on the Stiefel manifold, Numer. Math. 82: 599–619, 1999.
◮ Orthogonal Procrustes problem: given A ∈ Rm×n and B ∈ Rm×p, find Q ∈ Rn×p that solves
  min_{QᵀQ = Ip} ‖AQ − B‖²F.
◮ First-order optimality condition à la manifold: Consider f : Rn×p → R : Q ↦ ‖AQ − B‖²F. We have
  f(Q) = tr(BᵀB) + tr(QᵀAᵀAQ) − 2 tr(QᵀAᵀB),
  Df(Q)[Q̇] = −2 tr(Q̇ᵀAᵀB) + 2 tr(Q̇ᵀAᵀAQ),
  grad f(Q) = −2Aᵀ(B − AQ),
  grad f|Vn,p(Q) = grad f(Q) − Q sym(Qᵀ grad f(Q)),
  where sym(A) := (A + Aᵀ)/2.
◮ Case p = n: Then f|Vn,p(Q) = cst − 2 tr(QᵀAᵀB), and the solution is given by AᵀB = UΣVᵀ, Q = UVᵀ if AᵀB is invertible.


SLIDE 44

Applications

Procrustes problem on the Stiefel manifold

◮ Orthogonal Procrustes problem: given A ∈ Rm×n and B ∈ Rm×p, find Q ∈ Rn×p that solves
  min_{QᵀQ = Ip} ‖AQ − B‖²F.
◮ Applications: factor analysis, used notably in psychometrics.
◮ First-order optimality condition à la manifold: Consider f : Rn×p → R : Q ↦ ‖AQ − B‖²F. We have
  f(Q) = tr(BᵀB) + tr(QᵀAᵀAQ) − 2 tr(QᵀAᵀB),
  Df(Q)[Q̇] = −2 tr(Q̇ᵀAᵀB) + 2 tr(Q̇ᵀAᵀAQ),
  grad f(Q) = −2Aᵀ(B − AQ),
  grad f|Vn,p(Q) = grad f(Q) − Q sym(Qᵀ grad f(Q)),
  where sym(A) := (A + Aᵀ)/2.
◮ Case p = n: Then f|Vn,p(Q) = cst − 2 tr(QᵀAᵀB), and the solution is given by AᵀB = UΣVᵀ, Q = UVᵀ if AᵀB is invertible.

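
The p = n case can be verified numerically; generating B from a known orthogonal Qtrue is this sketch's own test setup:

```python
import numpy as np

# p = n orthogonal Procrustes: the minimizer of ||AQ - B||_F over
# orthogonal Q is Q = U V^T from the SVD of A^T B.
rng = np.random.default_rng(7)
m, n = 10, 4
A = rng.standard_normal((m, n))
Qtrue, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = A @ Qtrue                        # B generated by an orthogonal Qtrue

U, s, Vt = np.linalg.svd(A.T @ B)    # A^T B invertible for generic A
Q = U @ Vt                           # orthogonal polar factor of A^T B

print(np.allclose(Q, Qtrue))         # exact data: recovers Qtrue
print(np.linalg.norm(A @ Q - B) < 1e-10)
```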

SLIDE 47

Applications

Joint diagonalization on the Stiefel manifold

◮ Measurements:
  X = [ x1(t1) x1(t2) · · · x1(tf) ; . . . ; xn(t1) xn(t2) · · · xn(tf) ].
◮ Goal: Find a matrix W ∈ Rn×p such that the rows of Y = WᵀX look as statistically independent as possible.
◮ Decompose W = UΣVᵀ. We have Y = Vᵀ(ΣUᵀX) =: VᵀX̃.
◮ Whitening: Choose Σ and U such that X̃X̃ᵀ = In. Then YYᵀ = VᵀX̃X̃ᵀV = VᵀV = Ip.
◮ Independence and dimension reduction: Consider a collection of covariance-like matrix functions Ci(Y) such that Ci(Y) = VᵀCi(X̃)V. Choose V to make the Ci(Y)'s as diagonal as possible.
◮ Principle: Solve
  max_{VᵀV = Ip} Σ_{i=1}^N ‖diag(VᵀCi(X̃)V)‖²F.
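
The cost above can be sketched directly; the jointly diagonalizable test matrices (built from a shared eigenvector matrix) and the `cost` helper are this sketch's own:

```python
import numpy as np

# Cost from the slide: f(V) = sum_i ||diag(V^T C_i V)||_F^2, evaluated
# here for commuting symmetric C_i with shared eigenvectors Q.
rng = np.random.default_rng(8)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # shared eigenvectors
Cs = [Q @ np.diag(rng.standard_normal(n)) @ Q.T for _ in range(3)]

def cost(V):
    return sum(np.sum(np.diag(V.T @ C @ V) ** 2) for C in Cs)

# For p = n, ||V^T C V||_F is invariant, so concentrating the mass on
# the diagonal maximizes the cost; the shared eigenvector matrix Q
# jointly diagonalizes every C_i, hence scores at least as high as any
# other orthogonal V.
V_rand, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(cost(Q) >= cost(V_rand) - 1e-9)   # prints True
```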


SLIDE 53

Applications

Joint diagonalization on the Stiefel manifold: application

The application is blind source separation: two mixed pictures are given as input to a blind source separation algorithm based on a trust-region method on V2,2.

SLIDE 54

Applications

Joint diagonalization on the Stiefel manifold: application: input

[figure: the two mixed input pictures]

SLIDE 55

Applications

Joint diagonalization on the Stiefel manifold: application: output

[figure: the two separated output pictures]

SLIDE 56

Applications

Some References

◮ E. Stiefel, Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten, Comment. Math. Helv. 8, no. 1, 305–353, 1935.
◮ U. Helmke, J. B. Moore, Optimization and Dynamical Systems, Springer, 1994.
◮ A. Edelman, T. Arias, S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20(2), pp. 303–353, 1998.
◮ L. Eldén, H. Park, A Procrustes problem on the Stiefel manifold, Numer. Math. 82: 599–619, 1999.
◮ T. Bridges, S. Reich, Computing Lyapunov exponents on a Stiefel manifold, Physica D 156, pp. 219–238, 2001.
◮ Y. Nishimori, S. Akaho, Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold, Neurocomputing 67: 106–135, 2005.
◮ A. M. Bloch, P. E. Crouch, A. K. Sanyal, A variational problem on Stiefel manifolds, Nonlinearity 19, pp. 2247–2276, 2006.
◮ F. Tompkins, P. J. Wolfe, Bayesian filtering on the Stiefel manifold, Proc. 2nd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2007.
◮ P.-A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
◮ F. J. Theis, T. P. Cason, P.-A. Absil, Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold, Proc. ICA 2009.
◮ ...

SLIDE 57

Applications