Stiefel Manifolds and their Applications - Pierre-Antoine Absil - PowerPoint PPT Presentation



SLIDE 1

Stiefel Manifolds and their Applications

Pierre-Antoine Absil (UCLouvain)

CESAME seminar, 22 September 2009

SLIDE 2

Structure

◮ Definition and visualization
◮ A glimpse of applications
◮ Geometry of the Stiefel manifolds
◮ Applications

SLIDE 3

Collaborations

◮ Chris Baker (Sandia)
◮ Thomas Cason (UCLouvain)
◮ Kyle Gallivan (Florida State University)
◮ Damien Laurent (UCLouvain)
◮ Rob Mahony (Australian National University)
◮ Chafik Samir (U Clermont-Ferrand)
◮ Rodolphe Sepulchre (U of Liège)
◮ Fabian Theis (TU Munich)
◮ Paul Van Dooren (UCLouvain)
◮ ...

SLIDE 4

Definition

Stiefel manifold: Definition

The (compact) Stiefel manifold Vn,p is the set of all p-tuples (x1, . . . , xp) of orthonormal vectors in Rn.

If we turn p-tuples into n × p matrices, (x1, . . . , xp) → [x1 · · · xp], the definition becomes

Vn,p = {X ∈ Rn×p : XᵀX = Ip}.
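
The membership condition XᵀX = Ip is immediate to check numerically. A minimal numpy sketch; sampling a point via a thin QR factorization is this sketch's own choice, not something the slide prescribes:

```python
import numpy as np

# A convenient way to produce a point on V_{n,p}: take the Q factor of
# a thin QR decomposition of a random n-by-p matrix.
rng = np.random.default_rng(0)
n, p = 5, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # X has orthonormal columns

# Membership test from the definition: X^T X = I_p.
err = np.linalg.norm(X.T @ X - np.eye(p))
print(err < 1e-12)  # prints True
```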

SLIDE 5

Definition

Visualization: an element of V3,2

SLIDE 6

Definition

Stiefel manifold: (very unfaithful) artist view

[figure: artist's view of Vn,p]

SLIDE 7

Definition

Stiefel manifold: optimization problems

[figure: a cost function f : Vn,p → R]

SLIDE 8

Definition

Stiefel manifold: optimization algorithms

[figure: iterates x on Vn,p for a cost function f : Vn,p → R]

SLIDE 9

Definition

Stiefel manifold: Extensions

◮ Recall: Real case:
  Vp(Rn) = {X ∈ Rn×p : XᵀX = Ip} =: Vn,p.
◮ Complex case:
  Vp(Cn) = {X ∈ Cn×p : XᴴX = Ip}.
◮ Quaternion case:
  Vp(Hn) = {X ∈ Hn×p : X*X = Ip}.
◮ If M is a Riemannian manifold, one can define
  Vp(TM) = {(ξ1, . . . , ξp) | ∃x ∈ M : ξi ∈ TxM, ⟨ξi, ξj⟩ = δij}.

SLIDE 10

Definition

Stiefel manifold: Particular cases

◮ Recall: Real case:
  Vp(Rn) = {X ∈ Rn×p : XᵀX = Ip} =: Vn,p.
◮ p = 1: the sphere
  Vn,1 = {x ∈ Rn : xᵀx = 1}.
◮ p = n: the orthogonal group
  Vn,n = On = {X ∈ Rn×n : XᵀX = In}.

SLIDE 11

Definition

Notation

◮ E. Stiefel (1935): Vn,m (compact), V*n,m (noncompact).
◮ I. M. James (1976): On,k (compact) Stiefel manifold, O*n,k noncompact Stiefel manifold; Vn,k in the real case, Wn,k in the complex case, Xn,k in the quaternion case.
◮ Helmke & Moore (1994): St(k, n) compact Stiefel manifold, ST(k, n) noncompact Stiefel manifold.
◮ Edelman, Arias & Smith (1998): Vn,p.
◮ Bridges & Reich (2001): Vk(Rn).
◮ Bloch et al. (2006): V(n, N) = {Q ∈ Rn×N : QQᵀ = In}.

SLIDE 12

Glimpse of applications

A glimpse of applications

◮ Principal component analysis
◮ Lyapunov exponents of a dynamical system
◮ Procrustes problem
◮ Blind source separation - soft dimension reduction

SLIDE 13

Geometry

Geometry

◮ Dimension
◮ Tangent spaces
◮ Projection onto tangent spaces
◮ Geodesics

SLIDE 14

Geometry

Stiefel manifold: dimension

Dimension of Vn,p:

◮ 1st vector: one unit-norm constraint: n − 1 DOF.
◮ 2nd vector: unit-norm and orthogonal to the 1st: n − 2 DOF.
◮ ...
◮ pth vector: n − p DOF.

Total: dim(Vn,p) = pn − (1 + 2 + · · · + p) = pn − p(p + 1)/2 = p(n − p) + p(p − 1)/2.
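
The count above can be sanity-checked against the tangent-space parametrization XΩ + X⊥K used later in the talk; `stiefel_dim` is a hypothetical helper name for this sketch:

```python
# Dimension formula from the slide: dim V_{n,p} = pn - p(p+1)/2.
def stiefel_dim(n, p):
    return p * n - p * (p + 1) // 2

# Cross-check: a tangent vector X*Omega + Xperp*K has p(p-1)/2 free
# entries in the skew-symmetric Omega and (n-p)*p entries in K.
for n, p in [(5, 2), (7, 3), (4, 4)]:
    skew_dof = p * (p - 1) // 2
    k_dof = (n - p) * p
    assert stiefel_dim(n, p) == skew_dof + k_dof

print(stiefel_dim(5, 2))  # prints 7
```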

SLIDE 15

Geometry

Stiefel manifold: tangent space

[figure: a curve Y(t) on Vn,p through X, with tangent vector Ẏ(0) in TXVn,p]

SLIDE 16

Geometry

Stiefel manifold: tangent space

Let X ∈ Vn,p and let Y(t) be a curve on Vn,p with Y(0) = X. Then Ẏ(0) is a tangent vector to Vn,p at X, and the set of all such vectors is the tangent space to Vn,p at X. We have

Y(t)ᵀY(t) = Ip for all t
⇒ (d/dt)(Y(t)ᵀY(t)) = 0 for all t
⇒ Ẏ(0)ᵀY(0) + Y(0)ᵀẎ(0) = 0
⇒ XᵀẎ(0) is skew-symmetric
⇒ Ẏ(0) = XΩ + X⊥K, Ωᵀ = −Ω.

Hence TXVn,p = {XΩ + X⊥K : Ωᵀ = −Ω, K ∈ R(n−p)×p}.
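
A small numerical check of this parametrization (the variable names are this sketch's own):

```python
import numpy as np

# Build a tangent vector Ydot = X*Omega + Xperp*K as on the slide and
# verify the characterization "X^T Ydot is skew-symmetric".
rng = np.random.default_rng(1)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
# Complete X to an orthonormal basis of R^n; the last n-p columns span
# the orthogonal complement of span(X).
Xfull, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, n - p))]))
Xperp = Xfull[:, p:]

A = rng.standard_normal((p, p))
Omega = A - A.T                          # p-by-p skew-symmetric
K = rng.standard_normal((n - p, p))
Ydot = X @ Omega + Xperp @ K             # element of T_X V_{n,p}

M = X.T @ Ydot                           # equals Omega, hence skew
print(np.allclose(M, -M.T))              # prints True
```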

SLIDE 17

Geometry

Stiefel manifold: projection onto the tangent space

[figure: an ambient point Z and its projection PTXVn,p(Z) onto the tangent space TXVn,p at X]

SLIDE 18

Geometry

Stiefel manifold: projection onto the tangent space

◮ Tangent space: TXVn,p = {XΩ + X⊥K : Ωᵀ = −Ω, K ∈ R(n−p)×p}.
◮ Normal space: NXVn,p = {XS : Sᵀ = S}.
◮ Projection onto the tangent space:
  PTXVn,p(Z) = Z − X sym(XᵀZ) = (I − XXᵀ)Z + X skew(XᵀZ),
  where sym(M) = (M + Mᵀ)/2 and skew(M) = (M − Mᵀ)/2.
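
The projection formula can be sketched and tested directly; the checks below (tangency, equality of the two expressions, idempotence) are this sketch's own:

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def skew(M):
    return (M - M.T) / 2

# Projection onto T_X V_{n,p} as on the slide.
def proj_tangent(X, Z):
    return Z - X @ sym(X.T @ Z)

rng = np.random.default_rng(2)
n, p = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Z = rng.standard_normal((n, p))
P = proj_tangent(X, Z)

# The result is tangent: X^T P is skew-symmetric.
print(np.allclose(X.T @ P, -(X.T @ P).T))                       # prints True
# The two expressions on the slide agree.
print(np.allclose(P, (np.eye(n) - X @ X.T) @ Z + X @ skew(X.T @ Z)))  # prints True
# Idempotence: projecting twice changes nothing.
print(np.allclose(proj_tangent(X, P), P))                       # prints True
```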

SLIDE 19

Geometry

Stiefel manifold: geodesics

[figure: a geodesic on Vn,p emanating from X]

SLIDE 20

Geometry

Stiefel manifold: geodesics

A curve X(t) on Vn,p is a geodesic if, for all t, Ẍ(t) ∈ NX(t)Vn,p. Ross Lippert showed that

X(t) = [X(0) Ẋ(0)] exp( t [ A −S ; Ip A ] ) I2p,p exp(−tA),

where A := X(0)ᵀẊ(0), S := Ẋ(0)ᵀẊ(0), and I2p,p denotes the first p columns of I2p.
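
Assuming SciPy is available, this formula can be evaluated with two matrix exponentials; `stiefel_geodesic` is a hypothetical helper name for this sketch:

```python
import numpy as np
from scipy.linalg import expm

# Numerical sketch of Lippert's geodesic formula, with
# A = X(0)^T Xdot(0) and S = Xdot(0)^T Xdot(0).
def stiefel_geodesic(X0, Xdot0, t):
    n, p = X0.shape
    A = X0.T @ Xdot0
    S = Xdot0.T @ Xdot0
    top = np.hstack([A, -S])
    bot = np.hstack([np.eye(p), A])
    M = expm(t * np.vstack([top, bot]))      # 2p-by-2p exponential
    I2pp = np.eye(2 * p, p)                  # first p columns of I_{2p}
    return np.hstack([X0, Xdot0]) @ M @ I2pp @ expm(-t * A)

rng = np.random.default_rng(3)
n, p = 5, 2
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))
# A tangent direction at X0: project a random matrix onto T_{X0} V_{n,p}.
Z = rng.standard_normal((n, p))
Xdot0 = Z - X0 @ (X0.T @ Z + Z.T @ X0) / 2

Xt = stiefel_geodesic(X0, Xdot0, 0.7)
print(np.linalg.norm(Xt.T @ Xt - np.eye(p)) < 1e-10)  # stays on V_{n,p}
```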

SLIDE 21

Geometry

Stiefel manifold: quotient geodesics

Bijection between Vn,p and On/On−p:

Vn,p ∋ X ↔ {U = [X X⊥] : UᵀU = In} ∈ On/On−p.

Quotient geodesics: If

U(t) = U(0) exp( t [ A −Bᵀ ; B 0 ] ), with Aᵀ = −A,

then U:,1:p(t) ∈ Vn,p follows a quotient geodesic.
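
A quick check, assuming SciPy, that this recipe stays on Vn,p (the specific sizes and the skew completion of the block matrix are this sketch's own):

```python
import numpy as np
from scipy.linalg import expm

# Sketch of the quotient-geodesic recipe: move U(t) = U(0) expm(t*W)
# with W = [[A, -B^T], [B, 0]], A skew, and read off the first p columns.
rng = np.random.default_rng(4)
n, p = 5, 2
U0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # U(0) in O_n
A0 = rng.standard_normal((p, p))
A = A0 - A0.T                                       # p-by-p skew
B = rng.standard_normal((n - p, p))
W = np.block([[A, -B.T], [B, np.zeros((n - p, n - p))]])  # n-by-n skew

Ut = U0 @ expm(0.5 * W)        # expm of a skew matrix is orthogonal
Xt = Ut[:, :p]                 # U_{:,1:p}(t)
print(np.linalg.norm(Xt.T @ Xt - np.eye(p)) < 1e-12)  # prints True
```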

SLIDE 22

Applications

Applications

◮ Principal component analysis
◮ Lyapunov exponents of a dynamical system
◮ Procrustes problem
◮ Blind source separation - soft dimension reduction

SLIDE 23

Applications

Principal component analysis

◮ Let A = Aᵀ ∈ Rn×n.
◮ Goal: Compute the p dominant eigenvectors of A.
◮ Principle: Let N = diag(p, p − 1, . . . , 1) and solve
  max_{XᵀX = Ip} tr(XᵀAXN).
  The columns of X are the p dominant eigenvectors of A.
◮ A basic method: Steepest descent on Vn,p.
◮ Let f : Rn×p → R : X ↦ tr(XᵀAXN).
◮ We have (1/2) grad f(X) = AXN.
◮ Thus (1/2) grad f|Vn,p(X) = PTXVn,p(AXN) = AXN − X sym(XᵀAXN), where sym(Z) := (Z + Zᵀ)/2.
◮ Basic algorithm: Follow Ẋ = grad f|Vn,p(X).
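
A discretized sketch of this gradient flow; the step size, the QR retraction, and the test matrix with a known spectrum are all this sketch's own choices (the slide only states the continuous flow):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 2
# Symmetric A with a known, well-separated spectrum so the answer can
# be checked: eigenvalues n, n-1, ..., 1 with eigenvectors Qe.
Qe, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qe @ np.diag(np.arange(n, 0, -1.0)) @ Qe.T
N = np.diag(np.arange(p, 0, -1.0))                  # N = diag(p, ..., 1)

X, _ = np.linalg.qr(rng.standard_normal((n, p)))
for _ in range(3000):
    G = A @ X @ N                                   # (1/2) grad f(X)
    grad = G - X @ (X.T @ G + G.T @ X) / 2          # project onto T_X V_{n,p}
    X, _ = np.linalg.qr(X + 0.01 * grad)            # ascent step + QR retraction

# The iterate should span the dominant eigenspace of A; compare the
# orthogonal projectors onto the two subspaces.
lead = Qe[:, :p]
print(np.linalg.norm(X @ X.T - lead @ lead.T) < 1e-6)  # subspace recovered
```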


SLIDE 29

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Ref: T. Bridges and S. Reich, Computing Lyapunov exponents on a Stiefel manifold, Physica D 156, pp. 219–238, 2001.
◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.


SLIDE 31

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: Consider the evolution of an infinitesimal ball of perturbed initial conditions. The ball becomes distorted into an infinitesimal ellipsoid. If the length δk(t) of the kth principal axis evolves as δk(t) ≈ δk(0) exp(λk t), then λk is the kth Lyapunov exponent of the system along the nominal trajectory. The mean Lyapunov exponents are given by
  λk = lim_{t→∞} (1/t) log(δk(t)/δk(0)).

SLIDE 32

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: δk(t) ≈ δk(0) exp(λk t).
◮ Challenge 1: Compute just a few Lyapunov exponents ❀ work with p-frames (noncompact Stiefel manifold).
◮ Perturbed system:
  Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)). (1)


SLIDE 35

Applications

Computing Lyapunov exponents: a method on the Stiefel manifold

◮ Dynamical system: ẋ = f(x).
◮ Nominal trajectory: x*(t).
◮ Goal: Describe the behavior of nearby trajectories.
◮ Principle: δk(t) ≈ δk(0) exp(λk t).
◮ Perturbed system:
  Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)). (1)
◮ Challenge 2: Perform continuous orthogonalization to prevent the columns of Z from converging to the 1st Lyapunov vector ❀ Vn,p.
◮ Method: Follow the evolution of Q(t) in the thin QR decomposition Z(t) = Q(t)R(t).


SLIDE 37

Applications

Computing Lyapunov exponents: details of method on Stiefel

◮ Perturbed system: Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)).
◮ Principle: track Q(t) in Z(t) = Q(t)R(t).
◮ We have the formula
  Q̇ = (I − QQᵀ)ŻR⁻¹ + QS, with Si,j = (QᵀŻR⁻¹)i,j if i > j, 0 if i = j, −(QᵀŻR⁻¹)j,i if i < j.
◮ Hence
  Q̇ = (I − QQᵀ)A(t)Q + QS, with Si,j = (QᵀA(t)Q)i,j if i > j, 0 if i = j, −(QᵀA(t)Q)j,i if i < j.
◮ This is a dynamical system on the Stiefel manifold Vn,p. It can be rewritten as Q̇ = A(t)Q − QT, where T is upper triangular.
◮ For almost all initial Q(0), the p columns of Q(t) converge to the p leading Lyapunov vectors for the given trajectory of the given system.


SLIDE 40

Applications

Computing Lyapunov exponents: details of method on Stiefel

◮ Perturbed system: Ż = A(t)Z, Z ∈ Rn×p, A(t) := Df(x*(t)).
◮ Principle: track Q(t) in Z(t) = Q(t)R(t).
◮ Hence
  Q̇ = (I − QQᵀ)A(t)Q + QS, with Si,j = (QᵀA(t)Q)i,j if i > j, 0 if i = j, −(QᵀA(t)Q)j,i if i < j.
◮ This is a dynamical system on the Stiefel manifold Vn,p. It can be rewritten as Q̇ = A(t)Q − QT, where T is upper triangular.
◮ For almost all initial Q(0), the p columns of Q(t) converge to the p leading Lyapunov vectors for the given trajectory of the given system. The corresponding Lyapunov exponents can be computed as
  λj = lim_{t→∞} (1/t) ∫₀ᵗ Tjj(s) ds, j = 1, . . . , p.
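
A classical discrete-time variant of the same QR idea can be sketched in a few lines: re-orthogonalize at finite intervals instead of integrating the Q̇ equation (a simplification of this sketch, not the continuous method of the slides). For a constant diagonal A the exponents are known, so the result can be checked:

```python
import numpy as np
from scipy.linalg import expm

# Constant linear system x' = A x with A symmetric: the Lyapunov
# exponents are the eigenvalues of A, here 0.5, -0.2, -1.0.
A = np.diag([0.5, -0.2, -1.0])
n, p = 3, 2
h = 0.1                        # re-orthogonalization interval
Phi = expm(h * A)              # exact flow map over one interval

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
acc = np.zeros(p)
steps = 5000
for _ in range(steps):
    Z = Phi @ Q                # propagate the p-frame
    Q, R = np.linalg.qr(Z)     # discrete QR re-orthogonalization
    acc += np.log(np.abs(np.diag(R)))   # discrete analogue of T_jj

lyap = acc / (steps * h)
print(lyap)                    # close to the true exponents 0.5 and -0.2
```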


SLIDE 42

Applications

Procrustes problem on the Stiefel manifold

◮ Ref: Lars Eldén and Haesun Park, A Procrustes problem on the Stiefel manifold, Numer. Math. 82: 599–619, 1999.
◮ Orthogonal Procrustes problem: given A ∈ Rm×n and B ∈ Rm×p, find Q ∈ Rn×p that solves
  min_{QᵀQ = Ip} ‖AQ − B‖²F.
◮ First-order optimality condition à la manifold: Consider f : Rn×p → R : Q ↦ ‖AQ − B‖²F. We have
  f(Q) = tr(BᵀB) + tr(QᵀAᵀAQ) − 2 tr(QᵀAᵀB),
  Df(Q)[Q̇] = −2 tr(Q̇ᵀAᵀB) + 2 tr(Q̇ᵀAᵀAQ),
  grad f(Q) = −2Aᵀ(B − AQ),
  grad f|Vn,p(Q) = grad f(Q) − Q sym(Qᵀ grad f(Q)),
  where sym(A) := (A + Aᵀ)/2.
◮ Case p = n: Then f|Vn,p(Q) = cst − 2 tr(QᵀAᵀB), and the solution is given by AᵀB = UΣVᵀ, Q = UVᵀ if AᵀB is invertible.


SLIDE 44

Applications

Procrustes problem on the Stiefel manifold

◮ Orthogonal Procrustes problem: given A ∈ Rm×n and B ∈ Rm×p, find Q ∈ Rn×p that solves
  min_{QᵀQ = Ip} ‖AQ − B‖²F.
◮ Applications: factor analysis, used notably in psychometrics.
◮ First-order optimality condition à la manifold: Consider f : Rn×p → R : Q ↦ ‖AQ − B‖²F. We have
  f(Q) = tr(BᵀB) + tr(QᵀAᵀAQ) − 2 tr(QᵀAᵀB),
  Df(Q)[Q̇] = −2 tr(Q̇ᵀAᵀB) + 2 tr(Q̇ᵀAᵀAQ),
  grad f(Q) = −2Aᵀ(B − AQ),
  grad f|Vn,p(Q) = grad f(Q) − Q sym(Qᵀ grad f(Q)),
  where sym(A) := (A + Aᵀ)/2.
◮ Case p = n: Then f|Vn,p(Q) = cst − 2 tr(QᵀAᵀB), and the solution is given by AᵀB = UΣVᵀ, Q = UVᵀ if AᵀB is invertible.

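
The p = n case can be verified numerically; generating B from a known orthogonal Qtrue is this sketch's own test setup:

```python
import numpy as np

# p = n orthogonal Procrustes: the minimizer of ||AQ - B||_F over
# orthogonal Q is Q = U V^T from the SVD of A^T B.
rng = np.random.default_rng(7)
m, n = 10, 4
A = rng.standard_normal((m, n))
Qtrue, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = A @ Qtrue                        # B generated by an orthogonal Qtrue

U, s, Vt = np.linalg.svd(A.T @ B)    # A^T B invertible for generic A
Q = U @ Vt                           # orthogonal polar factor of A^T B

print(np.allclose(Q, Qtrue))         # exact data: recovers Qtrue
print(np.linalg.norm(A @ Q - B) < 1e-10)
```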

SLIDE 47

Applications

Joint diagonalization on the Stiefel manifold

◮ Measurements:
  X = [ x1(t1) x1(t2) · · · x1(tf) ; . . . ; xn(t1) xn(t2) · · · xn(tf) ].
◮ Goal: Find a matrix W ∈ Rn×p such that the rows of Y = WᵀX look as statistically independent as possible.
◮ Decompose W = UΣVᵀ. We have Y = Vᵀ(ΣUᵀX) =: VᵀX̃.
◮ Whitening: Choose Σ and U such that X̃X̃ᵀ = In. Then YYᵀ = VᵀX̃X̃ᵀV = VᵀV = Ip.
◮ Independence and dimension reduction: Consider a collection of covariance-like matrix functions Ci(Y) such that Ci(Y) = VᵀCi(X̃)V. Choose V to make the Ci(Y)'s as diagonal as possible.
◮ Principle: Solve
  max_{VᵀV = Ip} Σ_{i=1}^N ‖diag(VᵀCi(X̃)V)‖²F.
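
The cost above can be sketched directly; the jointly diagonalizable test matrices (built from a shared eigenvector matrix) and the `cost` helper are this sketch's own:

```python
import numpy as np

# Cost from the slide: f(V) = sum_i ||diag(V^T C_i V)||_F^2, evaluated
# here for commuting symmetric C_i with shared eigenvectors Q.
rng = np.random.default_rng(8)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # shared eigenvectors
Cs = [Q @ np.diag(rng.standard_normal(n)) @ Q.T for _ in range(3)]

def cost(V):
    return sum(np.sum(np.diag(V.T @ C @ V) ** 2) for C in Cs)

# For p = n, ||V^T C V||_F is invariant, so concentrating the mass on
# the diagonal maximizes the cost; the shared eigenvector matrix Q
# jointly diagonalizes every C_i, hence scores at least as high as any
# other orthogonal V.
V_rand, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(cost(Q) >= cost(V_rand) - 1e-9)   # prints True
```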


SLIDE 53

Applications

Joint diagonalization on the Stiefel manifold: application

The application is blind source separation: two mixed pictures are given as input to a blind source separation algorithm based on a trust-region method on V2,2.

SLIDE 54

Applications

Joint diagonalization on the Stiefel manifold: application: input

[figure: the two mixed input pictures]

SLIDE 55

Applications

Joint diagonalization on the Stiefel manifold: application: output

[figure: the two separated output pictures]

SLIDE 56

Applications

Some References

◮ E. Stiefel, Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten, Comment. Math. Helv. 8, no. 1, 305–353, 1935.
◮ U. Helmke, J. B. Moore, Optimization and Dynamical Systems, Springer, 1994.
◮ A. Edelman, T. Arias, S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20(2), pp. 303–353, 1998.
◮ L. Eldén, H. Park, A Procrustes problem on the Stiefel manifold, Numer. Math. 82: 599–619, 1999.
◮ T. Bridges, S. Reich, Computing Lyapunov exponents on a Stiefel manifold, Physica D 156, pp. 219–238, 2001.
◮ Y. Nishimori, S. Akaho, Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold, Neurocomputing 67: 106–135, 2005.
◮ A. M. Bloch, P. E. Crouch, A. K. Sanyal, A variational problem on Stiefel manifolds, Nonlinearity 19, pp. 2247–2276, 2006.
◮ F. Tompkins, P. J. Wolfe, Bayesian filtering on the Stiefel manifold, Proc. 2nd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2007.
◮ P.-A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
◮ F. J. Theis, T. P. Cason, P.-A. Absil, Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold, Proc. ICA 2009.
◮ ...

SLIDE 57

Applications