SLIDE 1


Structure-Preserving Method for Dimension Reduction

Ewa Nowakowska

Institute of Computer Science, Polish Academy of Sciences

Joint Statistical Meetings, San Diego, CA 29 July 2012

SLIDE 2

Outline

1. Introduction: Model and notation; Basic definitions; Concept
2. The method: Isotropic transformation; Weighting; Algorithm
3. Summary

SLIDE 4

Model and notation

Data: X = (x_1, ..., x_n)^T, X ∈ R^{n×d}.

Model: f(x) = π_1 f_1(μ_1, Σ_1)(x) + ... + π_k f_k(μ_k, Σ_k)(x), where

    f_l(μ_l, Σ_l)(x) = 1 / ((√(2π))^d √(det Σ_l)) · exp(−(1/2) (x − μ_l)^T Σ_l^{−1} (x − μ_l)).

Additional assumptions:
  - equal mixing factors: π_1 = ... = π_k = 1/k
  - heterogeneity: Σ_{l1} ≠ Σ_{l2}
  - large space dimension: d > k − 1
  - large sample size: n ≫ d
  - number of components k known

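The mixture model above can be instantiated numerically. A minimal NumPy sketch, where k, d, n and the component parameters are hypothetical values chosen only to match the slide's assumptions (equal mixing factors, distinct covariances, d > k − 1, n ≫ d):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration: k components in d dimensions.
k, d, n = 3, 5, 2000
mus = rng.normal(scale=4.0, size=(k, d))                           # component means
covs = [np.diag(rng.uniform(0.5, 2.0, size=d)) for _ in range(k)]  # distinct covariances

# Equal mixing factors pi_1 = ... = pi_k = 1/k: draw labels uniformly,
# then sample each observation from its component's Gaussian.
labels = rng.integers(0, k, size=n)
X = np.stack([rng.multivariate_normal(mus[l], covs[l]) for l in labels])

print(X.shape)  # (2000, 5)
```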

SLIDE 7

Basic facts and definitions

Let μ_X and Σ_X be the empirical estimates of μ and Σ.

Definition (Scatter decomposition)
Let T_X = n Σ_X. Then T_X = W_X + B_X constitutes the decomposition of the total scatter into its within-cluster and between-cluster components.

Definition (Isotropic position)
We say that the data is in isotropic position if μ_X = 0 and T_X = I.

Definition (Principal component subspace PC(k − 1))
By the principal component subspace PC(k − 1) we understand the subspace spanned by the first k − 1 principal components.

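The scatter decomposition is an exact identity that is easy to verify on a toy sample. A sketch, where the labels are assumed known only so that W_X and B_X can be formed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labeled sample for illustrating T_X = W_X + B_X.
k, d, n = 3, 4, 300
labels = rng.integers(0, k, size=n)
centers = 3.0 * rng.normal(size=(k, d))
X = centers[labels] + rng.normal(size=(n, d))

mu = X.mean(axis=0)
T = (X - mu).T @ (X - mu)          # total scatter T_X = n * Sigma_X

W = np.zeros((d, d))               # within-cluster scatter
B = np.zeros((d, d))               # between-cluster scatter
for l in range(k):
    Xl = X[labels == l]
    ml = Xl.mean(axis=0)
    W += (Xl - ml).T @ (Xl - ml)
    B += len(Xl) * np.outer(ml - mu, ml - mu)

# The decomposition holds exactly.
print(np.allclose(T, W + B))  # True
```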

SLIDE 11

Basic facts and definitions

Definition (Fisher's subspace S*)
We define Fisher's subspace (Fisher's discriminant) as

    S* = argmax_{S ⊂ R^d, dim(S) = k−1} ( Σ_{j=1}^{k−1} v_j^T B_X v_j ) / ( Σ_{j=1}^{k−1} v_j^T T_X v_j ),

where v_1, ..., v_{k−1} is an orthonormal basis for S. Equivalently, S* is the solution to an eigenproblem with T_X^{−1} B_X.

Definition (Structure distinctness coefficient λ̄_X)

    λ̄_X = (1 / (k − 1)) Σ_{j=1}^{k−1} λ_j^{T_X^{−1} B_X},

where λ_j^{T_X^{−1} B_X} denotes the j-th largest eigenvalue of T_X^{−1} B_X.

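The structure distinctness coefficient can be computed directly from the eigenproblem above. A sketch on a labeled toy sample (the labels are illustrative only; B_X has rank at most k − 1, so only k − 1 eigenvalues are nonzero):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy labeled sample with well-separated component centers.
k, d, n = 3, 5, 500
labels = rng.integers(0, k, size=n)
centers = 4.0 * rng.normal(size=(k, d))
X = centers[labels] + rng.normal(size=(n, d))

mu = X.mean(axis=0)
T = (X - mu).T @ (X - mu)
B = np.zeros((d, d))
for l in range(k):
    Xl = X[labels == l]
    B += len(Xl) * np.outer(Xl.mean(axis=0) - mu, Xl.mean(axis=0) - mu)

# Eigenvalues of T_X^{-1} B_X, largest first; average the top k-1.
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(T, B)).real)[::-1]
lam_bar = eigvals[: k - 1].mean()

# Since T = W + B, every eigenvalue lies in [0, 1).
print(0.0 < lam_bar <= 1.0)  # True
```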

SLIDE 14

Concept

Inspired by:
  S. Brubaker, S. Vempala, Isotropic PCA and Affine-Invariant Clustering, Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 551–560, 2008.

[Figure: three scatter plots illustrating the pipeline: original data → isotropic data → weighted data.]

SLIDE 16

Isotropic transformation (IT)

Centering:
  mean subtraction: X_0 = (x_1 − μ_X, ..., x_n − μ_X)^T

Decorrelation:
  spectral decomposition: T_{X_0} = A_{T_{X_0}} L_{T_{X_0}} A_{T_{X_0}}^T
  simple manipulation: (X_0 A_{T_{X_0}} L_{T_{X_0}}^{−1/2})^T (X_0 A_{T_{X_0}} L_{T_{X_0}}^{−1/2}) = I
  isotropic transformation: Y = X_0 A_{T_{X_0}} L_{T_{X_0}}^{−1/2}

Lemma (Eigenvalue preservation)
IT does not affect the eigenvalues of Fisher's task: λ_j^X = λ_j^Y.

Corollary (Distinctness preservation)
IT does not change the structure distinctness: λ̄_X = λ̄_Y.

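The three steps above translate almost line by line into NumPy. A sketch on a toy sample, verifying that the transformed data ends up in isotropic position:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy correlated, non-centered sample.
n, d = 400, 4
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + rng.normal(size=d)

X0 = X - X.mean(axis=0)              # centering: mean subtraction
L, A = np.linalg.eigh(X0.T @ X0)     # spectral decomposition of T_{X0}
Y = X0 @ A @ np.diag(L ** -0.5)      # Y = X0 A L^{-1/2}

# Y is in isotropic position: mu_Y = 0 and T_Y = I.
print(np.allclose(Y.mean(axis=0), 0.0))   # True
print(np.allclose(Y.T @ Y, np.eye(d)))    # True
```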

SLIDE 20

Weighting - requirements and the function

Requirements:
  - differentiate variability across the directions
  - reduce variability in all directions except those determined by the cluster centers
  - bring the principal components close to the directions of best between-cluster discrimination
  - introduce only little distortion to the structure
  - relocate only the extreme observations, leaving the core of the structure almost untouched

Weighting function:

    ω_i = 1 / (1 + (1/α) ‖y_i‖²)   and   Z = diag(ω_1, ..., ω_n) Y.

SLIDE 23

Weighting – structure distinctness

Theorem (Structure distinctness preservation)
In agreement with the previous notation and assumptions,

    |λ̄_Z − λ̄_X| ≤ (1/√n) (d/α) (λ̄_X + √k) + r_1(1/n),

where r_1(1/n) denotes a remainder of the first order in 1/n.

Proof (idea).
Show that the variance of the weights is small, then translate this into a small perturbation of the structure distinctness.

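The proof idea can be illustrated numerically: for data in isotropic position each ‖y_i‖² is of order d/n, so the weights concentrate near a common value and their variance vanishes as the sample grows. A sketch (the specific n values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

d, alpha = 5, 0.5
var_by_n = {}
for n in (200, 2000, 20000):
    X = rng.normal(size=(n, d))
    X0 = X - X.mean(axis=0)
    L, A = np.linalg.eigh(X0.T @ X0)
    Y = X0 @ A @ np.diag(L ** -0.5)        # isotropic transformation
    w = 1.0 / (1.0 + np.sum(Y ** 2, axis=1) / alpha)
    var_by_n[n] = w.var()                  # variance of the weights
    print(n, var_by_n[n])

# The weights' variance shrinks as n grows.
print(var_by_n[20000] < var_by_n[200])  # True
```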

SLIDE 25

Weighting – dissimilarity between the subspaces (ssd)

    ssd(PC(k − 1), S*) = (1 / (k − 1)) Σ_{l=1}^{k−1} L²(l, l),

where L is a matrix of canonical correlations between PC(k − 1) and S*.

[Figure: average canonical correlations for original and transformed data, d = 7.]

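The canonical correlations underlying ssd can be sketched with an SVD: for orthonormal bases Q1 and Q2 of two (k − 1)-dimensional subspaces, the canonical correlations are the singular values of Q1^T Q2. The random subspaces below are hypothetical stand-ins for PC(k − 1) and S*:

```python
import numpy as np

rng = np.random.default_rng(6)

d, m = 6, 2                                   # ambient dimension, m = k - 1
Q1, _ = np.linalg.qr(rng.normal(size=(d, m))) # orthonormal basis, subspace 1
Q2, _ = np.linalg.qr(rng.normal(size=(d, m))) # orthonormal basis, subspace 2

cc = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # canonical correlations
avg_sq = (cc ** 2).mean()                        # average squared correlation

# Sanity check: a subspace compared with itself gives correlations of 1.
cc_same = np.linalg.svd(Q1.T @ Q1, compute_uv=False)
print(np.allclose(cc_same, 1.0))  # True
print(0.0 <= avg_sq <= 1.0)       # True
```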

SLIDE 26

Dimension reduction algorithm

Algorithm 2.1: DistPreservingDimReduction(X)

Step 1: Isotropic transformation
    X_0 ← F X;  T_{X_0} ← A_{T_{X_0}} L_{T_{X_0}} A_{T_{X_0}}^T;  Y ← X_0 A_{T_{X_0}} L_{T_{X_0}}^{−1/2}

Step 2: Weighting
    α ← 0.5;  ω_i ← 1 / (1 + (1/α) ‖y_i‖²);  Z ← diag(ω) Y;  Z_0 ← F Z

Step 3: Dimension reduction
    (1/n) T_{Z_0} ← A_{T_{Z_0}} G_{T_{Z_0}} A_{T_{Z_0}}^T;  R ← (A_{T_{Z_0}}^{(k−1)})^T Z_0^T

return R

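A minimal end-to-end Python sketch of Algorithm 2.1, under one reading of the slide. The function name is my own, and the output is returned as n × (k − 1) rather than the slide's transposed form:

```python
import numpy as np

def dist_preserving_dim_reduction(X, k, alpha=0.5):
    # Step 1: isotropic transformation (center, decorrelate, rescale).
    X0 = X - X.mean(axis=0)
    L, A = np.linalg.eigh(X0.T @ X0)
    Y = X0 @ A @ np.diag(L ** -0.5)
    # Step 2: weighting, then re-centering.
    w = 1.0 / (1.0 + np.sum(Y ** 2, axis=1) / alpha)
    Z = w[:, None] * Y
    Z0 = Z - Z.mean(axis=0)
    # Step 3: project onto the first k-1 principal components of Z0.
    G, A2 = np.linalg.eigh(Z0.T @ Z0 / len(X))
    top = A2[:, ::-1][:, : k - 1]      # eigenvectors, largest eigenvalue first
    return Z0 @ top                    # n x (k-1) reduced data

rng = np.random.default_rng(7)
labels = rng.integers(0, 3, size=500)
X = 5.0 * rng.normal(size=(3, 6))[labels] + rng.normal(size=(500, 6))
R = dist_preserving_dim_reduction(X, k=3)
print(R.shape)  # (500, 2)
```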

SLIDE 27

Method's performance - simulation example

[Figure: scatter plots of the original, isotropic, and weighted data, each projected onto PC(k−1) and onto S*.]

            original   weighted
    dist      0.01       1.00
    diss      0.48       0.51

SLIDE 29

Summary

  - The data transformation consists of two steps: isotropic transformation and weighting.
  - It preserves the distinctness of the original structure (defined in terms of variance in Fisher's subspace) with only negligible error.
  - It brings the principal component subspace PC(k − 1) close to Fisher's subspace S* if the sample is large enough.
  - For the transformed data Z, projection onto PC(k − 1) is similar to projection onto S* but does not require knowledge of the classes.
  - This facilitates further analysis of the unknown structure in the subspace of reduced dimension.

SLIDE 34


Thank you for your attention!

ewa.nowakowska@ipipan.waw.pl

Research funded by National Science Center of Poland DEC-2011/01/N/ST6/04174
