Some improvements of the SIR method for the estimation of Mars physical properties from hyperspectral images - PowerPoint PPT Presentation


SLIDE 1

Some improvements of the SIR method for the estimation of Mars physical properties from hyperspectral images

Stéphane Girard

Mistis team, INRIA Grenoble Rhône-Alpes. http://mistis.inrialpes.fr/~girard
Joint work with CQFD team, INRIA Bordeaux Sud-Ouest and Laboratoire de Planétologie de Grenoble.

SLIDE 2

Outline

1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data

SLIDE 3

Outline

1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data

SLIDE 4

Multivariate regression

Let Y ∈ R and X ∈ R^p. The goal is to estimate G : R^p → R such that Y = G(X) + ξ, where ξ is independent of X. This is unrealistic when p is large (curse of dimensionality). Dimension reduction: replace X by its projection on a subspace of lower dimension without loss of information on the distribution of Y given X. Central subspace: the smallest subspace S such that, conditionally on the projection of X on S, Y and X are independent.

SLIDE 5

Dimension reduction

Assume (for the sake of simplicity) that dim(S) = 1, i.e. S = span(b) with b ∈ R^p ⇒ single-index model: Y = g(b^t X) + ξ, where ξ is independent of X. The estimation of the p-variate function G is replaced by the estimation of the univariate function g and of the direction b. Goal of SIR [Li, 1991]: estimate a basis of the central subspace (i.e. b in this particular case).
SLIDE 6

SIR

Idea: find the direction b such that b^t X best explains Y. Conversely, when Y is fixed, b^t X should not vary: find the direction b minimizing the variations of b^t X given Y. In practice: the support of Y is divided into h slices Sj, and the within-slice variance of b^t X is minimized under the constraint var(b^t X) = 1. This is equivalent to maximizing the between-slice variance under the same constraint.

SLIDE 7

Illustration

SLIDE 8

Estimation procedure

Given a sample {(X1, Y1), …, (Xn, Yn)}, the direction b is estimated by

b̂ = argmax_b b^t Γ̂ b such that b^t Σ̂ b = 1, (1)

where Σ̂ is the empirical covariance matrix and Γ̂ is the between-slice covariance matrix defined by

Γ̂ = Σ_{j=1}^h (nj/n) (X̄j − X̄)(X̄j − X̄)^t, with X̄j = (1/nj) Σ_{Yi∈Sj} Xi,

where nj is the number of observations in the slice Sj. The optimization problem (1) has a closed-form solution: b̂ is the eigenvector of Σ̂⁻¹ Γ̂ associated with the largest eigenvalue.
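The estimator above can be sketched in a few lines of numpy (our illustration, not code from the talk; slices are taken at equal sample counts, one common convention):

```python
import numpy as np

def sir_direction(X, Y, n_slices=10):
    """Sliced Inverse Regression: estimate the index direction b as the
    leading eigenvector of Sigma^{-1} Gamma (problem (1))."""
    n, p = X.shape
    X_bar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)        # empirical covariance
    Gamma = np.zeros((p, p))                          # between-slice covariance
    for idx in np.array_split(np.argsort(Y), n_slices):
        d = X[idx].mean(axis=0) - X_bar               # slice mean minus overall mean
        Gamma += (len(idx) / n) * np.outer(d, d)
    # Closed-form solution: leading eigenvector of Sigma^{-1} Gamma
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    b_hat = np.real(vecs[:, np.argmax(np.real(vals))])
    return b_hat / np.linalg.norm(b_hat)
```

On a well-conditioned example (identity covariance, sin(πt/2) link), the estimated direction lines up closely with the true b, as in the simulations that follow.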

SLIDE 9

Illustration

Simulated data. Sample {(X1, Y1), …, (Xn, Yn)} of size n = 100 with Xi ∈ R^p and Yi ∈ R, i = 1, …, n.

Xi ∼ Np(0, Σ) where Σ = Q Δ Q^t with
Δ = diag(p^θ, …, 2^θ, 1^θ), where θ controls the decay rate of the eigenvalue scree plot,
Q an orientation matrix drawn from the uniform distribution on the set of orthogonal matrices.

Yi = g(b^t Xi) + ξ where
g is the link function g(t) = sin(πt/2),
b is the true direction b = 5^{−1/2} Q (1, 1, 1, 1, 1, 0, …, 0)^t,
ξ ∼ N1(0, 9·10⁻⁴).
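This simulation design can be reproduced as follows (a sketch; the orthogonal matrix Q is drawn via the QR decomposition of a Gaussian matrix, one standard construction of the uniform distribution on orthogonal matrices):

```python
import numpy as np

def simulate_sir_data(n=100, p=10, theta=2.0, seed=0):
    """Draw one sample from the simulation design above:
    X ~ N_p(0, Q diag(p^theta, ..., 1^theta) Q^t),
    Y = sin(pi * b^t X / 2) + N(0, 9e-4) noise,
    with true direction b = 5^{-1/2} Q (1,1,1,1,1,0,...,0)^t."""
    rng = np.random.default_rng(seed)
    # Orientation matrix Q, uniform over orthogonal matrices (QR of a Gaussian matrix)
    A = rng.standard_normal((p, p))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))          # sign fix so the distribution is exactly Haar
    eigvals = np.arange(p, 0, -1, dtype=float) ** theta
    Sigma = Q @ np.diag(eigvals) @ Q.T   # condition number p^theta
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    e = np.zeros(p)
    e[:5] = 1.0
    b = (Q @ e) / np.sqrt(5.0)           # unit-norm true direction
    Y = np.sin(np.pi * (X @ b) / 2) + rng.normal(0.0, np.sqrt(9e-4), size=n)
    return X, Y, b, Sigma
```

Note that the condition number of Σ is p^θ by construction, which is what degrades plain SIR as p grows.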

SLIDE 10

Results with θ = 2, dimension p = 10

Blue: Yi versus the projections b^t Xi on the true direction b. Red: Yi versus the projections b̂^t Xi on the estimated direction b̂. Green: b̂^t Xi versus b^t Xi.

SLIDE 11

Results with θ = 2, dimension p = 50

Blue: Yi versus the projections b^t Xi on the true direction b. Red: Yi versus the projections b̂^t Xi on the estimated direction b̂. Green: b̂^t Xi versus b^t Xi.

SLIDE 12

Explanation

Problem: Σ̂ may be singular, or at least ill-conditioned, in several situations. Since rank(Σ̂) ≤ min(n − 1, p), if n ≤ p then Σ̂ is singular. Even when n and p are of the same order, Σ̂ is ill-conditioned, and its inversion causes numerical problems in the estimation of the central subspace. The same phenomenon occurs when the coordinates of X are strongly correlated. In the previous example, the condition number of Σ was p^θ.

SLIDE 13

Outline

1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data

SLIDE 14

Inverse regression model

Model introduced in [Cook, 2007]:

X = µ + c(Y) V b + ε, (2)

where µ and b are vectors of R^p, ε ∼ Np(0, V) is independent of Y, and c : R → R is the coordinate function. Consequence: the expectation of X − µ given Y is collinear to the direction V b.

SLIDE 15

Maximum likelihood estimation (1/3)

c(·) is expanded as a linear combination of h basis functions sj(·):

c(·) = Σ_{j=1}^h cj sj(·) = s^t(·) c,

where c = (c1, …, ch)^t is unknown and s(·) = (s1(·), …, sh(·))^t. Model (2) can be rewritten as

X = µ + s^t(Y) c V b + ε, ε ∼ Np(0, V).

SLIDE 16

Maximum likelihood estimation (2/3)

Notation.

W: the h × h empirical covariance matrix of s(Y), defined by W = (1/n) Σ_{i=1}^n (s(Yi) − s̄)(s(Yi) − s̄)^t with s̄ = (1/n) Σ_{i=1}^n s(Yi).

M: the h × p matrix defined by M = (1/n) Σ_{i=1}^n (s(Yi) − s̄)(Xi − X̄)^t.

SLIDE 17

Maximum likelihood estimation (3/3)

If W and Σ̂ are regular, then the maximum likelihood estimator of b is b̂, the eigenvector associated with the largest eigenvalue of Σ̂⁻¹ M^t W⁻¹ M. ⇒ The inversion of Σ̂ is necessary. In the particular case of piecewise-constant basis functions sj(·) = I{· ∈ Sj}, j = 1, …, h, it can be shown that M^t W⁻¹ M = Γ̂, and thus b̂ is the eigenvector associated with the largest eigenvalue of Σ̂⁻¹ Γ̂. ⇒ SIR method.

SLIDE 18

Regularized SIR

Introduction of a Gaussian prior N(0, Ω) on the unknown vector b. Ω describes which directions in R^p are more likely to contain b. If W and Ω Σ̂ + Ip are regular, then b̂ is the eigenvector associated with the largest eigenvalue of (Ω Σ̂ + Ip)⁻¹ Ω M^t W⁻¹ M. In the particular case where the basis functions are piecewise constant, b̂ is the eigenvector associated with the largest eigenvalue of (Ω Σ̂ + Ip)⁻¹ Ω Γ̂. ⇒ The inversion of Σ̂ is replaced by the inversion of Ω Σ̂ + Ip. ⇒ For a well-chosen a priori matrix Ω, the numerical problems disappear.
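A numpy sketch of the regularized estimator (ours; slice construction as before):

```python
import numpy as np

def regularized_sir_direction(X, Y, Omega, n_slices=10):
    """Regularized SIR sketch: leading eigenvector of
    (Omega Sigma + I_p)^{-1} Omega Gamma. Only Omega Sigma + I_p is
    inverted, so the estimator also runs when Sigma is singular."""
    n, p = X.shape
    X_bar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)
    Gamma = np.zeros((p, p))                       # between-slice covariance
    for idx in np.array_split(np.argsort(Y), n_slices):
        d = X[idx].mean(axis=0) - X_bar
        Gamma += (len(idx) / n) * np.outer(d, d)
    A = np.linalg.solve(Omega @ Sigma + np.eye(p), Omega @ Gamma)
    vals, vecs = np.linalg.eig(A)
    b_hat = np.real(vecs[:, np.argmax(np.real(vals))])
    return b_hat / np.linalg.norm(b_hat)
```

With the ridge prior Ω = τ⁻¹ Ip and small τ this approaches plain SIR, while for n < p it still returns a direction even though Σ̂⁻¹ does not exist.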

SLIDE 19

Links with existing methods

Ridge [Zhong et al, 2005]: Ω = τ⁻¹ Ip. No privileged direction for b in R^p; τ > 0 is a regularization parameter.

PCA+SIR [Chiaromonte et al, 2002]: Ω = Σ_{j=1}^d (1/δ̂j) q̂j q̂j^t, where d ∈ {1, …, p} is fixed, δ̂1 ≥ · · · ≥ δ̂d are the d largest eigenvalues of Σ̂, and q̂1, …, q̂d are the associated eigenvectors.

SLIDE 20

Three new methods

PCA+ridge: Ω = (1/τ) Σ_{j=1}^d q̂j q̂j^t. In the eigenspace of dimension d, all the directions are a priori equivalent.

Tikhonov: Ω = τ⁻¹ Σ̂. The directions with large variance are the most likely to contain b.

PCA+Tikhonov: Ω = (1/τ) Σ_{j=1}^d δ̂j q̂j q̂j^t. In the eigenspace of dimension d, the directions with large variance are the most likely to contain b.
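The five priors differ only in how Ω is built from the spectrum of Σ̂. A sketch (the function and `kind` labels are ours, not the talk's notation):

```python
import numpy as np

def make_prior(Sigma_hat, kind, tau=1.0, d=None):
    """Build the a priori covariance Omega for the regularization
    variants above from the eigendecomposition of Sigma_hat."""
    p = Sigma_hat.shape[0]
    vals, vecs = np.linalg.eigh(Sigma_hat)   # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]   # largest first
    if kind == "ridge":                      # Omega = I_p / tau
        return np.eye(p) / tau
    if kind == "tikhonov":                   # Omega = Sigma_hat / tau
        return Sigma_hat / tau
    Q, delta = vecs[:, :d], vals[:d]         # d leading eigenpairs
    if kind == "pca_sir":                    # sum_j (1/delta_j) q_j q_j^t
        return Q @ np.diag(1.0 / delta) @ Q.T
    if kind == "pca_ridge":                  # (1/tau) sum_j q_j q_j^t
        return Q @ Q.T / tau
    if kind == "pca_tikhonov":               # (1/tau) sum_j delta_j q_j q_j^t
        return Q @ np.diag(delta) @ Q.T / tau
    raise ValueError(kind)
```

For a diagonal Σ̂ the three PCA variants reduce to diagonal matrices supported on the d leading coordinates, which makes the "cut-off" behaviour easy to inspect.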

SLIDE 21

Recall of SIR results with θ = 2 and p = 50

Blue: projections b^t Xi on the true direction b versus Yi. Red: projections b̂^t Xi on the estimated direction b̂ versus Yi. Green: b^t Xi versus b̂^t Xi.

SLIDE 22

Regularized SIR results (PCA+Ridge)

Blue: projections b^t Xi on the true direction b versus Yi. Red: projections b̂^t Xi on the estimated direction b̂ versus Yi. Green: b^t Xi versus b̂^t Xi.

SLIDE 23

Validation on simulations

Proximity criterion between the true direction b and the estimated directions b̂(r) over N = 100 replications:

PC = (1/N) Σ_{r=1}^N cos²(b, b̂(r)), with 0 ≤ PC ≤ 1.

A value close to 0 indicates low proximity: the b̂(r) are nearly orthogonal to b. A value close to 1 indicates high proximity: the b̂(r) are approximately collinear with b.
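The criterion is a one-liner; note that the squared cosine makes it invariant to the sign of each estimate (a direction is defined up to sign). A minimal transcription:

```python
import numpy as np

def proximity_criterion(b, b_hats):
    """PC = (1/N) sum_r cos^2(b, b_hat(r)): average squared cosine
    between the true direction and the N replicated estimates."""
    b = b / np.linalg.norm(b)
    cos2 = [float(b @ (bh / np.linalg.norm(bh))) ** 2 for bh in b_hats]
    return sum(cos2) / len(cos2)
```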

SLIDE 24

Sensitivity with respect to the “cut-off” dimension

d versus PC. The condition number is fixed (θ = 2), and the optimal regularization parameter is used for each value of d. PCA+SIR: very sensitive to d. PCA+ridge and PCA+Tikhonov: stable as d increases.

SLIDE 25

Outline

1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data

SLIDE 26

Context

We consider data arriving sequentially by blocks in a stream. Each data block j = 1, …, J is an i.i.d. sample (Xi, Yi), i = 1, …, n, from the regression model (2). Goal: update the estimate of the direction b at each arrival of a new block of observations.
SLIDE 27

Method

Compute the individual directions b̂j on each block j = 1, …, J using regularized SIR. Compute a common direction as

b̂ = argmax_{||b||=1} Σ_{j=1}^J cos²(b̂j, b) cos²(b̂j, b̂J).

Idea: if b̂j is close to b̂J, then b̂ should be close to b̂j. Explicit solution: b̂ is the eigenvector associated with the largest eigenvalue of

MJ = Σ_{j=1}^J b̂j b̂j^t cos²(b̂j, b̂J).
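The combination step is a small weighted eigenproblem (our sketch; the per-block directions are assumed unit-norm, which the function enforces):

```python
import numpy as np

def common_direction(b_blocks):
    """Combine per-block directions b_1, ..., b_J into a common direction:
    the leading eigenvector of M_J = sum_j cos^2(b_j, b_J) b_j b_j^t.
    Blocks whose direction is far from the latest one get a small weight."""
    B = np.array([bj / np.linalg.norm(bj) for bj in b_blocks])
    b_last = B[-1]                            # direction of the most recent block
    M = np.zeros((B.shape[1], B.shape[1]))
    for bj in B:
        M += float(bj @ b_last) ** 2 * np.outer(bj, bj)
    vals, vecs = np.linalg.eigh(M)            # M is symmetric, eigh is safe
    return vecs[:, -1]                        # eigenvector of the largest eigenvalue
```

An "outlier" block orthogonal to the latest direction receives weight cos² = 0 and is simply ignored, which is the behaviour seen in Scenario 2 below.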

SLIDE 28

Advantages of SIRdatastream

Computational complexity O(Jnp²) vs. O(J²np²) for the brute-force method, which would consist in applying regularized SIR on the union of the first j blocks for j = 1, …, J. Data storage O(np) vs. O(Jnp) for the brute-force method (under the assumption n ≫ max(J, p)). Interpretation of the weights cos²(b̂j, b̂J).

SLIDE 29

Illustration on simulations

Scenario 1: A common direction in all 60 blocks.

[Figure: three panels of the quality measure versus the number of blocks, plus a heatmap of squared cosines between current and previous block directions.]

Left: cos²(b̂, b) for the SIRdatastream, SIR brute-force and SIR estimators at each time t. Right: cos²(b̂j, b̂J); the lighter (yellow) the color, the larger the weight; red stands for very small squared cosines.

SLIDE 30

Illustration on simulations

Scenario 2: The 10th block is an outlier.

[Figure: three panels of the quality measure versus the number of blocks, plus a heatmap of squared cosines between current and previous block directions.]

Left: cos²(b̂, b) for the SIRdatastream, SIR brute-force and SIR estimators at each time t. Right: cos²(b̂j, b̂J); the lighter (yellow) the color, the larger the weight; red stands for very small squared cosines.

SLIDE 31

Illustration on simulations

Scenario 3: A drift occurs from the 10th block (b to b′).

[Figure: three panels of the quality measure versus the number of blocks, plus a heatmap of squared cosines between current and previous block directions.]

Left: cos²(b̂, b) for the SIRdatastream, SIR brute-force and SIR estimators at each time t. Right: cos²(b̂j, b̂J); the lighter (yellow) the color, the larger the weight; red stands for very small squared cosines.

SLIDE 32

Illustration on simulations

Scenario 3 (cont’d): A drift occurs from the 10th block (b to b′).

[Figure: two panels of the quality measure versus the number of blocks, plus a heatmap of squared cosines between current and previous block directions.]

Left: cos²(b̂, b′) for SIRdatastream and SIR brute-force. Right: cos²(b̂, b′).

SLIDE 33

Illustration on simulations

Scenario 4: From the 10th block to the last one, there is no common direction.

[Figure: three panels of the quality measure versus the number of blocks, plus a heatmap of squared cosines between current and previous block directions.]

Left: cos²(b̂, b) for the SIRdatastream, SIR brute-force and SIR estimators at each time t. Right: cos²(b̂j, b̂J); the lighter (yellow) the color, the larger the weight; red stands for very small squared cosines.

SLIDE 34

Outline

1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data

SLIDE 35

Estimation of Mars surface physical properties from hyperspectral images

Context: observation of the south pole of Mars at the end of summer, collected during orbit 61 by the French imaging spectrometer OMEGA on board the Mars Express mission. 3D image: on each pixel, a spectrum containing p = 184 wavelengths is recorded. This portion of Mars mainly contains water ice, CO2 and dust. Goal: for each spectrum X ∈ R^p, estimate the corresponding physical parameter Y ∈ R (grain size of CO2).

SLIDE 36

An inverse problem

Forward problem: physical modeling of individual spectra with a surface reflectance model. Starting from a physical parameter Y, simulate X = F(Y). Generation of n = 12,000 synthetic spectra with the corresponding parameters ⇒ learning database. Inverse problem: estimate the functional relationship Y = G(X). Dimension-reduction assumption: G(X) = g(b^t X), where b is estimated by (regularized) SIR and g by a nonparametric one-dimensional regression.
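Once b̂ is fixed, the second step is an ordinary one-dimensional smoothing problem on the projections t = b̂^t X. The talk does not name the smoother; the sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel, one standard choice (ours, not necessarily the one used for the Mars maps):

```python
import numpy as np

def fit_link(t_train, y_train, bandwidth=0.1):
    """Estimate the link g on the projections t = b^t X by
    Nadaraya-Watson kernel regression, returning a callable g_hat."""
    def g_hat(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # Gaussian kernel weights between query points and training projections
        w = np.exp(-0.5 * ((t[:, None] - t_train[None, :]) / bandwidth) ** 2)
        return (w @ y_train) / w.sum(axis=1)   # weighted average of the responses
    return g_hat
```

Prediction for a new spectrum X then reads Ŷ = ĝ(b̂^t X), applied pixel by pixel over the image.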

SLIDE 37

Estimated function g

Estimated function g between the projected spectra b̂^t X on the first axis of regularized SIR (PCA+ridge) and Y, the grain size of CO2.

SLIDE 38

Estimated CO2 maps

Grain size of CO2 estimated with SIR (left) and regularized SIR (right) on a hyperspectral image of Mars.

SLIDE 39

Dimension reduction

In this talk: dimension reduction for regression. In the Mistis team: unsupervised dimension reduction (nonlinear PCA); dimension reduction for classification and clustering.

SLIDE 40

References on this work

Bernard-Michel, C., Douté, S., Fauvel, M., Gardes, L. and Girard, S. (2009). Retrieval of Mars surface physical properties from OMEGA hyperspectral images using Regularized Sliced Inverse Regression. Journal of Geophysical Research - Planets, 114, E06005.

Bernard-Michel, C., Gardes, L. and Girard, S. (2009). Gaussian Regularized Sliced Inverse Regression. Statistics and Computing, 19, 85–98.

Bernard-Michel, C., Gardes, L. and Girard, S. (2008). A Note on Sliced Inverse Regression with Regularizations. Biometrics, 64, 982–986.

SLIDE 41

References on SIR

[Li, 1991] Li, K.C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316–327.

[Cook, 2007] Cook, R.D. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1), 1–26.

[Zhong et al, 2005] Zhong, W., Zeng, P., Ma, P., Liu, J.S. and Zhu, Y. (2005). RSIR: Regularized Sliced Inverse Regression for motif discovery. Bioinformatics, 21(22), 4169–4175.

[Chiaromonte et al, 2002] Chiaromonte, F. and Martinelli, J. (2002). Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences, 176, 123–144.