Biweight Correlation as a Measure of Distance between Genes on a Microarray


SLIDE 1

Biweight Correlation as a Measure of Distance between Genes on a Microarray

Aya Mitani, Pitzer College ’06
Advisor: Professor Johanna Hardin, Pomona College
April 29, 2006

SLIDE 2

About microarrays

Small chip
Contains thousands of probes
Measures mRNA activity in a particular cell type
Contains control and treatment samples
Expression level is measured from light intensity

SLIDE 3

SLIDE 4

SLIDE 5

Problem with microarray data

Data are noisy
A robust estimate of correlation is needed
Pearson correlation is often used
A single outlier can greatly affect the Pearson correlation
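The sensitivity of Pearson correlation to a single outlier can be illustrated with a small sketch. The expression values below are hypothetical and chosen so that the clean correlation is zero; they are not taken from the microarray data in the slides.

```python
import numpy as np

# Hypothetical expression values for two genes across five samples.
# y = x**2 - 2 is symmetric in x, so the Pearson correlation is zero.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([2.0, -1.0, -2.0, -1.0, 2.0])
r_clean = np.corrcoef(x, y)[0, 1]

# Add a single sample that is extreme in both genes.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"Pearson without the outlier: {r_clean:.3f}")
print(f"Pearson with one outlier:    {r_outlier:.3f}")  # about 0.87
```

One added point moves the coefficient from zero to roughly 0.87, which is the behavior a robust estimator is meant to avoid.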

SLIDE 6

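Last summer: M-estimation

Weighted average, with points farther from the center given less weight:

$$d_i = \sqrt{(x_i - \tilde{\mu})' \, \tilde{\Sigma}^{-1} (x_i - \tilde{\mu})} \tag{1}$$

$$\tilde{\mu} = \frac{\sum_i w(d_i)\, x_i}{\sum_i w(d_i)} \tag{2}$$

$$\tilde{\Sigma} = \frac{\sum_i w(d_i)\,(x_i - \tilde{\mu})(x_i - \tilde{\mu})'}{\sum_i w(d_i)} \tag{3}$$

Tukey’s biweight:

$$w(d_i) = \begin{cases} \left(1 - (d_i/c)^2\right)^2 & d_i \le c \\ 0 & d_i > c \end{cases}$$

Use the Minimum Covariance Determinant (MCD) for the initial estimates of $\mu$ and $\Sigma$.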
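The iteration described on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for the slides: the weight is taken to be (1 − (d/c)²)² for d ≤ c and 0 otherwise, so that weights lie between 0 and 1 and decrease with distance; the tuning constant `c = 5.0`, the tolerance, the function names and the simulated data are all hypothetical; and coordinatewise medians with squared MADs stand in for the MCD starting values so that the block needs only NumPy.

```python
import numpy as np

def biweight_weight(d, c):
    """Tukey biweight weight: (1 - (d/c)^2)^2 if d <= c, else 0."""
    return np.where(d <= c, (1.0 - (d / c) ** 2) ** 2, 0.0)

def biweight_m_estimate(X, mu0, sigma0, c=5.0, max_iter=100, tol=1e-8):
    """Iterate the distance, weighted-mean and weighted-covariance updates.

    X is an (n_samples, p) array; (mu0, sigma0) are the starting values.
    Returns (mu, sigma, number_of_iterations_used).
    """
    mu = np.asarray(mu0, dtype=float)
    sigma = np.asarray(sigma0, dtype=float)
    for it in range(1, max_iter + 1):
        diff = X - mu
        # Distance of every sample from the current center.
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff))
        w = biweight_weight(d, c)
        # Weighted mean, then weighted covariance about the new mean.
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        diff_new = X - mu_new
        sigma_new = (w[:, None] * diff_new).T @ diff_new / w.sum()
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(sigma_new - sigma).max() < tol)
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return mu, sigma, it

# Simulated data (assumption): 200 clean samples with correlation 0.8,
# plus 10 identical outliers far from the bulk.
rng = np.random.default_rng(0)
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X = np.vstack([clean, np.tile([8.0, -8.0], (10, 1))])

# Stand-in for the MCD start: coordinatewise median and squared MAD.
mu0 = np.median(X, axis=0)
sigma0 = np.diag(np.median(np.abs(X - mu0), axis=0) ** 2)

mu, sigma, n_iter = biweight_m_estimate(X, mu0, sigma0)
bw_corr = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])
pearson = np.corrcoef(X.T)[0, 1]
print(f"iterations: {n_iter}, biweight: {bw_corr:.3f}, Pearson: {pearson:.3f}")
```

No consistency correction is applied in this sketch, so the overall scale of the covariance estimate is only relative. The correlation derived from it is unaffected, because a common scale factor cancels.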

SLIDE 7

Plot of the biweight weight function (w)

[Figure: weight (vertical axis, 0.0 to 1.0) against distance (horizontal axis).]

SLIDE 8

Biweight Correlation Coefficient

$$\mathrm{bwc}_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}}$$

where $\sigma_{jk}$ is the biweight estimate of the covariance of gene j and gene k, and $\sigma_{jj}$ is the biweight estimate of the variance of gene j.

Goal: measure the correlation (similarity or difference) between two genes.
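As a small worked example of the coefficient, the sketch below turns a covariance matrix into a correlation for one pair of genes, dividing the covariance by the square root of the product of the two variances. The function name and the 2×2 matrix are hypothetical; in practice the matrix would be a biweight covariance estimate.

```python
import numpy as np

def bwc_from_cov(sigma, j, k):
    """Correlation of variables j and k implied by covariance matrix sigma."""
    return sigma[j, k] / np.sqrt(sigma[j, j] * sigma[k, k])

# Hypothetical covariance estimate for two genes:
# variances 4 and 9, covariance 3.
sigma = np.array([[4.0, 3.0],
                  [3.0, 9.0]])

print(bwc_from_cov(sigma, 0, 1))  # 3 / sqrt(4 * 9) = 0.5
```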

SLIDE 9

[Figure: biweight correlation and Pearson correlation plotted against each other.]

SLIDE 10

[Figure: two scatter plots of gene pairs: Gene 14 with Gene 86, and Gene 26 with Gene 11.]

SLIDE 11

Further work to be done

Computational time
Biweight correlation on clean data


SLIDE 12

This Spring

Matrix correlation vs. pair-by-pair correlation
One-step M-estimation
Median vs. MCD
Is biweight correlation good for clean data?

SLIDE 13

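Instead of computing the correlation pair by pair, compute the whole correlation matrix at once from the biweight covariance matrix:

$$d_i = \sqrt{(x_i - \tilde{\mu})' \, \tilde{\Sigma}^{-1} (x_i - \tilde{\mu})} \tag{4}$$

$$\tilde{\mu} = \frac{\sum_i w(d_i)\, x_i}{\sum_i w(d_i)} \tag{5}$$

$$\tilde{\Sigma} = \frac{\sum_i w(d_i)\,(x_i - \tilde{\mu})(x_i - \tilde{\mu})'}{\sum_i w(d_i)} \tag{6}$$

$$
\begin{pmatrix}
\mathrm{mat.bwc}_{11} & \cdots & \mathrm{mat.bwc}_{1n} \\
\mathrm{mat.bwc}_{21} & \cdots & \mathrm{mat.bwc}_{2n} \\
\vdots & \ddots & \vdots \\
\mathrm{mat.bwc}_{n1} & \cdots & \mathrm{mat.bwc}_{nn}
\end{pmatrix}
=
\begin{pmatrix}
\sqrt{\sigma_{11}} & & \\
& \ddots & \\
& & \sqrt{\sigma_{nn}}
\end{pmatrix}^{-1}
\begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1n} \\
\sigma_{21} & \cdots & \sigma_{2n} \\
\vdots & \ddots & \vdots \\
\sigma_{n1} & \cdots & \sigma_{nn}
\end{pmatrix}
\begin{pmatrix}
\sqrt{\sigma_{11}} & & \\
& \ddots & \\
& & \sqrt{\sigma_{nn}}
\end{pmatrix}^{-1}
$$

Is $\mathrm{mat.bwc}_{jk} = \mathrm{bwc}_{jk}$?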
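The normalization step that turns a covariance matrix into a correlation matrix can be sketched as below, scaling each row and column by the reciprocal of the corresponding standard deviation. The function name and the 3×3 matrix are hypothetical. The sketch shows only the normalization; it does not address the slide's question of whether a jointly estimated biweight covariance matrix gives the same entries as separate pairwise estimates.

```python
import numpy as np

def corr_matrix_from_cov(sigma):
    """Return D^{-1} sigma D^{-1}, where D = diag(sqrt(sigma_jj))."""
    d_inv = np.diag(1.0 / np.sqrt(np.diag(sigma)))
    return d_inv @ sigma @ d_inv

# Hypothetical covariance estimate for three genes.
sigma = np.array([[4.0, 3.0, 1.0],
                  [3.0, 9.0, 2.0],
                  [1.0, 2.0, 1.0]])

R = corr_matrix_from_cov(sigma)
print(np.round(R, 4))
```

Each entry of `R` equals the covariance divided by the product of the two standard deviations, so the diagonal is 1 and, for example, the entry for the first two genes is 3 / (2 · 3) = 0.5.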

SLIDE 14

[Figure: matrix correlation and pair-by-pair correlation plotted against each other, 10 genes.]

SLIDE 15

One-step M-estimation

[Figure: one-step and converged biweight correlations plotted against each other, 20 genes.]

Converged M-estimation took 10 to 25 iterations on average (11 seconds to compute the 190 pairs of genes).
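Stopping after a fixed number of iterations, rather than iterating to convergence, can be sketched as follows. This is an illustrative sketch only: the function name, the tuning constant `c = 5.0`, the simulated data, and the use of medians and squared MADs as starting values (instead of MCD) are assumptions, and the weight is taken to be (1 − (d/c)²)² for d ≤ c and 0 otherwise. The timings reported on the slides are not reproduced.

```python
import numpy as np

def k_step_biweight_corr(x, y, n_steps, c=5.0):
    """Biweight correlation of two genes after exactly n_steps iterations."""
    X = np.column_stack([x, y])
    # Starting values (assumption): coordinatewise median and squared MAD.
    mu = np.median(X, axis=0)
    sigma = np.diag(np.median(np.abs(X - mu), axis=0) ** 2)
    for _ in range(n_steps):
        diff = X - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff))
        w = np.where(d <= c, (1.0 - (d / c) ** 2) ** 2, 0.0)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        sigma = (w[:, None] * diff).T @ diff / w.sum()
    return sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])

# Simulated pair of genes (assumption): 100 samples, true correlation 0.8.
rng = np.random.default_rng(1)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=100)
gx, gy = data[:, 0], data[:, 1]

# Treat 50 steps as the converged reference value.
reference = k_step_biweight_corr(gx, gy, n_steps=50)
for k in (1, 3, 5, 10):
    r_k = k_step_biweight_corr(gx, gy, n_steps=k)
    print(f"{k:2d} steps: {r_k:.4f}  |difference from reference| = {abs(r_k - reference):.1e}")
```

The printed differences give a rough sense of how much accuracy is traded for speed when the iteration is cut short.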

SLIDE 16

Few-step

[Figure: two panels, 20 genes each: 3-step against converged (3.5 seconds); 5-step against converged (5.5 seconds).]

SLIDE 17

[Figure: scatter plot of Gene 11 and Gene 18.]

SLIDE 18

10-step

[Figure: 10-step against converged, 20 genes (8 seconds).]

SLIDE 19

Median instead of MCD

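Median for $\tilde{\mu}$

Median absolute deviation (MAD) for $\tilde{\Sigma}$

$$\mathrm{MAD}(X) = \operatorname{median}_i \, \left| x_i - \operatorname{median}(X) \right|$$

If the iteration is run to convergence, the choice of starting values makes no difference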
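The median and MAD starting values can be sketched as below. The function names are hypothetical, and one detail is an assumption: the slides do not say how the off-diagonal entries of the initial covariance are set, so the sketch uses a diagonal matrix of squared MADs.

```python
import numpy as np

def mad(x):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def median_mad_start(X):
    """Starting values from coordinatewise medians and MADs.

    Assumption: the initial covariance is diagonal, with squared MADs
    on the diagonal.
    """
    mu0 = np.median(X, axis=0)
    scale = np.array([mad(X[:, j]) for j in range(X.shape[1])])
    return mu0, np.diag(scale ** 2)

# Toy values with one gross outlier: the median and MAD barely notice it.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(np.median(x), mad(x))  # 3.0 1.0

mu0, sigma0 = median_mad_start(np.column_stack([x, 2.0 * x]))
print(mu0)
print(sigma0)
```

Both quantities need only sorting-type operations per gene, which is consistent with the shorter run times reported for the median start.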

SLIDE 20

[Figure: median converged against MCD converged, 20 genes (7 seconds).]

SLIDE 21

Few-step median

[Figure: two panels, 20 genes each: median 3-step against MCD converged (1.5 seconds); median 5-step against MCD converged (2.5 seconds).]

SLIDE 22

[Figure: scatter plot of Gene 17 and Gene 7.]

SLIDE 23

10-step median

[Figure: median 10-step against MCD converged, 20 genes (5 seconds).]

SLIDE 24

10-step median vs. 5-step MCD

[Figure: two panels, 20 genes each: median 10-step against MCD converged (5 seconds); MCD 5-step against MCD converged (5.5 seconds).]

SLIDE 25

Biweight correlation on clean data

How biased/variable compared to Pearson correlation?

[Figure: distributions of Pearson correlation and biweight correlation on clean data. Annotated values: 0.7636, 0.8482, 0.7850, 0.7523, 0.8541, 0.7945.]

SLIDE 26

What makes the difference?

[Figure: biweight correlation and Pearson correlation plotted against each other for multivariate normal data.]

SLIDE 27

[Figure: four scatter plots of pairs of rows, each labeled with the difference bw − pearson.]

row2 and row11: bw − pearson = 0.1166
row2 and row16: bw − pearson = 0.1108
row6 and row17: bw − pearson = 0.0523
row5 and row15: bw − pearson = 0.0003

SLIDE 28

Concluding remarks

On clean data, biweight correlation is unbiased and about as variable as Pearson correlation

Initializing $\tilde{\mu}$ and $\tilde{\Sigma}$ with the median and median absolute deviation is as robust as using MCD estimators

Initializing $\tilde{\mu}$ and $\tilde{\Sigma}$ with the median and median absolute deviation is faster than using MCD estimators

Depending on how robust the result needs to be, computation time can be shortened by limiting the number of iterations

Generally, 5 or more iterations are recommended