SLIDE 1

Bayesian Methods for Variable Selection with Applications to High-Dimensional Data

Part 1: Mixture Priors for Linear Settings

Marina Vannucci
Rice University, USA

ABS13-Italy 06/17-21/2013

SLIDE 2

Part 1: Mixture Priors for Linear Settings

  • Linear regression models (univariate and multivariate responses)
  • Extensions to categorical responses and survival outcomes
  • Matlab code
  • Examples from genomics/proteomics
  • Bayesian models for integrative genomics (next part)

SLIDE 3

Regression Model

Y = 1α + Xβ + ε,   ε ∼ N(0, σ²I),   with Y n×1, X n×p and β p×1.

Introduce a latent binary vector γ = (γ1, . . . , γp)′ to select variables:
γj = 1 if variable j is included in the model, γj = 0 otherwise.

Specify priors for the model parameters:

βj | σ², γj ∼ (1 − γj) δ0(βj) + γj N(0, σ² hj)
α | σ² ∼ N(α0, h0 σ²)
σ² ∼ IG(ν/2, λ/2)
p(γ) = ∏_{j=1}^{p} w^γj (1 − w)^(1−γj)

where δ0(·) is the Dirac delta at zero.

SLIDE 4

Posterior Distribution

Combine data and prior information into a posterior distribution ⇒ interest in the posterior distribution

p(γ|Y, X) ∝ p(γ) ∫ f(Y|X, α, β, σ) p(α|σ) p(β|σ, γ) p(σ) dα dβ dσ

Integrating out α, β and σ gives

p(γ|Y, X) ∝ g(γ) = |X̃(γ)′X̃(γ)|^(−1/2) (νλ + S²(γ))^(−(n+ν)/2) p(γ)

with the augmented matrices (rows stacked)

X̃(γ) = [ X(γ)H(γ)^(1/2) ; I_pγ ],   Ỹ = [ Y ; 0 ]

and

S²(γ) = Ỹ′Ỹ − Ỹ′X̃(γ)(X̃(γ)′X̃(γ))⁻¹X̃(γ)′Ỹ,

the residual sum of squares from the least squares regression of Ỹ on X̃(γ). Fast updating schemes use Cholesky or QR decompositions with efficient algorithms to remove or add columns.
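Later slides point to Matlab code on the author's website; as a rough stand-alone illustration (not that code), g(γ) can be evaluated on the log scale directly from the formula above. A minimal Matlab sketch, assuming the simple choice H(γ) = cI, Y centered so the intercept drops out, and a common inclusion probability w; all names are illustrative:

    % Minimal sketch: log g(gamma) for the univariate-response model,
    % assuming H = c*I and Y centered (intercept removed).
    function lg = log_g(gamma, X, Y, c, nu, lambda, w)
      n  = size(X,1);  p = length(gamma);  pg = sum(gamma==1);
      Xt = [X(:,gamma==1)*sqrt(c); eye(pg)];    % augmented design Xtilde
      Yt = [Y; zeros(pg,1)];                    % augmented response Ytilde
      R  = chol(Xt'*Xt);                        % Xt'Xt = R'R
      S2 = Yt'*Yt - sum((R'\(Xt'*Yt)).^2);      % residual sum of squares S2(gamma)
      lg = -sum(log(diag(R))) ...               % -(1/2) log |Xt'Xt|
           - (n+nu)/2*log(nu*lambda + S2) ...   % -(n+nu)/2 log(nu*lambda + S2)
           + pg*log(w) + (p-pg)*log(1-w);       % log p(gamma), Bernoulli(w) prior
    end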

SLIDE 5

Model Fitting via MCMC

With p variables there are 2^p different γ values. We use a Metropolis algorithm as a stochastic search. At each MCMC iteration we generate a candidate γ_new by randomly choosing one of these moves:

(i) Add or Delete: randomly choose one of the indices in γ_old and change its value.
(ii) Swap: choose independently and at random a 0 and a 1 in γ_old and switch their values.

The proposed γ_new is accepted with probability min{ p(γ_new|X, Y) / p(γ_old|X, Y), 1 }.
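A matching sketch of the stochastic search, reusing log_g from the previous sketch and accepting on the log scale; the 50/50 split between move types and the sparse random start are my assumptions:

    % Metropolis stochastic search over gamma with add/delete and swap moves
    p = size(X,2);
    gamma = double(rand(p,1) < 0.05);              % sparse random start (assumed)
    lg = log_g(gamma, X, Y, c, nu, lambda, w);
    Gdraws = zeros(niter, p);                      % store visited models
    for t = 1:niter
      gnew = gamma;
      if rand < 0.5 || all(gamma==0) || all(gamma==1)
        j = randi(p);  gnew(j) = 1 - gnew(j);      % (i) add or delete
      else
        on  = find(gamma==1);  off = find(gamma==0);
        gnew(on(randi(numel(on))))   = 0;          % (ii) swap a 1 and a 0
        gnew(off(randi(numel(off)))) = 1;
      end
      lgnew = log_g(gnew, X, Y, c, nu, lambda, w);
      if log(rand) < lgnew - lg                    % accept w.p. min{g_new/g_old, 1}
        gamma = gnew;  lg = lgnew;
      end
      Gdraws(t,:) = gamma';
    end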

SLIDE 6

Posterior inference

The stochastic search results in a list of visited models (γ(0), γ(1), . . .) and their corresponding relative posterior probabilities p(γ(0)|X, Y), p(γ(1)|X, Y), . . .

Select variables in the “best” models, i.e. the γ’s with highest p(γ|X, Y), or those with largest marginal posterior probabilities

p(γj = 1|X, Y) = ∫ p(γj = 1, γ(−j)|X, Y) dγ(−j) ≈ Σ_{γ(t): γj=1} p(Y|X, γ(t)) p(γ(t)),

or, more simply, by empirical frequencies in the MCMC output,

p(γj = 1|X, Y) = E(γj|X, Y) ≈ #{t : γj(t) = 1} / #iterations.
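Continuing the sketch above, the empirical-frequency estimates are just column means of the stored draws; the burn-in length is an assumption:

    burn = floor(niter/2);                     % assumed burn-in
    marg = mean(Gdraws(burn+1:end,:), 1);      % estimates of p(gamma_j = 1 | X, Y)
    median_model = marg > 0.5;                 % "median probability" model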

SLIDE 7

Multivariate Response

Y = 1α′ + XB + E,   Ei ∼ N(0, Σ),   with Y n×q, X n×p and B p×q.

Variable selection via γ as

Bj | Σ, γj ∼ (1 − γj) I0 + γj N(0, hj Σ),

with Bj the j-th row of B and I0 a vector of point masses at 0. Need to work with matrix-variate distributions (Dawid, 1981):

Y − 1α′ − XB ∼ N(In, Σ)
α − α0 ∼ N(h0, Σ)
B(γ) − B0(γ) ∼ N(H(γ), Σ)
Σ ∼ IW(δ, Q),

with IW an inverse-Wishart with parameters δ and Q to be specified.

SLIDE 8

Posterior Distribution

Combine data and prior information into a posterior distribution ⇒ interest in the posterior distribution

p(γ|Y, X) ∝ p(γ) ∫ f(Y|X, α, B, Σ) p(α|Σ) p(B|Σ, γ) p(Σ) dα dB dΣ

Integrating out α, B and Σ gives

p(γ|Y, X) ∝ g(γ) = |X̃(γ)′X̃(γ)|^(−q/2) |Q(γ)|^(−(n+δ+q−1)/2) p(γ)

with X̃(γ) = [ X(γ)H(γ)^(1/2) ; I_pγ ] and Ỹ = [ Y ; 0 ] as before, and

Q(γ) = Q + Ỹ′Ỹ − Ỹ′X̃(γ)(X̃(γ)′X̃(γ))⁻¹X̃(γ)′Ỹ.

It can be calculated via QR decomposition (Seber, ch. 10, 1984). Use qrdelete and qrinsert algorithms to remove or add a column.
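qrinsert and qrdelete are Matlab built-ins that update an existing QR factorization when a column is added or removed, avoiding a full refactorization; a tiny self-contained illustration:

    % Updating a QR factorization as columns enter and leave
    A = randn(8,3);
    [Q,R] = qr(A);                  % full QR of the current design
    x = randn(8,1);
    [Q1,R1] = qrinsert(Q,R,2,x);    % insert x as the new column 2
    [Q2,R2] = qrdelete(Q1,R1,2);    % remove it again
    norm(Q2*R2 - A)                 % ~ 0: recovers the original factorization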

SLIDE 9

Prediction

Prediction of a future Yf given the corresponding Xf can be done:

as a posterior weighted average of model predictions (BMA),

p(Yf|X, Y) = Σ_γ p(Yf|X, Y, γ) p(γ|X, Y),

with p(Yf|X, Y, γ) a matrix-variate T distribution with mean Xf B̂(γ), so that

Ŷf = Σ_γ Xf(γ) B̂(γ) p(γ|X, Y),   B̂(γ) = (X(γ)′X(γ) + H(γ)⁻¹)⁻¹ X(γ)′Y

as LS or Bayes predictions on single best models

as LS or Bayes predictions with “threshold” models (e.g., the “median” model) obtained from the estimated marginal probabilities of inclusion.
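A hedged sketch of the BMA prediction for a univariate response, with weights taken as g(γ) renormalized over the set of visited models (a standard approximation); again H = cI and centered data are my assumptions:

    % BMA prediction over the unique visited models
    U = unique(Gdraws(burn+1:end,:), 'rows');
    lgs = zeros(size(U,1),1);
    for m = 1:size(U,1)
      lgs(m) = log_g(U(m,:)', X, Y, c, nu, lambda, w);
    end
    wts = exp(lgs - max(lgs));  wts = wts/sum(wts);    % renormalized g(gamma)
    Yf_hat = zeros(size(Xf,1),1);
    for m = 1:size(U,1)
      sel = U(m,:)==1;
      Bg  = (X(:,sel)'*X(:,sel) + eye(sum(sel))/c) \ (X(:,sel)'*Y);  % Bayes estimate
      Yf_hat = Yf_hat + wts(m) * (Xf(:,sel)*Bg);       % weighted model predictions
    end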

SLIDE 10

Prior Specification

Priors on α and Σ vague and largely uninformative:

α′ − α0′ ∼ N(h, Σ),  α0 ≡ 0,  h → ∞;   Σ ∼ IW(δ, Q),  δ = 3,  Q = kI

Choices for H(γ):

H(γ) = c (X(γ)′X(γ))⁻¹   (Zellner g-prior)
H(γ) = c diag(X(γ)′X(γ))⁻¹
H(γ) = c I(γ)

Choice of wj = p(γj = 1): wj = w, w ∼ Beta(a, b) (sparsity). Also, choices that reflect prior information (e.g., gene networks).
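The three H(γ) choices differ in how much of the design correlation structure the prior carries; side by side in Matlab (names illustrative, gamma and c assumed given):

    % Three choices for the prior covariance scale H(gamma)
    Xg = X(:, gamma==1);  pg = sum(gamma==1);
    H_g    = c * inv(Xg'*Xg);               % Zellner g-prior
    H_diag = c * diag(1./diag(Xg'*Xg));     % diagonal of X'X, inverted
    H_id   = c * eye(pg);                   % identity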

SLIDE 11

Advantages of Bayesian Approach

  • Past and collateral information incorporated through priors
  • Works in the n ≪ p setting
  • Rich modeling via Markov chain Monte Carlo (MCMC) (for p large)
  • Optimal model-averaging prediction
  • Extends to multivariate responses

SLIDE 12

Main References

GEORGE, E.I. and MCCULLOCH, R.E. (1993). Variable Selection via Gibbs Sampling. Journal of the American Statistical Association, 88, 881–889.

GEORGE, E.I. and MCCULLOCH, R.E. (1997). Approaches for Bayesian Variable Selection. Statistica Sinica, 7, 339–373.

MADIGAN, D. and YORK, J. (1995). Bayesian Graphical Models for Discrete Data. International Statistical Review, 63, 215–232.

BROWN, P.J., VANNUCCI, M. and FEARN, T. (1998). Multivariate Bayesian Variable Selection and Prediction. Journal of the Royal Statistical Society, Series B, 60, 627–641.

BROWN, P.J., VANNUCCI, M. and FEARN, T. (2002). Bayes Model Averaging with Selection of Regressors. Journal of the Royal Statistical Society, Series B, 64(3), 519–536.

SLIDE 13

Additional References

Use of g-priors: LIANG, F., PAULO, R., MOLINA, G., CLYDE, M. and BERGER, J. (2008). Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association, 103, 410–423.

Improving MCMC mixing: BOTTOLO, L. and RICHARDSON, S. (2010). Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3), 583–618. The authors propose an evolutionary Monte Carlo scheme combined with a parallel tempering approach that prevents the chain from getting stuck in local modes.

Multiplicity: SCOTT, J. and BERGER, J. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, 38(5), 2587–2619. The marginal prior on γ contains a non-linear penalty which is a function of p; therefore, as p grows with the number of true variables remaining fixed, the posterior distribution of w concentrates near 0.

SLIDE 14

Code from my Website

bvsme fast: Bayesian Variable Selection with fast form of QR updating
  • Metropolis search
  • g-prior or diagonal and non-diagonal selection priors
  • Bernoulli priors or Beta-Binomial prior
  • Predictions by LS, BMA and BMA with selection

http://stat.rice.edu/~marina

SLIDE 15

Data augmentation techniques

Binary response case. Basic idea: re-express discrete-data regression models in terms of unobserved (latent) continuous data. This aids interpretation and allows convenient MCMC sampling; it is used for both logistic and probit regression. Albert and Chib (1993) demonstrated an auxiliary variable approach that simplifies binary probit regression: introduce extra variables y into the model such that z = g(y), with g any non-decreasing function, for interpretability. It can also be used for multinomial/ordinal data.

SLIDE 16

Probit link for binary outcome

The auxiliary variable formulation for binary outcomes assumes that a continuous latent variable Yi exists such that the binary zi is determined via

zi = 1 if Yi > 0,   zi = 0 if Yi ≤ 0.

Associated with the i-th response, the values of p covariates xi1, . . . , xip are observed. The latent value Yi is related to the p covariates by the normal regression model

Yi = xi1 β1 + . . . + xip βp + εi,   εi ∼ N(0, 1).

[Equivalently, yi = ηi + εi, with εi ∼ N(0, 1) and ηi = xi′β the linear predictor, as in a GLM.]

SLIDE 17

Then we can show that

p(zi = 1|β) = p(zi = 1|yi > 0, β) p(yi > 0|β) + p(zi = 1|yi < 0, β) p(yi < 0|β)
            = 1 · p(yi > 0|β) + 0 · p(yi < 0|β)
            = p(yi − ηi > −ηi|β) = Φ(ηi),

with ηi = xi1 β1 + . . . + xip βp and where Φ(·) is the cdf of the standard normal distribution. The latent values Yi are viewed as additional parameters. Gibbs sampling can be used to obtain posterior draws of β and Y = (Y1, . . . , Yn).

SLIDE 18

If we specify, for example, a normal prior density for β, β ∼ N(0, S0), the full conditionals are

β | Y, X, z ∼ Np( (X′X + S0⁻¹)⁻¹ X′Y,  (X′X + S0⁻¹)⁻¹ )

Yi | β, X, z ∼ N(xi′β, 1) · I{Yi > 0}  if zi = 1
Yi | β, X, z ∼ N(xi′β, 1) · I{Yi ≤ 0}  if zi = 0

Sampling from the truncated normal density, y ∼ N(µ, σ²) · I(a < y < b), via the inverse CDF transformation method:

1. Set u1 = Φ(a; µ, σ²) and u2 = Φ(b; µ, σ²)
2. Sample u ∼ U(u1, u2)
3. Set y = Φ⁻¹(u; µ, σ²)
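A minimal sketch of one Gibbs cycle for the augmented probit model, with the truncated-normal draws done by the inverse-CDF steps above (normcdf, norminv and mvnrnd are Statistics Toolbox functions; n, z, X, beta, S0inv and the surrounding loop are assumed context):

    % One Gibbs cycle: [Y | beta, z] then [beta | Y, X, z]
    mu = X*beta;
    u0 = normcdf(0, mu, 1);                        % Phi(0; mu, 1)
    u  = u0 + rand(n,1).*(1 - u0);                 % z_i = 1: u ~ U(u0, 1)
    u(z==0) = rand(sum(z==0),1) .* u0(z==0);       % z_i = 0: u ~ U(0, u0)
    Ylat = norminv(u, mu, 1);                      % truncated-normal draws
    V    = inv(X'*X + S0inv);                      % posterior covariance of beta
    beta = mvnrnd((V*(X'*Ylat))', V)';             % draw beta | Y, X, z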

SLIDE 19

Probit Models with Binary Response

Response with G = 2 classes: zi ∈ {0, 1} associated with a set of p predictors Xi, i = 1, . . . , n.

Data augmentation: latent (unobserved) yi linearly associated with the Xi’s,

yi = α + Xi′β + εi,   εi ∼ N(0, σ² = 1),   i = 1, . . . , n,

with intercept α and coefficient vector β p×1.

Association: zi = 0 if yi < 0, zi = 1 otherwise.

SLIDE 20

Probit Models with Multinomial Response

Response with G classes: zi ∈ {0, 1, . . . , G − 1} associated with a set of p predictors Xi, i = 1, . . . , n (gene expressions).

Data augmentation: latent (unobserved) vector Yi linearly associated with the Xi’s,

Yi = α′ + Xi′B + Ei,   Ei ∼ N(0, Σ),   i = 1, . . . , n,

with intercepts α (G−1)×1 and coefficient matrix B p×(G−1).

Association: zi = 0 if yig < 0 for each g; zi = g if yig = max_{1≤g≤G−1} {yig}.

SLIDE 21

Variable Selection

We introduce a binary latent vector γ for variable selection:
γj = 1 if variable j discriminates the samples, γj = 0 otherwise.

A mixture prior is placed on the j-th row of B, given γ:

Bj ∼ (1 − γj) I0 + γj N(0, cΣ)

Assume the γj’s are independent Bernoulli variables. The remaining priors are Σ ∼ IW(δ, Q) and α ∼ N(0, h0Σ) with h0 large. Combine data and priors into the posterior p(γ|X, Y). Inference is complicated because the response variable is latent.

SLIDE 22

MCMC Algorithm

We sample (γ, Y) by Metropolis-within-Gibbs:

Metropolis step to update γ from [γ|X, Z, Y]. We update γ(old) to γ(new) by:

(a) Add/delete: randomly choose a γj and change its value.
(b) Swap: randomly choose a 0 and a 1 in γ(old) and switch their values.

The new candidate γ(new) is accepted with probability min{ p(γ(new)|X, Z, Y) / p(γ(old)|X, Z, Y), 1 }.

We sample (Y|γ, X, Z) from a truncated normal or t distribution, with truncation based on Z.

SLIDE 23

Posterior Inference

Select variables that are in the “best” models,

γ* = argmax_{1≤t≤M} p(γ(t)|X, Z, Ŷ),   with Ŷ = (1/M) Σ_{t=1}^{M} Y(t),

or select variables with the largest marginal probabilities p(γj = 1|X, Z, Ŷ).

Predict a future Yf by the posterior predictive mean

Ŷf = Σ_γ Ŷf(γ) π(γ|Ŷ, X, Z),   with Ŷf(γ) = 1α̃′ + Xf(γ)B̃(γ)

and α̃, B̃(γ) based on Ŷ.

SLIDE 24

Code from my website

bvsme prob: Bayesian Variable Selection for classification with fast form of QR updating
  • binary/multinomial/ordinal response
  • Metropolis search
  • g-prior or diagonal and non-diagonal selection priors
  • Bernoulli priors or Beta-Binomial prior
  • Predictions by LS, BMA and BMA with selection

http://stat.rice.edu/~marina

SLIDE 25

Logit Models

Logit models are more naturally interpretable in terms of odds ratios, but marginalization is not possible. For binary data, a data-augmented model is

zi = xi′β + εi,

with εi a scale mixture of normals with marginal logistic distribution:

εi ∼ N(0, λi),   λi = (2ψi)²,   ψi ∼ KS,

with KS the Kolmogorov-Smirnov distribution. Variable selection is achieved by imposing mixture priors on the βj’s. Sampling schemes improve mixing by joint updates of correlated parameters, i.e., (γ, β), using a Metropolis-Hastings step with the full conditional of β as proposal and the add-delete-swap Metropolis for γ. Also, (z, λ) are sampled from truncated logistics and by rejection sampling.

SLIDE 26

Accelerated Failure Time models for Survival Outcomes

We use accelerated failure time (AFT) models:

log(Ti) = α + Xi′β + εi,   i = 1, . . . , n.

We observe yi = min(ti, ci) and δi = I{ti ≤ ci}, with ci the censoring time. We introduce augmented data W = (w1, . . . , wn)′ to impute the censored survival times:

wi = log(yi) if δi = 1;   wi > log(yi) if δi = 0.

We consider different distributional assumptions for εi.
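Under a normal assumption for εi, the imputation step draws each censored wi from a normal truncated below at log(yi), with the same inverse-CDF device as in the probit case; a sketch in which the current-state variables alpha, beta_g, sigma2, gamma and the vector w are assumed to exist:

    % Impute censored log survival times, w_i with delta_i = 0
    mu  = alpha + X(:, gamma==1)*beta_g;                % current linear predictor
    cen = find(delta == 0);                             % censored cases
    lo  = normcdf(log(y(cen)), mu(cen), sqrt(sigma2));  % CDF at the truncation point
    u   = lo + rand(numel(cen),1).*(1 - lo);            % u ~ U(lo, 1)
    w(cen) = norminv(u, mu(cen), sqrt(sigma2));         % truncated-normal draws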

SLIDE 27

Introduce a latent vector γ for variable selection. MCMC steps consist of:

(1) Metropolis search to update γ from f(γ|X, W).
(2) Impute censored failure times, wi with δi = 0, from f(wi|W−i, X, γ).

Inference on variables is based on p(γj = 1|X, W) or p(γ|X, W). Prediction of survival time for future patients:

Ŵf = Σ_γ (1α̂ + Xf(γ)β̂(γ)) p(γ|X, W).

Predictive survivor function (with w = log t, since Tf > t ⟺ Wf > log t):

P(Tf > t|Xf, X, W) ≈ Σ_γ P(Wf > w|Xf, X, W, γ) p(γ|X, W).

SLIDE 28

Code from my website

bvsme surv: Bayesian Variable Selection for AFT models with right censoring
  • Metropolis search
  • diagonal selection prior
  • Bernoulli priors or Beta-Binomial prior

http://stat.rice.edu/~marina

SLIDE 29

Main References

ALBERT, J.H. and CHIB, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88(422), 669–679.

SHA, N., VANNUCCI, M., TADESSE, M.G., BROWN, P.J., DRAGONI, I., DAVIES, N., ROBERTS, T.C., CONTESTABILE, A., SALMON, N., BUCKLEY, C. and FALCIANI, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60, 812–819.

HOLMES, C.C. and HELD, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1), 145–166. See also the comment by Ralf van der Lans (2011), Bayesian Analysis, 6(2), 353–355, and the response from the authors.

SHA, N., TADESSE, M.G. and VANNUCCI, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcome. Bioinformatics, 22(18), 2262–2268.

SLIDE 30

Applications to High-Throughput Data: DNA microarrays

  • DNA fragments arrayed on a glass slide or chip
  • Parallel quantification of thousands of genes in a single experiment
  • Identify biomarkers for treatment strategies and diagnostic tools

SLIDE 31

Statistical Analyses

  • Identification of differentially expressed genes (gene selection for sample classification)
  • Discovery of subtypes of tissue/disease that respond differently to treatment (gene selection and sample clustering)
  • Prediction of continuous responses (clinical outcome, survival time)

The major challenge is the high dimensionality of the data: p ≫ n. Widely used approaches are t-tests, ANOVA or Cox models on single genes (which ignore the joint effect of genes and raise multiple-testing issues) and dimension reduction techniques such as PCA and PLS (which lead to linear combinations and cannot assess the original variables). Lately the emphasis has been on subset selection methods (LASSO, Bayesian models).

SLIDE 32

Identification of Biomarkers of Disease Stage

Data consist of 11 early stage (duration less than 2 years) and 9 late stage (over 15 years) rheumatoid arthritis patients. mRNA samples were extracted from peripheral blood and hybridized to custom-made cDNA arrays, giving 755 gene expression measurements. Data were logged and standardized. Bernoulli prior with expected model size 10. We ran six MCMC chains with very different starting γ vectors.

SLIDE 33

[Figure: (a) number of ones in γ and (b) log posterior probabilities vs. iteration number (10^5 iterations); marginal inclusion probability for each of the 755 genes.]

SLIDE 34

Genes selected by the best 10 models of each chain and by their union: small sets of functionally related genes involved in cytoskeleton remodeling and motility, and in lymphocytes’ ability to respond to activation. Misclassification error: .05 (1/20).

[Figure: heatmap of the selected genes across patients (early stage 1–11, late stage 12–20).]

SLIDE 35

Peak Selection for Protein Mass spectra

Cancer classification based on mass spectra at 15,000 m/z ratios. x-axis: ratio of weight of a molecule to its electrical charge (m/z), y-axis: intensity ∼ abundance of that molecule in the sample. Goal: identification of peaks (proteins or protein fragments) related to a clinical outcome or disease status

SLIDE 36

[Figure: three serum SELDI-TOF spectra, intensity vs. m/z (2,000–14,000), panels (A), (B), (C).]

Serum spectra on 50 (10+11+29) subjects (SELDI-TOF). Ordinal response: tumor grade (ovarian cancer).

SLIDE 37

Data Processing

Preprocessing:

  • Baseline subtraction
  • Denoising (often by wavelets)
  • Peak identification
  • Normalization
  • Alignment

Analysis:

  • Model fitting
  • Validation

SLIDE 38

Data processing results in 39 identified peaks. A probit model with Bayesian variable selection was applied to the 39 peaks. The “best” models contain around 7 peptides (6 in common). Misclassification errors: (2/10, 8/11, 9/29).

[Figure: marginal posterior probabilities of inclusion for the 39 peaks, shown for four MCMC chains.]

SLIDE 39

Survival outcome: Case Study on Breast Cancer (van’t Veer et al. (2002))

Microarray data on 76 patients, 33 who developed distant metastases within 5 years and 43 who did not (censored). Training and test sets (38+38 patients). About 5,000 genes. MSE=1.9 (with 11 genes).

[Figure: frequencies of γj = 1 across the ~5,000 genes (gene index vs. frequency); predicted survivor function, survival probability vs. log survival time.]

SLIDE 40

Main References

SHA, N., VANNUCCI, M., TADESSE, M.G., BROWN, P.J., DRAGONI, I., DAVIES, N., ROBERTS, T.C., CONTESTABILE, A., SALMON, N., BUCKLEY, C. and FALCIANI, F. (2004). Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60, 812–819.

KWON, D.W., TADESSE, M.G., SHA, N., PFEIFFER, R.M. and VANNUCCI, M. (2007). Identifying biomarkers from mass spectrometry data with ordinal outcome. Cancer Informatics, 3, 19–28.

SHA, N., TADESSE, M.G. and VANNUCCI, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcome. Bioinformatics, 22(18), 2262–2268.

TADESSE, M.G., SHA, N., KIM, S. and VANNUCCI, M. (2006). Identification of biomarkers in classification and clustering of high-throughput data. In Bayesian Inference for Gene Expression and Proteomics, K. Anh-Do, P. Mueller and M. Vannucci (Eds). Cambridge University Press.
