SLIDE 1

Prediction of Genetic Values Using Neural Networks

Paulino Pérez 1, Daniel Gianola 2, José Crossa 1

1 CIMMYT, Mexico; 2 University of Wisconsin, Madison

September 2014

SLU, Sweden | Prediction of Genetic Values Using Neural Networks | 1/26

SLIDE 2

Contents

1. Introduction
2. Non-linear models and NN
3. Model fitting
4. Case study: Wheat
5. Application examples

SLIDE 3

Introduction

High-density marker panels enable genomic selection (GS). Marker-based models perform better than pedigree-based models (e.g., de los Campos et al., 2009). Most research has been done with linear additive models (see eq. 1). It might be possible to increase accuracy using non-linear models with dominance and additive effects.

y_i = \sum_{j=1}^{p} x_{ij} \beta_j + e_i    (1)
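Because p typically exceeds the number of lines, eq. (1) is fitted with shrinkage. A minimal sketch with simulated binary markers and a ridge (BLUP-like) solution, written in Python for illustration (all names and values are hypothetical, not from the wheat data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 300                                       # more markers than lines
X = rng.binomial(1, 0.5, size=(n, p)).astype(float)   # binary marker codes
beta_true = rng.normal(0.0, 0.1, size=p)
y = X @ beta_true + rng.normal(0.0, 0.5, size=n)

# Ridge/BLUP-like estimate: beta_hat = (X'X + lambda*I)^{-1} X'y
lam = 10.0
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
yhat = X @ beta_hat
corr = np.corrcoef(y, yhat)[0, 1]                     # in-sample fit
```

With p > n the unpenalized least-squares problem is singular; the penalty lambda plays the role of the ratio of variance components in BLUP.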

SLIDE 4

Introduction

Continued...

Recent studies with non-additive effects:

SLIDE 5

Introduction

Continued...

SLIDE 6

Non linear models and NN

Non-linear models and neural networks

y_i = \mu + f(x_i) + e_i    (2)

Any non-linear function can be exactly represented as (Kolmogorov's theorem):

f(x_i) = f(x_{i1}, ..., x_{ip}) = \sum_{q=1}^{2p+1} g\left( \sum_{r=1}^{p} \lambda_r h_q(x_{ir}) \right)    (3)

In Neural Networks (NN), non-linear functions are "approximated" as sums of finite series of smooth functions. The most basic and well-known NN is the Single Hidden Layer Feed-Forward Neural Network (SHLNN).

SLIDE 7

Non linear models and NN

Continued...

Figure 1: Graphical representation of a SHLNN.

SLIDE 8

Non linear models and NN

Continued...

Figure 2: Inputs (e.g. Markers) and output (phenotype) for a SHLNN.

SLIDE 9

Non linear models and NN

Continued...

Prediction has two (automated) steps:

1. Inputs are transformed non-linearly in the hidden layer.
2. Outputs from the hidden layer are combined to obtain predictions.

y_i = \mu + \underbrace{\sum_{k=1}^{S} w_k \, g_k\!\left( b_k + \sum_{j=1}^{p} x_{ij} \beta_j^{[k]} \right)}_{\text{combined output from the hidden layer}} + e_i

g_k(·) is the activation (transformation) function.
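The two steps above amount to a forward pass. A minimal Python sketch of the SHLNN prediction equation (the function name and shapes are assumptions for illustration, not the brnn API):

```python
import numpy as np

def shlnn_predict(X, mu, w, b, B, g=np.tanh):
    """y_hat_i = mu + sum_k w_k * g(b_k + sum_j x_ij * beta_j^[k]).

    X: (n, p) inputs; B: (p, S) connection strengths;
    b: (S,) biases; w: (S,) weights; g: activation function."""
    H = g(X @ B + b)      # step 1: non-linear transformation in the hidden layer
    return mu + H @ w     # step 2: combine hidden-layer outputs

rng = np.random.default_rng(0)
n, p, S = 5, 4, 3
X = rng.normal(size=(n, p))
yhat = shlnn_predict(X, mu=0.5, w=rng.normal(size=S),
                     b=rng.normal(size=S), B=rng.normal(size=(p, S)))
```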

SLIDE 10

Model fitting

Model fitting

The parameters to be estimated in a NN are the weights (w_1, ..., w_S), the biases (b_1, ..., b_S), the connection strengths (\beta_1^{[1]}, ..., \beta_p^{[1]}; ...; \beta_1^{[S]}, ..., \beta_p^{[S]}), \mu and \sigma^2_e.

When the number of predictors (p) and the number of neurons (S) increase, the number of parameters to estimate grows quickly, which can cause over-fitting. To prevent over-fitting, use penalized methods, e.g. via Bayesian approaches.

SLIDE 11

Model fitting Empirical Bayes

Contents

1. Introduction
2. Non-linear models and NN
3. Model fitting
   - Empirical Bayes
4. Case study: Wheat
5. Application examples

SLIDE 12

Model fitting Empirical Bayes

Empirical Bayes

MacKay (1995) developed an Empirical Bayes framework for estimating the parameters of a NN. Let

\theta = (w_1, ..., w_S, b_1, ..., b_S, \beta_1^{[1]}, ..., \beta_p^{[1]}; ...; \beta_1^{[S]}, ..., \beta_p^{[S]}, \mu)'

with prior p(\theta | \sigma^2_\theta) = MN(0, \sigma^2_\theta I).

Estimation requires two steps:

1) Obtain the conditional posterior modes of the elements of \theta assuming \sigma^2_\theta and \sigma^2_e known. These are obtained by maximizing

p(\theta | y, \sigma^2_\theta, \sigma^2_e) = \frac{p(y | \theta, \sigma^2_e) \, p(\theta | \sigma^2_\theta)}{p(y | \sigma^2_\theta, \sigma^2_e)} = \frac{p(y | \theta, \sigma^2_e) \, p(\theta | \sigma^2_\theta)}{\int_{R^m} p(y | \theta, \sigma^2_e) \, p(\theta | \sigma^2_\theta) \, d\theta}

which is equivalent to minimizing the "augmented" sum of squares:

F(\theta) = \frac{1}{2\sigma^2_e} \sum_{i=1}^{n} e_i^2 + \frac{1}{2\sigma^2_\theta} \sum_{j=1}^{m} \theta_j^2    (4)

SLIDE 13

Model fitting Empirical Bayes

Continued...

2) Update \sigma^2_\theta and \sigma^2_e by maximizing the marginal likelihood of the data, p(y | \sigma^2_\theta, \sigma^2_e).

The marginal log-likelihood is approximated as:

\log p(y | \sigma^2_\theta, \sigma^2_e) \approx k + \frac{n}{2} \log \beta + \frac{m}{2} \log \alpha - \frac{1}{2} \log |\Sigma| \Big|_{\theta = \theta_{MAP}} - F(\theta) \Big|_{\theta = \theta_{MAP}}

where \Sigma = \frac{\partial^2}{\partial \theta \, \partial \theta'} F(\theta).

It can be shown that this function is maximized when:

\alpha = \frac{\gamma}{2 \sum_{j=1}^{m} \theta_j^2}, \quad \beta = \frac{n - \gamma}{2 \sum_{i=1}^{n} e_i^2}, \quad \gamma = m - 2\alpha \, \mathrm{Trace}(\Sigma^{-1})

Iterate between steps 1 and 2 until convergence.

NOTE: SIMILAR TO USING BLUP AND ML IN GAUSSIAN LINEAR MODELS.
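The two-step iteration is easiest to see on a linear "network", where the posterior mode has a closed form. A Python sketch of Foresee-Hagan-style updates, with alpha = 1/(2 sigma^2_theta) and beta = 1/(2 sigma^2_e); the simulated data and loop structure are illustrative, not the brnn code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 10
X = rng.normal(size=(n, m))
theta_true = rng.normal(0.0, 1.0, size=m)
y = X @ theta_true + rng.normal(0.0, 0.5, size=n)   # sigma_e = 0.5

alpha, beta = 1.0, 1.0
for _ in range(50):
    # Step 1: conditional posterior mode of theta given alpha, beta
    # (minimizes F(theta) = beta*sum(e_i^2) + alpha*sum(theta_j^2))
    theta = np.linalg.solve(beta * X.T @ X + alpha * np.eye(m), beta * X.T @ y)
    e = y - X @ theta
    # Step 2: evidence updates for alpha, beta
    Sigma = 2.0 * beta * X.T @ X + 2.0 * alpha * np.eye(m)  # Hessian of F(theta)
    gamma = m - 2.0 * alpha * np.trace(np.linalg.inv(Sigma))
    alpha = gamma / (2.0 * theta @ theta)
    beta = (n - gamma) / (2.0 * e @ e)
```

Here gamma is the effective number of well-determined parameters; at convergence beta estimates 1/(2 sigma^2_e).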

SLIDE 14

Model fitting Empirical Bayes

Problems with the approach

Huge number of parameters to estimate: m = 1 + S × (1 + 1 + p), where S is the number of neurons and p is the number of covariates.

The Gauss-Newton algorithm used to minimize (4) requires solving linear systems of order m × m, complexity O(m^3).

The updating formulas for the variance components require inverting a matrix of order m × m, complexity O(m^3).

Alternatives: derivative-free algorithms (may have poor performance, unstable); parallel computing.

SLIDE 15

Model fitting Empirical Bayes

brnn

We developed an R package (brnn) that implements the Empirical Bayes approach to fitting a NN. It will be available in a few months on the R mirrors.

Figure 3: Help page for the trainbr package.

SLIDE 16

Case study: Wheat

Case study: additive genetic effects (wheat)

Prediction of grain yield (GY) and days to heading (DTH) in wheat: 306 wheat lines from the Global Wheat Program of CIMMYT, with 1,717 binary markers (DArT). Two traits analyzed:

1. GY (5 environments).
2. DTH (10 environments).

Bayesian regularized neural networks were fitted using the MCMC approach. The predictive ability of BRNN was compared against standard models by generating 50 random partitions with 90% of the observations in training and 10% in testing.
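The evaluation scheme above (50 random 90/10 splits) can be sketched as follows; the function name and seed are illustrative, not from the original analysis:

```python
import numpy as np

def random_partitions(n, n_reps=50, train_frac=0.9, seed=123):
    """Random training/testing index splits used to compare predictive ability."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n))
    splits = []
    for _ in range(n_reps):
        idx = rng.permutation(n)          # shuffle line indices
        splits.append((idx[:n_train], idx[n_train:]))
    return splits

splits = random_partitions(306)           # 306 wheat lines, 50 partitions
```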

SLIDE 17

Case study: Wheat

Continued...

Table 1: Correlations between observed and predicted phenotypes for DTH and GY (“winner” underlined).

NOTE: Non-parametric methods were better in 15/15 comparisons.

SLIDE 18

Case study: Wheat

Continued...

Figure 4: Correlations for each of the 50 partitions and 10 environments for days to heading (DTH), for different combinations of models.

SLIDE 19

Application examples

Toy examples

# Example 1
# Noisy triangle wave function, similar to example 1 in Foresee and Hagan (1997)
library(brnn)
# Generating the data
x1=seq(0,0.23,length.out=25)
y1=4*x1+rnorm(25,sd=0.1)
x2=seq(0.25,0.75,length.out=50)
y2=2-4*x2+rnorm(50,sd=0.1)
x3=seq(0.77,1,length.out=25)
y3=4*x3-4+rnorm(25,sd=0.1)
x=c(x1,x2,x3)
y=c(y1,y2,y3)
X=as.matrix(x)
neurons=2
out=brnn(y,X,neurons=neurons)
cat("Message: ",out$reason,"\n")
plot(x,y,xlim=c(0,1),ylim=c(-1.5,1.5),
     main="Bayesian Regularization for ANN 1-2-1")

Note:

1. Type library(brnn) and then demo('Example_1') to run this example in the R console.

SLIDE 20

Application examples

Continued...

[Figure: Fitted function for Example 1; Matlab and R fits plotted over the simulated data (x vs y).]

SLIDE 21

Application examples

Continued...

# Example 2
# 2 inputs and 1 output; the data used in Paciorek and Schervish (2004).
# The data come from a two-input, one-output function with Gaussian noise
# with mean zero and standard deviation 0.25.
library(brnn)
data(twoinput)
X=normalize(as.matrix(twoinput[,1:2]))
y=as.vector(twoinput[,3])
neurons=10
out=brnn(y,X,neurons=neurons)
cat("Message: ",out$reason,"\n")
f=function(x1,x2,theta,neurons) predictions.nn(X=cbind(x1,x2),theta,neurons)
x1=seq(min(X[,1]),max(X[,1]),length.out=50)
x2=seq(min(X[,2]),max(X[,2]),length.out=50)
z=outer(x1,x2,f,theta=out$theta,neurons=neurons) # predicted surface on a grid
transformation_matrix=persp(x1, x2, z, main="Fitted model",
    sub=expression(y==italic(g)~(bold(x))+e),
    col="lightgreen", theta=30, phi=20, r=50, d=0.1,
    expand=0.5, ltheta=90, lphi=180, shade=0.75,
    ticktype="detailed", nticks=5)
points(trans3d(X[,1], X[,2],
    f(X[,1], X[,2], theta=out$theta, neurons=neurons),
    transformation_matrix), col="red")

SLIDE 22

Application examples

Continued...

[Figure: 3-D surface plot of the fitted model for Example 2, y = g(x) + e, with observed points overlaid.]

SLIDE 23

Application examples

Application for the wheat dataset

Warning: This analysis can take a while... We select only some of the markers. You can, for example, select markers based on p-values, or reduce the dimensionality of the problem by using the G matrix or principal component scores as inputs.

rm(list=ls())
setwd("/tmp")
library(brnn)
library(BLR)
#Load the wheat dataset
data(wheat)
#Normalize inputs
y=normalize(Y[,1])
X=normalize(X)
p=300
#Fit the model with the FULL DATA, but only some markers.
#You can select the markers based on p-values, for example.
out=brnn(y=y,X=X[,1:p],neurons=2)
cat("Message: ",out$reason,"\n")
#Obtain predictions
yhat_R=predictions.nn(X[,1:p],out$theta,neurons=2)
plot(y,yhat_R)

SLIDE 24

Application examples

Continued...

[Figure: Observed (y) vs. predicted (yhat_R) phenotypes.]

Notes: The function predictions.nn obtains ŷ; it takes as arguments the vector of estimated parameters and the number of neurons. The vector of estimated parameters can be obtained using the function brnn. The brnn software runs faster in the R version developed by Revolution Analytics in Linux environments.

SLIDE 25

Application examples

References

de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182: 375-385.

Foresee, F. D., and M. T. Hagan. 1997. Gauss-Newton approximation to Bayesian regularization. Proceedings of the 1997 International Joint Conference on Neural Networks.

Gianola, D., R. Fernando and A. Stella. 2006. Genomic-assisted prediction of genetic values with semi-parametric procedures. Genetics 173: 1761-1776.

Gianola, D., and J. B. C. H. M. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic-assisted prediction of quantitative traits. Genetics 178: 2289-2303.

SLIDE 26

Application examples

Continued...

Gianola, D., H. Okut, K. A. Weigel and G. J. M. Rosa. 2011. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12: 87.

MacKay, D. J. C. 1995. Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems 6: 469-505.
