
SLIDE 1

CAMDA 2006 1

Integration of Genetic and Genomic Approaches for the Analysis of Chronic Fatigue Syndrome Implicates Forkhead Box N1

Angela Presson, Jeanette Papp, Eric Sobel, and Steve Horvath
Biostatistics and Human Genetics
University of California, Los Angeles

SLIDE 2

CAMDA 2006 Challenge

DNA Level: ~50 pre-selected SNPs
mRNA Level: ~20K genes/array
Organism Level: ~70 clinical traits

1. Relate expression data to clinical trait data.
2. Relate SNP data to expression data.
3. Integrate results to find CFS-relevant genes.

SLIDE 3

Analysis Overview

1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 4

Network = Adjacency Matrix

A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes the connection strength between each pair of genes.
Two genes have high connection strength if they have similar expression patterns.
A is a symmetric matrix with entries in [0,1].
Two Network Models:
Unweighted: $a_{ij} = 1$ if two genes are adjacent (connected) and 0 otherwise.
Weighted: each $a_{ij}$ gives the connection strength between a gene pair.

SLIDE 5

Important Task in Many Genomic Applications: Given a network (pathway) of interacting genes, how to find the central players?

SLIDE 6

Identifying Key Players of Interest

Imagine you wanted to recruit students to your science program. Popularity alone might suggest the head cheerleader or quarterback.
[Images: Star Quarterback; Head Cheerleader]

SLIDE 7

But, the head of the chess club would probably be a better bet!

[Images: Chess Club President; Quarterback; Cheerleader]

SLIDE 8

Two Network Definitions

1. Number of friends = “Connectivity”
Gene connectivity = row sum of the adjacency matrix, i.e., the sum of gene i's connection strengths:
$k_i = \sum_j a_{ij}$

2. Chess Club, Sport Teams = “Modules”
Gene Module = cluster of highly connected (similarly expressed) genes in a network.
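As a minimal sketch of the connectivity definition above, the row sums of a small adjacency matrix can be computed directly. The three-gene matrix and the function name `connectivity` are hypothetical and chosen only for illustration; the diagonal is set to 0 here as an assumption, so a gene's self-connection does not contribute to its connectivity.

```python
def connectivity(adjacency):
    """Return k_i = sum_j a_ij for every gene i (row sums of the adjacency matrix)."""
    return [sum(row) for row in adjacency]


# Hypothetical weighted adjacency matrix for three genes:
# symmetric, entries in [0, 1], diagonal set to 0 (an assumption of this sketch).
A = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]

k = connectivity(A)
most_connected = k.index(max(k))  # index of the gene with the largest k_i
```

In this toy network the second gene (index 1) has the largest row sum, so it would be the most "popular" gene in the sense of the analogy.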

SLIDE 9

Gene connectivity vs. Module connectivity

Whole network connectivity • Intra-modular connectivity
Whole network connectivity is largely driven by the size of the module containing the gene.
Connectivity within a module is biologically & mathematically more meaningful than whole network connectivity.
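The distinction can be illustrated with a short sketch that restricts the row sum to genes in the same module. The four-gene matrix, the module labels, and the function name `intramodular_connectivity` are hypothetical; this is one straightforward reading of "connectivity within a module", not necessarily the exact computation used in the analysis.

```python
def intramodular_connectivity(adjacency, modules):
    """For each gene i, sum a_ij over the other genes j assigned to the same module."""
    n = len(adjacency)
    return [
        sum(adjacency[i][j] for j in range(n) if j != i and modules[j] == modules[i])
        for i in range(n)
    ]


# Hypothetical 4-gene network: genes 0-2 form a "green" module, gene 3 is unassigned ("grey").
A = [
    [0.0, 0.9, 0.7, 0.2],
    [0.9, 0.0, 0.8, 0.1],
    [0.7, 0.8, 0.0, 0.3],
    [0.2, 0.1, 0.3, 0.0],
]
modules = ["green", "green", "green", "grey"]

k_whole = [sum(row) for row in A]                  # whole network connectivity
k_within = intramodular_connectivity(A, modules)   # intramodular connectivity
```

Here the whole network values include weak links to the grey gene, while the intramodular values count only connections inside the green module.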

SLIDE 10

Analysis Overview

► 1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 11

Revisiting the Adjacency Matrix

Adjacency: $a_{ij} = |\mathrm{cor}(\mathrm{gene}_i, \mathrm{gene}_j)|^{\beta}$.
Step function (hard thresholding) is indicated by the black, solid line.
Power adjacency functions (soft thresholding) are indicated by colored, dashed lines.
Once we found an appropriate β (according to the methodology outlined in Zhang and Horvath 2005), we found that our network results were robust to small changes in β.

[Figure: Connection Strength (Adjacency) vs. Correlation; x-axis: |Correlation|, y-axis: Adjacency]
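A small sketch may help contrast the two adjacency functions. The threshold `tau = 0.7` and the power `beta = 6` are hypothetical values picked for illustration only; the slides do not state which values were used.

```python
def hard_threshold(r, tau):
    """Unweighted adjacency: 1 if |r| reaches the threshold tau, else 0 (step function)."""
    return 1.0 if abs(r) >= tau else 0.0


def power_adjacency(r, beta):
    """Weighted adjacency: a = |r|^beta (soft thresholding)."""
    return abs(r) ** beta


tau, beta = 0.7, 6  # hypothetical parameter choices, not taken from the slides
correlations = [-0.9, -0.69, 0.2, 0.71]

hard = [hard_threshold(r, tau) for r in correlations]
soft = [power_adjacency(r, beta) for r in correlations]
```

Correlations of -0.69 and 0.71 fall on opposite sides of the step function, yet their power adjacencies are close to each other. This illustrates how soft thresholding avoids an abrupt cutoff between nearly identical correlations.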

SLIDE 12

Four Modules Identified Using Hierarchical Clustering

Grey indicates genes outside of any module.
MDS plot indicates clear separation of brown, green, turquoise modules.

[Figure labels: Brown; Red; Turquoise; Green]

SLIDE 13

Analysis Overview

1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
► 2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 14

A clinical trait gives rise to a “Trait Significance” measure

TraitSignificance(i) = |cor(x(i), TRAIT)|

where x(i) is the gene expression profile of the ith gene.
Module Trait Significance = Average(Trait Significance values for genes in a module).
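The two definitions above can be sketched in a few lines. The sketch assumes `cor` is the Pearson correlation; the expression profiles, trait values, and function names are hypothetical toy data, not values from the CAMDA dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trait_significance(expression, trait):
    """TraitSignificance(i) = |cor(x(i), TRAIT)| for one gene's expression profile."""
    return abs(pearson(expression, trait))


def module_trait_significance(profiles, trait):
    """Average trait significance over all gene profiles in a module."""
    values = [trait_significance(p, trait) for p in profiles]
    return sum(values) / len(values)


# Hypothetical trait values for five samples and three gene profiles from one module.
trait = [1, 2, 3, 4, 5]
module_profiles = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the trait
    [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with the trait
    [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relationship with the trait
]

gene_sig = [trait_significance(p, trait) for p in module_profiles]
module_sig = module_trait_significance(module_profiles, trait)
```

Because the absolute value is taken, genes that rise with the trait and genes that fall with it receive the same significance.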

SLIDE 15

Trait Significance Results

Table shows average trait significance for each module.
Every module was characterized in terms of a group of clinical traits.
Interested in CFS severity trait “CLUSTER” because it contained the information from 14 clinical traits (evaluation responses).
Focused on the green module (184 genes) since it was related to the CLUSTER trait.

SLIDE 16

Analysis Overview

1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
► 3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 17

Finding SNPs associated with the CLUSTER trait

We chose the two SNPs with the highest CLUSTER correlation.

SNP12 = hCV245410 on 12q21 (p-value = 0.01)
SNP17 = hCV7911132 on 17q21 (p-value = 0.001)

[Figure: SNP & Cluster Correlation P-Values; x-axis: SNPs colored by chromosome (5q34, 2p24, 7p15, 11p15, 12q21, 17q, 22q11.1, X); y-axis: Log(P-Value); labeled points: hCV7911132 (17q21), hCV245410 (12q21)]

SLIDE 18

Correlation with relevant SNPs defines SNP Significance of the ith gene

SNPSignificance = |cor(x(i), SNP)|

(Where SNP data is additively coded).
Conceptually related to a LOD* score at the SNP marker for the ith gene expression.
Why correlate SNP and gene expression data?
Puts SNP effect on the same footing as trait effect and gene-gene connection strengths. Effect sizes are important in our analysis.

*LOD = “logarithmic odds”, a traditional measure of linkage between genetic loci.
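The following sketch shows one way additive coding and SNP significance could fit together. It assumes the additive code counts copies of one allele per genotype (0, 1, or 2) and that `cor` is the Pearson correlation; the genotype strings, the choice of counted allele, and the expression values are hypothetical.

```python
import math


def additive_code(genotype, counted_allele):
    """Additive coding: number of copies (0, 1, or 2) of the counted allele."""
    return genotype.count(counted_allele)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def snp_significance(expression, snp_codes):
    """SNPSignificance = |cor(x(i), SNP)| for one gene's expression profile."""
    return abs(pearson(expression, snp_codes))


# Hypothetical genotypes at one SNP for six samples; "G" is the counted allele (an assumption).
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]
snp = [additive_code(g, "G") for g in genotypes]

# Hypothetical expression profile of one gene across the same six samples.
expression = [7.1, 6.4, 5.8, 6.6, 7.3, 5.5]
sig = snp_significance(expression, snp)
```

Since the result is a correlation on the same [0, 1] scale as trait significance and adjacency, the three quantities can be compared and thresholded in the same way.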

SLIDE 19

SNP Filtering & Significance Results

Table shows the average SNP significance for each module.
Green module genes most correlated with SNP12.
“SNP12 – Sub-sample” = average module correlations with SNP12 among samples that have a particular SNP12 and SNP17 genotype.
Higher correlation(green module, SNP12) in the sample subset.

Module SNP Significance (Standard Error)
SNPs              Turquoise      Grey           Red            Brown          Green
SNP12             0.052 (0.002)  0.077 (0.001)  0.036 (0.004)  0.091 (0.004)  0.128 (0.004)
SNP17             0.056 (0.002)  0.064 (0.001)  0.045 (0.005)  0.039 (0.003)  0.04 (0.002)
SNP12 Sub-sample  0.128 (0.005)  0.144 (0.002)  0.067 (0.009)  0.203 (0.007)  0.186 (0.007)

SLIDE 20

Analysis Overview

1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
► 4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 21

Integration of genetic and network analysis

Combined Gene Selection Criteria:

1. CLUSTER trait significance > 0.2.
2. SNP12 significance > 0.2.
3. Genes with high intramodular connectivity (top 50%).
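The three criteria can be combined into a simple filter, sketched below. The gene records, field names, and the function `select_genes` are hypothetical; the sketch also assumes that "top 50%" is determined by ranking the supplied genes on intramodular connectivity.

```python
def select_genes(genes, trait_cutoff=0.2, snp_cutoff=0.2, top_fraction=0.5):
    """Return names of genes passing all three criteria.

    genes: list of dicts with keys "name", "trait_sig", "snp_sig", "k_within".
    """
    # Criterion 3: keep genes in the top fraction by intramodular connectivity.
    ranked = sorted(genes, key=lambda g: g["k_within"], reverse=True)
    n_top = int(len(ranked) * top_fraction)
    hubs = {g["name"] for g in ranked[:n_top]}

    # Criteria 1 and 2: significance cutoffs for the trait and the SNP.
    return [
        g["name"]
        for g in genes
        if g["trait_sig"] > trait_cutoff
        and g["snp_sig"] > snp_cutoff
        and g["name"] in hubs
    ]


# Hypothetical genes from one module.
genes = [
    {"name": "geneA", "trait_sig": 0.25, "snp_sig": 0.22, "k_within": 9.0},
    {"name": "geneB", "trait_sig": 0.30, "snp_sig": 0.10, "k_within": 8.0},
    {"name": "geneC", "trait_sig": 0.21, "snp_sig": 0.24, "k_within": 2.0},
    {"name": "geneD", "trait_sig": 0.05, "snp_sig": 0.30, "k_within": 1.0},
]

selected = select_genes(genes)
```

In this toy example only geneA survives: geneB fails the SNP cutoff, geneC is not in the top half by connectivity, and geneD fails the trait cutoff.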

SLIDE 22

Eight Most Significant Genes:

Columns: Accession | Gene Symbol (Name) and Information | Locus | CLUSTER P-Value (Correlation) | SNP P-Value (Correlation) | Biomarker

NM_003593 | FOXN1 (forkhead box N1): Functions in defense response, T-cell immunodeficiency, and known to cause nudity in mice and humans. Expressed in thymus. | 17q11-q12 | 0.055 (-0.21) | 0.018 (0.21) | YES
AF118073 | PRDX3 (peroxiredoxin 3): Regulates cell proliferation, differentiation, and antioxidant functions. | 10q25-q26 | 0.017 (-0.26) | 0.02 (0.21) | YES
AB051077 | PEX6 (peroxisomal biogenesis factor 6): Absence results in Zellweger syndrome (ZWS), neurological and metabolic defects. | 6p21.1 | 0.032 (-0.23) | 0.013 (0.22) | YES
AF106685 | MYEF2 (myelin expression factor 2): Myoblast cell differentiation and transcription. | 15q21.1 | 0.05 (-0.21) | 0.012 (0.22) | YES
AF111802 | CRNKL1 (Crn, crooked neck-like 1 (Drosophila)): Expressed in testes, involved in mRNA splicing. | 20p11.2 | 0.012 (-0.27) | 0.013 (0.22) | YES
BC010019 | MED8 (mediator of RNA polymerase II transcription, subunit 8 homolog (yeast)): Regulates transcription. | 1p34.2 | 0.007 (-0.29) | 0.015 (0.22) | YES
XM_067644 | Similar to polynucleotide phosphorylase-like protein and 3-5 RNA exonuclease. | | 0.007 (-0.29) | 0.002 (0.27) | NO
BC004179 | Unknown (protein for mgc:2780) | | 0.032 (-0.23) | 0.007 (0.24) | NO

Source: NCBI (http://www.ncbi.nlm.nih.gov)

SLIDE 23

FOXN1 Statistical Significance:

Member of the green module that is related to the CFS severity trait (CLUSTER).
High intramodular network connectivity.
Significantly associated with SNP 12 (p-value = 0.0179), which is significantly associated with CLUSTER (p-value = 0.010).
Moderate direct correlation with the CLUSTER trait.

SLIDE 24

FOXN1 Biological Significance

Mutations in mice & humans cause:
  Nudity.
  Depleted immune system due to dysfunctional T-cells.
Highly expressed in thymus epithelial cells.
Thymus involved in immune system:
  Converts lymphocytes to T-cells.
  Releases functional T-cells to combat infection.

(Nehls et al. 1994; Pignata et al. 1996; Adriani et al. 2004)

SLIDE 25

Ingenuity Pathway Analysis

[Figure labels: Cell Cycle; Cellular Development; Hair and Skin Development]

SLIDE 26

FOXN1: Validation for Chronic Fatigue Syndrome

CFS patients have an overactive immune system & high T-cell production (Maher et al. 2005).
⇒ FOXN1 may be highly expressed in CFS.

But, how to further investigate this finding?
⇒ There is a FOXN1 knockout mouse available.
It would be relatively easy to explore the relationship between FOXN1 and fatigue in a mouse model.

Photo source: http://www.crj.co.jp/3membr/04kknai/image/2_6_3img.gif

SLIDE 27

Analysis Overview

1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
► 5. Choose subset of CFS and control samples for validating the candidate biomarker.

SLIDE 28

Relationship between FOXN1 and SNP12 & 17 genotypes

The two SNPs most correlated with the CLUSTER phenotype identify a sub-phenotype of CFS.
SNP data is additively coded as 0, 1, 2.
SNP rule: we define a sample subgroup where all individuals have 0+2 or 1+2 genotypes.

SNP 12     SNP 17
  0     +    2
  1     +    2

About 1/3 of the samples satisfy the SNP rule.
For these samples FOXN1 is useful for predicting CFS severity.
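The SNP rule can be expressed as a simple predicate on the additive codes. This sketch reads "0+2 or 1+2" as SNP12 coded 0 or 1 together with SNP17 coded 2, which is an interpretation of the slide's table; the sample records and names below are hypothetical.

```python
def satisfies_snp_rule(snp12, snp17):
    """True if the additive codes match 0+2 or 1+2 (SNP12 + SNP17)."""
    return snp12 in (0, 1) and snp17 == 2


# Hypothetical samples with additively coded genotypes.
samples = [
    {"id": "s1", "snp12": 0, "snp17": 2},
    {"id": "s2", "snp12": 1, "snp17": 2},
    {"id": "s3", "snp12": 2, "snp17": 2},
    {"id": "s4", "snp12": 1, "snp17": 1},
]

subgroup = [s["id"] for s in samples if satisfies_snp_rule(s["snp12"], s["snp17"])]
```

Samples s1 and s2 fall in the subgroup; s3 has the wrong SNP12 code and s4 the wrong SNP17 code.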

SLIDE 29

“SNP Rule” Aids in Patient Selection

[Figure: correlation plots with color key (0.2 to 1) among CLUSTER, FOXN1, PRDX3, PEX6, MYEF2, CRNKL1, MED8, UNK, UNK, SNP12, and the Turquoise, Grey, Red, Brown, and Green modules. Panel A: Correlations Among All Patients. Panel B: Correlations Among Patients Satisfying SNP Rule.]

SLIDE 30

FOXN1 Expression Difference

FOXN1 expression difference is most pronounced in SNP rule samples.
We selected 13 cases and 15 controls from the samples satisfying the SNP rule.

[Figure: ln(FOXN1 Expression) for cases vs. controls. All Samples: Cases (74), Controls (41). SNP Rule: Cases (16), Controls (17). Non SNP Rule: Cases (58), Controls (24).]

SLIDE 31

Summary

DNA Level: SNP Data
mRNA Level: Expression Data
Organism Level: Clinical Traits

1. Constructed gene co-expression network from the microarray data.
2. Related Expression data to CLUSTER trait ► GREEN
3a. Related SNP data to CLUSTER Trait ► SNP12, SNP17
3b. Related SNP data to Expression data ► SNP12
4. FOXN1 has statistical and biological significance.
5. Highest differential FOXN1 expression in subgroup that has a particular SNP12 & 17 genotype.

SLIDE 32

Acknowledgements

Main Mentor: Steve Horvath, Biostatistics & Human Genetics
Group Members: Eric Sobel, Jeanette Papp, Jake Lusis
Mouse genetics: Jake Lusis, Sud Doss, Anatole Ghazalpour
Human/chimp brain: Mike Oldham, Dan Geschwind

SLIDE 33

References

Adriani, M., Martinez-Mir, A., Fusco, F., Busiello, R., Frank, J., Telese, S., Matrecano, E., Ursini, M.V., Christiano, A.M., Pignata, C. (2004). Ann Hum Genet 68, 265–268.

Maher, K. J., Klimas, N. G., Fletcher, M. A. (2005). Clin Exp Immunol 142, 505–511.

Nehls, M., Pfeifer, D., Schorpp, M., Hedrich, H., and Boehm, T. (1994). Nature 372, 103–107.

Pignata, C., Fiore, M., Guzzetta, V., Castaldo, A., Sebastio, G., Porta, F., and Guarino, A. (1996). Am J Med Genet 65, 167–170.

Zhang, B. and Horvath, S. (2005). Statistical Applications in Genetics and Molecular Biology 4, 17.