CAMDA 2006 1
Integration of Genetic and Genomic Approaches for the Analysis of Chronic Fatigue Syndrome Implicates Forkhead Box N1
CAMDA 2006 Challenge
- DNA level: ~50 pre-selected SNPs
- mRNA level: ~20K genes per array
- Organism level: ~70 clinical traits
- 1. Relate expression data to clinical trait data.
- 2. Relate SNP data to expression data.
- 3. Integrate the results to find CFS-relevant genes.
Analysis Overview
1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to the gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose a subset of CFS and control samples for validating the candidate biomarker.
Network = Adjacency Matrix
A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes the connection strength between each pair of genes.
- Two genes have high connection strength if they have similar expression patterns.
- A is a symmetric matrix with entries in [0, 1].
- Two network models:
- Unweighted: $a_{ij} = 1$ if two genes are adjacent (connected) and 0 otherwise.
- Weighted: each $a_{ij}$ gives the connection strength between a gene pair.
Important Task in Many Genomic Applications: Given a network (pathway) of interacting genes, how do we find the central players?
Identifying Key Players of Interest
Imagine you wanted to recruit students to your science program. Popularity alone might suggest the head cheerleader or the star quarterback.
[Images: Star Quarterback, Head Cheerleader]
But the head of the chess club would probably be a better bet!
[Images: Chess Club President, Quarterback, Cheerleader]
Two Network Definitions
- 1. Number of friends = “Connectivity”
- Gene connectivity = row sum of the adjacency matrix, i.e., the sum of gene i's connection strengths: $k_i = \sum_j a_{ij}$.
- 2. Chess club, sports teams = “Modules”
- Gene module = cluster of highly connected (similarly expressed) genes in a network.
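The connectivity formula is just a row sum, as this small Python sketch shows. The 4-gene weighted adjacency matrix is hypothetical, and its diagonal is set to 0 so that a gene's self-connection does not count toward $k_i$ (a convention assumed here).

```python
def connectivity(adj):
    """k_i = sum_j a_ij: row sums of a (symmetric) adjacency matrix."""
    return [sum(row) for row in adj]


# Hypothetical weighted adjacency for 4 genes; diagonal (self-connection) = 0.
adj = [
    [0.0, 0.9, 0.8, 0.6],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.6, 0.2, 0.1, 0.0],
]
k = connectivity(adj)
# The gene with the largest connectivity has the "most friends".
most_connected = max(range(len(k)), key=lambda i: k[i])
print(most_connected)  # 0
```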
Gene connectivity vs. Module connectivity
[Figure panels: whole network connectivity; intra-modular connectivity]
- Whole network connectivity is largely driven by the size of the module containing the gene.
- Connectivity within a module is biologically and mathematically more meaningful than whole network connectivity.
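To see why whole network connectivity tracks module size, here is a hedged Python sketch with a hypothetical 6-gene network: a large module of 4 genes with moderate links (0.5), a small module of 2 genes with strong links (0.9), and weak links (0.05) between modules. The module labels and link strengths are made up for illustration.

```python
def whole_network_connectivity(adj):
    """k_i = sum over all other genes j of a_ij."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


def intramodular_connectivity(adj, modules):
    """k_in(i) = sum of a_ij over the other genes j in gene i's module."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i and modules[j] == modules[i])
            for i in range(n)]


modules = ["large"] * 4 + ["small"] * 2  # hypothetical module labels


def toy_strength(i, j):
    """Hypothetical connection strengths within and between modules."""
    if modules[i] != modules[j]:
        return 0.05
    return 0.5 if modules[i] == "large" else 0.9


adj = [[0.0 if i == j else toy_strength(i, j) for j in range(6)] for i in range(6)]
k_total = whole_network_connectivity(adj)
k_in = intramodular_connectivity(adj, modules)
# Gene 0 (large module) outranks gene 4 (small module) on whole network
# connectivity even though each of its individual links is weaker.
print(round(k_total[0], 2), round(k_total[4], 2))  # 1.6 1.1
print(round(k_in[0], 2), round(k_in[4], 2))        # 1.5 0.9
```

Ranking genes by k_in within their own module removes most of this module-size effect from the comparison between genes of the same module.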
Analysis Overview, Step 1: Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
Revisiting the Adjacency Matrix
Once we had chosen an appropriate β (following the methodology outlined in Zhang and Horvath 2005), we found that our network results were robust to small changes in β.
- Adjacency: $a_{ij} = |\mathrm{cor}(\mathrm{gene}_i, \mathrm{gene}_j)|^{\beta}$.
[Figure: Connection Strength (Adjacency) vs. Correlation; x-axis: |Correlation|, y-axis: Adjacency]
- The step function (hard thresholding) is indicated by the black, solid line.
- Power adjacency functions (soft thresholding) are indicated by colored, dashed lines.
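The two kinds of adjacency function can be written in a few lines of Python. This is a minimal sketch; the cutoff `tau = 0.8` and the powers β = 5, 6, 7 are hypothetical values, not the β chosen in this analysis. Because |r| ↦ |r|^β is increasing for β > 0, changing β never reorders pairwise connection strengths; it only changes how strongly weak correlations are suppressed, which is one intuition for why results can be robust to small changes in β.

```python
def hard_threshold(r, tau):
    """Step function (unweighted network): 1 if |r| >= tau, else 0."""
    return 1.0 if abs(r) >= tau else 0.0


def power_adjacency(r, beta):
    """Soft thresholding (weighted network): |r| ** beta."""
    return abs(r) ** beta


correlations = [0.2, -0.5, 0.79, 0.81, -0.95]  # hypothetical gene-pair correlations
tau = 0.8
for beta in (5, 6, 7):
    soft = [power_adjacency(r, beta) for r in correlations]
    print(beta, [round(a, 3) for a in soft])

hard = [hard_threshold(r, tau) for r in correlations]
print(hard)  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

With hard thresholding, a pair at r = 0.79 is disconnected while a pair at r = 0.81 is fully connected; the power function changes smoothly across that boundary.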
Four Modules Identified Using Hierarchical Clustering
[Figure: hierarchical clustering and MDS plot; modules: Brown, Red, Turquoise, Green]
- Grey indicates genes outside of any module.
- The MDS plot indicates clear separation of the brown, green, and turquoise modules.
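As a rough illustration of how modules can come out of hierarchical clustering, here is a self-contained Python sketch of average-linkage agglomerative clustering with a static cut height. It uses the simple dissimilarity 1 - a_ij; the framework of Zhang and Horvath (2005) also describes topological-overlap-based dissimilarities, which this sketch does not implement. The cut height, the minimum module size, the `moduleN` labels and the toy adjacency matrix are all hypothetical. Genes in clusters that are too small are labeled "grey", mirroring the convention for genes outside any module.

```python
def average_linkage_clusters(adj, cut_height):
    """Average-linkage agglomerative clustering on d_ij = 1 - a_ij.

    Repeatedly merges the two closest clusters; stops once the smallest
    average between-cluster dissimilarity reaches cut_height (a static cut).
    """
    clusters = [[i] for i in range(len(adj))]

    def avg_dissim(c1, c2):
        return sum(1.0 - adj[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = avg_dissim(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= cut_height:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def label_modules(clusters, n_genes, min_size):
    """Clusters with at least min_size genes become modules; the rest are 'grey'."""
    labels = ["grey"] * n_genes
    module_id = 0
    for cluster in sorted(clusters, key=len, reverse=True):
        if len(cluster) >= min_size:
            module_id += 1
            for gene in cluster:
                labels[gene] = f"module{module_id}"
    return labels


# Hypothetical 7-gene adjacency: genes 0-2 and 3-5 form tight groups, gene 6 is a loner.
group = [0, 0, 0, 1, 1, 1, 2]
adj = [[0.0 if i == j else (0.8 if group[i] == group[j] else 0.1)
        for j in range(7)] for i in range(7)]
clusters = average_linkage_clusters(adj, cut_height=0.5)
labels = label_modules(clusters, n_genes=7, min_size=3)
print(labels)  # ['module1', 'module1', 'module1', 'module2', 'module2', 'module2', 'grey']
```

Average linkage never produces merge heights that decrease, so stopping at the first merge above the cut gives the same clusters as cutting the finished dendrogram at that height.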
Analysis Overview, Step 2: Identify module of interest using trait data.
A clinical trait gives rise to a “Trait Significance” measure
TraitSignificance(i) = |cor(x(i), TRAIT)|
where x(i) is the gene expression profile of the ith gene.
Module Trait Significance = Average(Trait Significance values for genes in a module).
Trait Significance Results
- Table shows average trait significance for each module.
- Every module was characterized in terms of a group of clinical traits.
- We were interested in the CFS severity trait “CLUSTER” because it contains the information from 14 clinical traits (evaluation responses).
- We focused on the green module (184 genes) since it was related to the CLUSTER trait.
Analysis Overview, Step 3: Determine informative SNPs and relate them to the gene co-expression network.
Finding SNPs associated with the CLUSTER trait
We chose the two SNPs with the highest CLUSTER correlation:
- SNP12 = hCV245410 on 12q21 (p-value = 0.01)
- SNP17 = hCV7911132 on 17q21 (p-value = 0.001)
[Figure: SNP & CLUSTER correlation p-values, plotted as -log(p-value) with SNPs colored by chromosome (labeled loci include 5q34, 2p24, 7p15, 11p15, 12q21, 17q, 22q11.1, X); highlighted SNPs: hCV7911132 (17q21) and hCV245410 (12q21)]
Correlation with relevant SNPs defines SNP Significance of the ith gene
SNPSignificance(i) = |cor(x(i), SNP)| (where SNP data is additively coded).
- Conceptually related to a LOD* score at the SNP marker for the expression of the ith gene.
- Why correlate SNP and gene expression data?
- It puts the SNP effect on the same footing as the trait effect and gene-gene connection strengths. Effect sizes are important in our analysis.
*LOD = “logarithm of the odds”, a traditional measure of linkage between genetic loci.
SNP Filtering & Significance Results
- Table shows the average SNP significance for each module.
- Green module genes are the most correlated with SNP12.
- “SNP12 Sub-sample” = average module correlations with SNP12 among samples that have a particular SNP12 and SNP17 genotype.
- The correlation between the green module and SNP12 is higher in the sample subset.
Module SNP Significance (Standard Error)
SNPs | Turquoise | Grey | Red | Brown | Green
SNP12 | 0.052 (0.002) | 0.077 (0.001) | 0.036 (0.004) | 0.091 (0.004) | 0.128 (0.004)
SNP17 | 0.056 (0.002) | 0.064 (0.001) | 0.045 (0.005) | 0.039 (0.003) | 0.04 (0.002)
SNP12 Sub-sample | 0.128 (0.005) | 0.144 (0.002) | 0.067 (0.009) | 0.203 (0.007) | 0.186 (0.007)
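Each table entry is a module average with a standard error in parentheses. The slide does not state how the standard errors were computed; the Python sketch below assumes the usual standard error of a mean across the module's genes, sd / sqrt(n). The per-gene values and module labels are hypothetical, and each module needs at least two genes for `stdev`.

```python
from math import sqrt
from statistics import fmean, stdev


def module_mean_and_se(values, modules):
    """Per-module mean of gene-level values and its standard error sd / sqrt(n)."""
    groups = {}
    for value, module in zip(values, modules):
        groups.setdefault(module, []).append(value)
    return {module: (fmean(vs), stdev(vs) / sqrt(len(vs)))
            for module, vs in groups.items()}


# Hypothetical SNP significance values for five genes in two modules.
snp_sig = [0.10, 0.20, 0.30, 0.05, 0.07]
modules = ["green", "green", "green", "brown", "brown"]
for module, (mean, se) in module_mean_and_se(snp_sig, modules).items():
    print(f"{module}: {mean:.3f} ({se:.3f})")
```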
Analysis Overview, Step 4: Identify genes with statistical and biological significance.
Integration of genetic and network analysis
Combined Gene Selection Criteria:
- 1. CLUSTER trait significance > 0.2.
- 2. SNP12 significance > 0.2.
- 3. Genes with high intramodular connectivity (top 50%).
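The three criteria act as a simple filter. Here is a minimal Python sketch, assuming per-gene trait significance, SNP12 significance and intramodular connectivity values for the genes of the module of interest. "Top 50%" is read here as at or above the median intramodular connectivity, which is an assumption; the gene names and values are hypothetical.

```python
from statistics import median


def select_genes(genes, trait_sig, snp_sig, k_in, trait_cut=0.2, snp_cut=0.2):
    """Keep genes meeting all three criteria.

    1. trait significance > trait_cut
    2. SNP significance > snp_cut
    3. intramodular connectivity in the top 50% (>= median, assumed here)
    """
    k_median = median(k_in)
    return [gene for gene, ts, ss, k in zip(genes, trait_sig, snp_sig, k_in)
            if ts > trait_cut and ss > snp_cut and k >= k_median]


genes = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical module members
trait_sig = [0.25, 0.30, 0.15, 0.22, 0.40]
snp_sig = [0.21, 0.10, 0.30, 0.25, 0.23]
k_in = [5.0, 4.0, 6.0, 7.0, 1.0]
print(select_genes(genes, trait_sig, snp_sig, k_in))  # ['g1', 'g4']
```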
Eight Most Significant Genes
Accession | Gene Symbol (Name) and Information | Locus | CLUSTER P-Value (Correlation) | SNP12 P-Value (Correlation) | Biomarker
BC004179 | Unknown (protein for mgc:2780) | | 0.032 (-0.23) | 0.007 (0.24) | NO
XM_067644 | Similar to polynucleotide phosphorylase-like protein and 3'-5' RNA exonuclease | | 0.007 (-0.29) | 0.002 (0.27) | NO
BC010019 | MED8 (mediator of RNA polymerase II transcription, subunit 8 homolog (yeast)): regulates transcription | 1p34.2 | 0.007 (-0.29) | 0.015 (0.22) | YES
AF111802 | CRNKL1 (Crn, crooked neck-like 1 (Drosophila)): expressed in testes, involved in mRNA splicing | 20p11.2 | 0.012 (-0.27) | 0.013 (0.22) | YES
AF106685 | MYEF2 (myelin expression factor 2): myoblast cell differentiation and transcription | 15q21.1 | 0.05 (-0.21) | 0.012 (0.22) | YES
AB051077 | PEX6 (peroxisomal biogenesis factor 6): absence results in Zellweger syndrome (ZWS), with neurological and metabolic defects | 6p21.1 | 0.032 (-0.23) | 0.013 (0.22) | YES
AF118073 | PRDX3 (peroxiredoxin 3): regulates cell proliferation, differentiation, and antioxidant functions | 10q25-q26 | 0.017 (-0.26) | 0.02 (0.21) | YES
NM_003593 | FOXN1 (forkhead box N1): functions in defense response; mutations cause T-cell immunodeficiency and nudity in mice and humans; expressed in thymus | 17q11-q12 | 0.055 (-0.21) | 0.018 (0.21) | YES
Source: NCBI (http://www.ncbi.nlm.nih.gov)
FOXN1 Statistical Significance
- Member of the green module, which is related to the CFS severity trait (CLUSTER).
- High intramodular network connectivity.
- Significantly associated with SNP12 (p-value = 0.0179), which is significantly associated with CLUSTER (p-value = 0.010).
- Moderate direct correlation with the CLUSTER trait (correlation = -0.21, p-value = 0.055).
FOXN1 Biological Significance
- Mutations in mice and humans cause:
- Nudity.
- A depleted immune system due to dysfunctional T-cells.
- Highly expressed in thymic epithelial cells.
- The thymus is central to the immune system:
- Lymphocyte precursors mature into T-cells there.
- It releases functional T-cells to combat infection.
(Nehls et al. 1994; Pignata et al. 1996; Adriani et al. 2004)
Ingenuity Pathway Analysis
[Figure: pathway network; highlighted functions: Cell Cycle, Cellular Development, Hair and Skin Development]
FOXN1: Validation for Chronic Fatigue Syndrome
Photo source: http://www.crj.co.jp/3membr/04kknai/image/2_6_3img.gif
- CFS patients have an overactive immune system and high T-cell production (Maher et al. 2005). ⇒ FOXN1 may be highly expressed in CFS.
- But how can we investigate this finding further?
⇒ A FOXN1 knockout mouse is available.
- It would be relatively easy to explore the relationship between FOXN1 and fatigue in a mouse model.
Analysis Overview, Step 5: Choose a subset of CFS and control samples for validating the candidate biomarker.
Relationship between FOXN1 and SNP12 & 17 genotypes
- The two SNPs most correlated with the CLUSTER phenotype identify a sub-phenotype of CFS.
- SNP data is additively coded as 0, 1, 2.
- SNP rule: we define a sample subgroup where all individuals have the (SNP12, SNP17) genotype 0 + 2 or 1 + 2, i.e., SNP12 = 0 or 1 together with SNP17 = 2.
- About 1/3 of the samples satisfy the SNP rule.
- For these samples, FOXN1 is useful for predicting CFS severity.
“SNP Rule” Aids in Patient Selection
[Figure: correlation heatmaps (color key 0.2 to 1) among CLUSTER, SNP12, the eight selected genes (FOXN1, PRDX3, PEX6, MYEF2, CRNKL1, MED8, and two unknown genes) and the Turquoise, Grey, Red, Brown, and Green modules. Panel A: correlations among all patients. Panel B: correlations among patients satisfying the SNP rule.]
FOXN1 Expression Difference
- The FOXN1 expression difference is most pronounced in SNP rule samples.
- We selected 13 cases and 15 controls from the samples satisfying the SNP rule.
[Figure: ln(FOXN1 expression) in cases vs. controls for all samples (74 cases, 41 controls), SNP rule samples (16 cases, 17 controls), and non-SNP rule samples (58 cases, 24 controls)]
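Because the comparison is made on the natural-log scale, one simple summary of the case-control gap is the difference in mean ln(expression), optionally restricted to SNP-rule samples. The Python sketch below is illustrative only: the expression values, case labels and mask are hypothetical, and this is not necessarily the statistic used in the analysis.

```python
import math
from statistics import fmean


def log_expression_difference(expr, is_case, mask=None):
    """Mean ln(expression) of cases minus that of controls.

    If mask is given, only samples with mask True are used
    (e.g. samples satisfying the SNP rule).
    """
    if mask is None:
        mask = [True] * len(expr)
    cases = [math.log(e) for e, c, m in zip(expr, is_case, mask) if m and c]
    controls = [math.log(e) for e, c, m in zip(expr, is_case, mask) if m and not c]
    return fmean(cases) - fmean(controls)


expr = [2.0, 4.0, 1.0, 1.0]  # hypothetical FOXN1 expression
is_case = [True, True, False, False]
diff_all = log_expression_difference(expr, is_case)
diff_subset = log_expression_difference(expr, is_case, mask=[True, False, True, False])
# exp(diff) is the ratio of geometric mean expression, cases vs. controls.
print(round(math.exp(diff_all), 3))  # 2.828
```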
Summary
Data levels: DNA level (SNP data), mRNA level (expression data), organism level (clinical traits).
- 1. Constructed gene co-expression network from the microarray data.
- 2. Related expression data to the CLUSTER trait ► green module
- 3a. Related SNP data to the CLUSTER trait ► SNP12, SNP17
- 3b. Related SNP data to expression data ► SNP12
- 4. FOXN1 has statistical and biological significance.
- 5. Highest differential FOXN1 expression in the subgroup that has a particular SNP12 & SNP17 genotype.
Acknowledgements
Main Mentor: Steve Horvath, Biostatistics & Human Genetics
Group Members: Eric Sobel, Jeanette Papp, Jake Lusis
Mouse genetics: Jake Lusis, Sud Doss, Anatole Ghazalpour
Human/chimp brain: Mike Oldham, Dan Geschwind
References
- Adriani, M., Martinez-Mir, A., Fusco, F., Busiello, R., Frank, J., Telese, S., Matrecano, E., Ursini, M.V., Christiano, A.M., and Pignata, C. (2004). Ann Hum Genet 68, 265–268.
- Maher, K.J., Klimas, N.G., and Fletcher, M.A. (2005). Clin Exp Immunol 142, 505–511.
- Nehls, M., Pfeifer, D., Schorpp, M., Hedrich, H., and Boehm, T. (1994). Nature 372, 103–107.
- Pignata, C., Fiore, M., Guzzetta, V., Castaldo, A., Sebastio, G., Porta, F., and Guarino, A. (1996). Am J Med Genet 65, 167–170.
- Zhang, B., and Horvath, S. (2005). Statistical Applications in Genetics and Molecular Biology 4, 17.