Genome-wide association studies
Fernando Rivadeneira MD PhD1,2
1Department of Internal Medicine
2Department of Epidemiology
SNPs and Diseases Molecular School of Medicine Monday, November 12th, 2018
Topic outline: Rationale; GWAS approach
[Figure: mutations (G→A, C→T, G→A) arising on the genealogy from ancestor to present day; region in LD]
mutations fall on it determine patterns of haplotype structure
mutations on different branches will have lower and often low association
Pairwise tagging
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
SNPs (number: alleles): 1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C
[Figure labels: high r² within each of the three SNP groups]
After Carlson et al. (2004) AJHG 74:106
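As a rough illustration of the "high r²" criterion used to pick tags, the sketch below computes r² between two biallelic SNPs from haplotype and allele frequencies. The function name `ld_r2` and the example frequencies are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)  # product of allelic variances
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: alleles always travel together (one SNP tags the other)
perfect = ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3)
# Hypothetical frequencies: haplotype frequency equals the product (no LD)
independent = ld_r2(p_ab=0.09, p_a=0.3, p_b=0.3)
print(round(perfect, 6), round(independent, 6))
```

A SNP with r² close to 1 to a tag does not need to be typed itself; one with r² close to 0 is not captured by that tag.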
Multi-marker tagging
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
SNP 1 captures SNPs 1+2
SNP 3 captures SNPs 3+5
“AG” haplotype captures SNPs 4+6
tags in multi-marker test should be in high LD in order to avoid
Rivadeneira & Makitie TEM 2016
Probably real
(impossible to identify with current methods)
Few examples
Modified from McCarthy et al., Nat Rev Genet 2008
Hypothesis-free approach
Of 3,000,000,000 bases in the human genome:
~10,000,000 positions show variation
~4,000,000 catalogued as common variation
~2,200,000 in CEU
~80-90% are captured by typing 500K markers
*from Mark McCarthy
[Figure: genotype calls (AA, AB, BB) per sample for SNP1, SNP2, SNP3 ... SNP500,000, with association results plotted by chromosome (1 ... X)]
p<0.05 threshold results in ~20,000 hypotheses Follow-up Set
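The arithmetic behind the multiple-testing burden can be sketched as follows. The number of tests used here (400,000) is an assumption chosen only to reproduce the round figure on the slide; the 1,000,000 independent common-variant tests behind the conventional 5 x 10-8 threshold is the usual rule of thumb, not a number from this lecture.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing `alpha` if no SNP is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at `family_alpha`."""
    return family_alpha / n_tests


# Assumed number of tests (hypothetical, for illustration only)
nominal_hits = expected_false_positives(400_000, 0.05)
# Rule-of-thumb number of independent common-variant tests
gws_threshold = bonferroni_threshold(1_000_000)
print(nominal_hits, gws_threshold)
```

This is why a nominal p<0.05 only defines a follow-up set, and why claims of association rest on a far stricter genome-wide threshold.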
Meta-analysis full datasets
Population stratification
SNP filters: MAF > 1%; call rate > 98%; pHWE > 1x10⁻⁶
Genotyped (GT) SNPs: 512,849 (RS-I); 466,389 (RS-II); 514,073 (RS-III)
Imputed SNPs: 2,543,887
Sample exclusions:
Sample call rate < 98%
Missing DNA
Gender mismatch
Excess autosomal heterozygosity
Duplicates or family relations (IBS > 97%)
Ethnic outliers (IBS distances > 4 SD)
Missing traits
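The SNP-level filters above (MAF, call rate, HWE) can be sketched as below. This is a minimal illustration, not the pipeline used in the Rotterdam Study: the `SnpCounts` container and function names are hypothetical, and the HWE test is a 1-df chi-square approximation, whereas real pipelines (e.g. PLINK) typically use an exact test.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    n_aa: int       # homozygotes for allele A
    n_ab: int       # heterozygotes
    n_bb: int       # homozygotes for allele B
    n_missing: int  # samples without a genotype call


def call_rate(c: SnpCounts) -> float:
    called = c.n_aa + c.n_ab + c.n_bb
    total = called + c.n_missing
    return called / total if total else 0.0


def maf(c: SnpCounts) -> float:
    n = c.n_aa + c.n_ab + c.n_bb
    p = (2 * c.n_aa + c.n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_pvalue(c: SnpCounts) -> float:
    """Chi-square (1 df) goodness-of-fit p-value against Hardy-Weinberg proportions."""
    n = c.n_aa + c.n_ab + c.n_bb
    p = (2 * c.n_aa + c.n_ab) / (2 * n)
    q = 1 - p
    observed = (c.n_aa, c.n_ab, c.n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df


def passes_qc(c: SnpCounts, min_maf=0.01, min_call_rate=0.98, min_hwe_p=1e-6) -> bool:
    # Thresholds default to the values listed on the slide
    return (call_rate(c) > min_call_rate
            and maf(c) > min_maf
            and hwe_pvalue(c) > min_hwe_p)


good = SnpCounts(490, 420, 90, 0)        # in HWE, fully called
no_hets = SnpCounts(500, 0, 500, 0)      # no heterozygotes: gross HWE deviation
poorly_called = SnpCounts(490, 420, 90, 50)
print(passes_qc(good), passes_qc(no_hets), passes_qc(poorly_called))
```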
pedigrees, trios, sibs)
proband families)
GEnetic Factors of OSteoporosis
GENETIC INVESTIGATIONS OF ANTHROPOMETRIC TRAITS
=> To avoid multiple testing problems the first genetic analyses are usually run using additive models which preserve power across different scenarios
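A minimal sketch of the additive model, assuming a quantitative trait and genotypes coded as allele dosage (0, 1 or 2 copies of the effect allele): the per-allele effect is the slope of a simple linear regression. The function name and the toy data are hypothetical; real analyses would also include covariates.

```python
import math


def additive_effect(dosages, trait):
    """Ordinary least squares of trait on allele dosage; returns (beta, standard error)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, trait))
    beta = sxy / sxx                     # per-allele effect
    alpha = mean_y - beta * mean_g       # intercept
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, se


# Toy data (hypothetical): trait rises by about 0.5 units per allele
dosages = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
beta, se = additive_effect(dosages, trait)
print(round(beta, 3), round(se, 4))
```

One test per SNP (a single slope) is what keeps the multiple-testing burden down compared with fitting dominant, recessive and genotypic models for every marker.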
Traits: disease state, or quantitative trait (QT) in natural units
QT -> standardized age-adjusted residuals from gender-stratified regression: Trait = α + β1·Age + β2·Age²
Imputation: MACH, IMPUTE, BIM-BAM, PLINK
Imputation filters: r² > 0.3, ratio Obs/Exp variance > 0.01, MAF > 0.01, HWE?
Minor allele from HapMap CEU (+) strand => reference
Analysis: performed by each cohort (MACH2QTL/BIN, SNPTEST, ProbABEL, PLINK)
Adjustment for population stratification => genomic control, λ < 1.05, corrected SE = SE × √λ
Meta-analysis: METAL, PLINK, MetABEL; inverse-variance weighted; standard: fixed effects
Heterogeneity: random effects for variants with I² > 50%
Significance: genome-wide significant (GWS) at P < 5 × 10⁻⁸ after double GC correction
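The meta-analysis step can be sketched as follows: each cohort's SE is inflated by √λ, the cohorts are combined with inverse-variance weights (fixed effects), and I² flags heterogeneity. This is a simplified illustration, not the METAL implementation; the function names and input values are hypothetical, and leaving λ below 1 uncorrected is an assumption made here. The second, meta-level GC correction is not shown.

```python
import math


def gc_correct_se(se: float, lam: float) -> float:
    """Inflate a standard error by sqrt(lambda); lambda < 1 is left uncorrected (assumption)."""
    return se * math.sqrt(max(lam, 1.0))


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate; returns (beta, se, I2 in %)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))  # Cochran's Q
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return beta, se, i2


# Hypothetical per-cohort results for one SNP
betas = [0.10, 0.12, 0.08]
ses = [0.020, 0.030, 0.025]
lambdas = [1.02, 1.00, 1.04]

corrected = [gc_correct_se(se, lam) for se, lam in zip(ses, lambdas)]
pooled_beta, pooled_se, i2 = fixed_effects_meta(betas, corrected)
print(round(pooled_beta, 4), round(pooled_se, 4), round(i2, 1))
```

Variants with I² above 50% would then be re-examined under a random-effects model, as stated on the slide.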
Population Admixture
Population Stratification
SPURIOUS ASSOCIATION OF TRAIT WITH GENETIC ANCESTRY MARKERS
In Rotterdam Study datasets... managed by excluding ~2-5% of the population.
In the Generation R Study, ~50% of participants are of non-Northern European ancestry.
Early deviations from the expected distribution denote spurious results: the whole genome appears associated.
Correcting for 4-20 principal components (PCs) is enough to correct for stratification.
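One common way to quantify the genome-wide inflation described here is the genomic control factor λ: the median observed chi-square statistic divided by its expected median under the null (about 0.455 for 1 df). The sketch below uses simulated z-scores; the function name and the inflation factor of 1.1 are hypothetical.

```python
import statistics
from statistics import NormalDist

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(z_scores) -> float:
    """Genomic control lambda: median observed chi-square over the null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Null scenario: z-scores placed at evenly spaced quantiles of a standard normal
n = 10_000
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
lambda_null = genomic_inflation(null_z)

# Stratified scenario (hypothetical): every statistic inflated by the same factor
inflated_z = [1.1 * z for z in null_z]
lambda_inflated = genomic_inflation(inflated_z)

print(round(lambda_null, 3), round(lambda_inflated, 3))
```

A λ near 1 matches the null; a λ well above 1 means the statistics are inflated across the whole genome, which points to stratification rather than true signals.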
[Figure: QQ plots (expected vs. observed). Panel labels: True association; RED HAIR COLOR, Generation R; High stratification]
H0: No Association HA: Association
Reject H0 Association
Accept H0 No association
FIXED FACTORS MODIFIABLE FACTORS
Sample size needs to be increased by a factor of 1/r²
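A small worked example of this rule, assuming r² is the LD between the typed (or imputed) marker and the causal variant, and that `n_causal` is the sample size that would suffice if the causal variant were typed directly. Both names are hypothetical.

```python
import math


def required_sample_size(n_causal: int, r2: float) -> int:
    """Sample size needed when testing a marker in LD (r2) with the causal variant."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_causal / r2)


# A marker with r2 = 0.5 to the causal variant needs twice the sample
n_tag = required_sample_size(1000, 0.5)
print(n_tag)  # 2000
```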
Phased approach: “Genotyping”, “Imputation”, “Sequencing” (rare AND common variants), joint meta-analysis
http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank
https://data.broadinstitute.org/alkesgroup/UKBB/
https://biobankengine.stanford.edu
LD-Hub, pathway analysis, animal models, FINEMAP
Diverse approaches to follow-up GWAS findings:
Genetic correlations with other traits
Gene Prioritization and biological relevance of the variants
FINEMAP: efficient variable selection using summary data from genome-wide association studies
Requirements:
variants (beta effect /SE)
between the variants.
ENCODE ANALYSIS
MCF-7 CTCF ChIA-PET
Gene Prioritization and biological relevance
Animal models: The mouse Phenotype Consortium
http://www.mousephenotype.org/
(through imputation) has been and will continue being successful
approach (control for stratification and other biases)
highest level of evidence for true associations
continue being the favored approach
to the biological relevance of the identified genes
evident revealing new biology and many translational opportunities