
SLIDE 1

ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES

Taesung Park1

Sohee Oh1, Iksoo Huh1, and Seung-Yeoun Lee2

1 Department of Statistics, Seoul National University, South Korea

2 Department of Applied Statistics, Sejong University, South Korea

GIW 2016

SLIDE 2

Genome Wide Association Studies (GWAS)

Studies of genetic variation across the entire genome

Single Nucleotide Polymorphism (SNP)

DNA sequence variations that occur when a single nucleotide is altered

Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition

Rely on SNP chip technologies

SLIDE 4

Genome Wide Association Studies (GWAS)

Successful in complex traits and diseases

height, body mass index, blood pressure

asthma, cancer, diabetes, heart disease, and mental illnesses

SLIDE 5

Association test

Univariate and single SNP analysis

Focus on one trait and single SNP

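One trait

[Diagram: each SNP (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 1M) is tested one at a time against Trait 1]

$y_i = \beta_0 + \beta_1 \mathrm{Sex}_i + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{SNP}_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$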
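The single-SNP model above can be fitted by ordinary least squares. The following is a minimal sketch using only the Python standard library; the function names, the toy data, and the coefficient values are illustrative assumptions, not part of the original analysis.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_single_snp(y, sex, age, snp):
    """Least-squares estimates (b0, b1, b2, b3) for
    y = b0 + b1*sex + b2*age + b3*snp + error, via the normal equations."""
    rows = [[1.0, s, a, g] for s, a, g in zip(sex, age, snp)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical toy data: a noise-free trait, so the fit recovers the coefficients.
rng = random.Random(0)
sex = [rng.randint(0, 1) for _ in range(50)]
age = [rng.uniform(40, 70) for _ in range(50)]
snp = [rng.choice([0, 1, 2]) for _ in range(50)]  # additive coding: allele count
y = [1.0 + 0.5 * s + 0.02 * a + 0.3 * g for s, a, g in zip(sex, age, snp)]
beta = fit_single_snp(y, sex, age, snp)
```

In a genome-wide scan this fit would be repeated once per SNP, with the test of the SNP coefficient giving the per-SNP p-value.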

SLIDE 6

Improving power

Common complex traits are related to many genes

Not easy to identify genetic variants with high significance at α = 5×10^-8

Further, these variants explain only a small fraction of disease etiology

Need to develop a more powerful method for identifying genetic variants

Meta-analysis by increasing sample size

Multiple SNP analysis: gene-gene interaction

Joint analysis with the correlated phenotypes

SLIDE 7

Univariate + multiple SNP analysis

Focus on one trait and multiple SNPs

[Diagram: multiple SNPs (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 500K) enter one association test against one trait (Trait 1), through SNP-SNP interaction or accumulated additive effects of multiple SNPs]

SLIDE 8

Multivariate approach

Multivariate analysis

Focus on multiple related traits and single SNP

[Diagram: each SNP (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 1M) is tested against a set of related traits (Trait 1, Trait 2, Trait 3, Trait 4, Trait 5)]

SLIDE 9

Multivariate approach

Examples: multiple related phenotypes

Obesity

BMI, Waist circumference, Weight, WHR, Body Fat

Hyperlipidemia

Total cholesterol, HDL/LDL cholesterol, Triglyceride

Metabolic Syndrome

Waist circumference, triglyceride, HDL cholesterol,

blood pressure (SBP, DBP), Insulin resistance

SLIDE 10

Multivariate approach

Existing Methods

1)

MultiPhen (O’Reilly et al., 2012)

Proportional odds model

2)

Efficient algorithm for GWAS (Zhou and Stephens, 2014)

Linear mixed model

SLIDE 11

Multivariate approach

Identify genetic variants associated with multiple related traits

Extension of the univariate linear model to the multivariate linear

model with a response vector

Univariate variances are replaced by a covariance matrix

Joint analysis

Analyze several traits simultaneously

Account for the correlation structure of multiple traits in the model

Allows a different slope (SNP effect) for each trait

Different association direction => Heterogeneous model

Common slope (SNP effect) model

Same association direction with similar effect sizes => Homogeneous model

SLIDE 13

Multivariate general linear model (1)

Let $y_{ij}$ denote the value of trait j from subject i, for i = 1, …, n and j = 1, …, m

The linear model for trait j:

$y_{ij} = \sum_{k=1}^{p} x_{ik}\beta_{kj} + \varepsilon_{ij} = x_i^T \beta_j + \varepsilon_{ij}$

$x_i = (x_{i1}, \ldots, x_{ip})^T$ is a vector of SNPs and covariates

$\beta_j = (\beta_{1j}, \ldots, \beta_{pj})^T$ is a vector of p unknown parameters

$\beta_{kj}$ represents the effect of the kth SNP on trait j

This model allows one SNP to have different effects on the traits

SLIDE 14

The multivariate general linear model (2)

$y_i = (y_{i1}, \ldots, y_{im})^T$ is a vector of m responses from the ith subject

$\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{im})^T$ is a vector of m residuals for the ith subject, with $\varepsilon_i \sim N_m(0, \Sigma)$

The $nm \times 1$ vector

$\varepsilon = (\varepsilon_1^T, \ldots, \varepsilon_n^T)^T \sim N_{nm}(0, I_n \otimes \Sigma)$

where $I_n$ denotes the n×n identity matrix and the operator $\otimes$ is the direct (Kronecker) product
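To make the Kronecker structure concrete, here is a small sketch in plain Python that builds $I_n \otimes \Sigma$ for a toy case. The helper names and the 2×2 matrix are assumptions chosen for illustration only.

```python
def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical covariance of m = 2 traits within one subject.
sigma = [[1.0, 0.5],
         [0.5, 2.0]]

# Covariance of the stacked residual vector for n = 3 subjects: a 6 x 6
# block-diagonal matrix with sigma repeated on the diagonal.
cov_eps = kron(identity(3), sigma)
```

The block-diagonal result expresses that traits are correlated within a subject while different subjects are independent.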

SLIDE 15

Multivariate general linear model (3)

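Covariance (correlation) structure: the $m \times m$ matrix $\Sigma$

Specifies how the traits within a subject are related

Unstructured (UN)

$\Sigma_{UN} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$

Structured covariance

Compound Symmetry (CS)

$\Sigma_{CS} = \begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix}$

First-order autoregressive (AR(1))

$\Sigma_{AR(1)} = \begin{pmatrix} \sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho^{m-1} \\ \sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho^{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2\rho^{m-1} & \sigma^2\rho^{m-2} & \cdots & \sigma^2 \end{pmatrix}$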
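The structured forms can be generated from one or two parameters, whereas the unstructured form needs m(m+1)/2 free parameters. A minimal sketch follows; the function names and example values are assumptions for illustration.

```python
def compound_symmetry(m, sigma2, sigma1_2):
    """CS matrix: sigma2 + sigma1_2 on the diagonal, sigma1_2 elsewhere."""
    return [[sigma1_2 + (sigma2 if i == j else 0.0) for j in range(m)]
            for i in range(m)]

def ar1(m, sigma2, rho):
    """AR(1) matrix: entry (i, j) equals sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]

def n_params_unstructured(m):
    """Number of free parameters in an unstructured m x m covariance matrix."""
    return m * (m + 1) // 2

cs = compound_symmetry(3, 1.0, 0.5)   # hypothetical values
ar = ar1(3, 2.0, 0.5)                 # hypothetical values
```

With six traits, for example, the unstructured matrix has 21 free parameters, while CS and AR(1) each have two.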

SLIDE 16

The multivariate general linear model (4)

Matrix formulation

$Y = XB + E$, where $E(Y) = XB$ and $\mathrm{Var}\big((y_1^T, \ldots, y_n^T)^T\big) = I_n \otimes \Sigma$

$Y = (y_1, \ldots, y_n)^T = (y_{ij})$: $n \times m$ data matrix

$X = (x_1, \ldots, x_n)^T = (x_{ik})$: $n \times p$ known design matrix

$B = (\beta_1, \ldots, \beta_m) = (\beta_{kj})$: $p \times m$ parameter matrix

$E = (\varepsilon_1, \ldots, \varepsilon_n)^T = (\varepsilon_{ij})$: $n \times m$ matrix of random errors

SLIDE 17

The multivariate general linear model (5)

Consider related phenotypes simultaneously

Allow for correlation between phenotypes in the model

Detect genetic variants that have only modest effects in the univariate approach

Provide a chance to capture pleiotropic genes

Model

Heterogeneous model with separate slopes (different genetic effects on each phenotype)

Homogeneous model with common slope (same genetic effect on all phenotypes)

Unstructured variance-covariance structure

Test statistic

Wilks' Λ statistic

$\Lambda = \dfrac{|E|}{|H + E|} = \prod_{i=1}^{k} \dfrac{1}{1 + \lambda_i}$

Here E is the error sums-of-squares-and-cross-products (SSCP) matrix, H is the hypothesis SSCP matrix, and $\lambda_i$ are the nonzero eigenvalues of $E^{-1}H$
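As an illustration of the Wilks' Λ statistic, the sketch below computes Λ = |E| / |H + E| for a single SNP and several traits, using the fact that the residual SSCP under the intercept-only model equals H + E. It is a simplified sketch: it assumes no covariates besides the SNP, and the function names and simulated data are illustrative assumptions.

```python
import random

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d

def sscp(rows):
    """Sums of squares and cross-products matrix of a list of m-vectors."""
    m = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]

def wilks_lambda(snp, traits):
    """Wilks' Lambda for testing the SNP effect on all traits jointly.
    traits is a list of n rows, each holding m trait values."""
    n, m = len(snp), len(traits[0])
    x_bar = sum(snp) / n
    y_bar = [sum(row[j] for row in traits) / n for j in range(m)]
    sxx = sum((x - x_bar) ** 2 for x in snp)
    slopes = [sum((x - x_bar) * (row[j] - y_bar[j]) for x, row in zip(snp, traits)) / sxx
              for j in range(m)]
    # Residuals without the SNP give H + E; residuals with the SNP give E.
    null_res = [[row[j] - y_bar[j] for j in range(m)] for row in traits]
    full_res = [[row[j] - y_bar[j] - slopes[j] * (x - x_bar) for j in range(m)]
                for x, row in zip(snp, traits)]
    return det(sscp(full_res)) / det(sscp(null_res))

# Hypothetical data: two traits, with and without opposite-direction SNP effects.
rng = random.Random(1)
snp = [rng.choice([0, 1, 2]) for _ in range(200)]
noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
traits_null = [[e1, e2] for e1, e2 in noise]
traits_alt = [[e1 + 0.5 * g, e2 - 0.5 * g] for g, (e1, e2) in zip(snp, noise)]
lambda_null = wilks_lambda(snp, traits_null)
lambda_alt = wilks_lambda(snp, traits_alt)
```

Values of Λ near 1 indicate little evidence of association; smaller values indicate that the SNP explains more of the joint trait variation.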

SLIDE 18

Korea Association Resource (KARE) Project

Objective

To identify genetic factors of quantitative clinical traits and lifestyle-related diseases (e.g., T2DM) through a genome-wide association study using population-based cohorts

Over 10,000 subjects from two community-based cohorts in Korea (Ansung and Ansan cohorts)

Genotyping

Affymetrix 5.0

First high-density, large-scale GWA study performed in an East Asian population

Courtesy of KNIH

SLIDE 20

KARE: Characteristics

Baseline study     Ansung         Ansan
Participants       5,018          5,020
Sex (women/men)    2,778/2,240    2,497/2,523
Age (mean)         55.5           49.1
40s (%)            31.2           62.8
50s (%)            29.1           23.0
60+ (%)            39.6           14.3

Courtesy of KNIH

KARE

SLIDE 21

KARE data

Data Description

8,842 subjects from two community-based cohorts in Korea (Ansung and Ansan cohorts)

Filtering Threshold

HWE p-value < 10^-6

MAF < 0.01

Missing proportion in each genotype > 0.05

Missing imputation: HapMap JPT/CHB reference panel

SNPs: 327,872
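The filtering thresholds above can be expressed as a per-SNP quality-control check. The sketch below is an assumption-laden illustration: it uses a 1-df chi-square goodness-of-fit test for HWE (the slides do not say which HWE test was used), and all function names are hypothetical.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_b = (n_ab + 2 * n_bb) / (2 * n)
    return min(freq_b, 1.0 - freq_b)

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              maf_min=0.01, hwe_p_min=1e-6, max_missing=0.05):
    """Keep a SNP unless it fails the HWE, MAF, or missingness threshold."""
    n_called = n_aa + n_ab + n_bb
    missing_rate = n_missing / (n_called + n_missing)
    if missing_rate > max_missing:
        return False
    if minor_allele_frequency(n_aa, n_ab, n_bb) < maf_min:
        return False
    return hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min
```

For example, `passes_qc(4000, 4000, 1000, 0)` keeps a SNP whose genotype counts match HWE proportions, while a SNP with no heterozygotes at allele frequency 0.5 is dropped by the HWE check.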

SLIDE 22

Obesity

Obesity related phenotypes

BMI, Waist circumference, Weight, and WHR

BMI = Weight (kg) / Height (m)^2

WHR = Waist circumference / Hip circumference

Which genes are associated with obesity related phenotypes?

Correlation matrix

         BMI      Waist    Weight   WHR
BMI      1
Waist    0.7607   1
Weight   0.7308   0.6862   1
WHR      0.3819   0.7971   0.2920   1

SLIDE 23

Obesity: Univariate Analysis

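Most GWAS are conducted under this framework

Focus on one phenotype and a single SNP

Obesity-related phenotypes

Separate univariate analyses

BMI: $Y_1 = \beta_{01} + \beta_{11}\mathrm{Sex} + \beta_{21}\mathrm{Age} + \beta_{31}\mathrm{Area} + \beta_{41}\mathrm{SNP} + \varepsilon_1$

Waist: $Y_2 = \beta_{02} + \beta_{12}\mathrm{Sex} + \beta_{22}\mathrm{Age} + \beta_{32}\mathrm{Area} + \beta_{42}\mathrm{SNP} + \varepsilon_2$

Weight: $Y_3 = \beta_{03} + \beta_{13}\mathrm{Sex} + \beta_{23}\mathrm{Age} + \beta_{33}\mathrm{Area} + \beta_{43}\mathrm{SNP} + \varepsilon_3$

WHR: $Y_4 = \beta_{04} + \beta_{14}\mathrm{Sex} + \beta_{24}\mathrm{Age} + \beta_{34}\mathrm{Area} + \beta_{44}\mathrm{SNP} + \varepsilon_4$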

SLIDE 24

Obesity: Univariate Analysis Results

Number of significant genetic variants at a given level of α

P-value bins: p ≤ 10^-7 | 10^-7 < p ≤ 10^-6 | 10^-6 < p ≤ 10^-5 | 10^-5 < p ≤ 10^-4

BMI: 1, 6, 23
Waist: 7, 39
Weight: 3, 5, 32
WHR: 4, 7, 25

SLIDE 25

[Figure: univariate results, one panel each for BMI, Waist, Weight, and WHR]

SLIDE 26

Overlay Plot

Some SNPs have consistent significant effects on all four phenotypes

Want to confirm this by statistical testing

Want to know whether joint analysis (multivariate analysis) of all correlated phenotypes increases power

SLIDE 27

Results of Multivariate Analysis

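P-value bins: p ≤ 10^-7 | 10^-7 < p ≤ 10^-6 | 10^-6 < p ≤ 10^-5 | 10^-5 < p ≤ 10^-4

BMI: 1, 6, 23
Waist: 7, 39
Weight: 3, 5, 32
WHR: 4, 7, 25
Multivariate analysis: 53, 48, 89, 220

Breakdown of the 53 multivariate variants with p ≤ 10^-7

p ≤ 10^-12: 2
10^-12 < p ≤ 10^-10: 3
10^-10 < p ≤ 10^-8: 20
10^-8 < p ≤ 10^-7: 28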

SLIDE 28

Metabolic syndrome related traits

Phenotypes

Waist circumference (WC)

Systolic blood pressure (SBP)

Diastolic blood pressure (DBP)

Triglyceride (TG) → ln transformed

High-density lipoprotein (HDLc) → −ln transformed

Fasting plasma glucose (FPG) → ln transformed

SLIDE 29

Correlation matrix

        WC      SBP     DBP     TG      HDLc    FPG
WC      1       0.293   0.320   0.355   0.287   0.188
SBP     0.293   1       0.812   0.204   0.024   0.143
DBP     0.320   0.812   1       0.218   0.044   0.141
TG      0.355   0.204   0.218   1       0.439   0.181
HDLc    0.287   0.024   0.044   0.439   1       0.048
FPG     0.188   0.143   0.141   0.181   0.048   1

SLIDE 30

Model and covariates

Covariates

Sex, Age and Area

Genetic mode

Additive mode

Univariate approach

Six traits

$y_i = \beta_0 + \beta_1 \mathrm{Sex}_i + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{Area}_i + \beta_4 \mathrm{SNP}_i + \varepsilon_i$

SLIDE 31

Results from univariate approach

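[Figure: six univariate result panels, each annotated with its number of genome-wide significant SNPs]

P-value < 5×10^-8: 0 SNPs

P-value < 5×10^-8: 2 SNPs

P-value < 5×10^-8: 1 SNP

P-value < 5×10^-8: 34 SNPs

P-value < 5×10^-8: 19 SNPs

P-value < 5×10^-8: 1 SNP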

SLIDE 32

Overlay plot: Univariate approach

Some SNPs show a consistent significant pattern across the phenotypes

To improve power, conduct a joint analysis

Multivariate analysis by considering all correlated phenotypes

SLIDE 33

KARE result of multivariate approach

P-value <5x10-8: 52 SNPs

SLIDE 34

Overlay plot: Univariate and Multivariate approaches

SLIDE 35

KARE Results

Number of significant genetic variants at a given level of α

P-value bins: p ≤ 10^-9 | 10^-9 < p ≤ 10^-8 | 10^-8 < p ≤ 10^-7 | 10^-7 < p ≤ 10^-6 | 10^-6 < p ≤ 10^-5

WC: 8
SBP: 1, 1, 3
DBP: 1, 1, 3
HDL: 28, 6, 1, 11
TG: 18, 3, 13, 18
FPG: 4, 5, 24
Multivariate: 38, 8, 10, 17, 25

At α = 5×10^-8

A total of 41 variants in eleven chromosomal regions were identified for at least one of the six phenotypes (univariate approach)

52 variants in twelve chromosomal regions passed genome-wide significance (multivariate approach)

Among them, 9 chromosomal regions were identified by both the univariate and multivariate approaches.

SLIDE 36

Results from Multivariate approach

Each entry: Locus, SNP, Nearby gene, MAF (minor allele); multivariate F-statistic and P; then univariate Beta and P for each phenotype

2p23.3a  rs780094  GCKR  0.463 (C)  Multivariate: F = 13.997, P = 6.45×10^-16
  WC    0.002  8.77×10^-01    HDLc  0.012  4.33×10^-01
  SBP   0.003  8.34×10^-01    TG    0.102  1.46×10^-11
  DBP   0.003  8.16×10^-01    FPG   0.044  3.38×10^-03

8p21.3c  rs10503669  LDL  0.124 (A)  Multivariate: F = 20.157, P = 1.54×10^-23
  WC    0.017  4.28×10^-01    HDLc  0.196  6.42×10^-18
  SBP   0.031  1.46×10^-01    TG    0.178  3.21×10^-15
  DBP   0.018  4.10×10^-01    FPG   0.021  3.57×10^-01

8q24.13e  rs2001945  (no nearby gene listed)  0.416 (G)  Multivariate: F = 7.965, P = 1.38×10^-08
  WC    0.004  7.66×10^-01    HDLc  0.007  6.25×10^-01
  SBP   0.004  8.00×10^-01    TG    0.077  2.77×10^-07
  DBP   0.007  6.19×10^-01    FPG   0.031  3.73×10^-02

9q31.1d  rs12686004  ABCA1  0.214 (A)  Multivariate: F = 11.720, P = 4.00×10^-13
  WC    0.016  3.69×10^-01    HDLc  0.125  1.03×10^-11
  SBP   0.028  9.67×10^-02    TG    0.000  9.99×10^-01
  DBP   0.016  3.55×10^-01    FPG   0.033  7.17×10^-02

11q12.2b  rs174547  FADS1  0.324 (C)  Multivariate: F = 9.106, P = 5.91×10^-10
  WC    0.008  5.91×10^-01    HDLc  0.055  6.32×10^-04
  SBP   0.020  1.86×10^-01    TG    0.068  2.08×10^-05
  DBP   0.005  7.54×10^-01    FPG   0.073  4.42×10^-06

12q24.11d  rs12229654  MYL2  0.144 (G)  Multivariate: F = 18.168, P = 4.52×10^-21
  WC    0.092  1.29×10^-05    HDLc  0.132  1.04×10^-09
  SBP   0.066  1.03×10^-03    TG    0.057  8.43×10^-03
  DBP   0.071  6.78×10^-04    FPG   0.098  5.04×10^-06

SLIDE 37

Results from Multivariate approach

Each entry: Locus, SNP, Nearby gene, MAF (minor allele); multivariate F-statistic and P; then univariate Beta and P for each phenotype

12q24.12a  rs2238153  ATXN2  0.460 (A)  Multivariate: F = 7.754, P = 2.46×10^-08
  WC    0.033  2.79×10^-02    HDLc  0.066  1.70×10^-05
  SBP   0.037  9.84×10^-03    TG    0.026  8.45×10^-02
  DBP   0.021  1.67×10^-01    FPG   0.024  1.15×10^-01

12q24.13a  rs11066280  C12orf51  0.173 (A)  Multivariate: F = 24.880, P = 2.02×10^-29
  WC    0.089  4.86×10^-06    HDLc  0.145  5.81×10^-13
  SBP   0.074  6.49×10^-05    TG    0.077  1.03×10^-04
  DBP   0.079  5.17×10^-05    FPG   0.086  1.49×10^-05

12q24.13b  rs2072134  OAS3  0.115 (A)  Multivariate: F = 13.942, P = 7.55×10^-16
  WC    0.070  2.50×10^-03    HDLc  0.143  1.88×10^-09
  SBP   0.077  4.40×10^-04    TG    0.049  3.69×10^-02
  DBP   0.077  8.30×10^-04    FPG   0.066  4.95×10^-03

15q22.1b  rs16940212  (no nearby gene listed)  0.340 (T)  Multivariate: F = 17.299, P = 5.41×10^-20
  WC    0.002  9.18×10^-01    HDLc  0.132  1.37×10^-16
  SBP   0.012  4.25×10^-01    TG    0.020  2.11×10^-01
  DBP   0.005  7.25×10^-01    FPG   0.002  8.94×10^-01

16p12.3c  rs7197218  XYLT1  0.014 (G)  Multivariate: F = 7.935, P = 1.49×10^-08
  WC    0.033  6.07×10^-01    HDLc  0.191  3.69×10^-03
  SBP   0.012  4.25×10^-01    TG    0.106  1.04×10^-01
  DBP   0.061  3.34×10^-01    FPG   0.352  6.29×10^-08

16q13b  rs6499863  CETP  0.104 (A)  Multivariate: F = 9.116, P = 5.76×10^-10
  WC    0.014  5.60×10^-01    HDLc  0.152  1.06×10^-09
  SBP   0.003  8.99×10^-01    TG    0.005  8.25×10^-01
  DBP   0.030  2.18×10^-01    FPG   0.016  5.09×10^-01

SLIDE 38

Reported loci in GWAS catalog

Each entry: Locus, SNP, Nearby gene, MAF (minor allele): GWAS catalog reports

2p23.3a  rs780094  GCKR  0.463 (C): Manning et al. (Nat Genet, 2012): fasting glucose; Kristiansson et al. (Circ Cardiovasc Genet, 2012): metabolic syndrome; Suhre et al. (Nature, 2012): metabolic trait; Dupuis et al. (Nat Genet, 2010): fasting glucose and insulin; Aulchenko et al. (Nat Genet, 2008): TG; Wallace et al. (Am J Hum Genet, 2008): LDL; Kim et al. (Nat Genet, 2011): metabolite levels, TG

8p21.3c  rs10503669  LDL  0.124 (A): Kim et al. (Nat Genet, 2011), Willer et al. (Nat Genet, 2008): HDL, TG

8q24.13e  rs2001945  0.416 (G): Kim et al. (Nat Genet, 2011): TG

9q31.1d  rs12686004  ABCA1  0.214 (A): Kim et al. (Nat Genet, 2011): HDL

11q12.2b  rs174547  FADS1  0.324 (C): Han et al. (Bone, 2012): appendicular lean mass; Kettunen et al. (Nat Genet, 2012): human serum metabolite levels; Suhre et al. (Nature, 2011), Illig et al. (Nat Genet, 2010): metabolism; Kathiresan et al. (Nat Genet, 2008): HDL, TG

Newly identified in Korean population from this study

12q24.11d  rs12229654  MYL2  0.144 (G): Kim et al. (Nat Genet, 2011): HDL, GGT; Go et al. (J Hum Genet): T2D

12q24.13a  rs11066280  C12orf51  0.173 (A): Kim et al. (Nat Genet, 2011): metabolite levels; Kato et al. (Nat Genet, 2011): BP

12q24.13b  rs2072134  OAS3  0.115 (A): Kim et al. (Nat Genet, 2011): HDL

15q22.1b  rs16940212  0.340 (T): Kim et al. (Nat Genet, 2011): HDL

16q13b  rs6499863  CETP  0.104 (A): Ridker et al. (Circ Cardiovasc Genet, 2009): HDL; Kathiresan et al. (Nat Genet, 2008): HDL, TG; Saxena et al. (Science, 2007): TG; Kim et al. (Nat Genet, 2011): HDL

12q24.12a  rs2238153  ATXN2  0.460 (A): Newly identified from this study

16p12.3c  rs7197218  XYLT1  0.014 (G): Newly identified from this study

SLIDE 39

Simulation Study

Performance comparison between Univariate and Multivariate

approach

Effect of

Correlation between phenotypes

Minor allele frequency

Genetic effect size (coefficient of genetic variants)

Association direction of genetic variants: same vs. different

Type I error

Power

SLIDE 41

Simulation settings (1)

Single SNP association

Minor allele frequency

MAF: 0.01 ~ 0.3

SNP generation

Under HWE assumption

Marginal correlation coefficient between phenotypes

Number of phenotypes: 2, 4, 8 (quantitative responses)

Correlation = {0, 0.25, 0.5, 0.75}

Genetic effect size (β)

Association direction

Correlated phenotypes generated from a multivariate normal distribution
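The data-generating steps above can be sketched as follows. This is a simplified illustration, not the authors' simulation code: it assumes unit residual variances and a single common correlation ρ between every pair of traits, which lets the multivariate normal draw be built from one shared and one trait-specific normal variate. All names are hypothetical.

```python
import math
import random

def simulate_genotype(maf, rng):
    """Minor-allele count (0, 1, or 2) under HWE: two independent allele draws."""
    return (rng.random() < maf) + (rng.random() < maf)

def simulate_traits(genotype, betas, rho, rng):
    """One subject's traits: beta_j * genotype plus normal residuals with
    unit variance and common pairwise correlation rho (0 <= rho < 1)."""
    shared = rng.gauss(0, 1)
    return [b * genotype
            + math.sqrt(rho) * shared
            + math.sqrt(1.0 - rho) * rng.gauss(0, 1)
            for b in betas]

def sample_correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical setting: MAF = 0.3, two traits, rho = 0.5, no genetic effect.
rng = random.Random(42)
genotypes = [simulate_genotype(0.3, rng) for _ in range(20000)]
traits = [simulate_traits(g, [0.0, 0.0], 0.5, rng) for g in genotypes]
est_maf = sum(genotypes) / (2 * len(genotypes))
est_rho = sample_correlation([t[0] for t in traits], [t[1] for t in traits])
```

A general (non-exchangeable) covariance matrix would instead need a Cholesky factor of Σ applied to independent normal draws.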

SLIDE 42

Simulation settings (2)

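Heterogeneous model when the number of traits is 8

$y_{ij} = \beta_{0j} + \beta_{1j} x_i + \varepsilon_{ij}, \quad j = 1, 2, \ldots, 8$

$\beta_{11} = \beta_{12} = \beta_{13} = \beta_{14}, \quad \beta_{15} = \beta_{16} = \beta_{17} = \beta_{18}, \quad \beta_{11} = -\beta_{15}$

Homogeneous model

Common genetic effect

$y_{ij} = \beta_{0j} + \beta_{1} x_i + \varepsilon_{ij}, \quad j = 1, 2, \ldots, 8$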
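The two effect patterns can be written as slope vectors over the eight traits. The helper names below are illustrative assumptions; the pattern itself (first four traits with effect β, last four with effect −β in the heterogeneous case) follows the constraints stated above.

```python
def heterogeneous_effects(beta, m=8):
    """Slopes with opposite signs: the first half of the traits get +beta,
    the second half get -beta."""
    half = m // 2
    return [beta] * half + [-beta] * (m - half)

def homogeneous_effects(beta, m=8):
    """A common slope beta shared by all m traits."""
    return [beta] * m

def trait_means(intercepts, slopes, genotype):
    """Expected trait values for one subject with the given genotype."""
    return [b0 + b1 * genotype for b0, b1 in zip(intercepts, slopes)]

hetero = heterogeneous_effects(0.1)
homo = homogeneous_effects(0.1)
```

Either vector can be passed as the per-trait slopes when generating phenotypes for a simulated subject.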

SLIDE 43

Type I error rates and Power calculations

Empirical type I error rate (false positive error rates)

Null hypothesis for univariate analysis: $\beta_{11} = 0$ (or $\beta_{15} = 0$)

Null hypothesis for heterogeneous model: $(\beta_{11}, \beta_{12}, \ldots, \beta_{18}) = (0, 0, \ldots, 0)$

Null hypothesis for homogeneous model: $\beta_1 = 0$

Generating 100,000 replicates for 1,000 samples

Four correlated phenotypes

Generating 10,000,000 replicates for 10,000 samples

Empirical power (true positive rate)

Various nonzero values of $\beta_{11}$ and $\beta_{15}$ for heterogeneous model

Various nonzero values of $\beta_1$ for homogeneous model

Generating 1,000 replicates for 1,000 samples
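Empirical type I error and empirical power are both the proportion of replicates whose p-value falls at or below α, computed under the null and under the alternative respectively. The sketch below shows that calculation and the Monte Carlo standard error of such a proportion; the function names are assumptions for illustration.

```python
import math

def empirical_rate(p_values, alpha):
    """Proportion of replicates rejected at level alpha. With null data this
    is the empirical type I error rate; with non-null data, empirical power."""
    return sum(p <= alpha for p in p_values) / len(p_values)

def mc_standard_error(rate, n_replicates):
    """Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)

def expected_rejections(alpha, n_replicates):
    """Expected number of null rejections when the test is well calibrated."""
    return alpha * n_replicates
```

For instance, with 100,000 replicates a well-calibrated test is expected to produce about one rejection at α = 10^-5, so estimated rates at stricter levels rest on very few events and need many more replicates to be informative.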

SLIDE 44

Type I error rates (1)

– two phenotypes, MAF=0.15 (β11,β15)=(0,0)

α       Correlation (ρ)   Univariate 1   Univariate 2   Multivariate
0.05    0                 5.23×10^-02    4.98×10^-02    5.03×10^-02
0.05    0.25              4.88×10^-02    4.97×10^-02    4.95×10^-02
0.05    0.5               5.02×10^-02    5.00×10^-02    5.05×10^-02
0.05    0.75              4.95×10^-02    4.91×10^-02    5.01×10^-02
0.01    0                 1.02×10^-02    9.90×10^-03    9.58×10^-03
0.01    0.25              9.66×10^-03    1.02×10^-02    1.01×10^-02
0.01    0.5               1.03×10^-02    9.53×10^-03    9.75×10^-03
0.01    0.75              1.02×10^-02    1.04×10^-02    1.01×10^-02
10^-3   0                 9.50×10^-04    8.50×10^-04    8.30×10^-04
10^-3   0.25              9.10×10^-04    1.04×10^-03    9.10×10^-04
10^-3   0.5               8.80×10^-04    1.04×10^-03    8.50×10^-04
10^-3   0.75              1.07×10^-03    1.02×10^-03    1.05×10^-03
10^-4   0                 9.00×10^-05    7.00×10^-05    1.10×10^-04
10^-4   0.25              6.00×10^-05    1.60×10^-04    5.00×10^-05
10^-4   0.5               1.10×10^-04    1.20×10^-04    1.20×10^-04
10^-4   0.75              1.30×10^-04    1.30×10^-04    1.00×10^-04

SLIDE 45

α       Correlation (ρ)   Univariate 1   Univariate 2   Multivariate
10^-5   0                 2.00×10^-05    2.00×10^-05    2.00×10^-05
10^-5   0.25              0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-5   0.5               1.00×10^-05    2.00×10^-05    2.00×10^-05
10^-5   0.75              2.00×10^-05    3.00×10^-05    1.00×10^-05
10^-6   0                 0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-6   0.25              0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-6   0.5               0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-6   0.75              1.00×10^-05    0.00×10^+00    0.00×10^+00
10^-7   0                 0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-7   0.25              0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-7   0.5               0.00×10^+00    0.00×10^+00    0.00×10^+00
10^-7   0.75              0.00×10^+00    0.00×10^+00    0.00×10^+00

SLIDE 46

Type I error rates (2)

– Four phenotypes with 10^8 replicates

MAF    α       Univariate 1   Univariate 2   Univariate 3   Univariate 4   Multivariate
0.01   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.01   0.01    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.01   10^-3   1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03
0.01   10^-4   9.93×10^-05    1.01×10^-04    1.00×10^-04    1.00×10^-04    1.00×10^-04
0.01   10^-5   1.02×10^-05    9.64×10^-06    1.02×10^-05    1.06×10^-05    1.05×10^-05
0.01   10^-6   9.50×10^-07    1.13×10^-06    9.80×10^-07    1.11×10^-06    8.80×10^-07
0.01   10^-7   7.00×10^-08    1.30×10^-07    1.00×10^-07    9.00×10^-08    1.10×10^-07
0.01   10^-8   1.00×10^-08    2.00×10^-08    1.00×10^-08    3.00×10^-08    1.00×10^-08
0.03   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.03   0.01    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.03   10^-3   1.01×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03
0.03   10^-4   1.00×10^-04    9.98×10^-05    9.96×10^-05    1.01×10^-04    1.01×10^-04
0.03   10^-5   1.04×10^-05    9.95×10^-06    9.74×10^-06    1.02×10^-05    1.03×10^-05
0.03   10^-6   1.10×10^-06    1.09×10^-06    9.30×10^-07    1.00×10^-06    1.27×10^-06
0.03   10^-7   1.10×10^-07    7.00×10^-08    6.00×10^-08    1.40×10^-07    1.80×10^-07
0.03   10^-8   1.00×10^-08    0.00×10^+00    0.00×10^+00    3.00×10^-08    1.00×10^-08

SLIDE 47

MAF    α       Univariate 1   Univariate 2   Univariate 3   Univariate 4   Multivariate
0.05   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.05   0.01    9.98×10^-03    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.05   10^-3   9.96×10^-04    9.97×10^-04    9.98×10^-04    1.00×10^-03    1.00×10^-03
0.05   10^-4   9.80×10^-05    9.95×10^-05    1.01×10^-04    1.02×10^-04    1.02×10^-04
0.05   10^-5   9.67×10^-06    1.06×10^-05    1.02×10^-05    1.01×10^-05    9.87×10^-06
0.05   10^-6   9.30×10^-07    9.60×10^-07    1.00×10^-06    8.90×10^-07    1.01×10^-06
0.05   10^-7   3.00×10^-08    6.00×10^-08    1.30×10^-07    8.00×10^-08    1.70×10^-07
0.05   10^-8   0.00×10^+00    2.00×10^-08    2.00×10^-08    4.00×10^-08    1.00×10^-08
0.07   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.07   0.01    1.00×10^-02    1.00×10^-02    9.99×10^-03    9.99×10^-03    1.00×10^-02
0.07   10^-3   1.00×10^-03    1.00×10^-03    9.99×10^-04    9.97×10^-04    1.00×10^-03
0.07   10^-4   1.01×10^-04    9.99×10^-05    9.93×10^-05    9.91×10^-05    1.02×10^-04
0.07   10^-5   1.01×10^-05    1.00×10^-05    1.03×10^-05    1.06×10^-05    1.03×10^-05
0.07   10^-6   9.30×10^-07    8.70×10^-07    9.70×10^-07    9.70×10^-07    1.06×10^-06
0.07   10^-7   1.20×10^-07    9.00×10^-08    7.00×10^-08    1.40×10^-07    1.40×10^-07
0.07   10^-8   2.00×10^-08    3.00×10^-08    0.00×10^+00    0.00×10^+00    1.00×10^-08
0.09   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.09   0.01    9.99×10^-03    9.99×10^-03    1.00×10^-02    9.99×10^-03    1.00×10^-02
0.09   10^-3   1.01×10^-03    9.99×10^-04    9.99×10^-04    1.00×10^-03    1.00×10^-03
0.09   10^-4   1.02×10^-04    1.00×10^-04    1.01×10^-04    1.00×10^-04    9.85×10^-05
0.09   10^-5   1.03×10^-05    1.00×10^-05    1.06×10^-05    1.01×10^-05    9.44×10^-06
0.09   10^-6   9.70×10^-07    1.05×10^-06    1.03×10^-06    1.22×10^-06    1.00×10^-06
0.09   10^-7   1.10×10^-07    1.40×10^-07    1.40×10^-07    1.80×10^-07    9.00×10^-08
0.09   10^-8   0.00×10^+00    1.00×10^-08    1.00×10^-08    3.00×10^-08    1.00×10^-08

SLIDE 48

Power (1)

Same genetic effect size: β11=β15= 0.1

α=0.05

SLIDE 49

Power (2)

Opposite directions of association: (β11,β15)=(-0.05,0.05)

α=0.05

SLIDE 50

Power (3)

Opposite directions of association: (β11,β15)=(-0.1,0.1)

[Figure panels: α=0.05, α=0.01, α=10^-4, α=10^-6]

SLIDE 51

Power (4)

Homogeneous effect: MultiPhen vs. Multivariate linear model

α=0.05

SLIDE 52

Summary of simulation studies

Multivariate analysis preserves type I error

Multivariate analysis is more powerful than univariate analysis

Low correlation among the phenotypes

Homogeneous multivariate model: same direction with same effect sizes

Common slope model

Heterogeneous multivariate model: different association directions for the phenotypes

Possible false-positive error

SLIDE 53

Summary

Multivariate approach

Analyzing several phenotypes simultaneously

Considering correlation between related phenotypes in one model

Suitable to detect pleiotropic genes

Reducing multiple comparison problems by reducing the number of tests

Improving power for identifying genetic variants with related phenotypes in GWAS

SLIDE 55

Summary

More powerful than univariate analysis

Low correlation between phenotypes

Same association direction with similar genetic effect

Common slope model can improve power

Different association direction

Should be carefully investigated to avoid any false-positive errors

Can be applied to

Repeated measures in cohort studies

Rare variant studies from next generation sequencing (NGS) data

SLIDE 56

Multivariate + multiple analysis

Focus on multiple related phenotypes and multiple SNPs

SNP K

… …

SNP-SNP Interaction

SNP J SNP I SNP 1 SNP 2

accumulated additive effects on multiple SNPs

Trait 1 Trait 2 Trait 3 Trait 4 Trait 5

Related phenotypes

SNP 500K

Extensions

SLIDE 57

Extensions

Choi J, Park T. (2013). Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions. BMC Syst Biol.

Yu W, Kwon MS, Park T. (2015). Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions. Hum Hered.

Won S, Kim W, Lee S, Lee Y, Sung J, Park T. (2015). Family-based association analysis: a fast and efficient method of multivariate association analysis with multiple variants. BMC Bioinformatics.

Yu W, Lee S, Park T. (2016). A unified multifactor dimensionality reduction framework for detecting gene-gene interactions. Bioinformatics.

Lee S, Kim Y, Park T. (2016). Rare Variant Association Test with Multiple Phenotypes. Genetic Epi (in press).

SLIDE 58

Acknowledgement

This work was supported by a grant funded by the Bio-Synergy Research Project (2013M3A9C4078158) of the Ministry of Science, ICT, and Future Planning, also through the NRF, Korea, and by an NRF grant (2015R1A5A6001906).

SLIDE 59

Thank you!!!

SLIDE 60

Conclusion

52 variants in twelve chromosomal regions were identified by the multivariate approach at α = 5×10^-8

The identified chromosomal regions have been reported to be associated with lipid, diabetes-related, and metabolic phenotypes in previous GWAS

Three chromosomal regions were newly identified

Simulation studies

The homogeneous genetic effect model is more efficient and provides more power

when the directions of the genetic effects for the phenotypes are the same and of similar magnitude

The multivariate approach provides higher power than the univariate approach

when the directions of the genetic effects for the phenotypes are different

It is important to understand the underlying biological background between gene and phenotype (∵ conditional association between genetic variants and phenotypes)

SLIDE 61

Acknowledgement

Bioinformatics and Biostatistics Lab., SNU
  Sohee Oh, Jaehoon Lee, Kyunga Kim, Dankyu Yoon, Min-Seok Kwon, Junghyun Namkung

Center for Genome Science, KNIH, KCDC
  Yoon Shin Cho, Min Jin Go, Young Jin Kim
  Hyung-Lae Kim, Bok-Ghee Han, Jong-Young Lee

KARE Consortium
  Bermseok Oh, Kyung Hee University

DNA Link Co., Korea
  Jong-Eun Lee

Cohort PIs
  Nam Han Cho, Chol Shin

SLIDE 62

Korea

Geographical Location of the Cohorts

KARE

SLIDE 63

KARE: Result

SNP Clinical Data

2009 Nature genetics

Detection of 11 SNPs influencing traits in the Korean population

Blood pressure, pulse rate, BMI, height, waist-hip ratio, bone mineral density

KARE

SLIDE 64

GWAS meta-analysis using KARE

Nature, 2010

Detection of 95 loci influencing traits in a 100K European population and replication study in non-European populations (East Asians, South Asians, and African Americans)

Total cholesterol (TC), LDL-C, HDL-C, TG

Identifying potential novel drug targets for treatment of extreme lipid phenotypes and prevention of coronary artery disease (CAD)

SNP Clinical Data Lipid Traits European /Non-European

Biological, Clinical, and Population Relevance of 95 Loci Mapped for Serum Lipid Concentrations

Tanya M. Teslovich1,118, Kiran Musunuru2,3,4,5,6,118, Albert V. Smith7,8, Andrew C. Edmondson9,10, Ioannis M. Stylianou10, Masahiro Koseki11, James P . Pirruccello2,5,6, Samuli Ripatti12,13, ….. , Yoon Shin Cho29, Min Jin Go29, Young Jin Kim29, Jong-Young Lee29, Taesung Park30, Kyunga J. Kim31,32, ..... , Gonçalo R. Abecasis1,119, Michael Boehnke1,119, Sekar Kathiresan2,3,4,5,119

KARE

SLIDE 65

Results

SLIDE 66

Quantile-Quantile plot from a multivariate approach

SLIDE 67

Quantile-Quantile plots from univariate approaches

SLIDE 68

Quantile-Quantile plots

SLIDE 69

Genetic association study approaches

One trait + Single genetic variant

One trait + Multiple genetic variants

Multiple traits + Single genetic variant

Multiple traits + Multiple genetic variants

Genetic Association Study

SLIDE 70

Functional enrichment analysis

Functional enrichment analysis for the identified genes from multivariate analysis (DAVID)

Enrichment score: 1.48

Term                               Genes                       Count   P-value   Benjamini
Lipid metabolism                   ABCA1, CETP, FADS1          3       8.8E-3    4.0E-1
Small molecule metabolic process   ABCA1, CETP, FADS1, GCKR    4       2.8E-2    9.5E-1
Transport                          ABCA1, CETP, FADS1          3       1.4E-1    8.4E-1


Contents

1 Introduction

2 Multivariate analysis

3 Application: Korean Association REsource (KARE) Project

4 Simulation Study

5 Conclusion
