ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES


GIW 2016 ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES Taesung Park 1 Sohee Oh 1 , Iksoo Huh 1 , and Seung-Yeoun Lee 2 1 Department of Statistics, Seoul National University, South Korea 2 Department of Applied


slide-1
SLIDE 1

ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES

Taesung Park1

Sohee Oh1, Iksoo Huh1, and Seung-Yeoun Lee2

1 Department of Statistics, Seoul National University, South Korea
2 Department of Applied Statistics, Sejong University, South Korea


GIW 2016

slide-2
SLIDE 2

Contents

1. Introduction
2. Multivariate analysis
3. Application: Korean Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

slide-3
SLIDE 3

Genome Wide Association Studies (GWAS)

— Studies of genetic variation across the entire genome
— Single Nucleotide Polymorphism (SNP)
  • DNA sequence variations that occur when a single nucleotide is altered
— Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition
— Rely on SNP chip technologies

slide-4
SLIDE 4

Genome Wide Association Studies (GWAS)

— Successful in complex traits and diseases

  • height, body mass index, blood pressure
  • asthma, cancer, diabetes, heart disease and mental illnesses

slide-5
SLIDE 5

Association test

— Univariate and single SNP analysis

— Focus on one trait and single SNP

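[Diagram: SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 1M, each tested one at a time against one trait (Trait 1)]

$$y_i = \beta_0 + \beta_1\,\mathrm{Sex}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{SNP}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$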
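The single-SNP model on this slide is an ordinary linear regression of one trait on sex, age and the SNP genotype. A minimal Python sketch of one such test is given below, assuming NumPy is available; the function name `single_snp_t_stat`, the simulated data and the effect sizes are hypothetical and only illustrate the mechanics, not the authors' actual pipeline.

```python
import numpy as np

def single_snp_t_stat(y, sex, age, snp):
    """Fit y = b0 + b1*sex + b2*age + b3*snp + e by least squares and
    return the SNP coefficient together with its t-statistic."""
    X = np.column_stack([np.ones(len(y)), sex, age, snp])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]            # residual degrees of freedom
    sigma2 = resid @ resid / dof             # estimate of sigma^2
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    se_snp = np.sqrt(cov_beta[3, 3])         # standard error of b3
    return beta[3], beta[3] / se_snp

# Hypothetical simulated data (values chosen only for illustration).
rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n)
age = rng.uniform(40, 70, n)
snp = rng.binomial(2, 0.3, n)                # additive coding: 0, 1 or 2 minor alleles
y = 1.0 + 0.5 * sex + 0.02 * age + 0.3 * snp + rng.normal(0, 1, n)

b3, t_stat = single_snp_t_stat(y, sex, age, snp)
print(f"SNP effect estimate: {b3:.3f}, t-statistic: {t_stat:.2f}")
```

In a GWAS this fit would be repeated for every SNP, and each t-statistic converted to a p-value and compared with the genome-wide threshold.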

slide-6
SLIDE 6

Improving power

— Common complex traits are related to many genes
— It is not easy to identify genetic variants with high significance at α = 5×10^-8
— Further, these variants explain only a small fraction of disease etiology
— Need to develop more powerful methods for identifying genetic variants
  • Meta-analysis by increasing sample size
  • Multiple SNP analysis: gene-gene interaction
  • Joint analysis with the correlated phenotypes
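The threshold α = 5×10^-8 quoted on this slide is conventionally motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent common-variant tests. The short Python calculation below reproduces that arithmetic; the one-million figure is a widely used convention assumed here for illustration, not a number taken from these slides.

```python
import math

# Assumed convention: about one million independent common-variant tests
# genome-wide (an illustrative figure, not stated on the slides).
family_wise_alpha = 0.05
n_independent_tests = 1_000_000

# Bonferroni correction: divide the family-wise level by the number of tests.
per_test_alpha = family_wise_alpha / n_independent_tests

print(f"per-test alpha = {per_test_alpha:.1e}")
```

Such a strict per-test level is why variants with modest effects are hard to detect, which motivates the power-improving strategies listed above.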

slide-7
SLIDE 7

Association test

— Univariate + multiple SNP analysis
  • Focus on one trait and multiple SNPs

[Diagram: multiple SNPs (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 500K) analyzed against one trait (Trait 1), through SNP-SNP interaction and accumulated additive effects of multiple SNPs]

slide-8
SLIDE 8

Multivariate approach

— Multivariate analysis

— Focus on multiple related traits and single SNP

[Diagram: SNPs (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 1M) tested one at a time against a set of related traits (Trait 1 to Trait 5)]

slide-9
SLIDE 9

Multivariate approach

— Examples: multiple related phenotypes
  • Obesity: BMI, waist circumference, weight, WHR, body fat
  • Hyperlipidemia: total cholesterol, HDL/LDL cholesterol, triglyceride
  • Metabolic syndrome: waist circumference, triglyceride, HDL cholesterol, blood pressure (SBP, DBP), insulin resistance

slide-10
SLIDE 10

Multivariate approach

— Existing methods
  1) MultiPhen (O'Reilly et al., 2012): proportional odds model
  2) Efficient algorithm for GWAS (Zhou and Stephens, 2014): linear mixed model

slide-11
SLIDE 11

Contents

1. Introduction
2. Multivariate analysis
3. Application: Korean Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

slide-12
SLIDE 12

Multivariate approach

— Identify genetic variants associated with multiple related traits
— Extension of the univariate linear model to the multivariate linear model with a response vector
  • Univariate variances are replaced by a covariance matrix
— Joint analysis
  • Analyze several traits simultaneously
  • Account for the correlation structure of multiple traits in the model
  • Different-slopes (SNP effects) model for each trait: different association directions => Heterogeneous model
  • Common-slope (SNP effect) model: same association direction with similar effect sizes => Homogeneous model

slide-13
SLIDE 13

Multivariate general linear model (1)

— Let $y_{ij}$ denote the value of trait $j$ from subject $i$, for $i = 1, \dots, n$ and $j = 1, \dots, m$
— The linear model for trait $j$:

$$y_{ij} = \sum_{k=1}^{p} x_{ik}\,\beta_{kj} + \varepsilon_{ij} = x_i^T \beta_j + \varepsilon_{ij}$$

— $x_i = (x_{i1}, \dots, x_{ip})^T$ is a vector of SNPs and covariates
— $\beta_j = (\beta_{1j}, \dots, \beta_{pj})^T$ is a vector of $p$ unknown parameters
— $\beta_{kj}$ represents the effect of the $k$th SNP on trait $j$
— This model allows one SNP to have different effects on the traits

slide-14
SLIDE 14

The multivariate general linear model (2)

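— $y_i = (y_{i1}, \dots, y_{im})^T$ is a vector of $m$ responses from the $i$th subject
— $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{im})^T$ is a vector of $m$ residuals for the $i$th subject
— $\varepsilon_i \sim N_m(0_m, \Sigma)$
— The $nm \times 1$ vector

$$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} \sim N_{nm}\left(0_{nm},\; I_n \otimes \Sigma\right)$$

where $I_n$ denotes the $n \times n$ identity matrix and the operator $\otimes$ is the direct (Kronecker) product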
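The Kronecker structure on this slide says that the stacked error vector has a block-diagonal covariance: the same m × m matrix Σ repeated for each of the n subjects, with zero covariance between subjects. A small NumPy sketch follows; the values of `n`, `m` and `Sigma` are hypothetical and chosen only to keep the matrix readable.

```python
import numpy as np

n, m = 3, 2                                  # 3 subjects, 2 traits (toy sizes)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])               # hypothetical within-subject covariance

# Covariance of the stacked nm x 1 error vector: I_n (Kronecker product) Sigma.
V = np.kron(np.eye(n), Sigma)

print(V.shape)      # (6, 6)
print(V[:2, :2])    # first diagonal block equals Sigma
print(V[:2, 2:4])   # off-diagonal block is zero: subjects are independent
```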

slide-15
SLIDE 15

Multivariate general linear model (3)

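— Covariance (correlation) structure: the $m \times m$ matrix $\Sigma$
  • Specifies how the traits within a subject are related
— Unstructured (UN)

$$\Sigma_{UN} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

— Structured covariance
  • Compound Symmetry (CS)

$$\Sigma_{CS} = \begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix}$$

  • First-order autoregressive (AR(1))

$$\Sigma_{AR(1)} = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{m-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{m-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1}\sigma^2 & \rho^{m-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$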
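The three covariance structures named on this slide differ in how many free parameters they use: unstructured has m(m+1)/2, compound symmetry has two, and AR(1) has two. The NumPy sketch below builds one example of each; the helper names and all numeric values are hypothetical and serve only to show the patterns.

```python
import numpy as np

def compound_symmetry(m, sigma2, sigma1_2):
    """CS: sigma2 + sigma1_2 on the diagonal, sigma1_2 everywhere else."""
    return sigma2 * np.eye(m) + sigma1_2 * np.ones((m, m))

def ar1(m, sigma2, rho):
    """AR(1): entry (i, j) equals sigma2 * rho ** |i - j|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def unstructured(sd, corr):
    """UN: any valid covariance; built here from standard deviations and a correlation matrix."""
    sd = np.asarray(sd, dtype=float)
    return np.outer(sd, sd) * np.asarray(corr, dtype=float)

cs = compound_symmetry(3, sigma2=1.0, sigma1_2=0.5)
ar = ar1(3, sigma2=2.0, rho=0.5)
un = unstructured([1.0, 2.0, 3.0],
                  [[1.0, 0.3, 0.1],
                   [0.3, 1.0, 0.4],
                   [0.1, 0.4, 1.0]])
print(cs)
print(ar)
print(un)
```

The unstructured form makes no assumption about how the traits co-vary, at the cost of more parameters to estimate.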

slide-16
SLIDE 16

The multivariate general linear model (4)

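— Matrix formulation

$$Y = XB + E,$$

where

$$\underset{(n \times m)}{Y} = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix} = \begin{pmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{pmatrix} \quad \text{(data matrix)}$$

$$\underset{(n \times p)}{X} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} \quad \text{(known design matrix)}$$

$$\underset{(p \times m)}{B} = (\beta_1, \dots, \beta_m) = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & \ddots & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} \quad \text{(parameter matrix)}$$

$$\underset{(n \times m)}{E} = \begin{pmatrix} \varepsilon_1^T \\ \vdots \\ \varepsilon_n^T \end{pmatrix} = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1m} \\ \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{nm} \end{pmatrix} \quad \text{(matrix of random errors)}$$

and

$$E(Y) = XB, \qquad \mathrm{Var}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = I_n \otimes \Sigma$$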
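In the matrix form Y = XB + E, the least-squares estimate of B is (XᵀX)⁻¹XᵀY and Σ can be estimated from the residual cross-products. The sketch below shows this with NumPy on simulated data; the function name `fit_multivariate_lm`, the design matrix and the parameter values are hypothetical, and dividing the residual cross-products by n − p is one common choice assumed here rather than taken from the slides.

```python
import numpy as np

def fit_multivariate_lm(X, Y):
    """Least-squares fit of Y = X B + E; returns (B_hat, Sigma_hat)."""
    n, p = X.shape
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # solves for all traits at once
    residuals = Y - X @ B_hat
    Sigma_hat = residuals.T @ residuals / (n - p)   # m x m residual covariance
    return B_hat, Sigma_hat

# Hypothetical data: intercept, one SNP (0/1/2) and one continuous covariate.
rng = np.random.default_rng(1)
n, m = 200, 3
X = np.column_stack([np.ones(n), rng.binomial(2, 0.3, n), rng.normal(size=n)])
B_true = np.array([[1.0, 2.0, 3.0],
                   [0.2, 0.0, -0.2],      # SNP effects, one per trait
                   [0.5, 0.5, 0.5]])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

B_hat, Sigma_hat = fit_multivariate_lm(X, Y)
print(np.round(B_hat, 2))
print(np.round(Sigma_hat, 2))
```

Each column of `B_hat` coincides with the separate univariate least-squares fit for that trait; the joint model differs from the univariate analyses through Σ, which enters the multivariate test statistics.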

slide-17
SLIDE 17

The multivariate general linear model (5)

— Consider related phenotypes simultaneously
— Allow for correlation between phenotypes in the model
— Detect genetic variants which have modest effects in the univariate approach
— Provide some chances to capture pleiotropic genes
— Model
  • Heterogeneous model with separate slopes (different genetic effects on each phenotype)
  • Homogeneous model with common slope (same genetic effect on all phenotypes)
  • Unstructured variance-covariance structure
— Test statistics
  • Wilks' Λ statistic

$$\Lambda = \frac{|E|}{|E + H|} = \prod_{i=1}^{k} \frac{1}{1 + \lambda_i}$$
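In the Wilks' Λ statistic, E is conventionally the error sum-of-squares-and-cross-products (SSCP) matrix of the full model, H is the hypothesis SSCP matrix, and the λ_i are the eigenvalues of E⁻¹H. The NumPy sketch below computes Λ for the effect of one SNP on several traits by comparing models with and without the SNP, and checks that the determinant and eigenvalue forms agree. The function names and simulated data are hypothetical. The conversion to an F-statistic uses the standard exact result for a one-degree-of-freedom hypothesis and is an assumption about how such a test can be carried out, not a statement of the authors' implementation.

```python
import numpy as np

def residual_sscp(X, Y):
    """Residual sum-of-squares-and-cross-products matrix of Y regressed on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ B
    return R.T @ R

def wilks_lambda_for_snp(covars, snp, Y):
    n, m = Y.shape
    X0 = np.column_stack([np.ones(n), covars])   # reduced model: no SNP
    X1 = np.column_stack([X0, snp])              # full model: with SNP
    E = residual_sscp(X1, Y)                     # error SSCP
    H = residual_sscp(X0, Y) - E                 # hypothesis SSCP for the SNP
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    lam_from_eig = np.prod(1.0 / (1.0 + eig))    # product form of the same statistic
    nu_e = n - X1.shape[1]                       # error degrees of freedom
    # Exact F for a 1-df hypothesis, with (m, nu_e - m + 1) degrees of freedom.
    f_stat = (1.0 - lam) / lam * (nu_e - m + 1) / m
    return lam, lam_from_eig, f_stat

# Hypothetical simulated data.
rng = np.random.default_rng(2)
n, m = 400, 3
covars = np.column_stack([rng.integers(0, 2, n), rng.uniform(40, 70, n)])
snp = rng.binomial(2, 0.3, n)
Sigma = 0.5 * np.eye(m) + 0.5 * np.ones((m, m))
Y = np.outer(snp, [0.3, 0.3, -0.2]) + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

lam, lam_from_eig, f_stat = wilks_lambda_for_snp(covars, snp, Y)
print(f"Wilks' Lambda = {lam:.4f} (eigenvalue form: {lam_from_eig:.4f}), F = {f_stat:.2f}")
```

Values of Λ close to 1 indicate little evidence for a SNP effect on any trait; smaller values indicate stronger evidence.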

slide-18
SLIDE 18

Contents

1. Introduction
2. Multivariate analysis
3. Application: Korean Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

slide-19
SLIDE 19

Korea Association Resource (KARE) Project

Objective
  • To identify genetic factors of quantitative clinical traits and lifestyle-related diseases (e.g., T2DM) from a genome-wide association study using population-based cohorts

Genotyping
  • Over 10,000 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
  • Affymetrix 5.0

First high-density, large-scale GWA study performed in the East Asian population

Courtesy of KNIH

slide-20
SLIDE 20

KARE: Characteristics

Baseline study      Ansung         Ansan
Participants        5,018          5,020
Sex (women/men)     2,778/2,240    2,497/2,523
Age (mean)          55.5           49.1
Age 40s (%)         31.2           62.8
Age 50s (%)         29.1           23.0
Age 60+ (%)         39.6           14.3

Courtesy of KNIH

KARE

slide-21
SLIDE 21

KARE data

— Data description
  • 8,842 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
— Filtering thresholds
  • HWE test p-value < 10^-6
  • MAF < 0.01
  • Missing proportion in each genotype > 0.05
  • Missing genotype imputation: HapMap JPT/CHB reference panel
— SNPs: 327,872

slide-22
SLIDE 22

Obesity

— Obesity related phenotypes
  • BMI, waist circumference, weight, and WHR
  • BMI = Weight (kg) / Height (m)^2
  • WHR = Waist circumference / Hip circumference
— Which genes are associated with obesity related phenotypes?

Correlations between the phenotypes:

          BMI      Waist    Weight   WHR
BMI       1
Waist     0.7607   1
Weight    0.7308   0.6862   1
WHR       0.3819   0.7971   0.2920   1

slide-23
SLIDE 23

Obesity: Univariate Analysis

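— Most GWAS are conducted under this framework
— Focus on one phenotype and single SNP
— Obesity related phenotypes
  • Separate univariate analyses

$$\begin{aligned}
\text{BMI:}\quad Y_1 &= \beta_{01} + \beta_{11}\,\mathrm{Sex} + \beta_{21}\,\mathrm{Age} + \beta_{31}\,\mathrm{Area} + \beta_{41}\,\mathrm{SNP} + \varepsilon_1 \\
\text{Waist:}\quad Y_2 &= \beta_{02} + \beta_{12}\,\mathrm{Sex} + \beta_{22}\,\mathrm{Age} + \beta_{32}\,\mathrm{Area} + \beta_{42}\,\mathrm{SNP} + \varepsilon_2 \\
\text{Weight:}\quad Y_3 &= \beta_{03} + \beta_{13}\,\mathrm{Sex} + \beta_{23}\,\mathrm{Age} + \beta_{33}\,\mathrm{Area} + \beta_{43}\,\mathrm{SNP} + \varepsilon_3 \\
\text{WHR:}\quad Y_4 &= \beta_{04} + \beta_{14}\,\mathrm{Sex} + \beta_{24}\,\mathrm{Age} + \beta_{34}\,\mathrm{Area} + \beta_{44}\,\mathrm{SNP} + \varepsilon_4
\end{aligned}$$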

slide-24
SLIDE 24

Obesity: Univariate Analysis Results

— Number of significant genetic variants at a given level of α

P-value bins (columns): p ≤ 10^-7; 10^-7 < p ≤ 10^-6; 10^-6 < p ≤ 10^-5; 10^-5 < p ≤ 10^-4

Counts listed for each phenotype:
BMI: 1, 6, 23
Waist: 7, 39
Weight: 3, 5, 32
WHR: 4, 7, 25

slide-25
SLIDE 25

BMI Waist Weight WHR

slide-26
SLIDE 26

Overlay Plot

— Some SNPs have consistent significant effects on all four phenotypes — Want to confirm by statistical testing — Want to know whether joint analysis (multivariate analysis) of all correlated

phenotypes increase power or not

slide-27
SLIDE 27

Results of Multivariate Analysis

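P-value bins (columns): p ≤ 10^-7; 10^-7 < p ≤ 10^-6; 10^-6 < p ≤ 10^-5; 10^-5 < p ≤ 10^-4

Counts listed for each analysis:
BMI: 1, 6, 23
Waist: 7, 39
Weight: 3, 5, 32
WHR: 4, 7, 25
Multivariate analysis: 53, 48, 89, 220

Multivariate analysis, finer bins: p ≤ 10^-12: 2; 10^-12 < p ≤ 10^-10: 3; 10^-10 < p ≤ 10^-8: 20; 10^-8 < p ≤ 10^-7: 28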

slide-28
SLIDE 28

Metabolic syndrome related traits

— Phenotypes

  • Waist circumference (WC)
  • Systolic blood pressure (SBP)
  • Diastolic blood pressure (DBP)
  • Triglyceride (TG) → ln transformed
  • High-density lipoprotein (HDLc) → -ln transformed
  • Fasting plasma glucose (FPG) → ln transformed
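The transformations listed on this slide can be written in a few lines. In the NumPy sketch below the measurement values are hypothetical and purely illustrative. The comment on the sign flip for HDLc is an interpretation (lower HDL then corresponds to a larger transformed value) and is not stated on the slide.

```python
import numpy as np

# Hypothetical raw measurements for three subjects (illustrative values only).
tg = np.array([90.0, 150.0, 250.0])     # triglyceride
hdl = np.array([60.0, 45.0, 35.0])      # HDL cholesterol
fpg = np.array([85.0, 100.0, 130.0])    # fasting plasma glucose

ln_tg = np.log(tg)                      # ln transformed
neg_ln_hdl = -np.log(hdl)               # -ln transformed: lower HDL gives a larger value
ln_fpg = np.log(fpg)                    # ln transformed

print(np.round(ln_tg, 3), np.round(neg_ln_hdl, 3), np.round(ln_fpg, 3))
```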

slide-29
SLIDE 29

Correlation matrix

         WC      SBP     DBP     TG      -HDLc   FPG
WC       1       0.293   0.320   0.355   0.287   0.188
SBP      0.293   1       0.812   0.204   0.024   0.143
DBP      0.320   0.812   1       0.218   0.044   0.141
TG       0.355   0.204   0.218   1       0.439   0.181
-HDLc    0.287   0.024   0.044   0.439   1       0.048
FPG      0.188   0.143   0.141   0.181   0.048   1

slide-30
SLIDE 30

Model and covariates

— Covariates

— Sex, Age and Area

— Genetic mode

— Additive mode

— Univariate approach

— Six traits

$$y_i = \beta_0 + \beta_1\,\mathrm{Sex}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Area}_i + \beta_4\,\mathrm{SNP}_i + \varepsilon_i$$

slide-31
SLIDE 31

Results from univariate approach

[Six plot panels, one per univariate analysis, in the order shown on the slide]
P-value < 5×10^-8: 0 SNPs
P-value < 5×10^-8: 2 SNPs
P-value < 5×10^-8: 1 SNP
P-value < 5×10^-8: 34 SNPs
P-value < 5×10^-8: 19 SNPs
P-value < 5×10^-8: 1 SNP

slide-32
SLIDE 32

Overlay plot: Univariate approach

— Some SNPs have a consistent significant pattern on all four phenotypes
— To improve power, conduct a joint analysis
  • Multivariate analysis considering all correlated phenotypes
slide-33
SLIDE 33

KARE result of multivariate approach

P-value < 5×10^-8: 52 SNPs

slide-34
SLIDE 34

Overlay plot: Univariate and Multivariate approaches

slide-35
SLIDE 35

KARE Results

— Number of significant genetic variants at a given level of α

P-value bins (columns): p ≤ 10^-9; 10^-9 < p ≤ 10^-8; 10^-8 < p ≤ 10^-7; 10^-7 < p ≤ 10^-6; 10^-6 < p ≤ 10^-5

Counts listed for each analysis:
WC: 8
SBP: 1, 1, 3
DBP: 1, 1, 3
HDL: 28, 6, 1, 11
TG: 18, 3, 13, 18
FPG: 4, 5, 24
Multivariate: 38, 8, 10, 17, 25

— At α = 5×10^-8
  • A total of 41 variants in eleven chromosomal regions were identified for at least one of the six phenotypes (univariate approach)
  • 52 variants in twelve chromosomal regions passed genome-wide significance (multivariate approach)
  • Among them, 9 chromosomal regions were identified by both the univariate and multivariate approaches

slide-36
SLIDE 36

Results from Multivariate approach

Multivariate test:

Locus       SNP          Nearby gene   MAF (minor allele)   F-statistic   P
2p23.3a     rs780094     GCKR          0.463 (C)            13.997        6.45×10^-16
8p21.3c     rs10503669   LDL           0.124 (A)            20.157        1.54×10^-23
8q24.13e    rs2001945                  0.416 (G)            7.965         1.38×10^-08
9q31.1d     rs12686004   ABCA1         0.214 (A)            11.720        4.00×10^-13
11q12.2b    rs174547     FADS1         0.324 (C)            9.106         5.91×10^-10
12q24.11d   rs12229654   MYL2          0.144 (G)            18.168        4.52×10^-21

Univariate tests:

SNP          Phenotype   Beta     P
rs780094     WC           0.002   8.77×10^-01
rs780094     SBP         -0.003   8.34×10^-01
rs780094     DBP          0.003   8.16×10^-01
rs780094     TG          -0.102   1.46×10^-11
rs780094     -HDLc        0.012   4.33×10^-01
rs780094     FPG          0.044   3.38×10^-03
rs10503669   WC           0.017   4.28×10^-01
rs10503669   SBP         -0.031   1.46×10^-01
rs10503669   DBP         -0.018   4.10×10^-01
rs10503669   TG          -0.178   3.21×10^-15
rs10503669   -HDLc       -0.196   6.42×10^-18
rs10503669   FPG         -0.021   3.57×10^-01
rs2001945    WC          -0.004   7.66×10^-01
rs2001945    SBP         -0.004   8.00×10^-01
rs2001945    DBP         -0.007   6.19×10^-01
rs2001945    TG           0.077   2.77×10^-07
rs2001945    -HDLc       -0.007   6.25×10^-01
rs2001945    FPG         -0.031   3.73×10^-02
rs12686004   WC          -0.016   3.69×10^-01
rs12686004   SBP         -0.028   9.67×10^-02
rs12686004   DBP         -0.016   3.55×10^-01
rs12686004   TG           0.000   9.99×10^-01
rs12686004   -HDLc        0.125   1.03×10^-11
rs12686004   FPG          0.033   7.17×10^-02
rs174547     WC          -0.008   5.91×10^-01
rs174547     SBP         -0.020   1.86×10^-01
rs174547     DBP         -0.005   7.54×10^-01
rs174547     TG           0.068   2.08×10^-05
rs174547     -HDLc        0.055   6.32×10^-04
rs174547     FPG         -0.073   4.42×10^-06
rs12229654   WC          -0.092   1.29×10^-05
rs12229654   SBP         -0.066   1.03×10^-03
rs12229654   DBP         -0.071   6.78×10^-04
rs12229654   TG          -0.057   8.43×10^-03
rs12229654   -HDLc        0.132   1.04×10^-09
rs12229654   FPG         -0.098   5.04×10^-06
slide-37
SLIDE 37

Results from Multivariate approach (continued)

Multivariate test:

Locus       SNP          Nearby gene   MAF (minor allele)   F-statistic   P
12q24.12a   rs2238153    ATXN2         0.460 (A)            7.754         2.46×10^-08
12q24.13a   rs11066280   C12orf51      0.173 (A)            24.880        2.02×10^-29
12q24.13b   rs2072134    OAS3          0.115 (A)            13.942        7.55×10^-16
15q22.1b    rs16940212                 0.340 (T)            17.299        5.41×10^-20
16p12.3c    rs7197218    XYLT1         0.014 (G)            7.935         1.49×10^-08
16q13b      rs6499863    CETP          0.104 (A)            9.116         5.76×10^-10

Univariate tests:

SNP          Phenotype   Beta     P
rs2238153    WC          -0.033   2.79×10^-02
rs2238153    SBP         -0.037   9.84×10^-03
rs2238153    DBP         -0.021   1.67×10^-01
rs2238153    TG          -0.026   8.45×10^-02
rs2238153    -HDLc        0.066   1.70×10^-05
rs2238153    FPG         -0.024   1.15×10^-01
rs11066280   WC          -0.089   4.86×10^-06
rs11066280   SBP         -0.074   6.49×10^-05
rs11066280   DBP         -0.079   5.17×10^-05
rs11066280   TG          -0.077   1.03×10^-04
rs11066280   -HDLc        0.145   5.81×10^-13
rs11066280   FPG         -0.086   1.49×10^-05
rs2072134    WC          -0.070   2.50×10^-03
rs2072134    SBP         -0.077   4.40×10^-04
rs2072134    DBP         -0.077   8.30×10^-04
rs2072134    TG          -0.049   3.69×10^-02
rs2072134    -HDLc        0.143   1.88×10^-09
rs2072134    FPG         -0.066   4.95×10^-03
rs16940212   WC           0.002   9.18×10^-01
rs16940212   SBP         -0.012   4.25×10^-01
rs16940212   DBP         -0.005   7.25×10^-01
rs16940212   TG           0.020   2.11×10^-01
rs16940212   -HDLc       -0.132   1.37×10^-16
rs16940212   FPG         -0.002   8.94×10^-01
rs7197218    WC          -0.033   6.07×10^-01
rs7197218    SBP         -0.012   4.25×10^-01
rs7197218    DBP          0.061   3.34×10^-01
rs7197218    TG          -0.106   1.04×10^-01
rs7197218    -HDLc       -0.191   3.69×10^-03
rs7197218    FPG          0.352   6.29×10^-08
rs6499863    WC           0.014   5.60×10^-01
rs6499863    SBP         -0.003   8.99×10^-01
rs6499863    DBP          0.030   2.18×10^-01
rs6499863    TG          -0.005   8.25×10^-01
rs6499863    -HDLc        0.152   1.06×10^-09
rs6499863    FPG          0.016   5.09×10^-01

slide-38
SLIDE 38

Reported loci in GWAS catalog

Columns: Locus, SNP, Nearby gene, MAF (minor allele), GWAS catalog entries

2p23.3a, rs780094, GCKR, 0.463 (C)
  • Manning et al. (Nat Genet, 2012): fasting glucose
  • Kristiansson et al. (Circ Cardiovasc Genet, 2012): metabolic syndrome
  • Suhre et al. (Nature, 2012): metabolic trait
  • Dupuis (Nat Genet, 2010): fasting glucose and insulin
  • Aulchenko et al. (Nat Genet, 2008): TG
  • Wallace et al. (Am J Hum Genet, 2008): LDL
  • Kim et al. (Nat Genet, 2011): metabolite levels, TG

8p21.3c, rs10503669, LDL, 0.124 (A)
  • Kim et al. (Nat Genet, 2011), Willer et al. (Nat Genet, 2008): HDL, TG

8q24.13e, rs2001945, 0.416 (G)
  • Kim et al. (Nat Genet, 2011): TG

9q31.1d, rs12686004, ABCA1, 0.214 (A)
  • Kim et al. (Nat Genet, 2011): HDL

11q12.2b, rs174547, FADS1, 0.324 (C)
  • Han et al. (Bone, 2012): appendicular lean mass
  • Kettunen et al. (Nat Genet, 2012): human serum metabolite levels
  • Suhre et al. (Nature, 2011), Illig et al. (Nat Genet, 2010): metabolism
  • Kathiresan et al. (Nat Genet, 2008): HDL, TG

Newly identified in Korean population from this study

12q24.11d, rs12229654, MYL2, 0.144 (G)
  • Kim et al. (Nat Genet, 2011): HDL, GGT
  • Go et al. (J Hum Genet): T2D

12q24.13a, rs11066280, C12orf51, 0.173 (A)
  • Kim et al. (Nat Genet, 2011): metabolite levels
  • Kato et al. (Nat Genet, 2011): BP

12q24.13b, rs2072134, OAS3, 0.115 (A)
  • Kim et al. (Nat Genet, 2011): HDL

15q22.1b, rs16940212, 0.340 (T)
  • Kim et al. (Nat Genet, 2011): HDL

16q13b, rs6499863, CETP, 0.104 (A)
  • Ridker et al. (Circ Cardiovasc Genet, 2009): HDL
  • Kathiresan et al. (Nat Genet, 2008): HDL, TG
  • Saxena et al. (Science, 2007): TG
  • Kim et al. (Nat Genet, 2011): HDL

12q24.12a, rs2238153, ATXN2, 0.460 (A)
  • Newly identified from this study

16p12.3c, rs7197218, XYLT1, 0.014 (G)
  • Newly identified from this study

slide-39
SLIDE 39

Contents

1. Introduction
2. Multivariate analysis
3. Application: Korean Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

slide-40
SLIDE 40

Simulation Study

— Performance comparison between univariate and multivariate approaches
  • Effect of correlation between phenotypes
  • Minor allele frequency
  • Genetic effect size (coefficient of genetic variants)
  • Association direction of genetic variants: same vs. different
— Type I error
— Power

slide-41
SLIDE 41

Simulation settings (1)

— Single SNP association
— Minor allele frequency
  • MAF: 0.01 ~ 0.3
— SNP generation
  • Under HWE assumption
— Marginal correlation coefficient between phenotypes
  • Number of phenotypes: 2, 4, 8 (quantitative responses)
  • Correlation = {0, 0.25, 0.5, 0.75}
— Genetic effect size (β)
— Association direction
— Correlated phenotypes generated using a multivariate normal distribution
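The simulation settings on this slide can be sketched in Python as follows. Under Hardy-Weinberg equilibrium the minor-allele count of a subject follows a Binomial(2, MAF) distribution, and correlated phenotypes can be drawn from a multivariate normal distribution. The function name `simulate_traits`, the unit error variances, the equal pairwise correlation and the zero intercepts are simplifying assumptions made here for illustration; the slides do not specify these details.

```python
import numpy as np

def simulate_traits(n, maf, betas, rho, rng):
    """Simulate one SNP under HWE and m correlated quantitative traits.

    Assumptions (not from the slides): unit error variances, the same
    correlation rho between every pair of traits, and zero intercepts.
    """
    betas = np.asarray(betas, dtype=float)
    m = len(betas)
    snp = rng.binomial(2, maf, size=n)                     # genotype coded 0, 1, 2
    Sigma = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
    errors = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = np.outer(snp, betas) + errors                      # n x m trait matrix
    return snp, Y

rng = np.random.default_rng(3)

# Null setting: no genetic effect on any of four traits.
snp_null, Y_null = simulate_traits(20000, maf=0.2, betas=np.zeros(4), rho=0.5, rng=rng)

# Opposite association directions across eight traits.
betas_opposite = np.array([-0.1] * 4 + [0.1] * 4)
snp_alt, Y_alt = simulate_traits(1000, maf=0.15, betas=betas_opposite, rho=0.25, rng=rng)

print(np.round(np.corrcoef(Y_null.T)[0, 1], 2))            # close to 0.5
```

Repeating such draws many times under the null and under the alternative gives the replicates from which empirical type I error rates and power are computed.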

slide-42
SLIDE 42

Simulation settings (2)

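— Heterogeneous model when the number of traits is 8

$$y_{ij} = \beta_{0j} + \beta_{1j}\,x_i + \varepsilon_{ij}, \qquad (j = 1, 2, \dots, 8)$$

$$\beta_{11} = \beta_{12} = \beta_{13} = \beta_{14}, \qquad \beta_{15} = \beta_{16} = \beta_{17} = \beta_{18}, \qquad \beta_{11} = -\beta_{15}$$

— Homogeneous model
  • Common genetic effect

$$y_{ij} = \beta_{0j} + \beta_{1}\,x_i + \varepsilon_{ij}, \qquad (j = 1, 2, \dots, 8)$$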

slide-43
SLIDE 43

Type I error rates and Power calculations

— Empirical type I error rate (false positive rate)
  • Null hypothesis for univariate analysis: $\beta_{11} = 0$ (or $\beta_{15} = 0$)
  • Null hypothesis for heterogeneous model: $(\beta_{11}, \beta_{12}, \dots, \beta_{18}) = (0, 0, \dots, 0)$
  • Null hypothesis for homogeneous model: $\beta_1 = 0$
  • Generating 100,000 replicates for 1,000 samples
  • Four correlated phenotypes: generating 10,000,000 replicates for 10,000 samples
— Empirical power (true positive rate)
  • Various nonzero values of $\beta_{11}$ and $\beta_{15}$ for heterogeneous model
  • Various nonzero values of $\beta_1$ for homogeneous model
  • Generating 1,000 replicates for 1,000 samples
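Both quantities on this slide are computed the same way: the proportion of simulated replicates whose p-value falls at or below α. Under the null hypothesis that proportion is the empirical type I error rate; under an alternative it is the empirical power. A minimal Python sketch follows; the function name `empirical_rate` and the toy p-values are hypothetical, and using "≤ α" rather than "< α" is an assumed convention.

```python
import numpy as np

def empirical_rate(p_values, alpha):
    """Fraction of replicates with p-value <= alpha."""
    return float(np.mean(np.asarray(p_values) <= alpha))

# Toy example: 2 of 4 replicates reach alpha = 0.05.
p_demo = [0.20, 0.01, 0.04, 0.50]
rate = empirical_rate(p_demo, 0.05)

# With R replicates, the smallest nonzero rate that can be observed is 1 / R.
n_replicates = 100_000
smallest_nonzero_rate = 1 / n_replicates

print(rate, smallest_nonzero_rate)
```

The 1/R resolution is why very small α levels need very large numbers of replicates before the empirical type I error rate becomes informative.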

slide-44
SLIDE 44

Type I error rates (1)

Two phenotypes, MAF = 0.15, (β11, β15) = (0, 0)

α       Correlation (ρ)   Univariate 1   Univariate 2   Multivariate
0.05    0                 5.23×10^-02    4.98×10^-02    5.03×10^-02
0.05    0.25              4.88×10^-02    4.97×10^-02    4.95×10^-02
0.05    0.5               5.02×10^-02    5.00×10^-02    5.05×10^-02
0.05    0.75              4.95×10^-02    4.91×10^-02    5.01×10^-02
0.01    0                 1.02×10^-02    9.90×10^-03    9.58×10^-03
0.01    0.25              9.66×10^-03    1.02×10^-02    1.01×10^-02
0.01    0.5               1.03×10^-02    9.53×10^-03    9.75×10^-03
0.01    0.75              1.02×10^-02    1.04×10^-02    1.01×10^-02
10^-3   0                 9.50×10^-04    8.50×10^-04    8.30×10^-04
10^-3   0.25              9.10×10^-04    1.04×10^-03    9.10×10^-04
10^-3   0.5               8.80×10^-04    1.04×10^-03    8.50×10^-04
10^-3   0.75              1.07×10^-03    1.02×10^-03    1.05×10^-03
10^-4   0                 9.00×10^-05    7.00×10^-05    1.10×10^-04
10^-4   0.25              6.00×10^-05    1.60×10^-04    5.00×10^-05
10^-4   0.5               1.10×10^-04    1.20×10^-04    1.20×10^-04
10^-4   0.75              1.30×10^-04    1.30×10^-04    1.00×10^-04

slide-45
SLIDE 45

Type I error rates (1), continued

α       Correlation (ρ)   Univariate 1   Univariate 2   Multivariate
10^-5   0                 2.00×10^-05    2.00×10^-05    2.00×10^-05
10^-5   0.25              0              0              0
10^-5   0.5               1.00×10^-05    2.00×10^-05    2.00×10^-05
10^-5   0.75              2.00×10^-05    3.00×10^-05    1.00×10^-05
10^-6   0                 0              0              0
10^-6   0.25              0              0              0
10^-6   0.5               0              0              0
10^-6   0.75              1.00×10^-05    0              0
10^-7   0                 0              0              0
10^-7   0.25              0              0              0
10^-7   0.5               0              0              0
10^-7   0.75              0              0              0

slide-46
SLIDE 46

Type I error rates (2)

Four phenotypes with 10^8 replicates

MAF    α       Univariate 1   Univariate 2   Univariate 3   Univariate 4   Multivariate
0.01   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.01   0.01    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.01   10^-3   1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03
0.01   10^-4   9.93×10^-05    1.01×10^-04    1.00×10^-04    1.00×10^-04    1.00×10^-04
0.01   10^-5   1.02×10^-05    9.64×10^-06    1.02×10^-05    1.06×10^-05    1.05×10^-05
0.01   10^-6   9.50×10^-07    1.13×10^-06    9.80×10^-07    1.11×10^-06    8.80×10^-07
0.01   10^-7   7.00×10^-08    1.30×10^-07    1.00×10^-07    9.00×10^-08    1.10×10^-07
0.01   10^-8   1.00×10^-08    2.00×10^-08    1.00×10^-08    3.00×10^-08    1.00×10^-08
0.03   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.03   0.01    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.03   10^-3   1.01×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03    1.00×10^-03
0.03   10^-4   1.00×10^-04    9.98×10^-05    9.96×10^-05    1.01×10^-04    1.01×10^-04
0.03   10^-5   1.04×10^-05    9.95×10^-06    9.74×10^-06    1.02×10^-05    1.03×10^-05
0.03   10^-6   1.10×10^-06    1.09×10^-06    9.30×10^-07    1.00×10^-06    1.27×10^-06
0.03   10^-7   1.10×10^-07    7.00×10^-08    6.00×10^-08    1.40×10^-07    1.80×10^-07
0.03   10^-8   1.00×10^-08    0              0              3.00×10^-08    1.00×10^-08

slide-47
SLIDE 47

Type I error rates (2), continued

MAF    α       Univariate 1   Univariate 2   Univariate 3   Univariate 4   Multivariate
0.05   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.05   0.01    9.98×10^-03    1.00×10^-02    1.00×10^-02    1.00×10^-02    1.00×10^-02
0.05   10^-3   9.96×10^-04    9.97×10^-04    9.98×10^-04    1.00×10^-03    1.00×10^-03
0.05   10^-4   9.80×10^-05    9.95×10^-05    1.01×10^-04    1.02×10^-04    1.02×10^-04
0.05   10^-5   9.67×10^-06    1.06×10^-05    1.02×10^-05    1.01×10^-05    9.87×10^-06
0.05   10^-6   9.30×10^-07    9.60×10^-07    1.00×10^-06    8.90×10^-07    1.01×10^-06
0.05   10^-7   3.00×10^-08    6.00×10^-08    1.30×10^-07    8.00×10^-08    1.70×10^-07
0.05   10^-8   0              2.00×10^-08    2.00×10^-08    4.00×10^-08    1.00×10^-08
0.07   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.07   0.01    1.00×10^-02    1.00×10^-02    9.99×10^-03    9.99×10^-03    1.00×10^-02
0.07   10^-3   1.00×10^-03    1.00×10^-03    9.99×10^-04    9.97×10^-04    1.00×10^-03
0.07   10^-4   1.01×10^-04    9.99×10^-05    9.93×10^-05    9.91×10^-05    1.02×10^-04
0.07   10^-5   1.01×10^-05    1.00×10^-05    1.03×10^-05    1.06×10^-05    1.03×10^-05
0.07   10^-6   9.30×10^-07    8.70×10^-07    9.70×10^-07    9.70×10^-07    1.06×10^-06
0.07   10^-7   1.20×10^-07    9.00×10^-08    7.00×10^-08    1.40×10^-07    1.40×10^-07
0.07   10^-8   2.00×10^-08    3.00×10^-08    0              0              1.00×10^-08
0.09   0.05    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02    5.00×10^-02
0.09   0.01    9.99×10^-03    9.99×10^-03    1.00×10^-02    9.99×10^-03    1.00×10^-02
0.09   10^-3   1.01×10^-03    9.99×10^-04    9.99×10^-04    1.00×10^-03    1.00×10^-03
0.09   10^-4   1.02×10^-04    1.00×10^-04    1.01×10^-04    1.00×10^-04    9.85×10^-05
0.09   10^-5   1.03×10^-05    1.00×10^-05    1.06×10^-05    1.01×10^-05    9.44×10^-06
0.09   10^-6   9.70×10^-07    1.05×10^-06    1.03×10^-06    1.22×10^-06    1.00×10^-06
0.09   10^-7   1.10×10^-07    1.40×10^-07    1.40×10^-07    1.80×10^-07    9.00×10^-08
0.09   10^-8   0              1.00×10^-08    1.00×10^-08    3.00×10^-08    1.00×10^-08

slide-48
SLIDE 48

Power (1)

Same genetic effect size: β11=β15= 0.1

α=0.05

slide-49
SLIDE 49

Power (2)

Opposite directions of association: (β11,β15)=(-0.05,0.05)

α=0.05

slide-50
SLIDE 50

Power (3)

Opposite directions of association: (β11, β15) = (-0.1, 0.1)

[Plot panels: α = 0.05, α = 0.01, α = 10^-4, α = 10^-6]

slide-51
SLIDE 51

Power (4)

Homogeneous effect: MultiPhen vs. Multivariate linear model

α=0.05

slide-52
SLIDE 52

Summary of simulation studies

— Multivariate analysis preserves the type I error rate
— Multivariate analysis is more powerful than univariate analysis
  • Low correlation among the phenotypes
  • Homogeneous multivariate model (common slope model): same direction with same effect sizes
  • Heterogeneous multivariate model: different association directions for the phenotypes (possible false-positive errors)

slide-53
SLIDE 53

Contents

1. Introduction
2. Multivariate analysis
3. Application: Korean Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

slide-54
SLIDE 54

Summary

— Multivariate approach
  • Analyzing several phenotypes simultaneously
  • Considering correlation between related phenotypes in one model
— Suitable to detect pleiotropic genes
  • Reducing multiple comparison problems by reducing the number of tests
  • Improving power for identifying genetic variants with related phenotypes in GWAS

slide-55
SLIDE 55

Summary

— More powerful than univariate analysis
  • Low correlation between phenotypes
  • Same association direction with similar genetic effect: the common slope model can improve power
  • Different association directions: should be investigated carefully to avoid false-positive errors
— Applicable to repeated measures in cohort studies
— Applicable to rare variant studies from next generation sequencing (NGS) data

slide-56
SLIDE 56

Extensions

— Multivariate + multiple analysis
  • Focus on multiple related phenotypes and multiple SNPs

[Diagram: multiple SNPs (SNP 1, SNP 2, …, SNP I, SNP J, SNP K, …, SNP 500K), through SNP-SNP interaction and accumulated additive effects of multiple SNPs, analyzed against related phenotypes (Trait 1 to Trait 5)]

slide-57
SLIDE 57

Extensions

— Choi J, Park T. (2013). Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions. BMC Syst Biol.
— Yu W, Kwon MS, Park T. (2015). Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions. Hum Hered.
— Won S, Kim W, Lee S, Lee Y, Sung J, Park T. (2015). Family-based association analysis: a fast and efficient method of multivariate association analysis with multiple variants. BMC Bioinformatics.
— Yu W, Lee S, Park T. (2016). A unified multifactor dimensionality reduction framework for detecting gene-gene interactions. Bioinformatics.
— Lee S, Kim Y, Park T. (2016). Rare Variant Association Test with Multiple Phenotypes. Genetic Epi (in press).

slide-58
SLIDE 58

Acknowledgement

— This work was supported by a grant funded by the Bio-Synergy Research Project (2013M3A9C4078158) of the Ministry of Science, ICT and Future Planning through the NRF, Korea, and by an NRF grant (2015R1A5A6001906).

slide-59
SLIDE 59

Thank you!!!

slide-60
SLIDE 60

Conclusion

— 52 variants in twelve chromosomal regions were identified by the multivariate approach at α = 5×10^-8
  • Identified chromosomal regions were reported to be associated with lipids, diabetes-related and metabolic-related phenotypes in previous GWAS
  • Three chromosomal regions were newly identified
— Simulation studies
  • The homogeneous genetic effect model is more efficient and provides more power when the directions of genetic effects for the phenotypes are the same and of similar magnitude
  • The multivariate approach provides higher power than the univariate approach when the directions of genetic effects for the phenotypes are different
  • It is important to understand the underlying biological background between gene and phenotype (∵ conditional association between genetic variants and phenotypes)

slide-61
SLIDE 61

Acknowledgement

— Bioinformatics and Biostatistics Lab., SNU
  • Sohee Oh, Jaehoon Lee, Kyunga Kim, Dankyu Yoon, Min-Seok Kwon, Junghyun Namkung
— Center for Genome Science, KNIH, KCDC
  • Yoon Shin Cho, Min Jin Go, Young Jin Kim
  • Hyung-Lae Kim, Bok-Ghee Han, Jong-Young Lee
— KARE Consortium
  • Bermseok Oh, Kyung Hee University
— DNA Link Co., Korea
  • Jong-Eun Lee
— Cohort PIs
  • Nam Han Cho, Chol Shin
slide-62
SLIDE 62

Korea

Geographical Location of the Cohorts

KARE

slide-63
SLIDE 63

KARE: Result

SNP Clinical Data

2009 Nature genetics

Detection of 11 SNPs influencing traits in Korean population Blood pressure, pulse rate, BMI, height, waist-hip ratio, bone mineral density

KARE

slide-64
SLIDE 64

GWAS meta-analysis using KARE

Nature, 2010

Detection of 95 loci influencing traits in a 100K European population, with a replication study in non-European populations (East Asians, South Asians, and African Americans)
Traits: total cholesterol (TC), LDL-C, HDL-C, TG
Identifying potential novel drug targets for treatment of extreme lipid phenotypes and prevention of coronary artery disease (CAD)

SNP Clinical Data Lipid Traits European /Non-European

Biological, Clinical, and Population Relevance of 95 Loci Mapped for Serum Lipid Concentrations

Tanya M. Teslovich, Kiran Musunuru, Albert V. Smith, Andrew C. Edmondson, Ioannis M. Stylianou, Masahiro Koseki, James P. Pirruccello, Samuli Ripatti, ….. , Yoon Shin Cho, Min Jin Go, Young Jin Kim, Jong-Young Lee, Taesung Park, Kyunga J. Kim, ..... , Gonçalo R. Abecasis, Michael Boehnke, Sekar Kathiresan

KARE

slide-65
SLIDE 65

Results

slide-66
SLIDE 66

Quantile-Quantile plot from a multivariate approach

slide-67
SLIDE 67

Quantile-Quantile plots from univariate approaches

slide-68
SLIDE 68

Quantile-Quantile plots

slide-69
SLIDE 69

Genetic association study approaches

— One trait + Single genetic variant
— One trait + Multiple genetic variants
— Multiple traits + Single genetic variant
— Multiple traits + Multiple genetic variants

Genetic Association Study

slide-70
SLIDE 70

Functional enrichment analysis

— Functional enrichment analysis for the genes identified from the multivariate analysis (DAVID)

Enrichment score: 1.48

Term                               Genes                       Count   P-value   Benjamini
Lipid metabolism                   ABCA1, CETP, FADS1          3       8.8E-3    4.0E-1
Small molecule metabolic process   ABCA1, CETP, FADS1, GCKR    4       2.8E-2    9.5E-1
Transport                          ABCA1, CETP, FADS1          3       1.4E-1    8.4E-1