Searching for the genetic basis of complex traits in humans and primates


slide-1
SLIDE 1

Searching for the genetic basis of complex traits in humans and primates

Vasily Ramensky UCLA Center for Neurobehavioral Genetics 22/03/16

slide-2
SLIDE 2

University of California Los Angeles Center for Neurobehavioral Genetics

slide-3
SLIDE 3
  • - 35-year project aimed to decrease the global economic and health impact of depression by 50% by 2050
  • - 100,000 individuals to be enrolled
  • - The largest UCLA research initiative thus far, with an anticipated budget of $525 million for the first 10 years

slide-4
SLIDE 4

Projects

Finnish Metabolic Sequencing: genetic basis of quantitative metabolic traits in the Finnish population

  • - Target gene sequencing in >6,000 NFBC1966 members
  • - Whole exome sequencing in 20,000 individuals

Vervet monkeys: non-human primates in biomedical research

  • - Whole genome sequencing of >700 members of Vervet Research Colony

Tourette Syndrome: genetic basis of Tourette Syndrome

  • - Exome and targeted sequencing of >100 members of large TS pedigrees
  • - GWAS studies of large TS cohorts

Bipolar disorder: genetic factors that contribute to risk for bipolar disorder

  • - Whole genome sequencing of 450 members of large pedigrees from Colombia and Costa Rica with a severe form of bipolar disorder

slide-5
SLIDE 5

Finnish Metabolic Sequencing

slide-6
SLIDE 6
  • - Founder population, inhabited Northern Finland in the 1600s
  • - Genetic isolate, homogeneous in genetic and environmental background, enriched in potentially damaging variants
  • - Birth cohort: age is not a confounder; longitudinal data
  • - Quantitative heritable traits:
  • - body mass index
  • - fasting serum concentrations of lipids
  • - glucose and insulin
  • - inflammation marker CRP (C-reactive protein)
  • - blood pressure

Finnish Metabolic Sequencing

Northern Finland Birth Cohort 1966

slide-7
SLIDE 7

Finnish Metabolic Sequencing

GWAS in NFBC66: Sabatti et al., 2009

31 associations to 6 traits, 9 associations previously unreported

slide-8
SLIDE 8

Finnish Metabolic Sequencing

GWAS in NFBC66: Sabatti et al., 2009

slide-9
SLIDE 9

Finnish Metabolic Sequencing

GWAS in NFBC66: Sabatti et al., 2009

Identified loci explained little of the trait variability => contribution of rare variants?

slide-10
SLIDE 10

Genetic architecture of complex traits

slide-11
SLIDE 11
  • - 78 genes in 6,121 samples, 17 loci on 10 chr
  • - 2,234 variants, 76% with MAF<=0.5%
  • - Single variant tests: variants with MAF>0.1% in additive genetic model
  • - Gene-level tests: missense variants with MAF<1%
  • - Goal: new single variant signals independent from GWAS or associations at the gene level

Finnish Metabolic Sequencing

Targeted sequencing in NFBC66 and FUSION
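
The frequency cut-offs listed on this slide can be sketched as simple filters. This is a minimal illustration only: the `Variant` record, its field names, the gene labels, and the toy counts are all hypothetical and not taken from the study, and the association tests themselves are not shown.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    vid: str          # hypothetical variant identifier
    gene: str         # hypothetical gene label
    alt_count: int    # number of alternative alleles observed
    n_samples: int    # number of genotyped diploid samples
    missense: bool    # True if annotated as missense

def maf(v: Variant) -> float:
    """Minor allele frequency: the rarer of the two allele frequencies."""
    aaf = v.alt_count / (2 * v.n_samples)
    return min(aaf, 1.0 - aaf)

def single_variant_set(variants):
    """Variants eligible for single-variant tests (MAF > 0.1%)."""
    return [v for v in variants if maf(v) > 0.001]

def gene_level_sets(variants):
    """Group rare (MAF < 1%) missense variants by gene for gene-level tests."""
    groups = {}
    for v in variants:
        if v.missense and maf(v) < 0.01:
            groups.setdefault(v.gene, []).append(v.vid)
    return groups

# Toy data (illustrative only)
variants = [
    Variant("v1", "GENE_A", 3, 6121, True),
    Variant("v2", "GENE_A", 40, 6121, True),
    Variant("v3", "GENE_B", 900, 6121, False),
]
single = [v.vid for v in single_variant_set(variants)]
by_gene = gene_level_sets(variants)
```

Note that a variant can fall below the single-variant threshold and still contribute to a gene-level set, which is the point of aggregating rare variants per gene.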

slide-12
SLIDE 12
slide-13
SLIDE 13

Why?

  • - Insertions and deletions
  • - Epistatic interactions
  • - Compound heterozygotes
  • - Testing all rare missense variants
  • - Non-coding regulatory variants

Finnish Metabolic Sequencing

Targeted sequencing in NFBC66 and FUSION

slide-14
SLIDE 14

Tourette Syndrome

slide-15
SLIDE 15

Tourette Syndrome

  • - An inherited disorder, childhood onset (prevalence 0.4-3.8%)
  • - Multiple physical (motor) and vocal tics
  • - Linkage studies of large families: genetic signal on chr2p
  • - No significant associations for coding exome variants
  • - Exome + targeted non-coding regions on chr2p in 109 individuals from 15 large TS families (65 affected, 35 not affected, 9 unknown)
  • - Genotyping of candidate variants in >700 individuals from sib-pair families (UCLA)

  • - GWAS studies in multiple cohorts
slide-16
SLIDE 16

Tourette Syndrome

Candidate variants in the chr2p region

Pos, Mbp   Region   dbSNP   AAF   Idx   Segregation Aff (Fam)   Chi2   Epigenomic info

59.1   FLJ30838 FunSeq enhancer   0.91%   5   9 (4)   0.30   Enh H9 Neuronal Progen Cells (REMC)
60.5   AC007381 Intron   0.78%   2   8 (3)   0.04   Fetal Brain (REMC)
60.8   N/A   9.4%   30 (10)   0.001   LBL enh

// Idx: conserv. mammals, primates, CADD, DANN, fatHMM-mkl

slide-17
SLIDE 17

Jeremiah Scharf, Dongmei Yu

slide-18
SLIDE 18

Tourette Syndrome

LINC01122

BrainSpan: RNA-seq in 524 prenatal and postnatal samples

[Figure: expression across time points and brain regions]

slide-19
SLIDE 19

BCL11A

Tourette Syndrome

BrainSpan: RNA-seq in 524 prenatal and postnatal samples

[Figure: expression across time points and brain regions]

slide-20
SLIDE 20

[Figure: normalized expression, (X - Xmean)/Xstdev, of LINC01122 and BCL11A; Rcorr = 0.723]
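
The normalization (X - Xmean)/Xstdev and the correlation coefficient Rcorr reported for this figure can be illustrated with a short sketch. It assumes the sample standard deviation and Pearson correlation; the slide does not state which estimators were used, and the function names and toy vectors below are hypothetical.

```python
from statistics import mean, stdev

def zscore(values):
    """Normalize a profile as (X - Xmean) / Xstdev (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def pearson(xs, ys):
    """Pearson correlation computed from the two z-scored profiles."""
    zx, zy = zscore(xs), zscore(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Toy expression profiles across four samples (illustrative only)
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 7.8]
r = pearson(gene_a, gene_b)
```

Because the correlation is computed on z-scores, it is unaffected by differences in absolute expression level between the two genes.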

slide-21
SLIDE 21

Tourette Syndrome

Annotation of “anonymous” lincRNA

1) Search for genes coexpressed with query Q:

  • - Threshold: genes with Rcorr > R0
  • - Forward: genes in Q’s top x% // contaminated by “promiscuous” genes
  • - Reverse: genes for which Q is in top x%
  • - Reverse-back-reverse (Gene’s best friends by Sasha Favorov)

2) Check enriched GO terms for top ranked genes
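
The difference between the forward and reverse rankings can be sketched on toy data. This is an assumption-laden illustration, not the authors' implementation: it uses a fixed top-k instead of a top x%, plain Pearson correlation, and hypothetical gene names and profiles. The reverse-back-reverse step is not shown because the slide does not define it.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def top_k(gene, expr, k):
    """The k genes most correlated with `gene`."""
    others = [g for g in expr if g != gene]
    others.sort(key=lambda g: pearson(expr[gene], expr[g]), reverse=True)
    return set(others[:k])

def forward(query, expr, k):
    """Genes that are among the query's top-k partners."""
    return top_k(query, expr, k)

def reverse(query, expr, k):
    """Genes that have the query among their own top-k partners."""
    return {g for g in expr if g != query and query in top_k(g, expr, k)}

# Toy profiles across four samples (illustrative only)
expr = {
    "Q":  [1, 2, 3, 4],
    "G1": [1, 2, 3, 5],
    "G2": [1, 2, 3, 6],
    "G3": [4, 3, 2, 1],
    "G4": [1, 3, 2, 4],
}
fwd = forward("Q", expr, 1)
rev = reverse("Q", expr, 1)
```

In this toy set G1 is Q's best partner, but G1's own best partner is G2, so G1 appears in the forward list and not in the reverse list. The reverse ranking is stricter in this sense: a gene that correlates well with many others does not automatically qualify.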

slide-22
SLIDE 22

Tourette Syndrome

GO annotations for reverse and forward ranks

slide-23
SLIDE 23
slide-24
SLIDE 24

Sequencing in the VRC

slide-25
SLIDE 25

N ~ 2×10^4

Vervet Research Colony

slide-26
SLIDE 26

Non-human primates vs. humans and rodents

  • - Low sequence divergence, syntenic blocks
  • - Phenotypic similarity (brain/behavior, infectious diseases, metabolism)

  • - Invasive studies are possible
  • - Controlled environment
  • - Longitudinal approaches are possible

Sequencing in the VRC

slide-27
SLIDE 27

Examples of available phenotypes:

  • - Brain and behavior: MRI, CSF monoamines, novelty seeking, intruder challenge, anxiety, mother-infant interaction, sleep/circadian rhythms, cortisol, oxytocin
  • - Metabolism and growth: lipids, glycemic measures, adipokines/leptin, vitamin D, morphometry (BMI)

  • - Microbiome at multiple body sites
  • - Life history traits and disease history
  • - RNA-seq: eQTLs from multiple tissues

Sequencing in the VRC

slide-28
SLIDE 28

Non-human primates vs. humans and rodents

  • - Low sequence divergence, syntenic blocks
  • - Phenotypic similarity (brain/behavior, infectious diseases, metabolism)

  • - Invasive studies are possible
  • - Controlled environment
  • - Longitudinal approaches are possible
  • - No reference datasets (dbSNP, Encode, etc.)
  • - Not all tools work for highly inbred populations

Sequencing in the VRC

slide-29
SLIDE 29

Blue: Founders. Orange: sequenced monkeys, size ~ coverage

Sequencing in the VRC

slide-30
SLIDE 30
  • - WGS of >700 samples with varying coverage (1-30x)
  • - Reference genome C.sabaeus 1.1: 29 + 2 chr

Workflow:

  • - Raw variant calling with GATK, genotype refinement in trios
  • - Postprocessing: genotype conflicts, Mendelian errors, low quality
  • - Phasing with Beagle in 99 samples: 82 high-coverage (HC) + 17 low-coverage (LC)
  • - Phasing and imputation in 620 LC samples, with the 99 as reference haplotypes
  • - Postprocessing: Mendelian errors, QC, quality flags
  • - Two independent call sets: 16.7 million SNVs genomewide, 1.3 million extended exome SNVs and indels

Sequencing in the VRC
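
The Mendelian-error check mentioned in the postprocessing steps can be illustrated for a single biallelic autosomal site. This is a minimal sketch under stated assumptions: genotypes are coded as alternative-allele counts (0, 1, 2), missing genotypes and sex chromosomes are not handled, and the function names are hypothetical rather than part of GATK or Beagle.

```python
def transmittable(gt):
    """Alt-allele counts (0 or 1) that a parent with genotype gt can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[gt]

def is_mendelian_error(child, mother, father):
    """True if the child's genotype cannot arise from the parents' genotypes."""
    possible = {m + f
                for m in transmittable(mother)
                for f in transmittable(father)}
    return child not in possible

def count_errors(trio_genotypes):
    """Count inconsistent sites in a list of (child, mother, father) tuples."""
    return sum(is_mendelian_error(c, m, f) for c, m, f in trio_genotypes)

# Toy trio genotypes at three sites (illustrative only)
sites = [(0, 0, 0), (2, 0, 0), (1, 2, 2)]
n_errors = count_errors(sites)
```

In a pedigree with many trios, sites or samples with an excess of such inconsistencies are natural candidates for quality flags.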

slide-31
SLIDE 31

NR annotation        Variants      %

Upstream-1000        325,968    23.8
Downstream-1000      284,953    20.8
Intron               174,171    12.7
3-UTR                167,523    12.2
Non-coding           144,099    10.5
5-UTR                102,395     7.5
Synon                 79,477     5.8
Missense              75,436     5.5
Coding-exon-indel     10,325     0.8
Stop-gain              1,514     0.1
Donor                  1,352     0.1
Acceptor               1,191     0.1
Stop-loss                187     0.0
Total              1,368,591

Variant type         Variants      %

COMPLEX               50,502     3.7
DEL                  133,861     9.8
INS                   69,993     5.1
SNV                1,114,235    81.4

Sequencing in the VRC

Variant annotation

slide-32
SLIDE 32

Alternative allele count distributions by type

Sequencing in the VRC

slide-33
SLIDE 33

Constrained human genes in vervets

  • - ExAC: exomes in 60,706 humans
  • - 3,230 genes depleted of PTVs (protein-truncating variants: indels, splice site, stop gain)
  • - 3,118 constrained genes (96.5%) have vervet orthologs
  • - Of them, 1,256 vervet genes harbor 2,212 PTVs (total 13,665)
  • - Genes with multiple PTVs: not constrained in vervets?
  • - Genes with few PTVs: check respective phenotypes

Sequencing in the VRC
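
The proportions implied by the counts on this slide can be recomputed directly. The sketch assumes that "total 13,665" refers to all vervet PTVs in the call set; that reading is an interpretation, not stated on the slide, and the variable names are hypothetical.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

constrained_human_genes = 3230   # human genes depleted of PTVs (ExAC)
with_vervet_ortholog = 3118      # of those, genes with a vervet ortholog
orthologs_with_ptv = 1256        # orthologs harboring at least one PTV
ptvs_in_orthologs = 2212         # PTVs found in those orthologs
ptvs_total = 13665               # assumed: all vervet PTVs

ortholog_pct = pct(with_vervet_ortholog, constrained_human_genes)
genes_hit_pct = pct(orthologs_with_ptv, with_vervet_ortholog)
ptv_share_pct = pct(ptvs_in_orthologs, ptvs_total)
```

Under that assumption, about 40% of the constrained orthologs carry at least one PTV in vervets, and these PTVs make up about 16% of the total.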

slide-34
SLIDE 34

Unconstrained genes with many PTVs

Sequencing in the VRC

slide-35
SLIDE 35

Constrained genes with many PTVs

Sequencing in the VRC

slide-36
SLIDE 36

Alt allele counts for PTVs

Sequencing in the VRC

slide-37
SLIDE 37
slide-38
SLIDE 38

New methods to interpret genome variation

slide-39
SLIDE 39

New methods to interpret variation

Protein-truncating variants: why are they tolerated?

Data:

  • - ExAC: ~60,000 human exomes
  • - Vervets: ~15,000 PTVs in 719 exomes
  • - Available microexon data

Approach

  • - Protein structure: models and features
slide-40
SLIDE 40

New methods to interpret variation

Good old missense variants

Motivation?

  • - Prediction targeted at specific protein families
  • - Need to explain the mechanism
  • - Account for intragenic compensation
  • - Traditional training sets need revision
slide-41
SLIDE 41
slide-42
SLIDE 42

New methods to interpret variation

Good old missense variants

Motivation?

  • - Prediction targeted at specific protein families
  • - Need to explain the mechanism
  • - Account for intragenic compensation
  • - Traditional training sets need revision

Data:

  • - New and emerging: NGS-based (ExAC)
  • - Old and forgotten: functional experiments // How does an impact on biochemical function translate to the clinical and population levels?

slide-43
SLIDE 43

New methods to interpret variation

Compensated pathogenic deviation

  • - A source of prediction errors for existing methods
  • - Fundamental mechanism of protein evolution and of resistance development in pathogens

Data:

  • - Protein mutation databases: functional effect of M1, M1+M2…
  • - Literature-based
slide-44
SLIDE 44

New methods to interpret variation

Non-coding variation

Data

  • - Genome sequence markup: genes and their elements
  • - Population-based variant frequencies (dbSNP, WGS)
  • - Genotype-phenotype associations (ClinVar, eQTLs, GWAS)
  • - Comparative genomics: conservation
  • - TF binding sites: experimental (ChIP-seq) and predicted
  • - Epigenomics data (REMC, ENCODE)

Problems

  • - Training sets
  • - Tissue specificity
slide-45
SLIDE 45

New methods to interpret variation

Non-coding genes: lincRNAs

  • - ~1/3 are specific to human lineage
  • - large fraction is brain-specific
  • - known to regulate neighboring protein-coding genes
  • - involved in gene expression regulation

// Derrien, et al. (2012) Genome Res

Q: Can we attempt a more systematic annotation of non-coding RNA genes?

  • - Data: large-scale RNA-seq datasets (BrainSpan)
  • - Method: Gene’s best friends: thoughtful analysis of gene expression correlation

slide-46
SLIDE 46

UCLA: Nelson Freimer, Susan Service, Alden Huang, Giovanni Coppola, Jae-Hoon Sul, Ivette Zelaya, Yu Huang, Nam Tran, Christopher Schmitt, Yoon Jung, Terri Teshiba, Margaret Chu, Eleazar Eskin

Mass General Hospital: Jeremiah Scharf, Dongmei Yu

University of Michigan: Michael Boehnke, Tanya Teslovich, Christian Fuchsberger

Washington University: Wes Warren, Daniel Koboldt, Richard Wilson

University of Helsinki: Samuli Ripatti

University of Chicago: Nancy Cox, Vasa Trubetskoy, Lea Davis

Stanford University: Chiara Sabatti

Johns Hopkins Univ: Alexander Favorov

Acknowledgments