Cindy G. Boer Genetic Laboratory Internal Medicine Erasmus MC - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Gene Regulation, Epigenetics & Databases

Cindy G. Boer

Genetic Laboratory Internal Medicine Erasmus MC

slide-2
SLIDE 2

Congratulations!

A genome-wide significant GWAS hit! (and what to do now?)

slide-3
SLIDE 3

GWAS identifies SNPs not Genes!

We want to know → Causal gene & disease mechanism…

This question presents us with 2 problems:

  • 1. What is the causal variant?
  • 2. What is the causal cell type(s)?

Causal variant → Causal cell type → Causal gene

slide-4
SLIDE 4

Identification of Causal variant?

Linkage Disequilibrium (LD)

  • GWAS: association between disease trait and (tag) SNP

– Array designed on LD structure, not on functional SNPs (imputation)

  • None, a few, tens or even hundreds of SNPs in LD with the top SNP

GWAS!

Locus zoom plot

  • LD structure plotted
  • SNPs in high LD (r² > 0.8 or r² > 0.6)
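
As a rough illustration of the r² thresholds above, the sketch below computes r² between a top SNP and candidate proxy SNPs from phased haplotypes coded 0/1. The haplotypes, the SNP names and the function names `ld_r2` and `snps_in_high_ld` are invented for this example; in practice LD is usually taken from a reference panel rather than computed by hand.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def snps_in_high_ld(top_hap, candidates, threshold=0.8):
    """Names of candidate SNPs whose r^2 with the top SNP exceeds the threshold."""
    return [name for name, hap in candidates.items()
            if ld_r2(top_hap, hap) > threshold]


# Hypothetical haplotypes for eight chromosomes.
top_snp = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "snp_perfect": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp_partial": [1, 1, 1, 0, 0, 0, 0, 0],
    "snp_unlinked": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(snps_in_high_ld(top_snp, candidates, threshold=0.8))
```

Lowering `threshold` widens the set of SNPs that have to be considered as potentially causal.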

Castaño Betancourt et al. (2016), PLOS Genetics

slide-5
SLIDE 5

Genome-wide association signal

Step 1: Annotation!

Top SNP (+ SNPs in LD > 0.8) → (one SNP/multiple) located in the coding sequence of a gene

  • Synonymous? Or Non-Synonymous?
  • Gene? What is known, what does it do?

– Damaging effect of the hit? (first part of the practical)
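
The synonymous versus non-synonymous question can be sketched by translating the reference and alternative codon with the standard genetic code. This is only a minimal illustration; the function name `classify_coding_snp` is made up here, and real annotation tools also account for transcripts, strand and reading frame.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon change on the coding strand (DNA alphabet)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (stop gained)"
    return f"non-synonymous ({ref_aa}>{alt_aa})"


print(classify_coding_snp("GAA", "GAG"))  # both codons encode glutamate
print(classify_coding_snp("GAG", "GTG"))  # glutamate to valine
```

Whether a non-synonymous change is actually damaging is a separate question that this sketch does not address.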

slide-6
SLIDE 6

Genome-wide association signal

Step 1: annotation [realistic scenario]

Most GWAS findings are located in non-coding regions of the genome [M.T. Maurano et al., Science, 337, 1190 (2012)]

– Introns or intergenic
– ~98.5% of the human genome is non-coding

Difficult to link SNP → Gene → Phenotype

slide-7
SLIDE 7

Regulatory elements

GWAS SNPs are enriched for regulatory elements.

Regulatory regions: promoters, enhancers, inhibitors, insulators, transcription factor binding sites etc.

  • 1. What is a regulatory region/how is a regulatory region defined?
  • 2. How will you know if your hit is located in a regulatory region?

[M.T. Maurano et al., Science, 337, 1190 (2012)]
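
One way to approach the second question is a simple interval overlap between the SNP position and a list of annotated regions. The sketch below assumes BED-style coordinates (0-based start, end-exclusive) and a 1-based SNP position; the regions and the function name `annotate_snp` are invented for illustration and do not come from any real annotation track.

```python
# Hypothetical regulatory annotations: (chromosome, start, end, label),
# with BED-style coordinates (0-based start, end not included).
REGULATORY_ELEMENTS = [
    ("chr1", 1_000_000, 1_000_600, "promoter"),
    ("chr1", 1_050_000, 1_052_000, "enhancer"),
    ("chr2", 1_050_000, 1_052_000, "insulator"),
]


def annotate_snp(chrom, pos, elements=REGULATORY_ELEMENTS):
    """Return labels of all elements overlapping a SNP at a 1-based position."""
    zero_based = pos - 1  # convert to the BED coordinate system
    return [label for c, start, end, label in elements
            if c == chrom and start <= zero_based < end]


print(annotate_snp("chr1", 1_050_500))  # inside the hypothetical enhancer
print(annotate_snp("chr1", 2_000_000))  # no overlapping element
```

Mixing up 0-based and 1-based coordinates shifts every position by one base, which is why the conversion is made explicit.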

slide-8
SLIDE 8

Databases & Tools: Online collection of (molecular) biological data and tools that are:

  • Structured & Searchable
  • Publicly available
  • Updated periodically & Cross-referenced
  • Literature
  • Data from research

Bioinformatics: “Mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information”

slide-9
SLIDE 9

Gene Regulation databases

slide-10
SLIDE 10

The Central Dogma (of molecular biology)

Epigenomics: All epigenetic modifications on the genetic material of a cell

slide-11
SLIDE 11

Epigenetics

“Epigenetic mechanisms can control the functions of noncoding sequences of DNA”.

The regulation and control of gene expression is essential for cell function, survival, differentiation

slide-12
SLIDE 12

Histones & Chromatin

slide-13
SLIDE 13

DNA structure & Regulation

DNase hypersensitive regions → open chromatin configuration

slide-14
SLIDE 14

DNA structure & Regulation

slide-15
SLIDE 15

The Histone Code

Histone code: multiple histone post-translational modifications (PTMs) → specific, unique downstream functions

Specific proteins involved in gene control recognize and interrogate the patterns of histone modifications:

  • Ex. RNA polymerase II, Transcription factors & DNA binding proteins
  • Transcription factor recruitment
  • Chromatin shape and function
slide-16
SLIDE 16

Epigenetics: Histone Code

Inactive promoter: H3K27me3, DNA methylation
Active promoter: H3K4me3 [promoter specific], H2A.Z [histone variant]
Inactive enhancer: H3K9me2, DNA methylation
Active enhancer: H3K4me1 [enhancer specific], H2A.Z [histone variant]

Hundreds of histone PTMs known!

slide-17
SLIDE 17
slide-18
SLIDE 18

Regulatory regions: Chromatin States

ENCODE/ROADMAP

  • “15-state model”
  • Histone modifications
  • DNase sites
  • TF-binding Sites

Roadmap Epigenomics Consortium, et al., Nature 2015

slide-19
SLIDE 19

Epigenetics: symphony No. 9

slide-20
SLIDE 20

DNA binding proteins

DNA-binding proteins: transcription factors, nucleases, other DNA-binding proteins

  • Non-specific binding: polymerases, histones
  • Specific binding: transcription factors, nucleases

Specific binding → recognition of a consensus sequence → change in consensus sequence → change in DNA binding affinity? → change in gene regulation/expression?

slide-21
SLIDE 21

Consensus sequences

  • DNA binding motif: “recognition sequence”
  • Found in databases:

– JASPAR database
– Integrated in HaploReg (practical)

Can also be affected by methylation! (EWAS)
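
To make the idea of a SNP changing a recognition sequence concrete, the sketch below scores a reference and an alternative allele against a position count matrix using log-odds scores. The counts, the sequences and the function names are hypothetical and not taken from JASPAR; the pseudocount and the uniform background are simplifying assumptions.

```python
import math

# Hypothetical base counts per motif position (one dict per position).
MOTIF_COUNTS = [
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 0, "C": 19, "G": 0, "T": 1},
    {"A": 2, "C": 2, "G": 14, "T": 2},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 16, "C": 1, "G": 2, "T": 1},
    {"A": 5, "C": 5, "G": 5, "T": 5},
]


def log_odds_score(seq, counts=MOTIF_COUNTS, pseudocount=1.0, background=0.25):
    """Sum of per-position log2(frequency / background) for a sequence."""
    if len(seq) != len(counts):
        raise ValueError("sequence length must equal motif length")
    score = 0.0
    for base, column in zip(seq.upper(), counts):
        total = sum(column.values()) + 4 * pseudocount
        freq = (column[base] + pseudocount) / total
        score += math.log2(freq / background)
    return score


def allele_effect(ref_seq, alt_seq):
    """Score difference (alt minus ref); negative means a weaker motif match."""
    return log_odds_score(alt_seq) - log_odds_score(ref_seq)


# G>A at the third motif position, where G is strongly preferred.
print(allele_effect("CCGTAA", "CCATAA"))
```

A lower score for the alternative allele only suggests reduced binding; it is a hypothesis to follow up, not evidence of a change in gene regulation.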

slide-22
SLIDE 22

CTCF methylation

CTCF binding is affected by methylation in its core sequence

Proper CTCF functioning is essential!

  • “Severe dysregulation of CTCF in cancer cells”
  • CTCF mouse mutants: embryonic lethal

slide-23
SLIDE 23
slide-24
SLIDE 24

So Far we have:

Annotation:

  • Location (Chr/Bp)
  • Coding/non-coding
  • DNA regulatory elements

– (and open chromatin sites)

  • Transcription factor binding sites

GWAS & EWAS goal

Identify novel targets/genes involved in phenotype X

So far only annotation, no (potential) causal gene

slide-25
SLIDE 25

Gene Regulation

Adapted from: Alberts, Molecular Biology of the Cell 5th Edition, figure 7-44

Typical eukaryotic gene regulation

  • Complex 3D looping (CTCF)
  • Multiple regulatory regions
  • Involvement of multiple transcription factors
  • Can be cell type specific

Gene regulation is highly complex!

slide-26
SLIDE 26

Gene Regulation

  • ~1 Mb (1,000,000 base pairs) long-range regulation

– Sonic Hedgehog, essential developmental gene
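
Because regulation can act over roughly 1 Mb, a first list of candidate genes can be drawn from a window around the SNP rather than from the nearest gene alone. The sketch below lists genes within a given distance of a SNP; the gene names, coordinates and the function name `genes_within_window` are invented for illustration.

```python
# Hypothetical genes on one chromosome: (name, start, end), 1-based inclusive.
GENES = [
    ("GENE_A", 4_990_000, 4_995_000),
    ("GENE_B", 5_900_000, 5_950_000),
    ("GENE_C", 7_000_000, 7_100_000),
]


def genes_within_window(snp_pos, genes=GENES, window=1_000_000):
    """Return (gene, distance) pairs within `window` bp of the SNP, nearest first."""
    hits = []
    for name, start, end in genes:
        if start <= snp_pos <= end:
            distance = 0  # SNP lies inside the gene
        else:
            distance = min(abs(snp_pos - start), abs(snp_pos - end))
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


print(genes_within_window(5_000_000))
```

The nearest gene in such a list is not necessarily the regulated one, so the output is a starting set of candidates only.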

slide-27
SLIDE 27
slide-28
SLIDE 28

Circadian rhythm : Epigenetics

  • Mammalian circadian clock
  • Oscillation of ~ 24h

– Light-dark cycle (melatonin secretion), Feed cycle

  • A conserved transcriptional–translational auto-regulatory loop generates molecular oscillations of ‘clock genes’ at the cellular level

PARP1- and CTCF-Mediated Interactions between Active and Repressed Chromatin at the Lamina Promote Oscillating Transcription, Zhao et al., 2015 Molecular Cell

slide-29
SLIDE 29

Complex 3D structure [Movie Time]

slide-30
SLIDE 30

SNP to gene: even more complicated than you thought

Cannon, ME et al., 2018, American Journal of Human Genetics

Even if authors did everything they could to determine the causal gene, they might be wrong!

slide-31
SLIDE 31

Finding [causal] Genes

Cell type specificity is useful & important:

  • Gene expression levels (RNA-seq)

– Predicted promoter activity in cell type
– Predicted gene activity (e.g. active gene transcription mark: H3K36me3)

  • Gene expression – Genotype

– eQTLs! (Thursday lecture/practical)
– Also cell type specific!

slide-32
SLIDE 32

Phenotype - Alzheimer

Enhancer Marks in Brain? Enhancer Marks in Heart?

slide-33
SLIDE 33

Causal Genes: Example

→ Is the enhancer site (likely) to regulate gene 1 or gene 2 (or both)?

slide-34
SLIDE 34

Cell type selection:

  • Selecting the target tissue is not always easy:

– Cell fate
– Cell state and cell type
– Complex diseases & phenotypes
– Availability of material & data

Proxy tissues:

  • Same lineage, similar functioning tissue
  • (gene of interest) expression vs no expression
  • Tools & databases to select target tissue
  • GWAS SNPs are enriched for gene regulatory regions… in the target cell type!
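
The enrichment idea can be sketched as a hypergeometric test: are more GWAS SNPs found in the regulatory regions of a given cell type than expected by chance? All counts below are hypothetical, the function names are made up, and the test ignores LD between SNPs and differences in allele frequency, which dedicated enrichment tools try to account for.

```python
from math import comb


def fold_enrichment(n_total, n_regulatory, n_hits, n_hits_regulatory):
    """Observed fraction of GWAS hits in regulatory regions over the background fraction."""
    return (n_hits_regulatory / n_hits) / (n_regulatory / n_total)


def enrichment_p_value(n_total, n_regulatory, n_hits, n_hits_regulatory):
    """One-sided hypergeometric P(X >= n_hits_regulatory).

    n_total: all tested SNPs; n_regulatory: those in regulatory regions of the
    cell type; n_hits: GWAS SNPs; n_hits_regulatory: GWAS SNPs in those regions.
    """
    upper = min(n_hits, n_regulatory)
    tail = sum(comb(n_regulatory, k) * comb(n_total - n_regulatory, n_hits - k)
               for k in range(n_hits_regulatory, upper + 1))
    return tail / comb(n_total, n_hits)


# Hypothetical counts for two cell types, same 40 GWAS SNPs out of 10,000 tested.
for cell_type, n_reg, n_hits_reg in [("target tissue", 500, 12), ("other tissue", 500, 2)]:
    print(cell_type,
          fold_enrichment(10_000, n_reg, 40, n_hits_reg),
          enrichment_p_value(10_000, n_reg, 40, n_hits_reg))
```

Comparing the same SNP set across several cell types in this way gives a first indication of which tissue to prioritise.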

slide-35
SLIDE 35
slide-36
SLIDE 36

Genome-wide association signal

Cannon, ME et al., 2018, American Journal of Human Genetics

slide-37
SLIDE 37

..How to Find?

  • Where is your hit (SNP) located?

– Chromosome & position
– Near or in which genes

  • Coding variant

– Synonymous/non-synonymous

  • Regulatory regions
  • 3D structure of the genome
  • Candidate gene

– gene function

  • Cell type?
slide-38
SLIDE 38

Databases & Tools: Online collection of (molecular) biological data and tools that are:

  • Structured & Searchable
  • Publicly available
  • Updated periodically & Cross-referenced
  • Literature
  • Data from research

Bioinformatics: “Mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information”

slide-39
SLIDE 39

Bioinformatic databases & Tools

  • Cross-referenced!
  • Also do your own cross-referencing!
  • Regularly updated!
slide-40
SLIDE 40

Biological databases

  • Pubmed – Literature database
  • Categorized databases: too much to name

– Genomic variation: dbSNP, HapMap...
– Sequence: NCBI RefSeq database, Entrez Nucleotide, miRBase...
– Proteins: RCSB Protein Data Bank, UniProt, SMART...
– Pathways: KEGG, Reactome, STRING...
– DNA annotation: ENCODE, Roadmap Epigenomics

  • Genome Browsers: genomic database, integrating all data associated with genome annotation & function.

  • Mining Tools: FUMA & HaploReg
slide-41
SLIDE 41

Genome Browser

  • Displaying, viewing and accessing genome annotation data

  • Genome annotation:

– DNA-variation information, epigenetic regulation, transcription, translation, disease information...

  • Links to other specialized Databases
slide-42
SLIDE 42

Difference?

  • NCBI, UCSC and Ensembl use the same human genome assembly generated by NCBI

– Release timing and data availability can differ between sites

  • NOTE: check the version of the genome assembly

– Annotation location and availability differ between assemblies

  • Which browser to use is a matter of personal preference
  • Practical: mainly UCSC, with some forays into other databases, including NCBI, Ensembl & ENCODE

slide-43
SLIDE 43

..How to Find?

  • Where is your hit (SNP) located?

– Chromosome & position
– Near or in which genes

  • Coding variant

– Synonymous/non-synonymous

  • Regulatory regions
  • 3D structure of the genome
  • Candidate gene

– gene function

  • Cell type?
slide-44
SLIDE 44

Mining Tools

FUMA Functional Mapping and Annotation of Genome-Wide Association Studies

– Monday practical & today's practical
– Novel tool!

slide-45
SLIDE 45

Mining Tools

HaploReg

HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci.

  • Mines ENCODE & ROADMAP data → be careful! It is not always up to date and does not always give clear information!
slide-46
SLIDE 46
slide-47
SLIDE 47

Your Research

Play with the tools

Lots of (useful) information

  • Be critical

– Check the outcome
– Know the data
– References
– Hypothesis building only!

Go and get lost...

(and write down where you went)

Your research NEEDS biological databases!

slide-48
SLIDE 48

The Practical

  • UCSC genome browser → links to other databases & data

– Ensembl, ENCODE, ROADMAP, HaploReg, FUMA, GTEx…

  • 3 part practical

I. Beginner: databases and bioinformatic tools (FUMA, UCSC, HaploReg)
II. Advanced: adding regulatory data and gene expression data
III. More advanced: adding 3D chromatin structure to your annotation

Focus on “real life” examples → Use for your own research!

slide-49
SLIDE 49

UCSC Genome Browser

slide-50
SLIDE 50

UCSC Genome Browser

slide-51
SLIDE 51

UCSC Genome Browser

slide-52
SLIDE 52

Hints for the Practical

  • Ask us anything (me, Linda & Joost) related to the practical or genetics
  • DNA is LARGE and a 3D molecule

– So check your surroundings! (i.e. zoom out)

  • Can I click on it? YES

→ more information! → more track control!

  • GIYF: Google is your friend
  • Practical is in 3 parts

– Intro – standard – difficult

  • CHROME or FIREFOX!

& Enjoy (or try to)

slide-53
SLIDE 53

Questions?