EVA: Exome Variation Analyzer, a convivial tool for filtering strategies



SLIDE 1

EVA – NETTAB 2011

EVA: Exome Variation Analyzer, a convivial tool for filtering strategies

  • S. Coutant 1,2, A. Lefebvre 2, M. Léonard 2, É. Prieur-Gaston 2, D. Campion 1, T. Lecroq 2 and H. Dauchel 2

  • 1. University of Rouen, France, INSERM (National Institute of Health and Medical Research) U614: Molecular genetics of cancer and neuropsychiatric diseases

  • 2. University of Rouen, France, LITIS EA 4108: Computer science, information processing and systems laboratory

NETTAB 2011

October 12-14, 2011, Pavia, Italy

SLIDE 2

Identifying relevant genes

Use of genetic markers:

  • Quantitative Trait Locus mapping
  • Linkage Analysis
  • ...
  • Genome-Wide Association Study

→ Molecular basis for nearly 3,000 Mendelian disorders is known

N.O. Stitziel, A. Kiezun & S. Sunyaev. Computational and statistical approaches to analysing variants identified by exome sequencing. Genome Biology 12(9), 227 (2011)

SLIDE 3

NGS: Next-Generation Sequencing

[Diagram: NGS applications (DNA-seq, RNA-seq, ChIP-seq) and approaches (de novo sequencing, targeted sequencing, exome)]

  • J. Shendure & H. Ji. Next-generation DNA sequencing. Nature Biotechnology 26(10), 1135-1145 (2008)

SLIDE 4

Exome Sequencing

The latest issue of Genome Biology (volume 12, issue 9, 2011) is entirely dedicated to exome sequencing.

Exome sequencing in Nature Genetics:

  • 2010: 6 studies
  • 2011: 18 studies

Editorial. Nature Genetics 43, 921 (2011)

SLIDE 5

Exome

The “exome” is the set of all exons in the genome (i.e., the regions of genes that are retained in mature transcripts).

Human exome:

  • 180,000 exons
  • ~30 Mb vs. ~3 Gb for the whole genome
  • ~1% of the total human genome

Capture

The Agilent SureSelect Human All Exon Kit version 1 captures 38 Mb (3 µg of DNA needed):

  • 180,000 exons (CCDS database, NCBI)
  • 700 miRNA
  • 300 ncRNA
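As a quick sanity check of the "~1%" figure, the ratio of the approximate sizes quoted on this slide can be computed directly. This is only an illustrative calculation on the rounded values (~30 Mb and ~3 Gb); the function name is hypothetical.

```python
# Approximate sizes quoted on the slide (orders of magnitude only).
EXOME_BP = 30_000_000        # ~30 Mb
GENOME_BP = 3_000_000_000    # ~3 Gb


def fraction(part_bp: int, whole_bp: int) -> float:
    """Return part/whole as a plain ratio."""
    return part_bp / whole_bp


exome_share = fraction(EXOME_BP, GENOME_BP)
print(f"Exome share of the genome: {exome_share:.1%}")
```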

SLIDE 6

Proof of concept

Whole-exome sequencing was shown to be able to identify the gene responsible for a Mendelian disorder (August 2009).

SLIDE 7

Recurrence strategy

Exome sequencing:

  • 17,000 cSNPs per individual: 95% in dbSNP
  • 166 indels per individual: 63% in dbSNP

Compare with ~3 million SNPs per individual (in the whole genome)

→ Filters needed
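To see why filters are needed, the per-individual counts quoted on this slide can be turned into rough numbers of variations absent from dbSNP. This is a back-of-the-envelope sketch on the rounded slide values; the helper name is hypothetical.

```python
def not_in_dbsnp(total: int, percent_in_dbsnp: int) -> float:
    """Number of variations left after removing those already in dbSNP."""
    return total * (100 - percent_in_dbsnp) / 100


# Figures quoted on the slide (per individual).
novel_csnps = not_in_dbsnp(17_000, 95)   # cSNPs not in dbSNP
novel_indels = not_in_dbsnp(166, 63)     # indels not in dbSNP

print(f"cSNPs not in dbSNP:  ~{novel_csnps:.0f}")
print(f"indels not in dbSNP: ~{novel_indels:.0f}")
```

Even after removing known variations, several hundred candidates remain per individual, which is why further filters are combined.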

SLIDE 8

Recurrence strategy

[Fig. 2, from Ng S. B. et al., Nature 461, 272-276 (2009): number of genes affected by at least one cSNP in 1, 2, 3 or 4 individuals, under successive filters: nonsynonymous cSNP, not in dbSNP, not in HapMap, not in dbSNP + HapMap, predicted damaging]

SLIDE 9

Recurrence strategy

Example from Ng S. B. et al. (2009): Freeman-Sheldon syndrome

SLIDE 10

Problem statement: clinical bioinformatics

[Pipeline diagram: NGS sequencing (Illumina GA IIx), then mapping & variation detection (CASAVA + bioinformatics processing), then "?"]

SLIDE 11

Problem statement

We need to filter variations, to make the clinician autonomous, and to take a step towards personalized medicine.

SLIDE 12

EVA - Exome Variation Analyzer

[Pipeline diagram: NGS sequencing (Illumina GA IIx), then mapping & variation detection (CASAVA + bioinformatics processing), then EVA integration module, ExomeDB and EVA]

The EVA tool consists of:

  • a database: ExomeDB
  • a browser
  • several filters and search tools

SLIDE 13

Database: ExomeDB

Structure

  • Developed in MySQL (ver. 5.0)
  • Principal tables: Individual, Variation and Gene

INDIVIDUAL: id_ind, indName, origin, ...
VARIATION: id_var, position, chrom, base_ref, base_mut, ...
GENE: id_gen, geneName, chrom, start, end, ...
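The three principal tables can be sketched as follows. This is a minimal illustration, not the actual ExomeDB definition: only the column names come from the slide, while the column types, the primary keys and the use of an in-memory SQLite database (instead of MySQL) are assumptions. Columns elided on the slide ("...") are left out, as are the links between tables, which the slide does not show.

```python
import sqlite3

# Column names come from the slide; types, keys and the SQLite engine
# are assumptions made for this sketch (the original database is MySQL).
SCHEMA = """
CREATE TABLE individual (
    id_ind   INTEGER PRIMARY KEY,
    indName  TEXT,
    origin   TEXT
);
CREATE TABLE gene (
    id_gen   INTEGER PRIMARY KEY,
    geneName TEXT,
    chrom    TEXT,
    start    INTEGER,
    "end"    INTEGER          -- quoted because END is an SQL keyword
);
CREATE TABLE variation (
    id_var   INTEGER PRIMARY KEY,
    position INTEGER,
    chrom    TEXT,
    base_ref TEXT,
    base_mut TEXT
);
"""


def create_exomedb_sketch() -> sqlite3.Connection:
    """Create an in-memory database holding the three sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_exomedb_sketch()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```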

SLIDE 14

Integration module

  • Every new project is loaded remotely through an online integration module. This module accepts .txt and .xls files.
  • The integrated data are lists of variations (SNPs, indels) and their annotations (position, mutation type, ...).
  • The input is the output of a CASAVA-like analysis pipeline. The tool is optimised to accept data from IntegraGen, a biotechnology company in Évry, France.

SLIDE 15

Integration module: genomic position

SLIDE 16

Integration module: number of read bases

SLIDE 17

Integration module: quality and coverage

SLIDE 18

Integration module: mutated base / reference base

SLIDE 19

Integration module: gene annotations (gene name and functional class)
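The fields called out on the integration-module slides (genomic position, number of read bases, quality, mutated and reference base, gene name and functional class) suggest one record per variation. The sketch below parses such records from a tab-delimited .txt file. The column headers, the `Variation` class and the sample rows are all invented for illustration; the real input layout is not shown in the slides.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Variation:
    chrom: str
    position: int
    base_ref: str
    base_mut: str
    read_depth: int         # number of read bases covering the position
    quality: float
    gene: str
    functional_class: str   # e.g. missense, synonymous


def parse_variations(handle) -> list[Variation]:
    """Parse a tab-delimited file whose header row names the fields above."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        Variation(
            chrom=row["chrom"],
            position=int(row["position"]),
            base_ref=row["base_ref"],
            base_mut=row["base_mut"],
            read_depth=int(row["read_depth"]),
            quality=float(row["quality"]),
            gene=row["gene"],
            functional_class=row["functional_class"],
        )
        for row in reader
    ]


# Toy input with made-up values, standing in for an uploaded .txt file.
SAMPLE = (
    "chrom\tposition\tbase_ref\tbase_mut\tread_depth\tquality\tgene\tfunctional_class\n"
    "chr1\t12345\tA\tG\t40\t99.0\tGENE_A\tmissense\n"
    "chr2\t67890\tC\tT\t12\t35.5\tGENE_B\tsynonymous\n"
)
variations = parse_variations(io.StringIO(SAMPLE))
print(len(variations), variations[0].gene)
```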

SLIDE 20

Web Interface

Browse, Search, Filters

SLIDE 21

Filters

Recurrence Strategy - 1st step: select project

[Variations overview]

14 exomes from early-onset autosomal dominant Alzheimer disease cases without identified mutations

SLIDE 22

[Variations overview]: sequenced individuals

SLIDE 23

[Variations overview]: in dbSNP / not in dbSNP

SLIDE 24

[Variations overview]: exonic / intronic

SLIDE 25

[Variations overview]: single variation / insertion-deletion

SLIDE 26

[Variations overview]: single variation categories (synonymous, missense, stop loss, nonsense)

SLIDE 27

[Variations overview]: indel categories (frameshift, no frameshift)

SLIDE 28

[Variations overview]: canonical splice site mutation

SLIDE 29

[Variations overview]: ~14,106 + ~1,066 = ~15,172 variations per individual, compared with ~16,500 in Ng S. B. et al., Nature 461, 272-276 (2009).

SLIDE 30

Filters

Recurrence Strategy - 2nd step: apply filters

Filters can be combined at will by clinicians to address different kinds of questions. The combination is translated into an SQL query and sent to the ExomeDB database.
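One way a combination of filters could be translated into an SQL query is sketched below. This is not EVA's actual code: the filter keys, the `in_dbsnp`, `functional_class` and `quality` column names and the `build_query` helper are hypothetical, and the `?` placeholders follow the SQLite convention (MySQL drivers typically use `%s`).

```python
def build_query(filters: dict) -> tuple[str, list]:
    """Turn a dict of selected filters into a parameterised SELECT."""
    clauses: list[str] = []
    params: list = []

    # Keep only variations unknown to dbSNP.
    if filters.get("exclude_dbsnp"):
        clauses.append("in_dbsnp = 0")

    # Drop unwanted functional classes (e.g. synonymous variations).
    if filters.get("exclude_classes"):
        classes = list(filters["exclude_classes"])
        placeholders = ", ".join("?" for _ in classes)
        clauses.append(f"functional_class NOT IN ({placeholders})")
        params.extend(classes)

    # Keep only variations above a quality threshold.
    if "min_quality" in filters:
        clauses.append("quality >= ?")
        params.append(filters["min_quality"])

    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id_var FROM variation WHERE {where}", params


sql, params = build_query(
    {"exclude_dbsnp": True, "exclude_classes": ["synonymous"], "min_quality": 20}
)
print(sql)
print(params)
```

Passing values as parameters rather than interpolating them into the string keeps user-selected filter values out of the SQL text itself.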

SLIDE 33

Filters

Recurrence Strategy - 2nd step: display filtered variations

[Variations overview]

Only unknown, non-synonymous, high-quality variations remain. The number of variations per individual decreases from ~15,172 to ~330.

SLIDE 34

Filters

Recurrence Strategy - 3rd step: retrieve the corresponding genes and control the recurrence stringency

  • Specify the number of individuals
  • Search for the most affected genes in the specified number of individuals
  • The displayed result is a list of potential candidate genes

SLIDE 35

Results of recurrence strategy

Number of individuals    Number of genes with remaining variations
14 / 14 to 9 / 14        (no value shown)
8 / 14                   1
7 / 14                   3
6 / 14                   3
5 / 14                   7
4 / 14                   31
3 / 14                   112
2 / 14                   542
1 / 14                   2730

The number of candidate genes decreases drastically as the required number of individuals increases.

Wet and dry biological investigations: 1 gene (publication submitted)
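The recurrence step can be sketched as a simple count of how many individuals still carry a variation in each gene after filtering. The data structure, the function name and the toy gene names are hypothetical, and the sketch assumes the threshold means "at least N individuals", which the slide does not state explicitly.

```python
from collections import Counter


def genes_by_recurrence(
    genes_per_individual: dict[str, set[str]], min_individuals: int
) -> list[str]:
    """Genes with remaining variations in at least `min_individuals` people."""
    counts: Counter = Counter()
    for genes in genes_per_individual.values():
        counts.update(genes)  # sets: each individual counts once per gene
    return sorted(g for g, n in counts.items() if n >= min_individuals)


# Toy data: genes still carrying variations after filtering, per individual.
toy = {
    "ind1": {"GENE_A", "GENE_B"},
    "ind2": {"GENE_A"},
    "ind3": {"GENE_A", "GENE_C"},
}
print(genes_by_recurrence(toy, 1))
print(genes_by_recurrence(toy, 3))
```

Raising `min_individuals` shrinks the candidate list, which mirrors the trend in the table above.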

SLIDE 36

Filters

Results - gene details, variations overview, variations list, variation details.

[Gene detail]

  • Useful external database links
  • Areas not captured during the pre-sequencing protocol
  • External interpretation tools

SLIDE 37

[Gene detail] with [Variations overview]: variations overview for the gene

SLIDE 38

[Variations list]


SLIDE 39

[Variation details]

SNP variation annotations (position, gene, mutation) and quality information.


SLIDE 44

Filters: 3 ways of using them: recurrence, familial and de novo

Recurrence strategy

  • Can be applied to dominant or recessive pathologies.
  • Input: variations from unrelated individuals
  • Select genes with remaining variations in several individuals

[Diagram: DB, Filters, EVA Interface]

Other strategies: familial and de novo

SLIDE 45

Familial strategy

  • Input: variations from related individuals
  • Select genes with identical remaining variations in the related individuals

[Diagram: DB, Filters, EVA Interface]

SLIDE 46

De novo strategy

  • Input: variations from 1 child + 2 parents
  • Select genes with variations in the child that are not present in the parents

[Diagram: DB, Filters, EVA Interface]

SLIDE 47

Filters

Familial strategy

  • Select the related individuals
  • Search for the genes carrying the most variations shared by the specified related individuals
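The familial strategy amounts to intersecting the filtered variation sets of the selected relatives and ranking genes by the number of shared variations. The sketch below is one possible reading, not EVA's implementation: the variation key `(chrom, position, base_ref, base_mut)`, the function names and the toy data are assumptions.

```python
from collections import Counter

# A variation is identified here by (chrom, position, base_ref, base_mut).
Variant = tuple[str, int, str, str]


def shared_variants(per_individual: dict[str, set[Variant]]) -> set[Variant]:
    """Variations present in every selected related individual."""
    sets = list(per_individual.values())
    return set.intersection(*sets) if sets else set()


def rank_genes(
    variants: set[Variant], gene_of: dict[Variant, str]
) -> list[tuple[str, int]]:
    """Genes ordered by how many shared variations they carry."""
    counts = Counter(gene_of[v] for v in variants if v in gene_of)
    return counts.most_common()


# Toy family with made-up variations.
v1, v2, v3 = ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")
family = {"relative1": {v1, v2, v3}, "relative2": {v1, v2}, "relative3": {v1, v2, v3}}
gene_of = {v1: "GENE_A", v2: "GENE_A", v3: "GENE_B"}

common = shared_variants(family)
ranking = rank_genes(common, gene_of)
print(ranking)
```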

SLIDE 48

Filters

De novo strategy

  • Select the child and the parents
  • Search for the genes with remaining variations found in the child but not in the parents
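The de novo strategy can be expressed as a set difference: the child's filtered variations minus those seen in either parent. As before, this is a sketch under assumptions: the variation key, function names and toy trio are hypothetical and do not come from EVA itself.

```python
# A variation is identified here by (chrom, position, base_ref, base_mut).
Variant = tuple[str, int, str, str]


def de_novo_variants(
    child: set[Variant], mother: set[Variant], father: set[Variant]
) -> set[Variant]:
    """Variations found in the child but in neither parent."""
    return child - (mother | father)


def affected_genes(variants: set[Variant], gene_of: dict[Variant, str]) -> list[str]:
    """Sorted list of genes hit by the given variations."""
    return sorted({gene_of[v] for v in variants if v in gene_of})


# Toy trio with made-up variations.
v1, v2, v3 = ("chr1", 100, "A", "G"), ("chr3", 500, "T", "C"), ("chr7", 900, "G", "T")
child, mother, father = {v1, v2, v3}, {v1}, {v2}
gene_of = {v1: "GENE_A", v2: "GENE_B", v3: "GENE_C"}

candidates = de_novo_variants(child, mother, father)
print(affected_genes(candidates, gene_of))
```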

SLIDE 49

Conclusion

EVA (Exome Variation Analyzer): a simple, convivial and efficient tool.

  • Database (ExomeDB): stores exome sequencing data
  • Web interface: helps clinicians filter and interpret data

Results:

  • Real decrease in the number of candidate variations
  • Case study (Alzheimer): 1 candidate gene revealed (publication submitted)

Encountered problems:

  • Genes with frequent polymorphisms are not eliminated by the Recurrence strategy
  • Reference transcripts
  • Variations found in other projects are necessary to drastically decrease the candidate list (1000 Genomes, Complete Genomics...)

SLIDE 50

Perspectives

  • Interface: graphical representation
  • Tool: integration module (VCF format compatibility), statistical overview (project, individual, ...)

Availability

EVA is hosted on a dedicated server. The web address is public, but access to biological data requires credentials issued by an administrator.

http://bioinfo.litislab.fr/EVA/

SLIDE 51

Thank you for your attention!