Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP - - PowerPoint PPT Presentation



SLIDE 1

A Quick Guide to the Analytics Behind Genomic Testing

Elaine Gee, PhD
Director, Bioinformatics
ARUP Laboratories

SLIDE 2

Learning Objectives

  • Catalogue various types of bioinformatics analyses that support clinical genomic testing
  • Enumerate types of variant classes
  • Describe algorithmic methods for variant detection by NGS
  • Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
  • Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
  • Explain validation strategies for bringing best-in-class pipelines into clinical production

SLIDE 3

The Human Reference Genome

~3B base pairs structured into 23 chromosome pairs

  • 3,098,825,702 base pairs
  • 20,805 coding genes
  • 14,181 pseudogenes
  • 196,501 gene transcripts

SLIDE 4

SLIDE 5

Why Genomic Testing?

KRAS G12D

1 in 4 cancer deaths are from lung cancer. ~222,500 new cases of lung cancer in the U.S. in 2017.

SLIDE 6

Genomic Testing

  • Short-Read Sequencers: Illumina, Ion-Torrent
  • Long-Read Sequencers: PacBio, NanoPore
  • 10X
  • Nanostring

SLIDE 7

Types of NGS Testing: Somatic & Germline

SLIDE 8

Types of NGS Testing: cfDNA and ctDNA

  • Non-Invasive Prenatal Testing (NIPT): Trisomy 21 (Down Syndrome)
  • Liquid Biopsy: Non-small cell lung cancer EGFR

SLIDE 9

Types of NGS Testing: Infectious Disease

SLIDE 10

Types of NGS Testing: RNA-Seq

  • Alternate transcripts
  • Novel gene isoforms
  • Gene fusions

SLIDE 11

Role of Clinical Bioinformatics

  • Build pipelines
  • Provide supplemental information for clinical interpretation and quality control

Other computationally heavy analytics are involved in evaluating:

  • Design of new panels
  • Identification of genetic patterns in patient cohorts
  • Discovery of gene pathways

SLIDE 12

Understanding bioinformatics requires understanding the laboratory process.

SLIDE 13

Variant Calling Pipeline

Steps in a bioinformatics pipeline:

  1. Sample demultiplexing
  2. Read alignment
  3. BAM polishing steps
  4. Variant calling
  5. Variant annotations
  6. QC calculations

SLIDE 14

Step 1: Sample Demultiplexing
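The slide names this step without describing it. As a rough sketch, assuming the common setup in which every read carries a short index barcode that identifies its sample, demultiplexing amounts to binning reads by barcode. The function names, barcodes, and the one-mismatch tolerance below are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Group reads by sample using their index barcode.

    reads: iterable of (index_barcode, read_sequence) tuples.
    sample_barcodes: dict mapping sample name -> expected barcode.
    Reads that match no sample, or more than one, go to "undetermined".
    """
    bins = defaultdict(list)
    for barcode, seq in reads:
        hits = [
            sample
            for sample, expected in sample_barcodes.items()
            if len(expected) == len(barcode)
            and hamming(barcode, expected) <= max_mismatches
        ]
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append(seq)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
samples = {"sample_A": "ACGTAC", "sample_B": "TTGCAA"}
reads = [
    ("ACGTAC", "ATCCTGA"),   # exact match to sample_A
    ("TTGCAT", "ATCCCTGA"),  # one mismatch from sample_B
    ("GGGGGG", "ATCCTG"),    # close to neither barcode
]
result = demultiplex(reads, samples)
```

Sending ambiguous reads to a separate bin rather than guessing keeps one sample's reads from contaminating another's.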

SLIDE 15

Step 2: Read Alignment

  • Read Alignment
  • SAM Format
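To make the SAM format concrete, here is a minimal sketch that splits one alignment line into the 11 mandatory tab-separated fields defined by the SAM specification. The class name, helper names, and the example record are illustrative assumptions. A production pipeline would normally use an established library rather than hand-rolled parsing.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "7M"
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # base qualities, ASCII-encoded (Phred+33)

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_duplicate(self) -> bool:
        return bool(self.flag & 0x400)


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Made-up record: flag 1024 (0x400) marks the read as a duplicate.
line = "read1\t1024\tchr12\t100\t60\t7M\t*\t0\t0\tATCCTGA\tIIIIIII"
rec = parse_sam_line(line)
```

The duplicate bit in the flag field is what a later duplicate-marking step sets, so downstream tools can skip those reads.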

SLIDE 16

Step 3: BAM Polishing Steps

  • PCR Duplicate Removal
  • Base Quality Score Recalibration

[Figure: stacked reads ATCCTG, one of them reading ATCCCTG; labels: reference, read, homopolymer (C C C C C), +1%]

Q30 Phred base quality score → 99.9% base call accuracy → 1 error in 1,000 bases
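The Q30 figure follows from the standard Phred definition, Q = -10 · log10(p), where p is the probability that a base call is wrong. The short sketch below converts in both directions and decodes a FASTQ quality string. The function names are illustrative, and the Phred+33 ASCII offset is assumed.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert an ASCII quality string to Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


p_q30 = phred_to_error_prob(30)          # about 0.001, i.e. 1 error in 1,000
accuracy_q30 = 1 - p_q30                 # about 0.999, i.e. 99.9%
scores = decode_fastq_quality("?I5")     # '?' -> 30, 'I' -> 40, '5' -> 20
```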

SLIDE 17

Step 4: Variant Calling by Class

SLIDE 18

Example Variant Calling Algorithms

SNV/Insertion/Deletion

  • Position-based callers (GATK UnifiedGenotyper, LoFreq)
  • Local de novo assembly of haplotypes (GATK HaplotypeCaller)
  • Graph-based variant callers (Graph Genome)
  • Neural networks (DeepVariant)

Duplications/Structural variants

  • Pattern growth approach (Pindel)
  • Split reads, discordant paired-end reads (Manta, DELLY, CREST)
  • k-mer + de novo assembly (BreaKmer)
  • Unmapped or partially mapped reads (ITD Assembler)
  • Depth of coverage + background error correction + principal component analysis (XHMM)
  • Tumor/normal
  • B allele frequency
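To give a feel for what "position-based" means, here is a deliberately naive sketch that looks at the bases piled up at a single position and reports the most frequent non-reference base if it clears a depth and allele-fraction cutoff. This is not how the named tools work: callers such as GATK UnifiedGenotyper and LoFreq apply statistical models rather than fixed cutoffs. The function name and thresholds are hypothetical.

```python
from collections import Counter


def call_position(ref_base, pileup_bases, min_depth=10, min_alt_fraction=0.2):
    """Toy single-position SNV call from the bases observed at one site.

    Returns a dict describing the call, or None if nothing is called.
    Thresholds are arbitrary illustration values.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None  # too few reads to say anything
    counts = Counter(pileup_bases)
    alt_counts = {b: n for b, n in counts.items() if b != ref_base}
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    fraction = alt_n / depth
    if fraction < min_alt_fraction:
        return None  # treat low-fraction mismatches as noise
    return {"ref": ref_base, "alt": alt_base,
            "depth": depth, "alt_fraction": fraction}


# 14 reads support the reference C, 6 reads show T.
pileup = "C" * 14 + "T" * 6
call = call_position("C", pileup)
```

The alternate allele fraction computed here is the same quantity the slide lists as B allele frequency. A fixed cutoff like this is also why somatic calling is harder than germline calling: low-fraction tumor variants sit close to the sequencing error rate.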

SLIDE 19

Example KRAS G12D Variant Call

SLIDE 20

Step 5: Variant Annotations

The VCF variant includes:

  • chromosome
  • position
  • ID
  • reference base
  • alternate base
  • variant quality
  • meta-information
      – information and individual format fields
      – filter flags

The annotated variant includes:

  • Gene
  • Gene Transcript
  • Nucleotide change (cdot)
  • Protein change (pdot)
  • Variant Type
      – Polymorphism
      – Synonymous
      – Non-synonymous
          • Nonsense
          • Missense
      – Frame shift
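The VCF fields listed on this slide correspond to the eight fixed tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The following is a minimal parsing sketch. The class, the function name, and the example record are made up for illustration, and per-sample FORMAT columns are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VcfRecord:
    chrom: str
    pos: int                # 1-based position
    variant_id: str         # "." when no identifier is assigned
    ref: str                # reference base(s)
    alts: list              # one or more alternate alleles
    qual: Optional[float]   # variant quality, None when missing
    filters: str            # "PASS" or filter flags
    info: dict              # key/value annotations from the INFO column


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 8:
        raise ValueError("a VCF data line needs 8 fixed columns")
    info = {}
    if f[7] != ".":
        for item in f[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # bare keys are flags
    return VcfRecord(
        chrom=f[0],
        pos=int(f[1]),
        variant_id=f[2],
        ref=f[3],
        alts=f[4].split(","),
        qual=None if f[5] == "." else float(f[5]),
        filters=f[6],
        info=info,
    )


# Made-up record, not a real variant.
vcf_line = "chr1\t1000\t.\tC\tT\t50\tPASS\tDP=100;SOMATIC"
vcf_rec = parse_vcf_line(vcf_line)
```

Annotation then adds what the VCF line lacks: the gene, transcript, c. and p. notation, and variant type are all derived by comparing the position and alleles against a gene model.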

SLIDE 21

Step 6: QC Calculations

Sample Report QC metrics

Sequencing

Run Level
  • Cluster density
  • Base call quality score
  • Fragment size

Sample Level
  • Depth of coverage
  • Uniformity
  • Mapping quality
  • Duplication rate

Variant Level
  • Novel variants
  • Known variants
  • Transition-to-transversion ratio
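As one example of a variant-level metric, the transition-to-transversion ratio can be computed directly from the reference and alternate bases of the called SNVs. Transitions swap a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other substitution is a transversion. The function names and the input list below are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs) -> float:
    """Transition/transversion ratio over (ref, alt) single-base pairs.

    Pairs that are not simple A/C/G/T substitutions are skipped.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue
        if len(ref) != 1 or len(alt) != 1:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Four transitions and two transversions (made-up calls).
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(snvs)
```

A ratio far from the value expected for the assay is a common sign that the call set contains many false positives.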

SLIDE 22

Sample-Level QC Metrics for Targeted Capture

[Figure labels: Read Depth, On-Target, Off-Target, Gene, Exon, Intronic regions]

  • Minimum Depth of Coverage
  • Uniformity
  • Mapping Quality
  • Duplication Rate
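A minimal sketch of how some of these sample-level metrics might be computed from per-base depths over the targeted regions. The slide does not define uniformity. The version here, the percentage of target bases covered at 0.2× the mean depth or better, is one common convention and is an assumption, as are the function names, the 100× minimum, and the example numbers.

```python
from statistics import mean


def coverage_metrics(depths, min_required=100, uniformity_fraction=0.2):
    """Summarize per-base depths across targeted regions.

    depths: list of read depths, one per targeted base.
    min_required: depth each base is expected to reach (assumed value).
    uniformity_fraction: a base counts as evenly covered when its depth
        is at least this fraction of the mean depth (assumed convention).
    """
    avg = mean(depths)
    n = len(depths)
    return {
        "mean_depth": avg,
        "min_depth": min(depths),
        "pct_at_min_required": 100 * sum(d >= min_required for d in depths) / n,
        "uniformity_pct": 100 * sum(d >= uniformity_fraction * avg for d in depths) / n,
    }


def duplication_rate(duplicate_reads: int, total_reads: int) -> float:
    """Fraction of reads flagged as duplicates."""
    return duplicate_reads / total_reads


# Made-up depths: one poorly covered base among six.
depths = [250, 300, 280, 40, 310, 320]
metrics = coverage_metrics(depths)
dup_rate = duplication_rate(150, 1000)
```

Reporting the minimum alongside the mean matters because a high average can hide a single exon that falls below the depth needed to call variants.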

SLIDE 23

Compute Infrastructure for Data Processing

How does a bioinformatics job get executed in clinical production?

[Figure: Job 1, Job 2, Job 3, Job 4, Job 5]

SLIDE 24

Data Storage Infrastructure

  • BCL (500–550 GB): raw output for a single run
  • FASTQs (12–15 GB): exome, ~150x
  • FASTQs, BAMs, VCFs
  • In-house, cloud based
  • Database, Object Storage “hot”, Archive “cold”
  • 99.9% availability, 99.999999999% durability

[Figure labels: HiSeq 4000, Bioinformatics]
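The availability and durability percentages are easier to judge as concrete numbers. The arithmetic below assumes both figures are annual, as cloud storage providers typically state them. The slide does not say so, and the object count is hypothetical. `Decimal` is used so that the long run of nines is not distorted by floating-point rounding.

```python
from decimal import Decimal

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: str) -> Decimal:
    """Hours per year a service may be unreachable at a given availability."""
    return (1 - Decimal(availability_pct) / 100) * HOURS_PER_YEAR


def expected_annual_object_loss(n_objects: int, durability_pct: str) -> Decimal:
    """Expected number of stored objects lost per year at a given durability."""
    return n_objects * (1 - Decimal(durability_pct) / 100)


# 99.9% availability allows 8.76 hours of downtime per year.
downtime = downtime_hours_per_year("99.9")

# With a hypothetical 10 million stored files, eleven nines of durability
# corresponds to an expected loss of 0.0001 files per year.
loss = expected_annual_object_loss(10_000_000, "99.999999999")
```

The two numbers answer different questions: availability is about whether data can be reached right now, while durability is about whether it still exists at all.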

SLIDE 25

Bioinformatics Pipeline Validation

  • Recommendations from CAP/AMP
      – 17 recommendation statements
      – 59 variants tested in each variant class
  • Example Statistics
      – Positive percentage agreement (PPA)
      – Positive predictive value (PPV)
      – Reproducibility
      – Allelic fraction lower limit of detection
  • Validation required prior to use in clinical production
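PPA and PPV can be computed by comparing the pipeline's calls against a set of expected variants: PPA = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below uses made-up variants and hypothetical function names. The 59-variant truth set only echoes the per-class count on this slide and is not real validation data.

```python
def compare_calls(expected, observed):
    """Count true positives, false negatives, and false positives.

    expected, observed: iterables of hashable variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)   # expected and called
    fn = len(expected - observed)   # expected but missed
    fp = len(observed - expected)   # called but not expected
    return tp, fn, fp


def ppa(tp: int, fn: int) -> float:
    """Positive percent agreement: share of expected variants detected."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of calls that were expected."""
    return tp / (tp + fp)


# Made-up truth set of 59 SNVs; the pipeline misses one and adds one extra.
truth = [("chr1", 1000 + i, "A", "G") for i in range(59)]
calls = truth[:-1] + [("chr2", 5000, "C", "T")]

tp, fn, fp = compare_calls(truth, calls)
ppa_value = ppa(tp, fn)   # 58 / 59
ppv_value = ppv(tp, fp)   # 58 / 59
```

Keying each variant on chromosome, position, reference, and alternate allele assumes both call sets represent variants the same way; indels in particular usually need normalization before such a comparison.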

SLIDE 26

Summary

  • Catalogue various types of bioinformatics analyses that support clinical genomic testing
  • Enumerate types of variant classes
  • Describe algorithmic methods for variant detection by NGS
  • Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
  • Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
  • Explain validation strategies for bringing best-in-class pipelines into clinical production

SLIDE 27

Questions?

Elaine Gee, PhD
Director of Bioinformatics
ARUP Laboratories
elaine.gee@aruplab.com