[PPT] - Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP PowerPoint Presentation

SLIDE 1

A Quick Guide to the Analytics Behind Genomic Testing

Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories

SLIDE 2

Learning Objectives

Catalogue various types of bioinformatics analyses that support clinical genomic testing
Enumerate types of variant classes
Describe algorithmic methods for variant detection by NGS
Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
Explain validation strategies for bringing best-in-class pipelines into clinical production

SLIDE 3

The Human Reference Genome

~3B base pairs structured into 23 chromosome pairs

3,098,825,702 base pairs
20,805 coding genes
14,181 pseudogenes
196,501 gene transcripts

SLIDE 4

SLIDE 5

Why Genomic Testing?

KRAS G12D

1 in 4 cancer deaths are from lung cancer.
~222,500 new cases of lung cancer in the U.S. in 2017.

SLIDE 6

Short-Read Sequencers: Illumina, Ion Torrent
Long-Read Sequencers: PacBio, Nanopore
10X
Nanostring

Genomic Testing

SLIDE 7

Types of NGS Testing: Somatic & Germline

SLIDE 8

Types of NGS Testing: cfDNA and ctDNA

Non-Invasive Prenatal Testing (NIPT): Trisomy 21 (Down Syndrome)
Liquid Biopsy: Non-small cell lung cancer, EGFR

SLIDE 9

Types of NGS Testing: Infectious Disease



SLIDE 10

Types of NGS Testing: RNA-Seq

Alternate transcripts
Novel gene isoforms
Gene fusions

SLIDE 11

Role of Clinical Bioinformatics

Build pipelines
Provide supplemental information for clinical interpretation and quality control
Other computationally heavy analytics are involved in evaluating:
– Design of new panels
– Identification of genetic patterns in patient cohorts
– Discovery of gene pathways

SLIDE 12

Understanding bioinformatics requires understanding the laboratory process.

SLIDE 13

Steps in a bioinformatics pipeline:

1. Sample demultiplexing
2. Read alignment
3. BAM polishing steps
4. Variant calling
5. Variant annotations
6. QC calculations

Variant Calling Pipeline
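The six steps above can be sketched as a small dependency graph that a scheduler resolves into a run order. This is a minimal illustration only: the step names and the dependencies between them (for example, QC depending on both the polished BAM and the variant calls) are assumptions, not the pipeline described in the talk.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it must wait for.
# These dependencies are assumed for illustration.
PIPELINE = {
    "sample_demultiplexing": set(),
    "read_alignment": {"sample_demultiplexing"},
    "bam_polishing": {"read_alignment"},
    "variant_calling": {"bam_polishing"},
    "variant_annotation": {"variant_calling"},
    "qc_calculations": {"bam_polishing", "variant_calling"},
}


def execution_order(pipeline):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
for step_number, step in enumerate(order, start=1):
    print(step_number, step)
```

Expressing steps as a graph rather than a fixed list lets independent steps (here, annotation and QC) run in parallel once their inputs exist.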

SLIDE 14

Step 1: Sample Demultiplexing
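Demultiplexing assigns each read from a pooled run back to its sample using the index (barcode) sequence. A minimal sketch, assuming a hypothetical sample sheet that maps barcodes to sample names and a tolerance of one mismatch; production demultiplexers handle dual indexes, quality scores, and file formats that are omitted here.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_sheet, max_mismatches=1):
    """Group reads by sample.

    reads: iterable of (index_sequence, read_sequence) pairs.
    sample_sheet: dict mapping barcode -> sample name (hypothetical layout).
    Reads that match no barcode, or more than one, go to "undetermined".
    """
    bins = {sample: [] for sample in sample_sheet.values()}
    bins["undetermined"] = []
    for index_seq, read_seq in reads:
        matches = [
            sample
            for barcode, sample in sample_sheet.items()
            if hamming(index_seq, barcode) <= max_mismatches
        ]
        target = matches[0] if len(matches) == 1 else "undetermined"
        bins[target].append(read_seq)
    return bins


sample_sheet = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "AAA"),  # exact match to sample_A
    ("ACGTAA", "CCC"),  # one mismatch from sample_A's barcode
    ("GGGGGG", "TTT"),  # matches neither barcode
    ("TGCATG", "GGG"),  # exact match to sample_B
]
bins = demultiplex(reads, sample_sheet)
```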

SLIDE 15

Step 2: Read Alignment

Read Alignment SAM Format
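Each aligned read is stored as one tab-separated SAM record with eleven mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The sketch below parses a subset of those fields and decodes three FLAG bits; the example record is made up for illustration and does not come from the slides.

```python
from dataclasses import dataclass

# Bit values defined by the SAM format for the FLAG field.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str
    qual: str

    @property
    def is_unmapped(self):
        return bool(self.flag & FLAG_UNMAPPED)

    @property
    def is_reverse(self):
        return bool(self.flag & FLAG_REVERSE)

    @property
    def is_duplicate(self):
        return bool(self.flag & FLAG_DUPLICATE)


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
        qual=fields[10],
    )


# Hypothetical record: FLAG 1040 = 0x400 (duplicate) + 0x10 (reverse strand).
example = "read1\t1040\tchr1\t100\t60\t6M\t*\t0\t0\tATCCTG\tIIIIII"
record = parse_sam_line(example)
```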

SLIDE 16

Step 3: BAM Polishing Steps

PCR Duplicate Removal
Base Quality Score Recalibration

Q30 Phred base quality score → 99.9% base call accuracy → 1 in 1,000 chance of an incorrect base call

[Figure: reads aligned to the reference sequence; labels: reference, read, homopolymer (C C C C C), +1%]
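The Q30 figure follows from the Phred definition, Q = -10 · log10(P), where P is the probability that a base call is wrong. A small sketch of the conversion in both directions, plus decoding of a FASTQ quality string; the Phred+33 ASCII offset is assumed here, since the slides do not state the encoding.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_phred33(qual):
    """Decode a quality string, assuming the Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in qual]


p_q30 = phred_to_error_prob(30)   # about 0.001, i.e. 1 error in 1,000 calls
accuracy_q30 = 1 - p_q30          # about 0.999, i.e. 99.9%
scores = decode_phred33("?I")     # '?' is ASCII 63, 'I' is ASCII 73
```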

SLIDE 17

Step 4: Variant Calling by Class

SLIDE 18

Example Variant Calling Algorithms

SNV/Insertion/Deletion
– Position-based callers (GATK UnifiedGenotyper, LoFreq)
– Local de novo assembly of haplotypes (GATK HaplotypeCaller)
– Graph-based variant callers (Graph Genome)
– Neural networks (DeepVariant)

Duplications/Structural variants
– Pattern growth approach (Pindel)
– Split reads, discordant paired-end reads (Manta, DELLY, CREST)
– k-mer + de novo assembly (BreaKmer)
– Unmapped or partially mapped reads (ITD Assembler)
– Depth of coverage + background error correction + principal component analysis (XHMM)

Tumor/normal
B allele frequency
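To make the idea of a position-based caller concrete, here is a toy sketch that looks at the bases piled up at a single reference position and reports the most frequent non-reference base if it passes simple thresholds. It is not how GATK or LoFreq work internally; real callers use statistical models rather than fixed cutoffs, and the threshold values below are arbitrary assumptions.

```python
from collections import Counter


def call_position(ref_base, pileup_bases, min_depth=10, min_alt_fraction=0.2):
    """Toy single-position caller.

    ref_base: reference base at this position.
    pileup_bases: bases observed in the reads covering this position.
    Returns a dict describing the call, or None if nothing passes the filters.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None  # too little evidence to make a call
    counts = Counter(pileup_bases)
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_n = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_n / depth
    if alt_fraction < min_alt_fraction:
        return None  # treated as likely noise under this threshold
    return {
        "ref": ref_base,
        "alt": alt_base,
        "depth": depth,
        "alt_fraction": alt_fraction,
    }


# 12 reference reads and 8 alternate reads at one position.
call = call_position("G", list("G" * 12 + "A" * 8))
```

A fixed allele-fraction cutoff like this one is a crude stand-in: a germline heterozygous variant is expected near 50%, while somatic variants can sit at much lower fractions, which is one reason the two pipeline types use different callers and thresholds.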

SLIDE 19

Example KRAS G12D Variant Call
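KRAS G12D is a glycine-to-aspartate change at codon 12, commonly written c.35G>A (codon GGT → GAT). As a minimal sketch, the function below translates a reference and an alternate codon with the standard genetic code and labels the substitution; the function name and the three labels are illustrative choices, and cases such as stop-loss or frameshift variants are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a single-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # the new codon is a stop codon
    return "missense"


kras_g12d = classify_substitution("GGT", "GAT")  # Gly (G) -> Asp (D)
```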

SLIDE 20

The annotated variant includes:

Gene
Gene Transcript
Nucleotide change (cdot)
Protein change (pdot)
Variant Type

– Polymorphism
– Synonymous
– Non-synonymous
    – Nonsense
    – Missense
– Frame shift

The VCF variant includes:

chromosome
position
ID
reference base
alternate base
variant quality
meta-information

– information and individual format fields
– filter flags

Step 5: Variant Annotations

VCF variant Annotated variant
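The VCF fields listed on this slide correspond to the first eight tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). A minimal parsing sketch follows; the example record and its INFO keys are invented for illustration, and per-sample FORMAT columns are ignored.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Keys without '=' are flags and are stored as True.
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if variant_id == "." else variant_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles may be listed
        "qual": None if qual == "." else float(qual),
        "filter": filt.split(";"),
        "info": info_dict,
    }


# Hypothetical record, not taken from the slides.
example = "chr1\t12345\t.\tG\tA\t50.0\tPASS\tDP=100;AF=0.5;DB"
variant = parse_vcf_line(example)
```

Annotation then adds what the VCF record lacks: the gene, transcript, c. and p. nomenclature, and variant type.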

SLIDE 21

Sample Report QC metrics

Step 6: QC Calculations

Sequencing

Run Level
– Cluster density
– Base call quality score
– Fragment size

Sample Level
– Depth of coverage
– Uniformity
– Mapping quality
– Duplication rate

Variant Level
– Novel variants
– Known variants
– Transition-to-transversion ratio
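As one worked example of a variant-level metric, the transition-to-transversion (Ti/Tv) ratio counts purine-to-purine or pyrimidine-to-pyrimidine changes (A↔G, C↔T) against all other single-base changes. A minimal sketch, assuming the input is a list of (ref, alt) pairs for SNVs only:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True if ref -> alt stays within purines or within pyrimidines."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Compute Ti/Tv from (ref, alt) pairs; returns None if there are no transversions."""
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in VALID_BASES or alt not in VALID_BASES or ref == alt:
            continue  # skip anything that is not a simple SNV
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions


snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
ratio = ti_tv_ratio(snvs)  # 3 transitions, 2 transversions
```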

SLIDE 22

[Figure: read depth across a targeted gene; labels: On-Target, Off-Target, Exon, Intronic regions]

Sample-Level QC Metrics for Targeted Capture

Minimum Depth of Coverage
Uniformity
Mapping Quality
Duplication Rate
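Several of these sample-level metrics reduce to simple arithmetic over per-base depths and read counts. The sketch below is illustrative only: the 20x minimum-depth threshold and the definition of uniformity as the fraction of bases covered at 0.2x the mean depth or more are assumptions, and laboratories define and validate their own cutoffs.

```python
def coverage_metrics(depths, min_depth=20, uniformity_fraction=0.2):
    """Summarize per-base depths over the targeted regions.

    depths: list of read depths, one per targeted base.
    min_depth and uniformity_fraction are assumed example thresholds.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    mean_depth = sum(depths) / n
    uniformity_cutoff = uniformity_fraction * mean_depth
    return {
        "mean_depth": mean_depth,
        "minimum_depth": min(depths),
        "fraction_at_min_depth": sum(d >= min_depth for d in depths) / n,
        "uniformity": sum(d >= uniformity_cutoff for d in depths) / n,
    }


def duplication_rate(duplicate_reads, total_reads):
    """Fraction of reads flagged as duplicates."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return duplicate_reads / total_reads


metrics = coverage_metrics([100, 120, 80, 10, 90])
dup_rate = duplication_rate(150, 1000)
```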

SLIDE 23

How does a bioinformatics job get executed in clinical production?

Job 1 Job 3 Job 2 Job 4 Job 5

Compute Infrastructure for Data Processing

SLIDE 24

Data Storage Infrastructure

BCL (500–550 GB): raw output for a single run
FASTQs (12–15 GB): exome, ~150x

In-house
Cloud based

Database
Object storage (“hot”)
Archive (“cold”)

99.9% availability
99.999999999% durability

FASTQs, BAMs, VCFs

Bioinformatics
HiSeq 4000
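The availability and durability figures are easier to compare once converted into expected downtime and expected object loss. The sketch below assumes both figures are annual and that durability applies independently to each stored object; the slides do not state these definitions, and the object count in the example is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def expected_downtime_hours(availability):
    """Expected hours per year a service is unreachable at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


def expected_objects_lost_per_year(n_objects, durability):
    """Expected number of objects lost per year at a given per-object durability."""
    return n_objects * (1 - durability)


downtime = expected_downtime_hours(0.999)                          # roughly 8.76 hours
lost = expected_objects_lost_per_year(10_000_000, 0.99999999999)   # roughly 0.0001 objects
```

Under these assumptions, availability describes how often data can be reached, while durability describes how unlikely it is to be lost at all.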

SLIDE 25

Recommendations from CAP/AMP

– 17 recommendation statements
– 59 variants tested in each variant class

Example Statistics

– Positive percent agreement (PPA)
– Positive predictive value (PPV)
– Reproducibility
– Allelic fraction lower limit of detection

Validation required prior to use in clinical production

Bioinformatics Pipeline Validation
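PPA and PPV can be computed directly from the counts of true positives, false negatives, and false positives observed against a reference set. The sketch also shows one plausible reading of the figure of 59, offered as an assumption rather than something stated on the slide: if all n tested variants are detected, n = 59 is the smallest sample size that gives 95% confidence that the true detection rate is at least 95%.

```python
import math


def positive_percent_agreement(true_positives, false_negatives):
    """PPA: fraction of reference variants that the pipeline detected."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives, false_positives):
    """PPV: fraction of pipeline calls that are real variants."""
    return true_positives / (true_positives + false_positives)


def min_samples_all_detected(reliability=0.95, confidence=0.95):
    """Smallest n such that reliability**n <= 1 - confidence.

    If all n variants are detected, the true detection rate is at least
    `reliability` with the given confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


ppa = positive_percent_agreement(true_positives=98, false_negatives=2)
ppv = positive_predictive_value(true_positives=98, false_positives=1)
n_required = min_samples_all_detected()
```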

SLIDE 26

Summary

Catalogue various types of bioinformatics analyses that support clinical genomic testing
Enumerate types of variant classes
Describe algorithmic methods for variant detection by NGS
Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
Explain validation strategies for bringing best-in-class pipelines into clinical production

SLIDE 27

Questions?

Elaine Gee, PhD Director of Bioinformatics ARUP Laboratories elaine.gee@aruplab.com