VIRANA: A Standardized Analysis of Viral Next Generation Sequencing



SLIDE 1

VIRANA: A Standardized Analysis of Viral Next Generation Sequencing Data

Bastian Beggel Max-Planck-Institute for Informatics Saarbrücken

SLIDE 2

Improvements in the rate of DNA sequencing

Source: Stratton et al., Nature 2009

SLIDE 3

Reduction in the cost of DNA sequencing

SLIDE 4

Result set of viral NGS data analysis

NGS datasets

  • Position-wise: Pileup
  • Haplotype level: 60% ATATC…GATCG, 20% ATATC…TATCG, 10% ATATC…TATCG
  • Read level: HIV tropism, hypermutation, dual infections
SLIDE 5

Standardized processing of next-generation sequencing data

Pre-processing

Quality Control

  • Withdraw/clip bad quality reads

Map to reference

  • Select reference
  • Solve alignment problem

Standardized Analysis

Coverage

  • Number of mapped reads per position

Pileups/Dynamics

  • Summarize data position-wise
  • Analysis of changes

Haplotypes

  • Raw output
  • Visualization
  • Complexity

Custom Analysis

HIV Tropism

  • g2p[454]

Hypermutation

  • Classify reads as hypermutated

Statistics

  • Correlate NGS data with clinical parameters
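
The coverage step (number of mapped reads per position) can be illustrated with a minimal Python sketch. It is not VIRANA's implementation: the function name `coverage`, the 0-based half-open `(start, end)` read intervals, and the restriction to ungapped alignments are all assumptions made for illustration.

```python
def coverage(read_intervals, ref_length):
    """Count how many mapped reads overlap each reference position.

    Assumptions (not from the slides): each read is a 0-based, half-open
    (start, end) interval on the reference, 0 <= start < end <= ref_length,
    and alignments contain no gaps.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in read_intervals:
        diff[start] += 1
        diff[end] -= 1

    # The running sum of the difference array is the per-position coverage.
    cov = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        cov.append(running)
    return cov


# Hypothetical reads on a reference of length 5.
example_coverage = coverage([(0, 3), (1, 4)], 5)
```

The difference-array approach touches each read once, so it stays linear in the number of reads plus the reference length even at coverage depths in the thousands.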

SLIDE 6

VIRANA as a Web-Service

  • Upload sequence data
  • Download summary statistics and plots

SLIDE 7

Pileups summarize NGS data position-wise

ID    Pos  Ref. NT  G      A      T      C     Cov.
A_BL  1    T        0.1%   32.9%  66.8%  0.1%  8300
A_BL  2    G        99.6%  0.2%   0.0%   0.0%  8305
A_BL  3    T        0.1%   0.0%   99.4%  0.4%  8331
A_BL  4    T        0.0%   0.0%   99.7%  0.1%  8334
A_BL  5    G        98.4%  0.6%   0.1%   0.0%  8338

Error model: LoFreq (Wilm et al., 2012)

  • Modeling biases in sequencing error rates
  • Uses base-specific quality scores (phred scores)
  • Poisson–binomial distribution
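
To make the Poisson–binomial idea concrete, here is a minimal Python sketch of the core calculation: given the phred score of every base covering a position, how likely are at least k mismatches from sequencing error alone? This is an illustration only, not LoFreq's code; the function names are hypothetical, and the real tool adds optimizations and multiple-testing correction that are omitted here.

```python
def phred_to_error_prob(q):
    """Convert a phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs, k):
    """P(X >= k), where X is a sum of independent Bernoulli(p_i) variables.

    Each p_i is the error probability of one base at the position, so X is
    the number of sequencing errors expected under the null hypothesis.
    """
    # dist[j] = probability of exactly j errors among the bases seen so far.
    dist = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)   # this base is correct
            nxt[j + 1] += mass * p     # this base is an error
        dist = nxt
    return sum(dist[k:])


# Hypothetical position: 200 bases at Q30, 4 of which disagree with the reference.
probs = [phred_to_error_prob(30)] * 200
p_value = poisson_binomial_tail(probs, 4)
```

A small `p_value` means the observed mismatches are unlikely to be sequencing errors alone, which is the evidence a variant caller of this kind looks for.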

SLIDE 8

Dynamic pileups to analyze changes over time

ID1  ID2  Pos  Ref. NT  dG     dA    dT     dC     mCov.
A    B    1    T        0.0%   0.0%  0.1%   -0.1%  8300
A    B    2    G        -0.1%  0.0%  -0.1%  0.2%   8305
A    B    3    T        0.0%   0.0%  -4.5%  4.5%   8331
A    B    4    T        0.0%   0.0%  0.0%   0.0%   8334
A    B    5    G        -0.1%  0.0%  -0.1%  0.2%   8338

Error model: deepSNV (Gerstung et al., 2012)

  • Compares two similar samples
  • Adapts error rates to genomic context
  • Hierarchical binomial model (overdispersed)
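
A dynamic pileup row can be thought of as the per-nucleotide difference between two pileup rows at the same position. The Python sketch below assumes the deltas are "second sample minus first sample" in percentage points, which the slides do not state explicitly; the function name and the follow-up frequencies are hypothetical and chosen only to illustrate a T-to-C shift.

```python
NUCLEOTIDES = "GATC"


def dynamic_pileup(freq_first, freq_second):
    """Per-nucleotide frequency change at one position, in percentage points.

    Both arguments map each nucleotide to its frequency in percent.
    Assumption: the change is reported as second minus first.
    """
    return {
        nt: round(freq_second[nt] - freq_first[nt], 1)
        for nt in NUCLEOTIDES
    }


# First sample: position 3 of sample A_BL from the pileup table.
baseline = {"G": 0.1, "A": 0.0, "T": 99.4, "C": 0.4}
# Hypothetical second sample, not taken from the slides.
follow_up = {"G": 0.1, "A": 0.0, "T": 94.9, "C": 4.9}

delta = dynamic_pileup(baseline, follow_up)
```

Because the four frequencies at a position sum to roughly 100% in both samples, the deltas in a row should sum to roughly zero, which is a quick sanity check on any dynamic pileup.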

SLIDE 9

Median and sample coverage

[Figure: two panels, "Median Coverage" and "Single Sample Coverage"; axes: NT position, Coverage]
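
The slides do not say how the median coverage profile is computed. One plausible reading, sketched here in Python as an assumption, is a position-wise median over the coverage vectors of all samples; the function name and the sample values are hypothetical.

```python
from statistics import median


def median_coverage(sample_coverages):
    """Position-wise median over equally long per-sample coverage vectors.

    Assumption: every sample is mapped to the same reference, so index i
    refers to the same nucleotide position in every vector.
    """
    # zip(*...) groups the values of all samples at the same position.
    return [median(column) for column in zip(*sample_coverages)]


# Three hypothetical samples covering four positions.
samples = [
    [8300, 8305, 8331, 8334],
    [7900, 7950, 8000, 8010],
    [9100, 9000, 8800, 8700],
]
profile = median_coverage(samples)
```

Plotting a single sample's vector against this profile would show where that sample is covered unusually well or poorly compared with the rest.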

SLIDE 10

Monitoring of resistance mutations

[Figure: panels "Prevalence of resistance mutations at baseline" and "Dynamics of resistance mutations" for mutation 181T; axis labels: NGS frequency, Patient, Sample time (BL, 12M)]

SLIDE 11

Monitoring genetic change

[Figure; axis label: AA]

SLIDE 12

Quasispecies estimation using ShoRAH

Source: Zagordi et al. 2011

SLIDE 13

Visualization of the viral quasispecies

[Figure; axes: Principal Component 1, Principal Component 2]

SLIDE 14

Conclusions

  • State-of-the-art processing of viral NGS data
      • HBV, HCV, HIV
  • Summary plots and statistics
      • Coverage
      • Pileups, Dynamics
      • Resistance mutations
      • Quasispecies
  • Fully automated
  • Web-based version in planning
  • Looking forward to more collaborations
SLIDE 15

Thank you for your attention

Acknowledgements

  • Thomas Lengauer
  • Alex Thielen
  • Rolf Kaiser
  • Martin Däumer
  • Sven-Eric Schelhorn
SLIDE 16

End