Large-scale Cancer Genomics Data Analysis - David Haussler - PowerPoint PPT Presentation



slide-1
SLIDE 1

Large-scale Cancer Genomics Data Analysis

David Haussler Center for Biomolecular Science and Engineering, UC Santa Cruz

slide-2
SLIDE 2

Cancer Genomics Hub

Being built to store BAM & VCF for TCGA, TARGET and CGAP/CGCI projects

Designed for 25,000 cases with average of 200 gigabytes per case

5 petabytes (5 x 10^15 bytes) total, scalable to 20 petabytes
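As a quick check of the sizing above, 25,000 cases at an average of 200 gigabytes each comes to 5 petabytes. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the variable names are illustrative only.

```python
# Capacity check for the figures on this slide (decimal units assumed).
GB = 10**9
PB = 10**15

cases = 25_000
avg_bytes_per_case = 200 * GB

total_bytes = cases * avg_bytes_per_case
print(f"{total_bytes / PB:g} PB")

# Cases that would fit at the 20 PB ceiling, at the same average size per case.
max_cases_at_20pb = (20 * PB) // avg_bytes_per_case
print(max_cases_at_20pb)
```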

General Parallel File System, Dual RAID 6 subsystems, Redundant I/O paths, 16 application processors, 12 storage controllers

co-location opportunities

slide-3
SLIDE 3

CGHub Goals

  • Enable direct comparison and combined analysis of many large-scale cancer genomics datasets
  • Aggregate enough data to provide the statistical power to attack the full complexity of cancer mutations
  • Set standards for data storage and exchange; encourage data sharing
  • Maintain compatibility with EGA, dbGaP, ICGC, 1000 Genomes Project, ENCODE and other large-scale genomics efforts (e.g. VCF format, data access coordination)

slide-4
SLIDE 4

Given the same BAM files, different mutation calling pipelines do not completely agree

TCGA-13-0725_

Venn diagram of mutation calls from Broad, UCSC and WUSTL. Region counts: 575, 304, 126, 494, 442, 276, 1982

Center   Total calls   Called by 2 other centers   Called by at least 1 other
Broad    3,194         62%                         85%
UCSC     2,688         74%                         89%
WUSTL    3,125         63%                         82%

Still work to do to harden mutation-calling software
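The totals and percentages on this slide can be reproduced from the seven Venn-diagram counts. The transcript does not say which count belongs to which region, so the mapping below is an inference: it is the assignment that reproduces each center's total and both percentage columns. Treat it as a sketch, not as the presenters' own tabulation.

```python
# Region counts for the three-caller Venn diagram.
# ASSUMPTION: the mapping of counts to regions is inferred, not stated in the slide text.
ALL_THREE = frozenset({"Broad", "UCSC", "WUSTL"})
regions = {
    frozenset({"Broad"}): 494,
    frozenset({"UCSC"}): 304,
    frozenset({"WUSTL"}): 575,
    frozenset({"Broad", "UCSC"}): 276,
    frozenset({"Broad", "WUSTL"}): 442,
    frozenset({"UCSC", "WUSTL"}): 126,
    ALL_THREE: 1982,
}

def concordance(center):
    """Return (total calls, % called by both others, % called by at least one other)."""
    total = sum(n for callers, n in regions.items() if center in callers)
    by_both_others = regions[ALL_THREE]
    by_any_other = sum(n for callers, n in regions.items()
                       if center in callers and len(callers) > 1)
    return (total,
            round(100 * by_both_others / total),
            round(100 * by_any_other / total))

for center in ("Broad", "UCSC", "WUSTL"):
    print(center, concordance(center))
```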

slide-5
SLIDE 5

We are just beginning to look at accuracy and consistency in the detection of structural variation

Case study: UCSC and Broad analysis of whole-genome GBM data

slide-6
SLIDE 6

Samples Analyzed

slide-7
SLIDE 7

Gene fusions: BamBam 167, dRanger 188

136 potentially overlapping events
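The slide does not say how events from the two callers were judged to be "potentially overlapping". One plausible criterion, sketched here purely as an assumption, is that two events match when both breakpoints fall on the same chromosomes within a fixed window. The `Event` type, the `window` parameter and the coordinates are all hypothetical.

```python
from typing import NamedTuple

class Event(NamedTuple):
    """A rearrangement described by its two breakpoints (hypothetical structure)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def same_event(a: Event, b: Event, window: int) -> bool:
    """True if both breakpoints lie on matching chromosomes within `window` bp."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= window
            and abs(a.pos2 - b.pos2) <= window)

def count_overlapping(calls_a, calls_b, window=1000):
    """Count events in calls_a that have at least one match in calls_b."""
    return sum(any(same_event(a, b, window) for b in calls_b) for a in calls_a)

# Made-up coordinates, for illustration only.
bambam = [Event("chr7", 55_000_000, "chr12", 69_000_000),
          Event("chr9", 21_900_000, "chr11", 3_000_000)]
dranger = [Event("chr7", 55_000_400, "chr12", 69_000_250),
           Event("chr1", 1_000_000, "chr2", 2_000_000)]
print(count_overlapping(bambam, dranger))
```

This simple version does not handle events reported with their two breakpoints in the opposite order; a real comparison would need to normalize that first.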

slide-8
SLIDE 8

Whole Genome View

06-0152 06-0188

  • Circle plot shows amplifications, deletions, inter/intra-chromosomal rearrangement
  • These 2 samples have 23/25 top dRanger, 21/29 top BamBam events

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11

Figure: chr9 and chr11 in GBM and germ line, with the CDKN2A/CDKN2B locus marked on chr9

Non-reciprocal Translocation: chr9 and chr11 (p15.5-15.3)

Segmental Deletion: CDKN2A/B

Independent events lead to somatic homozygous loss of tumor suppressors CDKN2A/B

slide-12
SLIDE 12
slide-13
SLIDE 13

In 11/16 cases similar events lead to homozygous loss of CDKN2A/B

Cases    One Copy Deleted by   Other Copy Deleted by
5 GBMs   Focal Loss            Arm-Level loss of chr9p (via inter-chrom translocation)
3 GBMs   Focal Loss            Arm-Level loss of chr9p (mechanism unknown)
2 GBMs   Focal Loss            Complete loss of chr9
1 GBM    Focal Loss            Complex event
5 GBMs   No loss detected      No loss detected

Zack Sanborn

slide-14
SLIDE 14

Features of CDKN2A/B normal samples

slide-15
SLIDE 15
slide-16
SLIDE 16

GBM-0152 chr12

Chromothripsis in a glioblastoma

Inter-chromosomal links to chr7 MDM2

LEMD3-c12orf56 Fusion

slide-17
SLIDE 17

Amplified regions are connected

GBM-0152: chr12 (MDM2), chr7 (EGFR), chr2

slide-18
SLIDE 18

EGFR Amplification/Mutation

  • 11/17 samples have chr7 amplifications including EGFR
  • 4/11 also have EGFRvIII mutations
  • Exon 2-7 deletion at low copy
  • Probably happened after amplification events
  • Selection for low copy?
slide-19
SLIDE 19

Example: EGFRvIII mutation

slide-20
SLIDE 20

GBMs release exosomes. Could some GBM tumor DNA show up in the blood?

slide-21
SLIDE 21

Amplified events may provide enough reads to detect this

GBM: TCGA-06-0152

left-hand edge of EGFR amplicon, connected to chr12

slide-22
SLIDE 22

Similar pattern of mismatches

GBM: TCGA-06-0152

left-hand edge of EGFR amplicon, connected to chr12

Split Reads

slide-23
SLIDE 23

GBM: TCGA-06-0185

Figure: Overall Copy Number and Minority Copy Number tracks (regions labeled chr6p, chr9p, chr9q)

Copy Number States

  • Single Copy Loss of chr10
  • Normal (Diploid)
  • Homozygous Deletion of CDKN2A/B
  • Single Copy Amplification of chr7, chr19, & chr20

Zack Sanborn

slide-24
SLIDE 24

Simulated Progression Model to Infer Karyotype Mixture

Figure: inferred karyotypes across chromosomes 1-22, with the proportion of each: 1 (23%), 2a (15%), 2b (8%), 3 (54%). EGFR and CDKN2A/B are marked.

Progression diagram: karyotypes 1, 2a, 2b, 3 (Tumorigenesis: ?)

Zack Sanborn
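A karyotype mixture of this kind implies that the copy number measured in bulk sequencing is a proportion-weighted average over the subpopulations. The sketch below uses the four proportions shown on the slide (pairing each percentage with the label that follows it in the transcript, which is an assumption) together with per-karyotype copy numbers that are invented for illustration. It is not the simulation model used in the talk.

```python
# Mixture proportions as read from the slide (pairing with labels is assumed).
proportions = {"1": 0.23, "2a": 0.15, "2b": 0.08, "3": 0.54}

def expected_copy_number(copies_by_karyotype, proportions):
    """Proportion-weighted mean copy number of one region across karyotypes."""
    return sum(proportions[k] * copies_by_karyotype[k] for k in proportions)

# HYPOTHETICAL copy numbers of one amplified region in each karyotype.
region_copies = {"1": 2, "2a": 3, "2b": 2, "3": 3}

print(round(expected_copy_number(region_copies, proportions), 2))
```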

slide-25
SLIDE 25

UCSC Cancer Integration Group

Steve Benz, Charlie Vaske, James Durbin, Zack Sanborn, Jing Zhu, Chris Szeto, Mark Diekhans*

Josh Stuart (Co-PI), Sam Ng, Mia Grifford, Amie Radenbaugh, Ted Goldstein, Melissa Cline, Dan Carlin, Kyle Ellrott, Brian Craft, Chris Wilks, Sofie Salama*

Artem Sokolov*

slide-26
SLIDE 26

Allele-Specific Copy Number

Figure: Majority Allele Read Counts and Minority Allele Read Counts at heterozygous sites, plotted against genomic position for the Matched Normal and the Tumor; a deletion is marked in the Tumor panel.

Zack Sanborn
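The idea behind allele-specific copy number can be sketched with simple ratios: total tumor depth relative to the matched normal gives the overall copy number, and the minority allele's share of tumor reads gives the minority copy number. The function below is a minimal illustration under strong assumptions (tumor and normal depths already normalized to the same library size, a diploid normal, no normal contamination); it is not the method used in the talk, and all names are hypothetical.

```python
def allele_specific_cn(tumor_major, tumor_minor, normal_depth, normal_ploidy=2):
    """Estimate (overall, minority) copy number at a heterozygous site.

    tumor_major / tumor_minor: tumor read counts for the majority and minority allele.
    normal_depth: total read count at the same site in the matched normal.
    Assumes depths are normalized and the tumor sample is pure.
    """
    tumor_depth = tumor_major + tumor_minor
    overall = normal_ploidy * tumor_depth / normal_depth
    minority = overall * tumor_minor / tumor_depth
    return overall, minority

# Illustrative read counts only.
print(allele_specific_cn(30, 15, 30))  # single copy gain of one allele
print(allele_specific_cn(15, 0, 30))   # deletion of one copy
print(allele_specific_cn(30, 0, 30))   # copy-neutral loss of heterozygosity
```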

slide-27
SLIDE 27

Tumors exhibit multiple rounds of duplication, rearrangement and loss

Colon 5EKFO (Meyerson)

Figure: Overall Copy Number and Minority Copy Number tracks (copy number axis 1-3), with regions labeled CN-LOH, Single Copy Amplification and Normal (Diploid), and the estimated normal contamination marked.

Zack Sanborn
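Normal contamination shifts every observed copy number toward the diploid value. A common way to model this, used here as an assumption since the slides do not state their estimator, is a two-component mixture: observed = a * 2 + (1 - a) * tumor copy number, where a is the fraction of normal cells. Given a region whose true tumor copy number is known (for example a homozygous deletion, copy number 0), the equation can be solved for a.

```python
def observed_cn(tumor_cn, contamination, normal_cn=2.0):
    """Copy number seen in a sample that is a mix of tumor and normal cells."""
    return contamination * normal_cn + (1 - contamination) * tumor_cn

def estimate_contamination(observed, tumor_cn, normal_cn=2.0):
    """Solve the mixture equation for the normal fraction.

    Undefined where tumor and normal copy number are equal, because such
    regions carry no information about the mixture.
    """
    if tumor_cn == normal_cn:
        raise ValueError("region is uninformative: tumor_cn equals normal_cn")
    return (observed - tumor_cn) / (normal_cn - tumor_cn)

# Illustrative values: a homozygous deletion observed at 0.4 copies.
print(round(estimate_contamination(0.4, tumor_cn=0), 3))
```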

slide-28
SLIDE 28

Copy Number Profile Analysis

Ovarian TCGA-13-1411

Figure: Overall Copy Number and Minority Copy Number tracks (# Total Copies 1-6, # Minority Copies 1-3), with KRAS and the estimated normal contamination marked.

Zack Sanborn

slide-29
SLIDE 29

Many rearrangements in amplified regions

MDM2-CDK4, 06-0152