Analysis of gene copy number changes in tumor phylogenetics
Jun Zhou, Yu Lin, Vaibhav Rajan, Willia Hoskins, Bing Feng and Jijun Tang
• Cancer can be modeled by molecular evolutionary processes (such as deletion and insertion)
• A mutational phylogenetic tree can be built with clones and subclones as nodes and mutation processes as directed edges.
Image from Davis A. and Navin N., 2016
• Genetic marker: copy number of an array of genes, detected by fluorescence in situ hybridization (FISH) at the single-cell level
  • Caused by insertion/deletion of genes
  • Caused by chromosomal aberrations
• Data structure:
  • A clone is represented as a tuple of copy numbers
  • A patient is represented as a matrix of copy numbers -> main (and only) input to the phylogenetic problem
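As a rough illustration of this data structure, the sketch below stores a patient as a list of copy-number tuples, one row per cell. The gene names and all counts are hypothetical and not taken from the study.

```python
from collections import Counter

# Hypothetical probe panel and copy numbers, for illustration only.
genes = ("geneA", "geneB", "geneC", "geneD")

# One row per cell measured by FISH; each row is a tuple of copy numbers,
# one entry per gene. The whole list is the patient's copy-number matrix.
patient = [
    (2, 2, 2, 2),
    (2, 2, 2, 2),
    (2, 3, 2, 1),
    (4, 3, 2, 1),
    (2, 3, 2, 1),
]

# Cells with identical tuples are grouped into one clone.
clone_counts = Counter(patient)

# The distinct clones are the observed vertices of the tree problem.
clones = sorted(clone_counts)

for clone in clones:
    print(dict(zip(genes, clone)), "cells:", clone_counts[clone])
```

Tuples are used here because they are hashable, so identical cells collapse into a single clone without extra bookkeeping.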
• Distance-based minimum tree
  • NP-hard in general
• Minimum spanning tree (observed vertices only)
  • Input: a set of vertices W and a |W| × |W| distance matrix, or a metric on W.
  • Output: a 1-connected tree T = (W, F) with minimum weight (sum of the distances between vertices connected by an edge)
  • Prim's/Kruskal's greedy algorithms run in polynomial time
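The spanning-tree case can be sketched with Prim's algorithm on a distance matrix. This is a minimal O(n²) version for illustration; the function name `prim_mst` and the example matrix are assumptions, not part of the original slides.

```python
from math import inf


def prim_mst(dist):
    """Return (total_weight, edges) of a minimum spanning tree.

    dist is a symmetric n x n matrix (list of lists); vertices are 0..n-1.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [inf] * n        # cheapest known connection cost to the tree
    parent = [None] * n     # tree vertex that realizes that cost
    best[0] = 0             # grow the tree from vertex 0
    total, edges = 0, []

    for _ in range(n):
        # Greedy step: take the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
            total += best[u]
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return total, edges


# Hypothetical 3-vertex distance matrix.
example = [
    [0, 2, 3],
    [2, 0, 3],
    [3, 3, 0],
]
weight, tree_edges = prim_mst(example)
```

For this matrix the tree keeps edges (0, 1) and (0, 2) with total weight 5.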
• Steiner nodes: unobserved nodes (absent from the dataset)
• Input: a set of vertices W and a metric.
• Output: a 1-connected tree T = (W', F), where W' ⊇ W, with minimum weight (sum of the distances between vertices connected by an edge)
• NP-hard
• For the case |W| = 3, this reduces to the Median problem.
  • Sankoff's algorithm in linear time
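To make the three-vertex case concrete, the sketch below assumes the rectilinear (Manhattan) metric, under which a median of three tuples can be taken coordinate by coordinate. It is not Sankoff's algorithm itself, only an illustration of what a median node is; the function names and tuples are hypothetical.

```python
def rectilinear(a, b):
    """Sum of absolute differences between corresponding positions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def median_of_three(a, b, c):
    """Coordinate-wise median: the middle value at every position.

    Under the rectilinear metric this point minimizes the summed
    distance to a, b and c.
    """
    return tuple(sorted(column)[1] for column in zip(a, b, c))


# Hypothetical clones.
a, b, c = (2, 2, 4), (2, 3, 2), (4, 3, 3)
m = median_of_three(a, b, c)

# Weight of the star that joins a, b and c through the median.
star_weight = sum(rectilinear(x, m) for x in (a, b, c))

# Weight of the best tree on a, b, c alone: drop the longest of the 3 edges.
pairwise = [rectilinear(a, b), rectilinear(a, c), rectilinear(b, c)]
spanning_weight = sum(pairwise) - max(pairwise)
```

Here the median (2, 3, 3) is not one of the three inputs, so adding it as a Steiner node lowers the tree weight from 6 to 5.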
Image from Zhou J. et al., 2016
• Rectilinear (Manhattan) metric: the sum of absolute differences between corresponding positions of two tuples
• Input: a set of vertices W.
• Output: a 1-connected tree T = (W', F), where W' ⊇ W, with minimum weight (sum of the distances between vertices connected by an edge) under the rectilinear metric.
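The rectilinear metric is simple enough to state directly in code. The sketch below also builds the pairwise distance matrix over a set of clones; the helper names and the copy numbers are illustrative assumptions.

```python
def rectilinear(a, b):
    """Rectilinear (Manhattan) distance between two copy-number tuples."""
    if len(a) != len(b):
        raise ValueError("tuples must cover the same genes")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(clones):
    """Pairwise rectilinear distances, in the order the clones are given."""
    return [[rectilinear(p, q) for q in clones] for p in clones]


# Hypothetical clones over four genes.
clones = [(2, 2, 2, 2), (2, 3, 2, 1), (4, 3, 2, 1)]
matrix = distance_matrix(clones)
```

For these three clones the matrix is [[0, 2, 4], [2, 0, 2], [4, 2, 0]].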
• Hanan's theorem for the 2-D problem: there exists a rectilinear Steiner minimum tree (RSMT) whose Steiner points all lie on the Hanan grid
  • The solution space is bounded
  • Generalized to n dimensions
• An exact (and naïve) algorithm would enumerate all possible sets of Steiner nodes, compute the minimum spanning tree on each enlarged vertex set, and compare the weights
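The naïve exact search can be sketched as follows for very small inputs. The code assumes the n-dimensional Hanan grid is the Cartesian product of the values observed at each position, and it caps the number of added nodes at |W| − 2 (the usual bound on Steiner points of degree three or more). All function names are hypothetical, and the running time is exponential, so this is only a toy.

```python
from itertools import combinations, product


def rectilinear(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mst_weight(points):
    """Weight of a minimum spanning tree over distinct tuples (Prim)."""
    best = {p: rectilinear(points[0], p) for p in points[1:]}
    total = 0
    while best:
        nxt = min(best, key=best.get)      # cheapest vertex to attach
        total += best.pop(nxt)
        for p in best:                     # relax through the new vertex
            best[p] = min(best[p], rectilinear(nxt, p))
    return total


def hanan_candidates(points):
    """Hanan grid points that are not already observed."""
    observed = set(points)
    axes = [sorted(set(column)) for column in zip(*points)]
    return [p for p in product(*axes) if p not in observed]


def naive_rsmt(points, max_steiner=None):
    """Try every candidate subset; return (best_weight, steiner_nodes)."""
    candidates = hanan_candidates(points)
    if max_steiner is None:
        max_steiner = max(len(points) - 2, 0)
    best = (mst_weight(points), ())
    for k in range(1, max_steiner + 1):
        for extra in combinations(candidates, k):
            weight = mst_weight(list(points) + list(extra))
            if weight < best[0]:
                best = (weight, extra)
    return best


result = naive_rsmt([(0, 0), (2, 0), (1, 2)])
```

For the three points above, adding the grid point (1, 0) reduces the tree weight from 5 to 4.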
• Inspiration from the Median problem
  • Sankoff's algorithm in linear time
• Inspiration from the Maximum parsimony (MP) problem
  • Maximum parsimony tree: MP heuristics borrowed from the TNT package
Image from Zhou J. et al., 2016
• Minimize the number of Steiner nodes added by carefully selecting which nodes to add first.
• Steiner count for an observed node A: the number of triplets containing A that require a Steiner node to
• Inference score for a Steiner node: the sum of the Steiner counts of the nodes in the triplet defining it
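The two scores can be sketched as below. The definition of when a triplet "requires" a Steiner node is cut off in the slide, so this code makes an explicit assumption: a triplet requires one when its coordinate-wise median under the rectilinear metric is not already a member of the triplet. Function names and data are hypothetical.

```python
from itertools import combinations


def median_of_three(a, b, c):
    """Coordinate-wise median of three tuples."""
    return tuple(sorted(column)[1] for column in zip(a, b, c))


def steiner_counts(nodes):
    """For each observed node, count the triplets containing it whose
    median is a new node (ASSUMED meaning of "requires a Steiner node")."""
    counts = {node: 0 for node in nodes}
    for triplet in combinations(nodes, 3):
        if median_of_three(*triplet) not in triplet:
            for node in triplet:
                counts[node] += 1
    return counts


def inference_score(triplet, counts):
    """Sum of the Steiner counts of the three nodes defining a Steiner node."""
    return sum(counts[node] for node in triplet)


# Hypothetical observed nodes (distinct tuples).
nodes = [(0, 0), (2, 0), (1, 2), (1, 0)]
counts = steiner_counts(nodes)
score = inference_score(((0, 0), (2, 0), (1, 2)), counts)
```

In this toy set only the triplet (0, 0), (2, 0), (1, 2) has a median outside itself, so each of its members has Steiner count 1 and the triplet's inference score is 3.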
Image from Zhou J. et al., 2016
• MP heuristics (TNT package) to derive a tree whose leaves contain the dataset
• Dynamic programming to assign states to internal nodes
• Contract trivial edges (edges with weight 0 under the rectilinear metric)
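The state-assignment step can be illustrated with a small dynamic program. The slides do not say which recurrence is used; this sketch assumes a Sankoff-style DP with absolute-difference cost, applied to a single gene on a fixed rooted tree. The tree, names and states are hypothetical; for tuples, the same routine would run once per position.

```python
from math import inf


def assign_states(tree, root, leaf_state, states):
    """Return (min_cost, assignment) for one gene.

    tree maps each internal node to a tuple of its children;
    leaf_state maps each leaf to its observed copy number.
    """
    cost = {}  # cost[node][s]: best cost of node's subtree if node has state s

    def up(node):
        if node in leaf_state:
            cost[node] = {s: 0 if s == leaf_state[node] else inf for s in states}
            return
        for child in tree[node]:
            up(child)
        cost[node] = {
            s: sum(min(cost[c][t] + abs(s - t) for t in states) for c in tree[node])
            for s in states
        }

    assignment = {}

    def down(node, s):
        assignment[node] = s
        for child in tree.get(node, ()):
            # Pick the child state that realizes the parent's optimum.
            best = min(states, key=lambda t: cost[child][t] + abs(s - t))
            down(child, best)

    up(root)
    root_state = min(states, key=lambda s: cost[root][s])
    down(root, root_state)
    return cost[root][root_state], assignment


# Hypothetical tree ((A, B), C) with copy numbers 1, 3 and 2.
tree = {"root": ("x", "C"), "x": ("A", "B")}
leaves = {"A": 1, "B": 3, "C": 2}
total, states_by_node = assign_states(tree, "root", leaves, range(0, 4))

# Edges whose endpoints share a state have weight 0 and can be contracted.
trivial_edges = [(p, c) for p, kids in tree.items() for c in kids
                 if states_by_node[p] == states_by_node[c]]
```

In this example both internal nodes receive state 2, the total cost is 2, and the edges root-x and root-C have weight 0, so contracting them leaves C as the internal node joining A and B.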
Image from Zhou J. et al., 2016
Image from Zhou J. et al., 2016
Q&A
• Input: a set of vertices W.
• Output: a 1-connected tree T = (W', F), where W' ⊇ W, with minimum weight (sum of the distances between vertices connected by an edge) under a generalized metric that incorporates large-scale duplication events.
Image from Zhou J. et al., 2016