Uncovering Protein Functions Through Multi-Layer Tissue Networks
Marinka Zitnik
marinka@cs.stanford.edu Joint work with Jure Leskovec
Why tissues? A unified view of cellular functions across human tissues is essential for understanding
[Greene et al. 2015, Yeger & Sharan 2015, GTEx and others]
Marinka Zitnik, Stanford, ISMB/ECCB 2017 2
Goal: Given a set of proteins and possible functions, predict each protein’s association with each function
WNT1 × (Midbrain development, Substantia nigra) → 0.9
RPT6 × (Angiogenesis, Blood) → 0.05
Figure: WNT1 and midbrain development in the PPI network of substantia nigra tissue; RPT6 and angiogenesis in the PPI network of blood tissue.
§ Guilt by association: a protein’s function is determined based on who it interacts with [Zuberi et al. 2013, Radivojac et al. 2013, Kramer et al. 2014, Yu et al. 2015, and many others]
§ No tissue-specificity
§ Protein functions are assumed constant across organs and tissues:
§ Functions in heart are the same as in skin
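The guilt-by-association idea above can be illustrated with a minimal neighbor-voting sketch. It is not any of the cited methods: the function name `neighbor_vote`, the toy edges, and the toy annotations are all invented for illustration (only the protein names are borrowed from the slides). Note that it uses a single network, so it is tissue non-specific in exactly the sense criticized above.

```python
from collections import defaultdict


def neighbor_vote(edges, annotations, protein):
    """Score each function for `protein` as the fraction of its
    interaction partners that are annotated with that function."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    partners = neighbors[protein]
    if not partners:
        return {}

    counts = defaultdict(int)
    for partner in partners:
        for function in annotations.get(partner, ()):
            counts[function] += 1
    return {function: c / len(partners) for function, c in counts.items()}


# Hypothetical toy data: edges and annotations are made up for this sketch.
edges = [("WNT1", "INA"), ("WNT1", "DLG5"), ("WNT1", "GPR4"), ("GPR4", "ETS1")]
annotations = {
    "INA": {"midbrain development"},
    "DLG5": {"midbrain development"},
    "GPR4": {"angiogenesis"},
}
scores = neighbor_vote(edges, annotations, "WNT1")
```

In this toy network two of the three partners of WNT1 carry "midbrain development", so that function receives the higher score.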
§ Proteins in biologically similar tissues have similar functions [Greene et al. 2015, ENCODE 2016]
§ Proteins are missing in some tissues
Figure: PPI network over the proteins WNT1, INA, DLG5, GPR4, ETS1, NDNF, RHOA, and HPSE, with function labels such as angiogenesis and midbrain development.
Machine learning
Multi-label node classification: midbrain development, angiogenesis, etc.
Figure: machine learning pipeline: raw networks → node and edge profiles → learning algorithm → prediction model. Downstream task: protein function prediction.
Replace feature engineering: automatically learn the features.
Figure: vectors (node embeddings). A mapping f sends each node u to a vector in ℝ^d; one mapping is learned for each layer and one for each scale “3”, “2”, “1” of the hierarchy.
§ Layers {G_i}, i = 1..T, are in leaves of ℳ
§ f_i: V_i → ℝ^d
§ Learn node embeddings at each possible scale
§ Layers i, j, k, l
§ Scales “3”, “2”, “1”
§ Intuition: For each layer, embed nodes to d dimensions by preserving their similarity
§ Two nodes are similar if their neighborhoods are similar
§ For node u in layer i we define nearby nodes as nodes in G_i visited by random walks starting at u
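The neighborhood definition above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses uniform random walks (the actual method may bias its walks), and the helper names `build_adjacency`, `random_walk`, and `nearby_nodes`, the parameters `num_walks` and `walk_length`, and the toy layer are all assumptions made for this sketch.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency lists for one layer (one tissue-specific network)."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def nearby_nodes(adj, u, num_walks=10, walk_length=5, seed=0):
    """Nodes of the layer visited by random walks that start at u."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(num_walks):
        visited.update(random_walk(adj, u, walk_length, rng))
    visited.discard(u)  # u is not counted as its own neighbor in this sketch
    return visited


# Hypothetical toy layer: a path A-B-C-D plus a disconnected pair E-F.
layer_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
adj = build_adjacency(layer_edges)
near = nearby_nodes(adj, "A")
```

Because walks never leave the layer they start in, the neighborhood of a node is always specific to that layer; nodes in a disconnected part of the layer (E and F here) are never reached from A.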
“2” is a parent of G_i and G_j
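The slide states only the parent-child relation; how it enters the objective is not shown here. One plausible reading, assumed purely for illustration, is that a node's embedding in a child layer is encouraged to stay close to its embedding at the parent scale. The sketch below uses a squared-distance penalty and a mean-of-children parent vector; both choices, and the names `parent_embedding` and `hierarchy_penalty`, are assumptions rather than the talk's stated formulation.

```python
def parent_embedding(child_vectors):
    """Assumed parent-scale vector: the coordinate-wise mean of the
    node's vectors in the child layers."""
    dim = len(child_vectors[0])
    n = len(child_vectors)
    return [sum(v[k] for v in child_vectors) / n for k in range(dim)]


def hierarchy_penalty(child_vec, parent_vec):
    """Assumed penalty: half the squared Euclidean distance between a
    node's child-layer vector and its parent-scale vector."""
    return 0.5 * sum((c - p) ** 2 for c, p in zip(child_vec, parent_vec))


# Hypothetical 2-D embeddings of the same node in the two child layers.
vec_in_first_child = [1.0, 0.0]
vec_in_second_child = [0.0, 1.0]

parent_vec = parent_embedding([vec_in_first_child, vec_in_second_child])
penalty = hierarchy_penalty(vec_in_first_child, parent_vec)
```

Under this reading, layers that share a parent are pulled toward a common representation, which matches the earlier premise that biologically similar tissues have similar protein functions.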
Figure: tissue hierarchy. Labels shown: FemaleReproductiveSystem, Choroid, Eye, NervousSystem, Placenta, Integument, Retina, Hindbrain, PancreaticIslet, Basophil, SpinalCord, Spermatid, EndocrineGland, ReproductiveSystem, ParietalLobe, Hepatocyte, CorpusCallosum, Pons, TemporalLobe, Pancreas, Oviduct, BloodPlasma, Lens, Glia
§ Layers are PPI nets (one layer per tissue):
  § Nodes: proteins
  § Edges: tissue-specific PPIs
§ Node labels:
  § “Cortex development” in renal cortex tissue
  § “Artery morphogenesis” in artery tissue
§ Learn OhmNet embeddings for multi-layer tissue network
§ Train a classifier for each function based on a fraction of proteins and all their functions
§ Predict functions for new proteins
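The train-then-predict steps above can be sketched with one binary classifier per function. The slides do not say which classifier is used; plain logistic regression trained by stochastic gradient descent is an assumption here, as are the helper names, the hyperparameters, and the made-up 2-D embeddings standing in for learned ones.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit a binary logistic regression with per-example gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
            grad = p - target
            w = [wk - lr * grad * xk for wk, xk in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


def train_per_function(embeddings, labels, functions):
    """One independent classifier per function, trained on labeled proteins."""
    proteins = sorted(labels)
    X = [embeddings[p] for p in proteins]
    return {
        fn: train_logistic(X, [1.0 if fn in labels[p] else 0.0 for p in proteins])
        for fn in functions
    }


# Hypothetical embeddings and labels; P5 is the "new" unlabeled protein.
embeddings = {"P1": [1.0, 0.1], "P2": [0.9, 0.2], "P3": [0.1, 1.0],
              "P4": [0.2, 0.9], "P5": [0.95, 0.05]}
labels = {"P1": {"midbrain development"}, "P2": {"midbrain development"},
          "P3": {"angiogenesis"}, "P4": {"angiogenesis"}}
functions = ["midbrain development", "angiogenesis"]

models = train_per_function(embeddings, labels, functions)
scores = {fn: predict_proba(models[fn], embeddings["P5"]) for fn in functions}
```

Training the classifiers independently is what makes the task multi-label: a protein can score highly for several functions at once.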
Figure: bar chart (value shown: 0.756) comparing OhmNet with protein function prediction methods, mono-layer network embeddings, and tensor decompositions.
§ >10% improvement over function prediction methods
§ >18% improvement over non-hierarchical versions of the dataset
§ >15% improvement over matrix-based methods
Figure: 9 brain tissue PPI networks in a two-level hierarchy. Labels shown: Frontal lobe, Medulla, Pons, Substantia nigra, Midbrain, Parietal lobe, Occipital lobe, Temporal lobe, Cerebellum, Brainstem, Brain.
§ Transfer protein functions to an unannotated tissue
§ Task: Predict functions in target tissue without access to any annotation/label in that tissue
Target tissue    Tissue-specific (OhmNet)    Tissue non-specific    Improvement
Placenta         0.758                       0.684                  11%
Spleen           0.779                       0.712                  10%
Liver            0.741                       0.553                  34%
Forebrain        0.755                       0.632                  20%
Blood plasma     0.703                       0.540                  40%
Smooth muscle    0.729                       0.583                  25%
Average          0.746                       0.617                  21%
Reported are AUROC values (see paper for other metrics).
§ Unsupervised feature learning for multi-layer networks
§ Learned embeddings can be used for any downstream prediction task: node classification, node clustering, link prediction
§ OhmNet predicts protein functions across biological contexts
Poster A-294
Travel Award