Deep Learning for Network Biology
Marinka Zitnik and Jure Leskovec
Stanford University
1 Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018
This Tutorial: snap.stanford.edu/deepnetbio-ismb, ISMB 2018, July 6
§ Map nodes to low-dimensional embeddings
– Applications: PPIs, disease pathways
§ Deep learning approaches for graphs
– Applications: Gene functions
§ Embedding heterogeneous networks
– Applications: Human tissues, drug side effects
So far we have embedded homogeneous networks. Can we embed heterogeneous networks (i.e., het nets)?
§ $V_t$ is the vertex set for node type $t$
§ $A_r$ is the adjacency matrix for edge type $r$
§ $X_t \in \mathbb{R}^{m \times |V_t|}$ is a matrix of features for nodes of type $t$
§ Biologically meaningful node features:
– E.g., immunological signatures, gene expression profiles, gene functional information
§ No features:
– Indicator vectors (one-hot encoding of a node)
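A minimal sketch of how such a heterogeneous graph might be held in memory, with one adjacency matrix per edge type and one-hot features when no biological features are available. The node types, edge-type names and toy edges below are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical toy data: node types and sizes are illustrative only.
node_counts = {"drug": 3, "protein": 4}

# One adjacency matrix per edge type, keyed by
# (source node type, edge type, target node type).
adjacency = {
    ("drug", "targets", "protein"): np.zeros((3, 4)),
    ("protein", "interacts", "protein"): np.zeros((4, 4)),
}
adjacency[("drug", "targets", "protein")][0, 1] = 1.0
adjacency[("protein", "interacts", "protein")][1, 2] = 1.0
adjacency[("protein", "interacts", "protein")][2, 1] = 1.0  # undirected edge


def one_hot_features(num_nodes):
    """Indicator features: column j is the one-hot vector of node j."""
    return np.eye(num_nodes)


# Each feature matrix has one column per node of that type; with one-hot
# encoding the number of rows equals the number of nodes.
features = {t: one_hot_features(n) for t, n in node_counts.items()}
```

Keying the adjacency matrices by a (source type, edge type, target type) triple is one possible design choice; it keeps each edge type separate, which matches how the later slides model each edge type on its own.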
Figure legend: edge types r1 (gastrointestinal bleed side effect), r2 (bradycardia side effect), r3 (nausea side effect), r4 (mumps side effect), protein-protein interaction, drug-protein interaction.
MAMBO: Multimodal biomedical networks
§ Tool for construction, representation and analysis of large multimodal networks:
– Nets with millions of nodes and billions of edges
– Nets with thousands of modes (i.e., entity types) and links (i.e., relationship types)
§ Network analytics through SNAP
Based on material from:
tissue networks. ISMB & Bioinformatics.
Figure: a multi-layer network and its embeddings. Input: layers (one graph per layer) organized in a hierarchy with Scale “3”, Scale “2” and Scale “1”. Output: embeddings, i.e., functions $f$ that map each node $u \to \mathbb{R}^d$, for:
§ Nodes in each graph
§ Nodes in each sub-hierarchy
§ For graphs $G_i$:
– Use node2vec’s biased walks (see Part T1)
§ For hierarchy $M$:
– Encode dependencies between graphs
– Recursive regularization: embeddings at level $i$ are encouraged to be similar to embeddings in $i$’s parent in the hierarchy
§ Optimize node embeddings as described in Part T1
§ Extra: Include terms for recursive regularization in the loss function
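A rough sketch of how a recursive regularization term could be added to an embedding loss. It assumes a squared Euclidean penalty between each element of the hierarchy and its parent, weighted by a hypothetical coefficient `lam`; the hierarchy, the embedding sizes and the function names are illustrative, not taken from the slides.

```python
import numpy as np

# Hypothetical two-level hierarchy: child -> parent (None marks the root).
parent = {"brainstem": "brain", "cerebellum": "brain", "brain": None}

rng = np.random.default_rng(0)
num_nodes, dim = 5, 4
# One embedding matrix (num_nodes x dim) per element of the hierarchy.
emb = {name: rng.normal(size=(num_nodes, dim)) for name in parent}


def recursive_regularizer(emb, parent):
    """Sum over non-root elements of 0.5 * ||emb[i] - emb[parent(i)]||^2."""
    total = 0.0
    for child, par in parent.items():
        if par is not None:
            diff = emb[child] - emb[par]
            total += 0.5 * float(np.sum(diff ** 2))
    return total


def total_loss(embedding_loss, emb, parent, lam=0.1):
    """Embedding loss plus a weighted recursive regularization term."""
    return embedding_loss + lam * recursive_regularizer(emb, parent)
```

The penalty is zero when every element shares its parent's embeddings and grows as they drift apart, which is the behavior the regularization is meant to encourage.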
Figure: 9 brain tissue PPI networks in a two-level tissue hierarchy. Labels: Brain, Brainstem, Cerebellum, Frontal lobe, Parietal lobe, Occipital lobe, Temporal lobe, Midbrain, Medulla, Pons, Substantia nigra.
Based on material from:
heterogeneous networks. KDD.
Image from: Himmelstein et al. 2015. Heterogeneous network edge prediction: A data integration approach to prioritize disease-associated genes. PLoS Comp Bio.
§ Specify a metapath of interest
§ Run random walks that capture structural correlations between different node types
§ Given the random walks, optimize node embeddings (similar to Part T1)
§ Given a metapath:
– E.g., OAPVPAO
§ What is the next step of a walker on node $a_4$ that transitioned from node CMU?
§ Standard random walk: The next step can be any of the surrounding nodes, regardless of type:
– $a_2$, $a_3$, $a_5$, $p_2$, $p_3$, and CMU
§ Metapath-based random walk: The next step can only be a paper node (P), given that the current node is an author node $a_4$ (A) and the previous step was an organization node CMU (O):
– Follow the semantics of this metapath
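The difference between the two walks can be sketched in a few lines. This is an illustrative version only: the toy graph, the node names and the `metapath_walk` helper are hypothetical, and candidates are sampled uniformly among neighbors of the required type.

```python
import random

# Hypothetical toy graph; node names and types are illustrative only.
node_type = {"CMU": "O", "a4": "A", "a2": "A", "p2": "P", "p3": "P", "KDD": "V"}
neighbors = {
    "CMU": ["a4", "a2"],
    "a4": ["CMU", "a2", "p2", "p3"],
    "a2": ["CMU", "a4", "p2"],
    "p2": ["a4", "a2", "KDD"],
    "p3": ["a4", "KDD"],
    "KDD": ["p2", "p3"],
}


def metapath_walk(start, metapath, length, rng):
    """Walk that only steps to neighbors whose type matches the metapath."""
    assert node_type[start] == metapath[0]
    # A metapath such as OAPVPAO starts and ends on the same type, so it
    # repeats with period len(metapath) - 1.
    period = len(metapath) - 1
    walk = [start]
    for step in range(1, length):
        wanted = metapath[step % period]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


walk = metapath_walk("CMU", "OAPVPAO", 7, random.Random(0))
```

A standard random walk would call `rng.choice(neighbors[walk[-1]])` with no type filter; the filter on `wanted` is the only change needed to follow the metapath.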
§ Run random walks starting from each node
§ For each node $u$, collect $N_t(u)$: the nodes of type $t$ that are visited by random walks starting at $u$
§ Optimize embeddings by predicting which nodes are in $N_t(u)$
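Collecting the type-specific neighborhoods $N_t(u)$ from a set of walks might look like the sketch below. The walks and node types are hypothetical toy data, and `typed_neighborhoods` is an illustrative helper rather than part of any library.

```python
from collections import Counter, defaultdict

# Hypothetical walks and node types, e.g. produced by metapath-based walks.
node_type = {"CMU": "O", "a4": "A", "p2": "P", "p3": "P", "KDD": "V"}
walks = [
    ["CMU", "a4", "p2", "KDD", "p3", "a4", "CMU"],
    ["a4", "p3", "KDD", "p2", "a4"],
]


def typed_neighborhoods(walks, node_type):
    """N[u][t] counts the type-t nodes visited by walks that start at u."""
    N = defaultdict(lambda: defaultdict(Counter))
    for walk in walks:
        u = walk[0]
        for v in walk[1:]:
            N[u][node_type[v]][v] += 1
    return N


N = typed_neighborhoods(walks, node_type)
```

Keeping counts rather than plain sets preserves how often each node was visited, which a skip-gram style objective would typically use as the number of positive training pairs.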
§ 2D projections of the learned embeddings for:
– 16 CS conferences and corresponding high-profile researchers in each field
§ Metapath2vec:
– Groups author-conference pairs closely
– Automatically organizes these two types of nodes
– Learns internal relationships between them, e.g., J. Dean → OSDI and C. D. Manning → ACL
§ Not possible using methods for homogeneous networks
Based on material from:
convolutional networks. ISMB & Bioinformatics.
Figure legend: edge types r1 (gastrointestinal bleed side effect), r2 (bradycardia side effect), r3 (nausea side effect), r4 (mumps side effect), protein-protein interaction, drug-protein interaction. Example: drug pair $c$, $d$ leads to side effect $r_2$.
Figure: an input graph with nodes A to F and the computation graph of a target node.
§ A node’s neighborhood defines a computation graph
§ Each edge type is modeled separately
Figure: an example batch of computation graphs, with neural network weight matrices.
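§ Approach: Average neighbor messages for each edge type and apply a neural network
§ Aggregate neighbors’ previous-layer embeddings:

$$h_v^0 = x_v$$
$$h_v^k = \sigma\Big(\sum_{r} W_r^k \sum_{u \in N_r(v)} \frac{h_u^{k-1}}{|N_r(v)|} + B^k h_v^{k-1}\Big), \quad k = 1, \dots, K$$
$$z_v = h_v^K$$

§ $h_v^0 = x_v$: initial 0-th layer embeddings are equal to node features
§ $z_v = h_v^K$: embedding after $K$ layers of neighborhood aggregation
§ $\sigma$: non-linearity (e.g., ReLU)
§ $h_v^{k-1}$: previous layer embedding of $v$
§ $N_r(v)$: neighbors of $v$ under edge type $r$; $W_r^k$ and $B^k$: neural network weight matrices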
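One layer of this kind of aggregation (average neighbor messages per edge type, transform, sum, then apply a non-linearity) can be sketched with NumPy. This is a simplified illustration under stated assumptions: a single node set, random untrained weights, and hypothetical names (`aggregate_layer`, `W_by_type`, `B`); it is not the tutorial's implementation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def aggregate_layer(h, adj_by_type, W_by_type, B):
    """One layer: per edge type, average neighbor embeddings, transform, sum.

    h: (n, d_in) previous-layer embeddings; adj_by_type[r]: (n, n) adjacency;
    W_by_type[r]: (d_in, d_out); B: (d_in, d_out) self-transform.
    """
    out = h @ B  # contribution of each node's own previous-layer embedding
    for r, A in adj_by_type.items():
        degree = A.sum(axis=1, keepdims=True)
        mean_neighbors = (A @ h) / np.maximum(degree, 1.0)  # avoid divide by 0
        out = out + mean_neighbors @ W_by_type[r]
    return relu(out)


# Toy graph with 4 nodes and two edge types (illustrative only).
n, d = 4, 3
adj = {"r1": np.zeros((n, n)), "r2": np.zeros((n, n))}
adj["r1"][0, 1] = adj["r1"][1, 0] = 1.0
adj["r1"][1, 2] = adj["r1"][2, 1] = 1.0
adj["r2"][2, 3] = adj["r2"][3, 2] = 1.0

rng = np.random.default_rng(0)
dims = [n, d, d]
h = np.eye(n)  # 0-th layer embeddings: one-hot node features
for k in range(2):  # K = 2 layers of neighborhood aggregation
    W = {r: rng.normal(size=(dims[k], dims[k + 1])) for r in adj}
    B = rng.normal(size=(dims[k], dims[k + 1]))
    h = aggregate_layer(h, adj, W, B)
z = h  # final embeddings
```

Nodes with no neighbors under an edge type simply receive a zero message for that type, so the self-transform term keeps their embedding informative.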
How do we train the model to generate embeddings?
Goal: Predict labeled edges between drug nodes
Figure: example drug nodes Ciprofloxacin (C), Simvastatin (S), Mupirocin (M) and Doxycycline (D), connected by edges labeled r1 and r2.
Query: Given a drug pair $c$, $s$, how likely does an edge $(c, r_2, s)$ exist?
An edge $(c, r_2, s)$ means that co-prescribed drugs $c$ and $s$ lead to side effect $r_2$.
1) Take the graph and learn a $d$-dimensional vector (embedding) for every node
2) Use the learned embeddings to predict side effects of drug pairs
This is a multi-relational link prediction task!
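To make the link prediction step concrete, here is a small sketch that scores a candidate edge between two drugs for each side-effect type. It assumes a bilinear scoring function with one matrix per edge type, a common choice for multi-relational link prediction and not necessarily the decoder used in the tutorial; the embeddings and matrices are random placeholders, and all names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def edge_probability(z_c, z_s, M_r):
    """Score a candidate edge (c, r, s) with a bilinear form z_c^T M_r z_s."""
    return float(sigmoid(z_c @ M_r @ z_s))


rng = np.random.default_rng(0)
d = 4
# Hypothetical learned embeddings for two drugs.
z = {"C": rng.normal(size=d), "S": rng.normal(size=d)}
# Assumed: one relation-specific matrix per side-effect type.
M = {"r1": rng.normal(size=(d, d)), "r2": rng.normal(size=(d, d))}

scores = {r: edge_probability(z["C"], z["S"], M[r]) for r in M}
```

Each entry of `scores` is a probability-like value for one side-effect type, so a single drug pair can be ranked against every edge type at once.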
July 10, 2018 at 12:20 pm http://snap.stanford.edu/decagon
§ Map nodes to low-dimensional embeddings
– Applications: PPIs, disease pathways
§ Deep learning approaches for graphs
– Applications: Gene functions
§ Embedding heterogeneous networks
– Applications: Human tissues, drug side effects
PhD Students Post-Doctoral Fellows Funding Collaborators Industry Partnerships
Claire Donnat Mitchell Gordon David Hallac Emma Pierson Himabindu Lakkaraju Rex Ying Tim Althoff Will Hamilton Baharan Mirzasoleiman Marinka Zitnik Michele Catasta Srijan Kumar Stephen Bach Rok Sosic
Research Staff
Adrijan Bradaschia; Dan Jurafsky, Linguistics, Stanford University; Cristian Danescu-Niculescu-Mizil, Information Science, Cornell University; Stephen Boyd, Electrical Engineering, Stanford University; David Gleich, Computer Science, Purdue University; VS Subrahmanian, Computer Science, University of Maryland; Sarah Kunz, Medicine, Harvard University; Russ Altman, Medicine, Stanford University; Jochen Profit, Medicine, Stanford University; Eric Horvitz, Microsoft Research; Jon Kleinberg, Computer Science, Cornell University; Sendhil Mullainathan, Economics, Harvard University; Scott Delp, Bioengineering, Stanford University; Jens Ludwig, Harris Public Policy, University of Chicago; Geet Sethi; Alex Porter
Many interesting high-impact projects in Machine Learning and Large Biomedical Data
Applications: Precision Medicine & Health, Drug Repurposing, Drug Side Effect modeling, Network Biology, and many more