SLIDE 1

CSE 6240: Web Search and Text Mining. Spring 2020

Node Representation Learning

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Administrivia

  • Project midterm rubric released

– Discussion at the end

  • Proposal regrades done
SLIDE 3

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric

These slides are inspired by Prof. Jure Leskovec’s CS224W lecture

SLIDE 4

Machine Learning in Networks

[Figure: a network with unknown node labels fed into a machine learning model]

  • Networks are complex
  • Need a uniform language to process various networks

SLIDE 5

Example: Node Classification

  • Classifying the function of proteins in the interactome

Image from: Ganapathiraju et al. 2016. Schizophrenia interactome with 504 novel protein–protein interactions. Nature.

SLIDE 6

Example: Link Prediction

[Figure: a network with candidate links marked “?” fed into a machine learning model]

  • Which links exist in the network?
SLIDE 7

Machine Learning Lifecycle

Raw Data → Structured Data → Learning Algorithm → Model → Downstream task

  • The typical machine learning lifecycle requires feature engineering (the Raw Data → Structured Data step) every single time!
  • Goal: avoid task-specific feature engineering and automatically learn the features

SLIDE 8

Feature Learning in Graphs

  • Goal: Efficient task-independent feature learning for machine learning with graphs!

    𝑔: 𝑣 → ℝ^𝑑

  • The vector 𝑔(𝑢) ∈ ℝ^𝑑 is the feature representation, or embedding, of node 𝑢

SLIDE 9

Why Network Embedding?

  • Task: We map each node in a network into a low-dimensional space

    – Distributed representations for nodes
    – Similarity of embeddings between nodes indicates their network similarity
    – Encode network information and generate node representation

SLIDE 10

Example Node Embedding

  • 2D embeddings of the nodes of Zachary’s Karate Club network:

[Figure: Zachary’s Karate Club network and its 2D node embeddings]

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

SLIDE 11

Why Is It Hard?

  • The modern deep learning toolbox is designed for simple sequences or grids:

    – CNNs for fixed-size images/grids
    – RNNs or word2vec for text/sequences

SLIDE 12

Why Is It Hard?

  • But networks are far more complex!

    – Complex topological structure (i.e., no spatial locality like grids)
    – No fixed node ordering or reference point (i.e., the isomorphism problem)
    – Often dynamic and have multimodal features

SLIDE 13

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 14

Framework Setup

  • Assume we have a graph G:

    – V is the vertex set
    – A is the adjacency matrix (assume binary)
    – No node features or extra information is used!

SLIDE 15

Embedding Nodes

  • Goal: Encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network

SLIDE 16

Embedding Nodes

    similarity(u, v) ≈ z_v^T z_u

  • Left-hand side: similarity of u and v in the original network (this is what we need to define!)
  • Right-hand side: similarity of the embeddings (dot product)

SLIDE 17

Learning Node Embeddings

  • 1. Define an encoder (i.e., a mapping from nodes to embeddings)
  • 2. Define a node similarity function (i.e., a measure of similarity in the original network)
  • 3. Optimize the parameters of the encoder so that:

    similarity(u, v) ≈ z_v^T z_u

    (similarity of u and v in the original network ≈ similarity of the embeddings)

SLIDE 18

Two Key Components

  • Encoder: maps each node to a low-dimensional vector

    enc(v) = z_v    (v: node in the input graph; z_v: its d-dimensional embedding)

  • Similarity function: specifies how the relationships in vector space map to the relationships in the original network

    similarity(u, v) ≈ z_v^T z_u    (similarity of u and v in the original network ≈ dot product between node embeddings)
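The simplest encoder of this kind is a “shallow” embedding lookup: every node gets its own learnable d-dimensional vector, and embedding similarity is the dot product. A minimal NumPy sketch (the sizes and variable names are illustrative assumptions, not from the slides):

    import numpy as np

    num_nodes, d = 100, 16                    # toy sizes (assumptions)
    Z = 0.01 * np.random.randn(num_nodes, d)  # one learnable d-dim vector per node

    def enc(v):
        """Shallow encoder: a plain embedding lookup, enc(v) = z_v."""
        return Z[v]

    def similarity(u, v):
        """Embedding-space similarity: the dot product z_v^T z_u."""
        return enc(u) @ enc(v)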

SLIDE 19

How to Define Node Similarity?

  • The key design choice among embedding methods is how they define node similarity
  • E.g., should two nodes have similar embeddings if they…

    – are connected?
    – share neighbors?
    – have similar “structural roles”?
    – …?

SLIDE 20

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 21

Random Walk

  • Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random, move to it, and so on.
  • The (random) sequence of points selected this way is a random walk on the graph.
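As a minimal sketch, assuming the graph is stored as an adjacency list (a dict called neighbors mapping each node to a list of its neighbors; the names are illustrative), a fixed-length unbiased random walk can be generated like this:

    import random

    def random_walk(neighbors, start, length):
        """Fixed-length unbiased random walk starting at `start`."""
        walk = [start]
        for _ in range(length - 1):
            current = walk[-1]
            if not neighbors[current]:       # dead end: stop the walk early
                break
            walk.append(random.choice(neighbors[current]))
        return walk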

SLIDE 22

Random-Walk Node Similarity

    z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network

SLIDE 23

Random-Walk Embeddings

  • Estimate the probability of visiting node 𝑤 on a random walk starting from node 𝑣, using some random-walk strategy 𝑆
  • Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space

    – Similarity here: dot product

SLIDE 24

Unsupervised Feature Learning

  • Given a node 𝒗, how do we define nearby nodes?

    – 𝑂_𝑆(𝑣) = neighborhood of 𝑣 obtained by some random-walk strategy 𝑆

  • Different neighborhood definitions give different algorithms

    – We will look at DeepWalk and node2vec

SLIDE 25

Random Walk Optimization

1. Run short fixed-length random walks starting from each node 𝑣 on the graph using some strategy 𝑆
2. For each node 𝒗, collect 𝑂_𝑆(𝒗), the multiset* of nodes visited on random walks starting from 𝑣

   * 𝑂_𝑆(𝑣) can have repeat elements, since nodes can be visited multiple times on random walks

3. Optimize embeddings

SLIDE 26

Random Walk Optimization

    L = Σ_{𝑣∈𝑉} Σ_{𝑤∈𝑂_𝑆(𝑣)} −log( exp(z_𝑣^T z_𝑤) / Σ_{𝑛∈𝑉} exp(z_𝑣^T z_𝑛) )

  • Outer sum: over all nodes 𝑣; inner sum: over nodes 𝑤 seen on random walks starting from 𝑣; the softmax fraction is the predicted probability of 𝑣 and 𝑤 co-occurring on a random walk
  • High score (embedding dot-product similarity) for nodes appearing together on random walks; low probability for all other nodes
  • The softmax denominator is expensive to calculate for all node pairs
  • Solution: use negative sampling (see the sketch below)
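Negative sampling replaces the softmax denominator over all nodes with a handful of randomly sampled “negative” nodes, as in word2vec. A minimal NumPy sketch of the resulting loss term (all function and variable names are illustrative assumptions):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def walk_loss(Z, v, walk_neighbors, num_neg=5, seed=0):
        """Negative-sampling estimate of the loss contribution of node v.
        Z: (num_nodes, d) embedding matrix; walk_neighbors: the multiset
        O_S(v) of nodes seen on random walks starting from v."""
        rng = np.random.default_rng(seed)
        loss = 0.0
        for w in walk_neighbors:
            loss -= np.log(sigmoid(Z[v] @ Z[w]))         # pull co-occurring pairs together
            for n in rng.integers(0, len(Z), size=num_neg):
                loss -= np.log(sigmoid(-(Z[v] @ Z[n])))  # push randomly sampled pairs apart
        return loss

Minimizing this loss (e.g., by stochastic gradient descent over Z) makes co-occurring pairs score high and randomly sampled pairs score low.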
SLIDE 27

DeepWalk [Perozzi et al., 2014]

  • What strategy should we use to run these random walks?
  • Simplest idea: just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk, Perozzi et al., 2014)

    – The issue is that this notion of similarity is too constrained
    – node2vec generalizes this
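Putting the pieces together, DeepWalk is essentially word2vec applied to random walks: treat each walk as a “sentence” of node ids and train a skip-gram model on the corpus of walks. A hedged sketch using gensim (assuming gensim >= 4.0 is installed, and reusing the neighbors adjacency list and random_walk helper sketched earlier; all sizes are illustrative):

    from gensim.models import Word2Vec

    # 10 walks of length 40 per node; node ids become string "tokens"
    walks = [[str(n) for n in random_walk(neighbors, v, length=40)]
             for v in neighbors for _ in range(10)]

    model = Word2Vec(walks, vector_size=64, window=5, min_count=0,
                     sg=1, negative=5, workers=4)  # skip-gram + negative sampling
    z = model.wv["0"]  # embedding of (hypothetical) node "0"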

SLIDE 28

DeepWalk Example

  • 2D embeddings of the nodes of Zachary’s Karate Club network:

[Figure: Zachary’s Karate Club network and the 2D node embeddings learned by DeepWalk]

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

SLIDE 29

node2vec [Grover et al., 2016]

  • Goal: Embed nodes with similar network neighborhoods close in the feature space

    – Frame this goal as a maximum-likelihood optimization problem, independent of the downstream prediction task

  • Key observation: Develop a biased 2nd-order random-walk strategy 𝑆 to generate the network neighborhood 𝑂_𝑆(𝑣) of node 𝑣

SLIDE 30

node2vec: Biased Walks

Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).

[Figure: walks starting from node u over nodes s1-s9; BFS stays near u while DFS moves outward]

SLIDE 31

node2vec: Biased Walks

  • Two strategies to define a neighborhood 𝑂_𝑆(𝒗) of a given node 𝒗: BFS and DFS (see the sketch below)
  • Walk of length 3 (𝑂_𝑆(𝑣) of size 3):

    𝑂_BFS(𝑣) = {s1, s2, s3}   (local, microscopic view)
    𝑂_DFS(𝑣) = {s4, s5, s6}   (global, macroscopic view)

[Figure: BFS visits s1, s2, s3 near u; DFS reaches s4, s5, s6 farther away]
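As a toy illustration of the two extremes (the function names and the k-node cutoff are assumptions, not from the slides), the first k nodes discovered by each traversal could be collected like this:

    from collections import deque

    def bfs_neighborhood(neighbors, v, k):
        """First k nodes discovered breadth-first from v (local view)."""
        queue, seen, order = deque([v]), {v}, []
        while queue and len(order) < k:
            for x in neighbors[queue.popleft()]:
                if x not in seen:
                    seen.add(x)
                    order.append(x)
                    queue.append(x)
        return order[:k]

    def dfs_neighborhood(neighbors, v, k):
        """First k nodes visited depth-first from v (global view)."""
        stack, seen, order = [v], set(), []
        while stack and len(order) < k:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node != v:
                order.append(node)
            stack.extend(neighbors[node])
        return order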

SLIDE 32

Interpolating BFS and DFS

  • Biased fixed-length random-walk strategy 𝑺 that, given a node 𝒗, generates the neighborhood 𝑶_𝑺(𝒗)
  • Two parameters:

    – Return parameter 𝒒: return back to the previous node
    – In-out parameter 𝒓: moving outwards (DFS) vs. inwards (BFS)

  • Intuitively, 𝑟 is the “ratio” of BFS vs. DFS
SLIDE 33

Biased Random Walks

Biased 2nd-order random walks explore network neighborhoods:

  – The random walk just traversed edge (s1, w) and is now at w
  – Idea: remember where the walk came from
  – The neighbors of w can only be: back to s1, at the same distance from s1, or farther from s1

SLIDE 34

Biased Random Walks

  • The walker came over edge (s1, w) and is now at w. Where to go next?
  • 𝑞 and 𝑟 model the transition probabilities

    – 𝑞 = return parameter, 𝑟 = “walk away” parameter

  • BFS-like walk: low value of 𝑞
  • DFS-like walk: low value of 𝑟

[Figure: from w, the unnormalized transition probabilities are 1/𝑞 back to s1, 1 to s2 (same distance from s1), and 1/𝑟 to s3 and s4 (farther from s1)]
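A minimal sketch of these unnormalized transition weights, in this lecture's notation (q = return parameter, r = in-out parameter; the function and argument names are illustrative):

    def transition_weights(prev, current, neighbors, q, r):
        """Unnormalized node2vec transition probabilities out of `current`,
        given that the walk just arrived via edge (prev, current)."""
        weights = {}
        for x in neighbors[current]:
            if x == prev:                # return to the previous node
                weights[x] = 1.0 / q
            elif x in neighbors[prev]:   # same distance (1 hop) from prev
                weights[x] = 1.0
            else:                        # moving farther away from prev
                weights[x] = 1.0 / r
        return weights

To take a step, normalize these weights into a distribution and sample the next node proportionally.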

SLIDE 35

Experiments: Micro vs. Macro

  • Graph: interactions of characters in a novel
  • Node color: identified communities

p = 1, q = 2: microscopic view of the network neighbourhood

p = 1, q = 0.5: macroscopic view of the network neighbourhood

(These figure labels use the node2vec paper’s notation, where p is the return parameter and q the in-out parameter.)

SLIDE 36

Experiments: Datasets

  • BlogCatalog: social network of bloggers

    – 10,312 nodes, 333,983 edges, 39 labels

  • Protein-protein interactions (PPI): subgraph of the PPI network for Homo sapiens

    – 3,890 nodes, 76,584 edges, and 50 labels

  • Wikipedia: co-occurrence network of words

    – 4,777 nodes, 184,812 edges, and 40 different labels (part-of-speech tags of words)

SLIDE 37

Experiments: Results

SLIDE 38

How to Use Embeddings

  • How to use the embeddings 𝒛_𝑖 of nodes:

    – Clustering/community detection: cluster the points 𝒛_𝑖
    – Node classification: predict the label of node 𝑖 based on 𝒛_𝑖
    – Link prediction: predict edge (𝑖, 𝑗) based on 𝑔(𝒛_𝑖, 𝒛_𝑗)

  • For link prediction we can concatenate, average, take a product of, or take a difference between the embeddings:

    – Concatenate: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ([𝒛_𝑖, 𝒛_𝑗])
    – Hadamard: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(𝒛_𝑖 ∗ 𝒛_𝑗) (per-coordinate product)
    – Sum/Avg: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(𝒛_𝑖 + 𝒛_𝑗)
    – Distance: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(‖𝒛_𝑖 − 𝒛_𝑗‖₂)
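A small NumPy sketch of these operators (the helper name edge_features is an assumption; ℎ would be whatever downstream classifier is applied to the result):

    import numpy as np

    def edge_features(z_i, z_j, op="hadamard"):
        """Combine two node embeddings into a feature for edge (i, j)."""
        if op == "concat":
            return np.concatenate([z_i, z_j])
        if op == "hadamard":
            return z_i * z_j                         # per-coordinate product
        if op == "average":
            return (z_i + z_j) / 2.0
        if op == "distance":
            return np.linalg.norm(z_i - z_j, ord=2)  # scalar L2 distance
        raise ValueError(f"unknown operator: {op}")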

SLIDE 39

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 40

Project Midterm Expectations

  • Details are here
SLIDE 41

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric