SLIDE 1

CSE 6240: Web Search and Text Mining. Spring 2020

Node Representation Learning

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Administrivia

  • Project midterm rubric released

– Discussion at the end

  • Proposal regrades done
SLIDE 3

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric

These slides are inspired by Prof. Jure Leskovec’s CS224W lecture

SLIDE 4

Machine Learning in Networks

[Figure: a network with unknown node labels fed into a machine learning model]

  • Networks are complex
  • Need a uniform language to process various networks

SLIDE 5

Example: Node Classification

  • Classifying the function of proteins in the interactome

Image from: Ganapathiraju et al. 2016. Schizophrenia interactome with 504 novel protein–protein interactions. Nature.

SLIDE 6

Example: Link Prediction

[Figure: a network with candidate links marked “?” fed into a machine learning model]

  • Which links exist in the network?
SLIDE 7

Machine Learning Lifecycle

Raw Data → Structured Data → Learning Algorithm → Model → Downstream task

  • The typical machine learning lifecycle requires feature engineering (the Raw Data → Structured Data step) every single time!
  • Goal: avoid task-specific feature engineering and automatically learn the features

SLIDE 8

Feature Learning in Graphs

  • Goal: Efficient task-independent feature learning for machine learning with graphs!

    𝑔: 𝑣 → ℝ^𝑑

  • The vector 𝑔(𝑢) ∈ ℝ^𝑑 is the feature representation, or embedding, of node 𝑢

SLIDE 9

Why Network Embedding?

  • Task: We map each node in a network into a low-dimensional space

    – Distributed representations for nodes
    – Similarity of embeddings between nodes indicates their network similarity
    – Encode network information and generate node representation

SLIDE 10

Example Node Embedding

  • 2D embeddings of the nodes of Zachary’s Karate Club network:

[Figure: Zachary’s Karate Club network and its 2D node embeddings]

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

SLIDE 11

Why Is It Hard?

  • The modern deep learning toolbox is designed for simple sequences or grids:

    – CNNs for fixed-size images/grids
    – RNNs or word2vec for text/sequences

SLIDE 12

Why Is It Hard?

  • But networks are far more complex!

    – Complex topological structure (i.e., no spatial locality like grids)
    – No fixed node ordering or reference point (i.e., the isomorphism problem)
    – Often dynamic and have multimodal features

SLIDE 13

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 14

Framework Setup

  • Assume we have a graph G:

    – V is the vertex set
    – A is the adjacency matrix (assume binary)
    – No node features or extra information is used!

SLIDE 15

Embedding Nodes

  • Goal: Encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network

SLIDE 16

Embedding Nodes

    similarity(u, v) ≈ z_v^T z_u

  • Left-hand side: similarity of u and v in the original network (this is what we need to define!)
  • Right-hand side: similarity of the embeddings (dot product)

SLIDE 17

Learning Node Embeddings

  • 1. Define an encoder (i.e., a mapping from nodes to embeddings)
  • 2. Define a node similarity function (i.e., a measure of similarity in the original network)
  • 3. Optimize the parameters of the encoder so that:

    similarity(u, v) ≈ z_v^T z_u

    (similarity of u and v in the original network ≈ similarity of the embeddings)

SLIDE 18

Two Key Components

  • Encoder: maps each node to a low-dimensional vector

    enc(v) = z_v    (v: node in the input graph; z_v: its d-dimensional embedding)

  • Similarity function: specifies how the relationships in vector space map to the relationships in the original network

    similarity(u, v) ≈ z_v^T z_u    (similarity of u and v in the original network ≈ dot product between node embeddings)
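The simplest encoder of this kind is a “shallow” embedding lookup: every node gets its own learnable d-dimensional vector, and embedding similarity is the dot product. A minimal NumPy sketch (the sizes and variable names are illustrative assumptions, not from the slides):

    import numpy as np

    num_nodes, d = 100, 16                    # toy sizes (assumptions)
    Z = 0.01 * np.random.randn(num_nodes, d)  # one learnable d-dim vector per node

    def enc(v):
        """Shallow encoder: a plain embedding lookup, enc(v) = z_v."""
        return Z[v]

    def similarity(u, v):
        """Embedding-space similarity: the dot product z_v^T z_u."""
        return enc(u) @ enc(v)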

SLIDE 19

How to Define Node Similarity?

  • The key design choice among embedding methods is how they define node similarity
  • E.g., should two nodes have similar embeddings if they…

    – are connected?
    – share neighbors?
    – have similar “structural roles”?
    – …?

SLIDE 20

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 21

Random Walk

  • Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random, move to it, and so on.
  • The (random) sequence of points selected this way is a random walk on the graph.
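As a minimal sketch, assuming the graph is stored as an adjacency list (a dict called neighbors mapping each node to a list of its neighbors; the names are illustrative), a fixed-length unbiased random walk can be generated like this:

    import random

    def random_walk(neighbors, start, length):
        """Fixed-length unbiased random walk starting at `start`."""
        walk = [start]
        for _ in range(length - 1):
            current = walk[-1]
            if not neighbors[current]:       # dead end: stop the walk early
                break
            walk.append(random.choice(neighbors[current]))
        return walk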

SLIDE 22

Random-Walk Node Similarity

    z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network

SLIDE 23

Random-Walk Embeddings

  • Estimate the probability of visiting node 𝑤 on a random walk starting from node 𝑣, using some random-walk strategy 𝑆
  • Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space

    – Similarity here: dot product

SLIDE 24

Unsupervised Feature Learning

  • Given a node 𝒗, how do we define nearby nodes?

    – 𝑂_𝑆(𝑣) = neighborhood of 𝑣 obtained by some random-walk strategy 𝑆

  • Different neighborhood definitions give different algorithms

    – We will look at DeepWalk and node2vec

SLIDE 25

Random Walk Optimization

1. Run short fixed-length random walks starting from each node 𝑣 on the graph using some strategy 𝑆
2. For each node 𝒗, collect 𝑂_𝑆(𝒗), the multiset* of nodes visited on random walks starting from 𝑣

   * 𝑂_𝑆(𝑣) can have repeat elements, since nodes can be visited multiple times on random walks

3. Optimize embeddings

SLIDE 26

Random Walk Optimization

    L = Σ_{𝑣∈𝑉} Σ_{𝑤∈𝑂_𝑆(𝑣)} −log( exp(z_𝑣^T z_𝑤) / Σ_{𝑛∈𝑉} exp(z_𝑣^T z_𝑛) )

  • Outer sum: over all nodes 𝑣; inner sum: over nodes 𝑤 seen on random walks starting from 𝑣; the softmax fraction is the predicted probability of 𝑣 and 𝑤 co-occurring on a random walk
  • High score (embedding dot-product similarity) for nodes appearing together on random walks; low probability for all other nodes
  • The softmax denominator is expensive to calculate for all node pairs
  • Solution: use negative sampling (see the sketch below)
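Negative sampling replaces the softmax denominator over all nodes with a handful of randomly sampled “negative” nodes, as in word2vec. A minimal NumPy sketch of the resulting loss term (all function and variable names are illustrative assumptions):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def walk_loss(Z, v, walk_neighbors, num_neg=5, seed=0):
        """Negative-sampling estimate of the loss contribution of node v.
        Z: (num_nodes, d) embedding matrix; walk_neighbors: the multiset
        O_S(v) of nodes seen on random walks starting from v."""
        rng = np.random.default_rng(seed)
        loss = 0.0
        for w in walk_neighbors:
            loss -= np.log(sigmoid(Z[v] @ Z[w]))         # pull co-occurring pairs together
            for n in rng.integers(0, len(Z), size=num_neg):
                loss -= np.log(sigmoid(-(Z[v] @ Z[n])))  # push randomly sampled pairs apart
        return loss

Minimizing this loss (e.g., by stochastic gradient descent over Z) makes co-occurring pairs score high and randomly sampled pairs score low.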
SLIDE 27

DeepWalk [Perozzi et al., 2014]

  • What strategy should we use to run these random walks?
  • Simplest idea: just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk, Perozzi et al., 2014)

    – The issue is that this notion of similarity is too constrained
    – node2vec generalizes this
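Putting the pieces together, DeepWalk is essentially word2vec applied to random walks: treat each walk as a “sentence” of node ids and train a skip-gram model on the corpus of walks. A hedged sketch using gensim (assuming gensim >= 4.0 is installed, and reusing the neighbors adjacency list and random_walk helper sketched earlier; all sizes are illustrative):

    from gensim.models import Word2Vec

    # 10 walks of length 40 per node; node ids become string "tokens"
    walks = [[str(n) for n in random_walk(neighbors, v, length=40)]
             for v in neighbors for _ in range(10)]

    model = Word2Vec(walks, vector_size=64, window=5, min_count=0,
                     sg=1, negative=5, workers=4)  # skip-gram + negative sampling
    z = model.wv["0"]  # embedding of (hypothetical) node "0"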

SLIDE 28

DeepWalk Example

  • 2D embeddings of the nodes of Zachary’s Karate Club network:

[Figure: Zachary’s Karate Club network and the 2D node embeddings learned by DeepWalk]

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

SLIDE 29

node2vec [Grover et al., 2016]

  • Goal: Embed nodes with similar network neighborhoods close in the feature space

    – Frame this goal as a maximum-likelihood optimization problem, independent of the downstream prediction task

  • Key observation: Develop a biased 2nd-order random-walk strategy 𝑆 to generate the network neighborhood 𝑂_𝑆(𝑣) of node 𝑣

SLIDE 30

node2vec: Biased Walks

Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).

[Figure: walks starting from node u over nodes s1-s9; BFS stays near u while DFS moves outward]

SLIDE 31

node2vec: Biased Walks

  • Two strategies to define a neighborhood 𝑂_𝑆(𝒗) of a given node 𝒗: BFS and DFS (see the sketch below)
  • Walk of length 3 (𝑂_𝑆(𝑣) of size 3):

    𝑂_BFS(𝑣) = {s1, s2, s3}   (local, microscopic view)
    𝑂_DFS(𝑣) = {s4, s5, s6}   (global, macroscopic view)

[Figure: BFS visits s1, s2, s3 near u; DFS reaches s4, s5, s6 farther away]
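As a toy illustration of the two extremes (the function names and the k-node cutoff are assumptions, not from the slides), the first k nodes discovered by each traversal could be collected like this:

    from collections import deque

    def bfs_neighborhood(neighbors, v, k):
        """First k nodes discovered breadth-first from v (local view)."""
        queue, seen, order = deque([v]), {v}, []
        while queue and len(order) < k:
            for x in neighbors[queue.popleft()]:
                if x not in seen:
                    seen.add(x)
                    order.append(x)
                    queue.append(x)
        return order[:k]

    def dfs_neighborhood(neighbors, v, k):
        """First k nodes visited depth-first from v (global view)."""
        stack, seen, order = [v], set(), []
        while stack and len(order) < k:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node != v:
                order.append(node)
            stack.extend(neighbors[node])
        return order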

SLIDE 32

Interpolating BFS and DFS

  • Biased fixed-length random-walk strategy 𝑺 that, given a node 𝒗, generates the neighborhood 𝑶_𝑺(𝒗)
  • Two parameters:

    – Return parameter 𝒒: return back to the previous node
    – In-out parameter 𝒓: moving outwards (DFS) vs. inwards (BFS)

  • Intuitively, 𝑟 is the “ratio” of BFS vs. DFS
SLIDE 33

Biased Random Walks

Biased 2nd-order random walks explore network neighborhoods:

  – The random walk just traversed edge (s1, w) and is now at w
  – Idea: remember where the walk came from
  – The neighbors of w can only be: back to s1, at the same distance from s1, or farther from s1

SLIDE 34

Biased Random Walks

  • The walker came over edge (s1, w) and is now at w. Where to go next?
  • 𝑞 and 𝑟 model the transition probabilities

    – 𝑞 = return parameter, 𝑟 = “walk away” parameter

  • BFS-like walk: low value of 𝑞
  • DFS-like walk: low value of 𝑟

[Figure: from w, the unnormalized transition probabilities are 1/𝑞 back to s1, 1 to s2 (same distance from s1), and 1/𝑟 to s3 and s4 (farther from s1)]
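A minimal sketch of these unnormalized transition weights, in this lecture's notation (q = return parameter, r = in-out parameter; the function and argument names are illustrative):

    def transition_weights(prev, current, neighbors, q, r):
        """Unnormalized node2vec transition probabilities out of `current`,
        given that the walk just arrived via edge (prev, current)."""
        weights = {}
        for x in neighbors[current]:
            if x == prev:                # return to the previous node
                weights[x] = 1.0 / q
            elif x in neighbors[prev]:   # same distance (1 hop) from prev
                weights[x] = 1.0
            else:                        # moving farther away from prev
                weights[x] = 1.0 / r
        return weights

To take a step, normalize these weights into a distribution and sample the next node proportionally.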

SLIDE 35

Experiments: Micro vs. Macro

  • Graph: interactions of characters in a novel
  • Node color: identified communities

p = 1, q = 2: microscopic view of the network neighbourhood

p = 1, q = 0.5: macroscopic view of the network neighbourhood

(These figure labels use the node2vec paper’s notation, where p is the return parameter and q the in-out parameter.)

SLIDE 36

Experiments: Datasets

  • BlogCatalog: social network of bloggers

    – 10,312 nodes, 333,983 edges, 39 labels

  • Protein-protein interactions (PPI): subgraph of the PPI network for Homo sapiens

    – 3,890 nodes, 76,584 edges, and 50 labels

  • Wikipedia: co-occurrence network of words

    – 4,777 nodes, 184,812 edges, and 40 different labels (part-of-speech tags of words)

SLIDE 37

Experiments: Results

SLIDE 38

How to Use Embeddings

  • How to use the embeddings 𝒛_𝑖 of nodes:

    – Clustering/community detection: cluster the points 𝒛_𝑖
    – Node classification: predict the label of node 𝑖 based on 𝒛_𝑖
    – Link prediction: predict edge (𝑖, 𝑗) based on 𝑔(𝒛_𝑖, 𝒛_𝑗)

  • For link prediction we can concatenate, average, take a product of, or take a difference between the embeddings:

    – Concatenate: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ([𝒛_𝑖, 𝒛_𝑗])
    – Hadamard: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(𝒛_𝑖 ∗ 𝒛_𝑗) (per-coordinate product)
    – Sum/Avg: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(𝒛_𝑖 + 𝒛_𝑗)
    – Distance: 𝑔(𝒛_𝑖, 𝒛_𝑗) = ℎ(‖𝒛_𝑖 − 𝒛_𝑗‖₂)
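A small NumPy sketch of these operators (the helper name edge_features is an assumption; ℎ would be whatever downstream classifier is applied to the result):

    import numpy as np

    def edge_features(z_i, z_j, op="hadamard"):
        """Combine two node embeddings into a feature for edge (i, j)."""
        if op == "concat":
            return np.concatenate([z_i, z_j])
        if op == "hadamard":
            return z_i * z_j                         # per-coordinate product
        if op == "average":
            return (z_i + z_j) / 2.0
        if op == "distance":
            return np.linalg.norm(z_i - z_j, ord=2)  # scalar L2 distance
        raise ValueError(f"unknown operator: {op}")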

SLIDE 39

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric
SLIDE 40

Project Midterm Expectations

  • Details are here
SLIDE 41

Today’s Lecture

  • Introduction
  • Node embedding setup
  • Random walk approaches for node embedding
  • Project midterm rubric