SLIDE 1

Random Walks in Graphs

Thomas Bonald
Stage LIESSE 2018

SLIDE 2

Schedule

◮ 9:30 - 12:30: Tutorial
◮ 12:30 - 13:30: Lunch
◮ 13:30 - 17:00: Lab session (Python)

SLIDE 3

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...

Main European highways

SLIDE 4

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...

International flights

SLIDE 5

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...

Extract from Wikipedia

SLIDE 6

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...

Extract from the movie-actor graph

SLIDE 7

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...
◮ Social networks: Facebook, Twitter, LinkedIn, ...

Extract from Twitter (source: AllThingsGraphed.com)

SLIDE 8

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...
◮ Social networks: Facebook, Twitter, LinkedIn, ...
◮ Biology: brain, proteins, phylogenetics, ...

The brain network (source: Wired)

SLIDE 9

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...
◮ Social networks: Facebook, Twitter, LinkedIn, ...
◮ Biology: brain, proteins, phylogenetics, ...
◮ Health: genetic diseases, patient-doctor-pharmacy-drugs, ...

Pharmacy-doctor network (source: IAAI 2015)

SLIDE 10

Graph data

◮ Infrastructure: roads, railways, power grid, internet, ...
◮ Communication: phone, emails, flights, ...
◮ Information: Web, Wikipedia, knowledge bases, ...
◮ Social networks: Facebook, Twitter, LinkedIn, ...
◮ Biology: brain, proteins, phylogenetics, ...
◮ Health: genetic diseases, patient-doctor-pharmacy-drugs, ...
◮ Marketing: customer-product, bundling, ...

SLIDE 11

Data as graph

◮ Dataset x_1, . . . , x_n ∈ X
◮ Similarity measure σ : X × X → R+
◮ Graph of n nodes with weight σ(x_i, x_j) between nodes i and j

Example: X = [0, 1]^2, σ(x, y) = 1_{d(x,y) < 1/4}

SLIDE 12

Data as graph

◮ Dataset x_1, . . . , x_n ∈ X
◮ Similarity measure σ : X × X → R+
◮ Graph of n nodes with weight σ(x_i, x_j) between nodes i and j

Example: X = [0, 1]^2, σ(x, y) = 1_{d(x,y) < 1/4}
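
A minimal Python sketch of this construction, in the spirit of the lab session (numpy and networkx assumed available; the sample size n = 100 and the random seed are arbitrary choices, not from the slides):

```python
import numpy as np
import networkx as nx

# Sample n points uniformly in the unit square [0, 1]^2.
rng = np.random.default_rng(0)
n = 100
x = rng.random((n, 2))

# Similarity sigma(x_i, x_j) = 1 if d(x_i, x_j) < 1/4, else 0:
# connect i and j whenever their Euclidean distance is below 1/4.
G = nx.Graph()
G.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if np.linalg.norm(x[i] - x[j]) < 0.25:
            G.add_edge(i, j, weight=1.0)
```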

SLIDE 13

Motivation

◮ Information retrieval
◮ Content recommendation
◮ Advertising
◮ Anomaly detection
◮ Security

SLIDE 14

Graph analysis

◮ What are the most important nodes? → Ranking
◮ Can we predict new links? → Local ranking
◮ What is the graph structure? → Clustering
◮ Can we predict labels? → Classification

SLIDE 15

Setting

◮ A weighted, undirected, connected graph of n nodes
◮ No self-loops
◮ Weighted adjacency matrix A
◮ Vector of node weights d = A1

SLIDE 16

Outline

1. Random walk
2. Laplacian matrix
3. Spectral analysis
4. Graph embedding
5. Applications
SLIDE 17

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 18

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 19

Random walk

Consider a random walk in the graph G where the probability of moving from node i to node j is A_ij/d_i

SLIDE 20

Random walk

Consider a random walk in the graph G where the probability of moving from node i to node j is A_ij/d_i

The sequence of nodes X_0, X_1, X_2, . . . defines a Markov chain on {1, . . . , n} with transition matrix P = D^{−1}A

SLIDE 21

Random walk

Consider a random walk in the graph G where the probability of moving from node i to node j is A_ij/d_i

The sequence of nodes X_0, X_1, X_2, . . . defines a Markov chain on {1, . . . , n} with transition matrix P = D^{−1}A

◮ Dynamics:
P(X_{t+1} = i) = ∑_j P(X_t = j) P_ji

SLIDE 22

Random walk

Consider a random walk in the graph G where the probability of moving from node i to node j is A_ij/d_i

The sequence of nodes X_0, X_1, X_2, . . . defines a Markov chain on {1, . . . , n} with transition matrix P = D^{−1}A

◮ Dynamics:
P(X_{t+1} = i) = ∑_j P(X_t = j) P_ji

◮ Stationary distribution π:
P(X_∞ = i) = ∑_j P(X_∞ = j) P_ji ⇔ π_i = ∑_j π_j P_ji (global balance)
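
A short numpy sketch of these definitions (the small adjacency matrix is an arbitrary example; it contains a triangle, so the chain is aperiodic and the dynamics converge to π):

```python
import numpy as np

# Weighted adjacency matrix of a small connected graph (arbitrary example).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = A.sum(axis=1)                # node weights d = A 1
P = A / d[:, None]               # transition matrix P = D^{-1} A

# Dynamics: the distribution at time t+1 is p_{t+1} = p_t P.
p = np.array([1., 0., 0., 0.])   # start the walk at node 0
for _ in range(100):
    p = p @ P

# Stationary distribution: pi is proportional to d (global balance).
pi = d / d.sum()
print(np.allclose(p, pi))        # True: the dynamics converge to pi
```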

SLIDE 23

Return time

Since π_i is the frequency of visits of node i in the stationary regime, the mean return time to node i is given by
σ_i = E_i(τ_i^+) = 1/π_i with τ_i^+ = min{t ≥ 1 : X_t = i}

SLIDE 24

Reversibility

A Markov chain is called reversible if, in the stationary regime, the probability of any sequence of states is the same in both directions of time
SLIDE 25

Reversibility

A Markov chain is called reversible if, in the stationary regime, the probability of any sequence of states is the same in both directions of time

◮ Transition from state i to state j:
P(X_t = i, X_{t+1} = j) = P(X_t = j, X_{t+1} = i) ⇔ π_i P_ij = π_j P_ji (local balance)

SLIDE 26

Reversibility

A Markov chain is called reversible if, in the stationary regime, the probability of any sequence of states is the same in both directions of time

◮ Transition from state i to state j:
P(X_t = i, X_{t+1} = j) = P(X_t = j, X_{t+1} = i) ⇔ π_i P_ij = π_j P_ji (local balance)

◮ Sequence of states i_0, i_1, . . . , i_ℓ:
P(X_t = i_0, . . . , X_{t+ℓ} = i_ℓ) = P(X_t = i_ℓ, . . . , X_{t+ℓ} = i_0) ⇔ π_{i_0} P_{i_0 i_1} · · · P_{i_{ℓ−1} i_ℓ} = π_{i_ℓ} P_{i_ℓ i_{ℓ−1}} · · · P_{i_1 i_0}

SLIDE 27

Reversibility & random walks

◮ The random walk in a graph is a reversible Markov chain, with stationary distribution π ∝ d

SLIDE 28

Reversibility & random walks

◮ The random walk in a graph is a reversible Markov chain, with stationary distribution π ∝ d
◮ Conversely, any reversible Markov chain is a random walk in a graph, with weights π_i P_ij = π_j P_ji

SLIDE 29

Reversibility in physics

◮ All microscopic laws of physics are reversible

SLIDE 30

Reversibility in physics

◮ All microscopic laws of physics are reversible
◮ The second law of thermodynamics states that the evolution of any isolated system is irreversible
SLIDE 31

Reversibility in physics

◮ All microscopic laws of physics are reversible
◮ The second law of thermodynamics states that the evolution of any isolated system is irreversible
◮ This apparent paradox was solved by Tatiana & Paul Ehrenfest in 1907

SLIDE 32

Example

SLIDE 33

Hitting time, commute time & escape probability

◮ Mean hitting time of node j from node i:
H_ij = E_i(τ_j), τ_j = min{t ≥ 0 : X_t = j}

◮ Mean commute time between nodes i and j:
ρ_ij = H_ij + H_ji

◮ Escape probability from node i to node j:
e_ij = P_i(τ_j < τ_i^+)

Proposition
ρ_ij = 1/(π_i e_ij)

SLIDE 34

Proof

SLIDE 35

Frequency of no-return paths

∀i ≠ j, π_i e_ij = π_j e_ji

SLIDE 36

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 37

Laplacian matrix

Let D = diag(A1).

Definition
The matrix L = D − A is called the Laplacian matrix.

Heat equation

◮ Fix the temperature of some nodes S ⊂ {1, . . . , n}
◮ Interpret the weight A_ij as a thermal conductivity
◮ Then for any node i ∉ S,
dT_i/dt = ∑_j A_ij(T_j − T_i) = −(LT)_i
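
A small numpy sketch of these heat dynamics, using an explicit Euler discretization (the graph, the boundary set S, the initial temperatures and the step size dt are illustrative assumptions):

```python
import numpy as np

# Arbitrary small weighted graph.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A          # Laplacian matrix L = D - A

S = [0, 3]                              # nodes with fixed temperature
T = np.array([1., 0.2, 0.8, 0.])        # initial temperatures; T[0] and T[3] stay fixed
free = [i for i in range(len(T)) if i not in S]

dt = 0.01                               # small step for the explicit Euler scheme
for _ in range(10_000):
    T[free] -= dt * (L @ T)[free]       # dT_i/dt = -(L T)_i on the free nodes

print(T)                                # the free nodes relax to a harmonic equilibrium
```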

SLIDE 38

Example

SLIDE 39

Example

SLIDE 40

Example

SLIDE 41

Equilibrium

Dirichlet problem

◮ For any node i ∉ S,
(LT)_i = 0, with boundary condition T_i fixed for all i ∈ S
◮ The vector T is said to be harmonic

Uniqueness
There is at most one solution to the Dirichlet problem.
Proof based on the maximum principle.

SLIDE 42

The maximum principle

SLIDE 43

Back to random walks

◮ Consider the probability that the random walk first hits S in j when starting from i:
P^S_ij = P_i(τ_j = τ_S) with τ_S = min{t ≥ 0 : X_t ∈ S}
◮ This defines a stochastic matrix P^S

SLIDE 44

Back to random walks

◮ Consider the probability that the random walk first hits S in j when starting from i:
P^S_ij = P_i(τ_j = τ_S) with τ_S = min{t ≥ 0 : X_t ∈ S}
◮ This defines a stochastic matrix P^S

Existence
The solution to the Dirichlet problem is
∀i ∉ S, T_i = ∑_{j∈S} P^S_ij T_j
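
Numerically, the Dirichlet problem reduces to one linear solve once the Laplacian is split into free and boundary blocks; a sketch (the graph and the boundary values are illustrative assumptions):

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

S = [0, 3]                              # boundary nodes
T_S = np.array([1., 0.])                # boundary condition on S
free = [i for i in range(A.shape[0]) if i not in S]

# (L T)_i = 0 for i outside S  <=>  L_ff T_f = -L_fS T_S
L_ff = L[np.ix_(free, free)]
L_fS = L[np.ix_(free, S)]
T_f = np.linalg.solve(L_ff, -L_fS @ T_S)
print(T_f)                              # harmonic extension of the boundary values
```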

SLIDE 45

Solution to the Dirichlet problem

SLIDE 46

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 47

Spectral analysis

The Laplacian matrix L is symmetric and positive semi-definite

Proposition
∀v ∈ R^n, v^T L v = ∑_{i<j} A_ij(v_i − v_j)^2

SLIDE 48

Spectral analysis

The Laplacian matrix L is symmetric and positive semi-definite

Proposition
∀v ∈ R^n, v^T L v = ∑_{i<j} A_ij(v_i − v_j)^2

Spectral decomposition
L = V Λ V^T

◮ Λ = diag(λ_1, . . . , λ_n) is the diagonal matrix of eigenvalues, with 0 = λ_1 < λ_2 ≤ . . . ≤ λ_n
◮ V = (v_1, . . . , v_n) is a unitary matrix of eigenvectors, with v_1 = 1/√n
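
In numpy the decomposition is one call to eigh, which returns the eigenvalues in increasing order (a sketch on the same small example graph as above):

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
n = A.shape[0]
L = np.diag(A.sum(axis=1)) - A

lam, V = np.linalg.eigh(L)              # L = V diag(lam) V^T, lam sorted increasingly
print(np.isclose(lam[0], 0.0))                          # lambda_1 = 0
print(np.allclose(np.abs(V[:, 0]), 1 / np.sqrt(n)))     # v_1 = 1/sqrt(n), up to sign
print(np.allclose(V @ np.diag(lam) @ V.T, L))           # reconstruction of L
```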

SLIDE 49

Mechanics

Consider a mechanical system of n particles of unit mass located on a line and linked by springs with stiffness A_ij (Hooke's law)
SLIDE 50

Mechanics

Consider a mechanical system of n particles of unit mass located on a line and linked by springs with stiffness A_ij (Hooke's law)

Denoting by v ∈ R^n the locations of these particles, the force between i and j is A_ij |v_i − v_j|

SLIDE 51

Mechanics

Consider a mechanical system of n particles of unit mass located on a line and linked by springs with stiffness A_ij (Hooke's law)

Denoting by v ∈ R^n the locations of these particles, the force between i and j is A_ij |v_i − v_j|

We deduce the potential energy of the system:
(1/2) ∑_{i<j} A_ij(v_i − v_j)^2 = (1/2) v^T L v

SLIDE 52

Energy minima

The minimum of v^T L v under the constraint v^T v = 1 is:

◮ 0 (take v = v_1)
◮ λ_2 under the constraint 1^T v = 0 (take v = v_2)

Theorem
For all k = 1, . . . , n,
λ_k = min { v^T L v : v^T v = 1, v_1^T v = 0, . . . , v_{k−1}^T v = 0 }
and the minimum is attained for v = v_k.

SLIDE 53

Proof

SLIDE 54

Physical interpretation

Assume each particle has unit mass and let the mechanical system rotate with angular velocity ω > 0

SLIDE 55

Physical interpretation

Assume each particle has unit mass and let the mechanical system rotate with angular velocity ω > 0

By Newton's law, ∀i, ∑_j A_ij(v_j − v_i) = −v_i ω^2 ⇔ Lv = ω^2 v
SLIDE 56

Physical interpretation

Assume each particle has unit mass and let the mechanical system rotate with angular velocity ω > 0

By Newton's law, ∀i, ∑_j A_ij(v_j − v_i) = −v_i ω^2 ⇔ Lv = ω^2 v

Observations

◮ The only possible values of the angular velocity are √λ_2, . . . , √λ_n
◮ The corresponding equilibria are proportional to v_2, . . . , v_n

SLIDE 57

Physical interpretation (energy)

At equilibrium, the potential energy is equal to the (rotational) kinetic energy:
(1/2) v^T L v = (1/2) v^T v ω^2
where v^T v is the moment of inertia of the system.

SLIDE 58

Physical interpretation (energy)

At equilibrium, the potential energy is equal to the (rotational) kinetic energy:
(1/2) v^T L v = (1/2) v^T v ω^2
where v^T v is the moment of inertia of the system.

Observations

For unit moments of inertia,
◮ The only possible values of the energy are (half) λ_2, . . . , λ_n
◮ The corresponding equilibria are v_2, . . . , v_n
SLIDE 59

Example

[Figure: the equilibria corresponding to v_2 and v_3]

SLIDE 60

Back to random walks

◮ The normalized symmetric Laplacian is defined by:
ℒ = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}
◮ This matrix is symmetric and positive semi-definite
◮ By the spectral theorem,
ℒ = W Γ W^T where Γ = diag(γ_1, . . . , γ_n), with γ_1 = 0 < γ_2 ≤ . . . ≤ γ_n

Observation
The transition matrix P has eigenvalues 1 > 1 − γ_2 ≥ . . . ≥ 1 − γ_n, with corresponding matrix of eigenvectors D^{−1/2} W

SLIDE 61

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 62

Pseudo-inverse

Recall that L = V Λ V^T
The pseudo-inverse of L is L^+ = V Λ^+ V^T with Λ^+ = diag(0, 1/λ_2, . . . , 1/λ_n)

Proposition
L L^+ = L^+ L = I − 11^T/n

SLIDE 63

Proof

SLIDE 64

First graph embedding

Consider the embedding Z = (z_1, . . . , z_n) of the nodes in R^n, with Z = √Λ^+ V^T

SLIDE 65

First graph embedding

Consider the embedding Z = (z_1, . . . , z_n) of the nodes in R^n, with Z = √Λ^+ V^T

Observations

◮ The first coordinate is 0
◮ The k-th coordinate is v_k/√λ_k, with energy (1/2) v_k^T L v_k / λ_k = 1/2
◮ Null component-wise averages, Z1 = 0
◮ The Gram matrix of Z is the pseudo-inverse of L:
Z^T Z = V Λ^+ V^T = L^+
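
A numpy sketch of this first embedding and of the two properties listed above (Z1 = 0 and Z^T Z = L^+), again on the small example graph:

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

# Pseudo-inverse spectrum: 0 for lambda_1 = 0, 1/lambda_k otherwise (graph is connected).
lam_plus = np.zeros_like(lam)
lam_plus[1:] = 1.0 / lam[1:]

Z = np.sqrt(lam_plus)[:, None] * V.T     # Z = sqrt(Lambda^+) V^T; column i is z_i

print(np.allclose(Z.sum(axis=1), 0))                       # Z 1 = 0
print(np.allclose(Z.T @ Z, V @ np.diag(lam_plus) @ V.T))   # Gram matrix = L^+
```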

SLIDE 66

Example in R^2

[Figure: scatter plot of the embedding Z in R^2]

SLIDE 67

Second graph embedding

Consider the embedding X = (x_1, . . . , x_n) of the nodes in R^n, with X = √|d| Z(I − π1^T)

Observations

◮ Shifted, normalized version of Z
◮ Null component-wise weighted averages, Xπ = 0
◮ Gram matrix of X:
G = X^T X = |d| (I − 1π^T) L^+ (I − π1^T), with Gπ = 0

SLIDE 68

Example in R^2

[Figure: scatter plot of the embedding X in R^2]

SLIDE 69

Back to random walks

◮ The mean hitting time of node j from node i satisfies:
H_ij = 0 if i = j, and H_ij = 1 + ∑_{k=1}^n P_ik H_kj otherwise
SLIDE 70

Back to random walks

◮ The mean hitting time of node j from node i satisfies:
H_ij = 0 if i = j, and H_ij = 1 + ∑_{k=1}^n P_ik H_kj otherwise

◮ We deduce that the matrix (I − P)H − 11^T is diagonal
◮ Equivalently, the matrix LH − d1^T is diagonal

SLIDE 71

Back to random walks

◮ The mean hitting time of node j from node i satisfies:
H_ij = 0 if i = j, and H_ij = 1 + ∑_{k=1}^n P_ik H_kj otherwise

◮ We deduce that the matrix (I − P)H − 11^T is diagonal
◮ Equivalently, the matrix LH − d1^T is diagonal

Theorem
H = 11^T d(G) − G where G = X^T X is the Gram matrix of X

SLIDE 72

Back to random walks

◮ The mean hitting time of node j from node i satisfies:
H_ij = 0 if i = j, and H_ij = 1 + ∑_{k=1}^n P_ik H_kj otherwise

◮ We deduce that the matrix (I − P)H − 11^T is diagonal
◮ Equivalently, the matrix LH − d1^T is diagonal

Theorem
H = 11^T d(G) − G where G = X^T X is the Gram matrix of X

Observation
H = 1h^T − G with h^T = π^T H
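
A numerical check of the theorem (a sketch: it builds X and G on the small example graph used above, computes H = 11^T d(G) − G, i.e. H_ij = G_jj − G_ij, and compares one column with the hitting times obtained directly from the linear system of the first bullet):

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=1)
P = A / d[:, None]
pi = d / d.sum()

# Embedding X = sqrt(|d|) Z (I - pi 1^T) and its Gram matrix G = X^T X.
L = np.diag(d) - A
lam, V = np.linalg.eigh(L)
lam_plus = np.concatenate(([0.0], 1.0 / lam[1:]))
Z = np.sqrt(lam_plus)[:, None] * V.T
X = np.sqrt(d.sum()) * Z @ (np.eye(n) - np.outer(pi, np.ones(n)))
G = X.T @ X

# Theorem: H_ij = G_jj - G_ij.
H = np.outer(np.ones(n), np.diag(G)) - G

# Independent check for one target j: H_ij = 1 + sum_k P_ik H_kj for i != j, H_jj = 0.
j = 2
others = [i for i in range(n) if i != j]
M = np.eye(n - 1) - P[np.ix_(others, others)]
h = np.linalg.solve(M, np.ones(n - 1))
print(np.allclose(H[others, j], h))      # True
```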

SLIDE 73

Graph embedding and random walk

◮ Square distance to the origin:
||x_i||^2 = h_i (hitting time)
◮ Scalar product:
x_j^T(x_j − x_i) = H_ij (hitting time)
◮ Square distance between nodes i and j:
||x_i − x_j||^2 = ρ_ij (commute time)

SLIDE 74

Proof of the Theorem

Lemma
There is at most one matrix H such that LH − d1^T is diagonal and d(H) = 0

SLIDE 75

Proof of the Theorem

Theorem
H = 11^T d(G) − G

SLIDE 76

Mean return times

◮ The mean return time to node i satisfies
σ_i = 1 + ∑_j P_ij H_ji
◮ Thus the diagonal of PH + 11^T gives the mean return times

Corollary
d(PH + 11^T) = diag(π)^{−1}

SLIDE 77

Electricity

◮ Consider the electric network induced by the graph, with a resistor of conductance A_ij between nodes i and j

SLIDE 78

Electricity

◮ Consider the electric network induced by the graph, with a resistor of conductance A_ij between nodes i and j
◮ We look for the vector U of electric potentials given U_s = 1 (source) and U_t = 0 (sink)

SLIDE 79

A Dirichlet problem

◮ By Ohm's law, the current that flows from i to j is A_ij(U_i − U_j)
◮ By Kirchhoff's law, the net current at any node i ≠ s, t is null:
∑_j A_ij(U_i − U_j) = 0, that is, (LU)_i = 0
◮ The vector U is the solution to the Dirichlet problem with boundary conditions U_s = 1 and U_t = 0

SLIDE 80

Energy dissipation

◮ Energy dissipation = potential difference × current
◮ Total energy dissipation:
∑_{i<j} A_ij(U_j − U_i)^2

Thomson's principle
The potential vector U minimizes the energy dissipation.
Taking the derivative in U_i:
∑_j A_ij(U_j − U_i) = 0, that is, (LU)_i = 0, which is the Dirichlet problem

SLIDE 81

Solution to the Dirichlet problem

Proposition
The electric potential of node i is
U_i = (x_i − x_t)^T(x_s − x_t) / ||x_s − x_t||^2

SLIDE 82

Example

SLIDE 83

Effective conductance, effective resistance

◮ The current that goes from node s to node t is
|d| / ||x_s − x_t||^2 = |d| / ρ_st
◮ This is the effective conductance between s and t
◮ The effective resistance between s and t is proportional to ρ_st, the mean commute time between nodes s and t

SLIDE 84

Electricity and random walks

The vector U of electric potentials is the solution to the Dirichlet problem with U_s = 1 and U_t = 0

Interpretation of voltage
The voltage of any node is the probability that the random walk starting from this node reaches node s before node t

SLIDE 85

Electricity and random walks

The vector U of electric potentials is the solution to the Dirichlet problem with U_s = 1 and U_t = 0

Interpretation of voltage
The voltage of any node is the probability that the random walk starting from this node reaches node s before node t

Interpretation of current
The net current from node i to node j is the net frequency of particles moving from node i to node j, with a flow of particles entering the network at node s at rate |d| / ρ_st

SLIDE 86

The current as the net flow of particles

SLIDE 87

Extension

◮ A single source s, at electric potential 1
◮ Multiple sinks t_1, . . . , t_K, at electric potential 0

SLIDE 88

Solution to the Dirichlet problem

Proposition
The electric potential of node i is:
U_i = ∑_{k=1}^K α_k (x_i − x_{t_l})^T (x_s − x_{t_k})
where
◮ l is an arbitrary element of {1, . . . , K}
◮ α is the unique solution to the equation Mα = |d|1, with M the Gram matrix of the vectors (x_s − x_{t_1}, . . . , x_s − x_{t_K})

General solution to the Dirichlet problem

◮ For each s ∈ S, apply the previous result to get P^S_is ≡ U_i
◮ The potential of each node i ∉ S is U_i = ∑_{j∈S} P^S_ij U_j

SLIDE 89

Outline

1. Random walk → Statistical physics
2. Laplacian matrix → Heat equation
3. Spectral analysis → Mechanics
4. Graph embedding → Electricity
5. Applications
SLIDE 90

Graph embedding

Method

1. Check that the graph is connected
2. Form the Laplacian L = D − A
3. Compute v_1, . . . , v_k, the k eigenvectors of L associated with the lowest eigenvalues λ_1 ≤ . . . ≤ λ_k
4. Compute Z = diag(1/√λ_2, . . . , 1/√λ_k)(v_2, . . . , v_k)^T
5. Return X = √|d| Z(I − π1^T), where π = d/|d|

Observation
The dimension of the embedding must be chosen so that λ_k is large compared to λ_2
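
A sketch of the whole method as one numpy function (dense eigendecomposition, fine for small graphs; a sparse eigensolver would be preferred for large ones). Step 1 is left to the caller, e.g. with nx.is_connected if the graph comes as a networkx object:

```python
import numpy as np

def spectral_embedding(A, k):
    """Embed the nodes of a connected weighted graph (adjacency matrix A) in R^(k-1)."""
    d = A.sum(axis=1)
    n = A.shape[0]
    L = np.diag(d) - A                                  # 2. Laplacian
    lam, V = np.linalg.eigh(L)                          # 3. lowest eigenvalues first
    Z = V[:, 1:k].T / np.sqrt(lam[1:k])[:, None]        # 4. Z = diag(1/sqrt(lam)) (v_2..v_k)^T
    pi = d / d.sum()
    X = np.sqrt(d.sum()) * Z @ (np.eye(n) - np.outer(pi, np.ones(n)))   # 5.
    return X                                            # column i is the vector x_i
```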

SLIDE 91

Ranking

Centrality
◮ Output: nodes in increasing order of ||x_i||^2

Local centrality
◮ Input: node s of interest
◮ Output: nodes in increasing order of x_i^T(x_i − x_s)

Local centrality (multiple nodes)
◮ Input: nodes s_1, . . . , s_K of interest (with weights)
◮ Output: nodes in increasing order of x_i^T(x_i − x̄), with x̄ the weighted sum of x_{s_1}, . . . , x_{s_K}
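
A sketch of the first two rankings on top of the embedding X returned by spectral_embedding above (the helper names are my own, not from the slides):

```python
import numpy as np

def centrality_ranking(X):
    """Nodes sorted by increasing ||x_i||^2 (most central nodes first)."""
    return np.argsort(np.sum(X ** 2, axis=0))

def local_centrality_ranking(X, s):
    """Nodes sorted by increasing x_i^T (x_i - x_s), i.e. closest to node s first."""
    scores = np.sum(X * (X - X[:, [s]]), axis=0)
    return np.argsort(scores)
```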

SLIDE 92

Ranking with repulsive nodes

Directional centrality
◮ Input: node s of interest, repulsive node t
◮ Output: nodes in increasing order of x_i^T(x_s − x_t)

Directional centrality (multiple repulsive nodes)
◮ Input: node s of interest, repulsive nodes t_1, . . . , t_K
◮ Output: nodes in increasing order of x_i^T x̄, with
x̄ = ∑_{k=1}^K α_k(x_s − x_{t_k})
where α is the solution to Mα = 1, with M the Gram matrix of (x_s − x_{t_1}, . . . , x_s − x_{t_K})
SLIDE 93

Clustering

Partition C_1, . . . , C_K of the nodes

◮ Objective: minimize
J = ∑_k ∑_{i∈C_k} ||x_i − μ_k||^2 with μ_k = (1/|C_k|) ∑_{i∈C_k} x_i
◮ A combinatorial problem (NP-hard)

SLIDE 94

The K-means algorithm

Algorithm
Input: K, the number of clusters
Init: μ_1, . . . , μ_K arbitrarily
Repeat until convergence:
◮ for each k, C_k ← points closest to μ_k
◮ for each k, μ_k ← centroid of C_k
Output: clusters C_1, . . . , C_K

◮ Convergence in finite time
◮ Local optimum that depends on the initial values of μ_1, . . . , μ_K
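
A plain numpy sketch of this algorithm, run on the columns of the embedding X (initialization by drawing K points at random; in practice one would typically use scikit-learn's KMeans instead):

```python
import numpy as np

def kmeans(points, K, n_iter=100, seed=0):
    """K-means on an (n, dim) array of points; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=K, replace=False)]   # arbitrary init
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # assign each point to the closest centroid
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centers = np.array([points[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Usage on the graph embedding: one sample per node, i.e. the columns of X.
# labels, _ = kmeans(X.T, K=2)
```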

SLIDE 95

Back to random walks

Observing that
J = ∑_k (1/(2|C_k|)) ∑_{i,j∈C_k} ||x_i − x_j||^2
the cost function J is, up to a factor n/2:

◮ the mean square distance of a random point to another random point of the same cluster
◮ the mean commute time of the random walk between a random node and another node taken uniformly at random in the same cluster

SLIDE 96

Modularity

◮ Given some clustering C, let
Q = ∑_{i,j} π_i(P_ij − π_j) δ^C_ij
where δ^C_ij = 1 if i, j are in the same cluster, 0 otherwise
SLIDE 97

Modularity

◮ Given some clustering C, let
Q = ∑_{i,j} π_i(P_ij − π_j) δ^C_ij
where δ^C_ij = 1 if i, j are in the same cluster, 0 otherwise

◮ Then Q is the difference between the probabilities that
(1) two successive nodes of the random walk are in the same cluster
(2) two independent random walks are in the same cluster

◮ Maximizing Q is NP-hard
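
A direct numpy sketch of this definition of Q (labels is an assumed array giving the cluster index of each node, e.g. the output of the K-means sketch above):

```python
import numpy as np

def modularity(A, labels):
    """Q = sum_ij pi_i (P_ij - pi_j) delta^C_ij for the clustering given by labels."""
    d = A.sum(axis=1)
    pi = d / d.sum()
    P = A / d[:, None]
    delta = labels[:, None] == labels[None, :]      # same-cluster indicator
    return float(np.sum(pi[:, None] * (P - pi[None, :]) * delta))
```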

SLIDE 98

The Louvain algorithm

Algorithm
Init: each node in its own cluster
Repeat until convergence:
◮ while Q increases, change the cluster of any node to that of one of its neighbors
◮ aggregate all nodes belonging to the same cluster into a single node
Output: clusters

◮ Convergence in finite time
◮ Local optimum that depends on the order in which nodes are considered
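
In the lab, a ready-made implementation can be used rather than coding the heuristic by hand; for instance, recent versions of networkx (≥ 2.8, an assumption about the installed version) ship louvain_communities, shown here on the character graph that appears on the last slide:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# The weighted Les Misérables character co-occurrence graph.
G = nx.les_miserables_graph()
communities = louvain_communities(G, weight="weight", seed=0)
print(len(communities), "communities")
```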

SLIDE 99

Summary

◮ Random walks in graphs provide efficient techniques for ranking and clustering nodes
◮ In the lab session, you will learn to apply these techniques to real graphs using the Python networkx package

[Figure: the Les Misérables character graph (node labels: Myriel, Valjean, Javert, Fantine, Cosette, Marius, Gavroche, ...)]