

SLIDE 1

Clustering in Popularity Adjusted Stochastic Block Model

Majid Noroozi and Marianna Pensky

Department of Mathematics University of Central Florida

Majid Noroozi (UCF) Clustering in PABM 1 / 25


SLIDE 3

Introduction

Clustering is a central problem in machine learning and data mining. A vast number of data sets can be represented as networks of interacting items, and one of the first features of interest in such networks is which items are alike; clustering is used in particular to discover that.

Source: Abbe, "Community detection and stochastic block models: recent developments."


SLIDE 4

Introduction

Network               Node                 Link              Network type
Citation network      Papers               Citations         Directed
Email network         Email addresses      Emails            Directed
Social network        Users                Interactions      Undirected
Coauthorship network  Research scientists  Coauthor a paper  Undirected

Table: Some examples of real networks


SLIDE 5

Random Graph Models


SLIDE 6

Random Graph Models

Let A ∈ {0, 1}^{n×n} be the symmetric adjacency matrix of the network, with A_{i,j} = 1 if there is a connection between nodes i and j, and A_{i,j} = 0 otherwise.

Assume that A_{i,j} ∼ Bernoulli(P_{i,j}), 1 ≤ i ≤ j ≤ n, where the A_{i,j} are conditionally independent given P_{i,j}, and A_{i,j} = A_{j,i}, P_{i,j} = P_{j,i} for i > j.
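As a quick illustration (not from the slides), this edge-sampling model is easy to simulate: draw the upper triangle from independent Bernoulli variables and symmetrize. The helper name `sample_adjacency` is my own.

```python
import numpy as np

def sample_adjacency(P, rng=None):
    """Draw a symmetric adjacency matrix with A_ij ~ Bernoulli(P_ij), A_ii = 0."""
    rng = np.random.default_rng(rng)
    n = P.shape[0]
    U = rng.random((n, n))
    A = np.triu((U < P).astype(int), k=1)  # sample the upper triangle only
    return A + A.T                         # symmetrize: A_ij = A_ji

P = np.full((5, 5), 0.5)                   # a toy probability matrix
A = sample_adjacency(P, rng=0)
assert (A == A.T).all() and (np.diag(A) == 0).all()
```

Sampling only the upper triangle and reflecting it enforces the constraint A_{i,j} = A_{j,i} by construction.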


SLIDE 7

Community structure in networks

The block models assume that each node i belongs to one of K distinct blocks or communities N_k, k = 1, …, K. The community assignment is described by a function c : {1, …, n} → {1, …, K}, where c(i) = k if i ∈ N_k. Alternatively, one considers a corresponding membership (or clustering) matrix Z ∈ {0, 1}^{n×K} such that Z_{i,k} = 1 iff i ∈ N_k, i = 1, …, n, k = 1, …, K. The popularity of a node in a community is defined as the number of edges between that node and that community.
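The correspondence between the label function c and the clustering matrix Z can be sketched in a few lines (a minimal illustration with 0-based labels; the helper name is my own):

```python
import numpy as np

def membership_matrix(c, K):
    """Clustering matrix Z in {0,1}^{n x K} with Z[i, k] = 1 iff c(i) = k."""
    n = len(c)
    Z = np.zeros((n, K), dtype=int)
    Z[np.arange(n), c] = 1
    return Z

c = np.array([0, 1, 0, 1, 1])       # community labels (0-based here)
Z = membership_matrix(c, 2)
assert (Z.sum(axis=1) == 1).all()   # each node sits in exactly one community
assert Z.sum(axis=0).tolist() == [2, 3]   # column sums give community sizes n_k
```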



SLIDE 9

Stochastic Block Model (SBM)

A classical random graph model for networks with community structure is the Stochastic Block Model (SBM). Under this model, the probability of connection between two nodes is completely defined by the communities to which they belong:

$$
P(Z, K) =
\begin{pmatrix}
b_{1,1}\,\mathbf{1}_{n_1}\mathbf{1}_{n_1}^T & b_{1,2}\,\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T & \cdots & b_{1,K}\,\mathbf{1}_{n_1}\mathbf{1}_{n_K}^T \\
b_{2,1}\,\mathbf{1}_{n_2}\mathbf{1}_{n_1}^T & b_{2,2}\,\mathbf{1}_{n_2}\mathbf{1}_{n_2}^T & \cdots & b_{2,K}\,\mathbf{1}_{n_2}\mathbf{1}_{n_K}^T \\
\vdots & \vdots & \ddots & \vdots \\
b_{K,1}\,\mathbf{1}_{n_K}\mathbf{1}_{n_1}^T & b_{K,2}\,\mathbf{1}_{n_K}\mathbf{1}_{n_2}^T & \cdots & b_{K,K}\,\mathbf{1}_{n_K}\mathbf{1}_{n_K}^T
\end{pmatrix}
$$

where P(Z, K) is a rearranged version of the matrix P in which the first n_1 rows correspond to nodes from class 1, the next n_2 rows correspond to nodes from class 2, and the last n_K rows correspond to nodes from class K.
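In this notation the whole matrix factors as P(Z, K) = Z B Z^T, where B is the K×K matrix of block probabilities b_{k,l}. A small numerical sketch (the b values are made up for illustration):

```python
import numpy as np

# B: K x K matrix of within/between-community connection probabilities b_{k,l}
B = np.array([[0.5, 0.1],
              [0.1, 0.4]])
# Z: n x K membership matrix, nodes already sorted by community
Z = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])

P = Z @ B @ Z.T   # block structure: P_ij = b_{c(i), c(j)}
assert P[0, 1] == 0.5   # both nodes in community 1
assert P[0, 3] == 0.1   # across communities
assert P[3, 4] == 0.4   # both nodes in community 2
```

Because Z Z^T only sees community labels, every pair of nodes in the same block pair gets the same probability, which is exactly the SBM restriction the next slides relax.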



SLIDE 12

Degree Corrected Block Model (DCBM)

Since real-life networks usually contain a very small number of high-degree nodes while the rest of the nodes have very few connections (low degree), the SBM fails to explain the structure of many networks that occur in practice. The Degree-Corrected Block Model (DCBM) addresses this deficiency by allowing the block probabilities to be multiplied by node-dependent weights:

$$
P(Z, K) =
\begin{pmatrix}
b_{1,1}\,\theta_{(1)}\theta_{(1)}^T & b_{1,2}\,\theta_{(1)}\theta_{(2)}^T & \cdots & b_{1,K}\,\theta_{(1)}\theta_{(K)}^T \\
b_{2,1}\,\theta_{(2)}\theta_{(1)}^T & b_{2,2}\,\theta_{(2)}\theta_{(2)}^T & \cdots & b_{2,K}\,\theta_{(2)}\theta_{(K)}^T \\
\vdots & \vdots & \ddots & \vdots \\
b_{K,1}\,\theta_{(K)}\theta_{(1)}^T & b_{K,2}\,\theta_{(K)}\theta_{(2)}^T & \cdots & b_{K,K}\,\theta_{(K)}\theta_{(K)}^T
\end{pmatrix}
$$

where θ_{(k)} ∈ R^{n_k} is the vector of weights of the nodes in community k.
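Equivalently, P = diag(θ) Z B Z^T diag(θ), so each SBM entry is scaled by the weights of its two endpoints. A sketch with invented weights:

```python
import numpy as np

B = np.array([[0.5, 0.1],
              [0.1, 0.4]])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
theta = np.array([1.0, 0.5, 1.0, 0.2])   # node-dependent popularity weights

# DCBM: P_ij = theta_i * theta_j * b_{c(i), c(j)}
P = np.diag(theta) @ Z @ B @ Z.T @ np.diag(theta)
assert np.isclose(P[0, 1], 1.0 * 0.5 * 0.5)   # same community, weight-scaled
assert np.isclose(P[1, 3], 0.5 * 0.2 * 0.1)   # across communities
```

Note the limitation the next slide points out: a node's single weight θ_i scales its probabilities toward every community at once.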



SLIDE 14

Degree Corrected Block Model (DCBM)

The DCBM allows a more flexible degree distribution in which nodes can have different expected degrees; popular nodes have a higher value of θ. However, the DCBM still fails to model node popularities in a flexible and realistic way: of two nodes in the same community, the one with the higher θ must be uniformly more popular in all communities.



SLIDE 17

Popularity Adjusted Block Model (PABM)

The Popularity Adjusted Stochastic Block Model (PABM), introduced by Sengupta and Chen (2018), generalizes the SBM and its more general form, the DCBM. For a K-block network, let Λ ∈ R^{n×K} be the matrix of popularity scaling parameters. Then, for i < j,

$$
P_{i,j} = \lambda_{i,c(j)}\,\lambda_{j,c(i)}, \qquad 0 \le P_{i,j} \le 1 \ \text{ for all } i < j.
$$

$$
P(Z, K) =
\begin{pmatrix}
\Lambda_{(1,1)}\Lambda_{(1,1)}^T & \Lambda_{(1,2)}\Lambda_{(2,1)}^T & \cdots & \Lambda_{(1,K)}\Lambda_{(K,1)}^T \\
\Lambda_{(2,1)}\Lambda_{(1,2)}^T & \Lambda_{(2,2)}\Lambda_{(2,2)}^T & \cdots & \Lambda_{(2,K)}\Lambda_{(K,2)}^T \\
\vdots & \vdots & \ddots & \vdots \\
\Lambda_{(K,1)}\Lambda_{(1,K)}^T & \Lambda_{(K,2)}\Lambda_{(2,K)}^T & \cdots & \Lambda_{(K,K)}\Lambda_{(K,K)}^T
\end{pmatrix}
\quad \text{where} \quad
\Lambda =
\begin{pmatrix}
\Lambda_{(1,1)} & \Lambda_{(1,2)} & \cdots & \Lambda_{(1,K)} \\
\Lambda_{(2,1)} & \Lambda_{(2,2)} & \cdots & \Lambda_{(2,K)} \\
\vdots & \vdots & \ddots & \vdots \\
\Lambda_{(K,1)} & \Lambda_{(K,2)} & \cdots & \Lambda_{(K,K)}
\end{pmatrix}
$$

Here Λ_{(k,l)} ∈ R^{n_k} is the vector of popularity parameters of the nodes in community k with respect to community l.
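The entrywise definition P_ij = λ_{i,c(j)} λ_{j,c(i)} can be checked directly (a toy sketch; the Λ values are made up):

```python
import numpy as np

# Lam: n x K matrix of popularity scaling parameters; c: 0-based community labels
Lam = np.array([[0.9, 0.2],
                [0.8, 0.7],
                [0.1, 0.6],
                [0.3, 0.9]])
c = np.array([0, 0, 1, 1])
n = len(c)

# PABM: P_ij = lambda_{i, c(j)} * lambda_{j, c(i)}
P = np.array([[Lam[i, c[j]] * Lam[j, c[i]] for j in range(n)] for i in range(n)])
assert np.allclose(P, P.T)                        # symmetric by construction
assert np.isclose(P[0, 2], Lam[0, 1] * Lam[2, 0])  # = 0.2 * 0.1
```

Unlike the DCBM, node i now carries a separate popularity parameter toward each community, which is what makes each block of P(Z, K) rank one.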


SLIDE 18

Understanding the PABM

Figure: Left panel: matrix Λ; Λ_{(1,1)} (red), Λ_{(2,1)} (blue), Λ_{(1,2)} (yellow), Λ_{(2,2)} (purple). Right panel: assembling the re-organized probability matrix P(Z, K); P_{(1,1)}(Z, K) (red), P_{(2,1)}(Z, K) (green), P_{(2,2)}(Z, K) (purple).


SLIDE 19

Understanding the PABM

Figure: Left panel: re-organized probability matrix P(Z, 2). Right panel: probability matrix P; community 1: nodes 1, 3, 4; community 2: nodes 2 and 5.


SLIDE 20

Subspace Clustering

Subspace clustering is designed to separate points that lie in a union of subspaces. The matrix P consists of K clusters of columns (rows) that lie in a union of K distinct subspaces, each of dimension K.



SLIDE 24

Sparse Subspace Clustering (SSC)

If the matrix P were known, the coefficient matrix W of the SSC would be based on writing every data point as a sparse linear combination of all other points, by minimizing the number of nonzero coefficients:

$$
\min_{W_j} \|W_j\|_0 \quad \text{s.t.} \quad (P)_j = \sum_{k \ne j} W_{k,j}\,(P)_k
$$

The sparse coefficients of the contaminated data are found as a solution of

$$
\min_{W_j} \left\{ \|W_j\|_0 + \gamma\,\|A_j - A W_j\|_2^2 \right\} \quad \text{s.t.} \quad W_{j,j} = 0, \ \ j = 1, \ldots, n
$$

The problem above can be rewritten in the equivalent form

$$
\min_{W_j} \|A_j - A W_j\|_2^2 \quad \text{s.t.} \quad \|W_j\|_0 \le L, \ \ W_{j,j} = 0, \ \ j = 1, \ldots, n \qquad (1)
$$

where L is a parameter for the maximum number of nonzero elements in each column of W.

Given W, the affinity matrix is defined as |W| + |W^T|. The class assignment (clustering matrix) Z is then obtained by applying spectral clustering to |W| + |W^T|.



SLIDE 26

Simulation on synthetic networks

We generate networks with a symmetric probability matrix P given by the PABM with a clustering matrix Z and a block matrix Λ. We assume that the number of communities (clusters) K is known and consider a perfectly balanced model with n/K nodes in each cluster. The off-diagonal blocks of Λ are multiplied by ω, 0 < ω < 1, to ensure that most nodes have a larger probability of interaction within their own community.


SLIDE 27

Simulation on synthetic networks

Estimating P:

1. Evaluate the matrix W using (1).
2. Obtain the clustering matrix Ẑ by applying spectral clustering to |W| + |W^T|.
3. Given Ẑ, generate the rearranged matrix A(Ẑ) = P_Ẑ^T A P_Ẑ, where P_Ẑ is the permutation matrix corresponding to Ẑ, with blocks A^{(k,l)}(Ẑ), k, l = 1, …, K.
4. Obtain Θ̂^{(k,l)}(Ẑ, K̂) by using the rank-one approximation of each of the blocks.
5. Finally, estimate P by P̂ = P̂(Ẑ, K̂) using the formula P̂(Ẑ, K̂) = P_{Ẑ,K̂} Θ̂(Ẑ, K̂) P_{Ẑ,K̂}^T, with K̂ = K.
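The rank-one approximation in step 4 is the best Frobenius-norm rank-one fit of each block, given by the leading singular triple (an illustrative helper, not the authors' code):

```python
import numpy as np

def rank_one_block(block):
    """Best rank-one approximation (in Frobenius norm) of one block A^(k,l),
    obtained from the leading singular value/vectors."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0])

M = np.outer([1., 2.], [3., 1., 0.])       # an exactly rank-one block
assert np.allclose(rank_one_block(M), M)   # recovered exactly
```

This step is what exploits the PABM structure: under the model, each block of the rearranged P(Z, K) is exactly rank one.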


SLIDE 28

Simulation on synthetic networks

The average estimation error R_F(P̂, P) and the average clustering error ERR(Ẑ, Z) are evaluated as

$$
R_F(\hat P, P) = \frac{1}{n^2}\,\|\hat P - P\|_F^2,
\qquad
\mathrm{ERR}(\hat Z, Z) = \min_{\Pi}\ \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\big(\Pi(\hat Z)_i \ne Z_i\big)
$$

where P is the true probability matrix, Z, Ẑ ∈ {1, 2, …, K}^n are the true and the estimated clustering assignments, and the minimum is taken with respect to all label permutations Π : {1, …, K} → {1, …, K} applied elementwise.
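The permutation minimization in ERR can be computed by brute force for small K (a sketch; for large K one would use the Hungarian algorithm instead):

```python
import numpy as np
from itertools import permutations

def clustering_error(z_hat, z, K):
    """Fraction of misclustered nodes, minimized over label permutations."""
    best = 1.0
    for perm in permutations(range(K)):
        relabeled = np.array([perm[k] for k in z_hat])
        best = min(best, float(np.mean(relabeled != np.asarray(z))))
    return best

z_true = [0, 0, 1, 1, 1]
z_est  = [1, 1, 0, 0, 0]   # the same partition, labels swapped
assert clustering_error(z_est, z_true, K=2) == 0.0
```

Without the minimum over Π, a perfect partition with swapped labels would incorrectly score 100% error.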


SLIDE 29

Simulation on synthetic networks

[Plots: clustering error (top) and Frobenius estimation error (bottom) versus n = 300–500, for SSC and SC with ω = 0.5, 0.7, 0.9]

Figure: The clustering (top) and the estimation (bottom) errors for K = 4 clusters.


SLIDE 30

Simulation on synthetic networks

[Plots: clustering error (top) and Frobenius estimation error (bottom) versus n = 300–500, for SSC and SC with ω = 0.5, 0.7, 0.9]

Figure: The clustering (top) and the estimation (bottom) errors for K = 5 clusters.


SLIDE 31

Simulation on synthetic networks

[Spy plots of three adjacency matrices; nz = 17752, 27408, 36716]

Figure: Adjacency matrices of networks with n = 420, K = 4, and ω = 0.5, 0.7, 0.9.


SLIDE 32

Simulation on synthetic networks

Using P̂, the estimator K̂ of K can be found by solving the following optimization problem:

$$
\hat K = \arg\min_{K} \left\{ \|\hat P - A\|_F^2 + \mathrm{Pen}(n, K) \right\},
\qquad
\mathrm{Pen}(n, K) = \rho(A)\, n K \ln n \,(\ln K)^3
$$

where ρ(A) is the density of the matrix A, i.e., the proportion of nonzero entries of A.
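The selection rule can be sketched as a loop over candidate K, with the PABM fitting step represented by a placeholder `estimate_P(A, K)` (a hypothetical callable standing in for the estimation procedure of the previous slides):

```python
import numpy as np

def penalty(A, K):
    """Pen(n, K) = rho(A) * n * K * ln(n) * (ln K)^3, rho(A) = edge density."""
    n = A.shape[0]
    rho = np.count_nonzero(A) / A.size
    return rho * n * K * np.log(n) * np.log(K) ** 3

def select_K(A, estimate_P, K_max):
    """K_hat = argmin_K ||P_hat(K) - A||_F^2 + Pen(n, K)."""
    scores = {K: np.linalg.norm(estimate_P(A, K) - A, "fro") ** 2 + penalty(A, K)
              for K in range(1, K_max + 1)}
    return min(scores, key=scores.get)

# toy check: an oracle estimator that fits perfectly only at K = 2
A = np.ones((10, 10)) - np.eye(10)
oracle = lambda A, K: A if K == 2 else np.zeros_like(A)
assert select_K(A, oracle, K_max=3) == 2
```

The penalty grows with K, so K̂ trades off fit against model complexity rather than always picking the largest K.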


SLIDE 33

Simulation on synthetic networks

                    n = 420                       n = 540
K    K̂      ω = 0.5  ω = 0.7  ω = 0.9    ω = 0.5  ω = 0.7  ω = 0.9
     2          –        –        –          –        –        –
     3        0.06     0.14       –        0.04       –        –
4    4        0.64     0.66     0.96       0.80     0.76     0.96
     5        0.28     0.16     0.04       0.12     0.22     0.04
     6        0.02     0.04       –        0.04     0.02       –

Table: Probability of finding the true number of clusters.


SLIDE 34

Simulation on synthetic networks

                    n = 420                       n = 540
K    K̂      ω = 0.5  ω = 0.7  ω = 0.9    ω = 0.5  ω = 0.7  ω = 0.9
     2        0.02       –        –          –        –        –
     3          –      0.02     0.02         –      0.04       –
5    4        0.14     0.16     0.04       0.12     0.10       –
     5        0.64     0.66     0.82       0.76     0.74     0.96
     6        0.20     0.16     0.12       0.12     0.12     0.04

Table: Probability of finding the true number of clusters.



SLIDE 36

A real network example

Figure: A butterfly similarity network extracted from the Leeds Butterfly dataset described in Wang et al. (2018).

The SSC clusters the nodes with 89% accuracy, while the SC is correct in only 64% of the cases.


SLIDE 37

References

• M. Noroozi, R. Rimal, M. Pensky. Estimation and Clustering in Popularity Adjusted Stochastic Block Model. arXiv:1902.00431, 2019.

• S. Sengupta and Y. Chen. A block model for node popularity in networks with community structure. Journal of the Royal Statistical Society, 2018.

• R. Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(3):52–68, March 2011.

• B. Wang, A. Pourshafeie, M. Zitnik, J. Zhu, C. D. Bustamante, S. Batzoglou, J. Leskovec. Network enhancement as a general method to denoise weighted biological networks. Nature Communications, 2018.
