Mining Attributed Networks Part 1 Introduction Rushed Kanawati, - - PDF document




slide-1
SLIDE 1

Overview Complex Network Analysis Outlook

Mining Attributed Networks

Part 1 – Introduction Rushed Kanawati, Martin Atzmueller

A3, Université Sorbonne Paris Cité, France; CSAI, Tilburg University, Netherlands. DSAA'17, Tokyo, 20 October 2017

1 / 45

slide-2
SLIDE 2

Overview Complex Network Analysis Outlook

OUTLINE

1. Overview: Complex Networks
2. Complex Network Analysis
   2.1 Centralities, Roles & Similarities
   2.2 Community Detection
   2.3 Link Prediction
3. Outlook: Alternative Network Models

2 / 45

slide-3
SLIDE 3

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS

Definition: graphs abstracting, directly or indirectly, interactions in real-world systems.

Basic topological features
- Low density
- Small diameter
- Scale-free
- High clustering coefficient

3 / 45

slide-4
SLIDE 4

Overview Complex Network Analysis Outlook

SCALE-FREE

4 / 45

slide-5
SLIDE 5

Overview Complex Network Analysis Outlook

CLUSTERING COEFFICIENT

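Definitions
- Global version: $CC = \frac{3 \times \#\text{triangles}}{\#\text{connected triples}}$
- Local version on node $v$: $CC_v = \frac{\#\text{ links between neighbors of } v}{\#\text{ potential links between neighbors of } v}$

Example (figure): $CC = \frac{3 \times 1}{8}$, $CC_{v=1} = \frac{1}{6}$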
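The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets; the function names and the toy graph are hypothetical and are not taken from the slides.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Share of pairs of neighbors of v that are themselves linked."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen here: undefined cases are reported as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def global_clustering(adj):
    """3 * (#triangles) / (#connected triples)."""
    closed = 0   # triples centred on a node whose end points are linked (= 3 * #triangles)
    triples = 0  # all triples centred on some node
    for v, neighbors in adj.items():
        k = len(neighbors)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return closed / triples if triples else 0.0

# Toy graph (assumption): a triangle 1-2-3 plus a pendant node 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(local_clustering(toy, 3), global_clustering(toy))
```

In the toy graph, node 3 has three neighbors of which only one pair is linked, and the graph has one triangle among five connected triples.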

5 / 45

slide-6
SLIDE 6

Overview Complex Network Analysis Outlook

NOTATION

A graph $G = \langle V, E \subseteq V \times V \rangle$
- $V$: set of nodes (a.k.a. vertices, actors, sites)
- $E$: set of edges (a.k.a. ties, links, bonds)

Notations
- $A_G$: adjacency matrix, with $a_{ij} \neq 0$ iff $(v_i, v_j) \in E$, and $a_{ij} = 0$ otherwise
- $n = |V|$
- $m = |E|$; often $m \sim n$
- $\Gamma(v)$: neighbors of node $v$, $\Gamma(v) = \{x \in V : (x, v) \in E\}$
- Node degree: $d(v) = \|\Gamma(v)\|$

6 / 45

slide-7
SLIDE 7

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS : EXAMPLE I

Direct interactions

Density : 2−3 Diameter : 28 Clustering coeff.: 0.47

Greatest connected component of Wikipedia 7 / 45

slide-8
SLIDE 8

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS : EXAMPLE II

Indirect interactions

Density: $10^{-4}$; Diameter: 24; Clustering coeff.: 0.67

Co-authorship network - DBLP 1980-1984 (for authors active for more than 10 years) 8 / 45

slide-9
SLIDE 9

Overview Complex Network Analysis Outlook

SIMILARITY GRAPHS

- ε-neighborhood graph: $u, v$ are linked iff $d(u, v) \leq \epsilon$
- k-nearest neighbor graph: each node is connected to its $k$ nearest nodes
- Relative neighborhood graph: $u, v$ are linked iff $d(u, v) \leq \max\{d(v, x), d(u, x)\}$ for all $x \neq u, v$
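As a rough illustration of the three constructions, the sketch below builds each graph from a list of points. It assumes Euclidean distance and undirected edges stored as index pairs; the function names and sample points are hypothetical.

```python
import math

def eps_graph(points, eps):
    """Link u and v iff d(u, v) <= eps."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= eps}

def knn_graph(points, k):
    """Link each node to its k nearest nodes (edges stored undirected)."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def rng_graph(points):
    """Link u and v iff d(u, v) <= max(d(u, x), d(v, x)) for every other point x."""
    n = len(points)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(points[i], points[j])
            if all(d_ij <= max(math.dist(points[i], points[x]),
                               math.dist(points[j], points[x]))
                   for x in range(n) if x not in (i, j)):
                edges.add((i, j))
    return edges

# Sample points (assumption): three close points and one far away.
pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(eps_graph(pts, 1.0), knn_graph(pts, 1), rng_graph(pts))
```

The ε-graph can leave distant points isolated, whereas the k-nearest-neighbor graph always gives every node at least k links, which is one reason the three constructions produce different topologies on the same data.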

9 / 45

slide-10
SLIDE 10

Overview Complex Network Analysis Outlook

SIMILARITY GRAPHS

Figure:RNG graph Figure: ✏-threshold graph Figure:Knn graph

10 / 45

slide-11
SLIDE 11

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Erdős-Rényi model
Rule: for n nodes, generate edges (i, j) randomly and independently with probability p
- Low density
- Small diameter
- Clustering coefficient ∼ p
- Not scale-free
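The generation rule, and the claim that the clustering coefficient is of the order of p, can be checked empirically. The sketch below is a plain-Python illustration under stated assumptions: the graph size, the value of p and the seed are arbitrary choices, and the helper names are hypothetical.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """G(n, p): each of the n(n-1)/2 possible edges is kept independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def density(adj):
    """Number of edges divided by the number of possible edges."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return m / (n * (n - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients (nodes of degree < 2 count as 0)."""
    total = 0.0
    for v, nb in adj.items():
        k = len(nb)
        if k >= 2:
            links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

g = erdos_renyi(200, 0.1)
print(density(g), average_clustering(g))  # both values should be close to p
```

Because two neighbors of a node are linked with the same probability p as any other pair, the clustering coefficient does not exceed the density, unlike in the real-world examples shown earlier.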

11 / 45

slide-12
SLIDE 12

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Small-world graphs
- Low density
- Small diameter
- High clustering coefficient
- Not scale-free

12 / 45

slide-13
SLIDE 13

Overview Complex Network Analysis Outlook

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Preferential attachment
Rule: new nodes prefer to connect to high-degree nodes
- Low density
- Small diameter
- Clustering coefficient → 0
- Scale-free

13 / 45

slide-14
SLIDE 14

Overview Complex Network Analysis Outlook

COMPLEX NETWORK ANALYSIS

A lot of interesting results using a simple model: interacting nodes
- Node-related analysis tasks: centralities, roles, similarities
- Community detection: local, disjoint, overlapping, hierarchical
- Network evolution mining: link prediction

14 / 45

slide-15
SLIDE 15

Overview Complex Network Analysis Outlook

CENTRALITIES

Nodes centralities

Degree centrality example

- Importance and/or influence of a node
- PageRank, HITS: web search result ranking
- FolkRank: tag recommendation
- Betweenness, closeness: importance of actors in a social network
- Viral marketing

15 / 45

slide-16
SLIDE 16

Overview Complex Network Analysis Outlook

ROLES

Node Roles

Red nodes are star centers, blue nodes are in cliques, and the others are peripheral.

- Nodes having a similar structural behavior
- Centers of stars, in cliques, peripheral nodes, ...
- Role query, role outliers, role dynamics, role transfer, ...
- See Henderson et al., KDD 2012

16 / 45

slide-17
SLIDE 17

Overview Complex Network Analysis Outlook

NODE SIMILARITY

- A node is similar to itself
- Neighborhood-based similarity: two nodes are similar if they are connected to the same nodes
- Distance-based similarity: two nodes are similar if they are connected by short paths
- Structural similarity: two nodes are similar if each one is similar to the neighbors of the other
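Two of these notions are easy to make concrete. The sketch below uses the Jaccard index as one possible neighborhood-based similarity and 1 / (1 + shortest-path length) as one possible distance-based similarity; both choices, the function names and the toy graph are assumptions made for illustration.

```python
from collections import deque

def jaccard_similarity(adj, u, v):
    """Neighborhood-based: shared neighbors divided by all neighbors of u or v."""
    if u == v:
        return 1.0  # a node is similar to itself
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def distance_similarity(adj, u, v):
    """Distance-based: 1 / (1 + shortest-path length), 0 if v is unreachable from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return 1.0 / (1 + dist[x])
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return 0.0

# Toy graph (assumption): a and b are not linked but share both neighbors c and d.
toy = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
print(jaccard_similarity(toy, "a", "b"), distance_similarity(toy, "a", "b"))
```

Here a and b are maximally similar in the neighborhood sense although they are two hops apart, which shows that the notions listed above can rank the same pair quite differently.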

17 / 45

slide-18
SLIDE 18

Overview Complex Network Analysis Outlook

COMMUNITY DETECTION

Overview
- Given a network/graph, find "modules"
  - Single network
  - Multiplex networks
  - Attributed networks
- Community structures
  - Graph clustering/disjoint communities
  - Hierarchical organization
  - Overlapping communities
- Questions:
  - What is "a community"?
  - What are "good" communities?
  - How do we evaluate these?

18 / 45

slide-19
SLIDE 19

Overview Complex Network Analysis Outlook

COMMUNITY DETECTION

Definitions
- A dense subgraph loosely coupled to other modules in the network
- A set of nodes seen as "one" by nodes outside of the community
- A subgraph where almost all nodes are linked to other nodes in the community
- ...

19 / 45

slide-20
SLIDE 20

Overview Complex Network Analysis Outlook

COMMUNITY DETECTION

Applications I Finding similar items (documents, users, queries, etc.) I Collaborative filtering I Network visualisation I Computation distribution I . . .

20 / 45

slide-21
SLIDE 21

Overview Complex Network Analysis Outlook

LOCAL COMMUNITY

21 / 45

slide-22
SLIDE 22

Overview Complex Network Analysis Outlook

LOCAL COMMUNITY

22 / 45

slide-23
SLIDE 23

Overview Complex Network Analysis Outlook

QUALITY FUNCTIONS : EXAMPLES I

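Local modularity R [Cla05]: $R = \frac{B_{in}}{B_{in} + B_{out}}$

Local modularity M [LWP08]: $M = \frac{D_{in}}{D_{out}}$

Local modularity L [CZG09]: $L = \frac{L_{in}}{L_{ex}}$, where $L_{in} = \frac{\sum_{i \in D} \|\Gamma(i) \cap D\|}{\|D\|}$ and $L_{ex} = \frac{\sum_{i \in B} \|\Gamma(i) \cap S\|}{\|B\|}$

And many others [YL12].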

23 / 45

slide-24
SLIDE 24

Overview Complex Network Analysis Outlook

GLOBAL COMMUNITY DETECTION

- Communities can also be defined with respect to the whole graph
- A graph has community structure if its structure is different from that of a random graph
- Random graph: not expected to have community structure
  - Here: any two vertices have the same probability of being adjacent
- Define a null model; use it to investigate whether community structure can be observed in a graph

24 / 45

slide-25
SLIDE 25

Overview Complex Network Analysis Outlook

GLOBAL COMMUNITY DETECTION

Problem
- Divide the set of nodes into a number of (overlapping) subsets such that the induced subgraphs are dense and loosely coupled.

Recommended readings
- S. Fortunato. Community detection in graphs. Physics Reports, 2010, 486, 75-174
- L. Tang, H. Liu. Community Detection and Mining in Social Media, Morgan & Claypool Publishers, 2010

25 / 45

slide-26
SLIDE 26

Overview Complex Network Analysis Outlook

COMMUNITIES DETECTION: METHODS

Classification
- Group-based: nodes are grouped according to shared topological features (e.g. clique)
- Network-based: clustering, graph-cut, block-models, modularity optimization
- Propagation-based: unstable approaches, good when used with ensemble clustering
- Seed-centric: from local community identification to global community detection
- Descriptive community detection: identifies communities and their descriptions (for attributed networks)

26 / 45

slide-27
SLIDE 27

Overview Complex Network Analysis Outlook

GROUP-BASED APPROACHES

Principle Search for special (dense) subgraphs: I k-clique I n-clique I -dense clique I K-core

27 / 45

slide-28
SLIDE 28

Overview Complex Network Analysis Outlook

GROUP-BASED APPROACHES

from Symeon Papadopoulos, Community Detection in Social Media, CERTH-ITI, 22 June 2011

28 / 45

slide-29
SLIDE 29

Overview Complex Network Analysis Outlook

QUASI-CLIQUE

Search for special (dense) subgraphs:
- Generalize the clique to a dense subgraph
- Different definitions (degree, density)
- A subset of $n$ nodes is a γ-quasi-clique ($0 < \gamma \leq 1$) if:
  - Nodal degree: every node in the induced subgraph is adjacent to at least $\gamma (n - 1)$ other nodes in the subgraph
  - Edge density: the number of edges in the subgraph is at least $\gamma \, n(n - 1)/2$

29 / 45

slide-30
SLIDE 30

Overview Complex Network Analysis Outlook

NETWORK-BASED APPROACHES

Clustering approaches I Apply classical clustering approaches using graph-based distance function I Different types of Graph-based distances: neighborhood-based, path-based (Random-walk) I Usually requires the number of clusters to discover

30 / 45

slide-31
SLIDE 31

Overview Complex Network Analysis Outlook

MODULARITY OPTIMIZATION APPROACHES

Modularity: a partition quality criterion

$$Q(P) = \frac{1}{2m} \sum_{c \in P} \sum_{i,j \in c} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \quad (1)$$

Figure: Example : Q = 0.31
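Equation (1) translates almost literally into code. The sketch below is an unoptimized illustration, assuming an undirected, unweighted graph stored as a dictionary of neighbor sets; the toy graph (two triangles joined by one edge) is an assumption and is not the graph of the figure.

```python
def modularity(adj, partition):
    """Q(P) = 1/(2m) * sum over communities c and ordered pairs (i, j) in c of (A_ij - d_i d_j / 2m)."""
    m = sum(len(nb) for nb in adj.values()) / 2
    degree = {v: len(nb) for v, nb in adj.items()}
    q = 0.0
    for community in partition:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)

# Toy graph (assumption): triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(toy, [{0, 1, 2}, {3, 4, 5}]))        # the two triangles as communities
print(modularity(toy, [{0, 1, 2, 3, 4, 5}]))          # everything in one community
```

Putting all nodes in a single community yields a modularity of zero, since the observed and expected edge counts then coincide; the two-triangle partition scores 5/14.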

31 / 45

slide-32
SLIDE 32

Overview Complex Network Analysis Outlook

MODULARITY OPTIMIZATION APPROACHES

I Applying classical optimization algorithms (ex. Genetic algorithms [Piz12]). I Applying hierarchical clustering and select the level with Qmax (ex. Walktrap [PL06]) I Divisive approach : Girvan-Newman algorithm [GN02] I Greedy optimization : Louvain algorithm [BGL08] I . . .

32 / 45

slide-33
SLIDE 33

Overview Complex Network Analysis Outlook

MODULARITY OPTIMIZATION LIMITATIONS

Hypotheses
1. The best partition of a graph is the one that maximizes the modularity.
2. If a network has a community structure, then it is possible to find a precise partition with maximal modularity.
3. If a network has a community structure, then partitions inducing high modularity values are structurally similar.

None of the three hypotheses holds [GdMC10, LF11].

33 / 45

slide-34
SLIDE 34

Overview Complex Network Analysis Outlook

PROPAGATION-BASED APPROACHES

Algorithm 1: Label propagation
Require: G = <V, E>, a connected graph
1: Initialize each node v with a unique label l_v
2: while labels are not stable do
3:   for v ∈ V do
       l_v ← argmax_l |Γ_l(v)|   /* random tie-breaking */
4:   end for
5: end while
6: return communities from labels

Γ_l(v): set of neighbors of v having label l
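A possible Python rendering of Algorithm 1 is sketched below. Several details are assumptions rather than part of the slide: nodes are visited in a random order at each round, a node keeps its current label when that label is already among the most frequent ones (a common stopping rule), and a cap on the number of rounds guards against oscillation.

```python
import random
from collections import Counter

def label_propagation(adj, seed=0, max_rounds=100):
    """Asynchronous label propagation on a dict of neighbor sets; returns a list of node sets."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # step 1: one unique label per node
    nodes = list(adj)
    for _ in range(max_rounds):           # step 2: repeat until labels are stable
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # sorted() makes the random choice reproducible; labels must be comparable
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue                  # current label is already a majority label
            labels[v] = rng.choice(candidates)   # random tie-breaking
            changed = True
        if not changed:
            break
    communities = {}
    for v, l in labels.items():
        communities.setdefault(l, set()).add(v)
    return list(communities.values())

# Toy graph (assumption): two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(label_propagation(toy))
```

Changing the seed may change the result, which is the robustness issue usually raised against this family of methods.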

34 / 45

slide-35
SLIDE 35

Overview Complex Network Analysis Outlook

LABEL PROPAGATION

Advantages
- Complexity: O(m)
- Highly parallel

Disadvantages
- No convergence guarantee, oscillation phenomena
- Low robustness: different runs yield very different community structures due to randomness

35 / 45

slide-36
SLIDE 36

Overview Complex Network Analysis Outlook

SEED-CENTRIC ALGORITHMS

(KANAWATI, SCSM’2014)

Algorithm 2: General seed-centric community detection algorithm
Require: G = <V, E>, a connected graph
1: C ← ∅
2: S ← compute_seeds(G)
3: for s ∈ S do
4:   C_s ← compute_local_com(s, G)
5:   C ← C + C_s
6: end for
7: return compute_community(C)

36 / 45

slide-37
SLIDE 37

Overview Complex Network Analysis Outlook

LINK PREDICTION

Link prediction
- Structural: find hidden/missing links in a network (e.g. missing links in Wikipedia)
- Temporal: predict new links that will appear at time $t_p$, based on the network state at instants $t < t_p$

Readings
- M. Pujari et al., Link prediction in complex networks, chapter 3 in Advanced Methods for Complex Network Analysis, N. Meghanathan (Editor), IGI Publishing, 2016.

37 / 45

slide-42
SLIDE 42

Overview Complex Network Analysis Outlook

ALTERNATIVE NETWORK MODELS

Network science is mature enough to move towards more complex, expressive models:
- K-partite networks
- Dynamic networks
- Heterogeneous networks
- Multiplex networks
- Attributed networks

Next: a powerful model, the multiplex network

42 / 45

slide-43
SLIDE 43

Overview Complex Network Analysis Outlook

BIBLIOGRAPHY I

[BGL08] Vincent D. Blondel, Jean-Loup Guillaume, and Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008), P10008.
[Cla05] Aaron Clauset, Finding local community structure in networks, Physical Review E (2005).
[CZG09] Jiyang Chen, Osmar R. Zaïane, and Randy Goebel, Local community identification in social networks, ASONAM, 2009, pp. 237–242.
[GdMC10] B. H. Good, Y.-A. de Montjoye, and A. Clauset, The performance of modularity maximization in practical contexts, Physical Review E (2010), no. 81, 046106.
[GN02] M. Girvan and M. E. J. Newman, Community structure in social and biological networks, PNAS 99 (2002), no. 12, 7821–7826.
[LF11] Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community detection, CoRR abs/1107.1 (2011).

slide-44
SLIDE 44

Overview Complex Network Analysis Outlook

BIBLIOGRAPHY II

[LWP08] Feng Luo, James Zijun Wang, and Eric Promislow, Exploring local community structures in large networks, Web Intelligence and Agent Systems 6 (2008), no. 4, 387–400.
[Piz12] Clara Pizzuti, A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation 16 (2012), no. 3, 418–430.
[PL06] Pascal Pons and Matthieu Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl. 10 (2006), no. 2, 191–218.
[YL12] Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.

slide-45
SLIDE 45

Overview Analysis Conclusion

Mining Attributed Networks

Part 2 – Analysis of Multiplex Networks Rushed Kanawati, Martin Atzmueller

A3, Université Sorbonne Paris Cité, France; CSAI, Tilburg University, Netherlands. DSAA'17, Tokyo, 20 October 2017

1 / 49

slide-46
SLIDE 46

Overview Analysis Conclusion

OUTLINE

1. Multiplex networks: Overview & Definitions
2. Analysis of multiplex networks
   2.1 Network Measures
   2.2 Analysis tasks
   2.3 Evaluation
3. Conclusions

2 / 49

slide-47
SLIDE 47

Overview Analysis Conclusion

MULTIPLEX NETWORK: DEFINITIONS

$G = \langle V, E, C \rangle$ (figure from [Mucha et al., 2010])

- $V$: set of nodes
- $E = \{E_1, \ldots, E_\alpha\}$, with $E_k \subseteq V \times V$ for all $k \in [1, \alpha]$
- $C$: layer coupling links

Coupling
- Ordinal coupling: diagonal inter-layer links among consecutive layers
- Categorical coupling: diagonal inter-layer links between all pairs of layers
- Generalized coupling? E.g. decay functions

3 / 49

slide-48
SLIDE 48

Overview Analysis Conclusion

NOTATION

Notation

- $A^{[k]}$: adjacency matrix of slice $k$, with $a^{[k]}_{ij} \neq 0$ iff $(v_i, v_j) \in E_k$, and $a^{[k]}_{ij} = 0$ otherwise
- $m^{[k]} = |E_k|$; often, we have $m \sim n$
- Neighbors of $v$ in slice $k$: $\Gamma(v)^{[k]} = \{x \in V : (x, v) \in E_k\}$
- All neighbors of $v$: $\Gamma(v)^{tot} = \bigcup_{s \in \{1, \ldots, \alpha\}} \Gamma(v)^{[s]}$
- Node degree in slice $k$: $d^{k}_v = \|\Gamma(v)^{[k]}\|$
- Total degree of node $v$: $d^{tot}_v = \|\Gamma(v)^{tot}\|$

4 / 49

slide-49
SLIDE 49

Overview Analysis Conclusion

MULTIPLEX NETWORKS: RELATED TERMS

Recommended reading: Mikko Kivelä et al., Multilayer Networks, arXiv:1309.7233, March 2014

5 / 49

slide-50
SLIDE 50

Overview Analysis Conclusion

POWER OF THE MULTIPLEX MODEL

Multi-relational networks European airports network

6 / 49

slide-51
SLIDE 51

Overview Analysis Conclusion

POWER OF MULTIPLEX MODEL

Dynamic networks Academic collaborations per year

7 / 49

slide-52
SLIDE 52

Overview Analysis Conclusion

POWER OF THE MULTIPLEX MODEL

Heterogeneous networks DBLP author-centred multiplex network

8 / 49

slide-53
SLIDE 53

Overview Analysis Conclusion

MULTIPLEX NETWORKS : MEASURES

- The usual measures need to be generalized: degree, neighborhood, centralities, paths and distances, clustering coefficient, ...
- New layer-oriented questions to answer:
  - Which layers determine the centrality of a user?
  - Which layers are relevant for measuring the similarity of two nodes?
  - How does one layer influence the evolution of another?
  - ...

9 / 49

slide-54
SLIDE 54

Overview Analysis Conclusion

APPROACHES

1 Transformation into a monoplex centred problem

I Layer aggregation approaches. I Hypergraph transformation based approaches I Ensemble approaches

2 Generalization of monoplex oriented algorithms to multiplex networks

10 / 49

slide-55
SLIDE 55

Overview Analysis Conclusion

LAYER AGGREGATION

11 / 49

slide-56
SLIDE 56

Overview Analysis Conclusion

LAYER AGGREGATION

Aggregation functions

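$$A_{ij} = \begin{cases} 1 & \text{if } \exists\, 1 \leq l \leq \alpha : A^{[l]}_{ij} \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

$$A_{ij} = \|\{d : A^{[d]}_{ij} \neq 0\}\|$$

$$A_{ij} = \frac{1}{\alpha} \sum_{k=1}^{\alpha} w_k A^{[k]}_{ij}$$

$$A_{ij} = sim(v_i, v_j)$$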
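The first three aggregation functions can be sketched as follows, assuming each layer is given as a square adjacency matrix stored as a list of lists over the same node ordering. The function names and the two-layer example are hypothetical; the fourth option depends on the chosen similarity function and is left out.

```python
def aggregate_binary(layers):
    """A_ij = 1 if i and j are linked in at least one layer, 0 otherwise."""
    n = len(layers[0])
    return [[1 if any(layer[i][j] != 0 for layer in layers) else 0
             for j in range(n)] for i in range(n)]

def aggregate_count(layers):
    """A_ij = number of layers in which i and j are linked."""
    n = len(layers[0])
    return [[sum(1 for layer in layers if layer[i][j] != 0)
             for j in range(n)] for i in range(n)]

def aggregate_weighted(layers, weights):
    """A_ij = (1 / alpha) * sum_k w_k * A[k]_ij, with alpha the number of layers."""
    n = len(layers[0])
    alpha = len(layers)
    return [[sum(w * layer[i][j] for w, layer in zip(weights, layers)) / alpha
             for j in range(n)] for i in range(n)]

# Two toy layers over three nodes (assumption).
layer_1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
layer_2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(aggregate_binary([layer_1, layer_2]))
print(aggregate_count([layer_1, layer_2]))
print(aggregate_weighted([layer_1, layer_2], [1, 1]))
```

The binary version discards how many layers support a link, while the count and weighted versions keep that information as an edge weight, at the cost of requiring a weighted monoplex algorithm afterwards.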

12 / 49

slide-57
SLIDE 57

Overview Analysis Conclusion

K-UNIFORM HYPERGRAPH TRANSFORMATION

Principle
- A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactly $k$
- Map a multiplex to a 3-uniform hypergraph $H = (\mathcal{V}, \mathcal{E})$ such that:
  - $\mathcal{V} = V \cup \{1, \ldots, \alpha\}$
  - $(u, v, i) \in \mathcal{E}$ if $\exists\, l : A^{[l]}_{uv} \neq 0$, with $u, v \in V$, $i \in \{1, \ldots, \alpha\}$
- Apply hypergraph analysis approaches (e.g. tensor-based approaches)

13 / 49

slide-58
SLIDE 58

Overview Analysis Conclusion

MULTIPLEX: NODE NEIGHBORHOOD

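Some options
- $\Gamma^{mux}(v) = \bigcup_{k=1}^{\alpha} \Gamma_k(v)$
- $\Gamma^{mux}(v) = \bigcap_{k=1}^{\alpha} \Gamma_k(v)$
- $\Gamma^{mux}(v) = \{x \in \Gamma(v)^{tot} : sim(x, v) \geq \delta\}$, $\delta \in [0, 1]$
- $\Gamma^{mux}(v) = \{x \in \Gamma(v)^{tot} : \frac{\|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}\|}{\|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}\|} \geq \delta\}$
- ...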
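The union, intersection and Jaccard-thresholded options can be sketched as below, assuming each layer is a dictionary mapping a node to its set of neighbors in that layer. The function names, the toy layers and the threshold value are assumptions made for illustration.

```python
def neighbors_union(layers, v):
    """Nodes linked to v in at least one layer (the set written Γ(v)^tot in the notation)."""
    return set().union(*(layer.get(v, set()) for layer in layers))

def neighbors_intersection(layers, v):
    """Nodes linked to v in every layer."""
    return set.intersection(*(set(layer.get(v, set())) for layer in layers))

def neighbors_jaccard(layers, v, delta):
    """Neighbors x of v whose total neighborhood overlaps that of v with Jaccard index >= delta."""
    tot_v = neighbors_union(layers, v)
    selected = set()
    for x in tot_v:
        tot_x = neighbors_union(layers, x)
        # tot_v is non-empty here because it contains x
        if len(tot_v & tot_x) / len(tot_v | tot_x) >= delta:
            selected.add(x)
    return selected

# Two toy layers (assumption).
layer_1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
layer_2 = {"a": {"b", "d"}, "b": {"a"}, "c": set(), "d": {"a"}}
layers = [layer_1, layer_2]
print(neighbors_union(layers, "a"), neighbors_intersection(layers, "a"),
      neighbors_jaccard(layers, "a", 0.2))
```

The union is the most permissive choice and the intersection the strictest; the thresholded variants sit in between and let δ control how selective the multiplex neighborhood is.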

14 / 49

slide-59
SLIDE 59

Overview Analysis Conclusion

PATHS, SHORTEST DISTANCE

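Some options
- Path in an aggregated network
- Average distance over layers: $d_{average} = \frac{\sum_{\alpha=1}^{m} d(u, v)^{[\alpha]}}{m}$, $\forall u, v \in V$ with $(u, v) \notin E_i$
- Path length as a vector: $path\_length(u, v) = \langle r_1, r_2, \ldots, r_\alpha \rangle$, where $r_i$ is the number of links in layer $i$
- $path_x(u, v)$ dominates $path_y(u, v)$ iff $\exists j : r^x_j < r^y_j$ and $\forall k \neq j : r^x_k \leq r^y_k$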
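The dominance relation between path-length vectors is a Pareto dominance test: one path must be no longer in every layer and strictly shorter in at least one. A minimal sketch, with hypothetical function names and example vectors:

```python
def dominates(rx, ry):
    """True if path x uses no more links than path y in every layer and fewer in at least one."""
    return (all(a <= b for a, b in zip(rx, ry))
            and any(a < b for a, b in zip(rx, ry)))

def pareto_front(vectors):
    """Path-length vectors that are not dominated by any other vector."""
    return [x for x in vectors if not any(dominates(y, x) for y in vectors)]

# Three candidate paths between the same pair of nodes in a two-layer multiplex (assumption).
candidates = [(1, 2), (2, 1), (2, 2)]
print(pareto_front(candidates))
```

Under this definition there is generally no single shortest path but a set of mutually incomparable ones, here (1, 2) and (2, 1).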

15 / 49

slide-60
SLIDE 60

Overview Analysis Conclusion

WHAT ABOUT COMMUNITIES?

What is a dense subgraph in a multiplex network ?

[BCG11] 16 / 49

slide-61
SLIDE 61

Overview Analysis Conclusion

COMMUNITY DETECTION IN MULTIPLEX NETWORKS

Approaches 1 Transformation into a monoplex community detection problem

I Layer aggregation approaches. I Multi-objective optimization approach. I Ensemble clustering approaches

2 Generalization of monoplex oriented algorithms to multiplex networks.

I Generalized-modularity optimization I Generalized info-map I Generalized walktrap I Seed-centric approaches

17 / 49

slide-62
SLIDE 62

Overview Analysis Conclusion

MULTI-OBJECTIVE OPTIMIZATION APPROACH [AP14]

1. Rank the set of α layers according to some importance criterion
2. $C_1 \leftarrow community(G^{[1]})$
3. for $i \in [2, \alpha]$ do: $C_i \leftarrow optimize(community(G^{[i]}), similarity(C_{i-1}))$
4. return $C_\alpha$

18 / 49

slide-63
SLIDE 63

Overview Analysis Conclusion

ENSEMBLE CLUSTERING APPROACHES

19 / 49

slide-64
SLIDE 64

Overview Analysis Conclusion

ENSEMBLE CLUSTERING APPROACHES

Ensemble Clustering [SG03] I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm I MCLA: Meta-Clustering Algorithm I . . .

20 / 49

slide-65
SLIDE 65

Overview Analysis Conclusion

ENSEMBLE CLUSTERING: APPROACHES

CSPA: Cluster-based Similarity Partitioning Algorithm
- Let $K$ be the number of basic models and $C_i(x)$ the cluster in model $i$ to which $x$ belongs.
- Define a similarity graph on objects: $sim(v, u) = \frac{\sum_{i=1}^{K} \delta(C_i(v), C_i(u))}{K}$
- Cluster the obtained graph:
  - Isolate connected components after pruning edges
  - Apply a community detection approach
- Complexity: $O(n^2 k r)$, with $n$ the number of objects, $k$ the number of clusters, $r$ the number of clustering solutions
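The co-association similarity and the simplest clustering step (pruning weak edges, then taking connected components) can be sketched as follows. Each base clustering is assumed to be a dictionary from object to cluster identifier; the pruning threshold, the function names and the example are hypothetical.

```python
def cspa_similarity(clusterings):
    """sim(u, v) = share of base clusterings that put u and v in the same cluster."""
    objects = sorted(clusterings[0])
    K = len(clusterings)
    sim = {}
    for idx, u in enumerate(objects):
        for v in objects[idx + 1:]:
            sim[(u, v)] = sum(1 for c in clusterings if c[u] == c[v]) / K
    return objects, sim

def consensus_components(objects, sim, threshold):
    """Prune edges with similarity below the threshold, then return connected components."""
    adj = {x: set() for x in objects}
    for (u, v), s in sim.items():
        if s >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for x in objects:
        if x in seen:
            continue
        stack, comp = [x], set()
        while stack:                      # depth-first traversal of one component
            y = stack.pop()
            if y in comp:
                continue
            comp.add(y)
            stack.extend(adj[y] - comp)
        seen |= comp
        components.append(comp)
    return components

# Three base clusterings of four objects (assumption); cluster ids are local to each model.
base = [{"a": 0, "b": 0, "c": 1, "d": 1},
        {"a": 0, "b": 0, "c": 0, "d": 1},
        {"a": 1, "b": 1, "c": 2, "d": 2}]
objects, sim = cspa_similarity(base)
print(consensus_components(objects, sim, 0.5))
```

For multiplex networks, one natural reading is that each base clustering comes from running a community detection algorithm on one layer, so that the similarity graph summarizes agreement across layers.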

21 / 49

slide-66
SLIDE 66

Overview Analysis Conclusion

CSPA : EXAMPLE

from Seifi, M. Cœurs stables de communautés dans les graphes de terrain. Thèse de l'université Paris 6, 2012

slide-67
SLIDE 67

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION I

23 / 49

slide-68
SLIDE 68

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION II

24 / 49

slide-69
SLIDE 69

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION III

25 / 49

slide-70
SLIDE 70

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION IV

26 / 49

slide-71
SLIDE 71

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION V

27 / 49

slide-72
SLIDE 72

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION VI

28 / 49

slide-73
SLIDE 73

Overview Analysis Conclusion

ENSEMBLE CLUSTERING : ILLUSTRATION VII

29 / 49

slide-74
SLIDE 74

Overview Analysis Conclusion

MULTIPLEX MODULARITY

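Generalized modularity [MRM+10]
- $Q_{multiplex}(P) = \frac{1}{2\mu} \sum_{c \in P} \sum_{\substack{i,j \in c \\ k,l : 1 \to \alpha}} \left( \left( A^{[k]}_{ij} - \lambda_k \frac{d^{[k]}_i d^{[k]}_j}{2 m^{[k]}} \right) \delta_{kl} + \delta_{ij} C^{kl}_{ij} \right)$
- $\mu = \sum_{\substack{j \in V \\ k,l : 1 \to \alpha}} m^{[k]} + C^{l}_{jk}$
- $C^{kl}_{ij}$: inter-slice coupling, $= 0 \;\; \forall i \neq j$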

30 / 49

slide-75
SLIDE 75

Overview Analysis Conclusion

SEED-CENTRIC ALGORITHMS [KAN14]

Algorithm 1: General seed-centric community detection algorithm
Require: G = <V, E>, a connected graph
1: C ← ∅
2: S ← compute_seeds(G)
3: for s ∈ S do
4:   C_s ← compute_local_com(s, G)
5:   C ← C + C_s
6: end for
7: return compute_community(C)

31 / 49

slide-76
SLIDE 76

Overview Analysis Conclusion

THE LICOD ALGORITHM [YK14]

1 Compute a set of seeds that are likely to be leaders in their communities

Heuristic : nodes having higher degree centralities than their neighbors

2 Each node in the graph ranks seeds in function of its own preference

In function of increasing shortest path (length)

3 Iterate till convergence: Each node modifies its preference vector in function of neighbor’s preferences

Applying rank aggregation methods.

32 / 49

slide-77
SLIDE 77

Overview Analysis Conclusion

MUXLICOD

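Multiplex degree centrality [BNL13]:

$$d^{multiplex}_i = -\sum_{k=1}^{\alpha} \frac{d^{[k]}_i}{d^{[tot]}_i} \log\left( \frac{d^{[k]}_i}{d^{[tot]}_i} \right)$$

Multiplex shortest path:

$$SP(u, v)^{multiplex} = \frac{\sum_{k=1}^{\alpha} SP(u, v)^{[k]}}{\alpha}$$

Multiplex neighborhood:

$$\Gamma^{mux}(v) = \left\{ x \in \Gamma(v)^{tot} : \frac{\|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}\|}{\|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}\|} \geq \delta \right\}$$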
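The multiplex degree centrality has the form of an entropy of a node's degree across layers. The sketch below assumes that reading, and it normalizes by the sum of the per-layer degrees unless another total is passed explicitly; that default is an assumption, since the notation slide defines the total degree as the number of distinct neighbors. The function name is hypothetical.

```python
import math

def multiplex_degree_entropy(layer_degrees, d_tot=None):
    """-sum_k (d_k / d_tot) * log(d_k / d_tot); layers where the node has degree 0 contribute 0."""
    if d_tot is None:
        d_tot = sum(layer_degrees)  # assumption: total taken as the sum over layers
    if d_tot == 0:
        return 0.0
    return -sum((d / d_tot) * math.log(d / d_tot) for d in layer_degrees if d > 0)

print(multiplex_degree_entropy([2, 2]))  # links spread evenly over two layers
print(multiplex_degree_entropy([4, 0]))  # all links concentrated in one layer
```

With the default normalization the value is zero when all of a node's links lie in one layer and is maximal (log α) when they are spread evenly, so high values favor seeds that are active across the whole multiplex.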

33 / 49

slide-78
SLIDE 78

Overview Analysis Conclusion

RANK AGGREGATION

[PK12, DKNS01]

34 / 49

slide-79
SLIDE 79

Overview Analysis Conclusion

OTHER ALGORITHMS

1. Random-walk-based approach (generalization of Walktrap) [KM15]
2. Generalized Infomap [DLAR15]

35 / 49

slide-80
SLIDE 80

Overview Analysis Conclusion

EVALUATION CRITERIA I

1. Multiplex modularity
2. Redundancy [BCG11]: $\rho(c) = \sum_{(u,v) \in \bar{\bar{P}}_c} \frac{\|\{k : A^{[k]}_{uv} \neq 0\}\|}{\alpha \times \|P_c\|}$, where $\bar{\bar{P}}_c$ is the set of pairs $(u, v)$ that are directly connected in at least two layers
3. Complementarity: $\gamma(c) = V_c \times \varepsilon_c \times H_c$

36 / 49

slide-81
SLIDE 81

Overview Analysis Conclusion

EVALUATION CRITERIA II

- Variety $V_c$: the proportion of occurrence of the community $c$ across the layers of the multiplex.

$$V_c = \sum_{s=1}^{\alpha} \frac{\|\exists (i, j) \in c \,/\, A^{[s]}_{ij} \neq 0\|}{\alpha - 1} \quad (1)$$

- Exclusivity $\varepsilon_c$: number of pairs of nodes, in community $c$, that are connected exclusively in one layer.

$$\varepsilon_c = \sum_{s=1}^{\alpha} \frac{\|P_{c,s}\|}{\|P_c\|} \quad (2)$$

37 / 49

slide-82
SLIDE 82

Overview Analysis Conclusion

EVALUATION CRITERIA III

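- Homogeneity $H_c$: how uniform the distribution of the number of edges per layer is in the community $c$.

$$H_c = \begin{cases} 1 & \text{if } \sigma_c = 0 \\ 1 - \frac{\sigma_c}{\sigma^{max}_c} & \text{otherwise} \end{cases} \quad (3)$$

with

$$avg_c = \sum_{s=1}^{\alpha} \frac{\|P_{c,s}\|}{\alpha}, \qquad \sigma_c = \sqrt{\sum_{s=1}^{\alpha} \frac{(\|P_{c,s}\| - avg_c)^2}{\alpha}}, \qquad \sigma^{max}_c = \sqrt{\frac{(\max(\|P_{c,d}\|) - \min(\|P_{c,d}\|))^2}{2}}$$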

38 / 49

slide-83
SLIDE 83

Overview Analysis Conclusion

DATASETS

Benchmark network: Lazega lawyer network; #nodes 71; #layers 3

39 / 49

slide-84
SLIDE 84

Overview Analysis Conclusion

DATASETS

Dataset Physicians collaboration network #nodes 246 #layers 3

40 / 49

slide-85
SLIDE 85

Overview Analysis Conclusion

RESULTS: REDUNDANCY

41 / 49

slide-86
SLIDE 86

Overview Analysis Conclusion

RESULTS: COMPLEMENTARITY

42 / 49

slide-87
SLIDE 87

Overview Analysis Conclusion

RESULTS: MULTIPLEX MODULARITY

43 / 49

slide-88
SLIDE 88

Overview Analysis Conclusion

PARETO FRONT

44 / 49

slide-89
SLIDE 89

Overview Analysis Conclusion

LAZEGA DATASET: COMPARATIVE STUDY

Figure: NMI (lower triangular part) , adjusted Rand (upper triangular part).

45 / 49

slide-90
SLIDE 90

Overview Analysis Conclusion

CONCLUSIONS

- Multiplex networks provide a rich representation of real-world interaction systems
- A lot of work remains to reformulate basic network concepts for multiplex settings, e.g. roles, random walks, PageRank, etc.
- Community evaluation: still an open problem
- Uncovered topics: layer selection and compression, co-evolution models, dynamics on multiplex networks
- Ideas under exploration:
  - Multiplex of multiplexes
  - Interactive multiplex network visualisation
  - Benchmarking of available tools

46 / 49

slide-91
SLIDE 91

Overview Analysis Conclusion

BIBLIOGRAPHY I

[AP14] Alessia Amelio and Clara Pizzuti, Community detection in multidimensional networks, IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 352–359.
[BCG11] Michele Berlingerio, Michele Coscia, and Fosca Giannotti, Finding and characterizing communities in multidimensional networks, ASONAM, IEEE Computer Society, 2011, pp. 490–494.
[BNL13] Federico Battiston, Vincenzo Nicosia, and Vito Latora, Metrics for the analysis of multiplex networks, CoRR abs/1308.3182 (2013).
[DKNS01] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar, Rank aggregation methods for the Web, WWW, 2001, pp. 613–622.
[DLAR15] Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall, Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems, Phys. Rev. X 5 (2015), 011027.

slide-92
SLIDE 92

Overview Analysis Conclusion

BIBLIOGRAPHY II

[Kan14] Rushed Kanawati, Seed-centric approaches for community detection in complex networks, 6th International Conference on Social Computing and Social Media (Crete, Greece) (Gabriele Meiselwitz, ed.), vol. LNCS 8531, Springer, June 2014, pp. 197–208.
[KM15] Zhana Kuncheva and Giovanni Montana, Community detection in multiplex networks using locally adaptive random walks, MANEM 2 workshop, Proceedings of ASONAM 2015 (Paris), August 2015.
[MRM+10] Peter J. Mucha, Thomas Richardson, Kevin Macon, Mason A. Porter, and Jukka-Pekka Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328 (2010), no. 5980, 876–878.
[PK12] Manisha Pujari and Rushed Kanawati, Supervised rank aggregation approach for link prediction in complex networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1189–1196.
[SG03] A. Strehl and J. Ghosh, Cluster ensembles: a knowledge reuse framework for combining multiple partitions, The Journal of Machine Learning Research 3 (2003), 583–617.

slide-93
SLIDE 93

Overview Analysis Conclusion

BIBLIOGRAPHY III

[YK14] Zied Yakoubi and Rushed Kanawati, Licod: Leader-driven approaches for community detection, Vietnam Journal of Computer Science 1 (2014), no. 4, 241–256.

slide-94
SLIDE 94

MAN Tutorial Part III: Analysis of Attributed Networks

Rushed Kanawati, Martin Atzmueller

Université Sorbonne Paris Cité, France Tilburg University, Netherlands

DSAA 2017, Tokyo, 2017-10-20

slide-95
SLIDE 95

Agenda

Overview/Recap: Attributed Networks Compositional Subgroup Analysis Community Detection Link Prediction Summary

2

slide-96
SLIDE 96

Terminology (Recap)

Network → Graphs
- Set of atomic entities (actors) → nodes, vertices
- Set of links/edges between nodes ("ties")
- Edges model pairwise relationships
- Edges: directed or undirected

Social network [Wasserman & Faust 1994]
- Social structure capturing actor relations
- Actors, links given by dyadic ties between actors (friendship, kinship, organizational position, …) → set of nodes and edges
- Abstract object, independent of representation

3

slide-97
SLIDE 97

Variables

[Wasserman & Faust 1994]

Structural
- Measure ties between actors (→ links)
- Specific relation
- Make up connections in graph/network

Compositional
- Measure actor attributes: age, gender, ethnicity, affiliation, …
- Describe actors

4

slide-98
SLIDE 98

Attributed Graphs

Graph: edge attributes and/or node attributes
- Structure: ties/links (of respective relations)
- Attributes: additional information
  - Actor attributes (node labels)
  - Link attributes (information about connections)
  - Attribute vectors for actors and/or links
  - … can be mapped from/to each other
- Integration of heterogeneous data (networks + vectors)
- Enables simultaneous analysis of relational + attribute data

5

slide-99
SLIDE 99

Attributed Network/Graph

Examples

Citation Attributes

(Co-)Authors Affiliation Country Gender …

WWW

Links Content (BoW) …

(Newman 2003)

6

slide-100
SLIDE 100

Subgroups & Cohesive subgroups

Subgroup
- Subset of actors (and all their ties)
- Define subgroups using specific criteria (homogeneity among members)
  - Compositional: actor attributes
  - Structural: using tie structures
- Detection of cohesive subgroups & communities → structural aspects
- Subgroup discovery → actor attributes
- … attributed graph → can combine both

7

[Wasserman & Faust 1994]

slide-101
SLIDE 101

Compositional Subgroups

- Detect subgroups according to specific compositional criteria
  - Focus on actor attributes
  - Describe actor subset using attributes
- Often hypothesis-driven approaches: test specific attribute combinations
- In contrast: subgroup discovery
  - Hypothesis-generating approach
  - Exploratory data mining method
  - Local exceptionality detection

8

[Atzmueller 2015]

slide-102
SLIDE 102

Agenda

Overview/Recap: Attributed Networks Compositional Subgroup Analysis Community Detection Link Prediction Summary

9

slide-103
SLIDE 103

Subgroup Discovery & Analytics

Task:
"Find descriptions of subsets in the data that differ significantly from the total population with respect to a target concept."

Examples:
- "45% of all men aged between 35 and 45 have a high income, in contrast to only 20% in total."
- "66% of all women aged between 50 and 60 have a high centrality value in the corporate network."

Descriptive patterns for subgroups:
- Gender = Female ∧ Age = [50; 60] → Centrality = high
- {flickr, delicious}, {library, android}, {php, web} → Centrality = high

10

[Kloesgen 1996, Wrobel 1997]

slide-104
SLIDE 104

Subgroup Discovery

Given (INPUT):
- Data as a set of cases (records) in tabular form
- Target concept (e.g. "high centrality")
- Quality function (interestingness measure)

Result (OUTPUT): set of the best k subgroups:
- Description, e.g. sex = female ∧ age = 50-60 → conjunction of selectors
- Size n, e.g. in 180 of 1000 cases
- Deviation (p = 60% in the subgroup vs. p0 = 10% in all cases)

→ "Quality" of the subgroup: weighs size and deviation

11

slide-105
SLIDE 105

Subgroup Quality Functions

- Consider size and deviation in the target concept
- Weighted Relative Accuracy (a = 1)
- Simple Binomial (a = 0.5)
- Added Value (a = 0)
- Continuous: mean value (m, m0) of target variable

n: size of subgroup (number of cases)
p: share of cases with target = true in the subgroup
p0: share of cases with target = true in the total population
a: weight of size against deviation (parameter)
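The formula itself did not survive in the slide text. A widely used family that matches the listed parameters is $q_a = n^a (p - p_0)$, and the sketch below assumes that form; note that weighted relative accuracy is often additionally divided by the total number of cases, which rescales the values without changing the ranking. The numbers reuse the earlier example of a subgroup with 180 cases, p = 60% and p0 = 10%.

```python
def subgroup_quality(n, p, p0, a):
    """Assumed form q_a = n**a * (p - p0): a trades off subgroup size against deviation."""
    return (n ** a) * (p - p0)

for name, a in [("weighted relative accuracy", 1.0),
                ("simple binomial", 0.5),
                ("added value", 0.0)]:
    print(name, subgroup_quality(180, 0.6, 0.1, a))
```

With a = 0 only the deviation counts, so very small subgroups can rank first; with a = 1 large subgroups with a moderate deviation are preferred.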

[Atzmueller 2015]

12

slide-106
SLIDE 106

Efficient Search

Heuristic: Beam Search

Exhaustive Approaches:

Basic idea: Efficient data structures + pruning
SD-Map – based on FP-Growth [Atzmueller & Puppe 2006]
SD-Map* – Utilizing optimistic estimates (branch & bound) [Atzmueller & Lemmerich 2009]

13

slide-107
SLIDE 107

Pruning

Optimistic Estimate

Pruning – Branch & Bound

Optimistic Estimate: Upper bound for the quality of a pattern and all its specializations → Top-k pruning

Remove the path starting at the current pattern if the optimistic estimate for the current pattern (and all its specializations) is below the quality of the worst result of the top-k results
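The pruning rule can be sketched as a small depth-first top-k search. This is an illustrative sketch, not SD-Map*: it assumes the quality function q = n · (p − p0) and uses the bound pos · (1 − p0), where pos is the number of positive cases covered, as optimistic estimate (a specialization can at best keep all positives and drop all negatives). All names (`top_k_subgroups`, the toy data) are hypothetical.

```python
import heapq
from itertools import count


def top_k_subgroups(rows, target, selectors, k=3):
    """rows: list of dicts; target: key of a boolean column; selectors: (attr, value) pairs."""
    p0 = sum(r[target] for r in rows) / len(rows)
    best = []      # min-heap of (quality, tie-breaker, description)
    tie = count()  # avoids comparing descriptions when qualities are equal

    def search(desc, covered, start):
        for i in range(start, len(selectors)):
            attr, val = selectors[i]
            sub = [r for r in covered if r[attr] == val]
            if not sub:
                continue
            pos = sum(r[target] for r in sub)
            quality = len(sub) * (pos / len(sub) - p0)
            new_desc = desc + [(attr, val)]
            if len(best) < k:
                heapq.heappush(best, (quality, next(tie), new_desc))
            elif quality > best[0][0]:
                heapq.heapreplace(best, (quality, next(tie), new_desc))
            # Optimistic estimate: upper bound for all specializations of new_desc.
            estimate = pos * (1 - p0)
            if len(best) < k or estimate > best[0][0]:
                search(new_desc, sub, i + 1)  # otherwise the whole path is pruned

    search([], rows, 0)
    return sorted(((q, d) for q, _, d in best), reverse=True)


rows = [
    {"gender": "F", "age": "50-60", "high": True},
    {"gender": "F", "age": "50-60", "high": True},
    {"gender": "F", "age": "30-40", "high": False},
    {"gender": "M", "age": "50-60", "high": False},
    {"gender": "M", "age": "30-40", "high": False},
    {"gender": "M", "age": "30-40", "high": False},
]
selectors = [("gender", "F"), ("age", "50-60")]
print(top_k_subgroups(rows, "high", selectors, k=1))
```

In this toy run the branch starting at age = 50-60 is cut as soon as its estimate cannot beat the conjunction already in the result set.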

14

slide-108
SLIDE 108

Local Exceptionality Detection

Exceptional Model Mining

Identification of patterns showing an "interesting behavior" for a certain "model"

Mean test (e.g., influence factors for increased centrality)
Linear regression (e.g., different centrality measures)
Correlation coefficient (e.g., factors for role analysis)
Variance (e.g., degree, clustering coefficient, …)
…

Algorithms:

Beam-Search [Duivestein et al. 2015]
GP-Growth [Lemmerich et al. 2012]
– Faster by multiple orders of magnitude compared to standard methods
– Fastest exhaustive algorithm so far

15

slide-109
SLIDE 109

Agenda

Overview/Recap: Attributed Networks Compositional Subgroup Analysis Community Detection Link Prediction Summary

16

slide-110
SLIDE 110

Combining Structure and Attributes

Data sources

Structural variables (ties, links)
Compositional variables
– Actor attributes
– Represented as attribute vectors

Edge attributes
– Each edge has an assigned label
– Multiplex graphs → Multiple edges (labels) between nodes

17

slide-111
SLIDE 111

Communities/Edge-Attributed Graphs

Clustering edge-attributed graphs

Reduce/flatten to weighted graph [Bothorel et al. 2015]
– Derive weights according to number of edges where nodes are directly connected [Berlingerio et al. 2011]
– Standard graph clustering approaches can then be directly applied

Frequent-itemset based [Berlingerio et al. 2013]
Subspace-oriented [Boden et al. 2012]

18

slide-112
SLIDE 112

Node-Attributed Graphs

Non-uniform terminology

Social-attribute network
Attribute augmented graph
Feature-vector graph, vertex-labeled graph
Attributed graph
…

Different representations

19

[Bothorel et al. 2015]

slide-113
SLIDE 113

Community Detection – Attribute Extensions

Utilize structural + attribute information
Different roles of a description

Methods "guiding" community detection using attribute information
– "Dense structures" – connectivity
– But no "perfect" attribute homogeneity (purity)

Methods generating explicit descriptions, i.e., descriptive community patterns
– "Dense structures" – connectivity
– Concrete descriptions, e.g., conjunctive logical formula

20

slide-114
SLIDE 114

Attributes for Aiding Community Detection

Weight modification (edges) according to nodal attributes
[Ge et al. 2008, Dang & Viennet 2012, Ruan et al. 2013, Zhou et al. 2009, Steinhaeuser & Chawla 2008]
– Abstraction into similarities between nodes → Edge weights → Apply standard community detection algorithm
– Specifically, distance-based community detection methods

Entropy-oriented methods [Zhu et al. 2011, Smith et al. 2014, Cruz et al. 2011]

Model-based approaches [Xu et al. 2012, Yang et al. 2013, Akoglu et al. 2012]

21

slide-115
SLIDE 115

Weight modification

Use attribute-based distance measure

Community detection: Group nodes according to a threshold, i.e., given a threshold in (0, 1), place any pair of nodes whose edge weight exceeds the threshold into the same community

Evaluate final partitioning using Modularity
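A minimal sketch of this pipeline, under stated assumptions: edge weights are taken to be the share of matching attribute values (a stand-in for the attribute-based measure, which the slide does not specify), nodes joined by an above-threshold edge are merged with a union-find, and the result is scored with the standard modularity formula Q = Σ_c [L_c/m − (d_c/2m)²]. Function names and the toy graph are hypothetical.

```python
def similarity(a, b):
    """Share of attribute positions on which two nodes agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def threshold_communities(nodes, edges, attrs, threshold):
    """Merge the endpoints of every edge whose weight exceeds the threshold."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if similarity(attrs[u], attrs[v]) > threshold:
            parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def modularity(edges, label):
    """Q = sum over communities of L_c / m - (d_c / 2m)^2 for an unweighted graph."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        for x in (u, v):
            degree[label[x]] = degree.get(label[x], 0) + 1
        if label[u] == label[v]:
            inside[label[u]] = inside.get(label[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by one edge; attributes agree only inside each triangle.
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
attrs = {0: ("a", "x"), 1: ("a", "x"), 2: ("a", "x"),
         3: ("b", "y"), 4: ("b", "y"), 5: ("b", "y")}
label = threshold_communities(nodes, edges, attrs, threshold=0.5)
print(label, modularity(edges, label))
```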

22

[Steinhaeuser & Chawla 2008]

slide-116
SLIDE 116

Entropy Minimization

For a partition, optimize entropy using Monte-Carlo

Integrate entropy step into Modularity optimization algorithm
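The slide does not give the entropy definition used. As a hedged illustration of the quantity such a step would try to reduce, the sketch below computes the size-weighted Shannon entropy of one categorical attribute inside each community; it is an assumed scoring function, not the procedure of [Cruz et al. 2011], and the name `partition_entropy` is hypothetical.

```python
import math
from collections import Counter


def partition_entropy(communities, attribute):
    """Size-weighted average of the attribute entropy inside each community."""
    total = sum(len(c) for c in communities)
    h = 0.0
    for c in communities:
        counts = Counter(attribute[v] for v in c)
        h_c = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        h += len(c) / total * h_c
    return h


attribute = {0: "a", 1: "a", 2: "b", 3: "b"}
print(partition_entropy([[0, 1], [2, 3]], attribute))  # attribute-pure communities
print(partition_entropy([[0, 2], [1, 3]], attribute))  # fully mixed communities
```

A lower value means the communities are more homogeneous with respect to the attribute.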

23

[Cruz et al. 2011] [Blondel et al. 2008]

slide-117
SLIDE 117

Model-based/MDL

In general: Model edge & attribute values using mixtures of probability distributions

Use MDL to select clusters w.r.t. attribute value similarity & connectivity similarity

Data compression of connectivity & attribute matrices (PICS algorithm)
– Lossless compression → MDL cost function
– Resulting node groups: homogeneous both in node & attribute matrix
– Nodes: similar connectivity & high attribute coherence

24

[Akoglu et al. 2012]

slide-118
SLIDE 118

Descriptive Community Patterns

Community mining scenario

Discover "densely connected groups of nodes"
Communities should have explicit description
Community (evaluation) space: network/graph

Goal:

Often: Discover top-k communities
Maximize some community quality function

25

slide-119
SLIDE 119

Examples: Community Patterns

Social tagging system:

{work, flickr, delicious} {business, production, sales} {php, web, internet},

{innovation, business, forschung}

{work, flickr, delicious},

{library, android, emulation}, {php, web, internet}

26

slide-120
SLIDE 120

Finding Explicit Descriptions

Cluster transformed node-attribute similarity graph & extract pure clusters

Mine frequent itemsets (binary attributes) & analyze communities

Combine dense subgraph mining + subspace clustering

Apply correlated pattern mining

Interleave community detection & redescription mining

Adapt local exceptionality detection (using subgroup discovery) for communities

27

[Adnan et al. 2009] [Moser et al. 2009,Günnemann et al. 2013] [Atzmueller & Mitzlaff 2011, Atzmueller et al. 2015] [Silva et al. 2012] [Pool et al. 2014]

slide-121
SLIDE 121

Subspace-Clustering & Dense Subgraphs

Twofold cluster O: Combine subspace clustering & dense subgraph mining (GAMer algorithm)
– O fulfills subspace property (maximal distance threshold w.r.t. node attribute values in O) with minimal number of dimensions
– O fulfills quasi-clique property, according to nodal degree and threshold
– Induced subgraph of O is connected, and fulfills minimal size threshold

Quality function: Density ∙ Size ∙ #Dimensions
Pruning using subspace & quasi-clique properties
Includes redundancy-optimization step (overlapping communities)

28

[Günnemann et al. 2011]

slide-122
SLIDE 122

Correlated Pattern Mining

Structural correlation pattern mining (SCPM)

Correlation between node attribute set and dense subgraph, induced by the attribute set

Quality measure: Comparison against null model
– Size of the pattern
– Cohesion of the pattern (density of quasi-clique)
– Compare against expected structural correlation of attribute set (in random graph)

29

[Silva et al. 2011]

slide-123
SLIDE 123

Description-driven Community Detection

Find communities with concise descriptions (e.g., given by tags)

Focus: Overlapping, diverse, descriptive communities

Language: Disjunctions of conjunctive expressions

Two-stage approach
– Greedy hill-climbing step: Generate candidates for communities
– Redescription generation: Induce description for each community, and reshape if necessary

Heuristic approach, due to large search space

[Pool et al. 2014]

30

slide-124
SLIDE 124

Starts with candidate communities
– Domain knowledge
– Partial communities
– Start with single vertices (later being extended using hill-climbing approach)

ReMine algorithm for deriving patterns for communities

[Zimmermann et al. 2010]

31

slide-125
SLIDE 125

[Pool et al. 2014]

32

slide-126
SLIDE 126

Description-Oriented Community Detection

Basic Idea: Pattern Mining for Community Characterization

Mine patterns in description space (tags/topics) → Subgroups of users described by tags/topics

Optimize quality measure in community space → Network/graph of users

Improve understandability of communities (explanation)

[Atzmueller et al., Information Sciences, 2016]

33

slide-127
SLIDE 127

Direct Descriptive Community Mining

Goal: Identification/description of communities with a high quality (exceptional model mining)

Input: Network/Graph + node properties (e.g., tags)
Output: k-best community patterns

Description language: conjunctive expressions

COMODO algorithm: Top-k pattern mining, based on SD-Map* algorithm for subgroup discovery
– Discover k-best patterns
– Search space: Conjunctions/tags
– Apply standard community quality functions, e.g., Modularity [Newman 2004]

34

slide-128
SLIDE 128

Community Detection on Attributed Graphs

Goal: Mine patterns describing such groups

Merge networks + descriptive features, e.g., characteristics of users

Target both
– Community structure (some evaluation function) &
– Community description (logical formula, e.g., conjunction of features, see above)

35

slide-129
SLIDE 129

Transformation & Mining (II)

Dataset of edges connecting two nodes
– Described by intersection of labels of the two nodes
– Additionally: Store nodes, and respective degrees

Apply top-k method w/ optimistic-estimate pruning (COMODO)

Web Mining, Computer, Java
Web Mining, Computer, JavaScript
Web Mining, Computer
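The transformation step can be sketched as follows: each edge becomes one record whose description is the set of labels shared by its two endpoints, and node degrees are stored alongside. This is an illustrative sketch only; the function names and the node identifiers are hypothetical, and the label sets mirror the example on the slide.

```python
def edge_dataset(edges, labels):
    """One record per edge, described by the labels common to both endpoints."""
    return [{"edge": (u, v), "description": labels[u] & labels[v]} for u, v in edges]


def degrees(edges):
    """Degree of every node that appears in at least one edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


labels = {
    "u1": {"Web Mining", "Computer", "Java"},
    "u2": {"Web Mining", "Computer", "JavaScript"},
}
edges = [("u1", "u2")]
print(edge_dataset(edges, labels))
print(degrees(edges))
```

A top-k pattern miner can then search conjunctions of labels over these edge records while using the stored degrees to evaluate a community quality function.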

36

slide-130
SLIDE 130

Optimistic Estimates

Problem: Exponential search space

Optimistic Estimate: Upper bound for the quality of a pattern and all its specializations → Top-k pruning

Figure panels: Delicious friend graph; Last.fm friend graph

37

slide-131
SLIDE 131

Agenda

Overview/Recap: Attributed Networks Compositional Subgroup Analysis Community Detection Link Prediction Summary

38

slide-132
SLIDE 132

Link Prediction & Attributed Networks

Utilize both structure & attributive information for link prediction → two different, but complementary perspectives (e.g., homophily)

Relational data – Markov networks: Object labels + link information [Taskar et al. 2004]

Relations to statistical relational learning, e.g.,
– Markov Logic Networks, e.g., [Richardson & Domingos 2006]
– Probabilistic Soft Logic, e.g., [Kimmig et al. 2012, Bach et al. 2013]
– Combine logic and probabilities: formulate "rules" on attribute domains for inferring links

Or: As before, transform to a simple "base" case, e.g., using weights

39

slide-133
SLIDE 133

Generative Models for Attributed Networks

How to model relationships between attributes and structure? E.g., for
– link prediction/link formation/label prediction
– generating a network

Utilize generative graph models:
– Attributed graph model (AGM) [Pfeiffer et al. 2014]
– Multiplicative attributed graph model [Kim & Leskovec 2012]

Importance of attributes, structural inference

40

slide-134
SLIDE 134

Agenda

Overview/Recap: Attributed Networks Compositional Subgroup Analysis Community Detection Link Prediction Summary

41

slide-135
SLIDE 135

Summary

Subgroup analysis & community detection enable the identification of subgroups at different levels & dimensions
– Compositional
– Structural + compositional
– Providing explicit descriptions

Both can be combined for obtaining descriptive community patterns according to standard community quality functions

Different approaches/techniques, select depending on analysis goals

Outlook: Hybrid (multiplex, attributed) network analysis approaches

42

slide-136
SLIDE 136

References

[Adnan et al. 2009] M. Adnan, R. Alhajj, J. Rokne (2009) Identifying Social Communities by Frequent Pattern Mining. Proc. 13th Intl. Conf. Information Visualisation, IEEE Computer Society, Washington, DC, USA, pp. 413–418.

[Akoglu et al. 2012] L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos (2012) PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs. Proc. SDM, SIAM, pp. 439–450. Omnipress

[Atzmueller 2015] M. Atzmueller (2015) Subgroup Discovery – Advanced Review. WIREs: Data Mining and Knowledge Discovery, 5(1):35–49

[Atzmueller 2007] M. Atzmueller (2007) Knowledge-Intensive Subgroup Mining – Techniques for Automatic and Interactive Discovery, Vol. 307 of Dissertations in Artificial Intelligence-Infix (Diski), IOS Press

[Atzmueller et al. 2004] M. Atzmueller, F. Puppe, H.-P. Buscher (2004) Towards Knowledge-Intensive Subgroup Discovery, Proc. LWA 2004, pp. 117–123.

[Atzmueller & Puppe 2006] M. Atzmueller and F. Puppe (2006) SD-Map - A Fast Algorithm for Exhaustive Subgroup Discovery. Proc. 10th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), pp. 6-17, Heidelberg, Germany. Springer Verlag

[Atzmueller et al. 2006] M. Atzmueller, J. Baumeister, and F. Puppe (2006) Introspective Subgroup Analysis for Interactive Knowledge Refinement. Proc. FLAIRS, AAAI Press, Palo Alto, CA, USA, pp. 402-407

43

slide-137
SLIDE 137

References

[Atzmueller et al. 2005] M. Atzmueller, J. Baumeister, A. Hemsing, E.-J. Richter, and F. Puppe (2005) Subgroup Mining for Interactive Knowledge Refinement. In Proc. 10th Conference on Artificial Intelligence in Medicine, LNAI 3581, pp. 453-462, Heidelberg, Germany, Springer.

[Atzmueller & Puppe 2008] M. Atzmueller and F. Puppe (2008) A Case-Based Approach for Characterization and Analysis of Subgroup Patterns. Journal of Applied Intelligence, 28(3):210-221

[Atzmueller & Lemmerich 2012] M. Atzmueller and F. Lemmerich (2012) VIKAMINE – Open-Source Subgroup Discovery, Pattern Mining, and Analytics. In Proc. ECML/PKDD, Heidelberg, Germany. Springer Verlag.

[Atzmueller & Puppe 2005] M. Atzmueller and F. Puppe (2005) Semi-Automatic Visual Subgroup Mining using VIKAMINE. Journal of Universal Computer Science, 11(11):1752-1765.

[Atzmueller & Lemmerich 2009] M. Atzmueller, F. Lemmerich (2009) Fast Subgroup Discovery for Continuous Target Concepts. Proc. International Symposium on Methodologies for Intelligent Systems, Vol. 5722 of LNCS, Springer, Berlin, pp. 1–15.

[Atzmueller et al. 2012] M. Atzmueller, S. Doerfel, A. Hotho, F. Mitzlaff, and G. Stumme (2012) Face-to-Face Contacts at a Conference: Dynamics of Communities and Roles. In Modeling and Mining Ubiquitous Social Media, volume 7472 of LNAI. Springer Verlag, Heidelberg, Germany

44

slide-138
SLIDE 138

References

[Atzmueller & Lemmerich 2013] M. Atzmueller and F. Lemmerich (2013) Exploratory Pattern Mining on Social Media using Geo-References and Social Tagging Information. IJWS, 2(1/2)

[Atzmueller & Mitzlaff 2011] M. Atzmueller and F. Mitzlaff (2011) Efficient Descriptive Community Mining. Proc. 24th International FLAIRS Conference, pages 459-464, Palo Alto, CA, USA. AAAI Press.

[Atzmueller et al. 2016] M. Atzmueller, S. Doerfel, and F. Mitzlaff (2016) Description-Oriented Community Detection using Exhaustive Subgroup Discovery. Information Sciences, 329, 965-984

[Atzmueller et al. 2009] M. Atzmueller, F. Lemmerich, B. Krause, and A. Hotho (2009) Who are the Spammers? Understandable Local Patterns for Concept Description. In Proc. 7th Conference on Computer Methods and Systems, Krakow, Poland. Oprogramowanie Nauko-Techniczne.

[Berlingerio et al. 2013] M. Berlingerio, F. Pinelli, and F. Calabrese (2013) ABACUS: Apriori-BAsed Community discovery in mUltidimensional networkS. Data Mining and Knowledge Discovery, Springer, 27(3).

[Boden et al. 2012] B. Boden, S. Günnemann, H. Hoffmann, and T. Seidl (2012) Mining Coherent Subgraphs in Multi-Layer Graphs with Edge Labels. Proc. 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press

45

slide-139
SLIDE 139

References

[Bothorel et al. 2015] C. Bothorel, J. D. Cruz, M. Magnani, B. Micenkova (2015) Clustering Attributed Graphs: Models, Measures and Methods. arXiv:1501.01676

[Bringmann et al. 2011] B. Bringmann, S. Nijssen, and A. Zimmermann (2011) Pattern-based Classification: A Unifying Perspective. arXiv:1111.6191

[Clauset et al. 2004] A. Clauset, M. E. J. Newman, C. Moore (2004) Finding Community Structure in Very Large Networks. arXiv:cond-mat/0408187

[Cruz et al. 2011] J. D. Cruz, C. Bothorel, and F. Poulet (2011) Entropy Based Community Detection in Augmented Social Networks. Computational Aspects of Social Networks, pp. 163-168

[Dang & Viennet 2012] T. A. Dang and E. Viennet (2012) Community Detection Based on Structural and Attribute Similarities. Proc. International Conference on Digital Society (ICDS), pp. 7-14

[Duivestein et al. 2015] W. Duivesteijn, A.J. Feelders, and A. Knobbe (2015) Exceptional Model Mining - Supervised Descriptive Local Pattern Mining with Complex Target Concepts. Data Mining and Knowledge Discovery

46

slide-140
SLIDE 140

References

[Fortunato 2010] S. Fortunato (2010) Community Detection in Graphs, Physics Reports 486 (3-5)

[Freeman 1978] L. Freeman (1978) Segregation In Social Networks, Sociological Methods & Research 6 (4)

[Ge et al. 2008] R. Ge, M. Ester, B. J. Gao, Z. Hu, B. Bhattacharya, and B. Ben-Moshe (2008) Joint Cluster Analysis of Attribute Data and Relationship Data: The Connected k-Center Problem, Algorithms and Applications. ACM Trans. Knowl. Discov. Data, 2(2)

[Girvan & Newman 2002] M. Girvan, M. E. J. Newman (2002) Community Structure in Social and Biological Networks, PNAS 99 (12)

[Günnemann et al. 2013] S. Günnemann, I. Färber, B. Boden, T. Seidl (2013) GAMer: A Synthesis of Subspace Clustering and Dense Subgraph Mining. Knowledge and Information Systems (KAIS), Springer

[Kannan et al. 2004] R. Kannan, S. Vempala, A. Vetta (2004) On Clustering: Good, Bad and Spectral. Journal of the ACM, 51(3)

47

slide-141
SLIDE 141

References

[Kibanov et al. 2014] M. Kibanov, M. Atzmueller, C. Scholz, and G. Stumme (2014) Temporal Evolution of Contacts and Communities in Networks of Face-to-Face Human Interactions. Science China, 57

[Kloesgen 1996] W. Kloesgen (1996) Explora: A Multipattern and Multistrategy Discovery Assistant. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp. 249–271. AAAI Press.

[Lancichinetti 2009] A. Lancichinetti, S. Fortunato (2009) Community Detection Algorithms: A Comparative Analysis. arXiv:0908.1062

[Lazarsfield & Merton 1954] P. F. Lazarsfeld, R. K. Merton (1954) Friendship as a Social Process: A Substantive and Methodological Analysis. Freedom and Control in Modern Society, 18(1), 18-66

[Leman et al. 2008] D. Leman, A. Feelders, and A. Knobbe (2008) Exceptional Model Mining. In Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, volume 5212 of Lecture Notes in Computer Science, pages 1–16. Springer.

[Lemmerich et al. 2012] F. Lemmerich, M. Becker, and M. Atzmueller (2012) Generic Pattern Trees for Exhaustive Exceptional Model Mining. In Proc. ECML/PKDD, Heidelberg, Germany. Springer

48

slide-142
SLIDE 142

References

[Lemmerich et al. 2010] F. Lemmerich, M. Rohlfs, and M. Atzmueller (2010) Fast Discovery of Relevant Subgroup Patterns. Proc. 23rd FLAIRS Conference, AAAI Press, New York, NY, USA.

[Leskovec et al. 2010] J. Leskovec, K. J. Lang, and M. Mahoney (2010) Empirical Comparison of Algorithms for Network Community Detection. Proc. 19th International Conference on World Wide Web, pp. 631-640. ACM

[McPherson et al. 2011] M. McPherson, L. Smith-Lovin, and J. M. Cook (2001) Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology, 415-444

[Mitzlaff et al. 2014] F. Mitzlaff, M. Atzmueller, A. Hotho, and G. Stumme (2014) The Social Distributional Hypothesis: A Pragmatic Proxy for Homophily in Online Social Networks. Social Network Analysis and Mining 4(1), 216

[Mitzlaff et al. 2011] F. Mitzlaff, M. Atzmueller, D. Benz, A. Hotho, and G. Stumme (2011) Community Assessment using Evidence Networks. In Analysis of Social Media and Ubiquitous Data, volume 6904 of LNAI

[Mitzlaff et al. 2013] F. Mitzlaff, M. Atzmueller, D. Benz, A. Hotho, and G. Stumme (2013) User-Relatedness and Community Structure in Social Interaction Networks. CoRR/abs, 1309.3888

49

slide-143
SLIDE 143

References

[Moser et al. 2009] F. Moser, R. Colak, A. Rafiey, and M. Ester (2009) Mining Cohesive Patterns from Graphs with Feature Vectors. Proc. SDM (Vol. 9), pp. 593-604.

[Newman 2004] M. E. Newman (2004) Detecting Community Structure in Networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2), 321-330.

[Newman 2006] M. E. Newman (2006) Modularity and Community Structure in Networks. PNAS, 103(23), 8577-8582.

[Palla et al. 2005] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek (2005) Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society. Nature, 435(7043), 814-818

[Pool et al. 2014] S. Pool, F. Bonchi, M. van Leeuwen (2014) Description-driven Community Detection, Transactions on Intelligent Systems and Technology 5 (2)

[Psorakis et al. 2011] I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon (2011) Overlapping Community Detection using Bayesian Non-Negative Matrix Factorization. Phys. Rev. E 83, 066114

50

slide-144
SLIDE 144

References

[Puppe et al. 2008] F. Puppe, M. Atzmueller, G. Buscher, M. Huettig, H. Lührs, and H.-P. Buscher (2008) Application and Evaluation of a Medical Knowledge-System in Sonography (SonoConsult). In Proc. 18th European Conference on Artificial Intelligence (ECAI 2008), pp. 683-687

[Ruan et al. 2013] Y. Ruan, D. Fuhry, and S. Parthasarathy (2013) Efficient Community Detection in Large Networks Using Content and Links. Proc. 22nd International Conference on World Wide Web, pp. 1089–1098, ACM.

[Tang & Liu 2010] L. Tang and H. Liu (2010) Community Detection and Mining in Social Media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), 1-137. Morgan & Claypool Publishers

[Steinhaeuser & Chawla 2008] K. Steinhaeuser, N. V. Chawla (2008) Community Detection in a Large Real-World Social Network. Social Computing, Behavioral Modeling, and Prediction, pp. 168–175, Springer

[Silva et al. 2012] A. Silva, W. Meira Jr., and M. J. Zaki (2010) Structural Correlation Pattern Mining for Large Graphs. Proc. Workshop on Mining and Learning with Graphs. MLG '10, pp. 119–126. New York, NY, USA: ACM.

[Smith et al. 2014] L. M. Smith, L. Zhu, K. Lerman, and A. G. Percus (2014) Partitioning Networks with Node Attributes by Compressing Information Flow. arXiv:1405.4332

51

slide-145
SLIDE 145

References

[Scholz et al. 2013] C. Scholz, M. Atzmueller, A. Barrat, C. Cattuto, and G. Stumme (2013) New Insights and Methods For Predicting Face-To-Face Contacts. Proc. 7th Intl. AAAI Conference on Weblogs and Social Media, Palo Alto, CA, USA, AAAI Press.

[Wassermann & Faust 1994] S. Wasserman and K. Faust (1994) Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1st edition.

[Wrobel 1997] S. Wrobel (1997) An Algorithm for Multi-Relational Discovery of Subgroups. In Proc. 1st Europ. Symp. Principles of Data Mining and Knowledge Discovery, pages 78–87, Heidelberg, Germany. Springer Verlag.

[Xie et al. 2013] J. Xie, S. Kelley, and B. K. Szymanski (2013) Overlapping Community Detection in Networks: The State-of-the-art and Comparative Study. ACM Comput. Surv., 45(4):43:1–43:35.

[Xu et al. 2012] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng (2012) A Model-based Approach to Attributed Graph Clustering. Proc. ACM International Conference on Management of Data. SIGMOD '12, pp. 505–516, New York, NY, USA. ACM.

52

slide-146
SLIDE 146

References

[Yang et al. 2013] J. Yang, J. McAuley, and J. Leskovec (2013) Community Detection in Networks with Node Attributes. Proc. IEEE International Conference on Data Mining (ICDM), pp. 1151–1156. IEEE Press, Washington, DC, USA

[Zachary, 1977] W. W. Zachary (1977) An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research, 452-473.

[Zhou et al. 2009] Y. Zhou, H. Cheng, and J. X. Yu (2009) Graph Clustering Based on Structural/Attribute Similarities. Proc. VLDB Endow., 2(1), 718–729.

53

slide-147
SLIDE 147

Applications Tools Conclusions

Mining Attributed Networks

Part 4 – Applications, Tools & Conclusions
Rushed Kanawati, Martin Atzmueller

A3, Université Sorbonne Paris Cité, France
CSAI, Tilburg University, Netherlands
DSAA'17, Tokyo – 20 October 2017

1 / 53

slide-148
SLIDE 148


OUTLINE

1 Applications
  Recommender systems, Cluster ensemble selection
  Explanation-aware recommendation
2 Tools
  Multiplex analysis
  Compositional analysis
  Attributed network analysis
3 Conclusions
  Summary
  Hybrid Approaches
  Further Directions & Outlook

2 / 53

slide-149
SLIDE 149


APPLICATIONS

1 Film recommendation
2 Tag recommendation
3 Collaboration recommendation
4 Ensemble clustering selection

3 / 53

slide-150
SLIDE 150


FILM RECOMMENDATION

Film rating matrix = bipartite graph

4 / 53

slide-151
SLIDE 151


FILM RECOMMENDATION

5 / 53

slide-152
SLIDE 152


FILM RECOMMENDATION

6 / 53

slide-153
SLIDE 153


FILM RECOMMENDATION : MULTIPLEX NETWORK

Figure: MovieLens 100k multiplex (Projection by users)
Figure: MovieLens 100k multiplex (Projection by movies)

7 / 53

slide-154
SLIDE 154


FILM RECOMMENDATION : RESULTS

Simple approach: Recommend the statistical mode of the ratings on the links between the target user's cluster and the target film's cluster

Method            MAE     RMSE    Precision  Recall  F1-measure
GTM               0.9441  1.2549  0.2185     0.2207  0.2195
T. co-clustering  0.9293  1.2562  0.25587    0.2094  0.2303
muxlicod          0.9635  1.2773  0.2274     0.2134  0.2202
LA louvain        0.8352  1.1509  0.3113     0.2521  0.2779
LA walktrap       0.8216  1.1155  0.2642     0.2233  0.2420
PA louvain        0.8713  1.1917  0.2532     0.2032  0.2245
PA walktrap       0.8801  1.2023  0.2705     0.2011  0.2283

Table: Results of the proposed recommendation system with each algorithm on the MovieLens 100k dataset (PA: Partition Aggregation, LA: Layer Aggregation)
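The "simple approach" can be sketched in a few lines, assuming cluster assignments for users and films are already available from some community detection step. The function name, the dictionaries, and the toy ratings are hypothetical and only illustrate the mode-based prediction rule.

```python
from collections import Counter


def predict_rating(user, film, ratings, user_cluster, film_cluster, default=None):
    """Most frequent rating among links from the user's cluster to the film's cluster."""
    cu, cf = user_cluster[user], film_cluster[film]
    block = [r for (u, f), r in ratings.items()
             if user_cluster[u] == cu and film_cluster[f] == cf]
    if not block:
        return default  # no observed rating between the two clusters
    return Counter(block).most_common(1)[0][0]


ratings = {("u1", "f1"): 4, ("u2", "f1"): 4, ("u2", "f2"): 5, ("u3", "f3"): 2}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
film_cluster = {"f1": 0, "f2": 0, "f3": 1}
print(predict_rating("u1", "f2", ratings, user_cluster, film_cluster))
```

Ties between equally frequent ratings are resolved by insertion order here; a real system would need an explicit tie-breaking policy.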

8 / 53

slide-155
SLIDE 155


APPLICATION II: TAG RECOMMENDER (TLTR)

Figure: TLTR model

9 / 53

slide-156
SLIDE 156


FOLKSONOMY GRAPHS

Figure: Tripartite graph projection into three bipartite graphs

10 / 53

slide-157
SLIDE 157


MULTIPLEX NETWORK CONSTRUCTION

Figure: Tag multiplex network: Steps of transformation

11 / 53

slide-158
SLIDE 158


EXPERIMENTS: BIBSONOMY DATASET

# Users  # Tags  # Resources  # Edges
116      412     361          24297

Table: Bibsonomy dataset

Network   Slice          Nodes  Edges  Density
User      User-Resource  116    901    0.135
User      User-Tag       116    985    0.147
Tag       Tag-Resource   412    2496   0.0294
Tag       Tag-User       412    1956   0.0231
Resource  Resource-Tag   361    2814   0.0433
Resource  Resource-User  361    1685   0.0259

Table: Multiplex networks of Bibsonomy

12 / 53

slide-159
SLIDE 159


TAG RECOMMENDATION: RESULTS

Graphs              #Nodes  #Edges  #Users  #Tags  #Resources
G                   889     24297   116     412    361
Gc (Mux-Licod)      434     1677    97      154    183
  compression in %  51.18   93.1    16.37   62.62  49.30
Gc (GenLouvain)     16      79      4       6      6
  compression in %  98.2    99.67   96.55   98.54  98.33
Gc (LA (Licod))     91      46      13      40     38
  compression in %  89.76   99.81   88.79   90.29  89.47
Gc (LA (Louvain))   9       27      3       3      3
  compression in %  98.98   99.88   97.41   99.27  99.16
Gc (EC (Licod))     151     993     3       89     59
  compression in %  83.08   95.91   97.41   78.39  83.65
Gc (EC (Louvain))   25      187     8       11     6
  compression in %  97.18   78.96   93.10   97.33  98.33

13 / 53

slide-160
SLIDE 160


TAG RECOMMENDATION: RESULTS

Figure: Comparative study of different tag recommendation approaches in terms of precision with kt = 1

14 / 53

slide-161
SLIDE 161


TAG RECOMMENDATION: RESULTS

Figure: Comparative study of different tag recommendation approaches in terms of precision with kt = 2

15 / 53

slide-162
SLIDE 162


TAG RECOMMENDATION: RESULTS

Figure: Comparative study of different tag recommendation approaches in terms of precision with kt = 3

16 / 53

slide-163
SLIDE 163


TAG RECOMMENDATION: RESULTS

Figure: Comparative study of different tag recommendation approaches in terms of precision with kt = 4

17 / 53

slide-164
SLIDE 164


DISCUSSION

  • Multiplex approaches outperform layer aggregation and EC approaches on benchmark networks
  • Layer aggregation approaches do well for film recommendation!
  • EC approaches rank first for tag recommendation!
  • Problem: What is the validity of topological community quality indexes?

18 / 53

slide-165
SLIDE 165


APPLICATION III: SCIENTIFIC COLLABORATION RECOMMENDER

19 / 53

slide-166
SLIDE 166


LINK PREDICTION: SUPERVISED APPROACH

20 / 53

slide-167
SLIDE 167


EXPERIMENTS: DBLP

Years      Properties  Co-Author  Co-Venue  Co-Citation
1970-1973  Nodes       91         91        91
           Edges       116        1256      171
1972-1975  Nodes       221        221       221
           Edges       319        5098      706
1974-1977  Nodes       323        323       323
           Edges       451        9831      993

Table: Basic statistics about the 3-layer DBLP multiplex networks

Years (Train/Test)  Years (Labeling)  # Positive  # Negatives
1970-1973           1974-1975         16          1810
1972-1975           1976-1977         49          12141
1974-1977           1978-1979         93          26223

Table: # examples extracted from co-authorship layer (number of unconnected nodes in connected components)

21 / 53

slide-168
SLIDE 168


LINK PREDICTION: RESULTS

                      Learning: 1970-1973   Learning: 1972-1975
                      Test: 1972-1975       Test: 1974-1977
Attributes            F-measure  AUC        F-measure  AUC
Set_direct            0.0357     0.5263     0.0168     0.4955
Set_direct+indirect   0.0256     0.5372     0.0150     0.5132
Set_direct+multiplex  0.0592     0.5374     0.0122     0.5108
Set_all               0.0153     0.5361     0.0171     0.5555
Set_multiplex         0.0374     0.5181     0.0185     0.5485

Table: Comparative link prediction results applying a decision tree algorithm using different types of attributes

22 / 53

slide-169
SLIDE 169


APPLICATION IV: ENSEMBLE CLUSTERING SELECTION

Motivation
The quality of a consensus clustering depends on both the quality and diversity of input base clusterings [FL08, AF09, NCC13, ADIA15].

Problem definition
  • Let Π = {π1, . . . , πn} be a set of base partitions
  • ES(Π) = Π∗ ⊂ Π : Q(EC(Π∗)) > Q(EC(Π))
  • Q : Quality of the consensus clustering

23 / 53

slide-170
SLIDE 170


DIVERSITY

Clustering similarity measures
  • Purity
  • Rand/ARI
  • NMI (Normalized mutual information)
  • VI (Variation of information) [Mei03]
  • . . .

24 / 53

slide-171
SLIDE 171


QUALITY

Cluster internal quality indexes [AR14]
  • Silhouette index
  • Calinski-Harabasz index
  • Davies-Bouldin index
  • Dunn index
  • . . .

Network-oriented indexes
  • Modularity
  • Average conductance
  • Average local modularities: L, M, R [Kan15]
  • See also [YL12]
  • . . .

25 / 53

slide-172
SLIDE 172


ENSEMBLE SELECTION APPROACHES : LIMITATIONS

  • Existing approaches are defined for attribute/value datasets with metric distances
  • Use of one quality/diversity measure
  • Require the number of clusterings to select as input
  • . . .

Proposed approach: contributions
  • Designed for both networks and attribute/value datasets
  • Use of an ensemble of quality/diversity measures
  • The number of selected base clusterings is automatically computed

26 / 53

slide-173
SLIDE 173


ENSEMBLE SELECTION APPROACH

The idea
  • Cluster the set of base clusterings using an ensemble of similarity measures: apply a multiplex community detection algorithm to a multiplex network whose nodes are the base clusterings and whose layers are proximity graphs, each defined according to a given similarity measure
  • From each cluster, select the node (i.e., clustering) that is ranked first according to an ensemble of quality measures: apply ensemble ranking algorithms

27 / 53

slide-174
SLIDE 174


ENSEMBLE SELECTION APPROACH

Algorithm 1: Graph-based cluster ensemble selection algorithm
Require: Π = {π1, ..., πr}, a set of base clusterings
Require: S = {S1, ..., Sn}, a set of partition similarity functions
Require: Q = {Q1, ..., Qm}, a set of partition quality functions

 1: Π* ← ∅
 2: MUX ← Multiplex(Π)
 3: for all Si ∈ S do
 4:     MUX.add_layer(proximity_graph(Π, Si))
 5: end for
 6: C = {c1, ..., ck} ← community_detection(MUX)
 7: for all c ∈ C do
 8:     π̂ ← ensemble_ranking(c, Q)
 9:     Π* ← Π* ∪ {π̂}
10: end for
11: return Π*
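The control flow of Algorithm 1 can be sketched in Python as a skeleton whose three steps (proximity graph construction, multiplex community detection, ensemble ranking) are passed in as callables. Everything below is illustrative: the function names are hypothetical, and the three stand-ins are deliberately naive substitutes (a threshold graph instead of a proper proximity graph, connected components of the edges shared by all layers instead of a multiplex community detection algorithm, mean quality instead of an ensemble ranking method). They only serve to make the skeleton runnable.

```python
from itertools import combinations

def select_ensemble(partitions, similarities, qualities,
                    proximity_graph, detect_communities, rank):
    """Skeleton of Algorithm 1; returns the indices of the selected partitions."""
    # Steps 2-5: one layer (a set of index pairs) per similarity function.
    layers = [proximity_graph(partitions, s) for s in similarities]
    # Step 6: group the base clusterings into communities of the multiplex.
    communities = detect_communities(range(len(partitions)), layers)
    # Steps 7-10: keep the top-ranked clustering of each community.
    return [rank(sorted(c), partitions, qualities) for c in communities]

# --- naive stand-ins (assumptions, not the methods used on the slides) ---

def threshold_graph(partitions, similarity, threshold=0.5):
    """Link two partitions when their similarity reaches a fixed threshold."""
    return {(i, j) for i, j in combinations(range(len(partitions)), 2)
            if similarity(partitions[i], partitions[j]) >= threshold}

def components_of_shared_edges(nodes, layers):
    """Connected components of the edges that appear in every layer."""
    shared = set.intersection(*layers) if layers else set()
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in shared:
        parent[find(u)] = find(v)
    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

def best_by_mean_quality(members, partitions, qualities):
    """Pick the member whose mean quality score is highest (first on ties)."""
    return max(members,
               key=lambda i: sum(q(partitions[i]) for q in qualities) / len(qualities))
```

Passing the steps as arguments keeps the skeleton independent of any particular multiplex library, so a real community detection routine or a Borda-style ranker can replace the stand-ins without touching `select_ensemble`.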

THE PROPOSED APPROACH

[Figures: three slides illustrating the proposed approach; the images are not included in this text.]


ENSEMBLE RANKING

Problem
- Let L be a set of elements to rank by n rankers.
- Let σi be the ranking provided by ranker i.
- Goal: compute a consensus ranking of L.

Déjà vu: social choice algorithms, but ...
- Small number of voters and large number of candidates
- Algorithmic efficiency is required

Algorithms
- Borda
- Kemeny approaches (computing the Condorcet winner if it exists)
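As a small illustration of the first algorithm, here is a sketch of the Borda count for full rankings: each ranker gives n - 1 points to its first choice, n - 2 to its second, and so on, and the consensus orders elements by total points. The function name and the tie-breaking rule (order of the first ranking) are choices made for this example.

```python
def borda_consensus(rankings):
    """Consensus ranking by Borda count.

    rankings: list of lists, each a full ranking (best first) of the same elements.
    Ties are broken by the order of the first ranking.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")
    elements = list(rankings[0])
    n = len(elements)
    scores = {e: 0 for e in elements}
    for ranking in rankings:
        if len(ranking) != n or set(ranking) != set(elements):
            raise ValueError("every ranker must rank the same elements")
        for position, e in enumerate(ranking):
            scores[e] += n - 1 - position  # best gets n-1 points, worst gets 0
    # sorted() is stable, so equal scores keep the order of the first ranking.
    return sorted(elements, key=lambda e: -scores[e])
```

The cost is linear in the total length of the rankings plus one sort, which suits the setting described above (few voters, many candidates); Kemeny-style aggregation is considerably more expensive.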


EXPERIMENT ON SMALL NETWORKS WITH KNOWN GROUND TRUTH PARTITIONS

- Generation of 20 base clusterings by applying a standard label propagation algorithm
- Proximity graphs: RNG
- S = { NMI, ARI, VI }
- Q = { modularity, local modularities L, M, R }

Table: Evaluation of the proposed graph-based ensemble selection

Dataset       Approach                                NMI    ARI
Zachary       Ensemble clustering without selection   0.57   0.46
Zachary       Ensemble clustering with selection      0.77   0.69
US Politics   Ensemble clustering without selection   0.55   0.68
US Politics   Ensemble clustering with selection      0.68   0.67
Dolphins      Ensemble clustering without selection   0.55   0.39
Dolphins      Ensemble clustering with selection      0.58   0.59
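The layers used in this experiment are RNG proximity graphs. Assuming RNG stands for relative neighbourhood graph, two items i and j are linked when no third item k is closer to both of them than they are to each other. The sketch below builds such a graph from a distance matrix. The helper that turns a similarity into a distance as 1 - similarity is an assumption made for this example (it only makes sense for similarities in [0, 1]); the slides do not specify the conversion.

```python
from itertools import combinations

def distance_matrix(items, similarity):
    """Hypothetical helper: distance = 1 - similarity, 0 on the diagonal."""
    return [[0.0 if i == j else 1.0 - similarity(a, b)
             for j, b in enumerate(items)]
            for i, a in enumerate(items)]

def relative_neighbourhood_graph(dist):
    """Edges (i, j), i < j, such that no k satisfies max(d(i,k), d(j,k)) < d(i,j)."""
    n = len(dist)
    edges = set()
    for i, j in combinations(range(n), 2):
        blocked = any(max(dist[i][k], dist[j][k]) < dist[i][j]
                      for k in range(n) if k != i and k != j)
        if not blocked:
            edges.add((i, j))
    return edges
```

This direct implementation is cubic in the number of items, which is acceptable for a few dozen or a few hundred base clusterings.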


EXPERIMENT II : DBLP CO-AUTHORSHIP NETWORK

- Co-authorship network 1970-1977 (GCC): |V| = 643, |E| = 886
- Generation of 10 and 100 base clusterings
- Proximity graphs: RNG
- S = { NMI, ARI, VI }
- Q = { modularity, local modularities L, M, R }

Table: Evaluation of the proposed graph-based ensemble selection (10 base clusterings)

Measure            Without selection   With selection
Node compression   18.3%               20.9%
Edge compression   17.2%               17.6%
Modularity         0.3734              0.43756


EXPERIMENT II : DBLP CO-AUTHORSHIP NETWORK

Table: Evaluation of the proposed graph-based ensemble selection (100 base clusterings)

Measure            Without selection   With selection
Node compression   35.1%               40.3%
Edge compression   36.2%               38.3%
Modularity         0.4031              0.4665


MULTIPLEX ANALYSIS TOOLS

- muxviz: http://muxviz.net
- Muna: http://lipn.fr/muna
- Pymnet: http://people.maths.ox.ac.uk/kivela/mln_library/


MUXVIZ

- R package
- Main features:
  - Visualization
  - Layer compression methods
  - Basic metrics
  - Community detection: modularity-based, Infomap
- Input: one text file per layer + one file for the general structure


MUNA

- Available for R and Python
- Built on top of igraph
- Extended set of multiplex network editing functions (similar to igraph)
- Basic metrics: degree, neighborhood
- Extended set of community detection approaches
- Topological community evaluation indexes
- Limitations:
  - No visualisation support
  - Simple categorical coupling only


PYMNET

- Pure Python + integration with the NetworkX package
- Can handle general multilayer networks
- Rule-based generation and lazy evaluation of coupling edges
- Various network analysis methods, transformations, reading and writing networks, network models, etc.
- Visualization support


PYMNET: VISUALISATION EXAMPLE


ATTRIBUTED NETWORK ANALYSIS TOOLS

- rsubgroup: http://rsubgroup.org
- VIKAMINE [AL12]: http://vikamine.org
- COMODO [AM11, ADM16]: http://vikamine.org
- DCM [PBvL14]: http://patternsthatmatter.org/software.php#dcm
- GAMER [GFBS13]: http://dme.rwth-aachen.de/de/gamer


RSUBGROUP

Open Source: http://rsubgroup.org
- R package for subgroup discovery
- Support for detecting compositional subgroups
- Can be utilized for dyadic analysis (on attributed graphs)
- Support for ARFF and CSV files
- Simple interface to subgroup discovery functionality
- Wrapper to Java-based (efficient) implementations
- Based on the Open Source VIKAMINE system


VIKAMINE [AL12]

Open Source: http://vikamine.org
- Visual, Interactive and Knowledge-intensive Analysis and semantic MINing Environment
- Open Source Java implementation
- Efficient automatic discovery & community detection algorithms
- Seamless integration of visualization methods
- Effective visualizations for ad-hoc analysis
- Ad-hoc formalization, utilization, and extension of background knowledge
- Also works on big data (Map/Reduce)


COMODO [AM11, ADM16]

Open Source: available as a plugin to the VIKAMINE system (vikamine.org)
- Description-oriented approach for community detection on attributed networks
- Open Source Java implementation
- Algorithm based on subgroup discovery
- Top-k pattern mining
- Concise descriptions (conjunctions of features/descriptors)
- Efficient search & pruning techniques


SUMMARY

- Powerful applications are supported by multiplex and attributed network analysis.
- New tools for multiplex mining: Muna [FK15], muxviz [DPA14], Pymnet
- New tools for mining attributed networks:
  - Attributed graph clustering, e.g., GAMER [GFBS13]: http://dme.rwth-aachen.de/de/gamer
  - Description-oriented community detection: VIKAMINE, COMODO [AL12, AM11, ADM16] (www.vikamine.org)
  - Compositional subgroup analysis: VIKAMINE & rsubgroup (rsubgroup.org)


HYBRID APPROACHES

- Hybrid: combinations of attributed, multiplex, dynamic, ..., e.g., [SAB+13]
- Emerging research direction, combining different techniques
- Recent research directions regarding hybrid approaches, e.g.,
  - Subspace clustering for edge-attributed multi-graphs [BGHS17]
  - Clustering attributed multi-graphs [PPD17]
  - Modeling and comparing graph-based hypotheses on attributed multiplex networks [ASK16, ASKA17]
  - Bayesian model fitting on edge formation in attributed multi-graphs [ENLSS17]
  - Exceptional model mining (Bayesian modeling) in attributed multiplex networks, cf. [Atz16]


FURTHER DIRECTIONS & OUTLOOK

- Challenges in modeling heterogeneous data, e.g., in ubiquitous and social environments [ABK+14]
- Necessary: efficient methods and tools for mining such data
- Mining complex networks from different perspectives & dimensions
- Integration of attributed multiplex networks & temporal information
- Explanation-aware methods, providing transparent modeling, mining & explanations, e.g., [RBSLB07, ARB10]


BIBLIOGRAPHY I

[ABK+14] Martin Atzmueller, Martin Becker, Mark Kibanov, Christoph Scholz, Stephan Doerfel, Andreas Hotho, Bjoern-Elmar Macek, Folke Mitzlaff, Juergen Mueller, and Gerd Stumme, Ubicon and its Applications for Ubiquitous Social Computing, New Review of Hypermedia and Multimedia 20 (2014), no. 1, 53–77.

[ADIA15] Ebrahim Akbari, Halina Mohamed Dahlan, Roliana Ibrahim, and Hosein Alizadeh, Hierarchical cluster ensemble selection, Engineering Applications of Artificial Intelligence 39 (2015), 146–156.

[ADM16] Martin Atzmueller, Stephan Doerfel, and Folke Mitzlaff, Description-Oriented Community Detection using Exhaustive Subgroup Discovery, Information Sciences 329 (2016), 965–984.

[AF09] Javad Azimi and Xiaoli Fern, Adaptive cluster ensemble selection, IJCAI (Craig Boutilier, ed.), 2009, pp. 992–997.

[AL12] Martin Atzmueller and Florian Lemmerich, VIKAMINE - Open-Source Subgroup Discovery, Pattern Mining, and Analytics, Proc. ECML/PKDD 2012: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (Heidelberg, Germany), Springer Verlag, 2012.


BIBLIOGRAPHY II

[AM11] Martin Atzmueller and Folke Mitzlaff, Efficient Descriptive Community Mining, Proc. 24th International FLAIRS Conference (Palo Alto, CA, USA), AAAI Press, 2011, pp. 459–464.

[AR14] Charu C. Aggarwal and Chandan K. Reddy (eds.), Data clustering: Algorithms and applications, CRC Press, 2014.

[ARB10] Martin Atzmueller and Thomas Roth-Berghofer, The Mining and Analysis Continuum of Explaining Uncovered, Proc. 30th SGAI International Conference on Artificial Intelligence (AI-2010), 2010.

[ASK16] Martin Atzmueller, Andreas Schmidt, and Mark Kibanov, DASHTrails: An Approach for Modeling and Analysis of Distribution-Adapted Sequential Hypotheses and Trails, Proc. WWW 2016 (Companion), IW3C2 / ACM, 2016.

[ASKA17] Martin Atzmueller, Andreas Schmidt, Benjamin Kloepper, and David Arnu, HypGraphs: An Approach for Analysis and Assessment of Graph-Based and Sequential Hypotheses, New Frontiers in Mining Complex Patterns. Postproceedings NFMCP 2016 (Heidelberg, Germany), LNAI, Springer, 2017.


BIBLIOGRAPHY III

[Atz16] Martin Atzmueller, Detecting Community Patterns Capturing Exceptional Link Trails, Proc. IEEE/ACM ASONAM (Boston, MA, USA), IEEE Press, 2016.

[BGHS17] Brigitte Boden, Stephan Günnemann, Holger Hoffmann, and Thomas Seidl, MiMAG: mining coherent subgraphs in multi-layer graphs with edge labels, Knowledge and Information Systems 50 (2017), no. 2, 417–446.

[DPA14] Manlio De Domenico, Mason A. Porter, and Alex Arenas, Multilayer analysis and visualization of networks, J. Complex Netw. (2014) 10 (2014).

[ENLSS17] Lisette Espín-Noboa, Florian Lemmerich, Markus Strohmaier, and Philipp Singer, JANUS: A hypothesis-driven Bayesian approach for understanding edge formation in attributed multigraphs, Applied Network Science 2 (2017), no. 1, 16.

[FK15] Issam Falih and Rushed Kanawati, Muna: A multiplex network analysis library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (Paris), August 2015, pp. 757–760.


BIBLIOGRAPHY IV

[FL08] Xiaoli Z. Fern and Wei Lin, Cluster ensemble selection, Statistical Analysis and Data Mining 1 (2008), no. 3, 128–141.

[GFBS13] Stephan Günnemann, Ines Färber, Brigitte Boden, and Thomas Seidl, GAMer: A Synthesis of Subspace Clustering and Dense Subgraph Mining, Knowledge and Information Systems (KAIS), Springer, 2013.

[Kan15] Rushed Kanawati, Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks, Neurocomputing 150, B (2015), 417–427.

[Mei03] Marina Meila, Comparing clusterings by the variation of information, COLT (Bernhard Schölkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.

[NCC13] Murilo Coelho Naldi, André C. P. L. F. Carvalho, and Ricardo J. G. B. Campello, Cluster ensemble selection based on relative validity indexes, Data Min. Knowl. Discov. 27 (2013), no. 2, 259–289.


BIBLIOGRAPHY V

[PBvL14] Simon Pool, Francesco Bonchi, and Matthijs van Leeuwen, Description-driven community detection, ACM TIST 5 (2014), no. 2, 28.

[PPD17] Andreas Papadopoulos, George Pallis, and Marios D. Dikaiakos, Weighted clustering of attributed multi-graphs, Computing 99 (2017), no. 9, 813–840.

[RBSLB07] Thomas Roth-Berghofer, Stefan Schulz, David Leake, and Daniel Bahls, Explanation-Aware Computing, AI Magazine 28 (2007), no. 4.

[SAB+13] Christoph Scholz, Martin Atzmueller, Alain Barrat, Ciro Cattuto, and Gerd Stumme, New Insights and Methods For Predicting Face-To-Face Contacts, Proc. 7th Intl. AAAI Conference on Weblogs and Social Media (Palo Alto, CA, USA) (Emre Kiciman, Nicole B. Ellison, Bernie Hogan, Paul Resnick, and Ian Soboroff, eds.), AAAI Press, 2013.


BIBLIOGRAPHY VI

[YL12] Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.