SLIDE 1

Mining Interesting Link Formation Rules in Social Networks

Cane Wing-Ki Leung, Ee-Peng Lim, David Lo, Jianshu Weng
School of Information Systems, Singapore Management University

SLIDE 2

Outline

Introduction
Methodology
Empirical Study
Conclusions

2 CIKM'10

27/10/2010

SLIDE 3

Introduction

Propose the task of mining interesting link formation rules in social networks

Goal: examine how links are formed in social networks as a structural effect

SLIDE 4

Example: Reciprocity Effect

A simple example – reciprocity effect:

– Given a pair of nodes, called the start node s and the end node e
– Suppose we know that e trusts s at a certain time point. Questions:

Will s also trust e later?
How frequently/likely will this happen?
What other connections between s and e may lead to link formation?

SLIDE 5

More on the task

Will s trust e later?

– A temporal constraint
– A partial order in which “s trusts e” is formed after all other links connecting s and e

How frequently/likely will this happen?

– Quantifying the interestingness of the observed patterns

SLIDE 6

More on the task

What other connections between s and e may lead to link formation?

– Structural constraints require that s and e be connected in some way
– We consider dyadic and triadic structures, also known as local structures, as they have long been used in sociology for studying and predicting the dynamics of large, complex networks
– Seek to mine interesting patterns that obey such constraints

SLIDE 7

Outline

Introduction
Methodology
Empirical Study
Conclusions

SLIDE 8

Methodology

We propose to study local structures for link formation in social networks

– Introduce link formation rules (LF-rules) as special subgraph patterns
– Formulate our task as a subgraph mining task in a social network, modeled as a directed, labeled, temporal graph
– Devise a subgraph mining approach (introduced next)
– Apply the proposed approach to two real-world datasets

SLIDE 9

Methodology Overview

– Mine LF-rules from a given social network
– Apply a randomization technique to the network to estimate the expected support of LF-rules in a random graph
– Evaluate interesting rules with higher-than-expected support

SLIDE 10

LF-Patterns and Rules

LF-pattern:

– a graph pattern built upon dyadic and/or triadic structures
– in any actual occurrence of an LF-pattern, the link from s to e, or simply (s,e), is formed after all other links in the same pattern
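
The temporal constraint above can be illustrated with a minimal Python sketch. It assumes an occurrence is given as a mapping from directed links to integer timestamps, and that "after" means strictly later; the function name and data layout are hypothetical and not taken from the presentation.

```python
from typing import Dict, Tuple

Link = Tuple[str, str]  # (source node, target node)


def is_valid_occurrence(timestamps: Dict[Link, int], s: str, e: str) -> bool:
    """Check that (s, e) is formed strictly after every other link.

    `timestamps` maps each link of one candidate occurrence to the time
    it was formed (hypothetical representation, for illustration only).
    """
    target = (s, e)
    if target not in timestamps:
        return False  # the link (s, e) must be part of the occurrence
    t_target = timestamps[target]
    return all(t < t_target for link, t in timestamps.items() if link != target)


# A triadic example: s links to m, m links to e, and only then s links to e.
triad = {("s", "m"): 1, ("m", "e"): 2, ("s", "e"): 3}
# Same structure, but (m, e) appears after (s, e), so the constraint fails.
late_triad = {("s", "m"): 1, ("m", "e"): 5, ("s", "e"): 3}
```

Here `triad` satisfies the constraint while `late_triad` does not, even though both have the same structure.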

SLIDE 11

LF-Patterns and Rules

LF-rule:

– generated from an LF-pattern
– consists of a precondition and a postcondition
– the (s,e) link in our illustrations is always the postcondition

SLIDE 12

Mining LF-Rules

LF-patterns define the structural constraints of LF-rules

– capture the formation of a link from a node s, called the start node, to another node e, called the end node

Mining LF-rules:

– we are given a graph G, a predefined minimum frequency (support) and a predefined minimum confidence
– find all LF-patterns that satisfy the frequency threshold
– generate LF-rules from the frequent LF-patterns and compute their confidence values
– retain those that satisfy the confidence threshold
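
The two-stage thresholding in the steps above can be sketched as follows. This is only an illustration: it assumes support and confidence have already been computed for each candidate, and the names `RuleStats` and `select_rules` and the toy values are hypothetical. It does not reproduce the subgraph mining step itself.

```python
from typing import Dict, List, NamedTuple


class RuleStats(NamedTuple):
    support: float     # fraction of nodes serving as s at least once
    confidence: float  # likelihood of (s, e) given the precondition


def select_rules(stats: Dict[str, RuleStats],
                 min_support: float,
                 min_confidence: float) -> List[str]:
    """Keep frequent patterns first, then rules that are also confident."""
    # Stage 1: frequency (support) threshold on the patterns.
    frequent = {name: s for name, s in stats.items() if s.support >= min_support}
    # Stage 2: confidence threshold on the rules generated from them.
    return sorted(name for name, s in frequent.items()
                  if s.confidence >= min_confidence)


# Toy values, made up for illustration only.
toy_stats = {
    "r1": RuleStats(support=0.30, confidence=0.40),
    "r2": RuleStats(support=0.05, confidence=0.90),  # fails the support threshold
    "r3": RuleStats(support=0.20, confidence=0.10),  # fails the confidence threshold
}
selected = select_rules(toy_stats, min_support=0.10, min_confidence=0.30)
```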

SLIDE 13

Mining LF-Rules

Each LF-rule is associated with

– a support value: the percentage of nodes in G that served as the node s of the rule at least once
– a confidence value: the likelihood that the (s,e) link exists given that the precondition connecting s and e exists

Example:

– Support: ~24% of nodes in G served as node s of this rule
– Confidence: among the nodes that received a link from another node, ~32% reciprocated the link
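
As a rough illustration of these two measures, the sketch below computes them for the reciprocity rule on a toy temporal graph. It assumes node-based counting for confidence, following the wording of the example above, and a simple (source, target, timestamp) link format; the function name and data are hypothetical.

```python
from typing import Iterable, List, Tuple

TimedLink = Tuple[str, str, int]  # (source, target, timestamp)


def reciprocity_support_confidence(links: List[TimedLink],
                                   nodes: Iterable[str]) -> Tuple[float, float]:
    """Support and confidence of the rule "e links to s, then s links to e"."""
    time_of = {(u, v): t for u, v, t in links}
    precondition_nodes = set()  # nodes s that received some link e -> s
    rule_nodes = set()          # nodes s that later linked back to e
    for (e, s), t_in in time_of.items():
        precondition_nodes.add(s)
        t_back = time_of.get((s, e))
        # The postcondition (s, e) must be formed after the precondition.
        if t_back is not None and t_back > t_in:
            rule_nodes.add(s)
    n_nodes = len(set(nodes))
    support = len(rule_nodes) / n_nodes if n_nodes else 0.0
    confidence = (len(rule_nodes) / len(precondition_nodes)
                  if precondition_nodes else 0.0)
    return support, confidence


# Toy graph: only b reciprocates a link it received earlier (a -> b, then b -> a).
toy_links = [("a", "b", 1), ("b", "a", 2), ("a", "c", 3), ("d", "a", 4)]
support, confidence = reciprocity_support_confidence(toy_links, "abcd")
```

In this toy graph one of four nodes serves as s of the rule (support 0.25), and one of the three nodes that received a link reciprocated it (confidence 1/3).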

SLIDE 14

Graph Randomization

Why?

– LF-rules may exist in the network just by chance

How?

– One possibility is graph randomization: randomize an input graph G, but preserve important nodal properties
– Compute the support of LF-rules from the randomized graph, called expected support

We randomized the connectivity in G while preserving its in-degree, out-degree, label and timestamp distributions
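
The slides do not spell out the randomization procedure. One common technique with these properties is degree-preserving edge swapping, sketched below under that assumption: two links exchange their targets, so every node keeps its in-degree and out-degree, and each link keeps its own label and timestamp. The function names and the link format are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Link = Tuple[str, str, str, int]  # (source, target, label, timestamp)


def randomize_graph(links: List[Link], n_swaps: int, seed: int = 0) -> List[Link]:
    """Rewire links by swapping targets (assumes no duplicate links in input)."""
    rng = random.Random(seed)
    result = list(links)
    if len(result) < 2:
        return result
    present = {(u, v) for u, v, _, _ in result}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(result)), 2)
        a, b, label1, t1 = result[i]
        c, d, label2, t2 = result[j]
        # Reject swaps that would create a self-loop or a duplicate link.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        result[i] = (a, d, label1, t1)  # a keeps its out-degree, d its in-degree
        result[j] = (c, b, label2, t2)
    return result


def degree_counts(links: List[Link]) -> Tuple[Counter, Counter]:
    """Return (out-degree, in-degree) counters for a list of links."""
    return Counter(u for u, *_ in links), Counter(v for _, v, *_ in links)


toy_graph = [("a", "b", "+", 1), ("b", "c", "+", 2), ("c", "d", "-", 3),
             ("d", "a", "+", 4), ("a", "c", "-", 5)]
shuffled = randomize_graph(toy_graph, n_swaps=50, seed=42)
```

Because labels and timestamps travel with the links, their overall distributions are unchanged, while the temporal order of links inside any given local structure is scrambled.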

SLIDE 15

Measuring (Un)Expectedness

Expected Support of a rule w.r.t. G

– its support in G’, the randomized version of G

Surprise of a rule

– support divided by expected support of a rule
– the higher, the more “surprising” or “unexpected”

If link formation does follow some rules, we should expect those rules to have higher support in G than in G’
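
A minimal sketch of the surprise measure follows. It assumes the expected support is the mean of the rule's support over several randomized copies of G; averaging is an assumption here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def surprise(support: float, randomized_supports: Sequence[float]) -> float:
    """Ratio of observed support to expected support.

    `randomized_supports` holds the rule's support in each randomized
    graph G'; their mean is used as the expected support (an assumption).
    """
    expected = mean(randomized_supports)
    if expected == 0:
        return float("inf")  # the rule never occurs by chance in the samples
    return support / expected


# Toy values: the rule is twice as frequent in G as in the randomized graphs.
toy_surprise = surprise(0.5, [0.25, 0.25])
```

A value well above 1 suggests the rule occurs more often than the degree, label and timestamp distributions alone would explain.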

SLIDE 16

Summary of Methodology

Introduce LF-patterns and LF-rules

– capture structural and temporal constraints

Devise a subgraph mining algorithm to find and count such patterns in a graph G

– output: a set of LF-rules R with sufficient support and confidence

Conduct graph randomization on G

– measure the expected support and surprise values of all rules in R

Present interesting rules in R with high surprise values

SLIDE 17

Outline

Introduction
Methodology
Empirical Study
Conclusions

SLIDE 18

Datasets

Epinions

– Web of Trust, with trust (+ve) and distrust (-ve) links

myGamma, courtesy of BuzzCity

– friendship network, with friends (+ve) and foe (-ve) links

Expected support computed based on 10 randomized samples of the graphs

SLIDE 19

Interesting LF-rules in myGamma

We focus on myGamma, for which the complete history and ordering of friendship links are available

Top-5 rules in terms of support

– we report their interestingness scores in terms of support, expected support, surprise (supp/exp. supp), and confidence

SLIDE 20

Interestingness scores

support   expected support   surprise (supp/exp. supp)   confidence
28.91%    22.41%             1.29                        43.22%
28.38%    22.37%             1.27                        43.1%
25.42%    13.54%             1.88                        39.15%
24.37%    1.22%              20.06                       31.98%
20.55%    11.49%             1.79                        27.52%

SLIDE 21

Other Observations

Users tend to rely more on mutually trusted friends in forming new friendship links. For example,

– R12 (right) has much higher confidence (~34% vs. ~22%) and surprise values (5.32 vs. 3.52) than R11 (left)

SLIDE 22

Other Observations

In myGamma, 3.45% of users reciprocated a friend link from another user with a foe link, but with a much lower likelihood (15.98%) than reciprocal friend links (31.98%)

– probably due to “unwanted friendship”
– not frequent/interesting in Epinions, as “unwanted trustor” is not an issue

SLIDE 23

Other Observations

If a user has formed a link based on a given precondition through an intermediary (e.g. common friend), then there is a good chance that s(he) has formed a link based on multiple occurrences of the same precondition

– 29% of users support R5 (left)

About two-thirds of them also support R32 (middle)
About one-third of them also support R34 (right)

SLIDE 24

Outline

Background

– our task and motivations

Methodology
Results on myGamma
Conclusions

SLIDE 25

Conclusions

We proposed the task of mining interesting link formation rules in social networks

– Introduced the notions of LF-patterns and LF-rules, in which a new link between a node pair is formed as a structural effect of preexisting links
– Formulated as a subgraph mining task on a directed, labeled, temporal graph

Proposed a comprehensive subgraph mining approach

– Devised an LF-rule mining algorithm based on gSpan
– Presented LF-rules with higher-than-expected support

SLIDE 26

Thank You!
