Mining Interesting Link Formation Rules in Social Networks
Cane Wing-Ki Leung, Ee-Peng Lim, David Lo, Jianshu Weng
School of Information Systems, Singapore Management University
Outline
- Introduction
- Methodology
- Empirical Study
- Conclusions
2 CIKM'10
27/10/2010
Introduction
- Propose the task of mining interesting link formation rules in social networks
- Goal: examine how links are formed in social networks as a structural effect
Example: Reciprocity Effect
- A simple example, the reciprocity effect:
– Given a pair of nodes, called the start node s and the end node e
– Suppose we know that e trusts s at a certain time point. Questions:
- Will s also trust e later?
- How frequently/likely will this happen?
- What other connections between s and e may lead to link formation?
More on the task
- Will s trust e later?
– A temporal constraint
– A partial order in which “s trusts e” is formed after all other links connecting s and e
- How frequently/likely will this happen?
– Quantifying the interestingness of the observed patterns
More on the task
- What other connections between s and e may lead to link formation?
– Structural constraints require that s and e be connected in some way
– We consider dyadic and triadic structures, aka local structures, as they have long been used in sociology for studying and predicting the dynamics of large, complex networks
– Seek to mine interesting patterns that obey such constraints
Methodology
- We propose to study local structures for link formation in social networks
– Introduce link formation rules (LF-rules) as special subgraph patterns
– Formulate our task as a subgraph mining task in a social network, modeled as a directed, labeled, temporal graph
– Devise a subgraph mining approach (introduced next)
– Apply the proposed approach to two real-world datasets
Methodology Overview
– Mine LF-rules from a given social network
– Apply a randomization technique to the network to estimate the expected support of LF-rules in a random graph
– Evaluate interesting rules with higher-than-expected support
LF-Patterns and Rules
- LF-pattern:
– a graph pattern built upon dyadic and/or triadic structures
– in any actual occurrence of an LF-pattern, the link from s to e, or simply (s,e), is formed after all other links in the same pattern
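The temporal condition above can be illustrated with a minimal Python sketch. The `Link` record and the `is_lf_occurrence` helper are hypothetical names introduced here for illustration, not part of the presented approach; the sketch assumes integer timestamps and treats ties as violating the "formed after" requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical record for one directed, labeled, timestamped link."""
    src: str
    dst: str
    label: str  # e.g. "+" for trust/friend, "-" for distrust/foe
    time: int   # creation timestamp (assumed to be an integer)


def is_lf_occurrence(links, s, e):
    """Return True if (s, e) was formed strictly after every other link given."""
    target = [l for l in links if l.src == s and l.dst == e]
    others = [l for l in links if not (l.src == s and l.dst == e)]
    # Exactly one (s, e) link and at least one precondition link are required.
    if len(target) != 1 or not others:
        return False
    return all(o.time < target[0].time for o in others)


# A triadic example: s links to v, v links to e, and only then s links to e.
triad = [Link("s", "v", "+", 1), Link("v", "e", "+", 2), Link("s", "e", "+", 3)]
print(is_lf_occurrence(triad, "s", "e"))
```

If the (v, e) link were created after (s, e), the same three links would not count as an occurrence.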
LF-Patterns and Rules
- LF-rule:
– generated from an LF-pattern
– consists of a precondition and a postcondition
– the (s,e) link in our illustrations is always the postcondition
Mining LF-Rules
- LF-patterns define the structural constraints of LF-rules
– each pattern captures the formation of a link from a node s, called the start node, to another node e, called the end node
- Mining LF-rules:
– we are given a graph G, a predefined minimum frequency (support) and a predefined minimum confidence
– find all LF-patterns that satisfy the frequency threshold
– generate LF-rules from the frequent LF-patterns and compute their confidence values
– retain those that satisfy the confidence threshold
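The two thresholds act as successive filters. The sketch below shows only that filtering step, assuming support and confidence values have already been computed; `select_rules`, the pattern ids, and the numbers are hypothetical, and the sketch assumes one rule per pattern for simplicity.

```python
def select_rules(pattern_support, rule_confidence, min_support, min_confidence):
    """Two-stage filter: frequent patterns first, then confident rules.

    pattern_support: dict mapping a pattern id to its support in [0, 1].
    rule_confidence: dict mapping a pattern id to the confidence of the
        rule generated from it (hypothetical one-rule-per-pattern setup).
    """
    # Stage 1: keep only patterns that meet the frequency threshold.
    frequent = {p for p, supp in pattern_support.items() if supp >= min_support}
    # Stage 2: among those, retain rules that meet the confidence threshold.
    return sorted(p for p in frequent if rule_confidence.get(p, 0.0) >= min_confidence)


# Illustrative values only; they are not taken from the datasets.
support = {"P1": 0.24, "P2": 0.03, "P3": 0.29}
confidence = {"P1": 0.32, "P2": 0.90, "P3": 0.05}
print(select_rules(support, confidence, min_support=0.10, min_confidence=0.20))
```

Filtering on support first means confidence only has to be computed for patterns that are already frequent.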
Mining LF-Rules
- Each LF-rule is associated with
– a support value: the percentage of nodes in G that served as the start node s of the rule at least once
– a confidence value: the likelihood that the (s,e) link exists given that the precondition connecting s and e exists
- Example (the reciprocity rule):
– Support: ~24% of nodes in G served as node s of this rule
– Confidence: Among the nodes that received a link from another node, ~32% of them reciprocated the link
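For the reciprocity rule, the two measures can be sketched as node counts. This is a minimal illustration under stated assumptions, not the mining algorithm itself: `reciprocity_rule_stats` is a hypothetical helper, links are plain `(src, dst, time)` tuples with at most one link per ordered pair, and "nodes in G" is taken to mean nodes that appear in at least one link.

```python
def reciprocity_rule_stats(edges):
    """Return (support, confidence) of the reciprocity rule.

    edges: iterable of (src, dst, time) tuples, at most one per ordered pair.
    """
    time_of = {(u, v): t for u, v, t in edges}
    nodes = {n for (u, v) in time_of for n in (u, v)}
    if not nodes:
        return 0.0, 0.0
    received = set()      # nodes matching the precondition: some e -> s exists
    reciprocated = set()  # nodes that later formed s -> e at least once
    for (e, s), t_in in time_of.items():
        received.add(s)
        t_out = time_of.get((s, e))
        if t_out is not None and t_out > t_in:
            reciprocated.add(s)
    support = len(reciprocated) / len(nodes)
    confidence = len(reciprocated) / len(received)
    return support, confidence


# Toy network: b reciprocates a's link; a and d receive links without reciprocating later.
toy = [("a", "b", 1), ("b", "a", 2), ("c", "a", 3), ("a", "d", 4)]
print(reciprocity_rule_stats(toy))
```

In the toy network one of four nodes serves as s of the rule, and one of the three nodes that received a link reciprocated it.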
Graph Randomization
- Why?
– LF-rules may exist in the network just by chance
- How?
– One possibility is graph randomization: randomize an input graph G, but preserve important nodal properties
– Compute the support of LF-rules from the randomized graph, called expected support
- We randomized the connectivity in G while preserving its in-degree, out-degree, label and timestamp distributions
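One common way to randomize connectivity while keeping degrees fixed is repeated end-point swapping between pairs of links. The slides do not state which procedure was used, so the sketch below is an assumption: `randomize_links` is a hypothetical helper, links are `(src, dst, label, time)` tuples with at most one link per ordered pair, and each link keeps its own label and timestamp when its end node is swapped.

```python
import random


def randomize_links(links, n_swaps, seed=0):
    """Rewire links by swapping end nodes between randomly chosen link pairs.

    Every node keeps its in-degree and out-degree, and the multiset of
    (label, time) values is unchanged. Self-loops and duplicates are rejected.
    """
    rng = random.Random(seed)
    links = list(links)
    if len(links) < 2:
        return links
    present = {(u, v) for u, v, _, _ in links}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(links)), 2)
        a, b, l1, t1 = links[i]
        c, d, l2, t2 = links[j]
        # Proposed rewiring: (a -> b), (c -> d) becomes (a -> d), (c -> b).
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in present or (c, b) in present:
            continue  # would duplicate an existing link
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        links[i] = (a, d, l1, t1)
        links[j] = (c, b, l2, t2)
    return links


toy = [("a", "b", "+", 1), ("b", "c", "+", 2), ("c", "d", "-", 3),
       ("d", "a", "+", 4), ("a", "c", "-", 5)]
print(randomize_links(toy, n_swaps=50, seed=1))
```

Rejected proposals are simply skipped, so the number of successful swaps can be lower than `n_swaps`.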
Measuring (Un)Expectedness
- Expected support of a rule w.r.t. G
– its support in G’, the randomized version of G
- Surprise of a rule
– support divided by expected support of the rule
– the higher the value, the more “surprising” or “unexpected” the rule
- If link formation does follow some rules, we should expect those rules to have higher support in G than in G’
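Written as a formula, the surprise of a rule r is the ratio below. The notation supp_G(r) for the support of r in graph G is an assumption introduced here; the slides give the definition only in words.

```latex
\[
\mathrm{surprise}(r) \;=\; \frac{\mathrm{supp}_{G}(r)}{\mathrm{supp}_{G'}(r)}
\]
```

A value above 1 means the rule is supported by more nodes in G than in its randomized counterpart G’.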
Summary of Methodology
- Introduce LF-patterns and LF-rules
– capture structural and temporal constraints
- Devise a subgraph mining algorithm to find and count such patterns in a graph G
– output: a set of LF-rules R with sufficient support and confidence
- Conduct graph randomization on G
– measure the expected support and surprise values of all rules in R
- Present interesting rules in R with high surprise values
Datasets
- Epinions
– Web of Trust, with trust (+ve) and distrust (-ve) links
- myGamma, courtesy of BuzzCity
– friendship network, with friends (+ve) and foe (-ve) links
- Expected support computed based on 10 randomized samples of the graphs
Interesting LF-rules in myGamma
- We focus on myGamma, for which the complete history and ordering of friendship links are available
- Top-5 rules in terms of support
– report their interestingness scores: support, expected support, surprise (supp/exp. supp), and confidence
Interestingness scores
support   expected support   surprise (supp/exp. supp)   confidence
28.91%    22.41%             1.29                        43.22%
28.38%    22.37%             1.27                        43.1%
25.42%    13.54%             1.88                        39.15%
24.37%    1.22%              20.06                       31.98%
20.55%    11.49%             1.79                        27.52%
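The surprise column can be rechecked from the other two columns. The short sketch below recomputes support divided by expected support for each row; the `surprise` helper is a name introduced here, and the inputs are the rounded percentages shown on the slide.

```python
def surprise(support, expected_support):
    """Surprise of a rule: support divided by expected support."""
    return support / expected_support


# (support %, expected support %, reported surprise), copied from the slide.
rows = [
    (28.91, 22.41, 1.29),
    (28.38, 22.37, 1.27),
    (25.42, 13.54, 1.88),
    (24.37, 1.22, 20.06),
    (20.55, 11.49, 1.79),
]

for supp, exp_supp, reported in rows:
    print(f"{supp:.2f} / {exp_supp:.2f} = {surprise(supp, exp_supp):.2f} (reported {reported})")
```

Four rows reproduce the reported value to two decimals. The fourth row gives about 19.98 rather than 20.06, a gap that is consistent with the expected support having been rounded to 1.22% for display.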
Other Observations
- Users tend to rely more on mutually trusted friends in forming new friendship links. For example,
– R12 (right) has much higher confidence (~34% vs. ~22%) and surprise values (5.32 vs. 3.52) than R11 (left)
Other Observations
- In myGamma, 3.45% of users reciprocated a friend link from another user with a foe link, but the likelihood of doing so (15.98%) is much lower than that of reciprocating with a friend link (31.98%)
– probably due to “unwanted friendship”
– not frequent/interesting in Epinions, as an “unwanted trustor” is not an issue there
Other Observations
- If a user has formed a link based on a given precondition through an intermediary (e.g. common friend), then there is a good chance that s(he) has formed a link based on multiple occurrences of the same precondition
– 29% of users support R5 (left)
- About two-thirds of them also support R32 (middle)
- About one-third of them also support R34 (right)
Outline
- Background
– our task and motivations
- Methodology
- Results on myGamma
- Conclusions
Conclusions
- We proposed the task of mining interesting link formation rules in social networks
– Introduced the notions of LF-patterns and LF-rules, in which a new link between a node pair is formed as a structural effect of preexisting links
– Formulated as a subgraph mining task from a directed, labeled, temporal graph
- Proposed a comprehensive subgraph mining approach
– Devised an LF-rule mining algorithm based on gSpan
– Presented LF-rules with higher-than-expected support
Thank You!