Social Network Analysis (mohamed.bouguessa@uqo.ca) - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Social Network Analysis

mohamed.bouguessa@uqo.ca

slide-2
SLIDE 2

2

Web today

slide-3
SLIDE 3

3

Web today: Diverse applications

slide-4
SLIDE 4

4

Web today: Millions of users

slide-5
SLIDE 5

5

Web today: Rich content

slide-6
SLIDE 6

6

Web today: Highly dynamic

slide-7
SLIDE 7

7

Web today: Traces of activity

slide-8
SLIDE 8

8

Web today: Rich interactions

Rich interactions between users and content

slide-9
SLIDE 9

9

Web today: Interaction networks

Rich interactions between users and content, modeled as an interaction network

slide-10
SLIDE 10

10

Six degrees of separation

We can all be connected through a series of six contacts appeals to me. It makes the world seem less brutal, and more warm and more friendly.

slide-11
SLIDE 11

11

Six degrees of separation

slide-12
SLIDE 12

12

Testing the small-world hypothesis

Network of who talks to whom on MSN Messenger: 240M nodes, 1.3 billion edges

  • Average path length is 6.6
  • 90% of nodes are reachable in fewer than 8 steps
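The two statistics quoted on this slide can be computed on any graph with breadth-first search. The following is a minimal sketch on a hypothetical five-node toy graph (not the MSN Messenger data); the function names and the `max_steps` parameter are illustrative assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the hop count from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_length_stats(graph, max_steps):
    """Average shortest-path length over connected ordered pairs, and the
    fraction of those pairs that are at most max_steps hops apart."""
    total = 0
    pairs = 0
    within = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target == source:
                continue
            total += d
            pairs += 1
            if d <= max_steps:
                within += 1
    return total / pairs, within / pairs


# Hypothetical toy graph (a simple path 1-2-3-4-5), stored as adjacency lists.
toy = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
avg_len, frac = path_length_stats(toy, max_steps=2)
```

Running one BFS per node is only practical for small graphs; at the scale quoted on the slide, such statistics are typically estimated from a sample of source nodes.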

slide-13
SLIDE 13

13

Why study networks?

  • Build understanding and theory:
    – How do users create content and interact with it and among themselves?
  • Build better online applications:
    – How to design better services and algorithms?

slide-14
SLIDE 14

14

Social Network Analysis

  • A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest.
  • Social network analysis (SNA) is the study of social networks to understand their structure and behavior.

slide-15
SLIDE 15

15

Social Networks

  • Social network: relationship among interacting units.

slide-16
SLIDE 16

16

Social Networks

Interacting units: actors / nodes (discrete individual, corporate, or collective social units)

slide-17
SLIDE 17

17

Social Networks

Relations, linkages or ties: relational ties between actors are channels for the transfer, exchange, or flow of resources.

slide-18
SLIDE 18

18

Social Networks

  • Social network representation
    – Adjacency matrix (socio-matrix)
    – Graph (socio-graph)

[Figure: example network over nodes 1 to 9, shown with its adjacency matrix in which 1 entries mark ties.]
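The two representations named above can be related with a few lines of code. This is a minimal sketch assuming an undirected network with nodes numbered from 1; the edge list is a hypothetical example, not the nine-node network from the slide.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 socio-matrix from an undirected edge list.

    Nodes are numbered 1..num_nodes; entry [i-1][j-1] is 1 when i and j are tied.
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    return matrix


# Hypothetical four-actor network given as a list of ties (the socio-graph).
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
A = adjacency_matrix(4, edges)

# Row sums give each actor's degree (number of ties).
degrees = [sum(row) for row in A]
```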

slide-19
SLIDE 19

19

Key Drivers for CS Research in SNA

  • Computer Science has created the cyber infrastructure for
    – Social interaction
    – Knowledge exchange
    – Knowledge discovery
  • Ability to capture
    – different data about various types of social interactions
    – at a very fine granularity
    – with practically no reporting bias

Data mining techniques can be used for building descriptive and predictive models of social interactions

slide-20
SLIDE 20

20

SNA Techniques

Prominent problems

  • Social network extraction/construction
  • Identifying prominent/trusted/expert actors
  • Identifying spammers
  • Discovering communities in social networks
  • Link prediction
  • Approximating large social networks
  • Evolution of social networks
slide-21
SLIDE 21

21

Social Network Extraction

  • Mining a social network from data sources
  • Recent research suggests that there are three sources of social network data on the web:
    – Content available on web pages (e.g. user homepages, message threads, etc.)
    – User interaction logs (e.g. email and messenger chat logs)
    – Social interaction information provided by users (e.g. social network service websites such as Orkut, Friendster and MySpace)

[Figure: the three data sources: Web documents, communication logs, and actor profiles on a social network service (Profile_1 to Profile_5).]

slide-22
SLIDE 22

22

Social Network Extraction

  • Extracting a social network
    – Asking people about their relations
    – Tracking their contacts (emails, phone calls, visits, etc.), as in the Enron project
    – Mining their contextual data (papers, interviews, resumes, news, biographies, citations, references, web pages, blogs, portfolios, etc.) → Learning social network

slide-23
SLIDE 23

23

Learning Social Networks

  • Learning social network from text
    – Descriptive vs. predictive model
    – We only predict the possible relations between the actors

slide-24
SLIDE 24

24

Learning Social Networks

Usually, we can reach documents by knowing people…

slide-25
SLIDE 25

25

Learning Social Networks

…and directly or indirectly we will know other documents by (or about) other people through these documents…

slide-26
SLIDE 26

26

Learning Social Networks

…and very soon we will have a social network including some individuals who have been connected to each other via some similar contents.

slide-27
SLIDE 27

27

SNA Techniques

Prominent problems

  • Social network extraction/construction
  • Identifying prominent/trusted/expert actors
  • Identifying spammers
  • Discovering communities in social networks
  • Link prediction
  • Approximating large social networks
  • Evolution of social networks
slide-28
SLIDE 28

28

Identifying prominent expert actors in social networks

Link Analysis Technique

  • HITS
  • PageRank
slide-29
SLIDE 29

29

Hubs and Authorities

  • Being an authority depends upon in-edges; an authority has a large number of edges pointing towards it.
  • Being a hub depends upon out-edges; a hub links to a large number of nodes.
  • Notice that the definition of hubs and authorities is circular.
  • Nodes can be both hubs and authorities at the same time.

slide-30
SLIDE 30

30

Hubs and Authorities

$a(p) = \sum_{q \to p} h(q)$

$h(p) = \sum_{p \to q} a(q)$
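The circular definition is usually resolved by iterating the two update rules until the scores stabilize. The sketch below is one common formulation, under assumptions not stated on the slides: scores start at 1, each round is followed by normalization to unit Euclidean length, and the iteration count is fixed. The toy graph is hypothetical.

```python
import math


def normalize(scores):
    """Scale scores so that their squares sum to 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(out_links, iterations=50):
    """Iterate a(p) = sum of h(q) over q -> p and h(p) = sum of a(q) over p -> q."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: collect hub scores along incoming edges.
        new_auth = {n: 0.0 for n in nodes}
        for q, targets in out_links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # Hub update: collect the new authority scores along outgoing edges.
        new_hub = {n: sum(new_auth[q] for q in out_links.get(n, [])) for n in nodes}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: A and B both point to C.
hub, auth = hits({"A": ["C"], "B": ["C"], "C": []})
```

In this toy graph C ends up as the only authority and A and B as equally good hubs, which matches the in-edge / out-edge intuition from the previous slide.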

slide-31
SLIDE 31

31

Google’s PageRank

  • The PageRank assumption is that a node transfers its PageRank value evenly to all the nodes it connects to.
  • A node has high rank if the sum of the ranks of its in-links is high.
  • This covers both the case where a node has many in-links and that where a node has a few highly ranked in-links.

slide-32
SLIDE 32

32

Google’s PageRank

How is PageRank calculated?

$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$

T1, …, Tn: the pages/nodes that link to A
C(Ti): the number of out-links of the page/node Ti
d: the damping factor, between 0 and 1

That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.
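The equation can be evaluated by simple repeated substitution. The following is a minimal sketch under stated assumptions: every node has at least one out-link, all ranks start at 1, the damping factor defaults to 0.85 (a commonly used value, not one given on the slide), and the link graph is a hypothetical example.

```python
def pagerank(out_links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T linking to A.

    out_links maps each node to the list of nodes it links to; every node is
    assumed to appear as a key and to have at least one out-link.
    """
    nodes = list(out_links)
    pr = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_pr = {}
        for a in nodes:
            # Each in-linking node t passes on an equal share PR(t) / C(t).
            incoming = sum(pr[t] / len(out_links[t]) for t in nodes if a in out_links[t])
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical link graph: A -> B, A -> C, B -> C, C -> A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})

# On a simple cycle every node keeps rank 1.
cycle = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})
```

With this un-normalized form of the equation the ranks sum to the number of nodes rather than to 1, as long as no node lacks out-links.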

slide-33
SLIDE 33

33

Google’s PageRank

PR(A) = PR(B) = PR(C) = PR(D) = 1

PR(A) > PR(B) > PR(C) > PR(D)

slide-34
SLIDE 34

34

Google’s PageRank

[Figure: four example link graphs, one over nodes A to D and three over nodes A to E.]

slide-35
SLIDE 35

35

Yahoo! Answers: Identifying the expert

[Figure: a question receives Answer_1, Answer_2, …, Answer_n from User_1, User_2, …, User_n; users (User_x, User_y, User_z) cast votes.]

  • Identifying the true experts among Yahoo! Answers participants
  • Keep track of users who consistently provide good answers for particular topics
  • Provide incentives for experts to stay on Yahoo! Answers in order to improve service

slide-36
SLIDE 36

36

Yahoo! Answers

slide-37
SLIDE 37

37

Question Life Cycle

slide-38
SLIDE 38

38

Yahoo! Answers

[Figure: example of interactions between askers and best answerers. Legend: users who usually only ask questions; users who usually only answer questions; users who help each other.]

How to estimate the authority degree for each user?

slide-39
SLIDE 39

39

PageRank?

Example: the category of “Programming”

  • User B answers user A’s questions, which are about Java;
  • User C answers B’s questions, which are about PHP.
  • Is it possible to state that C is more expert than B?
  • No, because B and C have different expertise.

[Figure: nodes A, B and C in a chain; the A-B link is labeled JAVA and the B-C link is labeled PHP.]

slide-40
SLIDE 40

40

HITS?

  • The HITS algorithm assigns high authority scores to nodes 1, 2, 10, 11 and 12, but a near-zero authority score to node

slide-41
SLIDE 41

41

HITS?

  • The HITS algorithm will allocate high authority scores to the nodes N9–N15, while giving zero authority score to node N1.
  • The reason for this is quite similar to example 1.
  • Specifically, the fact that node N8 points to many nodes increases its hub score, which in turn causes the nodes N9–N15 to receive higher authority scores. However, intuition suggests that node N1 is the most authoritative, since it represents an answerer with a large number of best answers.

slide-42
SLIDE 42

42

Proposed Approach

  • The authority score of each user is simply the number of best answers of that user, normalized so that the squared scores sum to 1:

$\sum_{i=1}^{N} (y_i)^2 = 1$
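The normalization above can be sketched in a few lines. This is an illustrative example only: the user names and best-answer counts are hypothetical, and dividing by the Euclidean norm is the assumption used to make the squared scores sum to 1.

```python
import math


def authority_scores(best_answer_counts):
    """Divide each user's best-answer count by the Euclidean norm of all counts,
    so that the squared scores sum to 1."""
    norm = math.sqrt(sum(c * c for c in best_answer_counts.values()))
    if norm == 0:
        return {user: 0.0 for user in best_answer_counts}
    return {user: c / norm for user, c in best_answer_counts.items()}


# Hypothetical best-answer counts for three users in one category.
counts = {"u1": 3, "u2": 4, "u3": 0}
scores = authority_scores(counts)
```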

  • y_i provides a relative score of the authority of each user in each category.
  • We are interested in all sets of U_i having large values of y_i.

slide-43
SLIDE 43

43

Authority Score

  • Example: category of “Engineering”
slide-44
SLIDE 44

44

Authority Score

slide-45
SLIDE 45

45

Automatic Identification of Authorities

slide-46
SLIDE 46

46

Experiments

We conduct experiments on datasets which represent users’ activities over one full year for six categories:

Category      % users who ask only   % users who answer only   % users who ask and answer
Engineering   65%                    31%                       4%
Biology       60%                    36%                       4%
Programming   66%                    29%                       5%
Mathematics   64%                    31%                       5%
Physics       60%                    34%                       6%
Chemistry     63%                    32%                       5%

slide-47
SLIDE 47

47

Authoritative Users

slide-48
SLIDE 48

48

Quality of Content

[Figure: average quality score (axis from 0.1 to 1) per category: Biology, Chemistry, Engineering, Mathematics, Physics, Programming.]

  • The identified authoritative users generate high-quality content in Yahoo! Answers.
  • Askers are very selective in choosing the best answerers.
slide-49
SLIDE 49

49

SNA Techniques

Prominent problems

  • Social network extraction/construction
  • Identifying prominent/trusted/expert actors
  • Identifying spammers
  • Discovering communities in social networks
  • Link prediction
  • Approximating large social networks
  • Evolution of social networks
slide-50
SLIDE 50

50

Existing Approaches

  • Use supervised machine learning methods to perform spam classification.
    1. First, extract several features that characterize the activities of each user in the network.
    2. Then, a supervised learning model is used to learn the behavior of spammers and legitimate users.
    3. Finally, the trained model is applied to filter out spammers.
slide-51
SLIDE 51

51

Why a new approach?

  • Existing approaches suffer from their dependency on the training data.
  • A robust learning approach for spammer detection often requires large amounts of labeled training data.
  • The problem here is that labeled samples are more difficult, expensive and time consuming to obtain than unlabeled ones.

slide-52
SLIDE 52

52

Why a new approach?

  • The scarcity of labeled training data is one of the major challenges facing current spammer detection methods in social networks.
  • Unlabeled data are relatively easy to collect.
  • There is good reason to focus on unsupervised approaches to explore the vast amount of unlabeled data available at low cost.

slide-53
SLIDE 53

53

Proposed Approach

  • We investigate the link structure of the network to discriminate between spammers and legitimate users.
    1. First, we estimate a user’s reputation level score within a network.
    2. Then, we propose a probabilistic approach based on the beta mixture model to identify spammers.

slide-54
SLIDE 54

54

Interactions in Social Networks

  • Interactions between normal users share a similar pattern: when a user receives a message from a legitimate sender, he/she will generally reply to this message.
  • Spammers tend to send messages in bulk and receive few or no replies back from their recipients.

slide-55
SLIDE 55

55

Communication Reciprocity

  • x_i: communication reciprocity of a user u_i
  • OS_i is the set of accounts that received at least one message from user u_i
  • IS_i is the set of accounts that sent at least one message to u_i

slide-56
SLIDE 56

56

Communication Reciprocity

  • CR measures the probability of a node receiving a response from each of its neighbors.
    – Legitimate accounts have a much higher probability of being responded to.
    – Spammers have a very low response rate: CR value close to zero.
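The slide text does not preserve the CR formula itself. The sketch below assumes one natural reading of the definitions above: the share of the accounts a user wrote to that also wrote to that user, i.e. |OS_i ∩ IS_i| / |OS_i|. The function name and the message log are hypothetical.

```python
def communication_reciprocity(messages):
    """Compute an assumed CR score for every sender.

    messages is an iterable of (sender, recipient) pairs. For each sender u,
    OS is the set of accounts u wrote to and IS the set of accounts that wrote
    to u; the score is |OS & IS| / |OS| (an assumption, see the lead-in).
    """
    out_sets = {}
    in_sets = {}
    for sender, recipient in messages:
        out_sets.setdefault(sender, set()).add(recipient)
        in_sets.setdefault(recipient, set()).add(sender)
    scores = {}
    for user, os_i in out_sets.items():
        is_i = in_sets.get(user, set())
        scores[user] = len(os_i & is_i) / len(os_i)
    return scores


# Hypothetical message log: three users who reply to each other, one bulk sender.
messages = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "alice"),
    ("spam", "alice"), ("spam", "bob"), ("spam", "carol"),
]
cr = communication_reciprocity(messages)
```

Under this reading the bulk sender, who never receives a reply, gets a score of zero, while the users who answer each other score 1.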

slide-57
SLIDE 57

57

The Probabilistic Model

  • The estimated CR scores can be considered as coming from several underlying probability distributions.
  • Each distribution is a component of the mixture model representing users with close CR values.
  • All the components are combined into a comprehensive model by a mixture form.

slide-58
SLIDE 58

58

The Probabilistic Model

  • We propose to use the beta mixture model due to its shape flexibility.

slide-59
SLIDE 59

59

The Probabilistic Model

  • Formally, we expect that the CR scores follow a mixture density of the form
  • The density function of the l-th component is given by
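The formulas themselves did not survive in the slide text. For orientation, a beta mixture with m components is conventionally written as below; the symbols (mixing weights \pi_l, shape parameters a_l and b_l, parameter set \Theta) are the standard textbook notation and are an assumption here, not necessarily the notation used on the original slide.

```latex
% Mixture density over a CR score x in (0, 1), with m components
p(x \mid \Theta) = \sum_{l=1}^{m} \pi_l \, f_l(x \mid a_l, b_l),
\qquad \pi_l \ge 0, \quad \sum_{l=1}^{m} \pi_l = 1

% Density of the l-th beta component with shape parameters a_l, b_l > 0
f_l(x \mid a_l, b_l) = \frac{\Gamma(a_l + b_l)}{\Gamma(a_l)\,\Gamma(b_l)} \,
x^{a_l - 1} (1 - x)^{b_l - 1}
```

Because CR scores lie between 0 and 1, the beta density is a natural fit, and its two shape parameters allow skewed, U-shaped and bell-shaped components.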

slide-60
SLIDE 60

60

The Probabilistic Model

  • How to estimate the parameters of the beta components? Maximum likelihood technique

slide-61
SLIDE 61

61

The Probabilistic Model

  • How to estimate the number of components?
    1. Calculate the maximum likelihood of the parameters of the mixture for a range of values of m (from 1 to m_max).
    2. Calculate an associated criterion and select the value of m which optimizes the criterion.

slide-62
SLIDE 62

62

The Probabilistic Model

  • How to estimate the number of components?

Integrated Classification Likelihood - Bayesian Information Criterion (ICL-BIC)

slide-63
SLIDE 63

63

Procedure

slide-64
SLIDE 64

64

Experiments: Data Sets

  • We employ a data set representing the email network of the Universitat Rovira i Virgili (URV) in Tarragona, Spain
    – 1,133 users (faculty, researchers, etc.)
    – all legitimate users
    – the URV email network includes no spam sender
    – Transactions from spammers are therefore simulated by generating mock spam accounts in the URV email network data

slide-65
SLIDE 65

65

Data Sets

  • We generated six different sets of simulated spam accounts in order to inject spam traffic into the URV data set of legitimate senders.
    – Data200, Data400, Data600, Data800, Data1000, and Data1200, containing 200, 400, 600, 800, 1,000 and 1,200 spam accounts respectively
  • The number of legitimate senders was 1,133 in each set.

slide-66
SLIDE 66

66

Comparison

  • We compare our approach to that of MailNet, a supervised learning approach for detecting spammers on email social networks.
    – MailNet first extracts a number of features from each user in the email social network.
    – Then, support vector machine learning is used to perform spam classification.

slide-67
SLIDE 67

67

Results

  • The first component represents spam accounts with small CR values which are close to zero.
  • The third component corresponds to accounts with CR scores close to one (users who get replies to the vast majority or all of the messages that they sent).
  • The second component corresponds to users who receive, in general, replies to most of the messages they send.

slide-68
SLIDE 68

68

Results

slide-69
SLIDE 69

69

Results

  • Our unsupervised method performs as well as the SVM-based approach MailNet.
  • But our approach also has the ability to mine unlabeled data, a considerable practical advantage for real-world applications in which class labels are not available.

slide-70
SLIDE 70

70

Application to Yahoo! Answers

slide-71
SLIDE 71

71

Question Life Cycle

slide-72
SLIDE 72

72

Yahoo! Answers Social Network

  • Yahoo! Answers also has a social networking capability where users can connect and make friends with other members who share similar interests.
  • In our experiments, we analyzed a network consisting of 167,455 Yahoo! Answers user accounts, using our approach to identify spammers.

slide-73
SLIDE 73

73

Results

  • We found that 32,724 accounts could be classified as spam.

slide-74
SLIDE 74

74

Quality of Content

  • We evaluated the quality of content generated by users identified as spammers and compared it to the quality of content generated by users classified as non-spammers.
  • We expect that normal users will generate relatively good-quality content.

slide-75
SLIDE 75

75

Quality of Content

  • In order to estimate the quality of the content generated by the identified authoritative users we use the algorithm of Agichtein et al. (WSDM 2008).
  • In brief: the approach combines the analysis of the textual content with the user feedback on the site in order to estimate a quality score for each question and answer.
  • The quality score is the confidence score of a binary classifier trained on high- and low-quality examples.
  • When the question or answer is of high quality, the value of the quality score is close to 1.

slide-76
SLIDE 76

76

Quality of Content

  • Legitimate users generate relatively better quality content than do spammers.

slide-77
SLIDE 77

77

SNA Techniques

Prominent problems

  • Social network extraction/construction
  • Identifying prominent/trusted/expert actors
  • Identifying spammers
  • Discovering communities in social networks
  • Link prediction
  • Approximating large social networks
  • Evolution of social networks
slide-78
SLIDE 78

78

Community Structure in Social Network

[Figure: a network divided into a Non-Sybil region and a Sybil region.]

slide-79
SLIDE 79

79

Graph Clustering

slide-80
SLIDE 80

80

Algorithms based on Czekanowski-Dice Distance

Distance between two nodes

$dist(N_1, N_2) = \frac{|S_1 \cup S_2| - |S_1 \cap S_2|}{|S_1 \cup S_2| + |S_1 \cap S_2|}$

S1: set of nodes connected to N1 (including N1)
S2: set of nodes connected to N2 (including N2)

Small distance → High similarity

slide-81
SLIDE 81

81

Czekanowski-Dice Distance

[Figure: example graph over nodes N1 to N6.]

  • Example
  • dist(N1, N2) = ?

S1 = {N1, N2, N3}
S2 = {N2, N1, N3}

$dist(N_1, N_2) = \frac{|S_1 \cup S_2| - |S_1 \cap S_2|}{|S_1 \cup S_2| + |S_1 \cap S_2|} = \frac{3 - 3}{3 + 3} = 0$

  • dist(N3, N4) = ?

S3 = {N3, N1, N2, N4}
S4 = {N4, N3, N5, N6}

$dist(N_3, N_4) = \frac{|S_3 \cup S_4| - |S_3 \cap S_4|}{|S_3 \cup S_4| + |S_3 \cap S_4|} = \frac{6 - 2}{6 + 2} = 0.5$
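The two worked distances can be checked with a short script. In this sketch the neighbor lists of N1 to N4 follow the sets S1 to S4 given above; the link between N5 and N6 is an assumption, since the figure is not preserved, and it does not affect the two distances computed here.

```python
def czekanowski_dice(graph, n1, n2):
    """Distance between two nodes based on their closed neighborhoods.

    S1 and S2 are the neighbor sets of n1 and n2, each including the node itself;
    the distance is (|S1 | S2| - |S1 & S2|) / (|S1 | S2| + |S1 & S2|).
    """
    s1 = set(graph[n1]) | {n1}
    s2 = set(graph[n2]) | {n2}
    union = len(s1 | s2)
    inter = len(s1 & s2)
    return (union - inter) / (union + inter)


# Example graph as adjacency lists.
graph = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N3"],
    "N3": ["N1", "N2", "N4"],
    "N4": ["N3", "N5", "N6"],
    "N5": ["N4", "N6"],  # the N5-N6 link is assumed for illustration
    "N6": ["N4", "N5"],
}

d12 = czekanowski_dice(graph, "N1", "N2")  # identical neighborhoods
d34 = czekanowski_dice(graph, "N3", "N4")  # neighborhoods share only N3 and N4
```

N1 and N2 have identical neighborhoods, so they would be merged first by a clustering based on this distance, while N3 and N4 sit on the bridge between the two triangles and are farther apart.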

slide-82
SLIDE 82

82

Czekanowski-Dice Distance

[Figure: (a) Graph, (b) Similarity matrix, (c) Dendrogram, (d) Clustering]

slide-83
SLIDE 83

83

Application

The Santa Fe Institute collaboration network

slide-84
SLIDE 84

84

Application

Enron email network

slide-85
SLIDE 85

85

Discovering Knowledge-Sharing Communities in Question-Answering Forums

slide-86
SLIDE 86

86

Knowledge-Sharing Community

    1. A knowledge-sharing community is defined by a set of askers and authoritative users.
    2. Within each community, askers exhibit more homogeneous behavior in terms of their interactions with authoritative users than elsewhere.
    3. Authoritative users may belong to more than one community.

slide-87
SLIDE 87

87

Knowledge-Sharing Community

Existing graph-based community detection methods are not appropriate for our study.

slide-88
SLIDE 88

88

Example

a1: e1, e2
a2: e1, e2
a3: e2, e3
a4: e2, e3
a5: e1, e2, e3
a6: e1, e2, e3

slide-89
SLIDE 89

89

Example

Modeling user interactions as a graph

slide-90
SLIDE 90

90

The GRACLUS Algorithm

slide-91
SLIDE 91

91

Modeling Interactions Between Users

  • We use a transactional data model to represent the interactions between askers and authoritative users.
  • The first community is defined by T1, T2, T3 and T4
  • The second community is defined by T5, T6, T7 and T8
slide-92
SLIDE 92

92

Illustration

Boolean representation of the interaction between askers and authoritative users.

slide-93
SLIDE 93

93

The TRANCLUS Algorithm

  • A = {a1, a2, …, an}: a set of n askers
  • E = {e1, e2, …, ed}: a set of d authoritative users
  • TD = {T1, T2, …, Tn}: a collection of n transactions that summarizes the interactions of all askers ai with the identified authoritative users.
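The transactional model and its Boolean representation can be illustrated as follows. This is a minimal sketch, not the TRANCLUS algorithm itself: the interaction pairs reuse three askers from the earlier a1 to a6 example, and the function names are hypothetical.

```python
def build_transactions(interactions):
    """Group (asker, authoritative_user) pairs into one transaction per asker."""
    td = {}
    for asker, expert in interactions:
        td.setdefault(asker, set()).add(expert)
    return td


def boolean_matrix(td, experts):
    """One 0/1 row per asker, one column per authoritative user."""
    return {
        asker: [1 if e in items else 0 for e in experts]
        for asker, items in td.items()
    }


# Interaction pairs for askers a1, a3 and a5 (illustrative input format).
interactions = [
    ("a1", "e1"), ("a1", "e2"),
    ("a3", "e2"), ("a3", "e3"),
    ("a5", "e1"), ("a5", "e2"), ("a5", "e3"),
]
td = build_transactions(interactions)
rows = boolean_matrix(td, ["e1", "e2", "e3"])
```

Each transaction is simply the set of authoritative users an asker interacted with, so askers with similar rows are candidates for the same community.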

slide-94
SLIDE 94

94

Problem Definition

Given the set A of askers and the set E of authoritative users,

  • Construct the set TD.
  • Partition TD into a set of disjoint clusters C = {C1, C2, …, Cnc}
  • The identified clusters represent the communities we want to discover.

slide-95
SLIDE 95

95

Criterion Function

[Equation: the criterion function CF(C), a double sum over the clusters C_s (s = 1, …, nc) and the authoritative users e ∈ C_s, combining occ(e, C_s), Z(e), n_s and n.]

$Z(e) = n - occ(e, TD) + 1$

slide-96
SLIDE 96

96

The TRANCLUS Scheme

slide-97
SLIDE 97

97

Application to Yahoo! Answers

slide-98
SLIDE 98

98

Content Analysis

  • The clustered askers tend to post questions on closely related topics

slide-99
SLIDE 99

99

Emerging Application

Influence of Social Networks on Product Recommendations

[Figure: a social network (actors A1 … AN) and a product opinion matrix (actors A1 … AN by products P1 … PM) feed a recommendation system.]

  • Understanding the impact of social networks on market behavior
  • Improved recommendation systems

slide-100
SLIDE 100

100

That’s all folks!