On the Submodularity of Influence in Social Networks



slide-1
SLIDE 1

On the Submodularity of Influence in Social Networks

Elchanan Mossel & Sebastien Roch STOC07

Speaker: Xinran He Xinranhe1990@gmail.com

slide-2
SLIDE 2

Social Network

  • Social network as a graph
      – Nodes represent individuals.
      – Edges are social relations with different strengths:
          • Neighbor and coworker relations in real life
          • Virtual friendship on Facebook
          • Follower-followee relations on Twitter
slide-3
SLIDE 3

Diffusion In Social Network

  • The adoption of new products can propagate through the social network.
  • Diffusion in the social network: information, rumors, innovation, ...
slide-4
SLIDE 4

Influence Maximization

  • Influence maximization: find the k people who generate the largest influence spread (i.e., the expected number of activated nodes) [KKT 2003].

slide-5
SLIDE 5

Linear Threshold Model

  • We are given a social network with edge weights $w_{uv}$ and a set S of initially active individuals as seeds.
  • Every individual v independently chooses a threshold $\theta_v$ uniformly in [0,1].
  • At any later step t, a still-inactive node v becomes activated if $\sum_{u \in N_v} w_{uv} \ge \theta_v$, where $N_v$ is the set of activated direct neighbors of v.
  • The diffusion ends when no more nodes are activated.
  • The influence spread $\sigma(S) = E[\,|P_{end}| \mid S\,]$ is the expected number of active nodes when the diffusion process ends.
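
The linear threshold process above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the dictionary layout `in_weights[v][u] = w_uv`, the function names, and the toy graph are all assumptions made for the example.

```python
import random

def simulate_lt(in_weights, seeds, thresholds):
    """Run the linear threshold process step by step; return the final active set.

    in_weights[v] maps each in-neighbor u of v to the weight w_uv (assumed layout).
    """
    active = set(seeds)
    while True:
        # A still-inactive node activates once the weight of its active
        # in-neighbors reaches its threshold.
        newly = {v for v, nbrs in in_weights.items()
                 if v not in active
                 and sum(w for u, w in nbrs.items() if u in active) >= thresholds[v]}
        if not newly:
            return active
        active |= newly

def estimate_spread(in_weights, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of sigma(S): average final size over random thresholds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        # 1 - random() is uniform on (0, 1], which avoids a zero threshold.
        thresholds = {v: 1 - rng.random() for v in in_weights}
        total += len(simulate_lt(in_weights, seeds, thresholds))
    return total / runs

# Hypothetical three-node network used only for illustration.
toy = {"a": {}, "b": {"a": 0.6}, "c": {"a": 0.2, "b": 0.3}}
```

With thresholds fixed, the run is deterministic; the randomness in σ(S) comes only from the threshold draw.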

slide-6
SLIDE 6

Linear Threshold Example

[Figure: linear threshold example on a small weighted graph, shown at steps 0 to 3 until the diffusion stops; the legend distinguishes inactive nodes, active nodes, thresholds, and active neighbors.]

slide-7
SLIDE 7

Influence Maximization

  • Find a seed set S with |S| ≤ k such that σ(S) is maximized.
  • The influence maximization problem is NP-hard under the linear threshold model [Kempe et al. 2003].
  • We have to solve it approximately.
  • Main tool for analysis:

Theorem: The greedy algorithm is a (1 − 1/e)-approximation for maximizing monotone and submodular set functions under a cardinality constraint [Nemhauser/Wolsey 1978].

slide-8
SLIDE 8

Submodular & Monotone

  • A set function $f: 2^V \to \mathbb{R}$ is monotone if $f(S) \le f(T)$ for all $S \subseteq T \subseteq V$.
  • A set function $f: 2^V \to \mathbb{R}$ is submodular if $f(S) + f(T) \ge f(S \cap T) + f(S \cup T)$ for all $S, T \subseteq V$.

slide-9
SLIDE 9

Submodularity

  • A set function f is submodular if $f(S) + f(T) \ge f(S \cap T) + f(S \cup T)$ for all $S, T \subseteq V$.
  • Or equivalently, $f(T \cup \{v\}) - f(T) \le f(S \cup \{v\}) - f(S)$ for all $S \subseteq T \subseteq V$ and $v \in V \setminus T$.
  • Submodularity can be viewed as a diminishing-returns property.
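
For small ground sets, both definitions can be checked by brute force. The sketch below is illustrative only: the coverage instance `SETS` and the helper names are assumptions, and the exhaustive check is exponential in |V|, so it is only meant for toy examples.

```python
from itertools import combinations

def powerset(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone(f, ground):
    subsets = list(powerset(ground))
    return all(f(s) <= f(t) for s in subsets for t in subsets if s <= t)

def is_submodular(f, ground):
    # Checks f(S) + f(T) >= f(S ∩ T) + f(S ∪ T) for every pair of subsets.
    subsets = list(powerset(ground))
    return all(f(s) + f(t) >= f(s & t) + f(s | t) for s in subsets for t in subsets)

# Hypothetical coverage instance: each index covers a set of elements.
SETS = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}

def coverage(chosen):
    """Number of elements covered by the chosen indices."""
    covered = set()
    for i in chosen:
        covered |= SETS[i]
    return len(covered)

def squared_size(chosen):
    # |S|^2 has increasing marginal gains, so it should fail the check.
    return len(chosen) ** 2
```

Here `coverage` passes both checks, while `squared_size` is monotone but not submodular.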

slide-10
SLIDE 10

Submodularity: Examples

  • Maximum coverage problem: given a collection of sets $S = \{S_1, \dots, S_m\}$ and a number k, find $S' \subseteq S$ with $|S'| \le k$ that maximizes $\sigma(S') = \left| \bigcup_{S_i \in S'} S_i \right|$. σ is submodular.
  • The influence spread σ under the linear threshold model is submodular [Kempe et al. 2003].

⇒ The influence maximization problem under the linear threshold model can be solved approximately.

slide-11
SLIDE 11

General Threshold Model

  • Linear threshold model: node v is activated when $\sum_{u \in N_v} w_{uv} \ge \theta_v$.
  • General threshold model: node v is activated when $f_v(S) \ge \theta_v$.

$f_v(S)$: activation function of node v over S, where S is the set of already activated nodes.

  • The general threshold model is a generalization of many diffusion models:
      – Linear Threshold Model [KKT 2003]: $f_v(S) = \sum_{u \in N_v} w_{uv}$
      – Independent Cascade Model [KKT 2003]: $f_v(S) = 1 - \prod_{u \in N_v} (1 - p_{uv})$
      – Decreasing Cascade Model [KKT 2005]: $f_v(S) = 1 - \prod_{i=1}^{r} \left(1 - p_v(\omega_i, S_{i-1})\right)$
      – ……
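
To make the generalization concrete, the sketch below plugs two activation functions into one generic threshold process. The function names, the `in_edges[v][u]` layout, and the example numbers are assumptions for illustration, not part of the talk.

```python
from math import prod

def lt_activation(weights_into_v, active):
    # Linear threshold: sum of weights from active in-neighbors.
    return sum(w for u, w in weights_into_v.items() if u in active)

def ic_activation(probs_into_v, active):
    # Independent cascade: 1 minus the probability that every active
    # in-neighbor fails to activate v.
    return 1 - prod(1 - p for u, p in probs_into_v.items() if u in active)

def run_general_threshold(activation, in_edges, seeds, thresholds):
    """Generic process: v activates once activation(in_edges[v], active) >= theta_v."""
    active = set(seeds)
    while True:
        newly = {v for v in in_edges
                 if v not in active
                 and activation(in_edges[v], active) >= thresholds[v]}
        if not newly:
            return active
        active |= newly

# Hypothetical network: the same numbers are read as weights (LT) or probabilities (IC).
edges = {"a": {}, "b": {"a": 0.5}, "c": {"a": 0.5, "b": 0.5}}
theta = {"a": 1.0, "b": 0.4, "c": 0.8}
lt_result = run_general_threshold(lt_activation, edges, {"a"}, theta)
ic_result = run_general_threshold(ic_activation, edges, {"a"}, theta)
```

With these thresholds, node c sees f_c = 1.0 under the linear threshold function but only 0.75 under the independent cascade function, so the two models end with different active sets.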

slide-12
SLIDE 12

General Threshold Model(2)

Conjecture: Under the general threshold model with monotone and submodular fv , σ(S) is monotone and submodular [KKT 2003].

For Linear Threshold model, the influence spread σ(S) is submodular [KKT 2003].

slide-13
SLIDE 13

Main Result

Theorem: Under the general threshold model with monotone and submodular $f_v$, σ(S) is monotone and submodular [Mossel/Roch 2007].

Corollary: The greedy algorithm is a (1 − 1/e)-approximation for the influence maximization problem under the general threshold model.

slide-14
SLIDE 14

Proof: General Idea(1)

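  • By coupling four diffusion processes:
      $A = \{A_0 = S, A_1, A_2, \dots, A_{end}\}$
      $B = \{B_0 = T, B_1, B_2, \dots, B_{end}\}$
      $C = \{C_0 = S \cap T, C_1, C_2, \dots, C_{end}\}$
      $D = \{D_0 = S \cup T, D_1, D_2, \dots, D_{end}\}$
  • Such that $C_t \subseteq A_t \cap B_t$ and $D_t \subseteq A_t \cup B_t$.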

slide-15
SLIDE 15

Proof: General Idea(2)

If $C_t \subseteq A_t \cap B_t$ and $D_t \subseteq A_t \cup B_t$, then

$|A_{end}| + |B_{end}| \ge |A_{end} \cap B_{end}| + |A_{end} \cup B_{end}| \ge |C_{end}| + |D_{end}|$.

Then, taking expectation, we have

$\sigma(S) + \sigma(T) \ge \sigma(S \cap T) + \sigma(S \cup T)$.

[Figure: the sets A_end and B_end.]

slide-16
SLIDE 16

$C_t \subseteq A_t \cap B_t$

  • Couple the four processes with the same thresholds $\theta_v$.
  • Show $C_t \subseteq A_t$ and $C_t \subseteq B_t$ by induction.
      – Base case: $C_0 = S \cap T \subseteq S = A_0$.
      – Assume $C_t \subseteq A_t$.
      – For a node v still inactive at step t, we have $f_v(C_t) \le f_v(A_t)$. Therefore, if v is activated at step t+1 in C, it must also be activated in A, so $C_{t+1} \subseteq A_{t+1}$.

[Figure: f_v(C_t) and f_v(A_t) marked on the threshold axis.]
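
The induction can be sanity-checked numerically: run the processes with the same thresholds and compare the active sets step by step. The sketch assumes the linear threshold activation function and a hypothetical `in_weights[v][u] = w_uv` layout; it illustrates the shared-threshold coupling and is not a proof.

```python
import random

def run_lt(in_weights, seeds, thresholds):
    """Step-synchronous linear threshold run; returns the active set at each step."""
    history = [set(seeds)]
    while True:
        active = history[-1]
        newly = {v for v, nbrs in in_weights.items()
                 if v not in active
                 and sum(w for u, w in nbrs.items() if u in active) >= thresholds[v]}
        if not newly:
            return history
        history.append(active | newly)

def at_step(history, t):
    # Once a process has stopped, its active set stays at the final value.
    return history[min(t, len(history) - 1)]

def check_intersection_coupling(in_weights, S, T, thresholds):
    """True if C_t is contained in A_t ∩ B_t at every step, with shared thresholds."""
    A = run_lt(in_weights, S, thresholds)
    B = run_lt(in_weights, T, thresholds)
    C = run_lt(in_weights, S & T, thresholds)
    steps = max(len(A), len(B), len(C))
    return all(at_step(C, t) <= (at_step(A, t) & at_step(B, t)) for t in range(steps))
```

The check succeeds for every threshold assignment because the activation function is monotone, which is exactly what the induction step uses.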

slide-17
SLIDE 17

$D_t \subseteq A_t \cup B_t$: First Attempt

  • Let's try the same coupling method for $D_t \subseteq A_t \cup B_t$.

[Figure: three copies of a graph on nodes 1, 2, 3, labeled D, A, and B; two edges of weight 0.3, and θ_3 = 0.5 in all three copies.]
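
One reading of the figure, stated here as an assumption, is that nodes 1 and 2 each point to node 3 with weight 0.3, with S = {1} and T = {2}. Under that reading, the shared-threshold coupling fails for the union, as this short sketch shows. The thresholds for nodes 1 and 2 are hypothetical placeholders, since those nodes have no incoming edges.

```python
def final_active(in_weights, seeds, thresholds):
    """Final active set of a linear threshold run with fixed thresholds."""
    active = set(seeds)
    while True:
        newly = {v for v, nbrs in in_weights.items()
                 if v not in active
                 and sum(w for u, w in nbrs.items() if u in active) >= thresholds[v]}
        if not newly:
            return active
        active |= newly

# Assumed structure of the slide's example: edges 1 -> 3 and 2 -> 3, weight 0.3 each.
in_weights = {1: {}, 2: {}, 3: {1: 0.3, 2: 0.3}}
thresholds = {1: 1.0, 2: 1.0, 3: 0.5}  # theta_3 = 0.5 as on the slide

S, T = {1}, {2}
A = final_active(in_weights, S, thresholds)      # node 3 sees 0.3 < 0.5
B = final_active(in_weights, T, thresholds)      # node 3 sees 0.3 < 0.5
D = final_active(in_weights, S | T, thresholds)  # node 3 sees 0.6 >= 0.5
```

Node 3 is active in D but in neither A nor B, so D is not contained in A ∪ B when all three processes use the same θ_3.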

slide-18
SLIDE 18

Antisense Coupling

  • Then how could we keep $D_t \subseteq A_t \cup B_t$?
  • Intuitively, using θ for the activation of S and 1 − θ for the activation of T will maximize their union.

slide-19
SLIDE 19

Piecemeal Growth

Define $P = P(S^{(1)}, \dots, S^{(k)})$ as the piecemeal growth diffusion process, where $S^{(1)}, \dots, S^{(k)}$ is a partition of the seed set S:

  • Add $S^{(1)}$, then grow until it ends.
  • Add $S^{(2)}$, then grow until it ends.
  • ……
  • Add $S^{(k)}$, then grow until it ends.

Lemma: The distribution over the activated node set at the end of the original process with seed set S is identical to that at the end of the piecemeal growth process $P(S^{(1)}, \dots, S^{(k)})$.
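
A small sketch of piecemeal growth, assuming the linear threshold activation function and a hypothetical `in_weights[v][u] = w_uv` layout. With the thresholds held fixed, adding the seeds in stages ends in the same active set as adding them all at once, which is the coupled form of the lemma.

```python
import random

def grow(in_weights, active, thresholds):
    """Grow the active set under the linear threshold rule until nothing changes."""
    active = set(active)
    while True:
        newly = {v for v, nbrs in in_weights.items()
                 if v not in active
                 and sum(w for u, w in nbrs.items() if u in active) >= thresholds[v]}
        if not newly:
            return active
        active |= newly

def piecemeal(in_weights, parts, thresholds):
    """Add the seed parts one stage at a time, growing to completion after each."""
    active = set()
    for part in parts:
        active = grow(in_weights, active | set(part), thresholds)
    return active
```

For example, `piecemeal(g, [S1, S2], th)` and `grow(g, S1 | S2, th)` return the same set for any fixed thresholds `th`.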

slide-20
SLIDE 20

Piecemeal Growth: Proof

  • By coupling three piecemeal growth processes T', T, T'' and the original process S with the same θ:
      – T': add S at stage 1 (grow S); add nothing at stage 2 (grow nothing).
      – T: add $S^{(1)}$ at stage 1 (grow $S^{(1)}$); add $S^{(2)}$ at stage 2 (grow $S^{(2)}$).
      – T'': add nothing at stage 1 (grow nothing); add S at stage 2 (grow S).
  • $T''_{end} \subseteq T_{end} \subseteq T'_{end}$ and $T'_{end} = T''_{end} = S_{end}$, so that $T_{end} = S_{end}$.

slide-21
SLIDE 21

Need-to-know Representation(1)

  • Consider the diffusion in a different way: the need-to-know representation.
  • Principle of deferred decisions: we don't decide all thresholds at the beginning; instead, we reveal the value of a threshold only when it is needed.
  • For example, if node v is inactive at step t−1, we only want to know whether it is activated at step t, that is, whether $\theta_v \le f_v(S_{t-1})$ given that $\theta_v > f_v(S_{t-2})$.

[Figure: threshold axes marking f_v(S_{t−2}), f_v(S_{t−1}), and θ_v.]

slide-22
SLIDE 22

Need-to-know Representation(2)

Lemma: The following process is equivalent to the original one:

  1. Initialize $S_0 = S$.
  2. At step $1 \le t \le n-1$, we initialize $S_t = S_{t-1}$, and for each still-inactive node v:
      – With probability $\frac{f_v(S_{t-1}) - f_v(S_{t-2})}{1 - f_v(S_{t-2})}$, v becomes activated and we pick $\theta_v$ uniformly in $[f_v(S_{t-2}), f_v(S_{t-1})]$.
      – Otherwise, we do nothing.

For t = 1, read $f_v(S_{-1})$ as 0.

[Figure: threshold axis showing the interval from f_v(S_{t−2}) to f_v(S_{t−1}), of length f_v(S_{t−1}) − f_v(S_{t−2}), inside the remaining range of length 1 − f_v(S_{t−2}).]
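
The equivalence can be checked for a single node with exact arithmetic. The sketch below assumes a fixed, nondecreasing sequence of activation values f_v(S_0), f_v(S_1), … (a simplification: in the real process these depend on the other nodes) and computes the probability that v is first activated at each step under the need-to-know rule. The function name and the input sequence are illustrative.

```python
from fractions import Fraction

def need_to_know_activation_probs(f_values):
    """f_values[i] = f_v(S_i), nondecreasing and strictly below 1 (assumed input).

    Returns, for each i, the probability that v is first activated at step i + 1
    when the conditional rule of the lemma is applied step by step.
    """
    probs = []
    still_inactive = Fraction(1)  # probability that v has not been activated yet
    prev = Fraction(0)            # f_v(S_{t-2}); taken as 0 before the first step
    for cur in f_values:
        q = (cur - prev) / (1 - prev)  # conditional activation probability
        probs.append(still_inactive * q)
        still_inactive *= 1 - q
        prev = cur
    return probs

f_values = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 2), Fraction(3, 4)]
probs = need_to_know_activation_probs(f_values)
# In the original process, v is first activated at step i + 1 exactly when a
# uniform threshold falls between consecutive activation values.
increments = [b - a for a, b in zip([Fraction(0)] + f_values, f_values)]
```

The products telescope, so `probs` and `increments` coincide, which is the single-node version of the lemma.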

slide-23
SLIDE 23

Antisense Coupling(1)

Define the antisense diffusion $Q = Q(S^{(1)}, \dots, S^{(k)}; T)$, where $S^{(1)}, \dots, S^{(k)}$ is a partition of the seed set S:

  • Stages 1 to k (k-stage piecemeal growth): grow $S^{(1)}$ until it ends, ……, grow $S^{(k)}$ until it ends.
  • Stage k+1: add T at the beginning of stage k+1 (time τ), then grow T until it ends.
  • At any step t in the final stage, activate node v under the condition $f_v(Q_t) \ge f_v(Q_\tau) + 1 - \theta_v$.

slide-24
SLIDE 24

Antisense Coupling(2)

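[Figure: stage diagrams of the two processes P and Q, each growing S(1), ……, S(k) and then T from time τ on; threshold axes mark f_v(P_τ), f_v(P_t), θ for P and f_v(Q_τ), f_v(Q_t), θ' for Q.]

  • Piecemeal growth P: in the final stage, v is activated when $f_v(P_t) \ge \theta_v$.
  • Antisense diffusion Q: in the final stage, v is activated when $f_v(Q_t) \ge \theta'_v$, where $\theta'_v = f_v(P_\tau) + 1 - \theta_v$.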

slide-25
SLIDE 25

Antisense Coupling(3)

[Figure: stage diagrams of P and Q: grow S(1), ……, grow S(k), then grow T.]

Lemma: The distributions over the activated node set at the end of the piecemeal growth process $P(S^{(1)}, \dots, S^{(k)}; T)$ and the antisense diffusion process $Q(S^{(1)}, \dots, S^{(k)}; T)$ are identical.

slide-26
SLIDE 26

Antisense Coupling: Proof(1)

  • From the need-to-know representation point of view: for any node v still inactive at time t = τ, $\theta_v$ is uniformly distributed in $[f_v(P_\tau), 1] = [f_v(Q_\tau), 1]$.

[Figure: stage diagrams of P and Q: grow S(1), ……, grow S(k), then grow T.]

slide-27
SLIDE 27

Antisense Coupling: Proof(2)

  • Then, for any still-inactive node v, we pick its $\theta_v$ uniformly in $[f_v(P_\tau), 1]$.
  • We define $\theta'_v = f_v(Q_\tau) + 1 - \theta_v$.
  • Since $\theta_v$ and $\theta'_v$ have the same distribution, the final stage (growing T) has the same distribution in P and Q.
  • Therefore $P_{end}$ and $Q_{end}$ have the same distribution.
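
The key fact is that the reflection θ ↦ f_v(Q_τ) + 1 − θ maps the interval [f_v(Q_τ), 1] onto itself and reverses order, so a uniform threshold stays uniform. A minimal exact-arithmetic illustration follows; the value 1/4 for f_v(Q_τ) and the grid size are arbitrary choices for the example.

```python
from fractions import Fraction

def antisense(theta, f_tau):
    """Reflect a threshold within [f_tau, 1]: theta' = f_tau + 1 - theta."""
    return f_tau + 1 - theta

f_tau = Fraction(1, 4)  # stands in for f_v(Q_tau); arbitrary example value
n = 12
# Evenly spaced thresholds covering [f_tau, 1].
grid = [f_tau + (1 - f_tau) * Fraction(i, n) for i in range(n + 1)]
reflected = [antisense(theta, f_tau) for theta in grid]
```

The reflected grid is the original grid in reverse order: small thresholds in P correspond to large thresholds in Q, which is the "antisense" behavior the coupling relies on.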

slide-28
SLIDE 28

Coupling: Overview

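Three coupled processes, each run in three stages:

  • A (seed set S): grow S∩T until it ends; grow S\T until it ends; grow nothing.
  • B (seed set T): grow S∩T until it ends; grow nothing; grow T\S until it ends.
  • D (seed set S∪T): grow S∩T until it ends; grow S\T until it ends; grow T\S until it ends.

Goal: $D_t \subseteq A_t \cup B_t$ for any step t in all three stages.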

slide-29
SLIDE 29

Coupling: First two stages

  • $A_t = D_t$ for all t in the first two stages.
  • Therefore $D_t \subseteq A_t \cup B_t$ for all steps t in the first two stages.
  • We will show $D_t \subseteq A_t \cup B_t$ for any step in the final stage.

[Figure: stage diagrams of the three processes (S∩T, nothing, T\S; S∩T, S\T, nothing; S∩T, S\T, T\S), with the first two stages separated from the last stage at time τ.]

slide-30
SLIDE 30

Coupling: Antisense Coupling

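  • We first prove $D_t \setminus D_\tau \subseteq B_t \setminus B_\tau$ for any step t in the final stage by induction on t.
  • Base case: $D_{\tau+1} \setminus D_\tau \subseteq B_{\tau+1} \setminus B_\tau$.
  • Because: $D_{\tau+1} = D_\tau \cup (T \setminus S)$, $B_{\tau+1} = B_\tau \cup (T \setminus S)$, and $B_\tau \subseteq D_\tau$.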

slide-31
SLIDE 31

Coupling: Antisense Coupling

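  • Assume $D_t \setminus D_\tau \subseteq B_t \setminus B_\tau$.
  • We need to show that $D_{t+1} \setminus D_\tau \subseteq B_{t+1} \setminus B_\tau$.

Lemma: For any $S \subseteq S'$ and $T \subseteq T'$ and monotone submodular f, we have $f(S \cup T') - f(S) \ge f(S' \cup T) - f(S')$.

Apply the lemma with $S = B_\tau$, $S' = D_\tau$, $T = D_t \setminus D_\tau$, $T' = B_t \setminus B_\tau$:

$D_t \setminus D_\tau \subseteq B_t \setminus B_\tau \;\Rightarrow\; f_v(B_t) - f_v(B_\tau) \ge f_v(D_t) - f_v(D_\tau)$

$f_v(D_t) \ge f_v(D_\tau) + 1 - \theta_v \;\Rightarrow\; f_v(B_t) \ge f_v(B_\tau) + 1 - \theta_v$

Hence $D_{t+1} \setminus D_\tau \subseteq B_{t+1} \setminus B_\tau$.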
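
The lemma follows in two steps from the definitions of monotonicity and submodularity. The derivation below is a sketch written out for completeness; it assumes f is monotone as well as submodular, as the activation functions $f_v$ are throughout the talk.

```latex
% Step 1: monotonicity, since S \cup T \subseteq S \cup T'.
f(S \cup T') - f(S) \;\ge\; f(S \cup T) - f(S).

% Step 2: apply submodularity to X = S \cup T and Y = S'.
% Because S \subseteq S', we have X \cup Y = S' \cup T and X \cap Y = S \cup (T \cap S').
f(S \cup T) + f(S') \;\ge\; f\bigl(S \cup (T \cap S')\bigr) + f(S' \cup T)
                    \;\ge\; f(S) + f(S' \cup T),

% where the last inequality is monotonicity again. Rearranging:
f(S \cup T) - f(S) \;\ge\; f(S' \cup T) - f(S').

% Combining both steps:
f(S \cup T') - f(S) \;\ge\; f(S' \cup T) - f(S').
```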

slide-32
SLIDE 32

Coupling: Wrapup

  • Therefore we have:

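$D_t \setminus D_\tau \subseteq B_t \setminus B_\tau \subseteq B_t$ and $A_t = D_\tau$ for all t in the final stage, so $D_t \subseteq A_t \cup B_t$.

$C_t \subseteq A_t \cap B_t$ (previously proved).

[Figure: stage diagrams of the three processes (S∩T, S\T, T\S; S∩T, nothing, T\S; S∩T, S\T, nothing), with the first two stages separated from the last stage at time τ.]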

slide-33
SLIDE 33

Further Generalization

  • We have defined $\sigma(S) = E[\,|P_{end}| \mid S\,]$.
  • We can introduce a set function ω(·) on $P_{end}$ and define the influence spread as $\sigma_\omega(S) = E[\,\omega(P_{end}) \mid S\,]$ instead.

Theorem: Under the general threshold model with monotone and submodular $f_v$ and ω, $\sigma_\omega(S)$ is monotone and submodular [Mossel/Roch 2007].

slide-34
SLIDE 34

Further Generalization: Proof

Assume $C_{end} \subseteq A_{end} \cap B_{end}$ and $D_{end} \subseteq A_{end} \cup B_{end}$. Then

$\omega(A_{end}) + \omega(B_{end}) \ge \omega(A_{end} \cap B_{end}) + \omega(A_{end} \cup B_{end}) \ge \omega(C_{end}) + \omega(D_{end})$.

Taking expectation, we have

$\sigma_\omega(S) + \sigma_\omega(T) \ge \sigma_\omega(S \cap T) + \sigma_\omega(S \cup T)$.

slide-35
SLIDE 35

Conclusion

  • The general threshold model generalizes many popular diffusion models.
  • Proof methodology: coupling (piecemeal growth and antisense coupling).

Theorem: Under the general threshold model with monotone and submodular $f_v$ and ω, $\sigma_\omega(S)$ is monotone and submodular [Mossel/Roch 2007].

slide-36
SLIDE 36

Algorithm for Influence Maximization

Corollary: The greedy algorithm is a (1 − 1/e)-approximation for the influence maximization problem under the general threshold model.

Algorithm 1: Greedy(k)
    initialize S to the empty set
    for i = 1 to k do
        select u = argmax_{v ∈ V\S} (σ(S ∪ {v}) − σ(S))
        S = S ∪ {u}
    end for
    return S
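
A runnable sketch of Algorithm 1 follows, with σ estimated by Monte Carlo simulation. It assumes the linear threshold model and a hypothetical `in_weights[v][u] = w_uv` layout; the function names, the default number of runs, and the star-graph example are illustrative choices, not part of the talk. Because σ is only estimated, the (1 − 1/e) guarantee holds only up to the estimation error.

```python
import random

def lt_final_size(in_weights, seeds, thresholds):
    """Size of the final active set of one linear threshold run."""
    active = set(seeds)
    while True:
        newly = {v for v, nbrs in in_weights.items()
                 if v not in active
                 and sum(w for u, w in nbrs.items() if u in active) >= thresholds[v]}
        if not newly:
            return len(active)
        active |= newly

def estimate_spread(in_weights, seeds, runs, rng):
    """Monte Carlo estimate of sigma(seeds) over random thresholds."""
    total = 0
    for _ in range(runs):
        thresholds = {v: 1 - rng.random() for v in in_weights}  # uniform on (0, 1]
        total += lt_final_size(in_weights, seeds, thresholds)
    return total / runs

def greedy(in_weights, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated marginal gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        base = estimate_spread(in_weights, chosen, runs, rng)
        best, best_gain = None, float("-inf")
        for v in in_weights:
            if v in chosen:
                continue
            gain = estimate_spread(in_weights, chosen + [v], runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen

# Hypothetical star graph: node 0 influences every leaf with weight 1.0.
star = {0: {}, 1: {0: 1.0}, 2: {0: 1.0}, 3: {0: 1.0}, 4: {0: 1.0}}
```

On the star graph, `greedy(star, 1)` selects the hub, since seeding it activates every leaf regardless of the thresholds.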

slide-37
SLIDE 37

Algorithm for Influence Maximization

  • Time complexity of Greedy(k): O(knCm),
  • where n = |V|, m = |E|, and C is the number of Monte Carlo simulations.

slide-38
SLIDE 38

Algorithm for Influence Maximization

Name   | Main Idea                                      | Model | Guarantee | Reference
CELF   | Lazy forward optimization                      | All   | 1 − 1/e   | Leskovec et al. 2007
CELF++ | Further optimization of CELF                   | All   | 1 − 1/e   | Goyal et al. 2011
PMIA   | Use directed tree structure                    | IC    | No        | Chen et al. 2010
LDAG   | Use DAG structure                              | LT    | No        | Chen et al. 2010
IRIE   | Use PageRank to initialize and update locally  | IC    | No        | Chen et al. 2012
CGA    | Use community structure                        | IC    | [expression in e, Δ, d, θ; garbled in source] | Wang et al. 2010
MSA    | Simulated annealing                            | All   | No        | Jiang et al. 2011

slide-39
SLIDE 39

Open Questions

  • Different classes of activation functions $f_v$.
      – Local subadditive set function ⇒ global subadditive influence spread σ(S)?
  • Find approximation algorithms for the influence maximization problem under diffusion models with non-submodular influence spread σ(S).

slide-40
SLIDE 40