slide-1
SLIDE 1

SYBILFUSE: Combining Local Attributes with Global Structure to Perform Robust Sybil Detection

Peng Gao¹, Binghui Wang², Neil Zhenqiang Gong², Sanjeev R. Kulkarni¹, Kurt Thomas³, Prateek Mittal¹

¹Princeton University, ²Iowa State University, ³Google

slide-2
SLIDE 2

Outline

  1. Introduction to Sybil Attack
  2. Background and Related Work
  3. The SYBILFUSE Framework
  4. Evaluation on Labeled Twitter Networks
  5. Conclusion

Peng Gao SYBILFUSE 2 / 45

slide-4
SLIDE 4

Sybil Attack: Introduction

Sybil Attack: A single adversary injects multiple colluding identities in the system to compromise security and privacy.

slide-7
SLIDE 7

Sybil Attack: Impact

Sybil attack impacts shown on the slide:

  • Fake news
  • Fake reviews
  • Malware
  • Spam messages
  • Scams
  • Unsolicited friend requests
  • Private data
  • Others

slide-8
SLIDE 8

Sybil Attack: Network Model

[Figure: network model with a benign region and a Sybil region connected by attack edges]

slide-11
SLIDE 11

Local Attributes-Based Approaches

  • Blacklisting [Ramachandran et al. CCS’07]
  • Whitelisting [Yardi et al. First Monday Vol. 15(1), ’10]
  • URL filtering [Thomas et al. IEEE S&P’11]
  • Local structural features [Yang et al. IMC’11]

Limitations:

  • Sybils can mimic the behaviors of benign users by manipulating their profiles and connections.

slide-15
SLIDE 15

Global Structure-Based Approaches

  • SybilGuard [Yu et al. SIGCOMM’06]
  • SybilLimit [Yu et al. IEEE S&P’08]
  • SybilInfer [Danezis et al. NDSS’09]
  • SybilRank [Cao et al. NSDI’12]
  • CIA [Yang et al. WWW’12]
  • SybilBelief [Gong et al. TIFS’13]
  • Íntegro [Boshmaf et al. NDSS’15]
  • SybilSCAR [Wang et al. INFOCOM’17]

Limitations:

  • Strong-trust assumption: the number of attack edges is limited
      • The RenRen network does not follow this assumption [Yang et al. IMC’11]
      • Link farming on Twitter [Ghosh et al. WWW’12]
  • Íntegro requires the number of victims to be small and the victims to be accurately predicted.

slide-17
SLIDE 17

Framework Overview

SybilFuse Framework

  • Input: social network data (directed/undirected graph) and known labels
  • Local attributes (structural attributes, content attributes) → local classifiers → local trust scores
  • Global structure: trust score propagation over the graph, starting from the local trust scores, via weighted random walk or weighted loopy belief propagation → final scores
  • Output: predicted labels and node ranking

slide-19
SLIDE 19

Local Trust Score Computation

$S_v$ for node $v$: probability that $v$ is benign

  • Computed by training a node classifier on local node attributes (e.g., degree, local clustering coefficient, profile info)
  • Normalized to [0.1, 0.9]

$S_{u,v}$ for edge $(u, v)$: probability that $u$ and $v$ take the same label (i.e., models homophily strength)

  • Computed by training an edge classifier
  • Captures the similarity between node $u$ and node $v$
  • Normalized to [0.1, 0.9]
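
The slides say the local scores are normalized to [0.1, 0.9] but do not say how. A minimal sketch, assuming a plain min-max rescale (the function name `normalize_scores` and the midpoint fallback are illustrative choices, not from the talk):

```python
def normalize_scores(raw, lo=0.1, hi=0.9):
    """Linearly rescale raw classifier scores into [lo, hi].

    Min-max rescaling is an assumed choice; the slides only state
    the target interval [0.1, 0.9].
    """
    r_min, r_max = min(raw), max(raw)
    if r_max == r_min:
        # All scores are equal, so fall back to the midpoint of the interval.
        return [(lo + hi) / 2 for _ in raw]
    scale = (hi - lo) / (r_max - r_min)
    return [lo + (r - r_min) * scale for r in raw]
```

Keeping scores away from 0 and 1 means no node or edge is treated as certain before propagation.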

slide-22
SLIDE 22

Trust Score Propagation: Weighted Random Walk

Set the initial score of every node $v$:

$$S^{(0)}(v) = \begin{cases} 0.9 & \text{if } v \text{ is a training benign node} \\ 0.1 & \text{if } v \text{ is a training Sybil node} \\ S_v & \text{otherwise} \end{cases}$$

Score update equation:

$$S^{(i)}(v) = \sum_{(u,v) \in E} S^{(i-1)}(u) \, \frac{S_{u,v}}{\sum_{(u,w) \in E} S_{u,w}}$$

After $d = O(\log n)$ iterations, we obtain the final score $S^F_v$:

$$S^F_v = S^{(d)}(v)$$
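
The update rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name, the dictionary-based inputs, and the choice of ceil(log2 n) as a concrete value for d = O(log n) are all assumptions.

```python
import math


def weighted_random_walk(edges, edge_score, local_score,
                         benign_train, sybil_train, iterations=None):
    """Propagate trust scores with the weighted random walk update.

    edges       : list of directed (u, v) pairs; list an undirected edge twice
    edge_score  : dict mapping (u, v) to S_{u,v}
    local_score : dict mapping every node v to S_v
    """
    nodes = set(local_score)

    # Initial scores: 0.9 / 0.1 for labeled training nodes, S_v otherwise.
    score = {}
    for v in nodes:
        if v in benign_train:
            score[v] = 0.9
        elif v in sybil_train:
            score[v] = 0.1
        else:
            score[v] = local_score[v]

    # Denominator of the update: sum of S_{u,w} over all edges leaving u.
    out_weight = {u: 0.0 for u in nodes}
    for (u, w) in edges:
        out_weight[u] += edge_score[(u, w)]

    if iterations is None:
        # Assumed concrete choice for d = O(log n).
        iterations = max(1, math.ceil(math.log2(max(len(nodes), 2))))

    for _ in range(iterations):
        new_score = {v: 0.0 for v in nodes}
        for (u, v) in edges:
            # u passes a share of its score to v, proportional to S_{u,v}.
            new_score[v] += score[u] * edge_score[(u, v)] / out_weight[u]
        score = new_score
    return score
```

Stopping after a small number of iterations (early termination) keeps the scores from converging to a distribution that no longer depends on the initial trust.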

slide-23
SLIDE 23

Trust Score Propagation: Weighted LBP

Node & edge potentials: $X_v \in \{1, -1\}$ represents the label of node $v$.

$$\psi_v(X_v) = \begin{cases} S_v & \text{if } X_v = 1 \\ 1 - S_v & \text{if } X_v = -1 \end{cases}$$

$$\psi_{u,v}(X_u, X_v) = \begin{cases} S_{u,v} & \text{if } X_u X_v = 1 \\ 1 - S_{u,v} & \text{if } X_u X_v = -1 \end{cases}$$

$(G, \Psi)$ defines a pairwise Markov Random Field.

slide-25
SLIDE 25

Trust Score Propagation: Weighted LBP

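Belief update equation:

$$m_{u \to v}(X_v) = \sum_{X_u} \left[ \psi_u(X_u) \, \psi_{u,v}(X_u, X_v) \prod_{s \in \mathrm{Neighbors}(u) \setminus v} m_{s \to u}(X_u) \right]$$

After $d = 5 \sim 10$ iterations, we obtain the final score $S^F_v$:

$$\mathrm{bel}_v(X_v = x_v) \propto \psi_v(X_v = x_v) \prod_{u \in \mathrm{Neighbors}(v)} m_{u \to v}(X_v = x_v)$$

$$S^F_v = \frac{\mathrm{bel}_v(X_v = 1)}{\mathrm{bel}_v(X_v = 1) + \mathrm{bel}_v(X_v = -1)}$$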
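
The message and belief equations can be sketched as follows for an undirected graph. This is a minimal illustration under assumptions: the function name, the adjacency-dict input, the uniform message initialization, and the per-iteration message normalization (added for numerical stability; it cancels in the final ratio) are not from the slides.

```python
def weighted_lbp(neighbors, node_score, edge_score, iterations=5):
    """Weighted loopy belief propagation on a pairwise MRF.

    neighbors  : dict node -> list of neighbors (symmetric, undirected)
    node_score : dict v -> S_v
    edge_score : dict frozenset((u, v)) -> S_{u,v}
    """
    labels = (1, -1)

    def psi_node(v, x):
        return node_score[v] if x == 1 else 1.0 - node_score[v]

    def psi_edge(u, v, xu, xv):
        s = edge_score[frozenset((u, v))]
        return s if xu * xv == 1 else 1.0 - s

    # msg[(u, v)][x] is m_{u->v}(X_v = x); start from uniform messages.
    msg = {(u, v): {x: 0.5 for x in labels}
           for u in neighbors for v in neighbors[u]}

    for _ in range(iterations):
        new_msg = {}
        for (u, v) in msg:
            out = {}
            for xv in labels:
                total = 0.0
                for xu in labels:
                    prod = psi_node(u, xu) * psi_edge(u, v, xu, xv)
                    # Product of messages into u from all neighbors except v.
                    for s in neighbors[u]:
                        if s != v:
                            prod *= msg[(s, u)][xu]
                    total += prod
                out[xv] = total
            z = out[1] + out[-1]
            new_msg[(u, v)] = {x: out[x] / z for x in labels}
        msg = new_msg

    final = {}
    for v in neighbors:
        bel = {x: psi_node(v, x) for x in labels}
        for u in neighbors[v]:
            for x in labels:
                bel[x] *= msg[(u, v)][x]
        final[v] = bel[1] / (bel[1] + bel[-1])
    return final
```

On a two-node graph with S_a = 0.9, S_b = 0.5 and S_{a,b} = 0.9, node b inherits trust from a through the edge while a keeps its own score, since b carries no evidence either way.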

slide-26
SLIDE 26

Sybil Account Prediction and Ranking

The label $L_v$ of node $v$ is predicted as:

$$L_v = \operatorname{sign}(S^F_v - \text{threshold})$$

We can also rank nodes by $S^F_v$ in ascending order, so that Sybil nodes, which have low scores, are ranked first.
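
A short sketch of the prediction and ranking step. The default threshold of 0.5 and the treatment of a score exactly at the threshold as benign are assumptions; the slides leave both unspecified.

```python
def predict_and_rank(final_score, threshold=0.5):
    """Return (labels, ranking) from final scores S^F_v.

    labels  : dict v -> 1 (benign) or -1 (Sybil)
    ranking : nodes sorted by ascending score, most Sybil-like first
    """
    # A score equal to the threshold is labeled benign (assumed tie rule).
    labels = {v: (1 if s >= threshold else -1)
              for v, s in final_score.items()}
    ranking = sorted(final_score, key=final_score.get)
    return labels, ranking
```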

slide-32
SLIDE 32

Small Twitter Network: Measurement

  • 8,167 nodes (7,358 benign nodes & 809 Sybil nodes) and 54,146 edges (40,001 attack edges)

We have the following observations:

  • More than half (53.4%) of Sybils are isolated, i.e., only connect to benign nodes.
  • The number of attack edges is large, with 49 attack edges on average per Sybil.
  • More than 75% of benign nodes are victims.

Thus, the benign region and the Sybil region can hardly be viewed as separate communities.
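
The statistics on this slide can be computed from a labeled edge list along the following lines. This is a sketch under assumptions: the function name and return keys are illustrative, an attack edge is taken to be any edge joining a Sybil and a benign node, a victim is a benign node with at least one attack edge, and both node sets are assumed non-empty.

```python
def measure_network(nodes, edges, sybils):
    """Count attack edges, isolated Sybils, and victims in a labeled graph."""
    sybils = set(sybils)
    benign = set(nodes) - sybils

    attack_edges = 0
    victims = set()
    sybils_linked_to_sybils = set()
    for u, v in edges:
        u_syb, v_syb = u in sybils, v in sybils
        if u_syb != v_syb:
            # Exactly one endpoint is a Sybil: this is an attack edge.
            attack_edges += 1
            victims.add(v if u_syb else u)
        elif u_syb and v_syb:
            sybils_linked_to_sybils.update((u, v))

    # "Isolated" Sybils have no edge to any other Sybil.
    isolated = sybils - sybils_linked_to_sybils
    return {
        "attack_edges": attack_edges,
        "attack_edges_per_sybil": attack_edges / len(sybils),
        "isolated_sybil_fraction": len(isolated) / len(sybils),
        "victim_fraction": len(victims) / len(benign),
    }
```

As a check against the numbers above, 40,001 attack edges over 809 Sybils gives roughly 49 attack edges per Sybil.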

slide-34
SLIDE 34

Small Twitter Network: Local Node Trust Scores

  • Incoming requests accepted ratio: $\mathrm{Req}_{in}(v) = \frac{|In(v) \cap Out(v)|}{|In(v)|}$
  • Outgoing requests accepted ratio: $\mathrm{Req}_{out}(v) = \frac{|In(v) \cap Out(v)|}{|Out(v)|}$
  • Local clustering coefficient: $CC(v) = \frac{|\{(i,j) : i, j \in Nei(v), (i,j) \in E\}|}{|Nei(v)|(|Nei(v)| - 1)}$

We randomly sample 50 benign nodes and 50 Sybil nodes as the training set, and train an SVM classifier with an RBF kernel using LIBSVM.
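
The three features can be computed directly from a directed edge list. The sketch below is illustrative: it assumes In(v) is the set of nodes with an edge into v, Out(v) the set of nodes v points to, and Nei(v) their union, and it returns 0.0 whenever a denominator would be zero. None of these conventions are stated on the slides.

```python
from collections import defaultdict


def local_features(edges):
    """Return dict v -> (Req_in, Req_out, CC) from directed (u, v) edges."""
    edge_set = set(edges)
    in_nb, out_nb = defaultdict(set), defaultdict(set)
    for u, v in edge_set:
        out_nb[u].add(v)
        in_nb[v].add(u)

    features = {}
    for v in set(in_nb) | set(out_nb):
        # Reciprocated links: nodes that both point to v and are pointed to by v.
        mutual = len(in_nb[v] & out_nb[v])
        req_in = mutual / len(in_nb[v]) if in_nb[v] else 0.0
        req_out = mutual / len(out_nb[v]) if out_nb[v] else 0.0

        nei = (in_nb[v] | out_nb[v]) - {v}
        k = len(nei)
        # Ordered pairs of neighbors joined by a directed edge.
        links = sum(1 for i in nei for j in nei
                    if i != j and (i, j) in edge_set)
        cc = links / (k * (k - 1)) if k > 1 else 0.0
        features[v] = (req_in, req_out, cc)
    return features
```

The resulting feature vectors would then be the input to the node classifier; the classifier training itself is not shown here.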

slide-35
SLIDE 35

Small Twitter Network: Sybil Ranking Performance

[Figure: fraction of Sybil nodes among the top-K ranked nodes (K = 100 to 500).
 (a) Random walk-based approaches: RG, SVM, SR, CIA, INT, INT-PF, SF-RW.
 (b) LBP-based approaches and ensemble methods: EnC-SR, EnC-CIA, EnC-SB, EnC-SS, SB, SS, SF-LBP.]

slide-37
SLIDE 37

Large Twitter Network: Measurement

  • 21,297,772 nodes and 265,025,545 edges (18,414,469 attack edges)
  • 145,156 (0.7%) suspended nodes
  • 1,911,482 (9.0%) deleted nodes
  • The rest were active

We have the following observations:

  • Half of Sybils are isolated.
  • The number of attack edges is large (127 attack edges on average per Sybil).

slide-39
SLIDE 39

Large Twitter Network: Node Feature Distribution

We use the same set of features: Reqin(v), Reqout(v), CC(v).

[Figure: (a) scatter plot of the node features; (b) empirical CDF of the clustering coefficient for benign nodes and Sybil nodes]

We randomly sample 3000 benign nodes and 3000 Sybil nodes as the training set, and train an SVM classifier with an RBF kernel using LIBSVM.

slide-41
SLIDE 41

Large Twitter Network: Evaluation

AUC

Method   SR     CIA    INT    INT-PF   SB     SS     SF-RW   SF-LBP
AUC      0.57   0.80   0.48   0.54     0.74   0.74   0.81    0.85

Sybil ranking

[Figure: fraction of Sybil nodes among the top-K ranked nodes (K = 1K, 10K, 50K, 100K, 1M, 10M) for SR, CIA, INT, INT-PF, SB, SS, SF-RW, and SF-LBP]
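
For reference, AUC here can be read as the probability that a randomly chosen benign node receives a higher final score than a randomly chosen Sybil node. A minimal sketch of that definition (the function name is illustrative, and the quadratic pairwise loop is only practical for small samples, not for a graph of this size):

```python
def auc(final_score, sybils):
    """Pairwise AUC: P(score of benign > score of Sybil), ties count 0.5."""
    sybils = set(sybils)
    sybil_scores = [s for v, s in final_score.items() if v in sybils]
    benign_scores = [s for v, s in final_score.items() if v not in sybils]

    wins = 0.0
    for b in benign_scores:
        for s in sybil_scores:
            if b > s:
                wins += 1.0
            elif b == s:
                wins += 0.5
    return wins / (len(benign_scores) * len(sybil_scores))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported values for INT (0.48) and INT-PF (0.54) close to chance.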

slide-44
SLIDE 44

Conclusion

  • We proposed SYBILFUSE, a general framework that combines local attributes with global structure.
  • We evaluated SYBILFUSE on synthetic and real-world social networks, and demonstrated that SYBILFUSE outperforms existing approaches.

slide-45
SLIDE 45

Thank you! Q&A