SLIDE 1

Introduction Edge Modification Techniques Experimental Set Up Information loss Conclusions

An Evaluation of Edge Modification Techniques for Privacy-Preserving on Graphs

Jordi Casas-Roma

Universitat Oberta de Catalunya Barcelona, Spain jcasasr@uoc.edu

MDAI 2015, Skövde, Sweden, September 21-23, 2015

Jordi Casas-Roma Edge Modification Techniques for Privacy-Preserving on Graphs

SLIDE 2

Overview

1. Introduction
2. Edge Modification Techniques
3. Experimental Set Up
4. Information loss
5. Conclusions

SLIDE 3

Introduction

Scenario
  • Release data to third parties
  • Preserve the privacy of users

SLIDE 4

Simple Anonymization

Simple anonymization does not work! User Dan can be re-identified using his structural properties.

Figure 1: Original network (nodes: Amy, Tim, Bob, Lis, Ann, Dan, Tom, Eva, Joe)

Figure 2: Simple anonymization (node names replaced by the labels 1 to 9)

Figure 3: Dan’s 1-neighbourhood (nodes 2, 3, 6, 8, 9)

Figure 4: Dan is re-identified (nodes 1 to 9)
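The re-identification risk can be illustrated with a minimal sketch. The edges of Figure 1 are not recoverable from this transcript, so the edge list below is hypothetical and only reuses the node names. Vertex degree, the simplest structural property, stands in for the 1-neighbourhood knowledge used on the slide, and the function names are illustrative.

```python
import random
from collections import Counter

# Hypothetical edge list: the real edges of Figure 1 are not available,
# so this toy graph only borrows the node names from the slide.
edges = [
    ("Amy", "Tim"), ("Amy", "Bob"), ("Tim", "Dan"), ("Bob", "Dan"),
    ("Dan", "Tom"), ("Dan", "Joe"), ("Lis", "Ann"), ("Ann", "Eva"),
    ("Tom", "Joe"),
]


def degrees(edge_list):
    """Count how many edges touch each vertex."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg


def simple_anonymize(edge_list, seed=0):
    """Replace every name by a random integer label; the structure is untouched."""
    names = sorted({v for edge in edge_list for v in edge})
    labels = list(range(1, len(names) + 1))
    random.Random(seed).shuffle(labels)
    mapping = dict(zip(names, labels))
    return [(mapping[u], mapping[v]) for u, v in edge_list], mapping


anon_edges, secret_mapping = simple_anonymize(edges)

# The attacker only knows Dan's degree in the original network.
known_degree = degrees(edges)["Dan"]
candidates = [v for v, d in degrees(anon_edges).items() if d == known_degree]
print(candidates)  # a single candidate means Dan is re-identified
```

In this toy graph Dan is the only vertex with his degree, so relabelling alone leaves exactly one candidate.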

SLIDE 5

Anonymization methods

Goals
  • Introduce noise to hinder the re-identification process:
      - Adding/removing edges.
      - Adding fake nodes.
      - Grouping nodes into clusters.
      - ...
  • Preserve users’ privacy vs. maximize data utility (minimize information loss).

Figure 5: Dan’s 1-neighbourhood (nodes 2, 3, 6, 8, 9)

Figure 6: Noise added (nodes 1 to 9)

SLIDE 6

Definitions

Network
Let $G = (V, E)$ be a simple, unweighted and undirected network, where $V$ is the set of nodes and $E$ the set of edges. We define $n = |V|$ to denote the number of nodes and $m = |E|$ to denote the number of edges.

Perturbed graphs
We use $G = (V, E)$ and $\widetilde{G} = (\widetilde{V}, \widetilde{E})$ to refer to the original and the anonymous graphs, respectively.

SLIDE 7

Graph modification techniques

Random perturbation
  • Adding/removing/switching edges
  • Trying to preserve some features (average distance, spectral properties, etc.)

Constrained perturbation
  • Sequential edge modifications in order to fulfil some desired constraints
  • Also adding new fake vertices
  • Example: the k-anonymity model

Our approach
  • Edge add
  • Edge del
  • Edge add/del
  • Edge switch

SLIDE 8

Edge add

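Properties
  • Create a new edge $\{v_i, v_j\} \notin E$
  • $\widetilde{m} > m$, where $\widetilde{m}$ is the number of edges of the anonymous graph
  • True relationships are preserved in the perturbed data

Figure: edge add between $v_i$ and $v_j$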

SLIDE 9

Edge del

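Properties
  • Remove an existing edge $\{v_i, v_j\} \in E$
  • $\widetilde{m} < m$, where $\widetilde{m}$ is the number of edges of the anonymous graph
  • No fake relationships are included in the anonymous data, but several true relationships are deleted from the original data.

Figure: edge del between $v_i$ and $v_j$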
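The Edge add and Edge del operations can be sketched as follows. This is a minimal illustration rather than the author's implementation: edges are stored as a set of sorted tuples, the edge to add or delete is drawn uniformly at random, and the function names `edge_add` and `edge_del` are hypothetical.

```python
import random
from itertools import combinations


def edge_add(nodes, edges, rng):
    """Return a copy of `edges` with one new edge that was not in E."""
    missing = [e for e in combinations(sorted(nodes), 2) if e not in edges]
    if not missing:
        raise ValueError("the graph is already complete")
    return edges | {rng.choice(missing)}


def edge_del(edges, rng):
    """Return a copy of `edges` with one existing edge removed."""
    if not edges:
        raise ValueError("the graph has no edges")
    return edges - {rng.choice(sorted(edges))}


nodes = range(5)
edges = {(0, 1), (1, 2), (2, 3)}
rng = random.Random(1)

added = edge_add(nodes, edges, rng)
deleted = edge_del(edges, rng)
print(len(edges), len(added), len(deleted))  # m, then m + 1, then m - 1
```

After Edge add every original edge is still present, while after Edge del every remaining edge is a true one, which matches the properties listed on the two slides.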

SLIDE 10

Edge add/del

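Properties
  • It is a combination of the previous two methods.
  • Delete an existing edge $\{v_i, v_j\} \in E$ and add a new one $\{v_k, v_p\} \notin E$
  • Some true relationships are deleted and some fake ones are created
  • $\widetilde{m} = m$, where $\widetilde{m}$ is the number of edges of the anonymous graph
  • All vertices involved in this operation change their degree

Figure: edge add/del on vertices $v_i$, $v_j$, $v_k$ and $v_p$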

SLIDE 11

Edge switch

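Properties
  • Delete an edge $\{v_i, v_j\} \in E$ and create a new edge $\{v_i, v_p\} \notin E$
  • Some true relationships are removed and some fake ones are created
  • $\widetilde{m} = m$, where $\widetilde{m}$ is the number of edges of the anonymous graph
  • Two vertices change their degree ($v_j$ and $v_p$) while the third one ($v_i$) does not.

Figure: edge switch on vertices $v_i$, $v_j$ and $v_p$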
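The degree behaviour of Edge add/del and Edge switch can be checked with a small sketch. It is an illustration under stated assumptions, not the author's code: edges are sorted tuples, choices are uniform at random, the function names are hypothetical, and Edge add/del is assumed to pick four distinct vertices (otherwise not every involved vertex would change its degree).

```python
import random
from collections import Counter
from itertools import combinations


def degrees(nodes, edges):
    """Degree of every vertex, including isolated ones."""
    deg = Counter({v: 0 for v in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def edge_add_del(nodes, edges, rng):
    """Delete {vi, vj} in E and add {vk, vp} not in E."""
    removed = rng.choice(sorted(edges))
    # Assumption: the new edge shares no endpoint with the deleted one.
    candidates = [e for e in combinations(sorted(nodes), 2)
                  if e not in edges and not set(e) & set(removed)]
    if not candidates:
        raise ValueError("no admissible edge to add")
    added = rng.choice(candidates)
    return (edges - {removed}) | {added}, removed, added


def edge_switch(nodes, edges, rng):
    """Delete {vi, vj} in E and add {vi, vp} not in E."""
    options = []
    for a, b in sorted(edges):
        for vi, vj in ((a, b), (b, a)):
            for vp in sorted(nodes):
                if vp not in (vi, vj) and tuple(sorted((vi, vp))) not in edges:
                    options.append((vi, vj, vp))
    if not options:
        raise ValueError("no admissible switch")
    vi, vj, vp = rng.choice(options)
    old, new = tuple(sorted((vi, vj))), tuple(sorted((vi, vp)))
    return (edges - {old}) | {new}, vi, vj, vp


nodes = range(6)
edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
rng = random.Random(7)
before = degrees(nodes, edges)

swapped, removed, added = edge_add_del(nodes, edges, rng)
switched, vi, vj, vp = edge_switch(nodes, edges, rng)
```

Enumerating every admissible move keeps the sketch short but costs O(n·m) per operation; a practical implementation on the larger networks would more likely sample and retry.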

SLIDE 12

Experimental framework

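Figure: experimental framework. A perturbation process p transforms the original graph $G$ into the anonymous graph $\widetilde{G}$; a metric m is computed on both graphs and the two values are compared as $\epsilon_m(G, \widetilde{G})$.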

SLIDE 13

Network metrics

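Structural network metrics:
  • Average distance (dist)
  • Transitivity (T)

Spectral network metrics:
  • The largest eigenvalue of the adjacency matrix A ($\lambda_1$)

We compute the error on these network metrics as follows:

$$\epsilon_m(G, \widetilde{G}) = |m(G) - m(\widetilde{G}_p)| \qquad (1)$$

where $\widetilde{G}_p$ is the graph produced by the perturbation process $p$.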
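As an illustration of Eq. (1), the sketch below computes the average distance with a breadth-first search and takes the absolute difference between the original and the perturbed graph. The function names are hypothetical, and ignoring unreachable pairs is an assumption, since the slides do not say how disconnected graphs are handled.

```python
from collections import deque


def adjacency(nodes, edges):
    """Build an adjacency map for an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_distance(nodes, edges):
    """Mean shortest-path length over all reachable ordered pairs."""
    adj = adjacency(nodes, edges)
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain BFS, valid because the graph is unweighted
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # assumption: unreachable pairs are skipped
    return total / pairs if pairs else 0.0


def network_metric_error(metric, nodes, original, perturbed):
    """Eq. (1): absolute difference of a network-level metric."""
    return abs(metric(nodes, original) - metric(nodes, perturbed))


nodes = range(4)
path = {(0, 1), (1, 2), (2, 3)}          # original graph
cycle = path | {(0, 3)}                  # one edge added
error = network_metric_error(average_distance, nodes, path, cycle)
```

On this toy pair the average distance drops from 10/6 on the path to 8/6 on the cycle, so the error is 1/3.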

SLIDE 14

Vertex metrics

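Structural vertex metrics:
  • Betweenness centrality (CB)
  • Closeness centrality (CC)
  • Degree centrality (CD)

And we compute the error on vertex metrics by:

$$\epsilon_m(G, \widetilde{G}) = \sqrt{\frac{1}{n}\left((g_1 - \widetilde{g}_1)^2 + \ldots + (g_n - \widetilde{g}_n)^2\right)} \qquad (2)$$

where $g_i$ and $\widetilde{g}_i$ are the values of metric m for vertex $v_i$ in $G$ and $\widetilde{G}$, respectively.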
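A worked sketch of the vertex-level error follows, assuming Eq. (2) is the root mean square of the per-vertex differences. Degree centrality is used as the metric, normalized by n − 1, which is the usual convention but an assumption about the author's setup; the function names are hypothetical.

```python
import math


def degree_centrality(nodes, edges):
    """Degree of each vertex divided by n - 1 (assumed normalization)."""
    n = len(nodes)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {v: d / (n - 1) for v, d in deg.items()}


def vertex_metric_error(original_scores, perturbed_scores):
    """Eq. (2) read as a root mean square over the n vertices."""
    n = len(original_scores)
    squared = sum((original_scores[v] - perturbed_scores[v]) ** 2
                  for v in original_scores)
    return math.sqrt(squared / n)


nodes = range(4)
original = {(0, 1), (1, 2), (2, 3)}
perturbed = {(0, 1), (1, 2)}             # edge {2, 3} deleted

error = vertex_metric_error(degree_centrality(nodes, original),
                            degree_centrality(nodes, perturbed))
```

Only vertices 2 and 3 change, each by 1/3, so the error is sqrt((2/9)/4) = sqrt(1/18), roughly 0.236.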

SLIDE 15

Synthetic networks

  • The Erdős-Rényi model defines a random graph as n vertices connected by m edges that are chosen randomly from the n(n − 1)/2 possible edges. In our experiments, we set n=1,000 and m=5,000. This dataset is denoted as “ER-1000”.
  • The Barabási-Albert model, also called the scale-free model, generates a network whose degree distribution follows a power law. That is, for degree d, its probability density function is $P(d) = d^{-\gamma}$. In our experiments, we set the number of vertices to be 1,000 and γ=1, i.e. linear preferential attachment. This dataset is denoted as “BA-1000”.

(n: number of nodes; m: number of edges; deg: average degree; dist: average distance; D: diameter)

Dataset   n      m      deg    dist   D
ER-1000   1,000  4,969  9.938  3.263  5
BA-1000   1,000  4,985  9.970  2.481  4
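The Erdős-Rényi construction described above can be sketched with the standard library alone. This is an illustration, not the generator used for the experiments; the function name `erdos_renyi_gnm` and the `seed` parameter are hypothetical.

```python
import random


def erdos_renyi_gnm(n, m, seed=None):
    """Pick m distinct edges uniformly among the n(n - 1)/2 possible ones."""
    max_edges = n * (n - 1) // 2
    if not 0 <= m <= max_edges:
        raise ValueError("m must lie between 0 and n(n - 1)/2")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        # Two distinct endpoints; duplicates are rejected by the set.
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


edges = erdos_renyi_gnm(1000, 5000, seed=42)
average_degree = 2 * len(edges) / 1000
```

Rejection sampling is efficient here because 5,000 edges is a small fraction of the 499,500 possible ones; for dense graphs, sampling from the full list of pairs would be the safer choice.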

SLIDE 16

Real networks

  • Zachary’s Karate Club is a network widely used in the literature. The graph shows the relationships among 34 members of a karate club.
  • Jazz musicians is a collaboration graph of jazz musicians and their relationships.
  • URV email is the email communication network at the University Rovira i Virgili in Tarragona (Spain). Nodes are users and each edge represents that at least one email has been sent.
  • Political blogosphere data (polblogs) compiles the data on the links among US political blogs.

(n: number of nodes; m: number of edges; deg: average degree; dist: average distance; D: diameter)

Dataset                n      m       deg     dist   D
Zachary’s Karate Club  34     78      4.588   2.408  5
Jazz musicians         198    2,742   27.697  2.235  6
URV email              1,133  5,451   9.622   3.606  8
Polblogs               1,222  16,714  27.31   2.737  8

SLIDE 17

Some examples

Figure: example results. Panels: (e) degree distribution on ER-1000; (f) λ1 on Karate; (g) dist on Jazz; (h) degree distribution on BA-1000; (i) CC on URV email; (j) T on Polblogs. Panels (f), (g), (i) and (j) compare the Add, Del, AddDel and Switch methods.

SLIDE 18

Synthetic networks

Network  Method   dist    T       CB      CC      CD      λ1      ε
ER-1000  Add      0.1402  0.0012  0.0005  0.0149  0.0016  1.2454  4.407
         Del      0.1833  0.0013  0.0006  0.0197  0.0016  1.2262  5.984
         Add/del  0.0005  0.0002  0.0007  0.0073  0.0015  0.0122  1.077
         Switch   0.0003  0.0001  0.0005  0.0055  0.0010  0.0048  0.020
BA-1000  Add      0.0118  0.0025  0.0005  0.0030  0.0016  0.6507  0.667
         Del      0.1111  0.0038  0.0007  0.0315  0.0034  3.5769  6.000
         Add/del  0.0902  0.0014  0.0016  0.0230  0.0034  2.9250  4.279
         Switch   0.0488  0.0011  0.0005  0.0162  0.0019  1.4601  1.114

SLIDE 19

Real networks

Network    Method   dist    T       CB      CC      CD      λ1      ε
Karate     Add      0.1799  0.0060  0.0268  0.0428  0.0270  0.4312  2.772
           Del      0.1393  0.0223  0.0204  0.0696  0.0296  0.6171  4.104
           Add/del  0.0393  0.0166  0.0311  0.0404  0.0331  0.2352  2.730
           Switch   0.0935  0.0291  0.0297  0.0424  0.0233  0.1056  2.365
Jazz       Add      0.2290  0.0486  0.0073  0.0532  0.0199  1.9575  2.814
           Del      0.0653  0.0658  0.0021  0.0940  0.0223  4.7641  3.265
           Add/del  0.1888  0.1115  0.0077  0.0497  0.0179  2.9508  3.817
           Switch   0.1859  0.1129  0.0068  0.0451  0.0111  2.1005  2.622
URV email  Add      0.2142  0.0179  0.0011  0.0193  0.0014  0.5120  1.000
           Del      0.1238  0.0208  0.0007  0.2177  0.0017  2.3656  3.309
           Add/del  0.1028  0.0387  0.0013  0.1587  0.0016  1.9539  3.321
           Switch   0.1319  0.0429  0.0011  0.1481  0.0010  1.3955  2.385
Polblogs   Add      0.1738  0.0114  0.0013  0.1649  0.0031  1.0974  2.000
           Del      0.0569  0.0280  0.0005  0.1502  0.0050  9.0615  3.258
           Add/del  0.1158  0.0389  0.0015  0.1177  0.0045  7.8086  2.934
           Switch   0.1620  0.0459  0.0014  0.0991  0.0025  6.1445  2.531

SLIDE 20

Conclusions

  • Edge switch achieves lower information loss when applied to networks that do not fulfil the scale-free model, i.e. ER-1000 and Jazz musicians.
  • Edge add obtains the lowest information loss when dealing with scale-free networks, such as BA-1000, URV email and Polblogs.
  • Edge switch better preserves the degree distribution, keeping some related measures close to the original values.
  • Edge del and Edge add/del introduce more perturbation on almost all analysed networks.

SLIDE 21

Future work

  • Consider other generic information loss measures (diameter, clustering coefficient, etc.)
  • Add methods related to “noise node addition”
  • Use real information loss measures (information flow, clustering or community detection, etc.)

SLIDE 22

The End

Thank you!

Jordi Casas-Roma UOC jcasasr@uoc.edu
