Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning (PowerPoint PPT Presentation)


SLIDE 1

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang

CISPA Helmholtz Center for Information Security, Max Planck Institute for Informatics

SLIDE 2

Online Learning

[Figure: a model is trained on the training set, then updated with the updating set]

  • Data generation rate: 90% of the data in the world today has been created in the last two years alone
  • Cost of retraining

SLIDE 3

Attack Surface in Online Learning

[Figure: the target model produces different posteriors for the same probing inputs before and after the update]

Research Question: Can this posterior difference be a new attack surface?

SLIDE 4
Threat Model

  • Attacker has black-box access to the target model
  • Attacker knows:
      • The target model's architecture
      • A shadow dataset from the same distribution as the target model's dataset

SLIDE 5

Attack Model: General Attack Pipeline

[Figure: the attacker queries the target model with the same probing set before and after the update; the posterior difference is fed to an encoder, whose output a decoder turns into one of four attacks]

  • Single-sample label inference
  • Single-sample reconstruction
  • Multi-sample label distribution estimation
  • Multi-sample reconstruction

SLIDE 6

Attack Model Training

[Figure: a shadow model is built with the target model's architecture and updated with updating sets 1..n drawn from the shadow dataset, yielding shadow updated models 1..n; probing each pair produces posterior differences 1..n, which serve as the attack model's training inputs X, with the updating sets' known properties as targets Y]

The attacker needs:
  • The target model's architecture
  • A shadow dataset
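The feature the attack model consumes can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: `posterior_difference`, the toy linear "models", and all dimensions are hypothetical stand-ins.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax turning raw scores into posteriors."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def posterior_difference(model_before, model_after, probing_set):
    """Query the model with the same probing set before and after the
    update and flatten the per-sample posterior changes into one
    feature vector for the attack model."""
    post_before = softmax(model_before(probing_set))
    post_after = softmax(model_after(probing_set))
    return (post_after - post_before).ravel()

# Toy stand-ins for the target model before/after an update:
rng = np.random.default_rng(0)
W_before = rng.normal(size=(8, 10))                 # 8 features -> 10 classes
W_after = W_before + 0.1 * rng.normal(size=(8, 10))  # slightly shifted weights
probe = rng.normal(size=(5, 8))                      # probing set of 5 samples

delta = posterior_difference(lambda x: x @ W_before,
                             lambda x: x @ W_after, probe)
print(delta.shape)  # → (50,)  = 5 probes x 10 classes
```

The attack model then maps this fixed-length vector to whatever the attack predicts (a label, a label distribution, or reconstructed samples).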

SLIDE 7

Single-sample Label Inference

[Figure: the general attack pipeline, with the decoder predicting the label of the single updating sample: "It is a 0"]

  • More than 6x better than the baseline on MNIST and more than 9x better on CIFAR-10

SLIDE 8

Single-sample Reconstruction

[Figure: the general attack pipeline, with the decoder reconstructing the single updating sample]

  • More complicated than inferring the label
  • The attacker needs a sample generator
  • We rely on an autoencoder's decoder

SLIDE 9

Autoencoder

[Figure: an autoencoder, with the encoder compressing the input to a latent code and the decoder reconstructing the input from it]
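As a toy illustration of the autoencoder idea (compress, then reconstruct), here is a minimal linear autoencoder in plain NumPy. The data, dimensions, and learning rate are arbitrary stand-ins, not the architecture from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 16))          # toy data: 64 samples, 16 features

# Linear autoencoder: encode 16 -> 4 latent dims, decode 4 -> 16.
W_enc = 0.1 * rng.normal(size=(16, 4))
W_dec = 0.1 * rng.normal(size=(4, 16))

lr = 0.05
losses = []
for _ in range(200):
    Z = X @ W_enc                      # encoder: latent codes
    X_hat = Z @ W_dec                  # decoder: reconstruction
    losses.append(((X - X_hat) ** 2).mean())
    G = 2.0 * (X_hat - X) / X.size     # gradient of the MSE w.r.t. X_hat
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(losses[0] > losses[-1])          # reconstruction error decreases
```

Once trained, the decoder alone maps latent codes back to samples, which is exactly the piece the attack reuses as its sample generator.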

SLIDE 10

Single-sample Reconstruction

[Figure: an autoencoder (encoder + decoder) is trained first; its decoder is then transferred to serve as the attack model's decoder]

SLIDE 11

Single-sample Reconstruction

[Figure: mean squared error (MSE) of the reconstructions on MNIST and CIFAR-10, comparing Autoencoder (Oracle), ASSR, Label-random, and Random]

SLIDE 12

Multi-sample Label Estimation

[Figure: the general attack pipeline, with the decoder outputting the label distribution of the updating set over the classes 0-9]

  • KL-divergence as the loss
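The KL-divergence loss named above compares the predicted and true label distributions of the updating set. A minimal Python sketch follows; `true_dist` and `pred_dist` are made-up three-class distributions for illustration only:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions.
    Clipping with eps avoids log(0) for empty classes."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

true_dist = np.array([0.5, 0.3, 0.2])   # label distribution of the updating set
pred_dist = np.array([0.4, 0.4, 0.2])   # attack model's estimate

print(round(kl_divergence(true_dist, pred_dist), 4))  # → 0.0253
```

The loss is zero exactly when the two distributions match, so minimizing it pushes the decoder's output toward the updating set's true label distribution.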

SLIDE 13

Multi-sample Label Estimation

[Figure: KL-divergence of ALDE, Baseline, and Transfer on MNIST and CIFAR-10, for updating sets of size 10 (Transfer 100-10) and size 100 (Transfer 10-100)]

SLIDE 14

Multi-sample Reconstruction

[Figure: the general attack pipeline, with the decoder reconstructing the full updating set]

  • The most challenging of the four attacks
  • Reconstructs a whole set of data samples
  • The autoencoder cannot help anymore
  • What can we do?
SLIDE 15

Generative Adversarial Network (GAN)

[Figure: GAN architecture. Image credit: Thalles Silva]

SLIDE 16

Multi-sample Reconstruction

[Figure: the attack model's encoder output, together with standard Gaussian noise, feeds a GAN generator; a discriminator plus a best match loss train the generator to reconstruct the updating set's samples]
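A best match loss of the kind named above can be sketched as follows. This is a NumPy illustration under the assumption that each real sample is scored against its nearest generated sample, not the paper's exact formulation:

```python
import numpy as np

def best_match_loss(real, generated):
    """Average, over the real samples, of the MSE to the closest
    generated sample: the generator is rewarded for covering every
    real sample rather than for matching some fixed pairing."""
    diffs = real[:, None, :] - generated[None, :, :]   # (n_real, n_gen, d)
    pair_mse = (diffs ** 2).mean(axis=2)               # (n_real, n_gen)
    return float(pair_mse.min(axis=1).mean())

real = np.array([[0.0, 0.0], [1.0, 1.0]])              # toy updating set
gen = np.array([[1.0, 1.0], [0.0, 0.1], [5.0, 5.0]])   # toy generator output

print(round(best_match_loss(real, gen), 6))  # → 0.0025
```

Because the minimum is taken per real sample, the generator is free to produce the set in any order, which is what makes the loss suitable for reconstructing an unordered updating set.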

SLIDE 17

Multi-sample Reconstruction

[Figure: mean squared error (MSE) of One-to-one match, AMSR, and Baseline on MNIST and CIFAR-10]

SLIDE 18

Multi-sample Reconstruction


SLIDE 19

Multi-sample Reconstruction


SLIDE 20

Summary

[Figure: the general attack pipeline (probing set, target model before and after the update, posterior difference, encoder, decoder) with all four attacks: single-sample label inference ("It is a 0"), single-sample reconstruction, multi-sample label distribution estimation, and multi-sample reconstruction]

Thank you for your attention! Questions?

ahmed.salem@cispa.saarland https://ahmedsalem2.github.io/ @AhmedGaSalem