Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning (PowerPoint presentation)


Sep 19, 2022


SLIDE 1

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang

CISPA Helmholtz Center for Information Security, Max Planck Institute for Informatics

SLIDE 2

Online Learning

Model: trained on the Training set, then updated with the Updating set

Data generation rate: 90% of the data in the world today has been created in the last two years alone
Cost of retraining

SLIDE 3

Attack Surface in Online Learning

Target Model → Update
[Figure: posterior bar charts over classes 0-9 for the target model before and after the update]

Research Question: Can this posterior difference be a new attack surface?

SLIDE 4

Threat Model

Attacker has black-box access to the target model
Attacker knows:
  Target model's architecture
  A shadow dataset from the same distribution as the target model's dataset

SLIDE 5

General Attack Pipeline

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder)
Attack outputs:
Single-sample label inference
Single-sample reconstruction
Multi-sample label distribution
Multi-sample reconstruction
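The input to every attack is the posterior difference. A minimal pure-Python sketch of how it could be assembled, assuming each model version is exposed as a black-box function mapping one input to a list of class probabilities; the names `posterior_difference`, `toy_target`, and `toy_updated` are hypothetical, and the layout (probe by probe, updated minus original) is an assumption rather than the authors' exact format.

```python
from typing import Callable, List, Sequence

Posterior = List[float]
# Hypothetical black-box interface: one input -> list of class probabilities.
BlackBoxModel = Callable[[Sequence[float]], Posterior]


def posterior_difference(
    target: BlackBoxModel,
    updated: BlackBoxModel,
    probing_set: Sequence[Sequence[float]],
) -> List[float]:
    """Query both model versions on the same probing set and concatenate the
    per-sample posterior differences (updated minus target) into one vector."""
    delta: List[float] = []
    for x in probing_set:
        before = target(x)
        after = updated(x)
        delta.extend(a - b for a, b in zip(after, before))
    return delta


# Toy stand-ins for the model before and after the update.
def toy_target(x: Sequence[float]) -> Posterior:
    return [0.5, 0.5]


def toy_updated(x: Sequence[float]) -> Posterior:
    return [0.75, 0.25]


delta = posterior_difference(toy_target, toy_updated, [[0.0], [1.0]])
print(delta)  # [0.25, -0.25, 0.25, -0.25]
```

Because the attacker only needs query access, the same function works for the real target model and for the attacker's own shadow models.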

SLIDE 6

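Attack Model Training

Attacker's knowledge: Target model's architecture, Shadow dataset
Shadow Model: same architecture as the Target Model, trained on the shadow dataset
Update with updating set 1 ... updating set n → Shadow Updated Model 1 ... Shadow Updated Model n
Query each model on the Probing Set → Posterior difference 1 ... Posterior difference n
Attack model training data: X (posterior differences), Y (targets derived from the corresponding updating sets)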
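The shadow-model procedure can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions: a tiny softmax-regression classifier stands in for the target's architecture, an update is assumed to be one SGD pass over the updating set starting from a copy of the shadow model, and all data are random numbers. `SoftmaxRegression` and `build_attack_training_set` are hypothetical names. Each X entry is a concatenated posterior difference; each Y entry is the updating set itself, from which a specific attack would derive its target (a label, a sample, or a label distribution).

```python
import copy
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (features, label)


class SoftmaxRegression:
    """Tiny multinomial logistic regression trained with plain SGD (stand-in shadow model)."""

    def __init__(self, n_features: int, n_classes: int, lr: float = 0.5) -> None:
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        top = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, data: Sequence[Sample], epochs: int = 1) -> None:
        for _ in range(epochs):
            for x, y in data:
                p = self.predict_proba(x)
                for k in range(len(self.b)):
                    grad = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                    self.b[k] -= self.lr * grad
                    for j, xj in enumerate(x):
                        self.w[k][j] -= self.lr * grad * xj


def build_attack_training_set(
    shadow: SoftmaxRegression,
    updating_sets: Sequence[Sequence[Sample]],
    probing_set: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[Sample]]]:
    """For each updating set: copy the shadow model, update the copy, and record the
    posterior difference on the probing set (X) together with the updating set (Y)."""
    before = [shadow.predict_proba(x) for x in probing_set]
    xs: List[List[float]] = []
    ys: List[List[Sample]] = []
    for updating_set in updating_sets:
        updated = copy.deepcopy(shadow)  # every update starts from the same shadow model
        updated.fit(updating_set)
        after = [updated.predict_proba(x) for x in probing_set]
        xs.append([a - b for pa, pb in zip(after, before) for a, b in zip(pa, pb)])
        ys.append(list(updating_set))
    return xs, ys


random.seed(0)
N_FEATURES, N_CLASSES = 4, 3


def random_sample() -> Sample:
    return [random.gauss(0.0, 1.0) for _ in range(N_FEATURES)], random.randrange(N_CLASSES)


shadow = SoftmaxRegression(N_FEATURES, N_CLASSES)
shadow.fit([random_sample() for _ in range(200)], epochs=3)  # "train" on the shadow dataset
updating_sets = [[random_sample()] for _ in range(5)]  # single-sample updates
probing_set = [random_sample()[0] for _ in range(10)]
X, Y = build_attack_training_set(shadow, updating_sets, probing_set)
print(len(X), len(X[0]))  # 5 30
```

A quick sanity check on the output: each probe's slice of a posterior difference sums to roughly zero, since both posteriors sum to one.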

SLIDE 7

Single-sample Label Inference

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder) → Single-sample label inference: "It is a 0"

More than 6x and 9x better than the baseline for MNIST and CIFAR-10
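One way to picture the label-inference attack model is an encoder that compresses the posterior difference, followed by a decoder that outputs a distribution over the 10 classes. The sketch below is a hypothetical forward pass only: the layer sizes, random initialization, ReLU encoder, and cross-entropy loss are assumptions, `LabelInferenceAttack` is a made-up name, and training (e.g. by SGD on shadow-generated pairs) is omitted.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def linear(w: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, z) for z in v]


def softmax(v: Sequence[float]) -> List[float]:
    top = max(v)
    exps = [math.exp(z - top) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LabelInferenceAttack:
    """Hypothetical encoder/decoder attack model: posterior difference -> label posterior."""

    def __init__(self, input_dim: int, latent_dim: int, n_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.enc_w = random_matrix(latent_dim, input_dim, rng)
        self.enc_b = [0.0] * latent_dim
        self.dec_w = random_matrix(n_classes, latent_dim, rng)
        self.dec_b = [0.0] * n_classes

    def forward(self, delta: Sequence[float]) -> List[float]:
        latent = relu(linear(self.enc_w, self.enc_b, delta))  # encoder
        return softmax(linear(self.dec_w, self.dec_b, latent))  # decoder -> class probabilities

    def loss(self, delta: Sequence[float], label: int) -> float:
        return -math.log(self.forward(delta)[label])  # cross-entropy for the true label


attack = LabelInferenceAttack(input_dim=30, latent_dim=8, n_classes=10)
delta = [0.01 * ((i % 7) - 3) for i in range(30)]  # made-up posterior difference
probs = attack.forward(delta)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(attack.loss(delta, label=0), 3))
```

With untrained weights the prediction is arbitrary; only after fitting on many (posterior difference, label) pairs from shadow models would it reflect the updated sample's label.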

SLIDE 8

Single-sample Reconstruction

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder) → Single-sample reconstruction

More complicated than inferring the label
The attacker needs a sample generator
We rely on an autoencoder's decoder

SLIDE 9

Autoencoder

SLIDE 10

Single-sample Reconstruction

Autoencoder: Encoder → Decoder
Transfer: the autoencoder's Decoder is reused in the attack model (Encoder → Decoder)
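The decoder transfer can be sketched as follows, assuming linear layers and a decoder that stays frozen after pre-training. The decoder weights here are random stand-ins for an autoencoder decoder pre-trained on the shadow dataset, and only the attack model's encoder is fitted, by plain gradient descent on the mean squared error between the decoded output and the updated sample. Freezing the decoder is a simplifying assumption (the slide does not say whether it is fine-tuned), and all names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruct(encoder: Matrix, decoder: Matrix, delta: Sequence[float]) -> List[float]:
    return matvec(decoder, matvec(encoder, delta))  # attack encoder -> transferred decoder


def train_encoder_step(encoder: Matrix, decoder: Matrix,
                       pairs: Sequence[Tuple[List[float], List[float]]], lr: float) -> None:
    """One gradient-descent step on the mean MSE; only the attack encoder is updated."""
    grads = [[0.0] * len(encoder[0]) for _ in encoder]
    for delta, x in pairs:
        z = matvec(encoder, delta)
        x_hat = matvec(decoder, z)
        g_out = [2.0 * (xh - xt) / len(x) for xh, xt in zip(x_hat, x)]  # dMSE/dx_hat
        g_z = [sum(decoder[j][k] * g_out[j] for j in range(len(x))) for k in range(len(z))]
        for k, gk in enumerate(g_z):
            for i, di in enumerate(delta):
                grads[k][i] += gk * di / len(pairs)
    for k, row in enumerate(grads):
        for i, g in enumerate(row):
            encoder[k][i] -= lr * g


rng = random.Random(0)
INPUT_DIM, LATENT_DIM, SAMPLE_DIM = 6, 2, 4
# Stand-in for a decoder pre-trained inside an autoencoder on the shadow dataset (frozen).
decoder = [[rng.gauss(0.0, 0.5) for _ in range(LATENT_DIM)] for _ in range(SAMPLE_DIM)]
encoder = [[rng.gauss(0.0, 0.1) for _ in range(INPUT_DIM)] for _ in range(LATENT_DIM)]
# Made-up (posterior difference, updated sample) pairs, as shadow models would produce.
pairs = [([rng.gauss(0.0, 0.1) for _ in range(INPUT_DIM)],
          [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_DIM)]) for _ in range(8)]


def mean_loss() -> float:
    return sum(mse(reconstruct(encoder, decoder, d), x) for d, x in pairs) / len(pairs)


initial = mean_loss()
for _ in range(200):
    train_encoder_step(encoder, decoder, pairs, lr=0.1)
print(f"MSE before: {initial:.4f}, after: {mean_loss():.4f}")
```

The design point is that the decoder already knows how to produce realistic samples from a latent code, so the attack only has to learn a mapping from posterior differences into that latent space.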

SLIDE 11

Single-sample Reconstruction

[Figure: mean squared error (MSE) of single-sample reconstruction on MNIST and CIFAR-10, comparing Autoencoder (Oracle), ASSR, Label-random, and Random; lower is better]

SLIDE 12

Multi-sample Label Estimation

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder) → Multi-sample label distribution [Figure: histogram over classes 0-9]

KL-divergence as the loss
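The KL-divergence loss compares the true label distribution of the updating set with the distribution predicted by the attack model. A minimal sketch, assuming the true distribution is the empirical label frequency of the updating set and the divergence is taken as KL(true || predicted); the function names are hypothetical, and `eps` guards against taking the log of zero.

```python
import math
from collections import Counter
from typing import List, Sequence


def label_distribution(labels: Sequence[int], n_classes: int) -> List[float]:
    """Empirical label distribution of an updating set (the attack's target)."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(n_classes)]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_c p_c * log(p_c / q_c); classes with p_c = 0 contribute 0."""
    return sum(pc * math.log(pc / max(qc, eps)) for pc, qc in zip(p, q) if pc > 0)


true_dist = label_distribution([0, 0, 1, 3], n_classes=4)  # [0.5, 0.25, 0.0, 0.25]
predicted = [0.4, 0.3, 0.1, 0.2]  # e.g. the attack model's softmax output
print(true_dist, round(kl_divergence(true_dist, predicted), 4))
```

KL divergence is zero only when the two distributions match, which makes it a natural loss for predicting a whole distribution rather than a single label.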

SLIDE 13

Multi-sample Label Estimation

[Figure: KL-divergence of multi-sample label estimation. First panel: MNIST (10) and CIFAR-10 (10), comparing ALDE, Baseline, and Transfer 100-10. Second panel: MNIST (100) and CIFAR-10 (100), comparing ALDE, Baseline, and Transfer 10-100]

SLIDE 14

Multi-sample Reconstruction

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder) → Multi-sample reconstruction

The most challenging attack scenario
Reconstruct a set of data samples
The autoencoder cannot help anymore
What do we do?

SLIDE 15

Generative Adversarial Network (GAN)

Image credit: Thalles Silva
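For reference, the standard GAN objective (Goodfellow et al., 2014) is a minimax game between a discriminator $D$ and a generator $G$; with noise $z$ drawn from a standard Gaussian it can be written as below. This is the generic formulation shown only as background; how the attack's generator is conditioned and trained is not specified on this slide.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim \mathcal{N}(0, I)}\left[\log\bigl(1 - D(G(z))\bigr)\right]
```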

SLIDE 16

Multi-sample Reconstruction

Components: Encoder, Generator (input: Standard Gaussian Noise), Discriminator
Loss: Best match loss
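One reading of a best match loss: each sample in the real updating set is matched to its closest generated sample, so the generator is not penalized for producing the set in a different order. The sketch below assumes per-pair mean squared error as the distance and a sum over the real samples; the exact definition used by the authors may differ, and `best_match_loss` is a hypothetical name.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_match_loss(real_set: Sequence[Sequence[float]],
                    generated_set: Sequence[Sequence[float]]) -> float:
    """For every real sample, take the error of its closest generated sample, then sum.
    The generated set may come out in any order without changing the loss."""
    return sum(min(mse(x, g) for g in generated_set) for x in real_set)


real = [[0.0, 0.0], [1.0, 1.0]]
generated = [[1.0, 1.0], [0.0, 0.5]]  # different order, one sample slightly off
print(best_match_loss(real, generated))  # 0.125
```

Matching on the closest sample rather than a fixed position is what makes the loss usable for sets, whose elements have no natural order.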

SLIDE 17

Multi-sample Reconstruction

[Figure: mean squared error (MSE) of multi-sample reconstruction on MNIST and CIFAR-10, comparing One-to-one match, AMSR, and Baseline]

SLIDE 18

Multi-sample Reconstruction

SLIDE 19

Multi-sample Reconstruction

SLIDE 20

Summary

Target Model → Update; both versions are queried on the same Probing set [Figure: posterior bar charts over classes 0-9]
Posterior difference → Attack Model (Encoder → Decoder)
Single-sample label inference ("It is a 0")
Single-sample reconstruction
Multi-sample label distribution [Figure: histogram over classes 0-9]
Multi-sample reconstruction

Thank you for your attention! Questions?

ahmed.salem@cispa.saarland https://ahmedsalem2.github.io/ @AhmedGaSalem