Transferable Adversarial Examples: Insights, Attacks & Defenses
June 12th, 2017
Florian Tramèr
Joint work with Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh & Patrick McDaniel

Adversarial Examples Threat Model: White-Box Attacks
[Diagram: an input image is fed to the ML model, which outputs scores over classes (bird, tree, plane); the loss compares the prediction to the ground-truth label.]
“Fast Gradient Sign Method” (FGSM)
Take the gradient of the loss:
r = ε · sign(∇ₓJ(x, y, θ))
[Diagram: the perturbed input is fed to the ML model, shifting the scores over the classes (bird, tree, plane).]
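A minimal sketch of this attack, assuming a PyTorch-style classifier that returns logits and inputs scaled to [0, 1]; the function name and signature are illustrative, not from the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-shot FGSM: perturb x by eps in the direction of the sign of the
    gradient of the loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]
```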
Hypothetical Attacks on Autonomous Vehicles
- Denial of service: confusing object
- Harm others: adversarial input recognized as “open space on the road”
- Harm self / passengers: adversarial input recognized as “navigable road”
Adversarial input: x + r, with ‖r‖∞ = ε
“Fast Gradient Sign Method” (FGSM)
r = ε · sign(∇ₓJ(x, y, θ))
Adversarial Examples Threat Model: White-Box Attacks
[Diagram: the adversarial inputs are fed to the ML model, which now predicts “plane” for each of them.]
Adversarial Examples Transfer
[Diagram: the same adversarial input also fools other, independently trained ML models.]
Adversarial Examples Threat Model: Black-Box Attacks
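A small sketch of how transferability could be measured under these assumptions: craft FGSM examples on a source model and count how often a separate target model misclassifies them. The `fgsm` helper is the sketch above; the loader and model objects are hypothetical.

```python
import torch

def transfer_error_rate(source_model, target_model, loader, eps):
    """Fraction of FGSM examples crafted against source_model that are
    misclassified by target_model (a black-box transfer attack)."""
    fooled, total = 0, 0
    for x, y in loader:
        x_adv = fgsm(source_model, x, y, eps)  # FGSM sketch from above
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total
```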
The Space of Transferable Adversarial Examples
How large is the “space” of adversarial examples?
- At least 2-dimensional
– Warde-Farley & Goodfellow 2016
– Liu et al. 2017
[Figure: “church window” plots, Warde-Farley & Goodfellow 2016]
Gradient-Aligned Subspaces
- Adversarial examples form a contiguous subspace of “high” dimensionality
– 15-45 dimensions for DNNs and CNNs on MNIST
– Intersection of adversarial subspaces is also multidimensional
Decision Boundary Similarity
[Figure: for a given direction, the distance from a data point to a model's decision boundary, and the distance between two models' boundaries along that direction.]
- Experiments with MNIST and DREBIN (malware)
– DNN, Logistic Regression, SVM
– 3 directions:
- Aligned with the gradient (adversarial example)
- In the direction of a data point of a different class
- In a random direction
- Results: in every direction, distance to boundary ≫ distance between boundaries (a measurement sketch follows below)
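A rough sketch of how the per-direction distance could be estimated, assuming a single example with a batch dimension and a simple line search; the function and its parameters are illustrative. The distance between two models' boundaries along the same direction is then just the difference between their two distances.

```python
import torch

def distance_to_boundary(model, x, y, direction, max_dist=10.0, step=0.05):
    """Walk from x along `direction` until the model's prediction changes,
    and return the distance at which that first happens (the decision
    boundary). Returns max_dist if no boundary is found in the search range."""
    d = direction / direction.norm()
    t = 0.0
    with torch.no_grad():
        while t <= max_dist:
            if model(x + t * d).argmax(dim=-1).item() != y:
                return t
            t += step
    return max_dist
```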
Models are similar “everywhere”
Open Questions
- Why this similarity?
– Data-dependent results?
– E.g., for a binary MNIST task (3s vs 7s) we prove: if F1 (a linear model) and F2 (a quadratic model) both have high accuracy, then there are adversarial examples that transfer between the two models
– These adversarial examples also transfer to DNNs and CNNs, but we can't prove this is inherent…
Transferability and Adversarial Training

Adversarial Training
[Diagram: each training input is fed to the model twice: once unmodified (labeled “bird”) and once perturbed with FGSM by taking the gradient of the model's own loss (misclassified as “plane”); the training loss combines the clean and adversarial terms. A sketch follows below.]
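A minimal sketch of one FGSM adversarial-training step under the same assumptions as before, reusing the `fgsm` helper sketched earlier; the optimizer, loss weighting, and batching details are illustrative.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps):
    """One training step that minimizes the loss on the clean batch plus
    the loss on its FGSM-perturbed copy."""
    x_adv = fgsm(model, x, y, eps)  # FGSM sketch from above
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```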
Attacks on Adversarial Training
Error rates of adversarially trained models (clean / white-box FGSM / black-box FGSM):
- MNIST: 1.0% / 3.6% / 18.2%
- ImageNet (top-1): 22.0% / 26.8% / 36.5%
The black-box adversarial examples are transferred from another (standard) model.
“Gradient Masking”
- How do we get robustness to FGSM-style attacks?
[Figure: two possibilities: a large-margin classifier vs. gradient masking.]
Loss of Adversarially Trained Model
[Figure: loss surface around a data point. Moving in the direction of the model's own gradient (white-box attack) leads to a non-adversarial example; moving in the direction of another model's gradient (black-box attack) leads to an adversarial example. A probing sketch follows below.]
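A small sketch of how such a loss-surface probe could be made, under the earlier assumptions: evaluate the loss along the sign of a chosen gradient direction (the model's own input gradient for the white-box case, another model's gradient for the black-box case). Names and defaults are illustrative.

```python
import torch
import torch.nn.functional as F

def loss_along_direction(model, x, y, direction, eps, steps=20):
    """Evaluate the model's loss at x + t * sign(direction) for t in [0, eps],
    to probe the loss surface along that direction."""
    d = direction.sign()
    with torch.no_grad():
        return [F.cross_entropy(model(x + t * d), y).item()
                for t in torch.linspace(0.0, eps, steps)]
```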
Simple One-Shot Attack: RAND+FGSM
- 1. Small random step
- 2. Step in the direction of the gradient (sketch below)
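A minimal sketch of RAND+FGSM under the same assumptions: a small random step of size alpha, followed by an FGSM step of size eps - alpha, so the total perturbation stays within the ε ball (alpha < eps is assumed; the exact step sizes used in the talk may differ).

```python
import torch
import torch.nn.functional as F

def rand_fgsm(model, x, y, eps, alpha):
    """RAND+FGSM: random step of size alpha, then an FGSM step of size
    eps - alpha from the new point."""
    x_rand = (x + alpha * torch.sign(torch.randn_like(x))).detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    grad = torch.autograd.grad(loss, x_rand)[0]
    x_adv = x_rand + (eps - alpha) * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]
```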
Error rate, FGSM vs RAND+FGSM:
- MNIST: 3.6% vs 34.1%
- ImageNet (top-1): 26.8% vs 64.3%
FGSM vs RAND+FGSM
- An improved one-shot attack, even against non-defended models:
≈ +4% error on MNIST, ≈ +11% error on ImageNet
- Adversarial training with RAND+FGSM
– Doesn't work…
– Are we stuck with adversarial training?
What's Wrong with Adversarial Training?
- Minimize: loss(x, y) + loss(x + ε · sign(grad), y)
This is small if:
- 1. The model is actually robust
- 2. Or, the gradient points in a direction that is not adversarial (a degenerate minimum)
Ensemble Adversarial Training
- How do we avoid these degenerate minima?
[Diagram: the training loss also includes adversarial examples crafted against pre-trained, static models, not only against the model being trained. A sketch follows below.]
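A simplified sketch of one ensemble-adversarial-training step under the earlier assumptions, reusing the `fgsm` helper: the source model for each batch is drawn at random from the current model plus a set of pre-trained static models. The exact source-model schedule, loss weighting, and batching in the talk may differ.

```python
import random
import torch.nn.functional as F

def ensemble_adv_training_step(model, optimizer, x, y, eps, static_models):
    """One training step where the adversarial examples come from a randomly
    chosen source model (the model itself or a pre-trained static model)."""
    source = random.choice([model] + list(static_models))
    x_adv = fgsm(source, x, y, eps)  # FGSM sketch from above
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```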
Results
MNIST (standard CNN), error rate (%) on clean data / white-box FGSM attack / black-box FGSM attack:
- Adv. Training: 0.7 / 3.8 / 15.5
- Ensemble Adv. Training: 0.7 / 6.0 / 3.9
The source model for the black-box attack was not used during training. Fewer white-box FGSM samples are seen during training.
Results
ImageNet (Inception v3, Inception ResNet v2), top-1 error rate (%) on clean data / white-box FGSM attack / black-box FGSM attack:
- Adv. Training: 22.0 / 26.8 / 36.5
- Ensemble Adv. Training: 23.6 / 30.0 / 30.4
- Ensemble Adv. Training (ResNet): 20.2 / 25.9 / 24.6
What about stronger attacks?
- Little to no improvement on white-box iterative and RAND+FGSM attacks!
- But, improvements in the black-box setting!
Black-Box Attacks on MNIST, error rate (%) under FGSM / Carlini-Wagner / I-FGSM / RAND+FGSM:
- Adv. Training: 15.5 / 15.2 / 13.5 / 9.5
- Ensemble Adv. Training: 3.9 / 7.0 / 6.2 / 2.9
What about stronger attacks?
Black-Box Attacks on ImageNet, top-1 error rate (%) under FGSM / RAND+FGSM:
- Adv. Training: 36.5 / 30.8
- Ensemble Adv. Training: 30.4 / 29.9
- Ensemble Adv. Training (ResNet): 24.6 / 25.0
Practical Considerations for Ensemble Adversarial Training
- Pre-compute gradients for the pre-trained models
– Lower per-batch cost than with adversarial training! (see the sketch below)
- Randomize the source model in each batch
– If we simply rotate through the source models and the number of batches is a multiple of the number of models, we see the same adversarial examples in each epoch
- Convergence is slower
Standard Inception v3: ~150 epochs; Adversarial training: ~190 epochs; Ensemble adversarial training: ~280 epochs
Maybe because the task is actually hard?…
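A hypothetical sketch of the pre-computation idea: because the static models never change during training, the sign of their input gradients can be computed once per example and cached, so building their adversarial examples later only needs an addition. The cache keyed by batch index assumes a deterministic, non-shuffled loader; everything here is illustrative.

```python
import torch
import torch.nn.functional as F

def precompute_signed_gradients(static_models, loader):
    """Cache sign(grad of loss w.r.t. input) for each static model and each
    batch, so static-model FGSM examples can later be formed as x + eps * sign."""
    cache = {}
    for batch_idx, (x, y) in enumerate(loader):
        signs = []
        for m in static_models:
            x_req = x.clone().detach().requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(m(x_req), y), x_req)[0]
            signs.append(grad.sign())
        cache[batch_idx] = signs
    return cache
```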
Takeaways
- Test defenses on black-box attacks
– Distillation (Papernot et al. 2016; attack by Carlini et al. 2016)
– Biologically Inspired Networks (Nayebi & Ganguli 2017; attack by Brendel & Bethge 2017)
– Adversarial Training, and probably many others…
- Ensemble Adversarial Training can improve robustness to black-box attacks

“If you don't know where to go, just move at random.”
— Morgan Freeman — (or Dan Boneh)
Open Problems
- Better black-box attacks?
– Using an ensemble of source models? (Liu et al. 2017)
– How much does oracle access to the model help?
- More efficient ensemble adversarial training?
- Can we say anything formal (and useful) about adversarial examples?