Transferable Adversarial Examples: Insights, Attacks & Defenses
June 12th, 2017
Florian Tramèr
Joint work with Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh & Patrick McDaniel

Adversarial Examples Threat Model: White-Box Attacks
[Diagram: an input image is fed to the ML model, which outputs scores over classes (bird, tree, plane); the loss compares the prediction to the ground-truth label.]
“Fast Gradient Sign Method” (FGSM)
Take the gradient of the loss:
r = ε · sign(∇ₓJ(x, y, θ))
[Diagram: the perturbed input is fed to the ML model, shifting the scores over the classes (bird, tree, plane).]
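A minimal sketch of this attack, assuming a PyTorch-style classifier that returns logits and inputs scaled to [0, 1]; the function name and signature are illustrative, not from the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-shot FGSM: perturb x by eps in the direction of the sign of the
    gradient of the loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]
```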
Hypothetical Attacks on Autonomous Vehicles
- Denial of service: confusing object
- Harm others: adversarial input recognized as “open space on the road”
- Harm self / passengers: adversarial input recognized as “navigable road”
Adversarial input: x + r, with ‖r‖∞ = ε
“Fast Gradient Sign Method” (FGSM)
r = ε · sign(∇ₓJ(x, y, θ))
Adversarial Examples Threat Model: White-Box Attacks
[Diagram: the adversarial inputs are fed to the ML model, which now predicts “plane” for each of them.]
Adversarial Examples Transfer
[Diagram: the same adversarial input also fools other, independently trained ML models.]
Adversarial Examples Threat Model: Black-Box Attacks
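A small sketch of how transferability could be measured under these assumptions: craft FGSM examples on a source model and count how often a separate target model misclassifies them. The `fgsm` helper is the sketch above; the loader and model objects are hypothetical.

```python
import torch

def transfer_error_rate(source_model, target_model, loader, eps):
    """Fraction of FGSM examples crafted against source_model that are
    misclassified by target_model (a black-box transfer attack)."""
    fooled, total = 0, 0
    for x, y in loader:
        x_adv = fgsm(source_model, x, y, eps)  # FGSM sketch from above
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total
```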
The Space of Transferable Adversarial Examples
How large is the “space” of adversarial examples?
- At least 2-dimensional
– Warde-Farley & Goodfellow 2016
– Liu et al. 2017
[Figure: “church window” plots, Warde-Farley & Goodfellow 2016]
Gradient-Aligned Subspaces
- Adversarial examples form a contiguous subspace of “high” dimensionality
– 15-45 dimensions for DNNs and CNNs on MNIST
– Intersection of adversarial subspaces is also multidimensional
Decision Boundary Similarity
[Figure: for a given direction, the distance from a data point to a model's decision boundary, and the distance between two models' boundaries along that direction.]
- Experiments with MNIST and DREBIN (malware)
– DNN, Logistic Regression, SVM
– 3 directions:
- Aligned with the gradient (adversarial example)
- In the direction of a data point of a different class
- In a random direction
- Results: in every direction, distance to boundary ≫ distance between boundaries (a measurement sketch follows below)
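A rough sketch of how the per-direction distance could be estimated, assuming a single example with a batch dimension and a simple line search; the function and its parameters are illustrative. The distance between two models' boundaries along the same direction is then just the difference between their two distances.

```python
import torch

def distance_to_boundary(model, x, y, direction, max_dist=10.0, step=0.05):
    """Walk from x along `direction` until the model's prediction changes,
    and return the distance at which that first happens (the decision
    boundary). Returns max_dist if no boundary is found in the search range."""
    d = direction / direction.norm()
    t = 0.0
    with torch.no_grad():
        while t <= max_dist:
            if model(x + t * d).argmax(dim=-1).item() != y:
                return t
            t += step
    return max_dist
```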
Models are similar “everywhere”
Open Questions
- Why this similarity?
– Data-dependent results?
– E.g., for a binary MNIST task (3s vs 7s) we prove: if F1 (a linear model) and F2 (a quadratic model) both have high accuracy, then there are adversarial examples that transfer between the two models
– These adversarial examples also transfer to DNNs and CNNs, but we can't prove this is inherent…
Transferability and Adversarial Training

Adversarial Training
[Diagram: each training input is fed to the model twice: once unmodified (labeled “bird”) and once perturbed with FGSM by taking the gradient of the model's own loss (misclassified as “plane”); the training loss combines the clean and adversarial terms. A sketch follows below.]
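A minimal sketch of one FGSM adversarial-training step under the same assumptions as before, reusing the `fgsm` helper sketched earlier; the optimizer, loss weighting, and batching details are illustrative.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps):
    """One training step that minimizes the loss on the clean batch plus
    the loss on its FGSM-perturbed copy."""
    x_adv = fgsm(model, x, y, eps)  # FGSM sketch from above
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```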
Attacks on Adversarial Training
Error rates of adversarially trained models (clean / white-box FGSM / black-box FGSM):
- MNIST: 1.0% / 3.6% / 18.2%
- ImageNet (top-1): 22.0% / 26.8% / 36.5%
The black-box adversarial examples are transferred from another (standard) model.
“Gradient Masking”
- How do we get robustness to FGSM-style attacks?
[Figure: two possibilities: a large-margin classifier vs. gradient masking.]
Loss of Adversarially Trained Model
[Figure: loss surface around a data point. Moving in the direction of the model's own gradient (white-box attack) leads to a non-adversarial example; moving in the direction of another model's gradient (black-box attack) leads to an adversarial example. A probing sketch follows below.]
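A small sketch of how such a loss-surface probe could be made, under the earlier assumptions: evaluate the loss along the sign of a chosen gradient direction (the model's own input gradient for the white-box case, another model's gradient for the black-box case). Names and defaults are illustrative.

```python
import torch
import torch.nn.functional as F

def loss_along_direction(model, x, y, direction, eps, steps=20):
    """Evaluate the model's loss at x + t * sign(direction) for t in [0, eps],
    to probe the loss surface along that direction."""
    d = direction.sign()
    with torch.no_grad():
        return [F.cross_entropy(model(x + t * d), y).item()
                for t in torch.linspace(0.0, eps, steps)]
```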
Simple One-Shot Attack: RAND+FGSM
- 1. Small random step
- 2. Step in the direction of the gradient (sketch below)
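A minimal sketch of RAND+FGSM under the same assumptions: a small random step of size alpha, followed by an FGSM step of size eps - alpha, so the total perturbation stays within the ε ball (alpha < eps is assumed; the exact step sizes used in the talk may differ).

```python
import torch
import torch.nn.functional as F

def rand_fgsm(model, x, y, eps, alpha):
    """RAND+FGSM: random step of size alpha, then an FGSM step of size
    eps - alpha from the new point."""
    x_rand = (x + alpha * torch.sign(torch.randn_like(x))).detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    grad = torch.autograd.grad(loss, x_rand)[0]
    x_adv = x_rand + (eps - alpha) * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]
```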
Error rate, FGSM vs RAND+FGSM:
- MNIST: 3.6% vs 34.1%
- ImageNet (top-1): 26.8% vs 64.3%
FGSM vs RAND+FGSM
- An improved one-shot attack, even against non-defended models:
≈ +4% error on MNIST, ≈ +11% error on ImageNet
- Adversarial training with RAND+FGSM
– Doesn't work…
– Are we stuck with adversarial training?
What's Wrong with Adversarial Training?
- Minimize: loss(x, y) + loss(x + ε · sign(grad), y)
This is small if:
- 1. The model is actually robust
- 2. Or, the gradient points in a direction that is not adversarial (a degenerate minimum)
Ensemble Adversarial Training
- How do we avoid these degenerate minima?
[Diagram: the training loss also includes adversarial examples crafted against pre-trained, static models, not only against the model being trained. A sketch follows below.]
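A simplified sketch of one ensemble-adversarial-training step under the earlier assumptions, reusing the `fgsm` helper: the source model for each batch is drawn at random from the current model plus a set of pre-trained static models. The exact source-model schedule, loss weighting, and batching in the talk may differ.

```python
import random
import torch.nn.functional as F

def ensemble_adv_training_step(model, optimizer, x, y, eps, static_models):
    """One training step where the adversarial examples come from a randomly
    chosen source model (the model itself or a pre-trained static model)."""
    source = random.choice([model] + list(static_models))
    x_adv = fgsm(source, x, y, eps)  # FGSM sketch from above
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```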
Results
MNIST (standard CNN), error rate (%) on clean data / white-box FGSM attack / black-box FGSM attack:
- Adv. Training: 0.7 / 3.8 / 15.5
- Ensemble Adv. Training: 0.7 / 6.0 / 3.9
The source model for the black-box attack was not used during training. Fewer white-box FGSM samples are seen during training.
Results
ImageNet (Inception v3, Inception ResNet v2), top-1 error rate (%) on clean data / white-box FGSM attack / black-box FGSM attack:
- Adv. Training: 22.0 / 26.8 / 36.5
- Ensemble Adv. Training: 23.6 / 30.0 / 30.4
- Ensemble Adv. Training (ResNet): 20.2 / 25.9 / 24.6
What about stronger attacks?
- Little to no improvement on white-box iterative and RAND+FGSM attacks!
- But, improvements in the black-box setting!
Black-Box Attacks on MNIST, error rate (%) under FGSM / Carlini-Wagner / I-FGSM / RAND+FGSM:
- Adv. Training: 15.5 / 15.2 / 13.5 / 9.5
- Ensemble Adv. Training: 3.9 / 7.0 / 6.2 / 2.9
What about stronger attacks?
Black-Box Attacks on ImageNet, top-1 error rate (%) under FGSM / RAND+FGSM:
- Adv. Training: 36.5 / 30.8
- Ensemble Adv. Training: 30.4 / 29.9
- Ensemble Adv. Training (ResNet): 24.6 / 25.0
Practical Considerations for Ensemble Adversarial Training
- Pre-compute gradients for the pre-trained models
– Lower per-batch cost than with adversarial training! (see the sketch below)
- Randomize the source model in each batch
– If we simply rotate through the source models and the number of batches is a multiple of the number of models, we see the same adversarial examples in each epoch
- Convergence is slower
Standard Inception v3: ~150 epochs; Adversarial training: ~190 epochs; Ensemble adversarial training: ~280 epochs
Maybe because the task is actually hard?…
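A hypothetical sketch of the pre-computation idea: because the static models never change during training, the sign of their input gradients can be computed once per example and cached, so building their adversarial examples later only needs an addition. The cache keyed by batch index assumes a deterministic, non-shuffled loader; everything here is illustrative.

```python
import torch
import torch.nn.functional as F

def precompute_signed_gradients(static_models, loader):
    """Cache sign(grad of loss w.r.t. input) for each static model and each
    batch, so static-model FGSM examples can later be formed as x + eps * sign."""
    cache = {}
    for batch_idx, (x, y) in enumerate(loader):
        signs = []
        for m in static_models:
            x_req = x.clone().detach().requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(m(x_req), y), x_req)[0]
            signs.append(grad.sign())
        cache[batch_idx] = signs
    return cache
```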
Takeaways
- Test defenses on black-box attacks
– Distillation (Papernot et al. 2016; attack by Carlini et al. 2016)
– Biologically Inspired Networks (Nayebi & Ganguli 2017; attack by Brendel & Bethge 2017)
– Adversarial Training, and probably many others…
- Ensemble Adversarial Training can improve robustness to black-box attacks

“If you don't know where to go, just move at random.”
— Morgan Freeman — (or Dan Boneh)
Open Problems
- Better black-box attacks?
– Using an ensemble of source models? (Liu et al. 2017)
– How much does oracle access to the model help?
- More efficient ensemble adversarial training?
- Can we say anything formal (and useful) about adversarial examples?