SLIDE 1

Efficient Defenses Against Adversarial Examples for Deep Neural Networks

Valentina Zantedeschi (@vzantedesc), Irina Nicolae (@ririnicolae), Ambrish Rawat (@ambrishrawat)

Jean Monnet University · IBM Research AI

GreHack #5

November 17, 2017

SLIDE 2

Security and Machine Learning

So far...

  • Machine learning for security
    • Intrusion detection¹
    • Malware analysis²

This talk is about

  • Security for machine learning

¹Buczak & Guven, A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. IEEE Communications Surveys & Tutorials, 2015.

²Gandotra et al., Malware Analysis and Classification: A Survey. Journal of Information Security, 5, 56–64, 2014.

SLIDE 3

Machine Learning and Adversarial Examples

SLIDE 4

Machine Learning

[Figure: machine-learning workflow. Training: inputs (e.g. pictures) and expected outputs (e.g. class ids) are used to fit a prediction model. Prediction: the trained model maps a new picture to the output "Bird".]

SLIDE 5

Adversarial Examples

[Figure: an image classified "giant panda" (84% confidence) + crafted adversarial noise = an image classified "capuchin" (67% confidence).]

  • Perturb model inputs with crafted noise
  • Model fails to recognize input correctly
  • Attack undetectable by humans
  • Random noise does not work.
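
Stated slightly more formally (notation mine, not from the slides): for a classifier f and an input x, the attacker looks for a perturbation δ such that

\[
x' = x + \delta, \qquad \|\delta\|_\infty \le \varepsilon, \qquad f(x') \ne f(x),
\]

where ε is small enough that x and x′ look identical to a human; a random δ of the same magnitude almost never changes f(x).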

SLIDE 6

Practical Examples of Attacks

SLIDE 7

Self-Driving Cars

Image segmentation³

Attack noise hides pedestrians from the detection system.

³Metzen et al., Universal Adversarial Perturbations Against Semantic Image Segmentation. https://arxiv.org/abs/1704.05712.

SLIDE 8

Self-Driving Cars

Road signs⁴

Car ends up ignoring the stop sign. [Figure: true stop-sign image vs. adversarial image.]

⁴McDaniel et al., Machine Learning in Adversarial Settings. IEEE Security and Privacy, vol. 14, pp. 68-72, 2016.

SLIDE 9

Executing Voice Commands

Okay Google, text John!⁵

  • Stealthy voice commands recognized by devices
  • Humans cannot detect them.

⁵Zhang et al., DolphinAttack: Inaudible Voice Commands. ACM CCS, 2017.

SLIDE 10

Deep Learning and Adversarial Samples

SLIDE 11

Deep Neural Networks

[Figure: a deep network as a "deep magic box" mapping an input (e.g. a picture) to an output (e.g. a class id).]

SLIDE 12

Deep Neural Networks

[Figure: the same box opened up as a stack of interconnected layers, mapping an input (e.g. a picture) to an output (e.g. a class id).]

  • Interconnected layers propagate the information forward.
  • Model learns weights for each neuron.
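
A toy illustration of such a stack of layers (PyTorch assumed; the sizes are arbitrary, not the architecture used in the talk):

```python
# A tiny fully connected network: each Linear layer holds the learned weights,
# and information is propagated forward layer by layer.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Flatten(),             # picture -> vector
    nn.Linear(28 * 28, 128),  # learned weights for the first layer of neurons
    nn.ReLU(),
    nn.Linear(128, 10),       # one score per class
)
scores = net(torch.rand(1, 1, 28, 28))  # dummy input picture
print(scores.argmax(dim=1))             # predicted class id
```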

SLIDE 13

Deep Neural Networks

[Figure: forward pass on the true example; the network outputs "giant panda" with 84% confidence.]

  • Specific neurons light up depending on the input.
  • Cumulative effect of activation moves forward in the layers.

SLIDE 14

Deep Neural Networks

[Figure: forward pass on the adversarial example; the network outputs "capuchin" with 67% confidence.]

Small variations in the input → large changes in the output.
+ Enhanced discriminative capacities
– Opens the door to adversarial examples

SLIDE 15

Decision Boundary of the Model

The learned model slightly differs from the true data distribution...

SLIDE 16

The Space of Adversarial Examples

... which makes room for adversarial examples.

SLIDE 17

Attack: Use the Adversarial Directions

  • Most attacks try to move inputs across the boundary.
  • Attacking with a random distortion doesn't work well in practice.

SLIDE 18

Finding Adversarial Examples

Given x, find x′ where

  • x and x′ are close
  • output(x′) ≠ output(x)

Approximations of the original problem

  • FGSM [1]: quick, rough, fixed budget (sketched below)
  • Random + FGSM [2]: random step, then FGSM
  • DeepFool [3]: finds minimal perturbations
  • JSMA [4]: modifies the most salient pixels
  • C&W [5]: strongest to date
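
As an illustration of the table's simplest entry, a minimal FGSM sketch (PyTorch assumed, eps illustrative; not the authors' implementation):

```python
# FGSM sketch: one signed-gradient step of size eps, then clip to valid pixels.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """Craft x' = x + eps * sign(grad_x loss(model(x), y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```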

SLIDE 19

Defense: Adversarial Training

  • Adapt the classifier to attack directions by including adversarial data at training.

SLIDE 20

Defense: Adversarial Training

  • Adapt the classifier to attack directions by including adversarial data at training (see the training-step sketch below).

  • But there are always new adversarial samples to be crafted.
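
A minimal sketch of what "including adversarial data at training" can look like (PyTorch assumed, reusing the fgsm() sketch from slide 18; not the original training code):

```python
# One adversarial-training step: craft FGSM versions of the batch and
# train on the clean and adversarial examples together.
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.1):
    x_adv = fgsm(model, x, y, eps)   # fgsm() as sketched on slide 18
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

The defense still only covers the attack directions seen during training, which is why new adversarial samples keep slipping through.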

SLIDE 21

Defenses

Defense   Type                 Description
AT        data augmentation    train also with adv. examples
VAT       data augmentation    train also with virtual adv. examples
FS        preprocessing        squeeze input domain
LS        preprocessing        smooth target outputs

  • Adversarial Training (AT) [1]
  • Virtual Adversarial Training (VAT) [6]
  • Feature Squeezing (FS) [7]
  • Label Smoothing (LS) [8]

SLIDE 22

Contribution: Effective Defenses Against Adversarial Samples

SLIDE 23

Gaussian Data Augmentation (GDA)

Gaussian noise does not work for attacks, but does it work as a defense?

  • Reinforce neighborhoods around points using random noise.
  • For each input image, generate N versions by adding Gaussian noise to the pixels (see the sketch below).

  • Train the model on the original data and the noisy inputs.
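
A minimal GDA sketch (NumPy, pixel values assumed in [0, 1]; the values of N and σ are illustrative, not the paper's exact settings):

```python
# Gaussian Data Augmentation: add n_copies Gaussian-noised versions of each
# image to the training set, keeping the original labels.
import numpy as np

def gaussian_augment(x, y, n_copies=10, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    noisy = [np.clip(x + rng.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
             for _ in range(n_copies)]
    x_aug = np.concatenate([x] + noisy, axis=0)
    y_aug = np.concatenate([y] * (n_copies + 1), axis=0)
    return x_aug, y_aug
```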

SLIDE 24

Bounding the Activation Function

Objective: limit the cumulative effect of errors in the layers.

ReLU: f(x) = 0 for x < 0, and f(x) = x for x ≥ 0.

SLIDE 25

Bounding the Activation Function

Objective: limit the cumulative effect of errors in the layers.

ReLU: f(x) = 0 for x < 0, and f(x) = x for x ≥ 0.

Bounded ReLU (BReLU): f_t(x) = 0 for x < 0; f_t(x) = x for 0 ≤ x < t; f_t(x) = t for x ≥ t.
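
A drop-in BReLU sketch (PyTorch assumed; the default threshold t is illustrative):

```python
# Bounded ReLU: clip activations to [0, t] so errors cannot grow without
# bound as they propagate through the layers.
import torch
import torch.nn as nn

class BReLU(nn.Module):
    def __init__(self, t=1.0):
        super().__init__()
        self.t = t

    def forward(self, x):
        return torch.clamp(x, min=0.0, max=self.t)
```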

SLIDE 26

Comparison with Other Defenses

Defense                 Training                   Prediction
Feature Squeezing       preproc. input             preproc. input, perf. loss
Label Smoothing         preproc. output            –
Adversarial Training    train + attack + retrain   –
GDA + BRELU             add noise                  –

Advantages of GDA + BRELU

  • Defense agnostic to the attack strategy
  • Model performance on original inputs is preserved
  • Performs better than other defenses on adversarial samples
  • Almost no overhead for training and prediction.

SLIDE 27

Experiments

SLIDE 28

Setup

  • MNIST dataset of handwritten digits
    • 60,000 training + 10,000 test images
  • CIFAR-10 dataset of 32 × 32 RGB images
    • 50,000 training + 10,000 test images
    • 10 categories
  • Convolutional neural network (CNN) architecture

SLIDE 29

Setup

Threat model

  • Black-box: attacker has access to inputs and outputs
  • White-box: attacker also has access to model parameters

Steps

  • Train model with different defenses
  • Generate attack images
  • Compute defense performance on attack images
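
The three steps above, sketched as a small evaluation helper (PyTorch assumed; attack is any crafting function, e.g. the fgsm() sketch from slide 18; hypothetical helper, not the authors' code):

```python
import torch

def evaluate_defense(model, x_test, y_test, attack, eps=0.1):
    # Step 2: generate attack images against the (defended) model.
    x_adv = attack(model, x_test, y_test, eps)
    # Step 3: accuracy on the attack images measures the defense.
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return (preds == y_test).float().mean().item()
```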

SLIDE 30

Minimal Perturbation

Amount of perturbation necessary to fool the model

[Figure: adversarial images crafted with FGSM, DeepFool, and JSMA, without defenses vs. with our defenses.]

With GDA + BRELU, the perturbation necessary for an attack becomes visually detectable.

SLIDE 31

White-Box Attacks

Comparison of different defenses against white-box attacks

[Figure: accuracy on CIFAR-10 for each defense under (a) the FGSM attack and (b) the Random + FGSM attack.]

Accuracy = % of correct predictions = (TP + TN) / (TP + TN + FP + FN)

SLIDE 32

Black-Box Attacks

Comparison of different defenses against black-box attacks

Defense \ Attack       FGSM    Rand+FGSM   DeepFool   JSMA    C&W
CNN (no defense)       94.46   40.70       92.95      97.95   93.10
Feature squeezing      96.31   91.09       96.68      97.48   96.75
Label smoothing        86.79   20.28       84.58      95.86   84.81
FGSM adv. training     91.86   49.77       85.91      98.62   97.71
VAT                    97.53   74.35       96.03      98.26   96.11
GDA + RELU             98.47   80.25       97.84      98.96   97.87
GDA + BRELU            98.08   75.50       98.00      98.88   98.03

Attacks transferred from ResNet to CNN on MNIST.
Accuracy = % of correct predictions = (TP + TN) / (TP + TN + FP + FN)

SLIDE 33

Demo

SLIDE 34

Conclusion

SLIDE 35

Conclusion

Our contribution

  • Improved defense against multiple types of attacks
  • Model performance for clean inputs is preserved
  • No retraining, no overhead for prediction
  • Easy to integrate into models.

Takeaway

  • The problem of adversarial examples needs to be solved before applying machine learning.

nemesis

  • Our library of attacks and defenses
  • Soon to be open source.

Full paper at https://arxiv.org/pdf/1707.06728.pdf

SLIDE 36

References I

[1] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.

[2] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

[3] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015. URL http://arxiv.org/abs/1511.04599.

[4] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015. URL http://arxiv.org/abs/1511.07528.

[5] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017. URL https://arxiv.org/abs/1608.04644.

SLIDE 37

References II

[6] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976, 2017.

[7] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017. URL http://arxiv.org/abs/1704.01155.

[8] David Warde-Farley and Ian Goodfellow. Adversarial perturbations of deep neural networks. In Tamir Hazan, George Papandreou, and Daniel Tarlow, editors, Perturbation, Optimization, and Statistics. 2016.
