SLIDE 1

Kevin Roth*, Yannic Kilcher*, Thomas Hofmann (ETH Zürich)

poster #62

SLIDE 2

Log-Odds & Adversarial Examples

SLIDE 3

Log-Odds & Adversarial Examples

Adversarial examples cause atypically large feature-space perturbations along the weight-difference direction
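This claim can be checked directly on a trained network. Below is a minimal sketch, assuming a model whose logits are a linear head over features, logits = W · φ(x): it measures the cosine between the feature-space perturbation φ(x_adv) − φ(x) and the weight-difference direction w_z − w_y*. The toy two-layer network and the single FGSM step are illustrative stand-ins, not the paper's experimental setup.

```python
# Sketch: how strongly does the feature-space perturbation of an adversarial
# example align with the weight-difference direction w_z - w_y*?
# Toy model and single FGSM step are illustrative stand-ins.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, H, C = 32, 64, 10                          # input dim, feature dim, classes

phi = torch.nn.Sequential(torch.nn.Linear(D, H), torch.nn.ReLU())  # feature map
head = torch.nn.Linear(H, C)                  # linear head: logits = head(phi(x))

x = torch.randn(1, D)
y = torch.tensor([3])                         # assumed true class y*

# One FGSM step as a stand-in adversarial perturbation.
x_req = x.clone().requires_grad_(True)
F.cross_entropy(head(phi(x_req)), y).backward()
x_adv = (x_req + 0.5 * x_req.grad.sign()).detach()

with torch.no_grad():
    dphi = (phi(x_adv) - phi(x)).squeeze(0)   # feature-space perturbation
    logits_adv = head(phi(x_adv))
    logits_adv[0, y.item()] = float("-inf")   # strongest competing class z != y*
    z = logits_adv.argmax(dim=-1).item()
    w_diff = head.weight[z] - head.weight[y.item()]  # weight-difference direction
    cos = F.cosine_similarity(dphi, w_diff, dim=0)
    print(f"competing class z = {z}, cosine alignment = {cos.item():.3f}")
```

On real networks the slide's claim is that this cosine is atypically large for adversarial perturbations compared to natural or random perturbations of the same size.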

SLIDES 4-8

[Figure build-up, "Adversarial Cone": decision landscape around a natural input x*; the adversarial example x_adv sits inside a cone where P_y*(·) = 0, while P_y*(·) = 1 around x* and along random directions away from it]

SLIDE 9

[Figure: "Adversarial Cone", final build]

Adversarial examples are embedded in a cone-like structure
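One way to visualize this structure is to evaluate P_y*(·) on the plane spanned by the adversarial direction and an orthogonal random direction. A minimal sketch, where `model`, `x_star`, `x_adv`, and `y_star` are assumed given; the toy usage at the end is illustrative only:

```python
# Sketch: probe P_y*(.) on the plane spanned by the adversarial direction
# (x_adv - x*) and an orthogonal random direction, revealing the cone-like
# region where P_y* drops from ~1 to ~0. `model` etc. are assumed inputs.
import torch
import torch.nn.functional as F

def probe_plane(model, x_star, x_adv, y_star, extent=2.0, steps=9):
    u = (x_adv - x_star).flatten()
    u = u / u.norm()                          # unit adversarial direction
    v = torch.randn_like(u)
    v = v - (v @ u) * u                       # random direction, orthogonal to u
    v = v / v.norm()

    ts = torch.linspace(-extent, extent, steps)
    grid = torch.zeros(steps, steps)          # grid[i, j] = P_y*(x* + a u + b v)
    with torch.no_grad():
        for i, a in enumerate(ts):
            for j, b in enumerate(ts):
                xp = x_star + (a * u + b * v).view_as(x_star)
                grid[i, j] = F.softmax(model(xp), dim=-1)[0, y_star]
    return grid

# Toy usage with a random linear model (illustrative only):
model = torch.nn.Linear(32, 10)
x_star = torch.randn(1, 32)
x_adv = x_star + 0.5 * torch.randn_like(x_star)
print(probe_plane(model, x_star, x_adv, y_star=3))
```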

SLIDES 10-12

Noise as a probing instrument

[Figure: "Adversarial Cone" probed by evaluating the softmax along x_adv + t · noise]
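A sketch of this probing, assuming `model`, an adversarial input `x_adv`, and the true label `y_star` are given; Gaussian noise and the scale grid are illustrative assumptions. As t grows, the averaged softmax probability of the true class should recover if x_adv sits in a narrow cone:

```python
# Sketch: probe the softmax along x_adv + t * noise and track how the average
# probability of the true class y* recovers as the noise scale t grows.
# `model`, `x_adv`, `y_star` are assumed given; Gaussian noise is an assumption.
import torch
import torch.nn.functional as F

def probe_noise_ray(model, x_adv, y_star, t_max=2.0, steps=11, n_noise=32):
    """Return the average P_y*(x_adv + t * eta) for each noise scale t."""
    recovery = []
    with torch.no_grad():
        for t in torch.linspace(0.0, t_max, steps):
            eta = torch.randn(n_noise, *x_adv.shape[1:])   # batch of noise draws
            p = F.softmax(model(x_adv + t * eta), dim=-1)
            recovery.append(p[:, y_star].mean().item())
    return recovery
```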

SLIDE 13

Main Idea: Log-Odds Robustness

The robustness properties of the pairwise log-odds under noise differ depending on whether the input is natural or adversarial:

the noise-induced change in log-odds tends to have a characteristic direction if x is adversarial, whereas it tends not to have a specific direction if x is natural

SLIDE 14

Main Idea: Log-Odds Robustness

[Figure: distributions of noise-perturbed log-odds for natural vs. adversarial examples]

Noise can partially undo the effect of the adversarial perturbation and directionally revert the log-odds towards the true class y*
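In code, the central quantity is the noise-perturbed pairwise log-odds, averaged over noise draws. A minimal sketch, writing f_{y,z}(x) = logit_z(x) − logit_y(x) for the pairwise log-odds; the Gaussian noise source and its scale are assumptions:

```python
# Sketch: noise-perturbed pairwise log-odds, averaged over noise draws:
#   f_{y,z}(x)      = logit_z(x) - logit_y(x)
#   g_{y,z}(x, eta) = f_{y,z}(x + eta) - f_{y,z}(x)
# For adversarial x the expectation of g_{y,z} points back toward the true
# class; for natural x it has no preferred direction.
import torch

def perturbed_log_odds(model, x, y, n_noise=64, sigma=0.1):
    """Return E_eta[g_{y,z}(x, eta)] for every class z, as a (C,) tensor."""
    with torch.no_grad():
        logits = model(x)                              # shape (1, C)
        f = logits - logits[:, y:y + 1]                # f_{y,z}(x) for all z
        eta = sigma * torch.randn(n_noise, *x.shape[1:])
        logits_n = model(x + eta)                      # shape (n_noise, C)
        f_n = logits_n - logits_n[:, y:y + 1]          # f_{y,z}(x + eta)
        return (f_n - f).mean(dim=0)                   # expectation over noise
```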

SLIDE 15

Statistical Test & Corrected Classification

We propose to use noise-perturbed pairwise log-odds g_{y,z}(x) = f_{y,z}(x + η) − f_{y,z}(x), with f_{y,z} the logit difference between classes z and y, to test whether x, classified as y, should be thought of as a manipulated example of true class z:

adversarial if the standardized expected log-odds satisfy ḡ_{y,z}(x) ≥ τ_{y,z} for some z ≠ y, with thresholds τ_{y,z} calibrated on natural data

Corrected classification: ŷ(x) = argmax_z { ḡ_{y,z}(x) − τ_{y,z} }
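A sketch of the resulting test and correction rule, reusing `perturbed_log_odds` from the previous sketch. The per-class statistics `mu`, `sigma` and thresholds `tau` are assumed to have been estimated from noise-perturbed log-odds of held-out natural examples; that calibration step is not shown:

```python
# Sketch of the test and the corrected classifier. Reuses perturbed_log_odds
# from the previous sketch. mu, sigma, tau are (C, C) tensors assumed to be
# calibrated on held-out natural examples (calibration not shown).
import torch

def test_and_correct(model, x, mu, sigma, tau, n_noise=64, noise_scale=0.1):
    y = model(x).argmax(dim=-1).item()            # predicted class
    g = perturbed_log_odds(model, x, y, n_noise, noise_scale)
    g_bar = (g - mu[y]) / sigma[y]                # standardized log-odds
    score = g_bar - tau[y]
    score[y] = float("-inf")                      # only consider z != y
    if score.max() >= 0:                          # flagged as adversarial
        return score.argmax().item(), True        # corrected class, detected
    return y, False                               # keep original prediction
```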

SLIDE 16

Detection Rates & Corrected Classification

  • Our statistical test detects nearly all adversarial examples with FPR ~1%
  • Our correction method reclassifies almost all adversarial examples successfully
  • Drop in performance on clean samples is negligible
SLIDE 17

Detection Rates & Corrected Classification

Detection rate increases with increasing attack strength. Corrected classification compensates for the decay in uncorrected accuracy as attack strength increases.

[Figure: detection rate and corrected accuracy vs. attack strength ε]

SLIDE 18

Defending against Defense-Aware Attacks

  • Attacker has full knowledge of the defense: crafts perturbations that work in expectation under the noise source used for detection (see the sketch below)
  • Detection rates and corrected accuracies remain remarkably high
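A sketch of such a defense-aware attack: projected gradient ascent on the classification loss taken in expectation over the detector's noise source (expectation-over-transformation style). The ε-ball, step size, iteration count, and Gaussian noise model are illustrative assumptions; `y` is the true label as a shape-(1,) tensor:

```python
# Sketch: defense-aware PGD that ascends the loss in expectation over the
# detector's noise source (EOT-style). eps, step size, iteration count, and
# the Gaussian noise model are illustrative assumptions.
import torch
import torch.nn.functional as F

def noise_aware_pgd(model, x, y, eps=0.1, step=0.02, iters=40,
                    n_noise=16, sigma=0.1):
    x_adv = x.clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        eta = sigma * torch.randn(n_noise, *x.shape[1:])
        logits = model(x_adv + eta)                        # evaluate under noise
        loss = F.cross_entropy(logits, y.expand(n_noise))  # expected-loss estimate
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()             # ascend expected loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # project to eps-ball
    return x_adv.detach()
```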

SLIDE 19

Kevin Roth, Yannic Kilcher, Thomas Hofmann

Thank You

Follow-Up Work: "Adversarial Training Generalizes Data-dependent Spectral Norm Regularization", poster #62

ICML Workshop on Generalization (June 14)

SLIDE 20

SLIDE 21

The approaches most related to our work are those that detect whether or not the input has been perturbed, either by detecting characteristic regularities in the adversarial perturbations themselves or in the network activations they induce.

References

  • Grosse, Kathrin, et al. "On the (statistical) detection of adversarial examples." (2017).
  • Metzen, Jan Hendrik, et al. "On detecting adversarial perturbations." (2017).
  • Feinman, Reuben, et al. "Detecting adversarial samples from artifacts." (2017).
  • Xu, Weilin, David Evans, and Yanjun Qi. "Feature squeezing: Detecting adversarial examples in deep neural networks." (2017).
  • Song, Yang, et al. "Pixeldefend: Leveraging generative models to understand and defend against adversarial examples." (2017).
  • Carlini, Nicholas, and David Wagner. "Adversarial examples are not easily detected: Bypassing ten detection methods." (2017).
  • … and many more