Hang Su
suhangss@tsinghua.edu.cn
Institute for Artificial Intelligence, Dept. of Computer Science & Technology
Tsinghua University
Adversarial Attacks and Defenses in Deep Learning
Background
[Figure: adversarial examples: an image of Alps (94.39% confidence) perturbed to be classified as Dog (99.99%); a Puffer (97.99%) perturbed to Crab (100.00%)]
Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018.
Physical-world adversarial attack on face recognition [Sharif, Bhagavatula, Bauer & Reiter, 2016]; adversarial attack on social networks [Dai et al., ICML 2018]
[Diagram: the machine-learning pipeline, with attack surfaces at both the training stage and the inference stage]
[Diagram: the proactive security cycle for the system designer: model the adversary → simulate the attack → evaluate the attack's impact → develop a countermeasure]
$$f(x) = \operatorname{sign}(g(x)) = \begin{cases} +1 \ (\text{malicious}), & \text{if } g(x) \ge 0 \\ -1 \ (\text{legitimate}), & \text{otherwise} \end{cases}$$
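As a toy illustration (not from the slides), a minimal Python sketch of this decision rule with a hypothetical linear discriminant g; the weights and bias are made up for the example:

```python
# Minimal sketch of f(x) = sign(g(x)) with an assumed linear discriminant.
import numpy as np

w = np.array([0.8, -0.4, 1.2])   # hypothetical learned weights
b = -0.5                          # hypothetical bias

def g(x):
    return np.dot(w, x) + b       # discriminant function g(x)

def f(x):
    # +1 -> malicious when g(x) >= 0, otherwise -1 -> legitimate
    return 1 if g(x) >= 0 else -1

print(f(np.array([1.0, 0.2, 0.1])))   # +1 (malicious)
print(f(np.array([0.0, 1.0, 0.0])))   # -1 (legitimate)
```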
[Plot: attack success rate (%) vs. number of iterations (1 to 10) for I-FGSM against Inc-v3, Inc-v4, IncRes-v2, and Res-152 (Inception v3, Inception v4, Inception ResNet v2, ResNet v2-152)]
* winning solution at the NIPS 2017 competition
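For reference, a minimal PyTorch sketch of the iterative attack behind these curves; `model`, `x`, `y`, and all hyperparameters are illustrative assumptions, and setting `mu > 0` gives the momentum variant (MI-FGSM) rather than plain I-FGSM:

```python
# A hedged sketch of I-FGSM / MI-FGSM; not the authors' exact implementation.
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps=16/255, alpha=2/255, steps=10, mu=0.0):
    """mu = 0 gives plain I-FGSM; mu = 1 gives the momentum variant."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # accumulate a momentum of per-sample-normalized gradients
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()
        # project back into the eps-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```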
Transfer-based attacks:
❖ Generate adversarial examples against white-box models, and leverage transferability for attacks;
❖ Require no knowledge of the target model, no queries;
❖ Need white-box models (datasets).
Score-based attacks:
❖ The target model provides the output probability distribution;
❖ Black-box optimization by gradient-estimation methods;
❖ Impractical in some real-world applications.
Decision-based attacks:
❖ The target model only provides hard-label predictions;
❖ Practical in real-world applications;
❖ Need a large number of queries.
Gradient estimation by random sampling over $q$ directions $u_j$:
$$\hat{g} = \frac{1}{q} \sum_{j=1}^{q} \frac{f(x + \sigma u_j,\, y) - f(x,\, y)}{\sigma}\, u_j$$
The quality of the estimator is measured by its expected distance to the true gradient under the optimal scaling:
$$\hat{L} = \min_{b \ge 0}\; \mathbb{E}\left\| \nabla f(x) - b\,\hat{g} \right\|_2^2$$
$$\lim_{\sigma \to 0} \hat{L} = \|\nabla f(x)\|_2^2 - \frac{\big(\nabla f(x)^\top \mathbf{C}\, \nabla f(x)\big)^2}{\big(1-\frac{1}{q}\big)\, \nabla f(x)^\top \mathbf{C}^2\, \nabla f(x) + \frac{1}{q}\, \nabla f(x)^\top \mathbf{C}\, \nabla f(x)},$$
where $\mathbf{C} = \mathbb{E}[u_j u_j^\top]$.
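A compact sketch of the gradient estimator above, assuming only query access to a scalar loss `loss_fn` of the target model (the function name and default values are assumptions):

```python
# Random gradient-free (finite-difference) estimation, as in the formula above.
import torch

def rgf_gradient(loss_fn, x, q=50, sigma=1e-4):
    """Estimate the gradient of loss_fn at x from q black-box queries."""
    base = loss_fn(x)
    g_hat = torch.zeros_like(x)
    for _ in range(q):
        u = torch.randn_like(x)
        u = u / u.norm()                     # random unit direction u_j
        g_hat += (loss_fn(x + sigma * u) - base) / sigma * u
    return g_hat / q
```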
Decision-based attacks (recap):
❖ The target model only provides hard-label predictions;
❖ Practical in real-world applications;
❖ Need a large number of queries.
[Figure: decision-based attack against a black-box face-verification model: the original image pair is verified False, while the adversarial images after 1,000, 10,000, and 100,000 queries are all verified True]
The attack is formulated as a constrained optimization problem over the adversarial image $x^*$:
$$\min_{x^*} D(x, x^*), \quad \text{s.t. } C(f(x^*)) = 0$$
❖ $D(\cdot,\cdot)$ is a distance metric; $C(\cdot)$ is an adversarial criterion ($C(f(x^*)) = 0$ when the attack goal is met).
[Figure: the optimization walks along the boundary of the non-adversarial region toward the original image]
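A heavily simplified sketch in the spirit of this formulation, not the exact evolutionary algorithm of the slides: it assumes only a hard-label oracle `is_adversarial` (i.e., whether $C(f(x)) = 0$ holds) and performs a random walk that drifts toward the original image while staying adversarial; all names and step sizes are illustrative:

```python
# Decision-based random-walk sketch: shrink D(x, x*) under hard-label queries.
import torch

def decision_attack(is_adversarial, x, x_start, steps=10000,
                    sigma=0.01, step=0.01):
    x_adv = x_start.clone()          # any adversarial starting point
    for _ in range(steps):
        # random perturbation plus a small drift toward the original image
        candidate = x_adv + sigma * torch.randn_like(x) + step * (x - x_adv)
        candidate = candidate.clamp(0, 1)
        if is_adversarial(candidate):    # query the hard-label model
            x_adv = candidate            # accept only if still adversarial
    return x_adv
```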
[Figure: dodging and impersonation attacks on face recognition via evolutionary and boundary optimization: starting from the original pair, the distortion shrinks as the query budget grows from 0 to 100,000 queries (dodging: from 1.3e-1 down to 2.4e-5; impersonation: from 6.1e-2 down to 9.9e-6)]
[Diagram: the arms race between adversary and designer: from reactive to proactive defense]
[Diagram: defense by input denoising: a denoiser D maps the adversarial image (Adv Im) to a denoised image (Den Im) before it is fed to the network]
[Diagram: Pixel Guided Denoiser (PGD) vs. High-level Representation Guided Denoiser (HGD): PGD trains D with an L1 loss between the denoised and clean images in pixel space, while HGD matches their high-level CNN features (Feat1 vs. Feat2); at test time the trained denoiser is prepended to the CNN]
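A minimal sketch of the HGD training objective, under the assumptions that `denoiser` is the trainable network D and the fixed pretrained classifier exposes a `features` method returning a high-level layer's activations (both names are illustrative, not the paper's API):

```python
# HGD-style loss: match high-level features of denoised and clean images.
import torch

def hgd_loss(denoiser, cnn, x_adv, x_clean):
    x_den = denoiser(x_adv)                  # denoised image
    with torch.no_grad():
        feat_clean = cnn.features(x_clean)   # target high-level features
    feat_den = cnn.features(x_den)
    # HGD: L1 distance in feature space. A pixel-guided denoiser (PGD)
    # would instead use (x_den - x_clean).abs().mean() in pixel space.
    return (feat_den - feat_clean).abs().mean()
```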
Max-Mahalanobis Linear Discriminant Analysis Networks [Pang et al., ICML 2018]
Observation: most existing detection methods focus on designing a new metric to detect adversarial examples. What we do: we instead design a new training method that makes the network better match existing detection metrics. We propose the reverse cross-entropy (RCE) training method, which maps normal inputs to low-dimensional manifolds in the feature space. This helps the detector distinguish adversarial examples from normal ones more easily.
Towards Robust Detection of Adversarial Examples [Pang et al., NeurIPS 2018]
Cross-Entropy (CE) trains against the one-hot label $\mathbf{1}_y$; Reverse Cross-Entropy (RCE) trains against the reverse label $S_y$, which is uniform over all classes except the true one. For the ten CIFAR-10 classes (plane, car, bird, cat, deer, dog, frog, horse, ship, truck), with true class "cat":
$\mathbf{1}_y = \{0, 0, 0, 1, 0, 0, 0, 0, 0, 0\}$
$S_y = \{\tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{9}, 0, \tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{9}\}$
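A short sketch of the RCE loss: ordinary cross-entropy computed against the reverse label $S_y$ instead of the one-hot label (the function name and defaults are assumptions):

```python
# Reverse cross-entropy: CE against the reverse label S_y.
import torch
import torch.nn.functional as F

def rce_loss(logits, y, num_classes=10):
    # reverse label: 1/(K-1) on every class except the true one
    s_y = torch.full_like(logits, 1.0 / (num_classes - 1))
    s_y.scatter_(1, y.unsqueeze(1), 0.0)     # zero out the true class
    log_p = F.log_softmax(logits, dim=1)
    return -(s_y * log_p).sum(dim=1).mean()
```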
[Figure: decision domains in a 2-D feature space for 3 classes (one color per class), with the original input z0, adversarial inputs z1 and z2, the decision boundaries, and isolines of non-ME = t]
When the non-ME of the returned predictions is maximized, the learned features of each class tend to locate near the dashed lines, where the points have maximal non-ME.
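A sketch of a non-ME computation consistent with this description: the entropy of the predicted distribution restricted to the non-maximal classes after renormalization (the paper's exact definition may differ in details):

```python
# non-ME: entropy over the non-maximal classes, renormalized.
import torch

def non_me(probs, eps=1e-12):
    top = probs.argmax(dim=1, keepdim=True)
    p = probs.scatter(1, top, 0.0)                # drop the maximal class
    p = p / p.sum(dim=1, keepdim=True).clamp_min(eps)
    return -(p * (p + eps).log()).sum(dim=1)      # entropy of the rest
```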
Then, if an adversary wants to craft an adversarial example based on 𝒜𝟏, he has to move further, to 𝒜𝟑 rather than 𝒜𝟐, to evade detection by the non-ME metric.
[Table: evaluated models and defenses: Inception v3, Inception ResNet v2, ResNet-152; Tramèr et al. (2018), Liao et al. (2018), Xie et al. (2018), Guo et al. (2018)]
$T_{ij}$ is the translation operation that shifts an image by $i$ and $j$ pixels along each dimension, i.e., $T_{ij}(x)_{a,b} = x_{a-i,\,b-j}$. The translation-invariant attack maximizes a weighted loss over translated copies of the adversarial example:
$$\arg\max_{x^{adv}} \sum_{i,j} w_{ij}\, J\big(T_{ij}(x^{adv}),\, y\big), \quad \text{s.t. } \|x^{adv} - x^{real}\|_\infty \le \epsilon$$
Instead of computing gradients for all translated images, the weighted sum of gradients is approximated by convolving the gradient of the untranslated image with a kernel $\mathbf{W}$:
$$\sum_{i,j} w_{ij}\, \nabla_x J\big(T_{ij}(x), y\big)\Big|_{x=\hat{x}} \;\approx\; \mathbf{W} * \nabla_x J(x, y)\Big|_{x=\hat{x}}$$
❖ A uniform kernel: $W_{ij} = \frac{1}{(2k+1)^2}$;
❖ A linear kernel: $\widetilde{W}_{ij} = \big(1-\frac{|i|}{k+1}\big)\big(1-\frac{|j|}{k+1}\big)$, with $W_{ij} = \widetilde{W}_{ij} \big/ \sum_{i,j} \widetilde{W}_{ij}$;
❖ A Gaussian kernel: $\widetilde{W}_{ij} = \frac{1}{2\pi\sigma^2}\exp\big(-\frac{i^2+j^2}{2\sigma^2}\big)$, with $W_{ij} = \widetilde{W}_{ij} \big/ \sum_{i,j} \widetilde{W}_{ij}$.
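A small NumPy sketch for generating the normalized Gaussian kernel $\mathbf{W}$ above, with radius k and bandwidth sigma:

```python
# Normalized Gaussian kernel of size (2k+1) x (2k+1), as defined above.
import numpy as np

def gaussian_kernel(k=7, sigma=3.0):
    ax = np.arange(-k, k + 1)
    i, j = np.meshgrid(ax, ax, indexing="ij")
    w = np.exp(-(i**2 + j**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return w / w.sum()               # normalize so the weights sum to 1
```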
TI-FGSM: $x^{adv} = x^{real} + \epsilon \cdot \operatorname{sign}\big(\mathbf{W} * \nabla_x J(x^{real}, y)\big)$
TI-MI-FGSM: $g_{t+1} = \mu \cdot g_t + \dfrac{\nabla_x J(x_t^{adv}, y)}{\|\nabla_x J(x_t^{adv}, y)\|_1}, \qquad x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \operatorname{sign}\big(\mathbf{W} * g_{t+1}\big)$
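A PyTorch sketch of one TI-FGSM step following the update above: the input gradient is convolved channel-wise with the kernel $\mathbf{W}$ before the sign step. `model`, `x`, `y`, and the shape of `W` are assumptions; `W` is taken to be a [1, 1, 2k+1, 2k+1] float tensor, e.g. built from the Gaussian kernel sketch above:

```python
# One TI-FGSM step: smooth the gradient with kernel W, then take the sign.
import torch
import torch.nn.functional as F

def ti_fgsm(model, x, y, W, eps=16/255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # depthwise-convolve each channel's gradient with the same kernel W
    c = grad.size(1)
    Wc = W.expand(c, 1, *W.shape[-2:])
    grad = F.conv2d(grad, Wc, padding=W.shape[-1] // 2, groups=c)
    return (x + eps * grad.sign()).clamp(0, 1).detach()
```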
[Figure: a raw image and the adversarial examples crafted from it by FGSM and TI-FGSM]
Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018.
◼ Channel-wise control gates, inspired by model pruning: associate a scalar value with each output channel of a layer; a group of control gates is multiplied channel-wise with the k-th layer's output (see the sketch after the next item).
◼ Learning control gates by Distillation Guided Pruning (DGP), inspired by knowledge distillation [Hinton et al., 2015]: all the control gates are optimized so that the gated network's predictions match those of the pretrained network under a cross-entropy loss.
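A minimal sketch of such a gate module; the gate vector is the learnable object that DGP optimizes (the module name is an assumption for illustration):

```python
# One learnable scalar gate per output channel, applied channel-wise.
import torch

class ControlGate(torch.nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.gates = torch.nn.Parameter(torch.ones(num_channels))

    def forward(self, layer_out):
        # layer_out: [N, C, H, W]; broadcast one gate over each channel
        return layer_out * self.gates.view(1, -1, 1, 1)
```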
Performance drops by about 30% when only 1% of the channels are deactivated.
A timeline of adversarial machine learning, from the security of (shallow) machine learning to DNNs:
❖ 2004-2005: pioneering work (Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005)
❖ 2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, "The Security of Machine Learning"
❖ 2013: Šrndić & Laskov, NDSS, claim nonlinear classifiers are secure
❖ 2013-2014: Biggio et al., ECML, IEEE TKDE, high-confidence & black-box evasion attacks to show the vulnerability of nonlinear classifiers
❖ 2014: Šrndić & Laskov, IEEE S&P, show the vulnerability of nonlinear classifiers with the ECML '13 gradient-based attack
❖ 2014: Szegedy et al., ICLR, adversarial examples vs. deep learning
❖ 2016: Papernot et al., IEEE S&P, evasion attacks / adversarial examples
❖ 2017: Papernot et al., ASIACCS, black-box evasion attacks
❖ 2017: Carlini & Wagner, IEEE S&P, high-confidence evasion attacks
❖ 2017: Grosse et al., ESORICS, application to Android malware
❖ 2017: Demontis et al., IEEE TDSC, secure learning for Android malware
Attacks / Defenses Cycle:
❖ Attacks: single-step attacks [Goodfellow et al., 2014]; multiple-step attacks [Kurakin et al., 2014]; optimization attacks [Carlini & Wagner, 2017]; adaptive attacks [Athalye et al., 2018]
❖ Defenses: FGSM-based adversarial training [Kurakin et al., 2015]; defensive distillation [Papernot et al., 2016]; randomness, denoising [Xie et al., 2018; Liao et al., 2018]
[Diagram: the evolution of AI]
❖ Logic reasoning: expert systems
❖ Statistical learning: logistic regression, SVM
❖ Deep learning: CNN, RNN, DRL, GAN
❖ Post-Deep Learning Era: AI with understanding, i.e., trustworthy, explainable, knowledge embedding, safety, ...