

slide-1
SLIDE 1

Hang Su

suhangss@tsinghua.edu.cn Institute for Artificial Intelligence

  • Dept. of Computer Science & Technology

Tsinghua University

Adversarial Attacks and Defenses in Deep Learning

1

slide-2
SLIDE 2

2

Background

⚫ Artificial intelligence (AI) is a transformative technology that holds promise for tremendous societal and economic benefit, and has achieved dramatic success in a torrent of applications.
⚫ AI has the potential to revolutionize how we live, work, learn, discover, and communicate.

slide-3
SLIDE 3

3

⚫ AI — The Revolution Hasn’t Happened Yet.

  • --Michael Jordan

⚫ The effectiveness of AI algorithms will be limited by the machine's inability to explain its decisions and actions to human users.
⚫ Several machine learning models, including neural networks, consistently misclassify adversarial examples.

AI is NOT Trustworthy

[Figure: images classified with high confidence as Alps (94.39%), Dog (99.99%), Puffer (97.99%), and Crab (100.00%)]

slide-4
SLIDE 4

4

Content

⚫ Understandable: traceability, explainability, and communication
⚫ Adversarially Robust: resilience to attack and security

[Diagram: Trustworthy AI = Understandable + (Adversarially) Robust]

slide-5
SLIDE 5

5

Robustness

⚫ A crucial component of achieving Trustworthy AI is technical robustness
⚫ Technical robustness requires that AI systems be developed with a preventative approach to risks

  • Y. Dong et al., Boosting Adversarial Attacks with Momentum, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018.
  • F. Liao et al., Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018.

[Figure: images classified with high confidence as Alps (94.39%), Puffer (97.99%), Dog (99.99%), and Crab (100.00%)]

slide-6
SLIDE 6

6

The world can be adversarial

⚫ We need to demystify black-box models and develop more transparent and interpretable models to make them more trustworthy and robust
➢ DNNs can be easily duped by adversarial examples crafted by adding small, human-imperceptible noises
➢ This may pose severe risks for numerous applications

[Figures: Sharif, Bhagavatula, Bauer & Reiter (2016); adversarial attack on a social network (Dai et al., ICML 2018)]

slide-7
SLIDE 7

7

Is ML inherently not reliable?

⚫ No: but we need to re-think how we do ML
⚫ Adversarial aspects = stress-testing our solutions
⚫ Towards adversarially robust models

[Figure: a "pig" image classified as "pig" (91%); adding 0.005 × perturbation makes it classified as "airliner" (99%)]

slide-8
SLIDE 8

8

A Limitation of the ML Framework

⚫ Measure of performance: the fraction of mistakes made during testing
⚫ But in reality, the distributions we apply ML to are NOT the ones we train it on

[Diagram: the training distribution vs. the inference-time distribution]

slide-9
SLIDE 9

9

Adversary-aware Machine Learning

⚫ Machine learning systems should be aware of the arms race with the adversary

➢ Know your adversary
➢ Be proactive
➢ Protect your classifier

[Diagram: the system designer's proactive loop: model adversary → simulate attack → evaluate the attack's impact → develop countermeasure]

slide-10
SLIDE 10

10

Adversarial Attack Scenarios

⚫ White-box attack (WBA): access to any information about the target classifier, including predictions, gradient information, etc.
⚫ Practical black-box attack (PBA): only the prediction of the target classifier is available. When the prediction confidence is accessible: PBA-C; when only the discrete label is available: PBA-D.
⚫ Restricted black-box attack (RBA): black-box queries are allowed only on some samples, and the attacker must craft adversarial perturbations for the other samples.

WBA > PBA-C > PBA-D > RBA

slide-11
SLIDE 11

11

White-box attacks

⚫ Target classifier: f(x) = sign(g(x)) = +1 (malicious) or −1 (legitimate)
[Diagram: a sample x is perturbed across the decision boundary g(x) = 0 to an adversarial sample x′]
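To make the white-box setting concrete, here is a minimal one-step (FGSM-style) sketch in PyTorch; the `model` handle, the ε budget, and the [0, 1] input range are assumptions for illustration, not part of the original slides.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """One-step white-box attack: perturb x along the sign of the loss gradient.

    model, x (a batch of images in [0, 1]) and y (true labels) are placeholders.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)          # white-box access: we can backprop
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()                # step that increases the loss
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```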

slide-12
SLIDE 12

12

Black-box attacks (transferability)

⚫ Cross-model transferability (Liu et al., 2017)
⚫ Cross-data transferability (Moosavi-Dezfooli et al., 2017)

slide-13
SLIDE 13

13

Limitations of black-box attacks

⚫ The trade-off between transferability and attack ability makes black-box attacks less effective.

[Plot: attack success rate (%) vs. number of iterations for I-FGSM, evaluated against Inc-v3, Inc-v4, IncRes-v2, and Res-152]

  • Attack Inception V3;
  • Evaluate the success rates of the attacks on Inception V3, Inception V4, Inception ResNet V2, and ResNet v2-152;
  • ϵ = 16;
  • 1,000 images from ImageNet.
slide-14
SLIDE 14

Momentum iterative FGSM [CVPR18]

Dong et al., Boosting Adversarial Attacks with Momentum, CVPR 2018

* winning solution at the NIPS 2017 competition
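A minimal PyTorch-style sketch of the momentum iterative update; the `model` handle, the [0, 1] pixel scale, and the default hyper-parameters are illustrative assumptions rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps=16/255, steps=10, mu=1.0):
    """Momentum Iterative FGSM: accumulate L1-normalized gradients in a
    velocity term, then take sign steps inside the L_inf ball of radius eps."""
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # momentum accumulation with the gradient normalized by its L1 norm
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```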

slide-15
SLIDE 15

15

Experimental Results

⚫ Settings: ϵ = 16, µ = 1.0, 10 iterations
➢ MI-FGSM attacks a white-box model with near 100% success rate
➢ It fools black-box models with much higher success rates

slide-16
SLIDE 16

Query-based Black-box Attacks

⚫ Transfer-based

❖ Generate adversarial examples against white-box models and leverage transferability for attacks;
❖ Require no knowledge of the target model and no queries;
❖ Need white-box models (datasets);

⚫ Score-based

❖ The target model provides the output probability distribution;
❖ Black-box optimization by gradient estimation methods;
❖ Impractical in some real-world applications;

⚫ Decision-based

❖ The target model only provides hard-label predictions;
❖ Practical in real-world applications;
❖ Need a large number of queries

slide-17
SLIDE 17

17

Score-based Attacks

⚫ Query the loss function f(x, y) of the black-box model for a given input x
⚫ Goal: maximize f(x, y) until the attack succeeds
⚫ Estimate the gradient ∇f(x, y) from queries, and apply first-order optimization methods
➢ In the ordinary RGF (random gradient-free) method, each direction uᵢ is sampled uniformly from the D-dimensional unit hypersphere:
ĝ = (1/q) · Σᵢ₌₁^q ĝᵢ,  where  ĝᵢ = [f(x + σuᵢ, y) − f(x, y)] / σ · uᵢ
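A small NumPy sketch of this estimator; the black-box loss `f` is a placeholder callable that returns a scalar, and q and σ are illustrative values.

```python
import numpy as np

def rgf_gradient(f, x, q=50, sigma=1e-4):
    """Random Gradient-Free estimate: average finite differences of the
    black-box loss f along q random directions on the unit hypersphere."""
    g_hat = np.zeros_like(x)
    f_x = f(x)                          # one query at the current point
    for _ in range(q):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)          # uniform direction on the hypersphere
        g_hat += (f(x + sigma * u) - f_x) / sigma * u
    return g_hat / q
```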

slide-18
SLIDE 18

18

Gradient estimation framework

⚫ Our objective for a gradient estimator ĝ: L(ĝ) = min_{c≥0} E‖∇f(x, y) − c·ĝ‖²₂
⚫ i.e., the minimized mean squared error w.r.t. the scale coefficient c
➢ Usually the normalized gradient is used, hence the norm (scale) of ĝ does not matter

slide-19
SLIDE 19

19

Prior-guided RGF (P-RGF) method

⚫ Use the normalized transfer gradient v (with ‖v‖₂ = 1) of a surrogate model as a prior
⚫ The gradient estimator samples directions as uᵢ = √µ · v + √(1 − µ) · (I − vvᵀ)ξᵢ, where ξᵢ is sampled uniformly from the unit hypersphere
⚫ Incorporate the data prior to accelerate the gradient estimation
⚫ In the limit, the estimation loss becomes
lim_{σ→0} L(ĝ) = ‖∇f(x, y)‖²₂ − (∇f(x, y)ᵀ D ∇f(x, y))² / [(1 − 1/q) · ∇f(x, y)ᵀD²∇f(x, y) + (1/q) · ∇f(x, y)ᵀD∇f(x, y)],
where D = E[uᵢuᵢᵀ].
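A NumPy sketch of drawing one prior-guided direction; the names (`v` for the normalized transfer gradient, `lam` for the weight µ) are mine, and the construction is only a plausible reading of the formula above.

```python
import numpy as np

def prgf_direction(v, lam):
    """One P-RGF search direction: mix the transfer-gradient prior v with a
    random direction orthogonal to it, weighted by lam in [0, 1]."""
    xi = np.random.randn(*v.shape)
    xi /= np.linalg.norm(xi)
    ortho = xi - np.dot(v.ravel(), xi.ravel()) * v   # remove the v-component
    ortho /= np.linalg.norm(ortho)
    return np.sqrt(lam) * v + np.sqrt(1.0 - lam) * ortho
```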

slide-20
SLIDE 20

20

Performance of gradient estimation

⚫ Cosine similarity (averaged over all images) between the gradient estimate and the true gradient w.r.t. attack iterations
⚫ The transfer gradient is more useful at the beginning and less useful later
➢ This shows the advantage of using an adaptive µ*

slide-21
SLIDE 21

21

Results on defensive models

⚫ ASR: attack success rate (with #queries under 10,000); AVG. Q: average #queries over successful attacks.
⚫ Methods with the subscript "D" refer to the data-dependent version of the P-RGF method.
slide-22
SLIDE 22

Query-based Black-box Attacks

⚫ Transfer-based

❖ Generate adversarial examples against white-box models and leverage transferability for attacks;
❖ Require no knowledge of the target model and no queries;
❖ Need white-box models (datasets);

⚫ Score-based

❖ The target model provides the output probability distribution;
❖ Black-box optimization by gradient estimation methods;
❖ Impractical in some real-world applications;

⚫ Decision-based

❖ The target model only provides hard-label predictions;
❖ Practical in real-world applications;
❖ Need a large number of queries

slide-23
SLIDE 23

Query-based Adversarial Attack

23

⚫ We search for an adversarial example by modeling the local geometry of the search directions and reducing the dimension of the search space.

[Figure: a "black-box" model; original images and adversarial examples after 1,000, 10,000, and 100,000 queries]

slide-24
SLIDE 24

Objective Function

⚫ Constrained optimization problem:
argmin_{x*} d(x*, x),  s.t. C(f(x*)) = 1
❖ d(⋅,⋅) is a distance metric; C(⋅) is an adversarial criterion (C(f(x)) = 0 for the original input).
⚫ A reformulation:
argmin_{x*} L(x*) = d(x*, x) + δ(C(f(x*)) = 1), where δ keeps the search inside the adversarial region
[Diagram: the adversarial vs. non-adversarial regions around the original image]
⚫ Implement black-box gradient estimation using a query-based local search

slide-25
SLIDE 25

Evolutionary Attack

⚫ (1+1) covariance matrix adaptation evolution strategy

❖ Initialize x̃* (already adversarial)
❖ For t = 1, 2, …, T do
➢ Sample z ~ N(0, σ²C)
➢ If L(x̃* + z) < L(x̃*): set x̃* = x̃* + z
➢ Update(σ, C)
❖ Return x̃*

⚫ Model the local geometry of the search directions
⚫ Reduce the dimension of the search space (a minimal sketch of the loop is given below)
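A minimal NumPy sketch of the (1+1) loop above; the objective `L`, the already-adversarial starting point, and the fixed step size are simplified placeholders, and the σ/covariance adaptation of the full method (next slide) is only hinted at in a comment.

```python
import numpy as np

def evolutionary_attack(L, x_adv0, steps=10000, sigma=0.01):
    """(1+1) evolution strategy: propose a Gaussian mutation and keep it only
    if it lowers the objective L (distance to the original image while
    staying adversarial)."""
    x_adv = x_adv0.copy()
    c = np.ones_like(x_adv)   # diagonal covariance of the search distribution
    for _ in range(steps):
        z = sigma * np.sqrt(c) * np.random.randn(*x_adv.shape)
        if L(x_adv + z) < L(x_adv):
            x_adv = x_adv + z
            # a full implementation would adapt sigma and c here (next slide)
    return x_adv
```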

slide-26
SLIDE 26

Covariance Matrix Adaptation

⚫ The storage and computation complexity of a full covariance matrix is at least O(n²)
⚫ We use a diagonal covariance matrix instead
⚫ Update rule:
p_c = (1 − c_c) · p_c + √(c_c(2 − c_c)) · z/σ
c_ii = (1 − c_cov) · c_ii + c_cov · (p_c)ᵢ²

slide-27
SLIDE 27

Experimental Result

27

[Figure: dodging and impersonation attacks; adversarial face images and their distortions after 0, 100, 1,000, 2,000, 10,000, and 100,000 queries (distortions ranging from 1.3e-1 down to 2.4e-5)]

[Figure: original pairs and the results of the Evolutionary, Boundary, and Optimization attacks]
⚫ Attack on a face recognition API; could be useful for privacy protection!

slide-28
SLIDE 28

Experimental Result

[Figures: results on face verification and face identification]

slide-29
SLIDE 29

From reactive to proactive

⚫ Machine learning itself can be the weakest link in the security chain, since the models work in a black-box manner

Reactive (adversary vs. designer):
  • 1. Analyze system
  • 2. Devise attack
  • 3. Analyze attack
  • 4. Develop countermeasure
Proactive (designer vs. designer):
  • 1. Model adversary
  • 2. Simulate attack
  • 3. Evaluate attack
  • 4. Develop countermeasure

slide-30
SLIDE 30

30

How to Defend Adversarial Attacks?

⚫ Possible strategy one: correctly classify adversarial examples
➢ Optimal
➢ Difficult to achieve
➢ Computationally expensive (adversarial training)
⚫ Possible strategy two: detect and filter out adversarial examples
➢ Suboptimal
➢ Little computation
➢ Methods borrowed from anomaly detection
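For strategy one, a minimal PyTorch-style sketch of a single adversarial-training step; the model/optimizer handles, the one-step FGSM-style perturbation, and ε are illustrative assumptions, not a specific recipe from the slides.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    """One training step on adversarially perturbed inputs (strategy one)."""
    # craft a one-step perturbation of the current batch on the fly
    x_pert = x.clone().detach().requires_grad_(True)
    adv_loss = F.cross_entropy(model(x_pert), y)
    grad = torch.autograd.grad(adv_loss, x_pert)[0]
    x_adv = torch.clamp(x + eps * grad.sign(), 0.0, 1.0).detach()

    # standard supervised update, but on the adversarial batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```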

slide-31
SLIDE 31

31

Why is machine learning so vulnerable?

⚫ Learning algorithms tend to overemphasize some features to discriminate among classes
⚫ Large sensitivity to changes of such input features
⚫ Different classifiers tend to find the same set of relevant features
➢ That is why attacks can transfer across models!


slide-32
SLIDE 32

32

Indistinguishable?

slide-33
SLIDE 33

33

Basic idea

⚫ Denoise the image

➢ Without denoising: x + Δx → y′ (wrong prediction)
➢ With denoising: x + Δx → x̂ → y (denoise first, then classify)

⚫ Denoise method

➢ Traditional denoisers (median filter, BM3D)
➢ Denoising by a neural network

slide-34
SLIDE 34

34

Denoise effect of different methods

⚫ Unexpected:

➢ A convolutional autoencoder is very poor at reconstructing large images
➢ Although most of the noise is removed, the accuracy does not increase

slide-35
SLIDE 35

35

Error amplification effect

[Diagram: a clean image and an adversarial image propagated through the network; the residual noise on the adversarial image is amplified layer by layer]

slide-36
SLIDE 36

36

Denoiser in Feature Space [CVPR2018]

[Diagrams: the Pixel Guided Denoiser (PGD) is trained with an L1 loss between the denoised image and the clean image, while the High-level representation Guided Denoiser (HGD) is trained with an L1 loss between high-level CNN features (Feat1/Feat2) of the denoised and clean images; at test time the denoiser is simply prepended to the CNN]
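A sketch of the two training objectives, assuming a `denoiser` network, a frozen target CNN, and a `features` function that returns one of its high-level activations; all of these names are placeholders.

```python
import torch

def pgd_loss(denoiser, x_adv, x_clean):
    """Pixel Guided Denoiser: L1 distance in pixel space."""
    return torch.abs(denoiser(x_adv) - x_clean).mean()

def hgd_loss(denoiser, features, x_adv, x_clean):
    """High-level representation Guided Denoiser: L1 distance between
    high-level CNN features of the denoised and the clean image, which
    counters the error amplification seen with pixel-level supervision."""
    return torch.abs(features(denoiser(x_adv)) - features(x_clean)).mean()
```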

slide-37
SLIDE 37

37

Transferability of HGD

⚫ Train on 750 classes, test on the other 250 classes
⚫ Borrow the HGD from another model

Liao et al., Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser, CVPR 2018
* winning solution at the NIPS 2017 competition

slide-38
SLIDE 38

MM-LDA Network [ICML2018]

38

Observation 1: A feed-forward network consists of a non-linear transformation from the input x to a feature z, and a softmax classifier acting on z.
Observation 2: Efron (1975) shows that if the input is distributed as a mixture of Gaussians, then linear discriminant analysis (LDA) is more efficient than logistic regression (LR).
What we do: Explicitly model the feature distribution as a mixture of Gaussians, whose means are calculated by our algorithm to separate the classes the most. Our method provides theoretical guarantees on robustness.

Max-Mahalanobis Linear Discriminant Analysis Networks [Pang et al., ICML 2018]

slide-39
SLIDE 39

39

Experimental Results

slide-40
SLIDE 40

RCE Defense [NIPS2018]

40

Observation: Most existing detection methods focus on designing new metrics to detect adversarial examples.
What we do: We instead design a new training method that makes the network better match existing detection metrics. We propose the reverse cross-entropy (RCE) training method, which maps normal inputs to low-dimensional manifolds in the feature space. This helps the detector distinguish adversarial examples from normal ones more easily.

Towards Robust Detection of Adversarial Examples [Pang et al., NeurIPS 2018]

slide-41
SLIDE 41

41

Reverse Cross Entropy

[Figure: feature visualizations of the ten CIFAR-10 classes (plane, car, bird, cat, deer, dog, frog, horse, ship, truck) under CE and RCE training]
Cross-Entropy (CE): L_CE = −1_y · log F(x), with the one-hot label 1_y, e.g. {0, 0, 0, 1, 0, 0, 0, 0, 0, 0}
Reverse Cross-Entropy (RCE): L_RCE = −R_y · log F(x), with the reverse label R_y, e.g. {1/9, 1/9, 1/9, 0, 1/9, 1/9, 1/9, 1/9, 1/9, 1/9}
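A PyTorch-style sketch of the two objectives for K classes; `logits` and the integer labels `y` are assumed inputs, and this is a reading of the formulas above rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def ce_loss(logits, y):
    """Standard cross-entropy: -1_y . log F(x)."""
    return F.cross_entropy(logits, y)

def rce_loss(logits, y):
    """Reverse cross-entropy: -R_y . log F(x), where R_y puts 0 on the true
    class and 1/(K-1) on every other class."""
    k = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    reverse = torch.full_like(log_probs, 1.0 / (k - 1))
    reverse.scatter_(1, y.unsqueeze(1), 0.0)     # zero out the true class
    return -(reverse * log_probs).sum(dim=1).mean()
```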

slide-42
SLIDE 42

42

The Insights of RCE Training [NIPS2018]

[Figure: a 2-d feature space with decision boundaries and isolines of non-ME = t; the original input z0 and adversarial inputs z1, z2 are marked]
When the non-ME of the returned predictions is maximized, the learned features for each class tend to locate near the black dashed lines, where the points have the maximal non-ME. The left plot shows the decision domains in the 2-d feature space for 3 classes (one color per class).

slide-43
SLIDE 43

43

[Figure and caption: the same 2-d feature-space illustration as on the previous slide]

The Insights of RCE Training

slide-44
SLIDE 44

44

[Figure: the same 2-d feature-space illustration, with original input z0 and adversarial inputs z1, z2]
Then, if an adversary wants to craft an adversarial example based on z0, he has to move further to z2 rather than z1 to obtain a normal value of non-ME.

The Insights of RCE Training

slide-45
SLIDE 45

RCE Defense

45

slide-46
SLIDE 46

Adversarial Defense

46

⚫ Defense methods are shown to be robust to transferable adversarial examples in the black-box setting
⚫ The defenses sometimes rely on different discriminative regions for prediction

[Figure: discriminative regions of Inception v3, Inception ResNet v2, ResNet 152, and the defended models of Tramer et al. (2018), Liao et al. (2018), Xie et al. (2018), and Guo et al. (2018)]

slide-47
SLIDE 47

Adversarial Attacks on Models with Defense

47

⚫ Optimize an adversarial example over an ensemble of translated inputs:
max_{x^adv} Σ_{i,j} w_{ij} · J(T_{ij}(x^adv), y),  s.t. ‖x^adv − x^real‖_∞ ≤ ε
❖ T_{ij} is the translation operation that shifts an image by i and j pixels along the two dimensions, i.e., T_{ij}(x)_{a,b} = x_{a−i, b−j};
⚫ High computational complexity

slide-48
SLIDE 48

Translation-invariance of Adversarial Attacks

48

⚫ Assumption: ∇_x J(x, y)|_{x=T_{ij}(x̂)} ≈ ∇_x J(x, y)|_{x=x̂}
⚫ Loss gradient:
∇_x Σ_{i,j} w_{ij} J(T_{ij}(x), y)|_{x=x̂} ≈ W ∗ ∇_x J(x, y)|_{x=x̂}
⚫ Kernel matrix W:
❖ A uniform kernel: W_{ij} = 1 / (2k+1)²;
❖ A linear kernel: W̃_{ij} = (1 − |i|/(k+1)) · (1 − |j|/(k+1)),  W_{ij} = W̃_{ij} / Σ W̃;
❖ A Gaussian kernel: W̃_{ij} = (1 / 2πσ²) · exp(−(i² + j²) / 2σ²),  W_{ij} = W̃_{ij} / Σ W̃

slide-49
SLIDE 49

Translation-invariance of Adversarial Attacks

49

⚫ TI-FGSM:
x^adv = x^real + ε · sign(W ∗ ∇_x J(x^real, y))
⚫ TI-BIM:
x^adv_{t+1} = x^adv_t + α · sign(W ∗ ∇_x J(x^adv_t, y))
⚫ TI-MI-FGSM:
g_{t+1} = µ · g_t + (W ∗ ∇_x J(x^adv_t, y)) / ‖W ∗ ∇_x J(x^adv_t, y)‖₁,
x^adv_{t+1} = x^adv_t + α · sign(g_{t+1})

The proposed method can be integrated into any gradient-based attack.
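A PyTorch sketch of the one-step (TI-FGSM) variant with a Gaussian kernel; the kernel size, σ, and the `model` handle are illustrative, and the same smoothed gradient can be plugged into the BIM or MI-FGSM updates above.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma=3.0):
    """Normalized 2-D Gaussian kernel, replicated as a depthwise filter
    for 3 colour channels."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    k = k / k.sum()
    return k.repeat(3, 1, 1, 1)                  # shape [3, 1, size, size]

def ti_fgsm(model, x, y, eps=16/255, kernel=None):
    """Translation-Invariant FGSM: smooth the gradient with the kernel
    (approximating an attack on an ensemble of translated inputs)."""
    if kernel is None:
        kernel = gaussian_kernel()
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    grad = F.conv2d(grad, kernel.to(grad.device),
                    padding=kernel.size(-1) // 2, groups=3)
    return torch.clamp(x.detach() + eps * grad.sign(), 0.0, 1.0)
```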

slide-50
SLIDE 50

Experiments

50

[Figure: a raw image and the corresponding adversarial examples crafted by FGSM and TI-FGSM]

  • Dong et al., Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks, CVPR 2019 (oral)
slide-51
SLIDE 51

51

Why we need understanding?

⚫ The highest activity a human being can attain is learning for understanding, because to understand is to be free. ——Baruch Spinoza

slide-52
SLIDE 52

52

Critical Data Routing Paths

⚫ Critical nodes: important channels of the layers' outputs
➢ If they were set to 0, the performance would deteriorate
⚫ Idea: identify the critical data routing paths (CDRPs) in the network for each given input

  • Y. Wang et al., "Interpret Neural Networks by Identifying Critical Data Routing Paths", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018.

slide-53
SLIDE 53

53

Method

◼ Channel-wise control gates
➢ Inspired by model pruning
➢ Associate a scalar value with each output channel of a layer
➢ A group of control gates is multiplied channel-wise to the k-th layer's output
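To make the mechanism concrete, a small PyTorch sketch of a channel-wise gate applied to a feature map of shape [N, C, H, W]; the class and parameter names are mine, not the paper's.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """One non-negative scalar gate per output channel of a layer; the gated
    output is lambda_k * feat_k, so driving a gate to 0 ablates that channel
    (a critical node is one whose ablation hurts the prediction)."""
    def __init__(self, num_channels):
        super().__init__()
        self.gates = nn.Parameter(torch.ones(num_channels))

    def forward(self, feat):                      # feat: [N, C, H, W]
        return feat * self.gates.clamp(min=0.0).view(1, -1, 1, 1)
```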

slide-54
SLIDE 54

54

Problem Formulation

◼ Learning control gates by Distillation Guided Pruning (DGP)
➢ Inspired by knowledge distillation [Hinton15]
➢ All the control gates are optimized so that the gated network reproduces the prediction of the pretrained network (measured by a cross-entropy loss)

slide-55
SLIDE 55

55

Experiments

⚫ Functional process of intra-layer routing nodes

➢ All the individual critical nodes in a certain layer compose the intra-layer routing nodes
➢ The t-SNE method is used to display the optimized control gates

slide-56
SLIDE 56

56

Experiments

⚫ Semantic Concepts Emerge in CDRPs

➢ An individual critical node acts as a dummy object/part detector
➢ Select the top-k images whose control gate values on this node rank highest
slide-57
SLIDE 57

57

Experiments

⚫ Ablation study (VGG-16)
➢ Partially deactivate the critical nodes on the identified CDRPs in the full model
➢ Deactivate the critical nodes with larger control gate values (Top Mode) or smaller values (Bottom Mode)

30% drop for 1% deactivated

slide-58
SLIDE 58

Adversarial Samples Detection

⚫ CDRPs diverge between real and adversarial images

slide-59
SLIDE 59

Adversarial sample detection

⚫ We randomly sample 1/5/10 images per class from the ImageNet training set and 1 image per class from the validation set as training and test samples
⚫ Each sample is used to generate an adversarial sample by iterative FGSM

slide-60
SLIDE 60

60

Timeline of Learning Security

[Timeline: from Adversarial ML to the Security of DNNs]
  • 2004–2005: pioneering work (Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005)
  • 2006–2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, The Security of Machine Learning
  • 2013: Srndic & Laskov, NDSS, claim nonlinear classifiers are secure
  • 2013–2014: Biggio et al., ECML, IEEE TKDE, high-confidence & black-box evasion attacks show the vulnerability of nonlinear classifiers
  • 2014: Srndic & Laskov, IEEE S&P, show the vulnerability of nonlinear classifiers with the ECML '13 gradient-based attack
  • 2014: Szegedy et al., ICLR, adversarial examples vs. deep learning
  • 2016: Papernot et al., IEEE S&P, evasion attacks / adversarial examples
  • 2017: Papernot et al., ASIACCS, black-box evasion attacks
  • 2017: Carlini & Wagner, IEEE S&P, high-confidence evasion attacks
  • 2017: Grosse et al., ESORICS, application to Android malware
  • 2017: Demontis et al., IEEE TDSC, secure learning for Android malware

[Diagram: the attacks / defenses cycle]
  • Attacks: single-step attacks [Goodfellow et al., 2014]; multiple-step attacks [Kurakin et al., 2014]; optimization attacks [Carlini and Wagner, 2017]; adaptive attacks [Athalye et al., 2018]
  • Defenses: FGSM-based adversarial training [Kurakin et al., 2015]; defensive distillation [Papernot et al., 2016]; randomness, denoising [Xie et al., 2018; Liao et al., 2018]

slide-61
SLIDE 61

61

The Support of GPU

⚫ Experiments were conducted on an NVIDIA DGX-1 server with 8 Tesla P100 GPUs.
⚫ We establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness.

slide-62
SLIDE 62

62

When will AI become TRUE?

⚫ TRUE AI

➢ Trustworthy: the system should be trusted by developers, deployers, and end-users across the AI system's life cycle
➢ Robust: resilience to attack and security, fall-back plan and general safety, accuracy, reliability, and reproducibility
➢ Understandable: traceability, explainability, and communication
➢ Ethical: ensuring adherence to ethical principles and values, e.g., respect for human autonomy, prevention of harm, fairness, explicability

slide-63
SLIDE 63

63

Conclusion

⚫ We are still a long way from the AI revolution.
⚫ The third-generation AI — Trustworthy AI.

[Diagram: the evolution toward the post-deep-learning era]
  • Logic reasoning: expert systems
  • Statistical learning: logistic regression, SVM
  • Deep learning: CNN, RNN, DRL, GAN
  • AI with understanding: trustworthy, explainable, knowledge embedding, safety, ……