SLIDE 1

Not all Neurons are created equal:

Towards a feature level Deep Neural Network Test Coverage Metric

Nils Wenzler - CSC2125: Topics in Software Engineering Winter 2019

SLIDE 2

Problem

DNN

SLIDE 3

Problem

DNN

Does it work? Does it really work?

SLIDE 4

Problem

DNN → "Steer left!"

SLIDE 5

Problem

DNN → "Steer right!"

SLIDE 6

Problem

DNN → "Go straight!"

SLIDE 7

Problem

DNN

Did I test it enough? Did I test it in the right way?

SLIDE 8

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 9

General Approach

Use a test coverage metric for

  • Building test suites that
  • Cover all significant behaviours of a

deep neural network

Not a proof of correctness but evidence towards correctness!

SLIDE 10

Current DNN Test Coverage Metrics

SLIDE 11

Current DNN Test Coverage Metrics

  • High research interest
  • White-box testing
  • Focused on single neurons
SLIDE 12

Current DNN Test Coverage Metrics

low_n: lowest output value of neuron n during training; high_n: highest output value of neuron n during training

SLIDE 13

Current DNN Test Coverage Metrics

[Diagram: a neuron's activation range [low_n, high_n] and the regions targeted by each metric.]

  • Neuron Coverage (activation threshold 0.2)
  • k-multisection Neuron Coverage (k = 6)
  • Neuron Boundary Coverage
  • Strong Neuron Activation Coverage
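The binning idea behind these per-neuron metrics can be sketched in a few lines. This is a minimal illustration of k-multisection Neuron Coverage, not the reference implementation from the original paper; the function name and array layout are my own, and activations outside [low_n, high_n) are simply ignored.

```python
import numpy as np

def k_multisection_coverage(activations, low, high, k=6):
    """Fraction of the k sections per neuron hit by at least one input.

    activations: array of shape (num_inputs, num_neurons)
    low, high:   per-neuron min/max output seen during training,
                 each of shape (num_neurons,)
    """
    num_neurons = activations.shape[1]
    hit = np.zeros((num_neurons, k), dtype=bool)
    # Map each activation into one of k equal-width sections of [low, high).
    span = np.maximum(high - low, 1e-12)          # guard against zero-width ranges
    bins = np.floor((activations - low) / span * k).astype(int)
    in_range = (bins >= 0) & (bins < k)           # out-of-range activations ignored
    for n in range(num_neurons):
        hit[n, bins[in_range[:, n], n]] = True
    return hit.sum() / (num_neurons * k)
```

Neuron Coverage is essentially the special case of asking whether any activation exceeds a fixed threshold (0.2 above) instead of binning the training range.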

SLIDE 14

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 15

Yet another metric?

[Chart: number of neurons per layer in AlexNet.] The small final layers make up less than 1‰ of the total coverage metric!

SLIDE 16

Not all Neurons are created equal

Current metrics put equal emphasis on each neuron, but:

Is a first layer neuron as important as an output layer neuron?

Make use of domain-specific knowledge about layer architectures!

SLIDE 17

[Diagram: the activation range [low_n, high_n] again, now including Bin Coverage.]

  • Neuron Coverage (activation threshold 0.2)
  • k-multisection Neuron Coverage (k = 6)
  • Neuron Boundary Coverage
  • Strong Neuron Activation Coverage
  • Bin Coverage (number of bins dependent on the layer)

SLIDE 18

α-Bin Coverage

Equally distribute so-called bins across the layers, so that each layer contributes approximately the same share to the coverage metric.

[Diagram: activation range [low_n, high_n] comparing k-multisection Neuron Coverage (k = 6) with Bin Coverage, whose number of bins depends on the layer.]

SLIDE 19

α-Bin Coverage

Let L_i denote the number of neurons in layer i. Let L_max be the maximum of all L_i. Let α ∈ (0, ∞]. The minimum number of bins per layer for α-Bin Coverage is defined as

    Bins = L_max ⋅ α

The number of bins per neuron in layer i is defined as

    k_i = Bins / L_i
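Assuming both quantities are rounded up to whole bins (the slide does not state the rounding), the definition can be sketched as:

```python
import math

def bins_per_layer(layer_sizes, alpha):
    """Number of bins per neuron, k_i, for each layer.

    layer_sizes: L_i, the number of neurons in each layer
    alpha:       scaling factor in (0, inf]
    """
    l_max = max(layer_sizes)
    bins = l_max * alpha                      # minimum number of bins per layer
    # Small layers get many bins per neuron, large layers few,
    # so every layer contributes roughly the same share to the metric.
    return [math.ceil(bins / l) for l in layer_sizes]
```

For example, with layers of 100, 10, and 1 neurons and α = 0.5, this yields k_i = [1, 5, 50]: the single-neuron output layer alone gets 50 bins, matching the 50-bin share of the other layers.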

SLIDE 20

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 21

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 22

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 23

Practically feasible?

Test setup (1/2):

  • 10-layer DNN inspired by the NVIDIA End-to-End approach, using ReLU
  • Trained on 45,500 publicly available labeled images
  • Implemented in Python using TensorFlow
SLIDE 24

Practically feasible?

Test setup (2/2):

  • Created a greedy optimizer that uses image transforms to optimize the coverage metric
  • Compared the behaviour of α-Bin Coverage and Neuron Coverage

SLIDE 25

Performance

Determining low_n and high_n only needs to be done once and can be approximated through random sampling. Calculating α-Bin Coverage incrementally takes constant time per input (the constant depends on the network size).

  • 1. Determine low_n and high_n
  • 2. Select a random image
  • 3. Greedy search over transforms
  • 4. Add transforms to the image
  • 5. Evaluate coverage
  • 6. Add the image to the test suite
  • 7. Iterate on transforms
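The loop described on this slide can be sketched roughly as follows. `coverage_gain` and `commit` are hypothetical hooks standing in for the real α-Bin Coverage bookkeeping, and the stopping criteria are my own assumptions, not taken from the talk.

```python
import random

def build_test_suite(images, coverage_gain, commit, transforms,
                     rounds=200, min_gain=1e-3):
    """Greedy test-suite construction.

    coverage_gain(img) -> increase in the coverage metric if img were added
    commit(img)        -> record img's bins as covered
    """
    suite = []
    for _ in range(rounds):
        img = random.choice(images)              # select a random image
        # Iterate on transforms: keep stacking whichever one helps.
        improved = True
        while improved:
            improved = False
            for t in transforms:
                candidate = t(img)
                if coverage_gain(candidate) > coverage_gain(img):
                    img, improved = candidate, True
        if coverage_gain(img) > min_gain:        # significant increase only
            commit(img)
            suite.append(img)
    return suite
```

Hill climbing like this can stall in local optima, which is one reason "why greedy search?" appears as a discussion point later in the deck.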

SLIDE 26

Greedy search: Transforms

Transformations: Translation, Brightness, Contrast, Blur

SLIDE 27

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 28

Greedy Optimization: Bin Coverage

SLIDE 29

Greedy Optimization: Bin Coverage

ReLU activations: Neuron Boundary Coverage is practically limited to 50%, since ReLU outputs are never negative, so the boundary region below low_n is effectively unreachable.

SLIDE 30

Greedy Optimization: Bin Coverage

Obtain 74% 0.05-Bin Coverage with ~220 images

SLIDE 31

Greedy Optimization: Neuron Coverage

SLIDE 32

Neuron Coverage Optimization: Layer View

SLIDE 33

Neuron Coverage Optimization: Layer View

Output layer is "fully tested" for an image with a steering angle > 11.5°

SLIDE 34

Bin Coverage Optimization: Layer View

SLIDE 35

Bin Coverage Optimization: Layer View

Output layer is "fully tested" only after testing 3656 images, which correspond to 0.2° steps from -360° to +360°

SLIDE 36

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 37

Deviation from target labels in test suite

Example (transformed image): Output 234°, Target 160°

SLIDE 38

Conclusions

  • Current DNN test coverage metrics treat all neurons equally
  • This introduces an intrinsic focus on the neurons of the lower layers in modern architectures
  • α-Bin Coverage is a practically feasible approach to distributing a test coverage metric equally over all layers
  • First evidence shows that α-Bin Coverage can be used for finding erroneous behaviours and creating test suites automatically

SLIDE 39

Let's discuss!

Some points to consider:

  • Only one model in evaluation
  • Limited number of test runs
  • Only one domain
  • Why greedy search?
  • What is this strange α value? Why do we need it?
  • How about classification tasks?
SLIDE 40

Greedy search

Stack transformations on randomly selected images to optimize the coverage metric. Add an image to the test suite if it significantly increases the coverage metric. Transformations: Translation, Brightness, Contrast, Blur
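The four transformations could be implemented roughly as follows for grayscale float images in [0, 1]. The parameter defaults are arbitrary, and the box blur is a simplistic stand-in for whatever filter was actually used in the evaluation.

```python
import numpy as np

def translate(img, dx=5, dy=0):
    """Shift a grayscale image by (dy, dx) pixels, filling exposed pixels with 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def brightness(img, delta=0.1):
    """Additive brightness change, clipped back into [0, 1]."""
    return np.clip(img + delta, 0.0, 1.0)

def contrast(img, factor=1.2):
    """Scale pixel values away from (or towards) mid-grey."""
    return np.clip((img - 0.5) * factor + 0.5, 0.0, 1.0)

def blur(img, k=3):
    """Naive box blur along both axes (a stand-in for a proper Gaussian blur)."""
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode='same')
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode='same')
```

Because each transform returns a new array of the same shape, they can be stacked in any order by the greedy optimizer.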