SLIDE 1

Not all Neurons are created equal:

Towards a feature level Deep Neural Network Test Coverage Metric

Nils Wenzler - CSC2125: Topics in Software Engineering Winter 2019

SLIDE 2

Problem

DNN

SLIDE 3

Problem

DNN

Does it work? Does it really work?

SLIDE 4

Problem

DNN → "Steer left!"

SLIDE 5

Problem

DNN → "Steer right!"

SLIDE 6

Problem

DNN → "Go straight!"

SLIDE 7

Problem

DNN

Did I test it enough? Did I test it in the right way?

SLIDE 8

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 9

General Approach

Use a test coverage metric for

  • Building test suites that
  • Cover all significant behaviours of a

deep neural network

Not a proof of correctness but evidence towards correctness!

SLIDE 10

Current DNN Test Coverage Metrics

SLIDE 11

Current DNN Test Coverage Metrics

  • High research interest
  • White-box testing
  • Focused on single neurons
SLIDE 12

Current DNN Test Coverage Metrics

low_n: lowest output value of neuron n during training; high_n: highest output value of neuron n during training

SLIDE 13

Current DNN Test Coverage Metrics

[Diagram: a neuron's activation range [low_n, high_n] and the regions targeted by each metric.]

  • Neuron Coverage (activation threshold 0.2)
  • k-multisection Neuron Coverage (k = 6)
  • Neuron Boundary Coverage
  • Strong Neuron Activation Coverage
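The binning idea behind these per-neuron metrics can be sketched in a few lines. This is a minimal illustration of k-multisection Neuron Coverage, not the reference implementation from the original paper; the function name and array layout are my own, and activations outside [low_n, high_n) are simply ignored.

```python
import numpy as np

def k_multisection_coverage(activations, low, high, k=6):
    """Fraction of the k sections per neuron hit by at least one input.

    activations: array of shape (num_inputs, num_neurons)
    low, high:   per-neuron min/max output seen during training,
                 each of shape (num_neurons,)
    """
    num_neurons = activations.shape[1]
    hit = np.zeros((num_neurons, k), dtype=bool)
    # Map each activation into one of k equal-width sections of [low, high).
    span = np.maximum(high - low, 1e-12)          # guard against zero-width ranges
    bins = np.floor((activations - low) / span * k).astype(int)
    in_range = (bins >= 0) & (bins < k)           # out-of-range activations ignored
    for n in range(num_neurons):
        hit[n, bins[in_range[:, n], n]] = True
    return hit.sum() / (num_neurons * k)
```

Neuron Coverage is essentially the special case of asking whether any activation exceeds a fixed threshold (0.2 above) instead of binning the training range.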

SLIDE 14

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 15

Yet another metric?

[Chart: number of neurons per layer in AlexNet.] The small final layers make up less than 1‰ of the total coverage metric!

SLIDE 16

Not all Neurons are created equal

Current metrics put equal emphasis on each neuron, but:

Is a first layer neuron as important as an output layer neuron?

Make use of domain-specific knowledge about layer architectures!

SLIDE 17

[Diagram: the activation range [low_n, high_n] again, now including Bin Coverage.]

  • Neuron Coverage (activation threshold 0.2)
  • k-multisection Neuron Coverage (k = 6)
  • Neuron Boundary Coverage
  • Strong Neuron Activation Coverage
  • Bin Coverage (number of bins dependent on the layer)

SLIDE 18

α-Bin Coverage

Equally distribute so-called bins across the layers, so that each layer contributes approximately the same share to the coverage metric.

[Diagram: activation range [low_n, high_n] comparing k-multisection Neuron Coverage (k = 6) with Bin Coverage, whose number of bins depends on the layer.]

SLIDE 19

α-Bin Coverage

Let L_i denote the number of neurons in layer i. Let L_max be the maximum of all L_i. Let α ∈ (0, ∞]. The minimum number of bins per layer for α-Bin Coverage is defined as

    Bins = L_max ⋅ α

The number of bins per neuron in layer i is defined as

    k_i = Bins / L_i
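Assuming both quantities are rounded up to whole bins (the slide does not state the rounding), the definition can be sketched as:

```python
import math

def bins_per_layer(layer_sizes, alpha):
    """Number of bins per neuron, k_i, for each layer.

    layer_sizes: L_i, the number of neurons in each layer
    alpha:       scaling factor in (0, inf]
    """
    l_max = max(layer_sizes)
    bins = l_max * alpha                      # minimum number of bins per layer
    # Small layers get many bins per neuron, large layers few,
    # so every layer contributes roughly the same share to the metric.
    return [math.ceil(bins / l) for l in layer_sizes]
```

For example, with layers of 100, 10, and 1 neurons and α = 0.5, this yields k_i = [1, 5, 50]: the single-neuron output layer alone gets 50 bins, matching the 50-bin share of the other layers.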

SLIDE 20

Structure

  • 1. Problem
  • 2. Current DNN Test Coverage Metrics
  • 3. α-Bin Coverage
  • 4. Practical Evaluation
SLIDE 21

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 22

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 23

Practically feasible?

Test setup (1/2):

  • 10-layer DNN inspired by the NVIDIA End-to-End approach, using ReLU
  • Trained on 45,500 publicly available labeled images
  • Implemented in Python using TensorFlow
SLIDE 24

Practically feasible?

Test setup (2/2):

  • Created a greedy optimizer that uses image transforms to optimize the coverage metric
  • Compared the behaviour of α-Bin Coverage and Neuron Coverage

SLIDE 25

Performance

Determining low_n and high_n only needs to be done once and can be approximated through random sampling. Calculating α-Bin Coverage incrementally takes constant time per input (the constant depends on the network size).

  • 1. Determine low_n and high_n
  • 2. Select a random image
  • 3. Greedy search over transforms
  • 4. Add transforms to the image
  • 5. Evaluate coverage
  • 6. Add the image to the test suite
  • 7. Iterate on transforms
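The loop described on this slide can be sketched roughly as follows. `coverage_gain` and `commit` are hypothetical hooks standing in for the real α-Bin Coverage bookkeeping, and the stopping criteria are my own assumptions, not taken from the talk.

```python
import random

def build_test_suite(images, coverage_gain, commit, transforms,
                     rounds=200, min_gain=1e-3):
    """Greedy test-suite construction.

    coverage_gain(img) -> increase in the coverage metric if img were added
    commit(img)        -> record img's bins as covered
    """
    suite = []
    for _ in range(rounds):
        img = random.choice(images)              # select a random image
        # Iterate on transforms: keep stacking whichever one helps.
        improved = True
        while improved:
            improved = False
            for t in transforms:
                candidate = t(img)
                if coverage_gain(candidate) > coverage_gain(img):
                    img, improved = candidate, True
        if coverage_gain(img) > min_gain:        # significant increase only
            commit(img)
            suite.append(img)
    return suite
```

Hill climbing like this can stall in local optima, which is one reason "why greedy search?" appears as a discussion point later in the deck.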

SLIDE 26

Greedy search: Transforms

Transformations: Translation, Brightness, Contrast, Blur

SLIDE 27

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 28

Greedy Optimization: Bin Coverage

SLIDE 29

Greedy Optimization: Bin Coverage

ReLU activations: Neuron Boundary Coverage is practically limited to 50%, since ReLU outputs are never negative, so the boundary region below low_n is effectively unreachable.

SLIDE 30

Greedy Optimization: Bin Coverage

Obtain 74% 0.05-Bin Coverage with ~220 images

SLIDE 31

Greedy Optimization: Neuron Coverage

SLIDE 32

Neuron Coverage Optimization: Layer View

SLIDE 33

Neuron Coverage Optimization: Layer View

Output layer is "fully tested" for an image with a steering angle > 11.5°

SLIDE 34

Bin Coverage Optimization: Layer View

SLIDE 35

Bin Coverage Optimization: Layer View

Output layer is "fully tested" only after testing 3656 images, which correspond to 0.2° steps from -360° to +360°

SLIDE 36

Practical Evaluation

The main questions:

  • 1. Can α-Bin Coverage be implemented in a practically feasible way?
  • 2. Can α-Bin Coverage be optimized with a greedy search approach?
  • 3. How does α-Bin Coverage relate to other DNN coverage metrics?
  • 4. Can α-Bin Coverage be used to find wrong behaviours?
SLIDE 37

Deviation from target labels in test suite

Example (transformed image): Output 234°, Target 160°

SLIDE 38

Conclusions

  • Current DNN test coverage metrics treat all neurons equally
  • This introduces an intrinsic focus on the neurons of the lower layers in modern architectures
  • α-Bin Coverage is a practically feasible approach to distributing a test coverage metric equally over all layers
  • First evidence shows that α-Bin Coverage can be used for finding erroneous behaviours and creating test suites automatically

SLIDE 39

Let's discuss!

Some points to consider:

  • Only one model in evaluation
  • Limited number of test runs
  • Only one domain
  • Why greedy search?
  • What is this strange α value? Why do we need it?
  • How about classification tasks?
SLIDE 40

Greedy search

Stack transformations on randomly selected images to optimize the coverage metric. Add an image to the test suite if it significantly increases the coverage metric. Transformations: Translation, Brightness, Contrast, Blur
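The four transformations could be implemented roughly as follows for grayscale float images in [0, 1]. The parameter defaults are arbitrary, and the box blur is a simplistic stand-in for whatever filter was actually used in the evaluation.

```python
import numpy as np

def translate(img, dx=5, dy=0):
    """Shift a grayscale image by (dy, dx) pixels, filling exposed pixels with 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def brightness(img, delta=0.1):
    """Additive brightness change, clipped back into [0, 1]."""
    return np.clip(img + delta, 0.0, 1.0)

def contrast(img, factor=1.2):
    """Scale pixel values away from (or towards) mid-grey."""
    return np.clip((img - 0.5) * factor + 0.5, 0.0, 1.0)

def blur(img, k=3):
    """Naive box blur along both axes (a stand-in for a proper Gaussian blur)."""
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode='same')
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode='same')
```

Because each transform returns a new array of the same shape, they can be stacked in any order by the greedy optimizer.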