An Introduction to Neural Network Rule Extraction Algorithms (PowerPoint Presentation)



SLIDE 1

An Introduction to Neural Network Rule Extraction Algorithms

By Sarah Jackson

SLIDE 2

Can we trust magic?

✗ Neural Networks

✗ Machine learning black boxes ✗ Magical, unexplainable results

✗ Problems

✗ People won't trust Neural Networks because it is difficult for them to understand them

✗ The end result isn't always the only thing we are looking for

✗ Unacceptable risk for certain scenarios

SLIDE 3

Why do we want them then?

✗ Neural Networks have been shown to classify data accurately

✗ Neural Networks are capable of learning and classifying in ways that other machine learning techniques may not be

SLIDE 4

Who cares about rules?

✗ Rules help to bridge the gap between connectionist and symbolic methods

✗ Rule extraction from Neural Networks will increase their acceptance

✗ Rules will also improve the usefulness of data gathered from Neural Networks

SLIDE 5

What do we do with these rules?

✗ Validation
  ✗ We can tell something has been learned
✗ Integration
  ✗ Rules can be used with symbolic systems
✗ Theory discovery
  ✗ Theories that may not have been seen otherwise
✗ Explanation ability
  ✗ Allows exploration of the knowledge in the network

SLIDE 6

Are the rules good?

✗ Accuracy
  ✗ Correctly classify unseen examples
✗ Fidelity
  ✗ Same behavior as the Neural Network
✗ Consistency
  ✗ Classify unseen examples the same way
✗ Comprehensibility
  ✗ Size of the rule set and number of clauses per rule
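Accuracy and fidelity in particular are easy to make concrete: accuracy compares the rules against the true labels, fidelity compares the rules against the network itself. A minimal sketch (the labels and predictions below are hypothetical, not from the slides):

```python
# Sketch of the accuracy and fidelity criteria for an extracted rule set.
# `truth`, `net_pred`, and `rule_pred` are made-up labels for illustration.

def accuracy(pred, truth):
    """Fraction of unseen examples the rules classify correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def fidelity(rule_pred, net_pred):
    """Fraction of examples where the rules agree with the network."""
    return sum(r == n for r, n in zip(rule_pred, net_pred)) / len(net_pred)

truth     = [1, 0, 1, 1, 0]   # true labels of unseen examples
net_pred  = [1, 0, 1, 0, 0]   # network's outputs
rule_pred = [1, 0, 1, 0, 1]   # extracted rules' outputs

print(accuracy(rule_pred, truth))     # 0.6
print(fidelity(rule_pred, net_pred))  # 0.8
```

Note that the rules here are more faithful to the network (0.8) than they are accurate (0.6): fidelity measures agreement with the black box, not correctness.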

SLIDE 7

How does extraction work?

✗ Knowledge in Neural Networks is represented by numerical weights
✗ Extraction algorithms attempt to directly or indirectly analyze the numerical data
✗ Neural Network behavior is explained through new methods

SLIDE 8

Decompositional Algorithms

✗ Knowledge is extracted from each node in the network individually
✗ Each node's rules are based on the previous layers
✗ Usually simply described and accurate
✗ Require a threshold approximation for each node
✗ Restricted generalization and scalability
✗ May require a special training procedure
✗ May require a special network architecture
✗ Require sigmoidal transfer functions for hidden nodes

SLIDE 9

Global Algorithms

✗ Describe output nodes as functions of input nodes
✗ The internal structure of the network is not important
✗ Represent networks as decision trees
✗ Extract rules from the constructed decision trees
✗ May not be efficient as the complexity of the network grows

SLIDE 10

Combinatorial Algorithms

✗ Use aspects of both decompositional and global algorithms
✗ The network architecture and the values of the weights are necessary
✗ Attempt to gain the advantages of each without the disadvantages

SLIDE 11

TREPAN

✗ Trees Parroting Networks
✗ A global method
✗ Represents network knowledge through a decision tree
✗ Uses the same construction approach as C4.5 and CART
✗ Uses breadth-first search to construct the tree instead of depth-first search

SLIDE 12

TREPAN

✗ The classes used for the decision tree are those defined by the neural network
✗ A list of leaf nodes is kept along with related data:
  ✗ A subset of the training data
  ✗ A set of complementary data
  ✗ A set of constraints
✗ The data sets are used to determine whether a node should be further divided or left as a terminal leaf
✗ The data sets must meet the constraints
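The leaf-list bookkeeping above can be sketched as a simple worklist. This is an illustrative reconstruction, not TREPAN's exact data structures: the `Leaf` fields and the stopping test are assumptions.

```python
from collections import deque

# Hypothetical leaf record: the subset of training data reaching the node,
# plus the constraints (split outcomes) on the path from the root.
class Leaf:
    def __init__(self, data, constraints):
        self.data = data
        self.constraints = constraints
        self.children = []

def expand(root, should_split, make_children, max_nodes=20):
    """A leaf is removed from the list when it is split or made terminal
    and is never re-added; when split, its children join the list."""
    open_leaves = deque([root])
    count = 1
    while open_leaves and count < max_nodes:
        leaf = open_leaves.popleft()           # removed, never re-added
        if should_split(leaf):
            leaf.children = make_children(leaf)
            open_leaves.extend(leaf.children)  # children added to the list
            count += len(leaf.children)
    return root

# Toy usage: split any leaf holding more than one example into halves.
root = Leaf([1, 2, 3, 4], [])
split = lambda leaf: len(leaf.data) > 1
halves = lambda leaf: [Leaf(leaf.data[:len(leaf.data) // 2], leaf.constraints),
                       Leaf(leaf.data[len(leaf.data) // 2:], leaf.constraints)]
expand(root, split, halves)
print(len(root.children))  # 2
```

The FIFO queue gives the breadth-first expansion order the slides describe; swapping in a priority queue would change only the expansion order, not the bookkeeping.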

SLIDE 13

TREPAN

✗ Nodes are removed from the list when they are split or become terminal leaves
  ✗ They are never added to the list again
  ✗ Their children are added to the list
✗ The decision function determines the type of decision tree constructed
  ✗ M-of-N – each node represents an m-of-n test
  ✗ 1-of-N – each node represents a 1-of-n test
  ✗ Simple – each node represents a test for one attribute (true or false)
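An m-of-n test fires when at least m of its n conditions hold; a 1-of-n test is the special case m = 1, and a simple test is effectively 1-of-1. A minimal illustration (the node conditions here are hypothetical):

```python
def m_of_n(m, conditions):
    """True when at least m of the n boolean conditions are satisfied."""
    return sum(bool(c) for c in conditions) >= m

# Hypothetical node test: "2 of {x1 > 0, x2 > 0, x3 > 0}"
x1, x2, x3 = 1.0, -0.5, 2.0
print(m_of_n(2, [x1 > 0, x2 > 0, x3 > 0]))  # True (x1 and x3 hold)
print(m_of_n(1, [x2 > 0]))                  # simple / 1-of-1 test: False
```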

SLIDE 14

TREPAN

✗ Comparison on the UCI Tic-Tac-Toe data set
✗ Network: 27 inputs, 20 hidden nodes, 2 outputs

SLIDE 15

TREPAN

✗ Typically, the shortest tree is the easiest to understand
✗ M-of-N has the fewest nodes, but is very difficult to understand
✗ TREPAN provides higher quality information

SLIDE 16

TREPAN

SLIDE 17

TREPAN

SLIDE 18

TREPAN

SLIDE 19

Another Global Algorithm

✗ Only uses the training data to construct the decision tree
  ✗ TREPAN uses the training data and may also use artificially generated data
✗ Uses the CN2 and C4.5 algorithms

SLIDE 20

BDT

✗ Bound Decomposition Tree
✗ A decompositional algorithm
✗ Designed with the goals of no retraining, high accuracy, and low complexity
✗ The algorithm works for Multi-Layer Perceptrons

SLIDE 21

BDT

✗ Maximum upper bound of any neuron
  ✗ All inputs that have positive weight take a value of 1
  ✗ Inputs with negative weight take a value of 0
✗ Minimum lower bound of any neuron
  ✗ Only inputs that have negative weight take a value of 1
  ✗ Inputs with positive weight take a value of 0

SLIDE 22

BDT

✗ Each neuron has its own minimum and maximum bounds
  ✗ The minimum is found by adding the bias plus all negative weights
  ✗ The maximum is found by adding the bias plus all positive weights

Example:

  Input       Weight    Min Bound   Max Bound
  I1          -0.25     -0.25
  I2           0.65                  0.65
  I3          -0.48     -0.48
  I4           0.72                  0.72
  Bias (-1)    1.00     -1.00       -1.00
  Total                 -1.73        0.37
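The bound formulas are easy to reproduce. The sketch below recomputes the example's totals, assuming (consistent with the -1.73 and 0.37 totals) that I1 and I3 carry negative weights and the bias input is fixed at -1 with weight 1:

```python
def bounds(weights, bias_input, bias_weight):
    """Min bound: bias contribution plus all negative weights (those inputs
    set to 1, positive-weight inputs set to 0). Max bound: bias contribution
    plus all positive weights."""
    bias = bias_input * bias_weight
    min_b = bias + sum(w for w in weights if w < 0)
    max_b = bias + sum(w for w in weights if w > 0)
    return min_b, max_b

# Weights I1..I4 from the example, bias input -1 with weight 1
lo, hi = bounds([-0.25, 0.65, -0.48, 0.72], bias_input=-1, bias_weight=1)
print(round(lo, 2), round(hi, 2))  # -1.73 0.37
```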

SLIDE 23

BDT

✗ Each neuron (cube) is divided into two subcubes based on the first input
  ✗ One subcube assumes 0 as the value and the other assumes 1
✗ The remaining inputs are used to construct the input vectors for each subcube
✗ Bounds are calculated for each subcube
  ✗ Positive subcube – the lower bound is positive
  ✗ Negative subcube – the upper bound is negative
  ✗ Uncertain subcube – the lower bound is negative and the upper bound is positive

SLIDE 24

BDT

✗ Positive subcubes will always fire
  ✗ Each represents a rule for the neuron
✗ Negative subcubes will never fire
✗ Uncertain subcubes must be further subdivided until positive and/or negative subcubes are reached
✗ The rules for a neuron are the set of all input vectors on positive subcubes
✗ A threshold Δ greater than 0 can be used to prune the neuron
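The subdivision loop above can be sketched recursively. This is an illustrative reconstruction, not the paper's pseudocode: inputs are binary, "fires" means the weighted sum plus bias is positive, and fixing an input to 0 or 1 tightens the remaining bounds.

```python
def extract_rules(weights, bias, fixed=()):
    """Split the input cube on the next unfixed input. Returns the fixed
    input vectors of positive subcubes (unfixed inputs are don't-cares)."""
    i = len(fixed)
    done = sum(w * v for w, v in zip(weights, fixed))  # fixed inputs' sum
    rest = weights[i:]
    lo = bias + done + sum(w for w in rest if w < 0)   # lower bound
    hi = bias + done + sum(w for w in rest if w > 0)   # upper bound
    if lo > 0:                       # positive subcube: always fires -> rule
        return [fixed]
    if hi < 0 or i == len(weights):  # negative subcube: never fires
        return []
    # uncertain subcube: subdivide on input i with values 0 and 1
    return (extract_rules(weights, bias, fixed + (0,)) +
            extract_rules(weights, bias, fixed + (1,)))

# Toy neuron that fires only when both inputs are 1 (sum must exceed 1.5)
print(extract_rules([1.0, 1.0], -1.5))  # [(1, 1)]
```

Applying the Δ pruning mentioned above would amount to replacing the `lo > 0` test with `lo > delta` for some Δ > 0, trading rule coverage for simplicity.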

SLIDE 25

BDT

SLIDE 26

Sources

Milare, R., De Carvalho, A., & Monard, M. (2002). An Approach to Explain Neural Networks Using Symbolic Algorithms. International Journal of Computational Intelligence and Applications, 2(4), 365-376.

Heh, J. S., Chen, J. C., & Chang, M. (2008). Designing a decompositional rule extraction algorithm for neural networks with bound decomposition tree. Neural Computing and Applications, 17, 297-309.

Nobre, C., Martinelle, E., Braga, A., De Carvalho, A., Rezende, S., Braga, J. L., & Ludermir, T. (1999). Knowledge Extraction: A Comparison between Symbolic and Connectionist Methods. International Journal of Neural Systems, 9(3), 257-264.