26:198:722 Expert Systems
- Knowledge representation
- Knowledge acquisition
- Machine learning
- ID3 & C4.5
Knowledge Representation
Recall:
- Knowledge engineering
  - Knowledge acquisition
    - Knowledge elicitation
  - Knowledge representation
    - Production rules
    - Semantic networks
    - Frames
Knowledge Representation
- Representation is more than just encoding (encrypting)
- Coding preserves structural ambiguity
- Communication assumes prior knowledge
- Representation implies organization
Knowledge Representation
- Representation
  - A set of syntactic and semantic conventions that make it possible to describe things (Winston)
- Description
  - Makes use of the conventions of a representation to describe some particular thing
- Syntax v. semantics
Knowledge Representation
- STRIPS
  - Predicate-argument expressions
    - at(robot, roomA)
  - World models
  - Operator tables (a sketch follows this slide)
    - push(X, Y, Z)
      - Preconditions: at(robot, Y), at(X, Y)
      - Delete list: at(robot, Y), at(X, Y)
      - Add list: at(robot, Z), at(X, Z)
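To make the operator-table idea concrete, here is a minimal sketch in Python of a world model as a set of predicate-argument facts and the push(X, Y, Z) operator above. The dictionary layout and the apply_operator() helper are illustrative assumptions, not STRIPS itself.

    # Minimal sketch: world model as a set of facts, operator as precondition/delete/add sets.
    def push(x, y, z):
        """Operator table for push(X, Y, Z): push object X from room Y to room Z."""
        return {
            "preconditions": {("at", "robot", y), ("at", x, y)},
            "delete":        {("at", "robot", y), ("at", x, y)},
            "add":           {("at", "robot", z), ("at", x, z)},
        }

    def apply_operator(world, op):
        """Apply an operator to a world model if its preconditions hold."""
        if not op["preconditions"] <= world:
            return None                       # operator not applicable
        return (world - op["delete"]) | op["add"]

    world = {("at", "robot", "roomA"), ("at", "box1", "roomA")}
    print(apply_operator(world, push("box1", "roomA", "roomB")))
    # robot and box1 are now both at roomB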
Knowledge Representation
- STRIPS
  - Maintained lists of goals
  - Selected the goal to work on next
  - Searched for applicable operators
  - Matched goals against formulas in add lists
  - Set up preconditions as sub-goals
  - Used means-end analysis
Knowledge Representation
- STRIPS - lessons
  - Heuristic search
  - Uniform representation
  - Problem reduction
- Procedural semantics
Knowledge Representation
- MYCIN
  - Assists physicians who are not experts in the field of antibiotics in treating blood infections
  - Consists of
    - Knowledge base
    - Dynamic patient database
    - Consultation program
    - Explanation program
    - Knowledge acquisition program
Knowledge Representation
- MYCIN
  - Production rules
    - Premises
      - Conjunctions of conditions
    - Actions
      - Conclusions or instructions
  - Patient information stored in a context tree
  - Certainty factors for uncertain reasoning (a sketch follows this slide)
  - Backward-chaining control structure (based on an AND/OR tree)
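As an illustration of how premises, actions, and certainty factors fit together, here is a minimal sketch of a MYCIN-style rule in Python. The rule contents and patient facts are hypothetical, and fire() only shows the common convention of scaling the rule's certainty factor by the weakest premise; it is not MYCIN's full evidence-combination machinery.

    # Minimal sketch: a rule with a conjunctive premise, an action, and a certainty factor (CF).
    def fire(rule, facts):
        """Return (conclusion, cf) if every premise is known with sufficient certainty."""
        premise_cfs = [facts.get(p) for p in rule["premises"]]
        if any(cf is None for cf in premise_cfs):
            return None                      # a premise is unknown
        cf_premise = min(premise_cfs)        # conjunction takes the weakest premise
        if cf_premise <= 0.2:                # treat low-certainty premises as unknown
            return None
        return rule["action"], rule["cf"] * cf_premise

    rule = {                                 # hypothetical rule, MYCIN-like in shape only
        "premises": [("stain", "gram-negative"), ("morphology", "rod")],
        "action":   ("class", "enterobacteriaceae"),
        "cf":       0.7,
    }
    facts = {("stain", "gram-negative"): 1.0, ("morphology", "rod"): 0.8}
    print(fire(rule, facts))                 # (('class', 'enterobacteriaceae'), 0.56)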
Knowledge Representation
- MYCIN
  - Evaluation
    - Panel of experts approved 72% of recommendations
    - As good as experts
    - Better than non-experts
    - Knowledge base incomplete (400 rules)
    - Required more computing power than available in hospitals
    - Doctors did not like the user interface
Knowledge Acquisition
- Stages
  - Identification
  - Conceptualization
  - Formalization
  - Implementation
  - Testing
- KADS
- Ontological analysis
Knowledge Acquisition
- Expert system shells
  - EMYCIN
  - TEIRESIAS
    - Rule models (meta-rules)
    - Schemas for data types
    - Domain-specific knowledge
    - Representation-specific knowledge
    - Representation-independent knowledge
    - Explain-Test-Review
Knowledge Acquisition
- Methods and tools
  - Structured interview
  - Unstructured interview
  - Case studies
    - Retrospective v. observational
    - Familiar v. unfamiliar
  - Concurrent protocols
    - Verbalization, “thinking aloud”
  - Tape recording
  - Video recording
Knowledge Acquisition
- Methods and tools
  - Automated knowledge acquisition
    - Domain models
    - Graphical interfaces
    - Visual programming language
Knowledge Acquisition
- Different types of knowledge
  - Procedural knowledge
    - Rules, strategies, agendas, procedures
  - Declarative knowledge
    - Concepts, objects, facts
  - Meta-knowledge
    - Knowledge about other types of knowledge and how to use them
  - Structural knowledge
    - Rule sets, concept relationships, concept-to-object relationships
Knowledge Acquisition
- Sources of knowledge
  - Experts
  - End-users
  - Multiple experts (panels)
  - Reports
  - Books
  - Regulations
  - Guidelines
Knowledge Acquisition
- Major difficulties with elicitation
  - The expert may
    - be unaware of the knowledge used
    - be unable to verbalize the knowledge used
    - provide irrelevant knowledge
    - provide incomplete knowledge
    - provide incorrect knowledge
    - provide inconsistent knowledge
Knowledge Acquisition
- “The more competent domain experts become, the less able they are to describe the knowledge they used to solve problems” (Waterman)
Knowledge Acquisition
- Detailed guidelines for conducting structured and unstructured interviews, and both retrospective and observational case studies, are given in Durkin (Chapter 17)
Knowledge Acquisition
- Technique capabilities

  Knowledge    Interviews                 Case studies: retrospective   Case studies: observational
               Unstructured  Structured   Familiar     Unfamiliar       Familiar     Unfamiliar
  Facts        Poor          Good         Fair         Average          Good         Excellent
  Concepts     Excellent     Excellent    Average      Average          Good         Good
  Objects      Good          Excellent    Average      Average          Good         Good
  Rules        Fair          Average      Average      Average          Good         Excellent
  Strategies   Average       Average      Good         Good             Excellent    Excellent
  Heuristics   Fair          Average      Excellent    Good             Good         Poor
  Structures   Fair          Excellent    Average      Average          Average      Average
Knowledge Acquisition
- Analyzing the knowledge collected
  - Producing transcripts
  - Interpreting transcripts
    - Chunking
  - Analyzing transcripts
    - Knowledge dictionaries
    - Graphical techniques
      - Cognitive maps
      - Inference networks
      - Flowcharts
      - Decision trees
Machine Learning
- Rote learning
- Supervised learning
  - Induction
    - Concept learning
    - Descriptive generalization
- Unsupervised learning
Machine Learning
- META-DENDRAL
  - RULEMOD
    - Removing redundancy
    - Merging rules
    - Making rules more specific
    - Making rules more general
    - Selecting final rules
Machine Learning
- META-DENDRAL
  - Version spaces
    - Partial ordering
    - Boundary sets
    - Candidate elimination algorithm
      - Monotonic, non-heuristic
      - Results independent of order of presentation
      - Each training instance is examined only once
      - Discarded hypotheses are never reconsidered
      - Learning is properly incremental
Machine Learning
- Decision trees and production rules
  - Decision trees are an alternative way of structuring rules
  - Efficient algorithms exist for constructing decision trees
  - There is a whole family of such learning systems:
    - CLS (1966)
    - ID3 (1979)
    - ACLS (1982)
    - ASSISTANT (1984)
    - IND (1990)
    - C4.5 (1993) - and C5.0
  - Decision trees can be converted to rules later
Machine Learning
- Entropy
  - Let X be a variable with states x_1, ..., x_n
  - Define the entropy of X by

    H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)

  - N.B. \log_2(x) = \ln(x)/\ln(2) = \log_{10}(x)/\log_{10}(2)
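A minimal Python sketch of this definition (the helper name entropy() is our own):

    from math import log2

    def entropy(probs):
        """H in bits of a discrete distribution given as a list of probabilities."""
        return -sum(p * log2(p) for p in probs if p > 0)   # 0 * log2(0) is taken as 0

    print(entropy([0.5, 0.5]))      # 1.0  (the perfect coin on the next slides)
    print(entropy([1.0, 0.0]))      # 0.0  (the totally biased coin)
    print(entropy([0.25] * 4))      # 2.0  (4 equiprobable outcomes, log2 4)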
Machine Learning
- Entropy
  - Consider flipping a perfect coin:

    n = 2,  X: x_1, x_2,  p(x_1) = p(x_2) = 1/2
Machine Learning
- Entropy

  H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = \frac{1}{2} + \frac{1}{2} = 1
Machine Learning
- Entropy
  - Consider n equiprobable outcomes

    H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -\sum_{i=1}^{n} \frac{1}{n}\log_2\frac{1}{n} = -\log_2\frac{1}{n} = \log_2 n
Machine Learning
- Entropy
  - Consider flipping a totally biased coin:

    n = 2,  X: x_1, x_2,  p(x_1) = 1,  p(x_2) = 0
Machine Learning
- Entropy (by L’Hôpital’s rule, taking 0 \cdot \log_2 0 = 0)

  H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -(1)\log_2(1) - (0)\log_2(0) = 0 + 0 = 0
Machine Learning
- Entropy
  - Entropy is a measure of chaos or disorder
  - H(X) is maximum for equiprobable outcomes
Machine Learning
- Entropy
  - Let X: x_1, ..., x_m and Y: y_1, ..., y_n be two variables

    H(X, Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i, y_j)\log_2 p(x_i, y_j)

  - If X and Y are independent, H(X, Y) = H(X) + H(Y)
Machine Learning
- Conditional Entropy
  - Partial conditional entropy of Y given that X is in state x_i:

    H(Y | x_i) = -\sum_{j=1}^{n} p(y_j | x_i)\log_2 p(y_j | x_i)

  - Full conditional entropy of Y given X:

    H(Y | X) = \sum_{i=1}^{m} p(x_i) \cdot H(Y | x_i)
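A minimal sketch of both quantities, reusing the entropy() helper from the earlier sketch; the argument layout is an assumption:

    def conditional_entropy(p_x, p_y_given_x):
        """H(Y|X) = sum_i p(x_i) * H(Y | x_i), with one conditional row per state of X."""
        return sum(p * entropy(row) for p, row in zip(p_x, p_y_given_x))

    # Example: Height in the 8-case data set used below.
    # p(short) = 3/8, p(tall) = 5/8; p(+|short) = 1/3, p(+|tall) = 2/5.
    print(conditional_entropy([3/8, 5/8], [[1/3, 2/3], [2/5, 3/5]]))   # ~0.951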
Machine Learning
- Binary logarithms

  n    log_2 n
  1    0.0000
  2    1.0000
  3    1.5850
  4    2.0000
  5    2.3219
  6    2.5850
  7    2.8074
  8    3.0000
Machine Learning
- ID3
  - Builds a decision tree first, then rules
  - Given a set of attributes and a decision, recursively selects the attribute to be the root of the (sub)tree based on Information Gain (a sketch follows this slide):

    H(decision) - H(decision | attribute)

  - Favors attributes with many outcomes
  - Is not guaranteed to find the simplest decision tree
  - Is not incremental
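A minimal sketch of the selection step, with cases represented as Python dictionaries and reusing entropy() from above; this is not Quinlan's implementation:

    from collections import Counter

    def h_of(values):
        """Entropy of a list of symbolic values."""
        counts = Counter(values)
        total = sum(counts.values())
        return entropy([c / total for c in counts.values()])

    def information_gain(rows, attribute, decision):
        """H(decision) - H(decision | attribute) over a list of row dictionaries."""
        h_decision = h_of([r[decision] for r in rows])
        h_conditional = 0.0
        for value in {r[attribute] for r in rows}:
            subset = [r for r in rows if r[attribute] == value]
            h_conditional += len(subset) / len(rows) * h_of([r[decision] for r in subset])
        return h_decision - h_conditional

    def best_attribute(rows, attributes, decision):
        """The attribute ID3 would place at the root of the (sub)tree."""
        return max(attributes, key=lambda a: information_gain(rows, a, decision))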
Machine Learning
- C4.5
  - Selects attributes based on the Information gain ratio (a sketch follows this slide):

    (H(decision) - H(decision | attribute)) / H(attribute)

  - Uses pruning heuristics on decision trees
    - to simplify them
    - to reduce dependence on the training set
  - Tunes the resulting rule(s)
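A minimal sketch of the gain-ratio criterion, reusing h_of() and information_gain() from the ID3 sketch:

    def gain_ratio(rows, attribute, decision):
        """(H(decision) - H(decision | attribute)) / H(attribute)."""
        split_info = h_of([r[attribute] for r in rows])    # H(attribute)
        if split_info == 0:
            return 0.0                                     # attribute takes a single value
        return information_gain(rows, attribute, decision) / split_info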
Machine Learning
- C4.5 rule tuning
  - Derive initial rules by enumerating paths through the decision tree
  - Generalize the rules by possibly deleting unnecessary conditions
  - Group rules according to target classes and delete any that do not contribute to overall performance on the class
  - Order the sets of rules for the target classes and choose a default class
Machine Learning
- Rule tuning
  - Rule tuning may be useful for rules derived by a variety of other means besides C4.5
    - Evaluate the contribution of individual rules
    - Evaluate the performance of the rule set as a whole
Machine Learning
- A data set for classification (Quinlan)

       Attributes                       Decision
       Height   Hair    Eyes            Attractiveness
  1    short    blond   blue            +
  2    tall     blond   brown           -
  3    tall     red     blue            +
  4    short    dark    blue            -
  5    tall     dark    blue            -
  6    tall     blond   blue            +
  7    tall     dark    brown           -
  8    short    blond   brown           -
Machine Learning
- A data set for classification (Quinlan)
  - H(decision) = H(Attractiveness) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 4, 8    p(+|short) = 1/3,  p(-|short) = 2/3
    - tall: 2, 3, 5, 6, 7    p(+|tall) = 2/5,  p(-|tall) = 3/5
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{3}{8}\left(-\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3}\right) + \frac{5}{8}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.951
  - Information gain = 0.955 - 0.951 = 0.004
Machine Learning
- A data set for classification (Quinlan)
  - Hair:
    - blond: 1, 2, 6, 8    p(+|blond) = 2/4,  p(-|blond) = 2/4
    - red: 3    p(+|red) = 1/1,  p(-|red) = 0/1
    - dark: 4, 5, 7    p(+|dark) = 0/3,  p(-|dark) = 3/3
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{4}{8}[1] + \frac{1}{8}[0] + \frac{3}{8}[0] = 0.500
  - Information gain = 0.955 - 0.500 = 0.455
Machine Learning
- A data set for classification (Quinlan)
  - Eyes:
    - blue: 1, 3, 4, 5, 6    p(+|blue) = 3/5,  p(-|blue) = 2/5
    - brown: 2, 7, 8    p(+|brown) = 0/3,  p(-|brown) = 3/3
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{5}{8}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{3}{8}[0] = 0.607
  - Information gain = 0.955 - 0.607 = 0.348
Machine Learning
- A data set for classification (Quinlan)
  - Hence Hair is chosen as the best choice for the root of the tree
  - Now we recursively repeat this process for the (three) resulting branches
  - In this case, the branches for Hair: red and Hair: dark are already completely classified, and we need to work only on the sub-table for Hair: blond
Machine Learning
- A data set for classification (Quinlan)
  - Sub-table for Hair: blond

       Attributes          Decision
       Height   Eyes       Attractiveness
  1    short    blue       +
  2    tall     brown      -
  6    tall     blue       +
  8    short    brown      -

  - H(decision) = H(Attractiveness) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 8    p(+|short) = 1/2,  p(-|short) = 1/2
    - tall: 2, 6    p(+|tall) = 1/2,  p(-|tall) = 1/2
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{2}{4}[1] + \frac{2}{4}[1] = 1
  - Information gain = 1 - 1 = 0
Machine Learning
- A data set for classification (Quinlan)
  - Eyes:
    - blue: 1, 6    p(+|blue) = 2/2,  p(-|blue) = 0/2
    - brown: 2, 8    p(+|brown) = 0/2,  p(-|brown) = 2/2
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{2}{4}[0] + \frac{2}{4}[0] = 0
  - Information gain = 1 - 0 = 1
Machine Learning
- A data set for classification (Quinlan)
  - Hence Eyes is chosen as the best root of this subtree
  - The final tree is

    Hair
    |-- blond --> Eyes
    |               |-- blue  --> +  (cases 1, 6)
    |               |-- brown --> -  (cases 2, 8)
    |-- red   --> +  (case 3)
    |-- dark  --> -  (cases 4, 5, 7)
Machine Learning
- A data set for classification (Quinlan)
  - We may now build rules from this decision tree
    - R1: (Hair, dark) --> (Attractiveness, -)
    - R2: (Hair, red) --> (Attractiveness, +)
    - R3: (Hair, blond) & (Eyes, blue) --> (Attractiveness, +)
    - R4: (Hair, blond) & (Eyes, brown) --> (Attractiveness, -)
  - Note that Height is irrelevant
Machine Learning
- A data set for classification (Quinlan)
  - Dropping conditions from rules
    - Rules 1 and 2 have only one condition
    - Rule 3: neither condition can be dropped (case 5 needs the first condition and case 2 needs the second condition)
    - Rule 4: we can drop the first condition
      - R4': (Eyes, brown) --> (Attractiveness, -)
Machine Learning
- A data set for classification (Quinlan)
  - Dropping conditions from rules (a sketch follows this slide)
    - Linear
      - Scan the rule left to right
      - Try to drop conditions one at a time
      - If possible, drop for good
      - Iterate (n conditions, n attempts)
    - Exponential
      - Scan the rule left to right
      - Try to drop conditions one at a time
      - Then try to drop pairs, triples, etc. (n conditions, 2^n - 2 attempts)
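A minimal sketch of the linear strategy, applied to Rule 4 from the ID3 tree. The scoring function (counting misclassified covered training cases) is an illustrative assumption rather than C4.5's pessimistic error estimate, and rows is the data set encoded in the earlier sketch.

    def errors(conditions, conclusion, rows):
        """Covered training cases that the rule gets wrong."""
        covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
        attr, value = conclusion
        return sum(1 for r in covered if r[attr] != value)

    def drop_conditions(conditions, conclusion, rows):
        """Linear scan: drop a condition for good if it does not hurt on the training cases."""
        kept = list(conditions)
        for cond in list(conditions):            # n conditions, n attempts
            trial = [c for c in kept if c != cond]
            if trial and errors(trial, conclusion, rows) <= errors(kept, conclusion, rows):
                kept = trial                     # drop for good
        return kept

    r4 = [("Hair", "blond"), ("Eyes", "brown")]  # R4 from the ID3 tree
    print(drop_conditions(r4, ("Attr", "-"), rows))   # [('Eyes', 'brown')], i.e. R4'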
Machine Learning
- A data set for classification (Quinlan)
  - Now consider the Information gain ratio
  - For the initial root of the tree we already know
  - H(decision) = H(Attractiveness) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{3}{8}\left(-\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3}\right) + \frac{5}{8}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.951
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{4}{8}[1] + \frac{1}{8}[0] + \frac{3}{8}[0] = 0.500
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{5}{8}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{3}{8}[0] = 0.607
Machine Learning
- A data set for classification (Quinlan)
  - H(attribute) = H(Height)
    = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
  - H(attribute) = H(Hair)
    = -\frac{4}{8}\log_2\frac{4}{8} - \frac{1}{8}\log_2\frac{1}{8} - \frac{3}{8}\log_2\frac{3}{8} = 1.406
  - H(attribute) = H(Eyes)
    = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - Hence the Information gain ratios are
    - Height: 0.004
    - Hair: 0.324
    - Eyes: 0.364
  - By this criterion, Eyes is chosen as the best root available
  - The branch for Eyes: brown is already completely classified, and we need to work only on the sub-table for Eyes: blue
Machine Learning
- A data set for classification (Quinlan)
  - Sub-table for Eyes: blue

       Attributes          Decision
       Height   Hair       Attractiveness
  1    short    blond      +
  3    tall     red        +
  4    short    dark       -
  5    tall     dark       -
  6    tall     blond      +

  - H(decision) = H(Attractiveness) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.971
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 4    p(+|short) = 1/2,  p(-|short) = 1/2
    - tall: 3, 5, 6    p(+|tall) = 2/3,  p(-|tall) = 1/3
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) + \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) = 0.951
  - H(Height) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971
Machine Learning
- A data set for classification (Quinlan)
  - Hair:
    - blond: 1, 6    p(+|blond) = 2/2,  p(-|blond) = 0/2
    - red: 3    p(+|red) = 1/1,  p(-|red) = 0/1
    - dark: 4, 5    p(+|dark) = 0/2,  p(-|dark) = 2/2
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{2}{5}[0] + \frac{1}{5}[0] + \frac{2}{5}[0] = 0
  - H(Hair) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.522
Machine Learning
- A data set for classification (Quinlan)
  - Hence the Information gain ratios are
    - Height: 0.021
    - Hair: 0.638
  - By this criterion, Hair is chosen as the best root available
  - The final tree is

    Eyes
    |-- brown --> -  (cases 2, 7, 8)
    |-- blue  --> Hair
                    |-- blond --> +  (cases 1, 6)
                    |-- red   --> +  (case 3)
                    |-- dark  --> -  (cases 4, 5)
Machine Learning
- A data set for classification (Quinlan)
  - We may now build rules from this decision tree
    - R1: (Eyes, brown) --> (Attractiveness, -)
    - R2: (Eyes, blue) & (Hair, blond) --> (Attractiveness, +)
    - R3: (Eyes, blue) & (Hair, red) --> (Attractiveness, +)
    - R4: (Eyes, blue) & (Hair, dark) --> (Attractiveness, -)
  - These are different rules from the ID3 set
  - Note that after dropping conditions, however, they are the same - this is NOT generally true
Machine Learning
- ID3 & C4.5
  - What if there are too many cases?
    - Windowing
  - What if the data is incomplete?
  - What if the data is inconsistent?
  - What if the data is continuous? (a sketch follows this slide)
    - Binarization
    - Discretization
  - Incremental algorithms?
  - Pruning?