26:198:722 Expert Systems
- Knowledge representation
- Knowledge acquisition
- Machine learning
- ID3 & C4.5
Knowledge Representation
Recall:
- Knowledge engineering
  - Knowledge acquisition
    - Knowledge elicitation
  - Knowledge representation
    - Production rules
    - Semantic networks
    - Frames
Knowledge Representation
- Representation is more than just encoding (encrypting)
- Coding preserves structural ambiguity
- Communication assumes prior knowledge
- Representation implies organization
Knowledge Representation
- Representation
  - A set of syntactic and semantic conventions that make it possible to describe things (Winston)
- Description
  - Makes use of the conventions of a representation to describe some particular thing
- Syntax v. semantics
Knowledge Representation
- STRIPS
  - Predicate-argument expressions
    - at(robot, roomA)
  - World models
  - Operator tables (a sketch follows this slide)
    - push(X, Y, Z)
      - Preconditions: at(robot, Y), at(X, Y)
      - Delete list: at(robot, Y), at(X, Y)
      - Add list: at(robot, Z), at(X, Z)
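To make the operator-table idea concrete, here is a minimal sketch in Python of a world model as a set of predicate-argument facts and the push(X, Y, Z) operator above. The dictionary layout and the apply_operator() helper are illustrative assumptions, not STRIPS itself.

    # Minimal sketch: world model as a set of facts, operator as precondition/delete/add sets.
    def push(x, y, z):
        """Operator table for push(X, Y, Z): push object X from room Y to room Z."""
        return {
            "preconditions": {("at", "robot", y), ("at", x, y)},
            "delete":        {("at", "robot", y), ("at", x, y)},
            "add":           {("at", "robot", z), ("at", x, z)},
        }

    def apply_operator(world, op):
        """Apply an operator to a world model if its preconditions hold."""
        if not op["preconditions"] <= world:
            return None                       # operator not applicable
        return (world - op["delete"]) | op["add"]

    world = {("at", "robot", "roomA"), ("at", "box1", "roomA")}
    print(apply_operator(world, push("box1", "roomA", "roomB")))
    # robot and box1 are now both at roomB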
Knowledge Representation
- STRIPS
  - Maintained lists of goals
  - Selected the goal to work on next
  - Searched for applicable operators
  - Matched goals against formulas in add lists
  - Set up preconditions as sub-goals
  - Used means-end analysis
Knowledge Representation
- STRIPS - lessons
  - Heuristic search
  - Uniform representation
  - Problem reduction
- Procedural semantics
Knowledge Representation
- MYCIN
  - Assists physicians who are not experts in the field of antibiotics in treating blood infections
  - Consists of
    - Knowledge base
    - Dynamic patient database
    - Consultation program
    - Explanation program
    - Knowledge acquisition program
Knowledge Representation
- MYCIN
  - Production rules
    - Premises
      - Conjunctions of conditions
    - Actions
      - Conclusions or instructions
  - Patient information stored in a context tree
  - Certainty factors for uncertain reasoning (a sketch follows this slide)
  - Backward-chaining control structure (based on an AND/OR tree)
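As an illustration of how premises, actions, and certainty factors fit together, here is a minimal sketch of a MYCIN-style rule in Python. The rule contents and patient facts are hypothetical, and fire() only shows the common convention of scaling the rule's certainty factor by the weakest premise; it is not MYCIN's full evidence-combination machinery.

    # Minimal sketch: a rule with a conjunctive premise, an action, and a certainty factor (CF).
    def fire(rule, facts):
        """Return (conclusion, cf) if every premise is known with sufficient certainty."""
        premise_cfs = [facts.get(p) for p in rule["premises"]]
        if any(cf is None for cf in premise_cfs):
            return None                      # a premise is unknown
        cf_premise = min(premise_cfs)        # conjunction takes the weakest premise
        if cf_premise <= 0.2:                # treat low-certainty premises as unknown
            return None
        return rule["action"], rule["cf"] * cf_premise

    rule = {                                 # hypothetical rule, MYCIN-like in shape only
        "premises": [("stain", "gram-negative"), ("morphology", "rod")],
        "action":   ("class", "enterobacteriaceae"),
        "cf":       0.7,
    }
    facts = {("stain", "gram-negative"): 1.0, ("morphology", "rod"): 0.8}
    print(fire(rule, facts))                 # (('class', 'enterobacteriaceae'), 0.56)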
Knowledge Representation
- MYCIN
  - Evaluation
    - Panel of experts approved 72% of recommendations
    - As good as experts
    - Better than non-experts
    - Knowledge base incomplete (400 rules)
    - Required more computing power than available in hospitals
    - Doctors did not like the user interface
Knowledge Acquisition
- Stages
  - Identification
  - Conceptualization
  - Formalization
  - Implementation
  - Testing
- KADS
- Ontological analysis
Knowledge Acquisition
- Expert system shells
  - EMYCIN
  - TEIRESIAS
    - Rule models (meta-rules)
    - Schemas for data types
    - Domain-specific knowledge
    - Representation-specific knowledge
    - Representation-independent knowledge
    - Explain-Test-Review
Knowledge Acquisition
- Methods and tools
  - Structured interview
  - Unstructured interview
  - Case studies
    - Retrospective v. observational
    - Familiar v. unfamiliar
  - Concurrent protocols
    - Verbalization, “thinking aloud”
  - Tape recording
  - Video recording
Knowledge Acquisition
- Methods and tools
  - Automated knowledge acquisition
    - Domain models
    - Graphical interfaces
    - Visual programming language
Knowledge Acquisition
- Different types of knowledge
  - Procedural knowledge
    - Rules, strategies, agendas, procedures
  - Declarative knowledge
    - Concepts, objects, facts
  - Meta-knowledge
    - Knowledge about other types of knowledge and how to use them
  - Structural knowledge
    - Rule sets, concept relationships, concept-to-object relationships
Knowledge Acquisition
- Sources of knowledge
  - Experts
  - End-users
  - Multiple experts (panels)
  - Reports
  - Books
  - Regulations
  - Guidelines
Knowledge Acquisition
- Major difficulties with elicitation
  - The expert may
    - be unaware of the knowledge used
    - be unable to verbalize the knowledge used
    - provide irrelevant knowledge
    - provide incomplete knowledge
    - provide incorrect knowledge
    - provide inconsistent knowledge
Knowledge Acquisition
- “The more competent domain experts become, the less able they are to describe the knowledge they used to solve problems” (Waterman)
Knowledge Acquisition
- Detailed guidelines for conducting structured and unstructured interviews, and both retrospective and observational case studies, are given in Durkin (Chapter 17)
Knowledge Acquisition
- Technique capabilities

  Knowledge    Interviews                 Case studies: retrospective   Case studies: observational
               Unstructured  Structured   Familiar     Unfamiliar       Familiar     Unfamiliar
  Facts        Poor          Good         Fair         Average          Good         Excellent
  Concepts     Excellent     Excellent    Average      Average          Good         Good
  Objects      Good          Excellent    Average      Average          Good         Good
  Rules        Fair          Average      Average      Average          Good         Excellent
  Strategies   Average       Average      Good         Good             Excellent    Excellent
  Heuristics   Fair          Average      Excellent    Good             Good         Poor
  Structures   Fair          Excellent    Average      Average          Average      Average
Knowledge Acquisition
- Analyzing the knowledge collected
  - Producing transcripts
  - Interpreting transcripts
    - Chunking
  - Analyzing transcripts
    - Knowledge dictionaries
    - Graphical techniques
      - Cognitive maps
      - Inference networks
      - Flowcharts
      - Decision trees
Machine Learning
- Rote learning
- Supervised learning
  - Induction
    - Concept learning
    - Descriptive generalization
- Unsupervised learning
Machine Learning
- META-DENDRAL
  - RULEMOD
    - Removing redundancy
    - Merging rules
    - Making rules more specific
    - Making rules more general
    - Selecting final rules
Machine Learning
- META-DENDRAL
  - Version spaces
    - Partial ordering
    - Boundary sets
    - Candidate elimination algorithm
      - Monotonic, non-heuristic
      - Results independent of order of presentation
      - Each training instance is examined only once
      - Discarded hypotheses are never reconsidered
      - Learning is properly incremental
Machine Learning
- Decision trees and production rules
  - Decision trees are an alternative way of structuring rules
  - Efficient algorithms exist for constructing decision trees
  - There is a whole family of such learning systems:
    - CLS (1966)
    - ID3 (1979)
    - ACLS (1982)
    - ASSISTANT (1984)
    - IND (1990)
    - C4.5 (1993) - and C5.0
  - Decision trees can be converted to rules later
Machine Learning
- Entropy
  - Let X be a variable with states x_1, ..., x_n
  - Define the entropy of X by

    H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)

  - N.B. \log_2(x) = \ln(x)/\ln(2) = \log_{10}(x)/\log_{10}(2)
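A minimal Python sketch of this definition (the helper name entropy() is our own):

    from math import log2

    def entropy(probs):
        """H in bits of a discrete distribution given as a list of probabilities."""
        return -sum(p * log2(p) for p in probs if p > 0)   # 0 * log2(0) is taken as 0

    print(entropy([0.5, 0.5]))      # 1.0  (the perfect coin on the next slides)
    print(entropy([1.0, 0.0]))      # 0.0  (the totally biased coin)
    print(entropy([0.25] * 4))      # 2.0  (4 equiprobable outcomes, log2 4)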
Machine Learning
- Entropy
  - Consider flipping a perfect coin:

    n = 2,  X: x_1, x_2,  p(x_1) = p(x_2) = 1/2
Machine Learning
- Entropy

  H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = \frac{1}{2} + \frac{1}{2} = 1
Machine Learning
- Entropy
  - Consider n equiprobable outcomes

    H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -\sum_{i=1}^{n} \frac{1}{n}\log_2\frac{1}{n} = -\log_2\frac{1}{n} = \log_2 n
Machine Learning
- Entropy
  - Consider flipping a totally biased coin:

    n = 2,  X: x_1, x_2,  p(x_1) = 1,  p(x_2) = 0
Machine Learning
- Entropy (by L’Hôpital’s rule, taking 0 \cdot \log_2 0 = 0)

  H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) = -(1)\log_2(1) - (0)\log_2(0) = 0 + 0 = 0
Machine Learning
- Entropy
  - Entropy is a measure of chaos or disorder
  - H(X) is maximum for equiprobable outcomes
Machine Learning
- Entropy
  - Let X: x_1, ..., x_m and Y: y_1, ..., y_n be two variables

    H(X, Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i, y_j)\log_2 p(x_i, y_j)

  - If X and Y are independent, H(X, Y) = H(X) + H(Y)
Machine Learning
- Conditional Entropy
  - Partial conditional entropy of Y given that X is in state x_i:

    H(Y | x_i) = -\sum_{j=1}^{n} p(y_j | x_i)\log_2 p(y_j | x_i)

  - Full conditional entropy of Y given X:

    H(Y | X) = \sum_{i=1}^{m} p(x_i) \cdot H(Y | x_i)
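A minimal sketch of both quantities, reusing the entropy() helper from the earlier sketch; the argument layout is an assumption:

    def conditional_entropy(p_x, p_y_given_x):
        """H(Y|X) = sum_i p(x_i) * H(Y | x_i), with one conditional row per state of X."""
        return sum(p * entropy(row) for p, row in zip(p_x, p_y_given_x))

    # Example: Height in the 8-case data set used below.
    # p(short) = 3/8, p(tall) = 5/8; p(+|short) = 1/3, p(+|tall) = 2/5.
    print(conditional_entropy([3/8, 5/8], [[1/3, 2/3], [2/5, 3/5]]))   # ~0.951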
Machine Learning
- Binary logarithms

  n    log_2 n
  1    0.0000
  2    1.0000
  3    1.5850
  4    2.0000
  5    2.3219
  6    2.5850
  7    2.8074
  8    3.0000
Machine Learning
- ID3
  - Builds a decision tree first, then rules
  - Given a set of attributes and a decision, recursively selects the attribute to be the root of the (sub)tree based on Information Gain (a sketch follows this slide):

    H(decision) - H(decision | attribute)

  - Favors attributes with many outcomes
  - Is not guaranteed to find the simplest decision tree
  - Is not incremental
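A minimal sketch of the selection step, with cases represented as Python dictionaries and reusing entropy() from above; this is not Quinlan's implementation:

    from collections import Counter

    def h_of(values):
        """Entropy of a list of symbolic values."""
        counts = Counter(values)
        total = sum(counts.values())
        return entropy([c / total for c in counts.values()])

    def information_gain(rows, attribute, decision):
        """H(decision) - H(decision | attribute) over a list of row dictionaries."""
        h_decision = h_of([r[decision] for r in rows])
        h_conditional = 0.0
        for value in {r[attribute] for r in rows}:
            subset = [r for r in rows if r[attribute] == value]
            h_conditional += len(subset) / len(rows) * h_of([r[decision] for r in subset])
        return h_decision - h_conditional

    def best_attribute(rows, attributes, decision):
        """The attribute ID3 would place at the root of the (sub)tree."""
        return max(attributes, key=lambda a: information_gain(rows, a, decision))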
Machine Learning
- C4.5
  - Selects attributes based on the Information gain ratio (a sketch follows this slide):

    (H(decision) - H(decision | attribute)) / H(attribute)

  - Uses pruning heuristics on decision trees
    - to simplify them
    - to reduce dependence on the training set
  - Tunes the resulting rule(s)
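A minimal sketch of the gain-ratio criterion, reusing h_of() and information_gain() from the ID3 sketch:

    def gain_ratio(rows, attribute, decision):
        """(H(decision) - H(decision | attribute)) / H(attribute)."""
        split_info = h_of([r[attribute] for r in rows])    # H(attribute)
        if split_info == 0:
            return 0.0                                     # attribute takes a single value
        return information_gain(rows, attribute, decision) / split_info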
Machine Learning
- C4.5 rule tuning
  - Derive initial rules by enumerating paths through the decision tree
  - Generalize the rules by possibly deleting unnecessary conditions
  - Group rules according to target classes and delete any that do not contribute to overall performance on the class
  - Order the sets of rules for the target classes and choose a default class
Machine Learning
- Rule tuning
  - Rule tuning may be useful for rules derived by a variety of other means besides C4.5
    - Evaluate the contribution of individual rules
    - Evaluate the performance of the rule set as a whole
Machine Learning
- A data set for classification (Quinlan)

       Attributes                       Decision
       Height   Hair    Eyes            Attractiveness
  1    short    blond   blue            +
  2    tall     blond   brown           -
  3    tall     red     blue            +
  4    short    dark    blue            -
  5    tall     dark    blue            -
  6    tall     blond   blue            +
  7    tall     dark    brown           -
  8    short    blond   brown           -
Machine Learning
- A data set for classification (Quinlan)
  - H(decision) = H(Attractiveness) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 4, 8    p(+|short) = 1/3,  p(-|short) = 2/3
    - tall: 2, 3, 5, 6, 7    p(+|tall) = 2/5,  p(-|tall) = 3/5
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{3}{8}\left(-\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3}\right) + \frac{5}{8}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.951
  - Information gain = 0.955 - 0.951 = 0.004
Machine Learning
- A data set for classification (Quinlan)
  - Hair:
    - blond: 1, 2, 6, 8    p(+|blond) = 2/4,  p(-|blond) = 2/4
    - red: 3    p(+|red) = 1/1,  p(-|red) = 0/1
    - dark: 4, 5, 7    p(+|dark) = 0/3,  p(-|dark) = 3/3
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{4}{8}[1] + \frac{1}{8}[0] + \frac{3}{8}[0] = 0.500
  - Information gain = 0.955 - 0.500 = 0.455
Machine Learning
- A data set for classification (Quinlan)
  - Eyes:
    - blue: 1, 3, 4, 5, 6    p(+|blue) = 3/5,  p(-|blue) = 2/5
    - brown: 2, 7, 8    p(+|brown) = 0/3,  p(-|brown) = 3/3
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{5}{8}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{3}{8}[0] = 0.607
  - Information gain = 0.955 - 0.607 = 0.348
Machine Learning
- A data set for classification (Quinlan)
  - Hence Hair is chosen as the best choice for the root of the tree
  - Now we recursively repeat this process for the (three) resulting branches
  - In this case, the branches for Hair: red and Hair: dark are already completely classified, and we need to work only on the sub-table for Hair: blond
Machine Learning
- A data set for classification (Quinlan)
  - Sub-table for Hair: blond

       Attributes          Decision
       Height   Eyes       Attractiveness
  1    short    blue       +
  2    tall     brown      -
  6    tall     blue       +
  8    short    brown      -

  - H(decision) = H(Attractiveness) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 8    p(+|short) = 1/2,  p(-|short) = 1/2
    - tall: 2, 6    p(+|tall) = 1/2,  p(-|tall) = 1/2
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{2}{4}[1] + \frac{2}{4}[1] = 1
  - Information gain = 1 - 1 = 0
Machine Learning
- A data set for classification (Quinlan)
  - Eyes:
    - blue: 1, 6    p(+|blue) = 2/2,  p(-|blue) = 0/2
    - brown: 2, 8    p(+|brown) = 0/2,  p(-|brown) = 2/2
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{2}{4}[0] + \frac{2}{4}[0] = 0
  - Information gain = 1 - 0 = 1
Machine Learning
- A data set for classification (Quinlan)
  - Hence Eyes is chosen as the best root of this subtree
  - The final tree is

    Hair
    |-- blond --> Eyes
    |               |-- blue  --> +  (cases 1, 6)
    |               |-- brown --> -  (cases 2, 8)
    |-- red   --> +  (case 3)
    |-- dark  --> -  (cases 4, 5, 7)
Machine Learning
- A data set for classification (Quinlan)
  - We may now build rules from this decision tree
    - R1: (Hair, dark) --> (Attractiveness, -)
    - R2: (Hair, red) --> (Attractiveness, +)
    - R3: (Hair, blond) & (Eyes, blue) --> (Attractiveness, +)
    - R4: (Hair, blond) & (Eyes, brown) --> (Attractiveness, -)
  - Note that Height is irrelevant
Machine Learning
- A data set for classification (Quinlan)
  - Dropping conditions from rules
    - Rules 1 and 2 have only one condition
    - Rule 3: neither condition can be dropped (case 5 needs the first condition and case 2 needs the second condition)
    - Rule 4: we can drop the first condition
      - R4': (Eyes, brown) --> (Attractiveness, -)
Machine Learning
- A data set for classification (Quinlan)
  - Dropping conditions from rules (a sketch follows this slide)
    - Linear
      - Scan the rule left to right
      - Try to drop conditions one at a time
      - If possible, drop for good
      - Iterate (n conditions, n attempts)
    - Exponential
      - Scan the rule left to right
      - Try to drop conditions one at a time
      - Then try to drop pairs, triples, etc. (n conditions, 2^n - 2 attempts)
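A minimal sketch of the linear strategy, applied to Rule 4 from the ID3 tree. The scoring function (counting misclassified covered training cases) is an illustrative assumption rather than C4.5's pessimistic error estimate, and rows is the data set encoded in the earlier sketch.

    def errors(conditions, conclusion, rows):
        """Covered training cases that the rule gets wrong."""
        covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
        attr, value = conclusion
        return sum(1 for r in covered if r[attr] != value)

    def drop_conditions(conditions, conclusion, rows):
        """Linear scan: drop a condition for good if it does not hurt on the training cases."""
        kept = list(conditions)
        for cond in list(conditions):            # n conditions, n attempts
            trial = [c for c in kept if c != cond]
            if trial and errors(trial, conclusion, rows) <= errors(kept, conclusion, rows):
                kept = trial                     # drop for good
        return kept

    r4 = [("Hair", "blond"), ("Eyes", "brown")]  # R4 from the ID3 tree
    print(drop_conditions(r4, ("Attr", "-"), rows))   # [('Eyes', 'brown')], i.e. R4'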
Machine Learning
- A data set for classification (Quinlan)
  - Now consider the Information gain ratio
  - For the initial root of the tree we already know
  - H(decision) = H(Attractiveness) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{3}{8}\left(-\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3}\right) + \frac{5}{8}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.951
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{4}{8}[1] + \frac{1}{8}[0] + \frac{3}{8}[0] = 0.500
  - H(decision|attribute) = H(Attractiveness|Eyes)
    = \frac{5}{8}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{3}{8}[0] = 0.607
Machine Learning
- A data set for classification (Quinlan)
  - H(attribute) = H(Height)
    = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.955
  - H(attribute) = H(Hair)
    = -\frac{4}{8}\log_2\frac{4}{8} - \frac{1}{8}\log_2\frac{1}{8} - \frac{3}{8}\log_2\frac{3}{8} = 1.406
  - H(attribute) = H(Eyes)
    = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.955
Machine Learning
- A data set for classification (Quinlan)
  - Hence the Information gain ratios are
    - Height: 0.004
    - Hair: 0.324
    - Eyes: 0.364
  - By this criterion, Eyes is chosen as the best root available
  - The branch for Eyes: brown is already completely classified, and we need to work only on the sub-table for Eyes: blue
Machine Learning
- A data set for classification (Quinlan)
  - Sub-table for Eyes: blue

       Attributes          Decision
       Height   Hair       Attractiveness
  1    short    blond      +
  3    tall     red        +
  4    short    dark       -
  5    tall     dark       -
  6    tall     blond      +

  - H(decision) = H(Attractiveness) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.971
Machine Learning
- A data set for classification (Quinlan)
  - Height:
    - short: 1, 4    p(+|short) = 1/2,  p(-|short) = 1/2
    - tall: 3, 5, 6    p(+|tall) = 2/3,  p(-|tall) = 1/3
  - H(decision|attribute) = H(Attractiveness|Height)
    = \frac{2}{5}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) + \frac{3}{5}\left(-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right) = 0.951
  - H(Height) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971
Machine Learning
- A data set for classification (Quinlan)
  - Hair:
    - blond: 1, 6    p(+|blond) = 2/2,  p(-|blond) = 0/2
    - red: 3    p(+|red) = 1/1,  p(-|red) = 0/1
    - dark: 4, 5    p(+|dark) = 0/2,  p(-|dark) = 2/2
  - H(decision|attribute) = H(Attractiveness|Hair)
    = \frac{2}{5}[0] + \frac{1}{5}[0] + \frac{2}{5}[0] = 0
  - H(Hair) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.522
Machine Learning
- A data set for classification (Quinlan)
  - Hence the Information gain ratios are
    - Height: 0.021
    - Hair: 0.638
  - By this criterion, Hair is chosen as the best root available
  - The final tree is

    Eyes
    |-- brown --> -  (cases 2, 7, 8)
    |-- blue  --> Hair
                    |-- blond --> +  (cases 1, 6)
                    |-- red   --> +  (case 3)
                    |-- dark  --> -  (cases 4, 5)
Machine Learning
- A data set for classification (Quinlan)
  - We may now build rules from this decision tree
    - R1: (Eyes, brown) --> (Attractiveness, -)
    - R2: (Eyes, blue) & (Hair, blond) --> (Attractiveness, +)
    - R3: (Eyes, blue) & (Hair, red) --> (Attractiveness, +)
    - R4: (Eyes, blue) & (Hair, dark) --> (Attractiveness, -)
  - These are different rules from the ID3 set
  - Note that after dropping conditions, however, they are the same - this is NOT generally true
Machine Learning
- ID3 & C4.5
  - What if there are too many cases?
    - Windowing
  - What if the data is incomplete?
  - What if the data is inconsistent?
  - What if the data is continuous? (a sketch follows this slide)
    - Binarization
    - Discretization
  - Incremental algorithms?
  - Pruning?