SLIDE 1 For Tuesday
- Read chapter 18, sections 1-4
- Homework:
– Chapter 12, exercise 7
SLIDE 3 And Besides Logic?
SLIDE 4 Semantic Networks
- Use graphs to represent concepts and the
relations between them.
- Simplest networks are ISA hierarchies
- Must be careful to make a type/token
distinction:
Garfield isa Cat    Cat(Garfield)
Cat isa Feline      ∀x (Cat(x) → Feline(x))
- Restricted shorthand for a logical
representation.
SLIDE 5 Semantic Nets/Frames
- Labeled links can represent arbitrary
relations between objects and/or concepts.
- Nodes with links can also be viewed as
frames with slots that point to other objects and/or concepts.
SLIDE 6
First Order Representation
Rel(Alive,Animals,T)  Rel(Flies,Animals,F)
Birds ⊂ Animals  Mammals ⊂ Animals
Rel(Flies,Birds,T)  Rel(Legs,Birds,2)  Rel(Legs,Mammals,4)
Penguins ⊂ Birds  Cats ⊂ Mammals  Bats ⊂ Mammals
Rel(Flies,Penguins,F)  Rel(Legs,Bats,2)  Rel(Flies,Bats,T)
Opus ∈ Penguins  Bill ∈ Cats  Pat ∈ Bats
Name(Opus,"Opus")  Name(Bill,"Bill")  Name(Pat,"Pat")
Friend(Opus,Bill)  Friend(Bill,Opus)
SLIDE 7 Inheritance
- Inheritance is a specific type of inference that allows
properties of objects to be inferred from properties of categories to which the object belongs.
– Is Bill alive?
– Yes, since Bill is a cat, cats are mammals, mammals are animals, and animals are alive.
- Such inference can be performed by a simple graph
traversal algorithm and implemented very efficiently.
- However, it is basically a form of logical inference:
∀x (Cat(x) → Mammal(x))
∀x (Mammal(x) → Animal(x))
∀x (Animal(x) → Alive(x))
Cat(Bill)
|- Alive(Bill)
SLIDE 8 Backward or Forward
- Can work either way
- Either can be inefficient
- Usually depends on branching factors
SLIDE 9 Semantics of Links
- Must be careful to distinguish different
types of links.
- Links between tokens and tokens are
different from links between types and types, and from links between tokens and types.
SLIDE 10
Link Types
Link Type     Semantics                         Example
A subset B    A ⊂ B                             Cats ⊂ Mammals
A member B    A ∈ B                             Bill ∈ Cats
A R B         R(A,B)                            Bill Age 12
A R B         ∀x, x ∈ A → R(x,B)                Birds Legs 2
A R B         ∀x ∃y, x ∈ A → y ∈ B ∧ R(x,y)     Birds Parent Birds
SLIDE 11 Inheritance with Exceptions
- Information specified for a type gives the
default value for a relation, but this may be
overridden by a more specific type.
– Tweety is a bird. Does Tweety fly? Birds fly. Yes.
– Opus is a penguin. Does Opus fly? Penguins don't fly. No.
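Inheritance with defaults can be sketched as an upward walk through the hierarchy that stops at the first category specifying a value, so a more specific type overrides a more general one. The following is a minimal sketch, not a definitive implementation: the dictionary names `PARENT` and `DEFAULTS` and the function `lookup` are illustrative assumptions, the network is the one from the First Order Representation slide, and Tweety is added under Birds for this example.

```python
# Hierarchy from the First Order Representation slide, plus Tweety (assumed
# here to be a direct member of Birds). Each node has one parent (a tree).
PARENT = {
    "Birds": "Animals", "Mammals": "Animals",
    "Penguins": "Birds", "Cats": "Mammals", "Bats": "Mammals",
    "Opus": "Penguins", "Bill": "Cats", "Pat": "Bats", "Tweety": "Birds",
}

# Default property values attached to categories.
DEFAULTS = {
    "Animals": {"Alive": True, "Flies": False},
    "Birds": {"Flies": True, "Legs": 2},
    "Mammals": {"Legs": 4},
    "Penguins": {"Flies": False},
    "Bats": {"Flies": True, "Legs": 2},
}

def lookup(node, prop):
    """Walk up the hierarchy; the first (most specific) value found wins."""
    while node is not None:
        if prop in DEFAULTS.get(node, {}):
            return DEFAULTS[node][prop]
        node = PARENT.get(node)
    return None  # no category on the path specifies this property

for name, prop in [("Bill", "Alive"), ("Tweety", "Flies"),
                   ("Opus", "Flies"), ("Pat", "Legs")]:
    print(name, prop, lookup(name, prop))
```

Because the hierarchy is a tree, each lookup follows a single path and costs time proportional to the depth of the node.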
SLIDE 12 Multiple Inheritance
- If hierarchy is not a tree but a directed
acyclic graph (DAG) then different inheritance paths may result in different defaults being inherited.
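A small sketch of how conflicting defaults can arise in a DAG. The example network (the well-known "Nixon diamond") and all names in the code are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical DAG: Nixon belongs to two categories with opposite defaults.
PARENTS = {
    "Nixon": ["Quakers", "Republicans"],
    "Quakers": ["People"],
    "Republicans": ["People"],
    "People": [],
}

DEFAULTS = {
    "Quakers": {"Pacifist": True},
    "Republicans": {"Pacifist": False},
}

def inherited_values(node, prop):
    """Return the set of defaults reachable along any upward path.

    Each path stops at the first category that specifies the property.
    """
    if prop in DEFAULTS.get(node, {}):
        return {DEFAULTS[node][prop]}
    values = set()
    for parent in PARENTS.get(node, []):
        values |= inherited_values(parent, prop)
    return values

values = inherited_values("Nixon", "Pacifist")
print("conflict" if len(values) > 1 else "unambiguous")
```

A set with more than one element signals that the two inheritance paths disagree, so some additional policy is needed to choose between them.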
SLIDE 13 Nonmonotonicity
- In normal monotonic logic, adding more
sentences to a KB only entails more conclusions.
if KB |- P then KB ∪ {S} |- P
- Inheritance with exceptions is not
monotonic (it is nonmonotonic)
– Bird(Opus)
– Fly(Opus)? yes
– Penguin(Opus)
– Fly(Opus)? no
SLIDE 14
- Nonmonotonic logics attempt to formalize
default reasoning by allowing default rules of the form:
– If P and concluding Q is consistent, then conclude Q.
– If Bird(x) then, if consistent, Fly(x)
SLIDE 15 Defaults with Negation as Failure
- Prolog negation as failure can be used to
implement default inference.
fly(X) :- bird(X), not(ab(X)).
ab(X) :- penguin(X).
ab(X) :- ostrich(X).
bird(opus).
? fly(opus).
Yes
penguin(opus).
? fly(opus).
No
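The same default rule can be imitated outside Prolog by treating "not provable from the current facts" as false. This is a rough sketch under that assumption; the `facts` set and the helper names are hypothetical and only mirror the clauses above.

```python
# Known facts as (predicate, individual) pairs; anything absent is
# treated as false (negation as failure).
facts = {("bird", "opus")}

def holds(pred, x):
    return (pred, x) in facts

def ab(x):
    """x is abnormal if it is known to be a penguin or an ostrich."""
    return holds("penguin", x) or holds("ostrich", x)

def fly(x):
    """Birds fly unless they can be shown to be abnormal."""
    return holds("bird", x) and not ab(x)

before = fly("opus")              # only bird(opus) is known
facts.add(("penguin", "opus"))    # learn one more fact
after = fly("opus")               # the earlier conclusion is withdrawn
print(before, after)
```

Adding a fact removed a conclusion, which is exactly the nonmonotonic behavior described on the Nonmonotonicity slide.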
SLIDE 16 Machine Learning
SLIDE 17 Machine Learning
- Definition by Herb Simon: “Any process by
which a system improves performance.”
SLIDE 18 Tasks
– medical diagnosis, credit-card applications or transactions, investments, DNA sequences, spoken words, handwritten letters, astronomical images
- Problem solving, planning, and acting
– solving calculus problems, playing checkers, chess, or backgammon, balancing a pole, driving a car
SLIDE 19 Performance
- How can we measure performance?
- That is, what kinds of things do we want to
get out of the learning process, and how do we tell whether we’re getting them?
SLIDE 20 Performance Measures
- Classification accuracy
- Solution correctness and quality
- Speed of performance
SLIDE 21 Why Study Learning?
- (Other than your professor’s interest in it)
SLIDE 22 Study Learning Because ...
- We want computer systems with new capabilities
– Develop systems that are too difficult or impossible to construct manually because they require specific detailed knowledge or skills tuned to a particular complex task (knowledge acquisition bottleneck).
– Develop systems that can automatically adapt and customize themselves to the needs of individual users through experience, e.g. a personalized news or mail filter, personalized tutoring.
– Discover knowledge and patterns in databases (data mining), e.g. discovering purchasing patterns for marketing purposes.
SLIDE 23 Study Learning Because ...
- Understand human and biological learning
and teaching better.
– Power law of practice.
– Relative difficulty of learning disjunctive concepts.
– Initial algorithms and theory in place.
– Growing amounts of on-line data.
– Computational power available.
SLIDE 24 Designing a Learning System
- Choose the training experience.
- Choose what exactly is to be learned, i.e.
the target function.
- Choose how to represent the target function.
- Choose a learning algorithm to learn the
target function from the experience.
- Must distinguish between the learner and
the performance element.
SLIDE 25 Architecture of a Learner
[Diagram: a loop of four components. Experiment Generator -> (new problem) -> Performance System -> (trace of behavior) -> Critic -> (training instances) -> Generalizer -> (learned function) -> Experiment Generator]
SLIDE 26 Training Experience Issues
- Direct or Indirect Experience
– Direct: Chess boards labeled with the correct move, extracted from records of expert play.
– Indirect: Potentially arbitrary sequences of moves and final game results.
– How do we assign blame to individual choices
or moves when given only indirect feedback?
SLIDE 27 More on Training Experience
– “Random” examples outside of learner’s control (negative examples available?)
– Selected examples chosen by a benevolent teacher (near misses available?)
– Ability to query oracle about correct classifications.
– Ability to design and run experiments to collect one's own data.
- Distribution of training data:
– Generally assume training data is representative of the examples to be judged on when tested for final performance.
SLIDE 28 Supervision of Learning
- Supervised
- Unsupervised
- Reinforcement
SLIDE 29 Concept Learning
- The most studied task in machine learning
is inferring a function that classifies examples represented in some language as members or non-members of a concept from pre-classified training examples.
- This is called concept learning, or
classification.
SLIDE 30
Simple Example
Example  Size   Color  Shape     Class
1        small  red    circle    positive
2        big    red    circle    positive
3        small  red    triangle  negative
4        big    blue   circle    negative
SLIDE 31 Concept Learning Definitions
- An instance is a description of a specific item. X is
the space of all instances (instance space).
- The target concept, c(x), is a binary function over
instances.
- A training example is an instance labeled with its
correct value for c(x) (positive or negative). D is the set of all training examples.
- The hypothesis space, H, is the set of functions,
h(x), that the learner can consider as possible definitions of c(x).
- The goal of concept learning is to find an h in H
such that for all <x, c(x)> in D, h(x)= c(x).
SLIDE 32 Sample Hypothesis Space
- Consider a hypothesis language defined by a
conjunction of constraints.
- For instances described by n features consider a
vector of n constraints, <c1, c2, ..., cn>, where each ci is either:
– ?, indicating that any value is possible for the ith feature
– A specific value from the domain of the ith feature
– ∅, indicating no value is acceptable
- Sample hypotheses in this language:
– <big, red, ?>
– <?, ?, ?> (most general hypothesis)
– <∅, ∅, ∅> (most specific hypothesis)
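To make the constraint language concrete, here is one possible encoding of such hypotheses as tuples, with a function that tests whether a hypothesis covers an instance. The names `ANY`, `EMPTY`, and `covers` are assumptions introduced for this sketch.

```python
ANY = "?"       # any value is acceptable for this feature
EMPTY = None    # stands in for the "no value is acceptable" constraint

def covers(h, x):
    """Return True if every constraint in hypothesis h accepts instance x."""
    return all(c == ANY or (c is not EMPTY and c == v) for c, v in zip(h, x))

h = ("big", "red", ANY)
print(covers(h, ("big", "red", "circle")))
print(covers(h, ("small", "red", "circle")))
print(covers((ANY, ANY, ANY), ("small", "blue", "triangle")))
print(covers((EMPTY, EMPTY, EMPTY), ("big", "red", "circle")))
```

The most general hypothesis covers every instance and the most specific one covers none, which is what the two extreme sample hypotheses express.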
SLIDE 33 Inductive Learning Hypothesis
- Any hypothesis that is found to approximate
the target function well over a sufficiently large set of training examples will also approximate the target function well over
other unobserved examples.
– Assumes that the training and test examples are drawn from the same general distribution.
– This is fundamentally an unprovable hypothesis unless additional assumptions are made about the target concept.
SLIDE 34 Concept Learning As Search
- Concept learning can be viewed as searching the
space of hypotheses for one (or more) consistent with the training instances.
- Consider an instance space consisting of n binary
features, which therefore has 2^n instances.
- For conjunctive hypotheses, there are 4 choices for
each feature: T, F, ∅, ?, so there are 4^n syntactically distinct hypotheses, but any hypothesis with a ∅ is the empty hypothesis, so there are 3^n + 1 semantically distinct hypotheses.
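These counts can be checked by brute force for small n. The sketch below enumerates every syntactic hypothesis, maps it to the set of instances it covers, and counts the distinct sets; the function name `count_hypotheses` and the use of `None` for the "no value" constraint are assumptions of this example.

```python
from itertools import product

def count_hypotheses(n):
    """Return (#instances, #syntactic hypotheses, #semantic hypotheses)."""
    instances = list(product([True, False], repeat=n))
    choices = [True, False, "?", None]   # None = no value is acceptable

    def covers(h, x):
        return all(c == "?" or (c is not None and c == v)
                   for c, v in zip(h, x))

    hypotheses = list(product(choices, repeat=n))
    # Two hypotheses are semantically equal if they cover the same instances.
    extensions = {frozenset(x for x in instances if covers(h, x))
                  for h in hypotheses}
    return len(instances), len(hypotheses), len(extensions)

print(count_hypotheses(3))   # compare with 2^3, 4^3, and 3^3 + 1
```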
SLIDE 35 Search cont.
- The target concept could in principle be any of the
2^(2^n) (2 to the 2 to the n) possible binary functions
on n binary inputs.
- Frequently, the hypothesis space is very large or
even infinite and intractable to search exhaustively.
SLIDE 36 Learning by Enumeration
- For any finite or countably infinite hypothesis space, one
can simply enumerate and test hypotheses one by one until
one is found that is consistent with the training data.
For each h in H do
    initialize consistent to true
    For each <x, c(x)> in D do
        if h(x) ≠ c(x) then set consistent to false
    If consistent then return h
- This algorithm is guaranteed to terminate with a consistent
hypothesis if there is one; however it is obviously intractable for most practical hypothesis spaces, which are at least exponentially large.
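A minimal runnable sketch of enumerate-and-test over the conjunctive hypothesis language, using the four training examples from the Simple Example slide. The feature domains are assumed to contain only the values that appear in that table, the all-empty hypothesis is left out because it cannot be consistent once a positive example exists, and the names `DOMAINS`, `D`, `h_of`, and `enumerate_and_test` are illustrative.

```python
from itertools import product

# Assumed feature domains: only the values seen in the example table.
DOMAINS = [("small", "big"), ("red", "blue"), ("circle", "triangle")]

# Training set D as (instance, c(x)) pairs.
D = [
    (("small", "red", "circle"), True),
    (("big", "red", "circle"), True),
    (("small", "red", "triangle"), False),
    (("big", "blue", "circle"), False),
]

def h_of(h, x):
    """Evaluate conjunctive hypothesis h on instance x."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def enumerate_and_test(domains, data):
    """Return the first hypothesis that agrees with every training example."""
    for h in product(*[values + ("?",) for values in domains]):
        if all(h_of(h, x) == label for x, label in data):
            return h
    return None  # no consistent conjunctive hypothesis exists

print(enumerate_and_test(DOMAINS, D))
```

Even in this tiny space the loop may test all 27 candidates, and the candidate count grows exponentially with the number of features.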
SLIDE 37 Finding a Maximally Specific Hypothesis (FIND-S)
- Can use the generality ordering to find a
most specific hypothesis consistent with a set of positive training examples by starting with the most specific hypothesis in H and generalizing it just enough each time it fails to cover a positive example.
SLIDE 38
Initialize h = <∅, ∅, …, ∅>
For each positive training instance x
    For each attribute ai
        If the constraint on ai in h is satisfied by x
            Then do nothing
        Else If the constraint on ai in h is ∅
            Then set ai in h to its value in x
        Else set ai in h to "?"
Initialize consistent := true
For each negative training instance x
    if h(x) = 1 then set consistent := false
If consistent then return h
SLIDE 39
Example Trace
h = <∅, ∅, ∅>
Encounter <small, red, circle> as positive
h = <small, red, circle>
Encounter <big, red, circle> as positive
h = <?, red, circle>
Check to ensure consistency with any negative examples:
    Negative: <small, red, triangle>
    Negative: <big, blue, circle>
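The FIND-S procedure and the trace above can be reproduced with a short sketch. The function name `find_s`, the use of `None` for the empty constraint, and the choice to return the hypothesis together with a consistency flag are assumptions of this example rather than part of the slides.

```python
EMPTY = None  # stands in for the "no value is acceptable" constraint

def find_s(positives, negatives, n):
    """Return (most specific hypothesis covering the positives, consistent?)."""
    h = [EMPTY] * n
    for x in positives:
        for i, value in enumerate(x):
            if h[i] == "?" or h[i] == value:
                continue                 # constraint already satisfied
            # Generalize just enough: empty -> value, value -> "?".
            h[i] = value if h[i] is EMPTY else "?"

    def covers(x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    consistent = not any(covers(x) for x in negatives)
    return tuple(h), consistent

positives = [("small", "red", "circle"), ("big", "red", "circle")]
negatives = [("small", "red", "triangle"), ("big", "blue", "circle")]
print(find_s(positives, negatives, 3))
```

On this data the result is the hypothesis `('?', 'red', 'circle')`, which covers neither negative example, matching the trace.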
SLIDE 40 Comments on FIND-S
- For conjunctive feature vectors, the most
specific hypothesis that covers a set of positives is unique and found by FIND-S.
- If the most specific hypothesis consistent
with the positives is inconsistent with a negative training example, then there is no conjunctive hypothesis consistent with the data since by definition it cannot be made any more specific and still cover all of the positives.
SLIDE 41
Example
Positives: <big, red, circle>, <small, blue, circle>
Negatives: <small, red, circle>
FIND-S -> <?, ?, circle>, which matches the negative example
SLIDE 42 Inductive Bias
- A hypothesis space that does not include
every possible binary function on the instance space incorporates a bias in the type of concepts it can learn.
- Any means that a concept learning system
uses to choose between two functions that are both consistent with the training data is called inductive bias.
SLIDE 43 Forms of Inductive Bias
– The language for representing concepts defines a hypothesis space that does not include all possible functions (e.g. conjunctive descriptions).
– The language is expressive enough to represent all possible functions (e.g. disjunctive normal form) but the search algorithm embodies a preference for certain consistent functions over others (e.g. syntactic simplicity).
SLIDE 44 Unbiased Learning
- For instances described by n attributes each
with m values, there are m^n instances and therefore 2^(m^n) possible binary functions.
- For m=2, n=10, there are 2^1024, or about 1.8 x 10^308,
functions, of which only 59,049 (3^10) can be represented by non-empty conjunctions (a small percentage indeed!).
- However, unbiased learning is futile: if
we consider all possible functions, then simply memorizing the data without any effective generalization is an option.
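The counts for m=2, n=10 can be checked with exact integer arithmetic. This is only a quick sanity check; the variable names are arbitrary.

```python
m, n = 2, 10

instances = m ** n             # number of distinct instances
functions = 2 ** instances     # number of binary functions over them
conjunctions = (m + 1) ** n    # each feature: one of m values or "?"

print(instances)               # 1024
print(len(str(functions)))     # number of decimal digits in 2^1024
print(conjunctions)            # 59049
```

The number of functions has 309 decimal digits, while the number of conjunctions has only five, which is the gap the slide points to.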
SLIDE 45 Lessons
- Function approximation can be viewed as a
search through a pre-defined space of hypotheses (a representation language) for a hypothesis which best fits the training data.
- Different learning methods assume different
hypothesis spaces or employ different search techniques.
SLIDE 46 Varying Learning Methods
- Can vary the representation:
– Numerical function
– Rules or logical functions
– Nearest neighbor (case based)
- Can vary the search algorithm:
– Gradient descent
– Divide and conquer
– Genetic algorithm
SLIDE 47 Evaluation of Learning Methods
- Experimental: Conduct well controlled
experiments that compare various methods on benchmark problems, gather data on their performance (e.g. accuracy, run-time), and analyze the results for significant differences.
- Theoretical: Analyze algorithms mathematically
and prove theorems about their computational complexity, ability to produce hypotheses that fit the training data, or number of examples needed to produce a hypothesis that accurately generalizes to unseen data (sample complexity).
SLIDE 48 Empirical Evaluation
- Training and Testing
- Leave-One-Out
- Cross-validation
- Learning Curves
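As an illustration of cross-validation and leave-one-out, the sketch below only generates the train/test index splits; fitting and scoring a learner on each split is left out. The function name `k_fold_splits` and the round-robin assignment of examples to folds are assumptions made for this example.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]   # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 5-fold cross-validation over 10 examples.
for train, test in k_fold_splits(10, 5):
    print(sorted(train), test)

# Leave-one-out is the special case k = number of examples.
loo = list(k_fold_splits(4, 4))
print(len(loo))
```

Every example is used for testing exactly once and for training in all other folds, so the averaged test accuracy uses all of the data without testing on examples seen during training.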