
SLIDE 1

For Tuesday

Read chapter 18, sections 1-4
Homework:

– Chapter 12, exercise 7

SLIDE 2

Program 3

Any questions?

SLIDE 3

And Besides Logic?

Semantic networks
Frames

SLIDE 4

Semantic Networks

Use graphs to represent concepts and the relations between them.

Simplest networks are ISA hierarchies.
Must be careful to make a type/token distinction:

– Garfield isa Cat corresponds to $Cat(Garfield)$
– Cat isa Feline corresponds to $\forall x\,(Cat(x) \Rightarrow Feline(x))$

Restricted shorthand for a logical representation.

SLIDE 5

Semantic Nets/Frames

Labeled links can represent arbitrary relations between objects and/or concepts.

Nodes with links can also be viewed as frames with slots that point to other objects and/or concepts.

SLIDE 6

First Order Representation

Rel(Alive, Animals, T)
Rel(Flies, Animals, F)
$Birds \subset Animals$
$Mammals \subset Animals$
Rel(Flies, Birds, T)
Rel(Legs, Birds, 2)
Rel(Legs, Mammals, 4)
$Penguins \subset Birds$
$Cats \subset Mammals$
$Bats \subset Mammals$
Rel(Flies, Penguins, F)
Rel(Legs, Bats, 2)
Rel(Flies, Bats, T)
$Opus \in Penguins$
$Bill \in Cats$
$Pat \in Bats$
Name(Opus, "Opus")
Name(Bill, "Bill")
Friend(Opus, Bill)
Friend(Bill, Opus)
Name(Pat, "Pat")

SLIDE 7

Inheritance

Inheritance is a specific type of inference that allows properties of objects to be inferred from properties of categories to which the object belongs.

– Is Bill alive? – Yes, since Bill is a cat, cats are mammals, mammals are animals, and animals are alive.

Such inference can be performed by a simple graph traversal algorithm and implemented very efficiently.

However, it is basically a form of logical inference:

$\forall x\,(Cat(x) \Rightarrow Mammal(x))$
$\forall x\,(Mammal(x) \Rightarrow Animal(x))$
$\forall x\,(Animal(x) \Rightarrow Alive(x))$
$Cat(Bill)$
$\vdash Alive(Bill)$

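The traversal view can be sketched in Python. This is a minimal sketch, assuming a tree-shaped hierarchy where each node has exactly one parent; the dictionaries `ISA` and `PROPS` and the function `inherits` are hypothetical and encode only the part of the network needed to answer "Is Bill alive?".

```python
# Hypothetical encoding of part of the network: each node maps to the
# category it belongs to (a member or subset link).
ISA = {
    "Bill": "Cats",
    "Cats": "Mammals",
    "Mammals": "Animals",
    "Opus": "Penguins",
    "Penguins": "Birds",
    "Birds": "Animals",
}
# Properties attached directly to a category.
PROPS = {"Animals": {"Alive": True}}


def inherits(node, prop):
    """Walk up the ISA chain from node; return the first value found for prop."""
    while node is not None:
        if prop in PROPS.get(node, {}):
            return PROPS[node][prop]
        node = ISA.get(node)  # becomes None once we pass the root
    return None  # prop is not stated anywhere on the path


print(inherits("Bill", "Alive"))  # True
```

Each ISA link the loop follows plays the role of one ∀x implication in the logical version, so the traversal is a specialized, cheaper way of chaining those rules.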
SLIDE 8

Backward or Forward

Can work either way: backward from the query or forward from the known facts
Either direction can be inefficient
Which is better usually depends on the branching factors

SLIDE 9

Semantics of Links

Must be careful to distinguish different types of links.

Links between tokens and tokens are different from links between types and types, and from links between tokens and types.

SLIDE 10

Link Types

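| Link type | Semantics | Example |
|---|---|---|
| A Subset B | $A \subset B$ | $Cats \subset Mammals$ |
| A Member B | $A \in B$ | $Bill \in Cats$ |
| A R B (between tokens) | $R(A, B)$ | Bill Age 12 |
| A R B (from every member of A to B) | $\forall x,\ x \in A \Rightarrow R(x, B)$ | Birds Legs 2 |
| A R B (from every member of A to some member of B) | $\forall x\, \exists y,\ x \in A \Rightarrow y \in B \wedge R(x, y)$ | Birds Parent Birds |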

SLIDE 11

Inheritance with Exceptions

Information specified for a type gives the default value for a relation, but this may be overridden by a more specific type.

– Tweety is a bird. Does Tweety fly? Birds fly. Yes.
– Opus is a penguin. Does Opus fly? Penguins don't fly. No.
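One way to implement defaults with exceptions is to search upward from the object and stop at the first (most specific) category that states a value. The sketch below assumes a tree-shaped hierarchy and uses hypothetical names (`PARENT`, `DEFAULTS`, `lookup`); the facts mirror the First Order Representation slide, with Tweety added as a bird.

```python
# Hypothetical single-parent hierarchy (tokens and types).
PARENT = {
    "Tweety": "Birds", "Opus": "Penguins", "Bill": "Cats", "Pat": "Bats",
    "Penguins": "Birds", "Birds": "Animals",
    "Cats": "Mammals", "Bats": "Mammals", "Mammals": "Animals",
}
# Default values stated at each type.
DEFAULTS = {
    "Animals": {"Alive": True, "Flies": False},
    "Birds": {"Flies": True, "Legs": 2},
    "Mammals": {"Legs": 4},
    "Penguins": {"Flies": False},
    "Bats": {"Flies": True, "Legs": 2},
}


def lookup(node, prop):
    """Return the most specific default for prop, climbing from node upward."""
    while node is not None:
        value = DEFAULTS.get(node, {}).get(prop)
        if value is not None:
            # The first value met is the most specific one, so it overrides
            # anything stated higher in the hierarchy.
            return value
        node = PARENT.get(node)
    return None


print(lookup("Tweety", "Flies"))  # True (from Birds)
print(lookup("Opus", "Flies"))    # False (Penguins override Birds)
print(lookup("Pat", "Legs"))      # 2 (Bats override Mammals)
```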

SLIDE 12

Multiple Inheritance

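If the hierarchy is not a tree but a directed acyclic graph (DAG), then different inheritance paths may result in different defaults being inherited.

Nixon Diamond: Nixon is both a Quaker and a Republican; by default Quakers are pacifists and Republicans are not, so the two paths give conflicting answers to "Is Nixon a pacifist?"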
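With multiple parents, a lookup has to follow every path. This minimal sketch, with hypothetical names (`PARENTS`, `DEFAULTS`, `candidate_values`, `is_ambiguous`), collects the nearest default along each path and reports a conflict when the paths disagree. It deliberately ignores preferences between paths (for example, favoring a path that runs through a more specific class), which a fuller treatment would need.

```python
# Hypothetical DAG for the Nixon Diamond: Nixon has two parents.
PARENTS = {
    "Nixon": ["Quakers", "Republicans"],
    "Quakers": ["People"],
    "Republicans": ["People"],
    "People": [],
}
DEFAULTS = {
    "Quakers": {"Pacifist": True},
    "Republicans": {"Pacifist": False},
}


def candidate_values(node, prop):
    """Collect the nearest default for prop along every inheritance path."""
    local = DEFAULTS.get(node, {})
    if prop in local:
        return {local[prop]}  # overrides anything above on this path
    values = set()
    for parent in PARENTS.get(node, []):
        values |= candidate_values(parent, prop)
    return values


def is_ambiguous(node, prop):
    """True when different paths yield different defaults."""
    return len(candidate_values(node, prop)) > 1


print(candidate_values("Nixon", "Pacifist"))  # both True and False
print(is_ambiguous("Nixon", "Pacifist"))      # True
```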

SLIDE 13

Nonmonotonicity

In normal monotonic logic, adding more sentences to a KB can only add conclusions, never remove them:

$\text{if } KB \vdash P \text{ then } KB \cup \{S\} \vdash P$

Inheritance with exceptions is not monotonic (it is nonmonotonic):

– Bird(Opus). Fly(Opus)? Yes.
– Add Penguin(Opus). Fly(Opus)? No.
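The failure of monotonicity can be checked mechanically. In this sketch the KB is a set of ground facts plus a single hard-coded default rule (birds fly unless known to be penguins); the names `conclusions`, `kb`, and `bigger_kb` are hypothetical.

```python
def conclusions(kb):
    """Apply the default: conclude Fly(x) for each Bird(x) unless Penguin(x) is known."""
    derived = set(kb)
    for pred, x in kb:
        if pred == "Bird" and ("Penguin", x) not in kb:
            derived.add(("Fly", x))
    return derived


kb = {("Bird", "Opus")}
bigger_kb = kb | {("Penguin", "Opus")}  # KB ∪ {S}

print(("Fly", "Opus") in conclusions(kb))         # True
print(("Fly", "Opus") in conclusions(bigger_kb))  # False
# Monotonicity would require every old conclusion to survive the new sentence.
print(conclusions(kb) <= conclusions(bigger_kb))  # False
```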

SLIDE 14

Nonmonotonic logics attempt to formalize default reasoning by allowing default rules of the form:

– If P, and concluding Q is consistent, then conclude Q.
– If Bird(x), then, if consistent, Fly(x).

SLIDE 15

Defaults with Negation as Failure

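Prolog negation as failure can be used to implement default inference.

:- dynamic penguin/1, ostrich/1.   % declared so that queries fail instead of raising errors

fly(X) :- bird(X), \+ ab(X).       % a bird flies unless it is shown to be abnormal
ab(X) :- penguin(X).
ab(X) :- ostrich(X).
bird(opus).

?- fly(opus).                      % Yes
?- assertz(penguin(opus)).
?- fly(opus).                      % No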

SLIDE 16

Machine Learning

What do you think it is?

SLIDE 17

Machine Learning

Definition by Herb Simon: “Any process by which a system improves performance.”

SLIDE 18

Tasks

Classification:

– medical diagnosis, credit-card applications or transactions, investments, DNA sequences, spoken words, handwritten letters, astronomical images

Problem solving, planning, and acting

– solving calculus problems, playing checkers, chess, or backgammon, balancing a pole, driving a car

SLIDE 19

Performance

How can we measure performance?
That is, what kinds of things do we want to get out of the learning process, and how do we tell whether we’re getting them?

SLIDE 20

Performance Measures

Classification accuracy
Solution correctness and quality
Speed of performance

SLIDE 21

Why Study Learning?

(Other than your professor’s interest in it)

SLIDE 22

Study Learning Because ...

We want computer systems with new capabilities

– Develop systems that are too difficult or impossible to construct manually because they require specific detailed knowledge or skills tuned to a particular complex task (knowledge acquisition bottleneck).
– Develop systems that can automatically adapt and customize themselves to the needs of individual users through experience, e.g. a personalized news or mail filter, personalized tutoring.
– Discover knowledge and patterns in databases (data mining), e.g. discovering purchasing patterns for marketing purposes.

SLIDE 23

Study Learning Because ...

Understand human and biological learning and teaching better.

– Power law of practice.
– Relative difficulty of learning disjunctive concepts.

Time is right:

– Initial algorithms and theory in place.
– Growing amounts of on-line data.
– Computational power available.

SLIDE 24

Designing a Learning System

Choose the training experience.
Choose what exactly is to be learned, i.e. the target function.
Choose how to represent the target function.
Choose a learning algorithm to learn the target function from the experience.

Must distinguish between the learner and the performance element.

SLIDE 25

Architecture of a Learner

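[Figure: a learning cycle of four components. The Experiment Generator poses a new problem to the Performance System; its trace of behavior goes to the Critic, which produces training instances for the Generalizer; the Generalizer's learned function is fed back into the cycle.]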

SLIDE 26

Training Experience Issues

Direct or Indirect Experience

– Direct: Chess boards labeled with the correct move, extracted from records of expert play.
– Indirect: Potentially arbitrary sequences of moves and final game results.

Credit/Blame assignment:

– How do we assign blame to individual choices or moves when given only indirect feedback?

SLIDE 27

More on Training Experience

Source of training data:

– “Random” examples outside of the learner’s control (negative examples available?)
– Selected examples chosen by a benevolent teacher (near misses available?)
– Ability to query an oracle about correct classifications.
– Ability to design and run experiments to collect one's own data.

Distribution of training data:

– Generally assume training data is representative of the examples to be judged on when tested for final performance.

SLIDE 28

Supervision of Learning

Supervised
Unsupervised
Reinforcement

SLIDE 29

Concept Learning

The most studied task in machine learning is inferring a function that classifies examples represented in some language as members or non-members of a concept from pre-classified training examples.

This is called concept learning, or classification.

SLIDE 30

Simple Example

| Example | Size | Color | Shape | Class |
|---|---|---|---|---|
| 1 | small | red | circle | positive |
| 2 | big | red | circle | positive |
| 3 | small | red | triangle | negative |
| 4 | big | blue | circle | negative |

SLIDE 31

Concept Learning Definitions

An instance is a description of a specific item. X is the space of all instances (instance space).

The target concept, c(x), is a binary function over instances.

A training example is an instance labeled with its correct value for c(x) (positive or negative). D is the set of all training examples.

The hypothesis space, H, is the set of functions, h(x), that the learner can consider as possible definitions of c(x).

The goal of concept learning is to find an h in H such that for all <x, c(x)> in D, h(x) = c(x).
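These definitions map directly onto a few Python types. This is a sketch under stated assumptions: instances are (size, color, shape) tuples, a hypothesis is any Python function from an instance to a bool, and D holds the four examples from the Simple Example slide; `consistent` and `red_circle` are hypothetical names.

```python
from typing import Callable, List, Tuple

Instance = Tuple[str, str, str]          # an element of X: (size, color, shape)
Hypothesis = Callable[[Instance], bool]  # h: X -> {True, False}

# D: training examples as <x, c(x)> pairs (True = positive).
D: List[Tuple[Instance, bool]] = [
    (("small", "red", "circle"), True),
    (("big", "red", "circle"), True),
    (("small", "red", "triangle"), False),
    (("big", "blue", "circle"), False),
]


def consistent(h: Hypothesis, examples: List[Tuple[Instance, bool]]) -> bool:
    """True if h(x) == c(x) for every <x, c(x)> in examples."""
    return all(h(x) == label for x, label in examples)


def red_circle(x: Instance) -> bool:
    """A candidate h: red circles are positive."""
    return x[1] == "red" and x[2] == "circle"


print(consistent(red_circle, D))  # True
```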

SLIDE 32

Sample Hypothesis Space

Consider a hypothesis language defined by a conjunction of constraints.

For instances described by n features, consider a vector of n constraints, <c1, c2, …, cn>, where each ci is either:

– ?, indicating that any value is possible for the ith feature
– A specific value from the domain of the ith feature
– ∅, indicating no value is acceptable

Sample hypotheses in this language:

– <big, red, ?>
– <?, ?, ?> (most general hypothesis)
– <∅, ∅, ∅> (most specific hypothesis)
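Evaluating a conjunctive hypothesis amounts to checking each feature's constraint. In this sketch the strings "?" and Python `None` are assumed stand-ins for ? and ∅, and `satisfies` and `covers` are hypothetical helper names.

```python
ANY = "?"     # any value is acceptable for this feature
EMPTY = None  # stands in for ∅: no value is acceptable


def satisfies(constraint, value):
    """Check one feature constraint against one feature value."""
    if constraint == ANY:
        return True
    if constraint is EMPTY:
        return False
    return constraint == value


def covers(h, x):
    """A conjunctive hypothesis covers x only if every feature constraint is met."""
    return all(satisfies(c, v) for c, v in zip(h, x))


most_general = (ANY, ANY, ANY)
most_specific = (EMPTY, EMPTY, EMPTY)

print(covers(("big", "red", ANY), ("big", "red", "circle")))    # True
print(covers(("big", "red", ANY), ("small", "red", "circle")))  # False
```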

SLIDE 33

Inductive Learning Hypothesis

Any hypothesis that is found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

– Assumes that the training and test examples are drawn from the same general distribution.
– This is fundamentally an unprovable hypothesis unless additional assumptions are made about the target concept.

SLIDE 34

Concept Learning As Search

Concept learning can be viewed as searching the space of hypotheses for one (or more) consistent with the training instances.

Consider an instance space consisting of n binary features, which therefore has $2^n$ instances.

For conjunctive hypotheses, there are 4 choices for each feature: T, F, ∅, ?, so there are $4^n$ syntactically distinct hypotheses. However, any hypothesis containing a ∅ is the empty hypothesis (it covers no instances), so there are $3^n + 1$ semantically distinct hypotheses.
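These counts can be confirmed by brute force for small n. The sketch assumes binary features encoded as True/False, uses `None` for ∅, and treats two hypotheses as semantically identical when they cover exactly the same set of instances; `count_hypotheses` is a hypothetical helper.

```python
from itertools import product


def count_hypotheses(n):
    """Return (syntactically distinct, semantically distinct) conjunctive hypotheses."""
    constraints = [True, False, "?", None]  # None plays the role of ∅
    syntactic = list(product(constraints, repeat=n))
    instances = list(product([True, False], repeat=n))

    def covers(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    # A hypothesis's meaning is the set of instances it covers.
    extensions = {
        frozenset(x for x in instances if covers(h, x)) for h in syntactic
    }
    return len(syntactic), len(extensions)


for n in range(1, 5):
    print(n, count_hypotheses(n))  # matches (4**n, 3**n + 1)
```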

SLIDE 35

Search cont.

The target concept could in principle be any of the $2^{2^n}$ possible binary functions on n binary inputs, since each of the $2^n$ instances can independently be labeled positive or negative.
Frequently, the hypothesis space is very large or even infinite and intractable to search exhaustively.
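For n = 2 the gap between all binary functions and conjunctive hypotheses can be listed explicitly. This sketch represents every function by its truth table over the instances; the names `functions`, `conjunctive`, and `xor` are illustrative only.

```python
from itertools import product

n = 2
instances = list(product([0, 1], repeat=n))  # 2**n = 4 instances

# One binary function per assignment of 0/1 to every instance: 2**(2**n).
functions = list(product([0, 1], repeat=len(instances)))


def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


# Truth tables realizable by conjunctive hypotheses (None stands in for ∅).
conjunctive = {
    tuple(int(covers(h, x)) for x in instances)
    for h in product([0, 1, "?", None], repeat=n)
}

xor = tuple(a ^ b for a, b in instances)  # true when exactly one input is 1

print(len(functions))     # 16
print(len(conjunctive))   # 10, i.e. 3**2 + 1
print(xor in conjunctive) # False: XOR is not a conjunction
```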

SLIDE 36

Learning by Enumeration

For any finite or countably infinite hypothesis space, one can simply enumerate and test hypotheses one by one until one is found that is consistent with the training data.

For each h in H do
    Initialize consistent to true
    For each <x, c(x)> in D do
        If h(x) ≠ c(x) then set consistent to false
    If consistent then return h

This algorithm is guaranteed to terminate with a consistent hypothesis if there is one; however, it is obviously intractable for most practical hypothesis spaces, which are at least exponentially large.
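A runnable version of the enumeration loop, assuming H is the full set of conjunctive hypotheses over the Simple Example features (`DOMAINS`, `h_of`, and `learn_by_enumeration` are hypothetical names). The `break` is an early exit not present in the slide's pseudocode; it skips the remaining examples once h is known to be inconsistent and does not change the result.

```python
from itertools import product

# Hypothetical feature domains for the Simple Example instances.
DOMAINS = [["small", "big"], ["red", "blue"], ["circle", "triangle"]]

# H: every conjunctive hypothesis ("?" = any value, None = ∅); 4 * 4 * 4 = 64.
H = list(product(*[values + ["?", None] for values in DOMAINS]))

D = [
    (("small", "red", "circle"), True),
    (("big", "red", "circle"), True),
    (("small", "red", "triangle"), False),
    (("big", "blue", "circle"), False),
]


def h_of(h, x):
    """Evaluate conjunctive hypothesis h on instance x."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def learn_by_enumeration(H, D):
    for h in H:
        consistent = True
        for x, label in D:
            if h_of(h, x) != label:
                consistent = False
                break  # one mismatch is enough to reject h
        if consistent:
            return h
    return None  # no hypothesis in H is consistent with D


print(learn_by_enumeration(H, D))  # ('?', 'red', 'circle')
```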

SLIDE 37

Finding a Maximally Specific Hypothesis (FIND-S)

Can use the generality ordering to find a most specific hypothesis consistent with a set of positive training examples by starting with the most specific hypothesis in H and generalizing it just enough each time it fails to cover a positive example.

SLIDE 38

Initialize h = <∅, ∅, …, ∅>
For each positive training instance x
    For each attribute ai
        If the constraint on ai in h is satisfied by x
        Then do nothing
        Else if ai = ∅
            Then set ai in h to its value in x
            Else set ai in h to "?"
Initialize consistent := true
For each negative training instance x
    If h(x) = 1 then set consistent := false
If consistent then return h
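A minimal Python sketch of FIND-S for conjunctive hypotheses, assuming `None` stands in for ∅ and examples are (instance, label) pairs; `find_s` and `covers` are hypothetical names, and returning `None` plays the role of "no consistent hypothesis".

```python
EMPTY = None  # stands in for ∅


def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def find_s(examples, n_features):
    """FIND-S over conjunctive hypotheses; None if no conjunction is consistent."""
    h = [EMPTY] * n_features  # start with the most specific hypothesis
    for x, label in examples:
        if not label:
            continue  # negatives are only checked at the end
        for i, value in enumerate(x):
            if h[i] == "?" or h[i] == value:
                continue  # constraint already satisfied by x
            # Generalize just enough: ∅ becomes the value, a different value becomes "?".
            h[i] = value if h[i] is EMPTY else "?"
    h = tuple(h)
    if any(covers(h, x) for x, label in examples if not label):
        return None  # h covers a negative, so no conjunction is consistent
    return h


simple = [
    (("small", "red", "circle"), True),
    (("big", "red", "circle"), True),
    (("small", "red", "triangle"), False),
    (("big", "blue", "circle"), False),
]
print(find_s(simple, 3))  # ('?', 'red', 'circle')
```

On the failing example from the slides (positives <big, red, circle> and <small, blue, circle>, negative <small, red, circle>) the same function generalizes to <?, ?, circle> and then returns `None`.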

SLIDE 39

Example Trace

h = <∅, ∅, ∅>
Encounter <small, red, circle> as positive: h = <small, red, circle>
Encounter <big, red, circle> as positive: h = <?, red, circle>
Check to ensure consistency with any negative examples:
Negative: <small, red, triangle> is not covered by h ✓
Negative: <big, blue, circle> is not covered by h ✓

SLIDE 40

Comments on FIND-S

For conjunctive feature vectors, the most specific hypothesis that covers a set of positives is unique and found by FIND-S.

If the most specific hypothesis consistent with the positives is inconsistent with a negative training example, then there is no conjunctive hypothesis consistent with the data, since by definition it cannot be made any more specific and still cover all of the positives.

SLIDE 41

Example

Positives: <big, red, circle>, <small, blue, circle>
Negatives: <small, red, circle>
FIND-S -> <?, ?, circle>, which matches the negative

SLIDE 42

Inductive Bias

A hypothesis space that does not include every possible binary function on the instance space incorporates a bias in the type of concepts it can learn.

Any means that a concept learning system uses to choose between two functions that are both consistent with the training data is called inductive bias.

SLIDE 43

Forms of Inductive Bias

Language bias:

– The language for representing concepts defines a hypothesis space that does not include all possible functions (e.g. conjunctive descriptions).

Search bias:

– The language is expressive enough to represent all possible functions (e.g. disjunctive normal form), but the search algorithm embodies a preference for certain consistent functions over others (e.g. syntactic simplicity).

SLIDE 44

Unbiased Learning

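For instances described by n attributes each with m values, there are $m^n$ instances and therefore $2^{m^n}$ possible binary functions.

For m=2, n=10, there are $2^{1024} \approx 1.8 \times 10^{308}$ functions, of which only 59,049 ($3^{10}$), plus the empty hypothesis, can be represented by conjunctions (a small percentage indeed!).

However, unbiased learning is futile, since if we consider all possible functions, then simply memorizing the data without any effective generalization is an option: every labeling of the unseen instances is equally consistent with the training data, so the learner has no basis for generalizing beyond it.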
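The counts for m = 2, n = 10 can be computed exactly with Python integers (note that $2^{1024}$ has 309 decimal digits, about $1.8 \times 10^{308}$). The second half is a hypothetical rote learner (`rote_learner` is an illustrative name) showing what memorization without bias amounts to: it reproduces the training labels and has no basis for answering anything else.

```python
m, n = 2, 10
num_instances = m ** n              # m**n instances
num_functions = 2 ** num_instances  # one binary function per labeling of all instances
num_conjunctions = 3 ** n           # nonempty conjunctions over binary features

print(num_instances)            # 1024
print(len(str(num_functions)))  # 309 decimal digits
print(num_conjunctions)         # 59049


def rote_learner(examples):
    """Memorize the training data; the result cannot generalize."""
    table = dict(examples)
    return lambda x: table.get(x)  # None for any instance not seen in training


h = rote_learner([(("small", "red", "circle"), True),
                  (("big", "blue", "circle"), False)])
print(h(("small", "red", "circle")))     # True
print(h(("small", "blue", "triangle")))  # None
```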

SLIDE 45

Lessons

Function approximation can be viewed as a search through a pre-defined space of hypotheses (a representation language) for a hypothesis which best fits the training data.

Different learning methods assume different hypothesis spaces or employ different search techniques.

SLIDE 46

Varying Learning Methods

Can vary the representation:

– Numerical function
– Rules or logical functions
– Nearest neighbor (case based)

Can vary the search algorithm:

– Gradient descent
– Divide and conquer
– Genetic algorithm

SLIDE 47

Evaluation of Learning Methods

Experimental: Conduct well-controlled experiments that compare various methods on benchmark problems, gather data on their performance (e.g. accuracy, run-time), and analyze the results for significant differences.

Theoretical: Analyze algorithms mathematically and prove theorems about their computational complexity, ability to produce hypotheses that fit the training data, or number of examples needed to produce a hypothesis that accurately generalizes to unseen data (sample complexity).

SLIDE 48

Empirical Evaluation

Training and Testing
Leave-One-Out
Cross-validation
Learning Curves
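A minimal sketch of k-fold cross-validation and leave-one-out. It assumes the examples are already in random order (no shuffling is done) and that `learn` is any function mapping a list of (x, c(x)) pairs to a classifier; `k_fold_splits`, `leave_one_out_splits`, and `cross_validated_accuracy` are hypothetical names. Leave-one-out is the special case where k equals the number of examples.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices); each index is tested exactly once."""
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_examples):
    """Leave-one-out: k-fold with one example per fold."""
    return k_fold_splits(n_examples, n_examples)


def cross_validated_accuracy(learn, examples, k):
    """Train on k-1 folds, test on the held-out fold, and pool the accuracy."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(examples), k):
        h = learn([examples[i] for i in train_idx])
        correct += sum(h(examples[i][0]) == examples[i][1] for i in test_idx)
    return correct / len(examples)


print([len(test) for _, test in k_fold_splits(10, 3)])  # [4, 3, 3]
```

Plotting this accuracy while growing the training set is one way to produce a learning curve.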