Computational Learning Theory: Positive and Negative Learnability Results - PowerPoint PPT Presentation


SLIDE 1

Machine Learning

Computational Learning Theory: Positive and negative learnability results


Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others


SLIDE 3

This lecture: Computational Learning Theory

  • The Theory of Generalization
  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension


SLIDE 4

What can be learned

General conjunctions are PAC learnable

– |H| = number of conjunctions of n variables = 3^n (each variable appears positively, appears negated, or is absent)

– ln|H| = n ln 3

– Number of examples needed: m > (1/Ξ΅)(n ln 3 + ln(1/Ξ΄))

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))

SLIDE 5

  β€’ If we want to guarantee a 95% chance of learning a hypothesis of at least 90% accuracy, with n = 10 Boolean variables, we need m > (1/0.1)(10 ln 3 + ln(1/0.05)) β‰ˆ 139.8, i.e., 140 examples
  β€’ If n = 100, this goes to 1129 examples (the bound grows linearly with n)
  β€’ Increasing the confidence to 99% will cost 1145 examples (the bound grows only logarithmically in 1/Ξ΄)

These results hold for any consistent learner.

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))
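These numbers are easy to reproduce. Below is a minimal Python sketch of the bound m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄)); the helper name pac_sample_bound is ours, not from the slides.

```python
import math

def pac_sample_bound(ln_H: float, eps: float, delta: float) -> int:
    """Ceiling of (1/eps) * (ln|H| + ln(1/delta)), the sample sizes quoted above."""
    return math.ceil((ln_H + math.log(1.0 / delta)) / eps)

# Conjunctions over n Boolean variables: |H| = 3^n, so ln|H| = n * ln 3.
print(pac_sample_bound(10 * math.log(3), eps=0.1, delta=0.05))   # 140
print(pac_sample_bound(100 * math.log(3), eps=0.1, delta=0.05))  # 1129
print(pac_sample_bound(100 * math.log(3), eps=0.1, delta=0.01))  # 1145
```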


SLIDE 19

What can be learned

3-CNF

Subset of CNFs: each clause can have at most three literals (i.e., a variable or its negation)

(x₁₁ ∨ x₁₂ ∨ x₁₃) ∧ (x₂₁ ∨ xβ‚‚β‚‚ ∨ x₂₃) ∧ β‹―

What is the sample complexity? That is, if we had a consistent learner, how many examples would it need to guarantee PAC learnability? We need the size of the hypothesis space. How many 3-CNFs are there?

  β€’ Number of possible clauses = O((2n)Β³), since a clause picks up to three of the 2n literals
  β€’ A 3-CNF is a conjunction of any subset of these clauses
  β€’ |H| = number of 3-CNFs = 2^O((2n)Β³)
  β€’ log|H| = O(nΒ³)

log|H| is polynomial in n β‡’ the sample complexity is also polynomial in n. For PAC learnability, we still need an efficient algorithm that will find a consistent hypothesis. Exercise: find one.

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))


SLIDE 21

What can be learned

General Boolean functions

  β€’ How many Boolean functions exist over n variables? 2^(2^n), since each of the 2^n truth-table rows can be labeled 0 or 1.
    So log|H| = 2^n is exponential in n.
  β€’ General Boolean functions are not PAC learnable: the number of examples needed is exponential

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))
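Plugging this count into the generic bound makes the blow-up explicit; a one-line derivation of our own, in the same Ξ΅, Ξ΄ notation:

```latex
\[
|H| = 2^{2^n}
\;\Rightarrow\;
m \;>\; \frac{1}{\epsilon}\left( 2^n \ln 2 + \ln \frac{1}{\delta} \right)
\]
% e.g., with eps = 0.1, already n = 30 demands more than 7 * 10^9 examples
```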


SLIDE 23

Sample Complexity

  β€’ k-CNF: conjunctions of any number of clauses, where each disjunctive clause has at most k literals.
    f = C₁ ∧ Cβ‚‚ ∧ β‹― ∧ Cβ‚˜, where each Cα΅’ = l₁ ∨ lβ‚‚ ∨ β‹― ∨ lβ‚–
  β€’ k-clause-CNF: conjunctions of at most k disjunctive clauses.
    f = C₁ ∧ Cβ‚‚ ∧ β‹― ∧ Cβ‚–, where each Cα΅’ = l₁ ∨ lβ‚‚ ∨ β‹― ∨ lβ‚˜; ln|k-clause-CNF| = O(kn)
  β€’ k-DNF: disjunctions of any number of terms, where each conjunctive term has at most k literals.
    f = T₁ ∨ Tβ‚‚ ∨ β‹― ∨ Tβ‚˜, where each Tα΅’ = l₁ ∧ lβ‚‚ ∧ β‹― ∧ lβ‚–
  β€’ k-term-DNF: disjunctions of at most k conjunctive terms.
    f = T₁ ∨ Tβ‚‚ ∨ β‹― ∨ Tβ‚–, where each Tα΅’ = l₁ ∧ lβ‚‚ ∧ β‹― ∧ lβ‚˜

All these classes can be learned using a polynomial-size sample. Exercise: prove that the above four classes of functions have polynomial sample complexity. (A counting sketch for the k-clause-CNF case follows below.)

Suppose we want to learn a 2-term-DNF. What should our hypothesis class be?
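Here is how one such count goes, as a short sketch of our own; the other three classes are left as the exercise states.

```latex
% Each clause is a disjunction over n variables: every variable appears
% positively, appears negated, or is absent, giving at most 3^n clauses.
\[
\lvert \text{k-clause-CNF} \rvert \le \left( 3^n \right)^k = 3^{kn}
\;\Rightarrow\;
\ln \lvert H \rvert \le kn \ln 3 = O(kn)
\]
```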


SLIDE 27

Computational Complexity

Suppose we want to learn a 2-term-DNF

  β€’ Though sample complexity is polynomial, the computational complexity is prohibitive in this case
    – Determining whether there is a 2-term-DNF consistent with a set of training data is NP-hard
    – That is, the class of k-term-DNF is not efficiently (properly) PAC learnable due to computational complexity
  β€’ But we have seen an algorithm for learning k-CNF
  β€’ And k-CNF is a superset of k-term-DNF (that is, every k-term-DNF can be written as a k-CNF; showing this was an exercise a few slides back)

T₁ ∨ Tβ‚‚ ∨ T₃ = β‹€ (x ∨ y ∨ z) over all x ∈ T₁, y ∈ Tβ‚‚, z ∈ T₃

Example: (b ∧ c ∧ d) ∨ (e ∧ f ∧ g) = (b ∨ e) ∧ (b ∨ f) ∧ (b ∨ g) ∧ (c ∨ e) ∧ β‹― ∧ (d ∨ g)
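The rewrite from k-term-DNF to k-CNF is just the distributive law applied across the terms; below is a minimal Python sketch (literals as strings; the function name dnf_to_cnf is ours).

```python
from itertools import product

def dnf_to_cnf(terms):
    """T1 v ... v Tk == AND, over (x1,...,xk) in T1 x ... x Tk, of (x1 v ... v xk).

    `terms`: list of terms, each a list of literals. Returns the list of
    clauses, each clause a set of literals.
    """
    return [set(picks) for picks in product(*terms)]

# (b ^ c ^ d) v (e ^ f ^ g) -> 9 clauses: (b v e), (b v f), ..., (d v g)
print(dnf_to_cnf([["b", "c", "d"], ["e", "f", "g"]]))
```

Each output clause takes one literal from each of the k terms, so it has at most k literals and the result is indeed a k-CNF, at the price of up to mᡏ clauses for terms of length m.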

SLIDE 36

Computational Complexity

Suppose we want to learn a 2-term-DNF

That is, the concept class C = k-term-DNF can be learned using H = k-CNF as the hypothesis space.

[Figure: the concept class C drawn as a subset of the hypothesis space H]

The lesson: the importance of representation. Concepts that cannot be learned using one representation can sometimes be learned using a different, more expressive, representation.

We have seen this idea before: linear classifiers for conjunctions
SLIDE 39

Negative Results – Examples

Two types of non-learnability results

  β€’ 1. Complexity-theoretic (the computational complexity is the obstacle)
    – Shows that various concept classes cannot be learned, based on well-accepted assumptions from computational complexity theory
    – Takes the form "A concept class C cannot be learned unless P = NP"
  β€’ 2. Information-theoretic (the sample complexity is the obstacle)
    – The concept class is sufficiently rich that a polynomial number of examples may not be sufficient to distinguish a particular target concept
    – Both types involve "representation-dependent" arguments
    – The proof typically shows that a given class cannot be learned by algorithms using hypotheses from the same class. (Is this always a problem?)

SLIDE 40

Negative Results for Learning

  β€’ Complexity-theoretic
    – k-term-DNF, for k > 1 (k-clause-CNF, k > 1)
    – Neural networks of fixed architecture (3 nodes; n inputs)
    – "Read-once" Boolean formulas
    – Quantified conjunctive concepts
  β€’ Information-theoretic
    – Arbitrary Boolean functions (DNF formulas or CNF formulas)
    – Deterministic finite automata
    – Context-free grammars