Computational Learning Theory: Positive and Negative Learnability Results - PowerPoint PPT Presentation


SLIDE 1

Machine Learning

Computational Learning Theory: Positive and negative learnability results


Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others


SLIDE 3

This lecture: Computational Learning Theory

  • The Theory of Generalization
  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension


SLIDE 4

What can be learned

General conjunctions are PAC learnable

– |H| = number of conjunctions of n variables = 3^n (each variable appears positively, appears negated, or is absent)

– ln|H| = n ln 3

– Number of examples needed: m > (1/Ξ΅)(n ln 3 + ln(1/Ξ΄))

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))

SLIDE 5

  β€’ If we want to guarantee a 95% chance of learning a hypothesis of at least 90% accuracy, with n = 10 Boolean variables, we need m > (1/0.1)(10 ln 3 + ln(1/0.05)) β‰ˆ 139.8, i.e., 140 examples
  β€’ If n = 100, this goes to 1129 examples (the bound grows linearly with n)
  β€’ Increasing the confidence to 99% will cost 1145 examples (the bound grows only logarithmically in 1/Ξ΄)

These results hold for any consistent learner.

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))
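These numbers are easy to reproduce. Below is a minimal Python sketch of the bound m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄)); the helper name pac_sample_bound is ours, not from the slides.

```python
import math

def pac_sample_bound(ln_H: float, eps: float, delta: float) -> int:
    """Ceiling of (1/eps) * (ln|H| + ln(1/delta)), the sample sizes quoted above."""
    return math.ceil((ln_H + math.log(1.0 / delta)) / eps)

# Conjunctions over n Boolean variables: |H| = 3^n, so ln|H| = n * ln 3.
print(pac_sample_bound(10 * math.log(3), eps=0.1, delta=0.05))   # 140
print(pac_sample_bound(100 * math.log(3), eps=0.1, delta=0.05))  # 1129
print(pac_sample_bound(100 * math.log(3), eps=0.1, delta=0.01))  # 1145
```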


SLIDE 19

What can be learned

3-CNF

Subset of CNFs: each clause can have at most three literals (i.e., a variable or its negation)

(x₁₁ ∨ x₁₂ ∨ x₁₃) ∧ (x₂₁ ∨ xβ‚‚β‚‚ ∨ x₂₃) ∧ β‹―

What is the sample complexity? That is, if we had a consistent learner, how many examples would it need to guarantee PAC learnability? We need the size of the hypothesis space. How many 3-CNFs are there?

  β€’ Number of possible clauses = O((2n)Β³), since a clause picks up to three of the 2n literals
  β€’ A 3-CNF is a conjunction of any subset of these clauses
  β€’ |H| = number of 3-CNFs = 2^O((2n)Β³)
  β€’ log|H| = O(nΒ³)

log|H| is polynomial in n β‡’ the sample complexity is also polynomial in n. For PAC learnability, we still need an efficient algorithm that will find a consistent hypothesis. Exercise: find one.

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))


SLIDE 21

What can be learned

General Boolean functions

  β€’ How many Boolean functions exist over n variables? 2^(2^n), since each of the 2^n truth-table rows can be labeled 0 or 1.
    So log|H| = 2^n is exponential in n.
  β€’ General Boolean functions are not PAC learnable: the number of examples needed is exponential

m > (1/Ξ΅)(ln|H| + ln(1/Ξ΄))
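Plugging this count into the generic bound makes the blow-up explicit; a one-line derivation of our own, in the same Ξ΅, Ξ΄ notation:

```latex
\[
|H| = 2^{2^n}
\;\Rightarrow\;
m \;>\; \frac{1}{\epsilon}\left( 2^n \ln 2 + \ln \frac{1}{\delta} \right)
\]
% e.g., with eps = 0.1, already n = 30 demands more than 7 * 10^9 examples
```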


SLIDE 23

Sample Complexity

  β€’ k-CNF: conjunctions of any number of clauses, where each disjunctive clause has at most k literals.
    f = C₁ ∧ Cβ‚‚ ∧ β‹― ∧ Cβ‚˜, where each Cα΅’ = l₁ ∨ lβ‚‚ ∨ β‹― ∨ lβ‚–
  β€’ k-clause-CNF: conjunctions of at most k disjunctive clauses.
    f = C₁ ∧ Cβ‚‚ ∧ β‹― ∧ Cβ‚–, where each Cα΅’ = l₁ ∨ lβ‚‚ ∨ β‹― ∨ lβ‚˜; ln|k-clause-CNF| = O(kn)
  β€’ k-DNF: disjunctions of any number of terms, where each conjunctive term has at most k literals.
    f = T₁ ∨ Tβ‚‚ ∨ β‹― ∨ Tβ‚˜, where each Tα΅’ = l₁ ∧ lβ‚‚ ∧ β‹― ∧ lβ‚–
  β€’ k-term-DNF: disjunctions of at most k conjunctive terms.
    f = T₁ ∨ Tβ‚‚ ∨ β‹― ∨ Tβ‚–, where each Tα΅’ = l₁ ∧ lβ‚‚ ∧ β‹― ∧ lβ‚˜

All these classes can be learned using a polynomial-size sample. Exercise: prove that the above four classes of functions have polynomial sample complexity. (A counting sketch for the k-clause-CNF case follows below.)

Suppose we want to learn a 2-term-DNF. What should our hypothesis class be?
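Here is how one such count goes, as a short sketch of our own; the other three classes are left as the exercise states.

```latex
% Each clause is a disjunction over n variables: every variable appears
% positively, appears negated, or is absent, giving at most 3^n clauses.
\[
\lvert \text{k-clause-CNF} \rvert \le \left( 3^n \right)^k = 3^{kn}
\;\Rightarrow\;
\ln \lvert H \rvert \le kn \ln 3 = O(kn)
\]
```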


SLIDE 27

Computational Complexity

Suppose we want to learn a 2-term-DNF

  β€’ Though sample complexity is polynomial, the computational complexity is prohibitive in this case
    – Determining whether there is a 2-term-DNF consistent with a set of training data is NP-hard
    – That is, the class of k-term-DNF is not efficiently (properly) PAC learnable due to computational complexity
  β€’ But we have seen an algorithm for learning k-CNF
  β€’ And k-CNF is a superset of k-term-DNF (that is, every k-term-DNF can be written as a k-CNF; showing this was an exercise a few slides back)

T₁ ∨ Tβ‚‚ ∨ T₃ = β‹€ (x ∨ y ∨ z) over all x ∈ T₁, y ∈ Tβ‚‚, z ∈ T₃

Example: (b ∧ c ∧ d) ∨ (e ∧ f ∧ g) = (b ∨ e) ∧ (b ∨ f) ∧ (b ∨ g) ∧ (c ∨ e) ∧ β‹― ∧ (d ∨ g)
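The rewrite from k-term-DNF to k-CNF is just the distributive law applied across the terms; below is a minimal Python sketch (literals as strings; the function name dnf_to_cnf is ours).

```python
from itertools import product

def dnf_to_cnf(terms):
    """T1 v ... v Tk == AND, over (x1,...,xk) in T1 x ... x Tk, of (x1 v ... v xk).

    `terms`: list of terms, each a list of literals. Returns the list of
    clauses, each clause a set of literals.
    """
    return [set(picks) for picks in product(*terms)]

# (b ^ c ^ d) v (e ^ f ^ g) -> 9 clauses: (b v e), (b v f), ..., (d v g)
print(dnf_to_cnf([["b", "c", "d"], ["e", "f", "g"]]))
```

Each output clause takes one literal from each of the k terms, so it has at most k literals and the result is indeed a k-CNF, at the price of up to mᡏ clauses for terms of length m.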

SLIDE 36

Computational Complexity

Suppose we want to learn a 2-term-DNF

That is, the concept class C = k-term-DNF can be learned using H = k-CNF as the hypothesis space.

[Figure: the concept class C drawn as a subset of the hypothesis space H]

The lesson: the importance of representation. Concepts that cannot be learned using one representation can sometimes be learned using a different, more expressive, representation.

We have seen this idea before: linear classifiers for conjunctions
SLIDE 39

Negative Results – Examples

Two types of non-learnability results

  β€’ 1. Complexity-theoretic (the computational complexity is the obstacle)
    – Shows that various concept classes cannot be learned, based on well-accepted assumptions from computational complexity theory
    – Takes the form "A concept class C cannot be learned unless P = NP"
  β€’ 2. Information-theoretic (the sample complexity is the obstacle)
    – The concept class is sufficiently rich that a polynomial number of examples may not be sufficient to distinguish a particular target concept
    – Both types involve "representation-dependent" arguments
    – The proof typically shows that a given class cannot be learned by algorithms using hypotheses from the same class. (Is this always a problem?)

SLIDE 40

Negative Results for Learning

  β€’ Complexity-theoretic
    – k-term-DNF, for k > 1 (k-clause-CNF, k > 1)
    – Neural networks of fixed architecture (3 nodes; n inputs)
    – "Read-once" Boolean formulas
    – Quantified conjunctive concepts
  β€’ Information-theoretic
    – Arbitrary Boolean functions (DNF formulas or CNF formulas)
    – Deterministic finite automata
    – Context-free grammars