Machine Learning
Computational Learning Theory: Positive and negative learnability results
Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others
Sample complexity for consistent learners (Occam's razor):

    m > (1/ε)(ln|H| + ln(1/δ))

For conjunctions over n Boolean variables, |H| = 3^n (each variable appears positively, appears negated, or is absent), so the bound becomes

    m > (1/ε)(n ln 3 + ln(1/δ))

To guarantee, with 95% confidence (δ = 0.05), a hypothesis with at least 90% accuracy (ε = 0.1), with n = 10 Boolean variables, we need

    m > (1/0.1)(10 ln 3 + ln(1/0.05)) ≈ 140 examples

These results hold for any consistent learner
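The arithmetic is easy to check directly. A minimal sketch in Python (the function name occam_bound is mine, not from the slides):

```python
import math

def occam_bound(ln_H: float, epsilon: float, delta: float) -> float:
    """Occam's razor bound for consistent learners:
    m > (1/epsilon) * (ln|H| + ln(1/delta))."""
    return (ln_H + math.log(1 / delta)) / epsilon

# Conjunctions over n Boolean variables: |H| = 3^n, so ln|H| = n * ln 3.
n, epsilon, delta = 10, 0.1, 0.05
m = occam_bound(n * math.log(3), epsilon, delta)
print(math.ceil(m))  # -> 140
```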
3-CNF: a subset of CNFs in which each conjunct (clause) can have at most three literals (i.e., a variable or its negation).

There are only O(n^3) possible clauses of at most three literals over n variables, and a 3-CNF is a conjunction of some subset of them, so log(|H|) = O(n^3) is polynomial in n ⇒ by the bound

    m > (1/ε)(ln|H| + ln(1/δ))

the sample complexity is also polynomial in n.

For PAC learnability, we still need an efficient algorithm that will find a consistent hypothesis. Exercise: find one. (One possible algorithm is sketched below.)
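One standard answer to the exercise (this sketch and its names are my own, not from the slides) is an elimination-style learner: start with the conjunction of all clauses of at most three literals, and delete every clause that some positive example falsifies. This runs in time polynomial in n and in the number of examples.

```python
from itertools import combinations, product

def learn_3cnf(examples, n):
    """Consistent learner for 3-CNF over Boolean variables x_0..x_{n-1}.

    examples: list of (x, y) pairs, x a tuple of n bools, y the label.
    A literal is (index, polarity); a clause is a tuple of literals.
    Returns a list of clauses whose conjunction is consistent with the
    data, or None if no 3-CNF is consistent.
    """
    # All clauses with 1..3 literals over distinct variables: O(n^3) of them.
    clauses = set()
    for size in (1, 2, 3):
        for idxs in combinations(range(n), size):
            for pols in product((True, False), repeat=size):
                clauses.add(tuple(zip(idxs, pols)))

    def satisfied(x, clause):
        return any(x[i] == pol for i, pol in clause)

    # Every positive example must satisfy every clause we keep.
    for x, y in examples:
        if y:
            clauses = {c for c in clauses if satisfied(x, c)}

    hypothesis = sorted(clauses)
    # Every negative example must falsify at least one kept clause.
    for x, y in examples:
        if not y and all(satisfied(x, c) for c in hypothesis):
            return None
    return hypothesis
```

Because the surviving conjunction is the most specific 3-CNF consistent with the positive examples, it misclassifies a negative example only if no consistent 3-CNF exists at all.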
k-term-DNF: f = T1 ∨ T2 ∨ ⋯ ∨ Tk, where each term Ti = l1 ∧ l2 ∧ ⋯ ∧ lm is a conjunction of literals.

k-clause-CNF: f = C1 ∧ C2 ∧ ⋯ ∧ Ck, where each clause Ci = l1 ∨ l2 ∨ ⋯ ∨ lm is a disjunction of literals.

ln |k-clause-CNF| = O(kn), and likewise ln |k-term-DNF| = O(kn): each of the k clauses (or terms) assigns each of the n variables one of three roles, so there are at most 3^(kn) formulas. The sample complexity is polynomial; the problem is computation.
- Determining whether there is a 2-term DNF consistent with a set of training data is NP-hard.
- That is, the class of k-term-DNF is not efficiently PAC learnable (by algorithms that output k-term-DNF hypotheses) due to computational complexity.
- But k-CNF is a superset of k-term-DNF. By distributivity,

      T1 ∨ T2 ∨ T3 = ⋀_{a ∈ T1, b ∈ T2, c ∈ T3} (a ∨ b ∨ c)

  Example: (a ∧ b ∧ c) ∨ (d ∧ e ∧ f) = (a ∨ d) ∧ (a ∨ e) ∧ (a ∨ f) ∧ (b ∨ d) ∧ ⋯ ∧ (c ∨ f)

- k-CNF is efficiently PAC learnable (finding a consistent k-CNF hypothesis was an exercise a few slides back), so k-term-DNF concepts can be learned if the learner is allowed to output k-CNF hypotheses. (A sketch of the conversion follows below.)
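The distributive conversion is mechanical. A minimal sketch (my own illustration; the literal encoding and function names are assumptions, not from the slides) that also verifies the equivalence by brute force on the example above:

```python
from itertools import product

def k_term_dnf_to_k_cnf(terms):
    """T1 v T2 v ... v Tk (each term a set of literals) -> equivalent k-CNF:
    one clause per way of picking one literal from each term."""
    return [set(choice) for choice in product(*terms)]

def eval_dnf(terms, x):
    return any(all(x[i] == pol for i, pol in t) for t in terms)

def eval_cnf(clauses, x):
    return all(any(x[i] == pol for i, pol in c) for c in clauses)

# (a ^ b ^ c) v (d ^ e ^ f), encoding variables a..f as indices 0..5,
# and a literal as (index, polarity).
terms = [{(0, True), (1, True), (2, True)},
         {(3, True), (4, True), (5, True)}]
cnf = k_term_dnf_to_k_cnf(terms)  # 3 * 3 = 9 clauses of 2 literals each
assert all(eval_dnf(terms, x) == eval_cnf(cnf, x)
           for x in product((True, False), repeat=6))
```

The blow-up is from k terms of m literals each to m^k clauses of k literals each, which is still a k-CNF, so the polynomial-time k-CNF learner applies for fixed k.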
We have seen this idea before: Linear classifiers for conjunctions
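As a reminder of that idea (this encoding is a standard one, not taken from these slides): any conjunction over Boolean inputs is representable as a linear threshold function, so a linear-classifier learner searches a hypothesis space that contains all conjunctions.

```python
def conjunction_as_linear_threshold(x, pos, neg):
    """Evaluate the conjunction (AND of x_i for i in pos, and of NOT x_j
    for j in neg) as a linear threshold over x in {0,1}^n:
    sum_i x_i + sum_j (1 - x_j) >= |pos| + |neg|."""
    score = sum(x[i] for i in pos) + sum(1 - x[j] for j in neg)
    return score >= len(pos) + len(neg)

# x1 AND (NOT x3) AND x5 on x = (x0, ..., x5)
print(conjunction_as_linear_threshold((0, 1, 0, 0, 0, 1), pos=[1, 5], neg=[3]))  # True
```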
Negative learnability results come in two forms:

- Showing that various concept classes cannot be learned, based on well-accepted assumptions from computational complexity theory. These take the form "A concept class C cannot be learned unless P = NP."
- Showing that the concept class is sufficiently rich that a polynomial number of examples may not be sufficient to distinguish a particular target concept.

Both types involve "representation dependent" arguments: the proof typically shows that a given class cannot be learned by algorithms using hypotheses from the same class. (Is this always a problem?)