

SLIDE 1

CS 730/730W/830: Intro AI

Naive Bayes and Boosting

Wheeler Ruml (UNH) Lecture 22, CS 730 – 1 / 14

Handout: slides. Asst 5 milestone was due.

SLIDE 2

Supervised Learning: Summary So Far


■ learning as function approximation
■ k-NN: distance function (any attributes), any labels
■ Neural network: numeric attributes, numeric or binary labels
■ Regression: incremental training with LMS
■ 3-Layer ANN: train with BackProp
■ Inductive Logic Programming: logical concepts
■ Decision Trees: easier with discrete attributes and labels

SLIDE 3

Naive Bayes

Naive Bayes
■ Bayes’ Theorem
■ The NB Model
■ The NB Classifier
■ Break
Boosting


SLIDE 4

Bayes’ Theorem

P(H|D) = P(H) P(D|H) / P(D)

Example: P(H) = 0.0001, P(D|H) = 0.99, P(D) = 0.01. P(H|D) = ?

If you don’t have P(D), sometimes it helps to note that

P(D) = P(D|H) P(H) + P(D|¬H) P(¬H)
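Worked out, the example numbers give a posterior of under 1%:

```latex
P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(D)}
            = \frac{0.0001 \times 0.99}{0.01}
            = 0.0099
```

Despite the 99% hit rate P(D|H), the tiny prior P(H) keeps the posterior small: the classic base-rate lesson.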

SLIDE 5

A Naive Bayesian Model

Bayes’ Theorem: P(H|D) = P(H) P(D|H) / P(D)

naive model: P(D|H) = P(x1, . . . , xn|H) = ∏_i P(xi|H)

attributes independent, given class

P(H|x1, . . . , xn) = α P(H) ∏_i P(xi|H)
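The normalizer α is left implicit on the slide; it is just 1/P(D), and since the class values are mutually exclusive and exhaustive, it can be computed by summing the unnormalized scores over the classes:

```latex
\alpha = \frac{1}{P(D)}
       = \left( \sum_{h} P(h) \prod_i P(x_i \mid h) \right)^{-1}
```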

SLIDE 6

The ‘Naive Bayes’ Classifier


P(H|x1, . . . , xn) = α P(H) ∏_i P(xi|H)

attributes independent, given class
■ maximum a posteriori = pick highest
■ maximum likelihood = ignore prior
■ watch for sparse data when learning!
■ learning as density estimation
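A minimal sketch of the classifier in Python (my illustration, not lecture code): it trains by counting, classifies by the MAP rule in log space, and uses add-one (Laplace) smoothing as one standard guard against the sparse-data problem the slide warns about.

```python
from collections import Counter, defaultdict
import math

class NaiveBayes:
    """Naive Bayes for discrete attributes and labels."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        # log P(H): class frequency in the training data
        self.prior = {h: math.log(y.count(h) / len(y)) for h in self.classes}
        # counts[h][i] tallies values of attribute i among class-h examples
        self.counts = {h: defaultdict(Counter) for h in self.classes}
        self.values = defaultdict(set)          # values seen per attribute
        for xs, h in zip(X, y):
            for i, v in enumerate(xs):
                self.counts[h][i][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, h, xs):
        # log P(h) + sum_i log P(x_i | h), with add-one smoothing
        # (the extra +1 in the denominator leaves room for unseen values)
        lp = self.prior[h]
        for i, v in enumerate(xs):
            n_h = sum(self.counts[h][i].values())
            lp += math.log((self.counts[h][i][v] + 1) /
                           (n_h + len(self.values[i]) + 1))
        return lp

    def predict(self, xs):
        # MAP: pick the class with the highest posterior
        return max(self.classes, key=lambda h: self.log_posterior(h, xs))

# toy usage: classify (outlook, windy) tuples
X = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
y = ["play", "play", "stay", "stay"]
print(NaiveBayes().fit(X, y).predict(("sunny", "yes")))   # -> "play"
```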

SLIDE 7

Break


asst 5

exam 2

projects

SLIDE 8

Boosting

Naive Bayes
Boosting
■ Ensembles
■ AdaBoost
■ Behavior
■ Summary
■ EOLQs

SLIDE 9

Ensemble Learning


■ committees, ensembles
■ weak vs. strong learners
■ reduce variance, expand hypothesis space (e.g., half-spaces)
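As a concrete (idealized) illustration of why committees help: if five classifiers are each correct 60% of the time and err independently, a simple majority vote is correct whenever at least three of them are:

```latex
P(\text{majority correct}) = \sum_{k=3}^{5} \binom{5}{k} (0.6)^k (0.4)^{5-k}
                           = 0.3456 + 0.2592 + 0.0778 \approx 0.683
```

The independence assumption is the catch: hypotheses trained on the same data tend to make correlated errors, which is what boosting's reweighting works against.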

SLIDE 10

AdaBoost


N examples, T rounds, L a weak learner on weighted examples:

p ← uniform distribution over the N examples
for t = 1 to T do
    h_t ← call L with weights p
    ε_t ← h_t’s weighted misclassification probability
    if ε_t = 0, return h_t
    α_t ← (1/2) ln((1 − ε_t) / ε_t)
    for each example i:
        if h_t(i) is correct, p_i ← p_i e^(−α_t)
        else, p_i ← p_i e^(α_t)
    normalize p to sum to 1
return the h_t weighted by the α_t

To classify, choose the label with the highest sum of weighted votes.
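A runnable sketch of the loop above (my code, not the lecture's), with one assumption made concrete: the weak learner L is a one-feature threshold "decision stump", and labels are +1/−1.

```python
import math

def stump_learner(X, y, p):
    """Return the stump with least weighted error, and that error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thresh in sorted({x[f] for x in X}):
            for sign in (+1, -1):
                err = sum(w for x, label, w in zip(X, y, p)
                          if (sign if x[f] >= thresh else -sign) != label)
                if err < best_err:
                    best, best_err = (f, thresh, sign), err
    f, thresh, sign = best
    return (lambda x: sign if x[f] >= thresh else -sign), best_err

def adaboost(X, y, T):
    N = len(X)
    p = [1.0 / N] * N                      # uniform over the N examples
    ensemble = []                          # (alpha_t, h_t) pairs
    for _ in range(T):
        h, eps = stump_learner(X, y, p)    # call L with weights p
        if eps == 0:                       # perfect hypothesis: done
            return [(1.0, h)]
        alpha = 0.5 * math.log((1 - eps) / eps)
        # correct examples shrink by e^-alpha, incorrect grow by e^alpha
        p = [w * math.exp(-alpha if h(x) == label else alpha)
             for x, label, w in zip(X, y, p)]
        total = sum(p)
        p = [w / total for w in p]         # normalize p to sum to 1
        ensemble.append((alpha, h))
    return ensemble

def classify(ensemble, x):
    """Label with the highest sum of weighted votes (sign of the sum)."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1

# toy usage: no single stump classifies this 1-D set, but a committee does
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]
ens = adaboost(X, y, T=10)
print([classify(ens, x) for x in X])       # [1, 1, -1, -1, 1, 1]
```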

SLIDE 11

Boosting Function

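The figure on this slide isn't captured in the transcript; presumably it plots the voting weight α_t as a function of the weak hypothesis's weighted error ε_t. From the formula on the previous slide:

```latex
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}:
\qquad \alpha_t \to \infty \ \text{as}\ \epsilon_t \to 0, \qquad
\alpha_t = 0 \ \text{at}\ \epsilon_t = \tfrac{1}{2}, \qquad
\alpha_t < 0 \ \text{for}\ \epsilon_t > \tfrac{1}{2}
```

So a hypothesis at chance level gets no vote, and a worse-than-chance one has its vote inverted.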

SLIDE 12

Behavior


■ doesn’t overfit (maximizes margin even when no error)
■ outliers get high weight, can be inspected

problems:
■ not enough data
■ hypothesis class too small
■ boosting: learner too weak, too strong

SLIDE 13

Supervised Learning: Summary


■ k-NN: distance function (any attributes), any labels
■ Neural network: numeric attributes, numeric or binary labels
■ Perceptron: equivalent to linear regression
■ 3-Layer ANN: BackProp learning
■ Decision Trees: easier with discrete attributes and labels
■ Inductive Logic Programming: logical concepts
■ Naive Bayes: easier with discrete attributes and labels
■ Boosting: general wrapper to improve performance

Didn’t cover: RBFs, EBL, SVMs

SLIDE 14

EOLQs


What question didn’t you get to ask today?

What’s still confusing?

What would you like to hear more about?

Please write down your most pressing question about AI and put it in the box on your way out. Thanks!