BBM406 Fundamentals of Machine Learning, Lecture 19: What is Ensemble Learning? Bagging and Random Forests



SLIDE 1

BBM406 Fundamentals of Machine Learning

Lecture 19: What is Ensemble Learning? Bagging, Random Forests

Aykut Erdem // Hacettepe University // Fall 2019

Photo by Unsplash user @nathananderson

SLIDE 2

Last time… Decision Trees

slide by David Sontag

SLIDE 3

Last time… Information Gain

  • Decrease in entropy (uncertainty) after splitting

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F

In our running example:
IG(X1) = H(Y) – H(Y|X1) = 0.65 – 0.33
IG(X1) > 0, so we prefer the split!

slide by David Sontag
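To make the numbers concrete, here is a small Python sketch (my own addition, not from the slides; the function names are illustrative) that computes H(Y) and IG(X1) for the running example in the table above:

    import math
    from collections import Counter

    def entropy(labels):
        # H(Y) = -sum_y p(y) log2 p(y) over the empirical label distribution
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(xs, ys):
        # IG(X) = H(Y) - H(Y|X), where H(Y|X) is the weighted average entropy after splitting on X
        n = len(ys)
        h_cond = 0.0
        for v in set(xs):
            subset = [y for x, y in zip(xs, ys) if x == v]
            h_cond += (len(subset) / n) * entropy(subset)
        return entropy(ys) - h_cond

    # X1 and Y columns of the table above
    X1 = ['T', 'T', 'T', 'T', 'F', 'F']
    Y  = ['T', 'T', 'T', 'T', 'T', 'F']
    print(entropy(Y))               # ~0.65
    print(information_gain(X1, Y))  # ~0.65 - 0.33 = 0.32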
SLIDE 4


Last time… Continuous features

  • Binary tree, split on attribute X
  • One branch: X < t
  • Other branch: X ≥ t
  • Search through possible values of t
  • Seems hard!!!
  • But only a finite number of t’s are important:


  • Sort data according to X into {x1,...,xm}
  • Consider split points of the form xi + (xi+1 – xi )/2
  • Moreover, only splits between examples from different classes matter! (see the sketch below)
 


[Figure: feature Xj with candidate thresholds t1 and t2 between class regions c1 and c2]

slide by David Sontag
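The candidate-threshold idea above can be written in a few lines. This is a rough sketch of my own (not from the slides): sort by the feature value and keep only midpoints between consecutive examples with different labels:

    def candidate_thresholds(values, labels):
        # Sort the data according to the feature X
        pairs = sorted(zip(values, labels))
        thresholds = []
        for (x1, y1), (x2, y2) in zip(pairs, pairs[1:]):
            # Only splits between examples from different classes matter
            if y1 != y2 and x1 != x2:
                thresholds.append(x1 + (x2 - x1) / 2)  # split point xi + (xi+1 - xi)/2
        return thresholds

    print(candidate_thresholds([2.0, 1.0, 3.5, 4.0], ['+', '+', '-', '-']))  # [2.75]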
SLIDE 5

Last time… Decision trees will overfit

  • Standard decision trees have no learning bias
  • Training set error is always zero!
  • (If there is no label noise)
  • Lots of variance
  • Must introduce some bias towards simpler trees


  • Many strategies for picking simpler trees
  • Fixed depth
  • Fixed number of leaves

  • Random forests

slide by David Sontag

SLIDE 6

Today

  • Ensemble Methods
  • Bagging
  • Random Forests

SLIDE 7

Ensemble Methods

  • High level idea


– Generate multiple hypotheses


– Combine them to a single classifier


  • Two important questions


– How do we generate multiple hypotheses 


  • we have only one sample


– How do we combine the multiple hypotheses 


  • Majority, AdaBoost, ...

slide by Yishay Mansour

SLIDE 8

Bias/Variance Tradeoff

Hastie, Tibshirani, Friedman “Elements of Statistical Learning” 2001

slide by David Sontag

SLIDE 9

Bias/Variance Tradeoff

http://scott.fortmann-roe.com/docs/BiasVariance.html Graphical illustration of bias and variance.

slide by David Sontag

SLIDE 10

Fighting the bias-variance tradeoff

  • Simple (a.k.a. weak) learners are good
  • e.g., naïve Bayes, logistic regression, decision stumps (or shallow decision trees)

  • Low variance, don’t usually overfit
  • Simple (a.k.a. weak) learners are bad


– High bias, can’t solve hard learning problems

slide by Aarti Singh

SLIDE 11

Reduce Variance Without Increasing Bias

  • Averaging reduces variance (when predictions are independent)
  • Average models to reduce model variance
  • One problem:
  • Only one training set
  • Where do multiple models come from?

slide by David Sontag
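As a reminder (my addition, not text from the slide): for N independent, identically distributed predictions Z_1, ..., Z_N with variance sigma^2, averaging gives

    \operatorname{Var}\!\Big(\frac{1}{N}\sum_{i=1}^{N} Z_i\Big)
      = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(Z_i)
      = \frac{\sigma^2}{N}

while the bias of the averaged model is unchanged, which is why averaging attacks variance rather than bias.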
SLIDE 12

Bagging (Bootstrap Aggregating)

  • Leo Breiman (1994)
  • Take repeated bootstrap samples from the training set D.
  • Bootstrap sampling: Given a set D containing N training examples, create D’ by drawing N examples at random with replacement from D.

  • Bagging:
  • Create k bootstrap samples D1 ... Dk.
  • Train distinct classifier on each Di.
  • Classify new instance by majority vote / average.

slide by David Sontag
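Below is a minimal sketch of the bagging procedure described above. It is my own illustration, assuming scikit-learn's DecisionTreeClassifier as the base learner and integer class labels; the helper names are made up:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, k=25, seed=0):
        # Create k bootstrap samples D1..Dk and train one tree per sample
        rng = np.random.default_rng(seed)
        n = len(y)
        trees = []
        for _ in range(k):
            idx = rng.integers(0, n, size=n)   # draw N examples with replacement
            trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return trees

    def bagging_predict(trees, X):
        # Classify new instances by majority vote over the k trees
        votes = np.stack([t.predict(X) for t in trees])            # shape (k, n_test)
        return np.array([np.bincount(col.astype(int)).argmax()     # per-instance vote
                         for col in votes.T])

    # Usage: trees = bagging_fit(X_train, y_train); y_hat = bagging_predict(trees, X_test)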
SLIDE 13

Bagging

  • Best case:
  • In practice:
  • models are correlated, so the reduction is smaller than 1/N
  • variance of models trained on fewer training cases is usually somewhat larger

slide by David Sontag
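To quantify the "models are correlated" point (my own addition, following the standard analysis in Hastie et al.): for N identically distributed models with variance sigma^2 and pairwise correlation rho, the variance of their average is

    \operatorname{Var}\!\Big(\frac{1}{N}\sum_{i=1}^{N} Z_i\Big) = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2

so the second term still shrinks with N, but the first does not; averaging correlated bootstrap models therefore reduces variance by less than the ideal 1/N factor.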
SLIDE 14

Bagging Example

slide by David Sontag

SLIDE 15

CART* decision boundary

* A decision tree learning algorithm; very similar to ID3

slide by David Sontag

SLIDE 16

100 bagged trees

  • Shades of blue/red indicate strength of vote for particular classification

slide by David Sontag

SLIDE 17

Random Forests

SLIDE 18

Random Forests

  • Ensemble method specifically designed for decision tree classifiers
  • Introduce two sources of randomness: “Bagging” and “Random input vectors”
  • Bagging method: each tree is grown using a bootstrap sample of training data
  • Random vector method: At each node, best split is chosen from a random sample of m attributes instead of all attributes

slide by David Sontag
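The two sources of randomness above map directly onto the knobs of a typical random-forest implementation. A hedged usage sketch, assuming scikit-learn is available (not part of the slides):

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_moons(n_samples=500, noise=0.3, random_state=0)  # toy 2-class data
    forest = RandomForestClassifier(
        n_estimators=100,     # number of trees in the forest
        bootstrap=True,       # each tree is grown on a bootstrap sample ("bagging")
        max_features="sqrt",  # random subset of attributes considered at each split
        random_state=0,
    )
    forest.fit(X, y)
    print(forest.score(X, y))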
SLIDE 19

Classification tree

Data in feature space

Classification tree training

[Criminisi et al., 2011]

slide by Nando de Freitas

SLIDE 20

Use information gain to decide splits

Before split    Split 1    Split 2

[Criminisi et al., 2011]

slide by Nando de Freitas

SLIDE 21

Advanced: Gaussian information gain to decide splits

Before split    Split 1    Split 2

[Criminisi et al., 2011]

slide by Nando de Freitas

SLIDE 22

[Figure: structure of a single tree: split nodes apply a weak learner at training and test time; each leaf stores a probabilistic leaf model]

[Criminisi et al., 2011]

slide by Nando de Freitas
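A short note of mine on the "probabilistic leaf model": in this family of forests (Criminisi et al., 2011), each leaf typically stores the empirical class distribution of the training points that reach it, and the forest averages these posteriors over its T trees:

    p(c \mid \text{leaf}) = \frac{n_c}{\sum_{c'} n_{c'}},
    \qquad
    p(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x)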
SLIDE 23

Alternative node decisions

Examples of weak learners:
  • axis-aligned
  • oriented line
  • conic section

slide by Nando de Freitas
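For illustration, here is a rough sketch (my own, in the spirit of Criminisi et al., 2011, with made-up parameters) of the three kinds of node tests; each one sends a feature vector x left or right:

    import numpy as np

    def axis_aligned(x, j, t):
        # Threshold a single coordinate: go left iff x[j] < t
        return x[j] < t

    def oriented_line(x, w, b):
        # Linear (oriented hyperplane) test: go left iff w.x + b < 0
        return np.dot(w, x) + b < 0

    def conic_section(x, A, b, c):
        # Quadratic (conic) test: go left iff x^T A x + b.x + c < 0
        return x @ A @ x + np.dot(b, x) + c < 0

    x = np.array([0.5, -1.0])
    print(axis_aligned(x, j=0, t=1.0))                           # True
    print(oriented_line(x, w=np.array([1.0, 2.0]), b=0.3))       # True
    print(conic_section(x, A=np.eye(2), b=np.zeros(2), c=-1.0))  # False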
SLIDE 24

Building a random tree

slide by Nando de Freitas

SLIDE 25

Random Forests algorithm

[From the book of Hastie, Friedman and Tibshirani]

slide by Nando de Freitas

SLIDE 26

Randomization

slide by Nando de Freitas

SLIDE 27

Building a forest (ensemble)

[Figure: individual trees of the forest, t = 1, 2, 3, …]

slide by Nando de Freitas

SLIDE 28

Effect of forest size

slide by Nando de Freitas

SLIDE 29

Effect of forest size

slide by Nando de Freitas

SLIDE 30

Effect of more classes and noise

[Criminisi et al., 2011]

slide by Nando de Freitas

SLIDE 31

Effect of more classes and noise

slide by Nando de Freitas

SLIDE 32

Effect of tree depth (D)

Training points: 4-class mixed

D = 3 (underfitting)    D = 6    D = 15 (overfitting)

slide by Nando de Freitas

SLIDE 33

Effect of bagging

no bagging => max-margin

slide by Nando de Freitas

SLIDE 34

Random Forests and the Kinect

slide by Nando de Freitas

SLIDE 35

Random Forests and the Kinect

depth image → body parts → 3D joint proposals

[Jamie Shotton et al., 2011]

adapted from Nando de Freitas

SLIDE 36

Random Forests and the Kinect

  • Use computer graphics to generate plenty of data

[Jamie Shotton et al., 2011]

synthetic (train & test) real (test)

adapted from Nando de Freitas

SLIDE 37

Reduce Bias² and Decrease Variance?

  • Bagging reduces variance by averaging
  • Bagging has little effect on bias
  • Can we average and reduce bias?
  • Yes: Boosting

slide by David Sontag

SLIDE 38

Next Lecture:

Boosting
