Aykut Erdem // Hacettepe University // Fall 2019
Lecture 19:
What is Ensemble Learning? Bagging Random Forests
BBM406
Fundamentals of Machine Learning
Photo by Unsplash user @nathananderson
Last time… Decision Trees
slide by David Sontag
2
slide by David Sontag
3
X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F

In our running example:
IG(X1) = H(Y) − H(Y|X1) = 0.65 − 0.33
IG(X1) > 0 ⇒ we prefer the split!
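These numbers are easy to verify. Below is a minimal Python sketch (not from the slides) that recomputes H(Y), H(Y|X1) and IG(X1) for the table above:

    import math

    # Running example from the slide: six examples with features X1, X2 and label Y.
    data = [  # (X1, X2, Y)
        (True, True,  True),
        (True, False, True),
        (True, True,  True),
        (True, False, True),
        (False, True,  True),
        (False, False, False),
    ]

    def entropy(labels):
        """Shannon entropy H(Y) in bits."""
        n = len(labels)
        h = 0.0
        for v in set(labels):
            p = sum(1 for y in labels if y == v) / n
            h -= p * math.log2(p)
        return h

    ys = [y for _, _, y in data]
    h_y = entropy(ys)  # ~0.65

    # Conditional entropy H(Y|X1): branch entropies weighted by branch size.
    h_y_x1 = sum(
        len(branch) / len(data) * entropy(branch)
        for branch in ([y for x1, _, y in data if x1 == v] for v in (True, False))
    )  # ~0.33

    print(f"IG(X1) = {h_y:.2f} - {h_y_x1:.2f} = {h_y - h_y_x1:.2f}")  # 0.32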
slide by David Sontag
Last time… Continuous features
[figure: feature axis Xj with candidate thresholds t1, t2 and class regions c1, c2; a second panel shows the class order reversed (c2, c1)]
classes matter!
4
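For a continuous feature Xj the split test is a threshold comparison, and only thresholds at class boundaries matter, hence "classes matter!". A small illustrative sketch (not the slides' code) that scans candidate midpoints and keeps the most informative one:

    import math

    def entropy(labels):
        n = len(labels)
        return -sum(
            (labels.count(v) / n) * math.log2(labels.count(v) / n)
            for v in set(labels)
        )

    def best_threshold(xs, ys):
        """Try midpoints between consecutive sorted feature values and
        return the (threshold, information gain) of the best binary split."""
        pairs = sorted(zip(xs, ys))
        base = entropy(ys)
        best = (None, -1.0)
        for (x1, _), (x2, _) in zip(pairs, pairs[1:]):
            if x1 == x2:
                continue  # no decision boundary between identical values
            t = (x1 + x2) / 2
            left = [y for x, y in pairs if x < t]
            right = [y for x, y in pairs if x >= t]
            ig = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
            if ig > best[1]:
                best = (t, ig)
        return best

    # Two classes separated at 2.5: the scan recovers t = 2.5 with IG = 1 bit.
    print(best_threshold([1.0, 2.0, 3.0, 4.0], ["c1", "c1", "c2", "c2"]))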
slide by David Sontag
Last time… Decision trees will overfit
5
slide by David Sontag
6
– Generate multiple hypotheses
– Combine them to a single classifier
– How do we generate multiple hypotheses?
– How do we combine the multiple hypotheses? (see the majority-vote sketch below)
7
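A minimal sketch of the generic recipe (toy hypotheses in plain Python, purely illustrative): generate several hypotheses, then combine them into a single classifier by majority vote.

    from collections import Counter

    # Three toy hypotheses; in practice these would be trained classifiers.
    def h1(x): return int(x[0] > 0)
    def h2(x): return int(x[1] > 0)
    def h3(x): return int(x[0] + x[1] > 0)

    def majority_vote(hypotheses, x):
        """Combine multiple hypotheses into a single prediction."""
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]

    print(majority_vote([h1, h2, h3], (0.5, -0.2)))  # two of three vote 1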
slide by Yishay Mansour
8
Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning”, 2001
Bias/Variance Tradeoff
slide by David Sontag
9
Graphical illustration of bias and variance. Source: http://scott.fortmann-roe.com/docs/BiasVariance.html
slide by David Sontag
Fighting the bias-variance tradeoff
Simple (weak) learners, e.g., decision stumps (or shallow decision trees)
– Low variance, don’t usually overfit
– High bias, can’t solve hard learning problems
10
slide by Aarti Singh
Reduce Variance Without Increasing Bias
11
Averaging reduces variance: Var(X̄) = Var(X)/N (when predictions are independent)
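The Var(X̄) = Var(X)/N claim is easy to check numerically; a small numpy simulation (sample sizes arbitrary) of averaging N independent unit-variance predictions:

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 25, 100_000

    # Each row holds N independent predictions with variance 1.
    preds = rng.normal(size=(trials, N))
    avg = preds.mean(axis=1)

    print(np.var(preds[:, 0]))  # ~1.0: variance of a single predictor
    print(np.var(avg))          # ~1/25: variance of the average of N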
slide by David Sontag
Bagging (Bootstrap Aggregating)
Bootstrap sampling: given a set D containing N training examples, create D′ by drawing N examples at random with replacement from D.
12
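The sampling step in code, as a minimal numpy sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrap_sample(D):
        """Create D' by drawing len(D) examples at random with replacement."""
        D = np.asarray(D)
        idx = rng.integers(0, len(D), size=len(D))
        return D[idx]

    D = np.arange(10)  # stand-in for N training examples
    print(bootstrap_sample(D))  # duplicates expected; some examples left out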
slide by David Sontag
Each example is selected with probability 1/N in each draw, so over N draws with replacement it appears in D′ with probability 1 − (1 − 1/N)^N ≈ 63%; some cases appear more than once while others are left out.
13
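The inclusion probability can be checked directly; note how quickly (1 − 1/N)^N approaches 1/e ≈ 0.368:

    # Probability that a fixed example is never drawn in N draws with
    # replacement is (1 - 1/N)^N -> 1/e as N grows.
    for N in (10, 100, 10_000):
        p_out = (1 - 1 / N) ** N
        print(N, round(p_out, 4), round(1 - p_out, 4))  # left out vs. included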
slide by David Sontag
14
slide by David Sontag
15
* A decision tree learning algorithm; very similar to ID3
slide by David Sontag
16
17
Random Forests
– Ensemble method specifically designed for decision tree classifiers
– Introduces two sources of randomness: “Bagging” and “Random input vectors”
– Each tree is grown using a bootstrap sample of training data
– At each node, the best split is chosen from a random sample of m attributes instead of all attributes
18
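Both sources of randomness map directly onto scikit-learn's RandomForestClassifier (bootstrap for bagging, max_features for the random attribute sample); a short sketch on synthetic data, assuming scikit-learn is installed:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=100,     # number of trees in the forest
        bootstrap=True,       # each tree sees a bootstrap sample of the data
        max_features="sqrt",  # m = sqrt(#attributes) candidates per split
        random_state=0,
    )
    forest.fit(X_tr, y_tr)
    print(forest.score(X_te, y_te))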
slide by David Sontag
19
Data in feature space
[figure: labeled points in 2D feature space, with unlabeled query points marked “?”]
Classification tree training
[Criminisi et al., 2011]
slide by Nando de Freitas
Use information gain to decide splits
20
Before split, Split 1, Split 2 [Criminisi et al., 2011]
slide by Nando de Freitas
Advanced: Gaussian information gain to decide splits
21
Before split, Split 1, Split 2
[Criminisi et al., 2011]
slide by Nando de Freitas
22
[figure: tree structure with split nodes (ι = 1, ι = 2, …) and leaf nodes]
[Criminisi et al., 2011]
Leaf model: probabilistic
Split node (train)
Node weak learner
Split node (test)
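A compact sketch of these two node types (class and attribute names are illustrative, not from Criminisi et al.): leaves store a class histogram and return it as a posterior; split nodes apply their weak learner to route a test point.

    import numpy as np

    class Leaf:
        """Probabilistic leaf model: empirical class histogram -> p(c|x)."""
        def __init__(self, labels, n_classes):
            counts = np.bincount(labels, minlength=n_classes)
            self.posterior = counts / counts.sum()

        def predict(self, x):
            return self.posterior

    class SplitNode:
        """Split node with an axis-aligned weak learner x[feature] < threshold;
        at test time the point is routed to one child."""
        def __init__(self, feature, threshold, left, right):
            self.feature, self.threshold = feature, threshold
            self.left, self.right = left, right

        def predict(self, x):
            child = self.left if x[self.feature] < self.threshold else self.right
            return child.predict(x)

    # Tiny hand-built tree: one split, two probabilistic leaves.
    tree = SplitNode(0, 0.5,
                     Leaf(np.array([0, 0, 1]), n_classes=2),
                     Leaf(np.array([1, 1]), n_classes=2))
    print(tree.predict(np.array([0.2])))  # [0.667 0.333]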
slide by Nando de Freitas
23
Examples of weak learners: axis aligned, conic section
slide by Nando de Freitas
24
Building a random tree
slide by Nando de Freitas
25
[From the book of Hastie, Friedman and Tibshirani]
slide by Nando de Freitas
26
slide by Nando de Freitas
27
[figure: forest of trees, t = 1, t = 2, t = 3, …]
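At test time the forest combines its trees by averaging their leaf posteriors, p(c|x) = (1/T) Σₜ pₜ(c|x); a short sketch with stubbed trees standing in for trained ones:

    import numpy as np

    def forest_posterior(trees, x):
        """p(c|x) = (1/T) * sum_t p_t(c|x), averaged over the T trees."""
        return np.mean([t(x) for t in trees], axis=0)

    # Stub trees that each return a class posterior for x.
    t1 = lambda x: np.array([0.9, 0.1])
    t2 = lambda x: np.array([0.6, 0.4])
    t3 = lambda x: np.array([0.7, 0.3])
    print(forest_posterior([t1, t2, t3], None))  # [0.733 0.267]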
slide by Nando de Freitas
28
slide by Nando de Freitas
29
slide by Nando de Freitas
Effect of more classes and noise
30
Effect of more classes and noise
[Criminisi et al., 2011]
slide by Nando de Freitas
Effect of more classes and noise
31
slide by Nando de Freitas
32
Training points: 4-class mixed
Max tree depth: D = 3 (underfitting), D = 6, D = 15 (overfitting)
slide by Nando de Freitas
33
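The depth sweep is easy to reproduce; a quick scikit-learn experiment on two-moons data (dataset and sizes arbitrary) comparing train vs. test accuracy as D grows:

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (3, 6, 15):  # shallow -> underfit, very deep -> overfit
        rf = RandomForestClassifier(max_depth=depth, random_state=0)
        rf.fit(X_tr, y_tr)
        print(depth, rf.score(X_tr, y_tr), rf.score(X_te, y_te))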
no bagging => max-margin
slide by Nando de Freitas
34
slide by Nando de Freitas
35
depth image → body parts → 3D joint proposals
[Jamie Shotton et al., 2011]
adapted from Nando de Freitas
36
[Jamie Shotton et al., 2011]
synthetic (train & test) vs. real (test)
adapted from Nando de Freitas
Reduce Bias² and Decrease Variance?
37
slide by David Sontag
Boosting
38