Machine Learning (CSE 446): Concepts & the “i.i.d.” Supervised Learning Paradigm
Sham M Kakade
© 2018 University of Washington cse446-staff@cs.washington.edu
Review

Decision Tree: Making a Prediction

[Figure: a decision tree. The root node (with counts n:p) tests ϕ1; internal nodes test ϕ2, ϕ3, and ϕ4; each node carries its negative/positive counts, e.g. n101:p101, and leaves output a label.]
◮ We often have a “hypothesis class” F, from which our algorithm chooses f̂ ∈ F.
◮ Use the test set, i.i.d. data sampled from D, to estimate the expected error.
◮ Use a “development set”, also i.i.d. from D, for hyperparameter tuning.
◮ For binary classification, where y ∈ {0, 1}, we often use the zero/one loss: ℓ(y, ŷ) = 1[y ≠ ŷ].
◮ For multi-class classification, where y is one of k outcomes, the zero/one loss applies in the same way.
◮ For regression, where y ∈ R, we often use the square loss: ℓ(y, ŷ) = (y − ŷ)².
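A minimal Python sketch of these per-example losses (the zero/one loss for classification and the square loss for regression); the function names are my own, not course code:

```python
def zero_one_loss(y, y_hat):
    """Classification (binary or k-class): 1 if the prediction is wrong, else 0."""
    return float(y != y_hat)

def square_loss(y, y_hat):
    """Regression: squared difference between target and prediction."""
    return (y - y_hat) ** 2

print(zero_one_loss(0, 1))    # wrong label costs 1.0
print(zero_one_loss(2, 2))    # correct k-class label costs 0.0
print(square_loss(3.0, 2.5))  # 0.25
```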
◮ For a given f, we just need a training set to estimate the bias of a coin (for binary classification).
◮ BUT there is a (“very small”) chance this approximation fails (even for “large N”).
◮ Try enough hypotheses and, by chance alone, one will look good.
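The “by chance alone” point is easy to simulate: give every hypothesis a true error of exactly 1/2 (random guessing) and watch the best-looking empirical error on N samples fall well below 1/2 once enough hypotheses are tried. A toy sketch, not course code:

```python
import random

random.seed(0)
N = 50    # training-set size
H = 1000  # number of (useless) hypotheses we try
# Each hypothesis guesses labels at random, so its true error is exactly 0.5;
# its empirical (training-set) error is the mean of N fair coin flips.
empirical_errors = [
    sum(random.random() < 0.5 for _ in range(N)) / N for _ in range(H)
]
print(min(empirical_errors))  # far below 0.5: the "winner" looks good by luck alone
```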
◮ It is never true (in almost all cases) that the training error of f̂ equals its true (expected) error.
◮ It is usually a gross underestimate.
◮ We want both: our training error to be small, and our generalization error to be small.
◮ It is usually easy to get one of these two to be small.
◮ Overfitting: this is the fundamental problem of ML.
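Overfitting can be demonstrated numerically: fit a model with enough capacity to memorize a small noisy training set, and the training error collapses while the error on fresh i.i.d. data does not. A sketch, with NumPy polynomial fitting standing in for a generic hypothesis class:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # True relationship is linear; labels carry Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    return x, 2.0 * x + rng.normal(0.0, 0.3, n)

x_train, y_train = sample(10)
x_test, y_test = sample(200)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # small hypothesis class
complex_ = np.polyfit(x_train, y_train, 9)  # enough freedom to memorize 10 points

train_simple, test_simple = mse(simple, x_train, y_train), mse(simple, x_test, y_test)
train_complex, test_complex = mse(complex_, x_train, y_train), mse(complex_, x_test, y_test)
print(train_simple, test_simple)    # both moderate: close to the noise level
print(train_complex, test_complex)  # tiny training error, much worse error on fresh data
```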
◮ Use the test set, i.i.d. data sampled from D, to estimate the expected error.
◮ We get an unbiased estimate of the true error (and an accurate one for “reasonable” N).
◮ We should never use the test set during training, as this violates the approximation argument.
◮ Use a dev set, i.i.d. from D, for hyperparameter tuning.
◮ Learn with the training set (using different hyperparameters); then check on your dev set.
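The train/dev/test protocol above can be sketched end to end. Here the “hyperparameter” is just a decision threshold on an invented one-feature problem, and the learning step is trivial, so the sketch isolates the protocol itself: tune on dev, touch test once at the end.

```python
import random

random.seed(0)

def sample(n):
    # Toy i.i.d. draws from D: y = 1 when x > 0.5, with 10% label noise.
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

train, dev, test = sample(200), sample(100), sample(100)  # three disjoint i.i.d. splits

def error(threshold, data):
    return sum(int(x > threshold) != y for x, y in data) / len(data)

candidates = [0.1, 0.3, 0.5, 0.7, 0.9]               # hyperparameter grid
best = min(candidates, key=lambda t: error(t, dev))  # tune on dev, never on test
print(best, error(best, test))  # the test set is used exactly once, at the very end
```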