SLIDE 1

Carnegie Mellon University 10-701 Machine Learning Spring 2013

Bias Variance Trade-off

Intuition:

If the model is too simple, the solution is biased and does not fit the data. If the model is too complex, it is very sensitive to small changes in the data.

2/12/13 1 Introduction to Probability Theory

SLIDE 2

Bias

If you sample a dataset D multiple times, you expect to learn a different h(x). The expected hypothesis is E_D[h(x)]. Bias: the difference between the truth and what you expect to learn.

  • Decreases with more complex models

2/12/13 2 Recitation 1: Statistics Intro

$$\mathrm{bias}^2 = \int_x \left\{ E_D[h(x)] - t(x) \right\}^2 p(x)\,dx$$
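The slides give no code, but the squared-bias integral can be approximated by Monte Carlo: average the learned hypothesis over many resampled datasets, then compare that average to the truth. The setup below (truth t(x) = x², uniform p(x), Gaussian noise, and a deliberately simple constant-fit hypothesis class) is an illustrative assumption, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):                          # t(x): the (assumed) target function
    return x ** 2

grid = np.linspace(0, 1, 200)          # p(x) ~ Uniform(0, 1)
preds = []
for _ in range(500):                   # resample the dataset D many times
    x = rng.uniform(0, 1, 20)
    y = truth(x) + rng.normal(0, 0.3, 20)
    preds.append(np.full_like(grid, y.mean()))   # h(x): best constant fit

h_bar = np.mean(preds, axis=0)                    # E_D[h(x)]
bias_sq = np.mean((h_bar - truth(grid)) ** 2)     # ≈ ∫ {E_D[h(x)] - t(x)}^2 p(x) dx
print(bias_sq)
```

For this setup the analytic value is ∫₀¹ (1/3 − x²)² dx = 4/45 ≈ 0.089, which the estimate should approach: a constant model is too simple to fit the quadratic, so the bias stays large no matter how much data each sample contains.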

SLIDE 3

Variance

Variance: difference between what you learn from a particular dataset and what you expect to learn

  • Decreases with simpler models


$$\mathrm{variance} = \int_x E_D\!\left[ \left( h(x) - \bar{h}(x) \right)^2 \right] p(x)\,dx, \qquad \bar{h}(x) = E_D[h(x)]$$
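The variance integral can be estimated the same way: resample datasets, record each learned h(x), and measure the spread around the average hypothesis. A rough sketch, with the same illustrative assumptions as above (quadratic truth, uniform inputs, Gaussian noise), comparing a constant fit against a degree-9 polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):                          # assumed target, not from the slides
    return x ** 2

def model_variance(degree, n_datasets=300, n_points=20):
    grid = np.linspace(0, 1, 200)      # p(x) ~ Uniform(0, 1)
    preds = []
    for _ in range(n_datasets):        # resample the dataset D
        x = rng.uniform(0, 1, n_points)
        y = truth(x) + rng.normal(0, 0.3, n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), grid))
    preds = np.array(preds)
    h_bar = preds.mean(axis=0)                  # bar{h}(x) = E_D[h(x)]
    return np.mean((preds - h_bar) ** 2)        # ≈ ∫ E_D[(h - bar{h})^2] p(x) dx

v_simple = model_variance(0)    # constant model: low variance
v_complex = model_variance(9)   # degree-9 polynomial: high variance
print(v_simple < v_complex)
```

The simple model barely changes from dataset to dataset, while the flexible polynomial chases the noise in each sample — exactly the "decreases with simpler models" claim above.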

SLIDE 4

Bias-Variance Tradeoff

The choice of hypothesis class introduces a learning bias.

More complex class: less bias and more variance.

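The two effects can be watched together by sweeping the polynomial degree in one simulation. The setup below is a hedged sketch with the same assumed ingredients (truth t(x) = x², so bias should vanish once the class contains degree 2, while variance keeps growing):

```python
import numpy as np

rng = np.random.default_rng(2)

def truth(x):                          # assumed quadratic target
    return x ** 2

def bias_variance(degree, n_datasets=300, n_points=20):
    grid = np.linspace(0, 1, 200)
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = truth(x) + rng.normal(0, 0.3, n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), grid))
    preds = np.array(preds)
    h_bar = preds.mean(axis=0)                       # E_D[h(x)]
    bias_sq = np.mean((h_bar - truth(grid)) ** 2)
    variance = np.mean((preds - h_bar) ** 2)
    return bias_sq, variance

results = {d: bias_variance(d) for d in (0, 1, 2, 5)}
for d, (b, v) in sorted(results.items()):
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Bias drops sharply once the class is rich enough to contain the truth (degree ≥ 2 here), while variance rises monotonically with the number of fitted parameters.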

SLIDE 5

Training error

Given a dataset, choose a loss function (for example, L2 for regression).

Training set error:


$$\mathrm{error}_{\mathrm{train}} = \frac{1}{N_{\mathrm{train}}} \sum_{i=1}^{N_{\mathrm{train}}} \left( y_i - w \cdot x_i \right)^2 \quad \text{(L2 loss)}$$

$$\mathrm{error}_{\mathrm{train}} = \frac{1}{N_{\mathrm{train}}} \sum_{i=1}^{N_{\mathrm{train}}} I\!\left( y_i \neq h(x_i) \right) \quad \text{(0/1 loss)}$$
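Both formulas are one-liners in code. The data, the weight w, and the toy threshold classifier below are illustrative assumptions, just to show the two losses side by side:

```python
import numpy as np

# Toy data (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])

# L2 training error for an assumed linear model y ≈ w * x
w = 1.0
err_l2 = np.mean((y - w * x) ** 2)              # (1/N) Σ (y_i - w·x_i)^2

# 0/1 training error for an arbitrary toy classifier h
labels = np.array([0, 1, 1, 0])
h = lambda xi: int(xi > 1.5)                    # hypothetical classifier
err_01 = np.mean([labels[i] != h(x[i]) for i in range(len(x))])

print(err_l2, err_01)   # → 0.025 0.5
```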

SLIDE 6

Training error as a function of complexity


SLIDE 7

Prediction error

Training error is not necessarily a good measure. We care about the error over all input points:


$$\mathrm{error}_{\mathrm{true}} = E_x\!\left[ I\!\left( y \neq h(x) \right) \right]$$

SLIDE 8

Prediction error as a function of complexity


SLIDE 9

Prediction error

Training error is not necessarily a good measure. We care about the error over all input points. Training error is an optimistically biased estimate of prediction error: you optimized with respect to the training set.


$$\mathrm{error}_{\mathrm{true}} = E_x\!\left[ I\!\left( y \neq h(x) \right) \right]$$
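The optimism is easy to reproduce in a sketch (the sine truth, noise level, and polynomial degree below are assumptions, not from the slides): fit a flexible model on a small sample, then score it on both that sample and a large fresh sample from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

def truth(x):                           # assumed target function
    return np.sin(2 * np.pi * x)

# Fit a flexible polynomial to a small noisy training sample
x = rng.uniform(0, 1, 15)
y = truth(x) + rng.normal(0, 0.2, 15)
coef = np.polyfit(x, y, 10)

err_train = np.mean((y - np.polyval(coef, x)) ** 2)

# Error on a large fresh sample approximates the true (prediction) error
x_new = rng.uniform(0, 1, 10000)
y_new = truth(x_new) + rng.normal(0, 0.2, 10000)
err_new = np.mean((y_new - np.polyval(coef, x_new)) ** 2)

print(err_train < err_new)
```

The training error is small because the fit was optimized on exactly those 15 points; on fresh inputs the same model pays for both the irreducible noise and its own wiggles.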

SLIDE 10

Train-test

In practice:

Randomly divide the dataset into training and test sets. Use the training data to optimize parameters. Test error:


$$\mathrm{error}_{\mathrm{test}} = \frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} I\!\left( y_i \neq h(x_i) \right)$$
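A minimal sketch of the whole recipe, under assumed data (a 1-D threshold classification problem) and an assumed learner (midpoint of the two training class means):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed synthetic data: label is 1 when x > 0.1
n = 100
x = rng.uniform(-1, 1, n)
y = (x > 0.1).astype(int)

# Randomly divide into train (70) and test (30)
idx = rng.permutation(n)
train, test = idx[:70], idx[70:]

# "Learn" a threshold on the training split only:
# midpoint between the two class means (a toy learner, for illustration)
thr = (x[train][y[train] == 0].mean() + x[train][y[train] == 1].mean()) / 2
h = lambda xs: (xs > thr).astype(int)

# Test error: (1/N_test) Σ I(y_i ≠ h(x_i)) over the held-out points
err_test = np.mean(h(x[test]) != y[test])
print(err_test)
```

The key discipline is that `thr` is computed from `train` indices only; the `test` points are touched exactly once, to report the error.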

SLIDE 11

Test error as a function of complexity


SLIDE 12

Overfitting

Overfitting happens when we obtain a model h for which there exists another model h′ such that:


$$\left[ \mathrm{error}_{\mathrm{train}}(h) < \mathrm{error}_{\mathrm{train}}(h') \right] \wedge \left[ \mathrm{error}_{\mathrm{true}}(h) > \mathrm{error}_{\mathrm{true}}(h') \right]$$
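The definition can be exhibited with one assumed example pair: h a degree-11 polynomial that interpolates 12 noisy points (near-zero training error), h′ a modest cubic. The sine truth and the grid average standing in for the expectation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def truth(x):                          # assumed target function
    return np.sin(2 * np.pi * x)

x = rng.uniform(0, 1, 12)
y = truth(x) + rng.normal(0, 0.2, 12)

grid = np.linspace(0, 1, 1000)         # dense grid ≈ E_x in error_true

def train_and_true(degree):
    coef = np.polyfit(x, y, degree)
    err_train = np.mean((y - np.polyval(coef, x)) ** 2)
    err_true = np.mean((truth(grid) - np.polyval(coef, grid)) ** 2)
    return err_train, err_true

tr_h, true_h = train_and_true(11)      # h: interpolates all 12 points
tr_hp, true_hp = train_and_true(3)     # h': a much simpler cubic
print(tr_h < tr_hp, true_h > true_hp)
```

Both conjuncts of the definition hold at once: h wins on the training points it memorized, h′ wins everywhere else.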

SLIDE 13

Error as a function of data size for fixed complexity


SLIDE 14

Careful

The test error is only an unbiased estimate of prediction error if you never do any learning on the test set (including parameter selection!).

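A hedged illustration of why (fully synthetic, not from the slides): the labels below are pure coin flips, so every classifier's true error is 0.5 — yet selecting among 1000 candidate models *by their test error* reports an error far below 0.5.

```python
import numpy as np

rng = np.random.default_rng(3)

# Labels are pure coin flips: NO classifier can truly beat error 0.5.
n_test = 50
y_test = rng.integers(0, 2, n_test)

best = 1.0
for _ in range(1000):
    h_pred = rng.integers(0, 2, n_test)     # one candidate model's test predictions
    best = min(best, np.mean(h_pred != y_test))

print(best)
```

Once the test set was used to pick the winner, `best` is a learning outcome, not an unbiased estimate — the selection step is itself a form of fitting. The honest protocol is to select on a separate validation split and touch the test set exactly once.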