Learning Optimal Linear Regularizers
Matthew Streeter
Setup
- Want to produce a model θ
- Will minimize training loss + regularizer: Ltrain(θ) + R(θ)
- Ultimately, we care about test loss: Ltest(θ)
- An optimal regularizer: R(θ) = Ltest(θ) - Ltrain(θ), since then Ltrain(θ) + R(θ) = Ltest(θ)
○ suggests that a good regularizer should upper bound the generalization gap
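To make the "optimal regularizer" identity concrete, here is a tiny numeric sketch (the model names and losses are made up): with R(θ) = Ltest(θ) - Ltrain(θ), the regularized training objective is exactly the test loss, so minimizing it selects the best-generalizing model.

```python
# Toy sketch with made-up per-model losses for three candidate models.
L_train = {"A": 0.10, "B": 0.30, "C": 0.25}
L_test  = {"A": 0.90, "B": 0.40, "C": 0.35}

def R(theta):
    """Oracle regularizer: the generalization gap of model theta."""
    return L_test[theta] - L_train[theta]

# Regularized training objective Ltrain + R, which telescopes to Ltest.
objective = {m: L_train[m] + R(m) for m in L_train}
best = min(objective, key=objective.get)  # model with the lowest test loss
```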
What makes a good regularizer?
- Want to find the regularizer R that minimizes Ltest(θR), where θR minimizes Ltrain(θ) + R(θ)
- Approximate by maximizing over a small set of models (estimating test loss using the validation set)
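One way to picture the approximation above, with made-up numbers (the model names, losses, and the single regularizer feature are all hypothetical): score each candidate regularization weight by which model it selects from a small pool of already-trained models, using validation loss in place of test loss.

```python
# Sketch with made-up numbers: models already trained, with known train
# loss, validation loss (proxy for test loss), and one regularizer
# feature (say, ||θ||²).  All names and values here are hypothetical.
models = [
    # (name, train_loss, val_loss, feature)
    ("small",  0.40, 0.45, 1.0),
    ("medium", 0.20, 0.30, 4.0),
    ("large",  0.05, 0.50, 9.0),
]

def pick(lam):
    """Model minimizing the regularized training objective for weight lam."""
    return min(models, key=lambda m: m[1] + lam * m[3])

# Keep the candidate weight whose selected model validates best.
candidates = [0.0, 0.01, 0.05, 0.1, 0.5]
best_lam = min(candidates, key=lambda lam: pick(lam)[2])
```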
Learning linear regularizers
- Linear regularizer: R(θ) = λ * feature_vector(θ)
- LearnReg: given models with known training & validation loss, finds the best λ (in terms of the approximation on the previous slide)
  ○ Solves a sequence of linear programs
  ○ Under certain assumptions, can “jump” to the optimal λ given data from just 1 + |λ| models
- TuneReg: uses LearnReg iteratively to do hyperparameter tuning
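A much-simplified sketch of the linear-programming idea (not the paper's exact formulation, and all the numeric data is hypothetical): find λ ≥ 0 such that the model with the best validation loss also (nearly) minimizes Ltrain(θ) + λ·f(θ) among the models evaluated so far, with a slack variable s measuring how badly that fails.

```python
# Simplified LP sketch (NOT LearnReg's exact formulation): choose λ ≥ 0
# so the best-validating model (nearly) minimizes the regularized
# training objective over all evaluated models; minimize the slack s.
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-model data: train loss, validation loss, features f(θ).
train = np.array([0.40, 0.20, 0.05])
val   = np.array([0.45, 0.30, 0.50])
feats = np.array([[1.0], [4.0], [9.0]])  # one feature per model, e.g. ||θ||²

best = int(np.argmin(val))   # the model we want λ to select
k = feats.shape[1]

# Decision variables x = (λ₁, …, λ_k, s); objective: minimize s.
c = np.zeros(k + 1)
c[-1] = 1.0
# For every model j: (f(best) - f(j))·λ - s ≤ Ltrain(j) - Ltrain(best)
A_ub = np.hstack([feats[best] - feats, -np.ones((len(train), 1))])
b_ub = train - train[best]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (k + 1))
lam = res.x[:k]  # learned regularization weights
```

With this toy data the slack comes out to zero: there is a whole interval of λ values under which the best-validating model wins outright.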
Hyperparameter tuning experiment
- Inception-v3 transfer learning problem, linear combination of 4 regularizers
[Plot: tuning progress over trials; annotation: “LearnReg kicks in here”]
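Per the earlier slide, the tuning procedure measured here applies LearnReg iteratively. A minimal hedged sketch of such a loop (the function names and stub interfaces below are assumptions, not the paper's API):

```python
def tune_reg(train_model, learn_reg, lam0, rounds=5):
    """Hypothetical TuneReg-style loop: alternate between training a model
    with the current regularization weights and re-fitting the weights
    from all (train_loss, val_loss, features) records gathered so far."""
    lam, history = lam0, []
    for _ in range(rounds):
        history.append(train_model(lam))  # train & evaluate with current lam
        lam = learn_reg(history)          # re-fit lam (e.g. via the LPs)
    best = min(history, key=lambda rec: rec[1])  # lowest validation loss
    return lam, best
```

Here `train_model(lam)` would train a model and report its training loss, validation loss, and regularizer features, and `learn_reg` would solve the LearnReg optimization over the accumulated records.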