Learning Optimal Linear Regularizers
Matthew Streeter
Setup
- Want to produce a model θ
- Will minimize training loss + regularizer: Ltrain(θ) + R(θ)
- Ultimately, we care about test loss: Ltest(θ)
- An optimal regularizer: R(θ) = Ltest(θ) - Ltrain(θ), since then Ltrain(θ) + R(θ) = Ltest(θ)
○ suggests that a good regularizer should upper bound the generalization gap
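To make the "optimal regularizer" identity concrete, here is a tiny numeric sketch (the model names and losses are made up): with R(θ) = Ltest(θ) - Ltrain(θ), the regularized training objective is exactly the test loss, so minimizing it selects the best-generalizing model.

```python
# Toy sketch with made-up per-model losses for three candidate models.
L_train = {"A": 0.10, "B": 0.30, "C": 0.25}
L_test  = {"A": 0.90, "B": 0.40, "C": 0.35}

def R(theta):
    """Oracle regularizer: the generalization gap of model theta."""
    return L_test[theta] - L_train[theta]

# Regularized training objective Ltrain + R, which telescopes to Ltest.
objective = {m: L_train[m] + R(m) for m in L_train}
best = min(objective, key=objective.get)  # model with the lowest test loss
```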
What makes a good regularizer?
- Want to find the regularizer R that minimizes Ltest(θR), where θR minimizes Ltrain(θ) + R(θ)
- Approximate by maximizing over a small set of models (estimating test loss using the validation set)
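One way to picture the approximation above, with made-up numbers (the model names, losses, and the single regularizer feature are all hypothetical): score each candidate regularization weight by which model it selects from a small pool of already-trained models, using validation loss in place of test loss.

```python
# Sketch with made-up numbers: models already trained, with known train
# loss, validation loss (proxy for test loss), and one regularizer
# feature (say, ||θ||²).  All names and values here are hypothetical.
models = [
    # (name, train_loss, val_loss, feature)
    ("small",  0.40, 0.45, 1.0),
    ("medium", 0.20, 0.30, 4.0),
    ("large",  0.05, 0.50, 9.0),
]

def pick(lam):
    """Model minimizing the regularized training objective for weight lam."""
    return min(models, key=lambda m: m[1] + lam * m[3])

# Keep the candidate weight whose selected model validates best.
candidates = [0.0, 0.01, 0.05, 0.1, 0.5]
best_lam = min(candidates, key=lambda lam: pick(lam)[2])
```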
Learning linear regularizers
- Linear regularizer: R(θ) = λ * feature_vector(θ)
- LearnReg: given models with known training & validation loss, finds the best λ (in terms of the approximation on the previous slide)
  ○ Solves a sequence of linear programs
  ○ Under certain assumptions, can “jump” to the optimal λ given data from just 1 + |λ| models
- TuneReg: uses LearnReg iteratively to do hyperparameter tuning
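A much-simplified sketch of the linear-programming idea (not the paper's exact formulation, and all the numeric data is hypothetical): find λ ≥ 0 such that the model with the best validation loss also (nearly) minimizes Ltrain(θ) + λ·f(θ) among the models evaluated so far, with a slack variable s measuring how badly that fails.

```python
# Simplified LP sketch (NOT LearnReg's exact formulation): choose λ ≥ 0
# so the best-validating model (nearly) minimizes the regularized
# training objective over all evaluated models; minimize the slack s.
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-model data: train loss, validation loss, features f(θ).
train = np.array([0.40, 0.20, 0.05])
val   = np.array([0.45, 0.30, 0.50])
feats = np.array([[1.0], [4.0], [9.0]])  # one feature per model, e.g. ||θ||²

best = int(np.argmin(val))   # the model we want λ to select
k = feats.shape[1]

# Decision variables x = (λ₁, …, λ_k, s); objective: minimize s.
c = np.zeros(k + 1)
c[-1] = 1.0
# For every model j: (f(best) - f(j))·λ - s ≤ Ltrain(j) - Ltrain(best)
A_ub = np.hstack([feats[best] - feats, -np.ones((len(train), 1))])
b_ub = train - train[best]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (k + 1))
lam = res.x[:k]  # learned regularization weights
```

With this toy data the slack comes out to zero: there is a whole interval of λ values under which the best-validating model wins outright.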
Hyperparameter tuning experiment
- Inception-v3 transfer learning problem, linear combination of 4 regularizers
[Plot: tuning progress over trials; annotation: “LearnReg kicks in here”]
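Per the earlier slide, the tuning procedure measured here applies LearnReg iteratively. A minimal hedged sketch of such a loop (the function names and stub interfaces below are assumptions, not the paper's API):

```python
def tune_reg(train_model, learn_reg, lam0, rounds=5):
    """Hypothetical TuneReg-style loop: alternate between training a model
    with the current regularization weights and re-fitting the weights
    from all (train_loss, val_loss, features) records gathered so far."""
    lam, history = lam0, []
    for _ in range(rounds):
        history.append(train_model(lam))  # train & evaluate with current lam
        lam = learn_reg(history)          # re-fit lam (e.g. via the LPs)
    best = min(history, key=lambda rec: rec[1])  # lowest validation loss
    return lam, best
```

Here `train_model(lam)` would train a model and report its training loss, validation loss, and regularizer features, and `learn_reg` would solve the LearnReg optimization over the accumulated records.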