Learning Optimal Linear Regularizers (Matthew Streeter): slide presentation



SLIDE 1

Matthew Streeter

Learning Optimal Linear Regularizers

SLIDE 3

Setup

  • Want to produce a model θ
  • Will minimize training loss + regularizer: Ltrain(θ) + R(θ)
  • Ultimately, we care about test loss: Ltest(θ)
  • An optimal regularizer: R(θ) = Ltest(θ) - Ltrain(θ), because then Ltrain(θ) + R(θ) = Ltest(θ)

    ○ this suggests that a good regularizer should upper bound the generalization gap
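
A minimal sketch of the point above, using made-up one-dimensional quadratic losses (the functions `l_train`, `l_test`, and the grid are illustrative assumptions, not from the talk). With R(θ) = Ltest(θ) - Ltrain(θ), minimizing the regularized training objective picks the same θ as minimizing the test loss directly:

```python
# Toy illustration only: both losses are hypothetical quadratics.

def l_train(theta):
    # Hypothetical training loss, minimized at theta = 2.0
    return (theta - 2.0) ** 2

def l_test(theta):
    # Hypothetical test loss, minimized at theta = 1.5
    return (theta - 1.5) ** 2 + 0.1

def optimal_regularizer(theta):
    # R(theta) = L_test(theta) - L_train(theta), i.e. the generalization gap
    return l_test(theta) - l_train(theta)

def argmin_on_grid(objective, grid):
    # Brute-force minimizer over a finite grid of candidate models
    return min(grid, key=objective)

grid = [i / 100 for i in range(0, 401)]  # theta in [0, 4], step 0.01

theta_unreg = argmin_on_grid(l_train, grid)
theta_reg = argmin_on_grid(lambda t: l_train(t) + optimal_regularizer(t), grid)
theta_best = argmin_on_grid(l_test, grid)

print(theta_unreg, theta_reg, theta_best)
```

In practice Ltest is unknown at training time, which is why the slides turn to learning a regularizer from validation data.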

SLIDE 7

What makes a good regularizer?

  • Want to find regularizer R that minimizes Ltest(θR), where θR is the model obtained by minimizing Ltrain(θ) + R(θ)
  • Approximate by maximizing over a small set of models (estimating test loss using a validation set)
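
One way to read this approximation, sketched under stated assumptions: restrict attention to a small finite set of already-trained models, let each candidate regularizer select the model minimizing training loss + R from that set, and score the regularizer by the selected model's validation loss. The `TrainedModel` record, the `l2_norm_sq` feature, and all numbers below are hypothetical.

```python
from typing import Callable, NamedTuple, Sequence

class TrainedModel(NamedTuple):
    name: str
    train_loss: float
    val_loss: float      # stand-in for the unknown test loss
    l2_norm_sq: float    # hypothetical quantity a regularizer may penalize

Regularizer = Callable[[TrainedModel], float]

def selected_model(models: Sequence[TrainedModel], reg: Regularizer) -> TrainedModel:
    # The model that minimizes train loss + R, restricted to the finite set
    return min(models, key=lambda m: m.train_loss + reg(m))

def score_regularizer(models: Sequence[TrainedModel], reg: Regularizer) -> float:
    # Lower is better: validation loss of the model the regularizer selects
    return selected_model(models, reg).val_loss

models = [
    TrainedModel("a", 0.10, 0.40, 9.0),
    TrainedModel("b", 0.20, 0.25, 4.0),
    TrainedModel("c", 0.35, 0.38, 1.0),
]

# Candidate regularizers of the form R = lam * l2_norm_sq
candidates = {lam: (lambda m, lam=lam: lam * m.l2_norm_sq) for lam in (0.0, 0.03, 0.1)}
scores = {lam: score_regularizer(models, reg) for lam, reg in candidates.items()}
best_lam = min(scores, key=scores.get)

print(scores, best_lam)
```

Here the unregularized choice (lam = 0) selects the model with the lowest training loss but the worst validation loss, while an intermediate lam selects the model that validates best.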

SLIDE 12

Learning linear regularizers

  • Linear regularizer: R(θ) = λ * feature_vector(θ)
  • LearnReg: given models with known training & validation loss, finds best λ (in terms of approximation on previous slide)

    ○ Solves a sequence of linear programs
    ○ Under certain assumptions, can “jump” to optimal λ given data from just 1 + |λ| models

  • TuneReg: uses LearnReg iteratively to do hyperparameter tuning
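
The slides do not spell out the linear programs, so the following is only a simplified sketch, not LearnReg itself. It assumes a single scalar λ ≥ 0 and a single feature per model, in which case the constraints "this model minimizes train loss + λ * feature over the set" are linear in λ and reduce to intersecting intervals. The function names and data are hypothetical; a vector-valued λ would need a general LP solver.

```python
import math

def lambda_interval(target, pairs):
    """Return (lo, hi) such that for every lambda in [lo, hi], lambda >= 0,
    `target` minimizes train + lambda * feature over `pairs`; None if empty.
    `target` and each element of `pairs` are (train_loss, feature) tuples."""
    lo, hi = 0.0, math.inf
    t_train, t_feat = target
    for train, feat in pairs:
        # Need: t_train + lam * t_feat <= train + lam * feat
        # i.e.  lam * (t_feat - feat) <= train - t_train
        coeff = t_feat - feat
        rhs = train - t_train
        if coeff > 0:
            hi = min(hi, rhs / coeff)      # upper bound on lambda
        elif coeff < 0:
            lo = max(lo, rhs / coeff)      # lower bound on lambda
        elif rhs < 0:
            return None                    # same feature, lower train loss elsewhere
    return (lo, hi) if lo <= hi else None

def learn_scalar_lambda(records):
    """records: list of (train_loss, val_loss, feature).
    Tries models in order of increasing validation loss and returns
    (lambda, val_loss) for the first one some lambda >= 0 can select."""
    pairs = [(tr, f) for tr, _, f in records]
    for tr, val, f in sorted(records, key=lambda r: r[1]):
        interval = lambda_interval((tr, f), pairs)
        if interval is not None:
            lo, hi = interval
            lam = lo if math.isinf(hi) else (lo + hi) / 2
            return lam, val
    return None  # only reached when `records` is empty

records = [(0.10, 0.40, 9.0), (0.20, 0.25, 4.0), (0.35, 0.38, 1.0)]
lam, val = learn_scalar_lambda(records)
print(lam, val)
```

Picking the midpoint of the feasible interval is an arbitrary design choice made for this sketch; any λ in the interval selects the same model from the set.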

SLIDE 16

Hyperparameter tuning experiment

  • Inception-v3 transfer learning problem, linear combination of 4 regularizers

(Plot annotation: “LearnReg kicks in here”)