Learning Step Size Controllers for Robust Neural Network Training

Aug 22, 2022

SLIDE 1

Recent Trends in Automated Machine Learning Abeeha Shafiq 18.07.2019

Learning Step Size Controllers for Robust Neural Network Training

Christian Daniel et al.

SLIDE 2

Optimizers are sensitive to initial learning rate
Good learning rate is problem specific
Manual search required

2 Abeeha Shafiq | Recent Trends in Automated Machine Learning

Motivation

Image taken from I2DL lecture slide

SLIDE 3

Waterfall scheme
Exponential/power scheme
TONGA

Previous Work

SLIDE 4

Use Reinforcement Learning to develop an adaptive controller for the learning rate used in training algorithms such as Stochastic Gradient Descent (SGD)

Goal

SLIDE 5

Identifying informative features for controller
Proposing a learning setup for a controller
Showing that the resulting controller generalizes across different tasks and architectures.

Contributions

SLIDE 6

Problem statement for controller

Find the minimizer ω∗ = argmin_ω F(X; ω)
F(·) sums over the function values induced by the individual inputs: F(X; ω) = Σ_i f(x_i; ω)
T(·) is an optimization operator which yields a weight update vector to find ω∗
SGD weight update: ω ← ω − η ∇F(X; ω), with step size η
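
The role of the step size in this setup can be illustrated with a minimal sketch. The one-dimensional quadratic loss, the data, and the names `f`, `F`, `update` and `train` are assumptions made for illustration, not taken from the slides. For simplicity the gradient is summed over all inputs; a stochastic variant would sum over a sampled mini-batch instead.

```python
# Illustrative sketch only: the quadratic loss and the data are assumptions.

def f(x, y, w):
    """Function value induced by a single input (hypothetical 1-D model)."""
    return 0.5 * (w * x - y) ** 2

def grad_f(x, y, w):
    """Gradient of f with respect to the weight w."""
    return (w * x - y) * x

def F(data, w):
    """F sums the function values over the individual inputs."""
    return sum(f(x, y, w) for x, y in data)

def update(data, w, eta):
    """Gradient-descent form of the operator T: update = -eta * grad F."""
    grad_F = sum(grad_f(x, y, w) for x, y in data)
    return -eta * grad_F

def train(data, eta, steps=50, w=0.0):
    """Apply the weight update repeatedly with a fixed step size eta."""
    for _ in range(steps):
        w = w + update(data, w, eta)
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w_good = train(data, eta=0.05)  # contracts toward the minimizer w = 2
w_bad = train(data, eta=0.2)    # same problem, larger step size: diverges
```

On this toy problem the only difference between the two runs is the fixed step size, which is the sensitivity the controller is meant to remove.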

SLIDE 7

Learning a Controller

Relative Entropy Policy Search (REPS)
Concept similar to Proximal Policy Optimization
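
The core of an episodic REPS-style update can be sketched as a reweighting of sampled controller parameters by their exponentiated returns, followed by a weighted maximum-likelihood fit of a Gaussian. This is a simplified sketch, not the presented method: full REPS obtains the temperature by optimizing a dual function under a KL bound, whereas here `temperature` is a fixed, hypothetical input. The function names are also assumptions.

```python
import math

def reps_weights(returns, temperature):
    """Normalized weights proportional to exp(R_i / temperature)."""
    r_max = max(returns)  # subtract the max for numerical stability
    w = [math.exp((r - r_max) / temperature) for r in returns]
    total = sum(w)
    return [wi / total for wi in w]

def weighted_gaussian_update(samples, weights):
    """Weighted maximum-likelihood mean and isotropic variance."""
    dim = len(samples[0])
    mean = [sum(w * s[d] for w, s in zip(weights, samples))
            for d in range(dim)]
    var = sum(w * sum((s[d] - mean[d]) ** 2 for d in range(dim))
              for w, s in zip(weights, samples)) / dim
    return mean, var

# Three sampled parameter vectors; higher return means better training run.
samples = [[0.0], [1.0], [2.0]]
returns = [0.0, 1.0, 2.0]
weights = reps_weights(returns, temperature=1.0)
new_mean, new_var = weighted_gaussian_update(samples, weights)
```

A small temperature concentrates the weights on the best samples, while a large one keeps the new distribution close to the old one, which is the role the KL bound plays in the full algorithm.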

SLIDE 8

Informative about current state
Generalize across different tasks and architectures
Constrained by computation and memory limits

Features

SLIDE 9

Predictive change in function value.

Features

Disagreement of function values.
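
The formulas for these two features were not preserved in this transcript. The sketch below assumes that both are log-variances taken across the individual inputs: one over first-order predicted changes in function value, one over the function values themselves. The function names and the small constant `EPS` are illustrative assumptions.

```python
import math

EPS = 1e-12  # assumed guard against log(0); not from the slides

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def predictive_change_feature(per_example_grads, update):
    """Log-variance of the predicted changes grad f_i . update."""
    changes = [sum(g * u for g, u in zip(grad, update))
               for grad in per_example_grads]
    return math.log(variance(changes) + EPS)

def disagreement_feature(per_example_values):
    """Log-variance of the function values f(x_i; w) across inputs."""
    return math.log(variance(per_example_values) + EPS)

# Hypothetical per-input gradients, proposed update, and function values.
grads = [[1.0, 0.0], [3.0, 0.0]]
step = [-1.0, 0.0]
phi_1 = predictive_change_feature(grads, step)
phi_2 = disagreement_feature([1.0, 3.0])
```

Both quantities need only per-input function values and gradients that training already computes, which fits the computation and memory constraint stated for the features.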

SLIDE 10

Discounted Average.
Smooths outliers
Serve as memory

Mini Batch Setting

Uncertainty Estimate
Estimate of noise in the system
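
The discounted average and the uncertainty estimate can be sketched as two exponentially discounted running statistics per feature. The discount factor `gamma`, the zero initialization, the choice to measure deviation against the previous average, and the class name `SmoothedFeature` are assumptions for illustration.

```python
class SmoothedFeature:
    """Running discounted average and uncertainty estimate of one feature."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.mean = 0.0  # discounted average (assumed to start at zero)
        self.var = 0.0   # discounted squared deviation from the average

    def update(self, value):
        """Fold one mini-batch feature value into both statistics."""
        g = self.gamma
        # Uncertainty: discounted squared deviation from the previous average.
        self.var = g * self.var + (1.0 - g) * (value - self.mean) ** 2
        # Discounted average: smooths outliers and serves as memory.
        self.mean = g * self.mean + (1.0 - g) * value
        return self.mean, self.var

tracker = SmoothedFeature(gamma=0.5)
first = tracker.update(2.0)
second = tracker.update(2.0)
```

Only two numbers are stored per feature, so the memory cost does not grow with the number of mini-batches seen.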

SLIDE 11

Datasets: MNIST, CIFAR-10
Learning Algorithms: SGD and RMSProp
Model: CNN
For learning the controller parameters:
Subset of MNIST
Small CNN architecture
π(θ) initialized to a Gaussian with isotropic covariance
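
How controller parameters might be drawn from such a distribution and turned into a step size can be sketched as follows. The exponential-linear controller form, the bias feature, and the names `sample_theta` and `step_size` are assumptions for illustration; the slides do not state the functional form of the controller.

```python
import math
import random

def sample_theta(mean, sigma, rng):
    """Draw theta from a Gaussian with isotropic covariance sigma^2 * I."""
    return [rng.gauss(m, sigma) for m in mean]

def step_size(theta, features):
    """Assumed controller: exp(theta . features), always positive."""
    return math.exp(sum(t * p for t, p in zip(theta, features)))

rng = random.Random(0)           # seeded for reproducibility
policy_mean = [0.0, 0.0, 0.0]    # hypothetical initial mean of pi(theta)
theta = sample_theta(policy_mean, sigma=0.1, rng=rng)
features = [1.0, -2.0, 0.5]      # hypothetical bias term plus two features
eta = step_size(theta, features)
```

The exponential keeps the step size positive for any θ, so the search over θ needs no additional constraint.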

Experimental Setup

SLIDE 12

Overhead of 36% for controller training
Generalized to different variants of CNN
Did not generalize to different training methods

Results

SLIDE 13

Static RMSProp vs Controlled RMSProp

SLIDE 14

Static SGD vs Controlled SGD

SLIDE 15

Strengths:
Features
Not sensitive to initial learning rate
Effort to generalize
Weaknesses:
Tested on only 2 datasets
CNNs only
Lacks comparison with:
learning rate decay techniques
grid search for the initial learning rate

This technique is a precursor to learning the complete optimizer

Discussion

SLIDE 16

Questions?