SLIDE 1

Learning Step Size Controllers for Robust Neural Network Training

Christian Daniel et al.

Recent Trends in Automated Machine Learning
Abeeha Shafiq
18.07.2019

SLIDE 2

Motivation

  • Optimizers are sensitive to the initial learning rate
  • A good learning rate is problem-specific
  • Finding it requires a manual search

Image taken from I2DL lecture slide

2 Abeeha Shafiq | Recent Trends in Automated Machine Learning

SLIDE 3

Previous Work

  • Waterfall scheme
  • Exponential/power schemes
  • TONGA (Topmoumoute Online Natural Gradient Algorithm)
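
The waterfall and exponential/power schemes are fixed schedules that set the learning rate purely as a function of the iteration count, which is exactly what an adaptive controller replaces. A minimal sketch, assuming the waterfall scheme drops the rate by a constant factor at fixed intervals and assuming one common parameterization for the exponential and power schedules; the function names and the constants `drop`, `every`, `gamma`, `tau`, and `p` are illustrative, not values from the talk:

```python
def waterfall(lr0: float, step: int, drop: float = 0.1, every: int = 1000) -> float:
    """Waterfall-style schedule: multiply the rate by `drop` every `every` steps."""
    return lr0 * drop ** (step // every)


def exponential(lr0: float, step: int, gamma: float = 0.999) -> float:
    """Exponential schedule: lr0 * gamma^step."""
    return lr0 * gamma ** step


def power(lr0: float, step: int, tau: float = 1000.0, p: float = 0.75) -> float:
    """Power schedule: lr0 * (1 + step / tau)^(-p)."""
    return lr0 * (1.0 + step / tau) ** (-p)


if __name__ == "__main__":
    # None of these schedules look at the loss: the rate depends only on `step`.
    for t in (0, 999, 1000, 5000):
        print(t, waterfall(0.1, t), exponential(0.1, t), power(0.1, t))
```

All three still need a hand-picked `lr0`, which is the sensitivity the motivation slide points out.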

SLIDE 4

Goal

Use Reinforcement Learning to develop an adaptive controller for the learning rate of training algorithms such as Stochastic Gradient Descent (SGD)
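
To show where such a controller sits in training, here is a minimal sketch: after every SGD step it reads a feature vector describing the optimization state and rescales the learning rate. The log-linear rule `rho *= exp(theta @ phi)`, the sign-of-loss-change feature, the clipping range, and the name `controlled_sgd` are hypothetical placeholders, not the rule from the paper; in the proposed method the controller parameters θ would be learned with reinforcement learning rather than fixed by hand.

```python
import numpy as np


def controlled_sgd(grad_fn, loss_fn, w0, theta, rho0=0.1, steps=100,
                   rho_min=1e-6, rho_max=1.0):
    """SGD whose learning rate is rescaled every step by a hypothetical log-linear controller.

    theta: controller parameters (learned with RL in the proposed method; fixed by hand here).
    """
    w = np.asarray(w0, dtype=float)
    rho = rho0
    prev_loss = loss_fn(w)
    for _ in range(steps):
        w = w - rho * grad_fn(w)                    # ordinary SGD step with the current rate
        loss = loss_fn(w)
        # Hypothetical features: a bias term and the sign of the last loss change.
        phi = np.array([1.0, np.sign(prev_loss - loss)])
        rho = rho * float(np.exp(theta @ phi))      # controller action: multiplicative update
        rho = min(max(rho, rho_min), rho_max)       # clip to an illustrative safe range
        prev_loss = loss
    return w, rho


if __name__ == "__main__":
    # Toy problem: f(w) = 0.5 * ||w||^2, gradient w; deliberately small initial rate.
    w, rho = controlled_sgd(lambda w: w, lambda w: 0.5 * float(w @ w),
                            np.array([3.0, -4.0]), np.array([0.0, 0.1]), rho0=0.01)
    print(w, rho)
```

With `theta = np.array([0.0, 0.1])` the rate grows by a factor e^0.1 after every step that lowers the loss and shrinks by the same factor after every step that raises it, so a poorly chosen `rho0` is corrected during training instead of by a manual search.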

SLIDE 5

Contributions

  • Identifying informative features for the controller
  • Proposing a learning setup for the controller
  • Showing that the resulting controller generalizes across different tasks and architectures

SLIDE 6

Problem Statement for the Controller

  • Find the minimizer ω∗ of F(·)
  • F(·) sums the function values induced by the individual inputs
  • T(·) is an optimization operator that yields a weight update vector to find ω∗
  • SGD weight update
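
Assuming the standard formulation these bullets describe, the objective is F(X; ω) = Σᵢ f(xᵢ; ω), the target is ω∗ = argmin_ω F(X; ω), and each iteration applies ω_{t+1} = ω_t + T(·), with plain SGD using T = −ρ∇F evaluated on a mini-batch. The sketch below picks a squared-error loss for a linear model only to have something concrete to differentiate; `per_sample_loss`, `F`, `grad_F`, `T_sgd`, and `sgd` are illustrative names.

```python
import numpy as np


def per_sample_loss(x, y, w):
    """f(x_i; w): squared error of a linear model on one input (illustrative choice)."""
    return 0.5 * (x @ w - y) ** 2


def F(X, Y, w):
    """F(X; w): sum of the function values induced by the individual inputs."""
    return sum(per_sample_loss(x, y, w) for x, y in zip(X, Y))


def grad_F(X, Y, w):
    """Gradient of F with respect to w for the linear least-squares loss."""
    return X.T @ (X @ w - Y)


def T_sgd(grad, rho):
    """SGD optimization operator: returns the weight update vector -rho * grad."""
    return -rho * grad


def sgd(X, Y, w0, rho=0.01, steps=500, batch_size=8, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)  # sample a mini-batch
        w = w + T_sgd(grad_F(X[idx], Y[idx], w), rho)             # w_{t+1} = w_t + T(.)
    return w
```

A learned controller would keep this loop unchanged and only decide the value of `rho` passed to `T_sgd` at each step.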
SLIDE 7

Learning a Controller

  • Relative Entropy Policy Search (REPS)
  • Concept similar to Proximal Policy Optimization (PPO): both limit how far each policy update can move away from the previous policy
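
The core of an episodic REPS update is short: sampled parameter vectors are reweighted by exp(Rᵢ/η), where the temperature η minimizes a convex dual so that the reweighted distribution stays within a KL bound ε of the sampling distribution, and the Gaussian policy is then refit to the weighted samples. The sketch assumes an isotropic Gaussian π(θ) over controller parameters θ and replaces the usual numerical optimizer for η with a simple grid search; `reps_weights`, `reps_update`, the toy quadratic reward, and `epsilon = 0.5` are illustrative choices, not the authors' implementation.

```python
import numpy as np


def reps_weights(returns, epsilon=0.5):
    """Normalized weights exp((R_i - max R) / eta), with eta minimizing the REPS dual
    g(eta) = eta * epsilon + eta * log(mean(exp(R_i / eta))) (grid search for simplicity)."""
    R = np.asarray(returns, dtype=float)
    R_shift = R - R.max()                           # shift for numerical stability
    etas = np.logspace(-3, 3, 600) * (R.std() + 1e-12)
    duals = [eta * epsilon + eta * np.log(np.mean(np.exp(R_shift / eta))) for eta in etas]
    eta = etas[int(np.argmin(duals))]
    w = np.exp(R_shift / eta)
    return w / w.sum()


def reps_update(mu, sigma, reward_fn, n_samples=50, epsilon=0.5, rng=None, min_sigma=1e-3):
    """One episodic REPS iteration for an isotropic Gaussian policy N(mu, sigma^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = mu + sigma * rng.standard_normal((n_samples, mu.size))  # sample controller params
    returns = np.array([reward_fn(t) for t in thetas])               # one episode per sample
    w = reps_weights(returns, epsilon)
    new_mu = w @ thetas                                              # weighted ML mean
    var = w @ np.sum((thetas - new_mu) ** 2, axis=1) / mu.size       # isotropic weighted ML variance
    return new_mu, max(float(np.sqrt(var)), min_sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([1.0, -2.0])
    mu, sigma = np.zeros(2), 1.0
    for _ in range(30):
        mu, sigma = reps_update(mu, sigma, lambda t: -float(np.sum((t - target) ** 2)), rng=rng)
    print(mu, sigma)
```

In the controller setting, `reward_fn` would run a training episode with the sampled θ and return, for example, the negative final loss; the quadratic here only stands in for that expensive evaluation.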

SLIDE 8

Features

  • Informative about the current state
  • Generalize across different tasks and architectures
  • Constrained by computation and memory limits

SLIDE 9

Features

  • Predictive change in function value
  • Disagreement of function values
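
Both features can be computed from quantities an SGD step already produces. A sketch assuming that the predictive change is the first-order Taylor estimate ∇F(ω)ᵀΔω of how much the next update changes F, and that disagreement is the variance of the per-sample function values f(xᵢ; ω) within the mini-batch; the log-compression and the function names are illustrative, not necessarily the paper's exact definitions.

```python
import numpy as np


def predictive_change(grad, delta_w):
    """First-order prediction of the change in F caused by the update delta_w: grad^T delta_w."""
    return float(np.dot(grad, delta_w))


def disagreement(per_sample_values):
    """Spread of the per-sample function values f(x_i; w) across the mini-batch (variance)."""
    return float(np.var(per_sample_values))


def features(grad, rho, per_sample_values, eps=1e-12):
    """Feature vector for one SGD step with learning rate rho (log-compressed magnitudes)."""
    delta_w = -rho * np.asarray(grad, dtype=float)   # SGD update vector
    pc = predictive_change(grad, delta_w)            # equals -rho * ||grad||^2 for SGD
    return np.array([np.log(abs(pc) + eps), np.log(disagreement(per_sample_values) + eps)])
```

Both quantities reuse the gradient and the per-sample losses already computed for the update, which keeps the features within the computation and memory limits stated above.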
SLIDE 10

Mini-Batch Setting

  • Discounted average: smooths outliers and serves as memory
  • Uncertainty estimate: an estimate of the noise in the system
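
A discounted average is an exponentially weighted moving average: a single outlier moves it by only a fraction (1 − γ) of its deviation, while the average keeps information from earlier steps. Tracking an exponentially weighted variance alongside it gives a simple running estimate of the noise. The class below is a sketch under that assumption; `DiscountedFeature` and the default γ = 0.9 are illustrative.

```python
class DiscountedFeature:
    """Exponentially discounted running mean and variance of a noisy scalar feature."""

    def __init__(self, gamma: float = 0.9):
        self.gamma = gamma   # discount: larger values mean longer memory
        self.mean = None
        self.var = 0.0

    def update(self, x: float) -> float:
        if self.mean is None:                              # first observation initializes the average
            self.mean = float(x)
            return self.mean
        alpha = 1.0 - self.gamma
        diff = x - self.mean
        incr = alpha * diff
        self.mean += incr                                  # discounted average (smooths outliers)
        self.var = self.gamma * (self.var + diff * incr)   # discounted variance (noise estimate)
        return self.mean
```

For example, feeding 1, 1, 1, 10, 1 with γ = 0.9 leaves the mean at about 1.81 rather than jumping to 10, while the variance rises to reflect the noisy observation.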
SLIDE 11

Experimental Setup

  • Datasets: MNIST, CIFAR-10
  • Learning algorithms: SGD and RMSProp
  • Model: CNN
  • For learning the controller parameters: a subset of MNIST, a small CNN architecture, and a Gaussian policy π(θ) with isotropic covariance

SLIDE 12

Results

  • Overhead of 36% for controller training
  • Generalized to different CNN variants
  • Did not generalize to different training methods

SLIDE 13

Static RMSProp vs Controlled RMSProp

SLIDE 14

Static SGD vs Controlled SGD

SLIDE 15

Discussion

Strengths:
  • Informative features
  • Not sensitive to the initial learning rate
  • Effort to generalize across tasks and architectures

Weaknesses:
  • Tested on only 2 datasets
  • CNNs only
  • Lacks comparison with learning rate decay techniques and with grid search for the initial learning rate

This approach is a precursor to techniques that learn the complete optimizer.

SLIDE 16

Questions?