Learning Step Size Controllers for Robust Neural Network Training
Christian Daniel et al.
Recent Trends in Automated Machine Learning
Abeeha Shafiq | 18.07.2019
Motivation
• Optimizers are sensitive to the initial learning rate
• A good learning rate is problem specific
• Manual search is required
(Image taken from an I2DL lecture slide)
Previous Work
• Waterfall scheme
• Exponential/power scheme
• TONGA
Goal
Use Reinforcement Learning to develop an adaptive controller for the learning rate used in training algorithms such as Stochastic Gradient Descent (SGD).
Contributions
• Identifying informative features for the controller
• Proposing a learning setup for the controller
• Showing that the resulting controller generalizes across different tasks and architectures
Problem Statement for the Controller
• Find the minimizer ω* = argmin_ω F(ω; X)
• F(·) sums over the function values induced by the individual inputs: F(ω; X) = Σᵢ f(ω; xᵢ)
• T(·) is an optimization operator which yields a weight update vector to find ω*: ω_{k+1} = ω_k + T(·)
• SGD weight update: ω_{k+1} = ω_k − η ∇F(ω_k; X)
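The SGD weight update above can be sketched in a few lines. The toy quadratic objective and `grad_fn` are illustrative assumptions; `eta` is the step size that, in this work, a learned controller would output at each iteration instead of being fixed.

```python
import numpy as np

def sgd_step(w, grad_fn, eta):
    """One SGD update: w_{k+1} = w_k - eta * grad F(w_k)."""
    return w - eta * grad_fn(w)

# Toy quadratic objective F(w) = 0.5 * ||w||^2, so grad F(w) = w
grad_fn = lambda w: w

w = np.array([2.0, -1.0])
for _ in range(50):
    # Fixed step size here; in the paper's setting a learned
    # controller would choose eta from the observed features.
    eta = 0.1
    w = sgd_step(w, grad_fn, eta)
```

With a constant step size the iterates contract by a factor (1 − η) per step on this objective, which is exactly the sensitivity to η that motivates a controller.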
Learning a Controller
• Relative Entropy Policy Search (REPS)
• Concept similar to Proximal Policy Optimization (PPO)
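The core REPS idea can be sketched as a reward-weighted update of a Gaussian policy. This is a simplified sketch, not the paper's implementation: the temperature `eta` is fixed here, whereas full REPS obtains it by minimizing a dual function under a KL-divergence bound, and the toy reward function is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def reps_update(samples, rewards, eta):
    """Weighted maximum-likelihood update of a 1-D Gaussian policy.

    REPS weights each sampled parameter by exp(R / eta), which keeps
    the updated policy close to the old one (bounded relative entropy).
    """
    w = np.exp((rewards - rewards.max()) / eta)  # shift for stability
    w /= w.sum()
    mean = float(np.sum(w * samples))
    var = float(np.sum(w * (samples - mean) ** 2))
    return mean, max(var, 1e-6)

# Toy episodic task: reward peaks when the sampled parameter is 0.3
mean, var = 0.0, 1.0
for _ in range(20):
    samples = rng.normal(mean, np.sqrt(var), size=200)
    rewards = -(samples - 0.3) ** 2
    mean, var = reps_update(samples, rewards, eta=0.5)
```

The exponential weighting is what makes the update conservative, the same intuition behind PPO's clipped objective mentioned above.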
Features
• Informative about the current state
• Generalize across different tasks and architectures
• Constrained by computation and memory limits
Features
• Predictive change in function value
• Disagreement of function values
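The two features above can be sketched as follows. The functional forms are assumptions based on the slide's wording, not the paper's exact definitions: the predicted change uses a first-order Taylor expansion, and disagreement is taken as the spread of the per-input loss values.

```python
import numpy as np

def step_features(per_sample_losses, grad, update):
    """Sketch of two controller features (assumed forms):

    - predicted change in function value via a first-order Taylor
      expansion: grad(F) . delta_w
    - disagreement: standard deviation of the function values induced
      by the individual inputs in the batch
    """
    predicted_change = float(np.dot(grad, update))
    disagreement = float(np.std(per_sample_losses))
    return predicted_change, disagreement

# Example: a descent step should predict a decrease in F
features = step_features(
    per_sample_losses=[0.9, 1.1, 1.0],
    grad=np.array([1.0, 2.0]),
    update=np.array([-0.1, -0.1]),
)
```

Both quantities reuse values already computed during training (losses and gradients), which satisfies the computation and memory constraints listed above.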
Mini-Batch Setting
• Discounted average
  • Smooths outliers
  • Serves as memory
• Uncertainty estimate
  • Estimate of the noise in the system
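A minimal sketch of the mini-batch treatment above, assuming an exponentially weighted moving average; the recursion and the discount factor `gamma` are assumptions of this sketch, not the paper's exact formulation.

```python
def ewma_features(values, gamma=0.9):
    """Discounted (exponentially weighted) average and variance.

    The discounted average smooths outliers and serves as memory over
    past mini-batches; the discounted variance gives an uncertainty
    estimate of the noise in the system.
    """
    mean = values[0]
    var = 0.0
    for v in values[1:]:
        mean = gamma * mean + (1 - gamma) * v
        var = gamma * var + (1 - gamma) * (v - mean) ** 2
    return mean, var

# Noisy per-batch feature values get smoothed into a stable signal
smoothed, noise = ewma_features([1.0, 3.0, 1.0, 3.0])
```

A single noisy mini-batch then shifts the controller's input only by a factor of (1 − gamma), which is what makes the features robust to outliers.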
Experimental Setup
• Datasets: MNIST, CIFAR-10
• Learning algorithms: SGD and RMSProp
• Model: CNN
• For learning the controller parameters:
  • Subset of MNIST
  • Small CNN architecture
  • Policy π(θ) set to a Gaussian with isotropic covariance
Results
• Overhead of 36% for controller training
• Generalized to different variants of CNN
• Did not generalize to different training methods
Static RMSProp vs Controlled RMSProp
Static SGD vs Controlled SGD
Discussion
• Strengths:
  • Informative features
  • Not sensitive to the initial learning rate
  • Effort to generalize
• Weaknesses:
  • Tested on only two datasets
  • CNNs only
  • Lacks comparison with learning rate decay techniques and with grid search for the initial learning rate
• Note: this is a prior technique to learning the complete optimizer
Questions?