Newton Methods for Neural Networks: Part 1
Chih-Jen Lin
National Taiwan University
Last updated: June 18, 2019
Chih-Jen Lin (National Taiwan Univ.) 1 / 29
Outline

1. Introduction
2. Newton method
3. Hessian and Gaussian-Newton Matrices
Introduction
Newton method
We minimize a function $f(\boldsymbol{\theta})$ of the model parameters $\boldsymbol{\theta}$. At each iteration, Newton's method computes an update direction $\boldsymbol{d}$ by solving the linear system

$$\nabla^2 f(\boldsymbol{\theta})\, \boldsymbol{d} = -\nabla f(\boldsymbol{\theta}).$$
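The Newton update can be sketched numerically. A minimal one-dimensional sketch (the objective $e^{\theta}+e^{-\theta}$ and all names below are illustrative, not taken from the slides): in 1-D, solving the Newton system reduces to dividing the gradient by the curvature.

```python
import math

def newton_minimize(grad, hess, theta, tol=1e-10, max_iter=50):
    """1-D Newton iteration: theta <- theta - f'(theta) / f''(theta)."""
    for _ in range(max_iter):
        g = grad(theta)
        if abs(g) < tol:
            break
        theta -= g / hess(theta)
    return theta

# Illustrative objective f(theta) = exp(theta) + exp(-theta),
# minimized at theta = 0, with analytic first and second derivatives.
grad = lambda t: math.exp(t) - math.exp(-t)
hess = lambda t: math.exp(t) + math.exp(-t)
theta_star = newton_minimize(grad, hess, 1.0)
```

Starting from $\theta = 1$, the iterates converge to the minimizer $\theta = 0$ in a handful of steps, illustrating the fast local convergence that motivates Newton methods.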
Hessian and Gaussian-Newton Matrices
For each training instance $i = 1, \dots, l$, the Jacobian of the network outputs $\boldsymbol{z}^{L+1,i}$ with respect to the parameters $\boldsymbol{\theta} \in \mathbb{R}^n$ is

$$
J^i =
\begin{bmatrix}
\dfrac{\partial z^{L+1,i}_1}{\partial \theta_1} & \cdots & \dfrac{\partial z^{L+1,i}_1}{\partial \theta_n} \\
\vdots & & \vdots \\
\dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial \theta_1} & \cdots & \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial \theta_n}
\end{bmatrix}
\in \mathbb{R}^{n_{L+1} \times n}.
$$
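As a concrete check of what such a Jacobian contains, it can be approximated by finite differences. A sketch under illustrative assumptions: the two-output toy function below stands in for the network outputs $z^{L+1,i}(\boldsymbol{\theta})$; it is not the deck's network.

```python
def jacobian_fd(z, theta, eps=1e-6):
    """Finite-difference Jacobian: J[t][s] ~= d z_t / d theta_s."""
    z0 = z(theta)
    J = []
    for t in range(len(z0)):
        row = []
        for s in range(len(theta)):
            bumped = list(theta)
            bumped[s] += eps          # perturb one parameter
            row.append((z(bumped)[t] - z0[t]) / eps)
        J.append(row)
    return J

# Toy stand-in for the network outputs z(theta) with two components;
# its exact Jacobian at theta = (2, 3) is [[1, 1], [3, 2]].
z = lambda th: [th[0] + th[1], th[0] * th[1]]
J = jacobian_fd(z, [2.0, 3.0])
```

In practice the Jacobian is never formed by finite differences; backpropagation (or Jacobian-vector products) is used, but the finite-difference version is a useful correctness check.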
The full Hessian also involves second derivatives of the outputs: for $i = 1, \dots, l$ and each output component $j = 1, \dots, n_{L+1}$, the $n \times n$ matrix

$$
\begin{bmatrix}
\dfrac{\partial^2 z^{L+1,i}_j}{\partial \theta_1 \partial \theta_1} & \cdots & \dfrac{\partial^2 z^{L+1,i}_j}{\partial \theta_1 \partial \theta_n} \\
\vdots & & \vdots \\
\dfrac{\partial^2 z^{L+1,i}_j}{\partial \theta_n \partial \theta_1} & \cdots & \dfrac{\partial^2 z^{L+1,i}_j}{\partial \theta_n \partial \theta_n}
\end{bmatrix}.
$$
The Hessian of the loss with respect to the network outputs, $\nabla^2_{z^{L+1,i}\, z^{L+1,i}}\, \xi(z^{L+1,i}; y^i, Z^{1,i})$, has entries

$$
B^i_{ts} = \frac{\partial^2 \xi(z^{L+1,i}; y^i, Z^{1,i})}{\partial z^{L+1,i}_t \, \partial z^{L+1,i}_s}.
$$
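Putting the pieces together, a hedged sketch of how the Hessian and the Gauss-Newton matrix relate: assuming the regularized average-loss objective commonly used in Newton methods for deep learning (the $1/(2C)$ regularizer and the $1/l$ averaging are assumptions, not recovered from the slides), the Hessian splits into a part built from the Jacobians $J^i$ and the output-space curvature matrices $B^i$, plus a term containing the second derivatives of the outputs:

```latex
% Assumed objective: f(\theta) = \frac{1}{2C}\theta^{\top}\theta
%   + \frac{1}{l}\sum_{i=1}^{l} \xi\bigl(z^{L+1,i}; y^{i}, Z^{1,i}\bigr)
\nabla^{2} f(\theta)
  = \frac{1}{C} I
  + \frac{1}{l}\sum_{i=1}^{l} (J^{i})^{\top} B^{i} J^{i}
  + \frac{1}{l}\sum_{i=1}^{l}\sum_{j=1}^{n_{L+1}}
      \frac{\partial \xi}{\partial z^{L+1,i}_{j}}\,
      \nabla^{2}_{\theta}\, z^{L+1,i}_{j}
% Dropping the last (generally indefinite) term gives the
% Gauss-Newton matrix:
G = \frac{1}{C} I + \frac{1}{l}\sum_{i=1}^{l} (J^{i})^{\top} B^{i} J^{i}
```

When the loss $\xi$ is convex in the outputs, every $B^i$ is positive semi-definite, so $G$ is positive definite; this is why the Gauss-Newton matrix replaces the Hessian in these methods.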
The Jacobian of the outputs with respect to the parameters appears again in this construction:

$$
J^i =
\begin{bmatrix}
\dfrac{\partial z^{L+1,i}_1}{\partial \theta_1} & \cdots & \dfrac{\partial z^{L+1,i}_1}{\partial \theta_n} \\
\vdots & & \vdots \\
\dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial \theta_1} & \cdots & \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial \theta_n}
\end{bmatrix}.
$$