CS885 Reinforcement Learning Lecture 14c: June 15, 2018
Trust Region Methods [Nocedal and Wright, Chapter 4]
CS885 Spring 2018 Pascal Poupart 1 University of Waterloo
Optimization in ML
It is common to formulate ML problems as optimization problems.
$x^* = \arg\min_{x \in \text{trustRegion}} \tilde{f}(x)$
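Solving for the best point of the approximation $\tilde{f}$ inside the trust region is itself an optimization subproblem. As a minimal sketch, assume the common setup from Nocedal and Wright, Chapter 4: $\tilde{f}$ is a quadratic model around the current point, $\tilde{f}(x + p) = f(x) + g^\top p + \frac{1}{2} p^\top B p$ with gradient $g$ and (possibly approximate) Hessian $B$, and the trust region is the ball $\|p\| \le \delta$. The code computes the Cauchy point, a cheap approximate solution that minimizes the model only along the steepest-descent direction $-g$; it is not the exact argmin. The helper names `dot`, `matvec` and `cauchy_point` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    """Matrix-vector product with B given as a list of rows."""
    return [dot(row, v) for row in B]


def cauchy_point(g, B, delta):
    """Approximate solution of min_p g.p + 0.5 p.B.p subject to ||p|| <= delta.

    Assumes a quadratic model and a ball-shaped trust region. Minimizes the
    model along -g, stopping at the boundary if the model keeps decreasing.
    """
    gnorm = math.sqrt(dot(g, g))
    if gnorm == 0.0:
        return [0.0] * len(g)  # already stationary: no step
    gBg = dot(g, matvec(B, g))
    if gBg <= 0.0:
        tau = 1.0  # model not convex along -g: go all the way to the boundary
    else:
        tau = min(gnorm ** 3 / (delta * gBg), 1.0)  # interior minimizer, or clip
    return [-tau * delta / gnorm * gi for gi in g]


# Hypothetical example: gradient g = (2, 0), Hessian B = 2I.
print(cauchy_point([2.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], 10.0))  # large ball
print(cauchy_point([2.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], 0.5))   # small ball
```

With $\delta = 10$ the step stops at the model's own minimizer $p = (-1, 0)$ inside the ball; with $\delta = 0.5$ it is clipped to the boundary at $(-0.5, 0)$, which is exactly the role of the trust region.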
4 ≤ 6
∗ and % = 0
$x^* = \arg\min_x \tilde{f}(x)$ subject to $\|x - x_{old}\| \le \delta$
… $(x^*)$ then increase $\delta$
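The loop of repeatedly minimizing $\tilde{f}$ inside the trust region and then enlarging or shrinking $\delta$ can be sketched as follows. This is a minimal sketch, not the slides' exact procedure: it assumes the ratio test described in Nocedal and Wright, Chapter 4, $\rho = \frac{f(x) - f(x+p)}{\tilde{f}(x) - \tilde{f}(x+p)}$, with the thresholds $1/4$ and $3/4$, the acceptance threshold `eta` and the cap `delta_max` taken from that textbook. Each step uses the Cauchy point of a quadratic model; all function names and the test function are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def cauchy_point(g, B, delta):
    # Minimizer of the quadratic model along -g, clipped to ||p|| <= delta.
    gnorm = math.sqrt(dot(g, g))
    if gnorm == 0.0:
        return [0.0] * len(g)
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta / gnorm * gi for gi in g]


def trust_region(f, grad, hess, x, delta=1.0, delta_max=100.0,
                 eta=0.1, tol=1e-8, max_iter=1000):
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if math.sqrt(dot(g, g)) < tol:
            break  # (approximately) stationary point reached
        p = cauchy_point(g, B, delta)
        # Decrease predicted by the quadratic model f~ (positive when g != 0).
        predicted = -(dot(g, p) + 0.5 * dot(p, matvec(B, p)))
        x_new = [xi + pi for xi, pi in zip(x, p)]
        actual = f(x) - f(x_new)
        rho = actual / predicted  # how well f~ matched f on this step
        if rho < 0.25:
            delta *= 0.25  # poor approximation: shrink the trust region
        elif rho > 0.75 and math.isclose(math.sqrt(dot(p, p)), delta):
            delta = min(2.0 * delta, delta_max)  # good fit at boundary: grow
        if rho > eta:
            x = x_new  # accept the step only if f actually decreased enough
    return x, delta


# Hypothetical quadratic test function with minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: [2 * (x[0] - 1), 20 * (x[1] + 2)]
hess = lambda x: [[2.0, 0.0], [0.0, 20.0]]
x_opt, final_delta = trust_region(f, grad, hess, [0.0, 0.0])
print(x_opt, final_delta)
```

Shrinking $\delta$ when $\rho$ is small keeps the next step in a region where $\tilde{f}$ is trustworthy, while growing it after accurate boundary steps lets the method take larger steps than a fixed step size would allow.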
5 ≤ 7