CS885 Reinforcement Learning Lecture 14c: June 15, 2018 Trust Region Methods



SLIDE 1

CS885 Reinforcement Learning Lecture 14c: June 15, 2018

Trust Region Methods [Nocedal and Wright, Chapter 4]

CS885 Spring 2018, Pascal Poupart, University of Waterloo

SLIDE 2

Optimization in ML

  • It is common to formulate ML problems as optimization problems.

– Min squared error
– Min cross entropy
– Max log likelihood
– Max discounted sum of rewards


SLIDE 3

Two important classes

  • Line search methods

– Find a direction of improvement
– Select a step length

  • Trust region methods

– Select a trust region (analogous to a maximum step length)
– Find a point of improvement in the region
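The line search recipe (direction first, step length second) can be sketched as follows. This is a minimal illustration, not the lecture's code: the steepest descent direction, the backtracking (Armijo) rule, the constants `alpha0`, `shrink`, `c1`, and the test objective are all assumptions chosen for the example.

```python
def backtracking_line_search_step(f, grad, x, alpha0=1.0, shrink=0.5, c1=1e-4):
    """One line search iteration: pick a direction, then a step length."""
    g = grad(x)
    direction = [-gi for gi in g]                          # steepest descent direction
    slope = sum(gi * di for gi, di in zip(g, direction))   # g^T d, never positive here
    alpha = alpha0
    fx = f(x)
    # Shrink the step until the Armijo sufficient-decrease condition holds.
    while f([xi + alpha * di for xi, di in zip(x, direction)]) > fx + c1 * alpha * slope:
        alpha *= shrink
    return [xi + alpha * di for xi, di in zip(x, direction)], alpha

# Hypothetical objective, used only to exercise the function.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * x[1]]

x_new, alpha = backtracking_line_search_step(f, grad, [0.0, 1.0])
print(x_new, alpha, f(x_new))
```

A trust region method reverses the order of these two decisions: the maximum step size is fixed first, and the point is chosen afterwards within that limit.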


SLIDE 4

Trust Region Methods

  • Idea:

– Approximate the objective $f$ with a simpler objective $\tilde{f}$
– Solve $\tilde{x}^* = \operatorname{argmin}_x \tilde{f}(x)$

  • Problem: The optimum $\tilde{x}^*$ might be in a region where $\tilde{f}$ poorly approximates $f$, and therefore $\tilde{x}^*$ might be far from optimal

  • Solution: restrict the search to a region where we trust $\tilde{f}$ to approximate $f$ well.

– Solve $\tilde{x}^* = \operatorname{argmin}_{x \in \text{trustRegion}} \tilde{f}(x)$
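A one-dimensional sketch can make the problem and the fix concrete. The objective sqrt(1 + x^2), the expansion point c = 3, the radius 1.0, and all function names below are assumptions picked for illustration; none of them come from the lecture.

```python
import math

def f(x):
    # Hypothetical objective; its true minimum is at x = 0.
    return math.sqrt(1.0 + x * x)

def grad(x):
    return x / math.sqrt(1.0 + x * x)

def hess(x):
    return (1.0 + x * x) ** -1.5

def model_minimizer(c, delta=None):
    """Minimize the quadratic Taylor model of f around c.

    With delta=None the search is unconstrained; otherwise it is
    restricted to the interval |x - c| <= delta.
    """
    step = -grad(c) / hess(c)                  # minimizer of the convex 1-D model
    if delta is not None:
        step = max(-delta, min(delta, step))   # clip to the trust region
    return c + step

c = 3.0
x_free = model_minimizer(c)                # lands where the model is a poor fit
x_trust = model_minimizer(c, delta=1.0)    # stays where the model is accurate
print(f(c), f(x_free), f(x_trust))
```

In this example the unconstrained model optimum makes the true objective worse than the starting point, while the restricted optimum improves it.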


SLIDE 5

Example

  • $\tilde{f}$ is often chosen to be a quadratic approximation of $f$

$f(x) \approx \tilde{f}(x) = f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$

where $\nabla f$ is the gradient and $H$ is the Hessian

  • Trust region is often chosen to be a hypersphere

$\|x - c\|_2 \le \delta$
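The quadratic model and the hypersphere test can be written out directly. The sketch below is an illustration under assumptions: the objective exp(x0) + x1^2, the expansion point, the radius 0.5, and the helper names `quadratic_model` and `in_trust_region` are hypothetical.

```python
import math

def quadratic_model(f_c, g, H, c):
    """Return f_tilde(x) = f(c) + g^T (x - c) + 1/2 (x - c)^T H (x - c)."""
    def f_tilde(x):
        d = [xi - ci for xi, ci in zip(x, c)]
        linear = sum(gi * di for gi, di in zip(g, d))
        quad = sum(d[i] * H[i][j] * d[j]
                   for i in range(len(d)) for j in range(len(d)))
        return f_c + linear + 0.5 * quad
    return f_tilde

def in_trust_region(x, c, delta):
    """Hypersphere test: ||x - c||_2 <= delta."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c))) <= delta

# Hypothetical objective f(x) = exp(x0) + x1^2, expanded around c = (0, 1).
def f(x):
    return math.exp(x[0]) + x[1] ** 2

c = [0.0, 1.0]
g = [math.exp(c[0]), 2.0 * c[1]]             # gradient of f at c
H = [[math.exp(c[0]), 0.0], [0.0, 2.0]]      # Hessian of f at c
f_tilde = quadratic_model(f(c), g, H, c)

near, far = [0.1, 1.1], [2.0, 1.0]
print(abs(f(near) - f_tilde(near)), in_trust_region(near, c, 0.5))
print(abs(f(far) - f_tilde(far)), in_trust_region(far, c, 0.5))
```

The approximation error is small at the nearby point and large at the distant one, which is the reason for bounding the distance from the expansion point.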


SLIDE 6

Generic Algorithm


trustRegionMethod

Initialize $\delta$, $x_0^*$ and $n = 0$
Repeat
    $n \leftarrow n + 1$
    Solve $x_n^* = \operatorname{argmin}_x \tilde{f}(x)$ subject to $\|x - x_{n-1}^*\|_2 \le \delta$
    If $\tilde{f}(x_n^*) \approx f(x_n^*)$ then increase $\delta$
    else decrease $\delta$
Until convergence
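A one-dimensional sketch of this loop is given below. Several details are assumptions rather than part of the slide: the "model agrees with objective" test is implemented as the ratio of actual to predicted decrease with the conventional thresholds 0.25 and 0.75, steps that do not improve the objective are rejected, the radius is doubled or halved, and the objective x^4 - 2x^2 is a hypothetical test function.

```python
def trust_region_method(f, grad, hess, x0, delta=1.0, delta_max=10.0,
                        tol=1e-6, max_iter=100):
    """1-D trust region loop; thresholds and update factors are assumed."""
    x = x0
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) < tol:                       # convergence test (assumed)
            break

        def model(p):                          # f_tilde(x + p) - f(x)
            return g * p + 0.5 * h * p * p

        # Solve the subproblem exactly: the two boundary points, plus the
        # interior stationary point when the model is convex and it lies
        # inside the region.
        candidates = [-delta, delta]
        if h > 0 and abs(g / h) <= delta:
            candidates.append(-g / h)
        p = min(candidates, key=model)

        predicted = -model(p)                  # decrease promised by the model
        actual = f(x) - f(x + p)               # decrease actually obtained
        rho = actual / predicted               # near 1 when the model matches f

        if rho > 0.75:
            delta = min(2.0 * delta, delta_max)    # model trusted: increase delta
        elif rho < 0.25:
            delta = 0.5 * delta                    # model poor: decrease delta
        if rho > 0:                                # accept only improving steps
            x = x + p
    return x

# Hypothetical non-convex objective with local minima at x = -1 and x = 1.
f = lambda x: x ** 4 - 2.0 * x ** 2
grad = lambda x: 4.0 * x ** 3 - 4.0 * x
hess = lambda x: 12.0 * x ** 2 - 4.0

x_star = trust_region_method(f, grad, hess, x0=0.1)
print(x_star)   # approaches the local minimum at x = 1
```

The starting point has negative curvature, so the first step goes to the boundary of the region; later steps fall inside it and behave like Newton steps.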

SLIDE 7

Trust Region Subproblem

  • $\tilde{f}$ is often chosen to be a quadratic approximation of $f$

$\min_x \; f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$ subject to $\|x - c\|_2 \le \delta$

  • When $H$ is positive semi-definite

– Convex optimization
– Simple and globally optimal solution

  • When $H$ is not positive semi-definite

– Non-convex optimization
– Simple heuristics that guarantee improvement
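One simple heuristic of this kind is the Cauchy point from Nocedal and Wright, Chapter 4: minimize the quadratic model along the negative gradient, inside the region. Whether the lecture has this particular heuristic in mind is an assumption. The sketch below works for both the positive semi-definite and the indefinite case and lowers the model value whenever the gradient is non-zero. The function names and example matrices are hypothetical.

```python
import math

def cauchy_point(g, H, delta):
    """Minimize the quadratic model along -g subject to ||p||_2 <= delta."""
    n = len(g)
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        return [0.0] * n                        # no first-order descent direction
    gHg = sum(g[i] * H[i][j] * g[j] for i in range(n) for j in range(n))
    if gHg <= 0.0:
        tau = 1.0                               # non-positive curvature: go to the boundary
    else:
        tau = min(1.0, g_norm ** 3 / (delta * gHg))
    scale = -tau * delta / g_norm
    return [scale * gi for gi in g]

def model_change(g, H, p):
    """f_tilde(c + p) - f(c) = g^T p + 1/2 p^T H p."""
    n = len(p)
    linear = sum(g[i] * p[i] for i in range(n))
    quad = sum(p[i] * H[i][j] * p[j] for i in range(n) for j in range(n))
    return linear + 0.5 * quad

g = [1.0, -2.0]                                 # gradient at the expansion point
H_psd = [[2.0, 0.0], [0.0, 1.0]]                # positive definite
H_indef = [[1.0, 0.0], [0.0, -3.0]]             # not positive semi-definite
for H in (H_psd, H_indef):
    p = cauchy_point(g, H, delta=0.5)
    print(p, model_change(g, H, p))
```

The Cauchy point is generally not the exact solution of the subproblem. It is cheap to compute, and exact or near-exact solvers are usually expected to do at least as well.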
