CS885 Reinforcement Learning Lecture 14c: June 15, 2018 Trust Region Methods

Nov 16, 2022

SLIDE 1

CS885 Reinforcement Learning Lecture 14c: June 15, 2018

Trust Region Methods [Nocedal and Wright, Chapter 4]

CS885 Spring 2018, Pascal Poupart, University of Waterloo

SLIDE 2

Optimization in ML

It is common to formulate ML problems as optimization problems.

– Min squared error
– Min cross entropy
– Max log likelihood
– Max discounted sum of rewards

SLIDE 3

Two important classes

Line search methods

– Find a direction of improvement
– Select a step length

Trust region methods

– Select a trust region (analogous to a maximum step length)
– Find a point of improvement in the region
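
The two orderings can be contrasted on a one-dimensional toy problem. The sketch below is only an illustration: the function names, the Armijo backtracking rule, and the grid search over the region are assumptions made for this example and do not come from the slides.

```python
def line_search_step(f, grad, x, alpha0=1.0, shrink=0.5, c1=1e-4):
    """Line search: pick a direction first, then a step length."""
    g = grad(x)
    d = -g                      # direction of improvement (steepest descent)
    alpha = alpha0
    # Backtracking: shrink the step until the Armijo decrease test holds.
    while f(x + alpha * d) > f(x) + c1 * alpha * g * d:
        alpha *= shrink
    return x + alpha * d


def trust_region_step(f, x, delta, n_grid=201):
    """Trust region: pick a region |x' - x| <= delta first, then a point in it.

    For illustration the region is searched on a grid using f itself;
    a real method would minimize a cheap model of f instead.
    """
    candidates = [x - delta + 2.0 * delta * i / (n_grid - 1) for i in range(n_grid)]
    return min(candidates, key=f)


# Toy objective (hypothetical): minimum at x = 3.
f = lambda x: (x - 3.0) ** 2
grad = lambda x: 2.0 * (x - 3.0)

x_ls = line_search_step(f, grad, 0.0)        # direction, then step length
x_tr = trust_region_step(f, 0.0, delta=1.0)  # region, then best point inside
```

Starting from 0, the line search is free to travel as far as its direction allows, while the trust region step cannot leave the interval [-1, 1].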

SLIDE 4

Trust Region Methods

Idea:

– Approximate objective $f$ with a simpler objective $\tilde{f}$
– Solve $\tilde{x}^* = \operatorname{argmin}_x \tilde{f}(x)$

Problem: The optimum $\tilde{x}^*$ might be in a region where $\tilde{f}$ poorly approximates $f$, and therefore $\tilde{x}^*$ might be far from optimal.

Solution: restrict the search to a region where we trust $\tilde{f}$ to approximate $f$ well.

– Solve $\tilde{x}^* = \operatorname{argmin}_{x \in \text{trustRegion}} \tilde{f}(x)$
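
A small numerical illustration of the problem and the fix, under assumptions chosen for this sketch only: the objective $f(x) = \sqrt{1 + x^2}$, the expansion point $c = 2$, a second-order Taylor model, and a radius of 1 are all hypothetical choices.

```python
import math

def f(x):
    return math.sqrt(1.0 + x * x)

def grad(x):
    return x / math.sqrt(1.0 + x * x)

def hess(x):
    return (1.0 + x * x) ** -1.5

c = 2.0      # point where the quadratic model is built
delta = 1.0  # trust region radius (assumed)

# Unrestricted minimizer of the quadratic model around c.
x_unrestricted = c - grad(c) / hess(c)

# The 1-D model is convex here, so its minimizer over [c - delta, c + delta]
# is the unrestricted minimizer clipped to that interval.
x_restricted = max(c - delta, min(c + delta, x_unrestricted))

print(f(c), f(x_unrestricted), f(x_restricted))
```

The unrestricted model minimizer lands near -8, where the true objective is worse than at the starting point, while the restricted minimizer at 1 improves on it.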

SLIDE 5

Example

$\tilde{f}$ often chosen to be a quadratic approximation of $f$

$f(x) \approx \tilde{f}(x) = f(c) + \nabla f(c)^T (x - c) + \frac{1}{2} (x - c)^T H(c) (x - c)$

where $\nabla f$ is the gradient and $H$ is the Hessian

Trust region often chosen to be a hypersphere

$\|x - c\|_2 \le \delta$
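
The quadratic model and the hypersphere test can be written out directly. This is a minimal sketch in plain Python; the helper names `quadratic_model` and `in_trust_region` and the test function $e^{x} + y^2$ are assumptions made for illustration.

```python
import math

def quadratic_model(f_c, g_c, H_c, c):
    """Build f~(x) = f(c) + g^T (x - c) + 0.5 (x - c)^T H (x - c)."""
    def model(x):
        d = [xi - ci for xi, ci in zip(x, c)]
        linear = sum(gi * di for gi, di in zip(g_c, d))
        Hd = [sum(hij * dj for hij, dj in zip(row, d)) for row in H_c]
        quadratic = 0.5 * sum(di * hdi for di, hdi in zip(d, Hd))
        return f_c + linear + quadratic
    return model

def in_trust_region(x, c, delta):
    """Hypersphere trust region: ||x - c||_2 <= delta."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c))) <= delta

# Hypothetical objective f(x, y) = exp(x) + y^2, expanded around c = (0, 0).
f_true = lambda x: math.exp(x[0]) + x[1] ** 2
c = [0.0, 0.0]
model = quadratic_model(f_c=1.0, g_c=[1.0, 0.0], H_c=[[1.0, 0.0], [0.0, 2.0]], c=c)

near, far = [0.1, 0.1], [2.0, 0.0]
err_near = abs(model(near) - f_true(near))  # small inside a radius of 0.5
err_far = abs(model(far) - f_true(far))     # large well outside it
```

The model tracks the objective closely at the nearby point and drifts badly at the distant one, which is the motivation for bounding the step by $\delta$.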

SLIDE 6

Generic Algorithm

trustRegionMethod

Initialize $\delta$, $x_0^*$ and $n = 0$
Repeat
    $n \leftarrow n + 1$
    Solve $x_n^* = \operatorname{argmin}_x \tilde{f}(x)$ subject to $\|x - x_{n-1}^*\|_2 \le \delta$
    If $\tilde{f}(x_n^*) \approx f(x_n^*)$ then increase $\delta$
    else decrease $\delta$
Until convergence
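
One possible concrete version of this loop is sketched below. Several details are assumptions not stated on the slide: the agreement test is implemented as the ratio of actual to predicted reduction with thresholds 1/4 and 3/4 (in the style of Nocedal and Wright's trust region algorithm), the subproblem is solved only approximately with the Cauchy point, and the test function is hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def matvec(H, v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

def cauchy_point(g, H, delta):
    """Approximate subproblem solution: minimize the model along -g inside the ball."""
    gHg = sum(gi * hgi for gi, hgi in zip(g, matvec(H, g)))
    g_norm = norm(g)
    tau = 1.0 if gHg <= 0 else min(1.0, g_norm ** 3 / (delta * gHg))
    return [-tau * delta / g_norm * gi for gi in g]

def trust_region_method(f, grad, hess, x0, delta=1.0, delta_max=10.0,
                        eta=0.1, tol=1e-6, max_iter=500):
    x = list(x0)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if norm(g) < tol:                      # convergence test
            break
        p = cauchy_point(g, H, delta)          # step with ||p|| <= delta
        Hp = matvec(H, p)
        predicted = -(sum(gi * pi for gi, pi in zip(g, p))
                      + 0.5 * sum(pi * hpi for pi, hpi in zip(p, Hp)))
        x_new = [xi + pi for xi, pi in zip(x, p)]
        rho = (f(x) - f(x_new)) / predicted    # model/objective agreement
        if rho < 0.25:
            delta *= 0.25                      # poor agreement: shrink region
        elif rho > 0.75 and norm(p) >= delta * (1.0 - 1e-9):
            delta = min(2.0 * delta, delta_max)  # good agreement at boundary: grow
        if rho > eta:
            x = x_new                          # accept only steps that improve enough
    return x

# Hypothetical convex test function with minimizer (1, -0.5).
f = lambda x: (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
grad = lambda x: [4.0 * (x[0] - 1.0) ** 3 + 2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]
hess = lambda x: [[12.0 * (x[0] - 1.0) ** 2 + 2.0, 0.0], [0.0, 4.0]]

x_star = trust_region_method(f, grad, hess, [-1.0, 1.0])
```

The Cauchy point keeps the sketch short; a dogleg or exact subproblem solver would typically converge in fewer iterations.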

SLIDE 7

Trust Region Subproblem

$\tilde{f}$ often chosen to be a quadratic approximation of $f$

$\min_x \; f(c) + \nabla f(c)^T (x - c) + \frac{1}{2} (x - c)^T H(c) (x - c)$ subject to $\|x - c\|_2 \le \delta$

When $H$ is positive semi-definite

– Convex optimization
– Simple and globally optimal solution

When $H$ is not positive semi-definite

– Non-convex optimization
– Simple heuristics that guarantee improvement
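
The two cases are easiest to see in one dimension, where the subproblem can be solved exactly. The sketch below is a simplification made for illustration: `solve_subproblem_1d` is a hypothetical helper, the step is written as $p = x - c$, and $g$ and $h$ stand for the scalar gradient and Hessian at $c$.

```python
def model_change(p, g, h):
    """Change in the quadratic model for a step p: g*p + 0.5*h*p^2."""
    return g * p + 0.5 * h * p * p

def solve_subproblem_1d(g, h, delta):
    """Minimize g*p + 0.5*h*p^2 subject to |p| <= delta."""
    if h > 0:
        # Convex model: clip the unconstrained minimizer -g/h to the interval.
        return max(-delta, min(delta, -g / h))
    # Non-convex (or linear) model: the minimum over the interval is at an endpoint.
    return min([-delta, delta], key=lambda p: model_change(p, g, h))

p_inside = solve_subproblem_1d(g=2.0, h=4.0, delta=1.0)     # minimizer lies inside the region
p_boundary = solve_subproblem_1d(g=2.0, h=4.0, delta=0.25)  # minimizer clipped to the boundary
p_nonconvex = solve_subproblem_1d(g=1.0, h=-2.0, delta=1.0) # negative curvature: go to the boundary
```

In higher dimensions the convex case still has a well-characterized solution, while the non-convex case is usually handled with approximate steps such as the Cauchy point that still decrease the model.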
