

SLIDE 1

Optimal Control Theory

SLIDE 2

The theory

  • Optimal control theory is a mature mathematical discipline which provides algorithms to solve various control problems
  • The elaborate mathematical machinery behind optimal control models is rarely exposed to the computer animation community

  • Most controllers designed in practice are theoretically suboptimal
  • This lecture closely follows the excellent tutorial by Dr. Emo Todorov (http://www.cs.washington.edu/homes/todorov/papers/optimality_chapter.pdf)

SLIDE 3
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
SLIDE 4

Standard problem

  • Find an action sequence (u0, u1, ..., un−1) and corresponding state sequence (x0, x1, ..., xn) minimizing the total cost
  • The initial state (x0) and the destination state (xn) are given
SLIDE 5

Discrete control

[Figure: a graph of states connected by transitions with costs between $120 and $500, illustrating a cheapest-route problem]

  • next(x, u): the state reached by taking action u in state x
  • cost(x, u): the cost of taking action u in state x

SLIDE 6

Dynamic programming

  • Bellman optimality principle: if a given state-action sequence is optimal and we remove the first state and action, the remaining sequence is also optimal
  • The choice of optimal actions in the future is independent of the past actions which led to the present state
  • The optimal state-action sequences can be constructed by starting at the final state and extending backwards

SLIDE 7

Optimal value function

  • v(x) = “minimal total cost for completing the task starting from state x”
  • Find optimal actions:
  • 1. Consider every action available at the current state
  • 2. Add its immediate cost to the optimal value of the resulting next state
  • 3. Choose an action for which the sum is minimal
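The backward construction and the three steps above can be sketched in Python on a small route graph (the states, actions, and prices below are made up for illustration):

```python
# Backward dynamic programming on a hypothetical route graph.
# next(x, u) and cost(x, u) are stored as a table: state -> {action: (next_state, cost)}.
graph = {
    "A": {"toB": ("B", 250), "toC": ("C", 150)},
    "B": {"toD": ("D", 200)},
    "C": {"toB": ("B", 120), "toD": ("D", 350)},
    "D": {},  # destination
}

# Start at the final state and extend backwards (any reverse topological order).
v = {"D": 0}  # v(x) = minimal total cost to reach "D" from x
for x in ["B", "C", "A"]:
    v[x] = min(c + v[nx] for nx, c in graph[x].values())

def policy(x):
    """Steps 1-3 from the slide: pick the action minimizing cost(x,u) + v(next(x,u))."""
    return min(graph[x], key=lambda u: graph[x][u][1] + v[graph[x][u][0]])
```

Here `v["A"]` comes out to 450 (go via B), illustrating that each state's value is built only from the already-computed values of its successors.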
SLIDE 8

Optimal control policy

  • A mapping from states to actions is called a control policy or control law
  • Once we have a control policy, we can start at any state and reach the destination state by following it
  • The optimal control policy satisfies π(x) = arg min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
  • Its corresponding optimal value function satisfies v(x) = min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
SLIDE 9

Value iteration

  • Bellman equations cannot be solved in a single pass if the state transitions are cyclic
  • Value iteration starts with a guess v^(0) of the optimal value function and constructs a sequence of improved guesses: v^(i+1)(x) = min_{u∈U(x)} [cost(x, u) + v^(i)(next(x, u))]
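As a sketch, value iteration on a small made-up graph containing a cycle (A and B reach each other, so no single backward pass can order the states):

```python
# Value iteration on a hypothetical cyclic graph: state -> {action: (next_state, cost)}.
graph = {
    "A": {"toB": ("B", 1), "toD": ("D", 10)},
    "B": {"toA": ("A", 1), "toD": ("D", 3)},
    "D": {},  # destination, zero remaining cost
}

v = {x: 0.0 for x in graph}  # initial guess v^(0)
for _ in range(50):
    # One sweep: v^(i+1)(x) = min_u [cost(x,u) + v^(i)(next(x,u))]
    v = {x: (min(c + v[nx] for nx, c in acts.values()) if acts else 0.0)
         for x, acts in graph.items()}
```

The guesses improve each sweep and settle at v(B) = 3 (go straight to D) and v(A) = 4 (go via B), even though the direct edge A → D costs 10.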

SLIDE 10
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
SLIDE 11

Continuous control

  • State space and control space are continuous
  • Dynamics of the system:
  • Continuous time: ẋ = f(x, u)
  • Discrete time: x_{k+1} = x_k + Δ · f(x_k, u_k)
  • Objective function: J = h(x(t_f)) + ∫ l(x(t), u(t), t) dt
SLIDE 12

HJB equation

  • The HJB equation is a nonlinear PDE with respect to the unknown function v:

−v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]

  • An optimal control π(x, t) is a value of u which achieves the minimum in the HJB equation:

π(x, t) = arg min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]

SLIDE 13

Numerical solution

  • Non-linear differential equations do not always have classic solutions which satisfy them everywhere
  • Numerical methods guarantee convergence, but they rely on discretization of the state space, which grows exponentially in the state space dimension
  • Nevertheless, the HJB equations have motivated a number of methods for approximate solution

SLIDE 14

Parametric value function

  • Consider an approximation v(x; θ) to the optimal value function
  • Its derivative with respect to x is v_x(x; θ)
  • Choose a large enough set of states and evaluate the right-hand side of HJB using the approximated value function
  • Adjust θ so that the approximation gets closer to the target values
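A minimal sketch of this fit-and-adjust loop, on a made-up 1-D problem. For simplicity it uses discrete-time Bellman targets with a discount factor rather than the HJB residual itself; the features, dynamics, action set, and costs are all assumptions, not from the slides:

```python
import numpy as np

# Parametric value function v(x; theta) = theta^T phi(x) with quadratic features.
phi = lambda x: np.stack([np.ones_like(x), x, x**2], axis=-1)

xs = np.linspace(-2.0, 2.0, 21)       # chosen set of sample states
actions = np.array([-1.0, 0.0, 1.0])  # assumed discrete control set
gamma, dt = 0.9, 0.5                  # assumed discount and step size

theta = np.zeros(3)
for _ in range(60):
    nxt = xs[:, None] + dt * actions[None, :]          # next state per action
    q = xs[:, None] ** 2 + gamma * (phi(nxt) @ theta)  # cost + v(next; theta)
    targets = q.min(axis=1)                            # Bellman right-hand side
    # Adjust theta: least-squares fit of v(.; theta) to the targets
    theta, *_ = np.linalg.lstsq(phi(xs), targets, rcond=None)

v = lambda x: phi(np.atleast_1d(x)) @ theta
```

The fitted values reflect cost-to-go: states farther from the origin (where the running cost x² is zero) get larger values.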
SLIDE 15
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
SLIDE 16
Maximum principle

  • The maximum principle solves the optimal control problem for a deterministic dynamic system with boundary conditions
  • It can be derived via HJB equations or Lagrange multipliers
  • It can be generalized to other types of optimal control problems: free final time, intermediate constraints, first exit time, control constraints, etc.

SLIDE 17

Derivation via HJB

  • The finite horizon HJB: −v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]
  • If an optimal control policy π(x, t) is given, we can set u = π(x, t) and drop the min operator in HJB

SLIDE 18

Maximum principle

  • The remarkable property of the maximum principle is that it is an ODE, even though we derived it starting from a PDE
  • An ODE is a consistency condition which singles out specific trajectories without reference to neighboring trajectories
  • Extremal trajectories which solve the above optimization remove the dependence on neighboring trajectories

SLIDE 19

Hamiltonian function

  • The maximum principle can be written in a more compact and symmetric form with the help of the Hamiltonian function H(x, u, λ, t) = l(x, u, t) + f(x, u)^T λ
  • The maximum principle can then be restated as ẋ = ∂H/∂λ, −λ̇ = ∂H/∂x, with u achieving the minimum of H along the trajectory
SLIDE 20
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
SLIDE 21
Linear quadratic regulator

  • Most optimal control problems do not have closed-form solutions; one exception is the LQR case
  • LQR is a class of problems in which the dynamics are linear and the cost is quadratic
  • dynamics: ẋ = Ax + Bu
  • cost rate: l(x, u) = (u^T R u + x^T Q x)/2
  • final cost: h(x(t_f)) = x^T Q_f x / 2
  • R is symmetric positive definite, and Q and Q_f are symmetric
  • A, B, R, Q can be made time-varying

SLIDE 22

Optimal value function

  • For an LQR problem, the optimal value function is quadratic in x and can be expressed as v(x, t) = x^T V(t) x / 2, where V(t) is a symmetric matrix
  • We can obtain the ODE of V(t) via the HJB equation: −V̇(t) = Q − V(t) B R^{−1} B^T V(t) + A^T V(t) + V(t) A, with V(t_f) = Q_f
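As a numerical sketch, the Riccati ODE −V̇ = Q − V B R^{−1} B^T V + A^T V + V A can be integrated backward from V(t_f) = Q_f with a simple Euler step. The scalar problem below (A = 0, B = Q = R = 1, Q_f = 0) is an assumed example chosen because its long-horizon limit is known to be V = 1:

```python
import numpy as np

# Backward Euler integration of the continuous-time Riccati ODE
#   -dV/dt = Q - V B R^{-1} B^T V + A^T V + V A,  V(t_f) = Qf.
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); Qf = np.array([[0.0]])
Rinv = np.linalg.inv(R)

V, dt = Qf.copy(), 1e-3
for _ in range(int(10.0 / dt)):  # integrate from t_f backwards over a long horizon
    Vdot = -(Q - V @ B @ Rinv @ B.T @ V + A.T @ V + V @ A)
    V = V - dt * Vdot            # stepping backwards in time
```

Over a long horizon V(t) approaches the steady state of the ODE; for this scalar case the solution is V = tanh(t_f − t), which tends to 1.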

SLIDE 23

Discrete LQR

  • LQR is defined as follows when time is discretized:
  • dynamics: x_{k+1} = A x_k + B u_k
  • cost rate: l(x_k, u_k) = (u_k^T R u_k + x_k^T Q x_k)/2
  • final cost: x_n^T Q_f x_n / 2
  • Let n = t_f / Δ; the correspondence to the continuous-time problem is A ↔ I + Δ A^c, B ↔ Δ B^c, Q ↔ Δ Q^c, R ↔ Δ R^c
SLIDE 24

Optimal value function

  • We derive the optimal value function from the Bellman equation
  • Again, the optimal value function is quadratic in x and changes over time: v_k(x) = x^T V_k x / 2
  • Plugging into the Bellman equation, we obtain a recursive relation for V_k: V_k = Q + A^T (V_{k+1} − V_{k+1} B (B^T V_{k+1} B + R)^{−1} B^T V_{k+1}) A, with V_n = Q_f
  • The optimal control law is linear in x: u_k = −(B^T V_{k+1} B + R)^{−1} B^T V_{k+1} A x_k
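The backward recursion on V_k and the resulting linear feedback can be sketched with numpy (the discrete double-integrator system and weights below are an assumed example, not from the slides):

```python
import numpy as np

# Discrete-time LQR: backward Riccati recursion
#   V_k = Q + A^T (V_{k+1} - V_{k+1} B (B^T V_{k+1} B + R)^{-1} B^T V_{k+1}) A
# with linear optimal control u_k = -L_k x_k,
#   L_k = (B^T V_{k+1} B + R)^{-1} B^T V_{k+1} A.
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # assumed double integrator, step 0.1
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[1.0]]); Qf = np.eye(2)

n = 200
V = Qf
gains = []
for _ in range(n):  # iterate backwards from V_n = Qf
    S = B.T @ V @ B + R
    L = np.linalg.solve(S, B.T @ V @ A)  # feedback gain L_k
    V = Q + A.T @ V @ (A - B @ L)        # algebraically equal to the recursion above
    gains.append(L)
gains.reverse()  # gains[k] is now L_k for forward time step k

# Simulate the closed loop: the linear law drives the state toward zero.
x = np.array([[1.0], [0.0]])
for L in gains:
    x = (A - B @ L) @ x
```

Note the form `V = Q + A^T V (A − B L)`: substituting L shows it equals the bracketed recursion, and it keeps V symmetric without forming the inner inverse explicitly.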