Optimal Control Theory
The theory
- Optimal control theory is a mature mathematical discipline
which provides algorithms to solve various control problems
- The elaborate mathematical machinery behind optimal control
models is rarely exposed to the computer animation community
- Most controllers designed in practice are theoretically
suboptimal
- This lecture closely follows the excellent tutorial by Dr. Emo
Todorov (http://www.cs.washington.edu/homes/todorov/papers/optimality_chapter.pdf)
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Standard problem
- Find an action sequence (u0, u1, ..., un-1) and corresponding
state sequence (x0, x1, ..., xn) minimizing the total cost
cost(x0, u0) + cost(x1, u1) + ... + cost(xn-1, un-1)
- The initial state (x0) and the destination state (xn) are given
Discrete control
[Figure: a directed graph of states whose edges are labeled with transition costs ranging from $120 to $500]
- The dynamics next(x, u) give the next state, and cost(x, u) gives the cost of taking action u in state x
Dynamic programming
- Bellman optimality principle:
- If a given state-action sequence is optimal and we remove
the first state and action, the remaining sequence is also optimal
- The choice of optimal actions in the future is independent
of the past actions which led to the present state
- The optimal state-action sequences can be constructed by
starting at the final state and extending backwards
Optimal value function
- v(x) = “minimal total cost for completing the task starting from
state x”
- Find optimal actions:
- 1. Consider every action available at the current state
- 2. Add its immediate cost to the optimal value of the resulting
next state
- 3. Choose an action for which the sum is minimal
Optimal control policy
- A mapping from states to actions is called a control policy or
a control law
- Once we have a control policy, we can start at any state and
reach the destination state by following the control policy
- The optimal control policy satisfies π(x) = arg min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
- Its corresponding optimal value function satisfies the Bellman
equations v(x) = min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
Value iteration
- Bellman equations cannot be solved in a single pass if the state
transitions are cyclic
- Value iteration starts with a guess v^(0) of the optimal value
function and constructs a sequence of improved guesses:
v^(i+1)(x) = min_{u∈U(x)} [cost(x, u) + v^(i)(next(x, u))]
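To make the backup concrete, here is a minimal value-iteration sketch in Python on a toy deterministic graph (which contains a cycle, so a single pass would not suffice); the states, actions, and costs below are hypothetical, not the example from the slides.

```python
# Minimal value iteration on a toy deterministic graph (illustrative data).
INF = float("inf")

# next(x, u) and cost(x, u) as lookup tables; "goal" is the destination.
next_state = {("A", "r"): "B", ("A", "d"): "C",
              ("B", "d"): "goal", ("C", "r"): "goal", ("C", "u"): "A"}
cost = {("A", "r"): 200, ("A", "d"): 150,
        ("B", "d"): 120, ("C", "r"): 350, ("C", "u"): 100}
states = ["A", "B", "C", "goal"]

def actions(x):
    return [u for (s, u) in next_state if s == x]

# v(i+1)(x) = min_u [cost(x, u) + v(i)(next(x, u))], starting from a guess.
v = {x: 0.0 if x == "goal" else INF for x in states}
while True:
    new_v = {x: 0.0 if x == "goal" else
             min(cost[(x, u)] + v[next_state[(x, u)]] for u in actions(x))
             for x in states}
    if new_v == v:
        break
    v = new_v

# Extract the optimal policy by one-step lookahead.
policy = {x: min(actions(x), key=lambda u: cost[(x, u)] + v[next_state[(x, u)]])
          for x in states if x != "goal"}
print(v)       # {'A': 320.0, 'B': 120.0, 'C': 350.0, 'goal': 0.0}
print(policy)  # {'A': 'r', 'B': 'd', 'C': 'r'}
```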
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Continuous control
- State space and control space are continuous
- Dynamics of the system:
- Continuous time: ẋ(t) = f(x(t), u(t))
- Discrete time: x_{k+1} = x_k + Δ · f(x_k, u_k)
- Objective function: J(x(·), u(·)) = h(x(t_f)) + ∫_0^{t_f} l(x(t), u(t), t) dt,
where l is the cost rate and h is the final cost
HJB equation
- The HJB equation is a nonlinear PDE with respect to the unknown
function v:
−v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ], with boundary condition v(x, t_f) = h(x)
- An optimal control π(x, t) is a value of u which achieves the
minimum in the HJB equation:
π(x, t) = arg min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]
Numerical solution
- Nonlinear differential equations do not always have classical
solutions which satisfy them everywhere
- Numerical methods guarantee convergence, but they rely on a
discretization of the state space, which grows exponentially with the state space dimension
- Nevertheless, the HJB equations have motivated a number of
methods for approximate solution
Parametric value function
- Consider a parametric approximation v̂(x; θ) to the optimal value function
- Its derivative with respect to x, v̂_x(x; θ), stands in for v_x in the HJB equation
- Choose a large enough set of states {x_i} and evaluate the right-hand
side of the HJB equation using the approximate value function
- Adjust θ so that the values v̂(x_i; θ) get closer to the resulting target values
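As one concrete illustration, here is a sketch of this fitting procedure for an assumed 1-D problem with a discounted-cost HJB variant, α·v(x) = min_u [l(x, u) + f(x, u)·v_x(x)] (time-independent, so a single θ suffices). The dynamics f(x, u) = u, cost l = x² + u², discount α, and quadratic feature v̂(x; θ) = θx² are all illustrative choices, not from the lecture.

```python
# Fit theta of v(x; theta) = theta*x^2 by minimizing the squared HJB
# residual over sampled states (all problem data here is assumed).
import numpy as np

alpha = 0.5                        # discount rate (assumed)
xs = np.linspace(-2.0, 2.0, 41)    # the chosen set of evaluation states

def hjb_rhs(theta, x):
    """min_u [l + f*v_x]; with f = u and l = x^2 + u^2 the minimum
    is attained at u* = -v_x/2, so it has a closed form."""
    vx = 2.0 * theta * x
    u = -vx / 2.0
    return x**2 + u**2 + u * vx

def sq_residual(theta):
    # Squared HJB residual, summed over the evaluation states.
    r = alpha * theta * xs**2 - hjb_rhs(theta, xs)
    return np.sum(r**2)

# Adjust theta by gradient descent on the residual (finite differences).
theta, eps, lr = 0.0, 1e-5, 1e-4
for _ in range(3000):
    g = (sq_residual(theta + eps) - sq_residual(theta - eps)) / (2 * eps)
    theta -= lr * g
print("fitted theta:", theta)  # analytic root: (-alpha + sqrt(alpha**2 + 4))/2
```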
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Maximum principle
- The maximum principle solves the optimal control problem for a
deterministic dynamical system with boundary conditions
- It can be derived via the HJB equations or via Lagrange multipliers
- It can be generalized to other types of optimal control problems:
free final time, intermediate constraints, first exit time, control constraints, etc.
Derivation via HJB
- The finite-horizon HJB equation:
−v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]
- If an optimal control policy π(x, t) is given, we can set u =
π(x, t) and drop the min operator in the HJB equation:
−v_t(x, t) = l(x, π(x, t), t) + f(x, π(x, t))^T v_x(x, t)
Maximum principle
- Differentiating this identity with respect to x along an optimal trajectory
and defining the costate p(t) = v_x(x(t), t) yields an ODE system:
ẋ(t) = f(x(t), u(t))
−ṗ(t) = l_x(x(t), u(t), t) + f_x(x(t), u(t))^T p(t)
u(t) = arg min_u [ l(x(t), u, t) + f(x(t), u)^T p(t) ]
- The remarkable property of the maximum principle is that it is
an ODE, even though we derived it starting from a PDE
- An ODE is a consistency condition which singles out specific
trajectories without reference to neighboring trajectories
- Extremal trajectories which solve the above optimization
remove the dependence on neighboring trajectories
Hamiltonian function
- The maximum principle can be written in a more compact and
symmetric form with the help of the Hamiltonian function
H(x, u, p, t) = l(x, u, t) + f(x, u)^T p
- The maximum principle can then be restated as
ẋ = ∂H/∂p, −ṗ = ∂H/∂x, u = arg min_u H(x, u, p, t)
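The Hamiltonian form suggests a simple numerical recipe: integrate the ODE system and shoot on the unknown initial costate. Below is a sketch for an assumed scalar problem (not from the slides): f(x, u) = u, l = (x² + u²)/2, horizon t_f = 1, zero final cost, so H = (x² + u²)/2 + p·u is minimized by u* = −p.

```python
# Shooting method for the maximum principle on an assumed scalar problem.
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

x0, tf = 1.0, 1.0

def odes(t, y):
    x, p = y
    return [-p, -x]                 # x' = dH/dp = u* = -p;  p' = -dH/dx = -x

def final_costate(p0):
    """p(t_f) for a guessed initial costate p0; we want this to be zero."""
    sol = solve_ivp(odes, (0.0, tf), [x0, p0], rtol=1e-8)
    return sol.y[1, -1]

# Shoot on p(0) so the boundary condition p(t_f) = h_x(x(t_f)) = 0 holds.
p0 = brentq(final_costate, -10.0, 10.0)
print("optimal initial costate:", p0)   # analytic value: x0 * tanh(tf)
```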
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Linear quadratic regulator
- Most optimal control problems do not have closed-form
solutions. One exception is the LQR case
- LQR is the class of problems in which the dynamics are linear
and the cost is quadratic
- dynamics: ẋ = A x + B u
- cost rate: l(x, u, t) = (1/2) u^T R u + (1/2) x^T Q x
- final cost: h(x) = (1/2) x^T Q_f x
- R is symmetric positive definite, and Q and Q_f are symmetric
- A, B, R, Q can be made time-varying
Optimal value function
- For an LQR problem, the optimal value function is quadratic in
x and can be expressed as v(x, t) = (1/2) x^T V(t) x,
where V(t) is a symmetric matrix
- We can obtain an ODE for V(t) via the HJB equation:
−V̇(t) = Q + A^T V(t) + V(t) A − V(t) B R^{−1} B^T V(t), with V(t_f) = Q_f
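A minimal sketch of integrating this Riccati ODE backward in time from V(t_f) = Q_f with Euler steps; the double-integrator matrices and horizon are assumed examples, not from the lecture.

```python
# Integrate the continuous-time Riccati ODE backward (illustrative data).
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double integrator (assumed)
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
tf, dt = 2.0, 1e-3

V = Qf.copy()                              # boundary condition V(t_f) = Q_f
for _ in range(int(tf / dt)):              # march from t_f back to 0
    rhs = Q + A.T @ V + V @ A - V @ B @ np.linalg.inv(R) @ B.T @ V  # = -dV/dt
    V = V + dt * rhs                       # V(t - dt) ~= V(t) + dt * (-dV/dt)
print(V)                                   # V(0)
```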
Discrete LQR
- LQR is defined as follows when time is discretized
- dynamics: x_{k+1} = A x_k + B u_k
- cost rate: (1/2) u_k^T R u_k + (1/2) x_k^T Q x_k
- final cost: (1/2) x_n^T Q_f x_n
- Let n = t_f/Δ; the correspondence to the continuous-time problem is
A ↔ I + Δ A, B ↔ Δ B, Q ↔ Δ Q, R ↔ Δ R
Optimal value function
- We derive the optimal value function from the Bellman equation:
v_k(x) = min_u [ (1/2) u^T R u + (1/2) x^T Q x + v_{k+1}(A x + B u) ]
- Again, the optimal value function is quadratic in x and changes
over time: v_k(x) = (1/2) x^T V_k x, with V_n = Q_f
- Plugging into the Bellman equation, we obtain a recursive relation for
V_k: V_k = Q + A^T V_{k+1} A − A^T V_{k+1} B (R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A
- The optimal control law is linear in x:
u_k = −(R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A x_k
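A sketch of the backward recursion and the resulting linear control law; the discretized double-integrator matrices and horizon are assumed examples, not from the lecture.

```python
# Finite-horizon discrete LQR: backward pass for V_k and gains, then
# a forward rollout applying u_k = -L_k x_k (illustrative data).
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])      # I + dt*A_continuous (assumed)
B = np.array([[0.0], [dt]])
Q, R, Qf = dt * np.eye(2), dt * np.array([[1.0]]), np.eye(2)
n = 50

# Backward pass: V_n = Q_f, then recurse V_k and store the gains L_k.
V = Qf.copy()
gains = [None] * n
for k in reversed(range(n)):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)      # L_k
    V = Q + A.T @ V @ A - A.T @ V @ B @ L                  # Riccati recursion
    gains[k] = L

# Forward pass: apply the linear control law from an initial state.
x = np.array([[1.0], [0.0]])
for k in range(n):
    x = A @ x + B @ (-gains[k] @ x)
print("final state:", x.ravel())           # driven toward the origin
```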