Optimal Control Theory
The theory
- Optimal control theory is a mature mathematical discipline
which provides algorithms to solve various control problems
- The elaborate mathematical machinery behind optimal control
models is rarely exposed to the computer animation community
- Most controllers designed in practice are theoretically
suboptimal
- This lecture closely follows the excellent tutorial by Dr. Emo
Todorov (http://www.cs.washington.edu/homes/todorov/papers/optimality_chapter.pdf)
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Standard problem
- Find an action sequence (u0, u1, ..., un-1) and corresponding
state sequence (x0, x1, ..., xn) minimizing the total cost
cost(x0, u0) + cost(x1, u1) + ... + cost(xn-1, un-1)
- The initial state (x0) and the destination state (xn) are given
Discrete control
[Figure: a directed graph of states whose edges are labeled with transition costs ranging from $120 to $500]
- The dynamics next(x, u) give the next state, and cost(x, u) gives the cost of taking action u in state x
Dynamic programming
- Bellman optimality principle:
- If a given state-action sequence is optimal and we remove
the first state and action, the remaining sequence is also optimal
- The choice of optimal actions in the future is independent
of the past actions which led to the present state
- The optimal state-action sequences can be constructed by
starting at the final state and extending backwards
Optimal value function
- v(x) = “minimal total cost for completing the task starting from
state x”
- Find optimal actions:
- 1. Consider every action available at the current state
- 2. Add its immediate cost to the optimal value of the resulting
next state
- 3. Choose an action for which the sum is minimal
Optimal control policy
- A mapping from states to actions is called a control policy or
a control law
- Once we have a control policy, we can start at any state and
reach the destination state by following the control policy
- The optimal control policy satisfies π(x) = arg min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
- Its corresponding optimal value function satisfies the Bellman
equations v(x) = min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
Value iteration
- Bellman equations cannot be solved in a single pass if the state
transitions are cyclic
- Value iteration starts with a guess v^(0) of the optimal value
function and constructs a sequence of improved guesses:
v^(i+1)(x) = min_{u∈U(x)} [cost(x, u) + v^(i)(next(x, u))]
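To make the backup concrete, here is a minimal value-iteration sketch in Python on a toy deterministic graph (which contains a cycle, so a single pass would not suffice); the states, actions, and costs below are hypothetical, not the example from the slides.

```python
# Minimal value iteration on a toy deterministic graph (illustrative data).
INF = float("inf")

# next(x, u) and cost(x, u) as lookup tables; "goal" is the destination.
next_state = {("A", "r"): "B", ("A", "d"): "C",
              ("B", "d"): "goal", ("C", "r"): "goal", ("C", "u"): "A"}
cost = {("A", "r"): 200, ("A", "d"): 150,
        ("B", "d"): 120, ("C", "r"): 350, ("C", "u"): 100}
states = ["A", "B", "C", "goal"]

def actions(x):
    return [u for (s, u) in next_state if s == x]

# v(i+1)(x) = min_u [cost(x, u) + v(i)(next(x, u))], starting from a guess.
v = {x: 0.0 if x == "goal" else INF for x in states}
while True:
    new_v = {x: 0.0 if x == "goal" else
             min(cost[(x, u)] + v[next_state[(x, u)]] for u in actions(x))
             for x in states}
    if new_v == v:
        break
    v = new_v

# Extract the optimal policy by one-step lookahead.
policy = {x: min(actions(x), key=lambda u: cost[(x, u)] + v[next_state[(x, u)]])
          for x in states if x != "goal"}
print(v)       # {'A': 320.0, 'B': 120.0, 'C': 350.0, 'goal': 0.0}
print(policy)  # {'A': 'r', 'B': 'd', 'C': 'r'}
```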
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Continuous control
- State space and control space are continuous
- Dynamics of the system:
- Continuous time: ẋ(t) = f(x(t), u(t))
- Discrete time: x_{k+1} = x_k + Δ · f(x_k, u_k)
- Objective function: J(x(·), u(·)) = h(x(t_f)) + ∫_0^{t_f} l(x(t), u(t), t) dt,
where l is the cost rate and h is the final cost
HJB equation
- The HJB equation is a nonlinear PDE with respect to the unknown
function v:
−v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ], with boundary condition v(x, t_f) = h(x)
- An optimal control π(x, t) is a value of u which achieves the
minimum in the HJB equation:
π(x, t) = arg min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]
Numerical solution
- Nonlinear differential equations do not always have classical
solutions which satisfy them everywhere
- Numerical methods guarantee convergence, but they rely on a
discretization of the state space, which grows exponentially with the state space dimension
- Nevertheless, the HJB equations have motivated a number of
methods for approximate solution
Parametric value function
- Consider a parametric approximation v̂(x; θ) to the optimal value function
- Its derivative with respect to x, v̂_x(x; θ), stands in for v_x in the HJB equation
- Choose a large enough set of states {x_i} and evaluate the right-hand
side of the HJB equation using the approximate value function
- Adjust θ so that the values v̂(x_i; θ) get closer to the resulting target values
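As one concrete illustration, here is a sketch of this fitting procedure for an assumed 1-D problem with a discounted-cost HJB variant, α·v(x) = min_u [l(x, u) + f(x, u)·v_x(x)] (time-independent, so a single θ suffices). The dynamics f(x, u) = u, cost l = x² + u², discount α, and quadratic feature v̂(x; θ) = θx² are all illustrative choices, not from the lecture.

```python
# Fit theta of v(x; theta) = theta*x^2 by minimizing the squared HJB
# residual over sampled states (all problem data here is assumed).
import numpy as np

alpha = 0.5                        # discount rate (assumed)
xs = np.linspace(-2.0, 2.0, 41)    # the chosen set of evaluation states

def hjb_rhs(theta, x):
    """min_u [l + f*v_x]; with f = u and l = x^2 + u^2 the minimum
    is attained at u* = -v_x/2, so it has a closed form."""
    vx = 2.0 * theta * x
    u = -vx / 2.0
    return x**2 + u**2 + u * vx

def sq_residual(theta):
    # Squared HJB residual, summed over the evaluation states.
    r = alpha * theta * xs**2 - hjb_rhs(theta, xs)
    return np.sum(r**2)

# Adjust theta by gradient descent on the residual (finite differences).
theta, eps, lr = 0.0, 1e-5, 1e-4
for _ in range(3000):
    g = (sq_residual(theta + eps) - sq_residual(theta - eps)) / (2 * eps)
    theta -= lr * g
print("fitted theta:", theta)  # analytic root: (-alpha + sqrt(alpha**2 + 4))/2
```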
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Maximum principle
- The maximum principle solves the optimal control problem for a
deterministic dynamical system with boundary conditions
- It can be derived via the HJB equations or via Lagrange multipliers
- It can be generalized to other types of optimal control problems:
free final time, intermediate constraints, first exit time, control constraints, etc.
Derivation via HJB
- The finite-horizon HJB equation:
−v_t(x, t) = min_{u∈U(x)} [ l(x, u, t) + f(x, u)^T v_x(x, t) ]
- If an optimal control policy π(x, t) is given, we can set u =
π(x, t) and drop the min operator in the HJB equation:
−v_t(x, t) = l(x, π(x, t), t) + f(x, π(x, t))^T v_x(x, t)
Maximum principle
- Differentiating this identity with respect to x along an optimal trajectory
and defining the costate p(t) = v_x(x(t), t) yields an ODE system:
ẋ(t) = f(x(t), u(t))
−ṗ(t) = l_x(x(t), u(t), t) + f_x(x(t), u(t))^T p(t)
u(t) = arg min_u [ l(x(t), u, t) + f(x(t), u)^T p(t) ]
- The remarkable property of the maximum principle is that it is
an ODE, even though we derived it starting from a PDE
- An ODE is a consistency condition which singles out specific
trajectories without reference to neighboring trajectories
- Extremal trajectories which solve the above optimization
remove the dependence on neighboring trajectories
Hamiltonian function
- The maximum principle can be written in a more compact and
symmetric form with the help of the Hamiltonian function
H(x, u, p, t) = l(x, u, t) + f(x, u)^T p
- The maximum principle can then be restated as
ẋ = ∂H/∂p, −ṗ = ∂H/∂x, u = arg min_u H(x, u, p, t)
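The Hamiltonian form suggests a simple numerical recipe: integrate the ODE system and shoot on the unknown initial costate. Below is a sketch for an assumed scalar problem (not from the slides): f(x, u) = u, l = (x² + u²)/2, horizon t_f = 1, zero final cost, so H = (x² + u²)/2 + p·u is minimized by u* = −p.

```python
# Shooting method for the maximum principle on an assumed scalar problem.
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

x0, tf = 1.0, 1.0

def odes(t, y):
    x, p = y
    return [-p, -x]                 # x' = dH/dp = u* = -p;  p' = -dH/dx = -x

def final_costate(p0):
    """p(t_f) for a guessed initial costate p0; we want this to be zero."""
    sol = solve_ivp(odes, (0.0, tf), [x0, p0], rtol=1e-8)
    return sol.y[1, -1]

# Shoot on p(0) so the boundary condition p(t_f) = h_x(x(t_f)) = 0 holds.
p0 = brentq(final_costate, -10.0, 10.0)
print("optimal initial costate:", p0)   # analytic value: x0 * tanh(tf)
```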
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
Linear quadratic regulator
- Most optimal control problems do not have closed-form
solutions. One exception is the LQR case
- LQR is the class of problems in which the dynamics are linear
and the cost is quadratic
- dynamics: ẋ = A x + B u
- cost rate: l(x, u, t) = (1/2) u^T R u + (1/2) x^T Q x
- final cost: h(x) = (1/2) x^T Q_f x
- R is symmetric positive definite, and Q and Q_f are symmetric
- A, B, R, Q can be made time-varying
Optimal value function
- For an LQR problem, the optimal value function is quadratic in
x and can be expressed as v(x, t) = (1/2) x^T V(t) x,
where V(t) is a symmetric matrix
- We can obtain an ODE for V(t) via the HJB equation:
−V̇(t) = Q + A^T V(t) + V(t) A − V(t) B R^{−1} B^T V(t), with V(t_f) = Q_f
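A minimal sketch of integrating this Riccati ODE backward in time from V(t_f) = Q_f with Euler steps; the double-integrator matrices and horizon are assumed examples, not from the lecture.

```python
# Integrate the continuous-time Riccati ODE backward (illustrative data).
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double integrator (assumed)
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
tf, dt = 2.0, 1e-3

V = Qf.copy()                              # boundary condition V(t_f) = Q_f
for _ in range(int(tf / dt)):              # march from t_f back to 0
    rhs = Q + A.T @ V + V @ A - V @ B @ np.linalg.inv(R) @ B.T @ V  # = -dV/dt
    V = V + dt * rhs                       # V(t - dt) ~= V(t) + dt * (-dV/dt)
print(V)                                   # V(0)
```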
Discrete LQR
- LQR is defined as follows when time is discretized
- dynamics: x_{k+1} = A x_k + B u_k
- cost rate: (1/2) u_k^T R u_k + (1/2) x_k^T Q x_k
- final cost: (1/2) x_n^T Q_f x_n
- Let n = t_f/Δ; the correspondence to the continuous-time problem is
A ↔ I + Δ A, B ↔ Δ B, Q ↔ Δ Q, R ↔ Δ R
Optimal value function
- We derive the optimal value function from the Bellman equation:
v_k(x) = min_u [ (1/2) u^T R u + (1/2) x^T Q x + v_{k+1}(A x + B u) ]
- Again, the optimal value function is quadratic in x and changes
over time: v_k(x) = (1/2) x^T V_k x, with V_n = Q_f
- Plugging into the Bellman equation, we obtain a recursive relation for
V_k: V_k = Q + A^T V_{k+1} A − A^T V_{k+1} B (R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A
- The optimal control law is linear in x:
u_k = −(R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A x_k
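A sketch of the backward recursion and the resulting linear control law; the discretized double-integrator matrices and horizon are assumed examples, not from the lecture.

```python
# Finite-horizon discrete LQR: backward pass for V_k and gains, then
# a forward rollout applying u_k = -L_k x_k (illustrative data).
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])      # I + dt*A_continuous (assumed)
B = np.array([[0.0], [dt]])
Q, R, Qf = dt * np.eye(2), dt * np.array([[1.0]]), np.eye(2)
n = 50

# Backward pass: V_n = Q_f, then recurse V_k and store the gains L_k.
V = Qf.copy()
gains = [None] * n
for k in reversed(range(n)):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)      # L_k
    V = Q + A.T @ V @ A - A.T @ V @ B @ L                  # Riccati recursion
    gains[k] = L

# Forward pass: apply the linear control law from an initial state.
x = np.array([[1.0], [0.0]])
for k in range(n):
    x = A @ x + B @ (-gains[k] @ x)
print("final state:", x.ravel())           # driven toward the origin
```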