A Series of Lectures on Approximate Dynamic Programming
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems Massachusetts Institute of Technology
Lucca, Italy June 2017
Bertsekas (M.I.T.) Approximate Dynamic Programming 1 / 24
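System: $x_{k+1} = f_k(x_k, u_k, w_k)$, $k = 0, 1, \ldots, N-1$, with state $x_k$, control $u_k$, and random disturbance $w_k$
Feedback control: $u_k = \mu_k(x_k)$, where the policy is $\pi = \{\mu_0, \ldots, \mu_{N-1}\}$
Cost of $\pi$ starting at $x_0$: $J_\pi(x_0) = E\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \}$, where $g_k$ is the cost at stage $k$ and $g_N$ the terminal cost
Optimal cost: $J^*(x_0) = \min_\pi J_\pi(x_0)$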
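To make the cost of a fixed policy concrete, here is a minimal Python sketch that computes $J_\pi(x_0)$ exactly by enumerating every disturbance sequence. The function name `policy_cost`, the toy system, the cost functions, and the two-point disturbance distribution are all illustrative assumptions, not part of the lecture.

```python
from itertools import product


def policy_cost(x0, policy, f, g, g_terminal, disturbances, N):
    """Exact expected cost of a policy over N stages.

    policy[k] maps a state to a control; disturbances is a list of
    (w, probability) pairs, assumed independent across stages.
    """
    total = 0.0
    for seq in product(disturbances, repeat=N):
        x, cost, prob = x0, 0.0, 1.0
        for k, (w, p) in enumerate(seq):
            u = policy[k](x)          # feedback control u_k = mu_k(x_k)
            cost += g(k, x, u, w)     # stage cost
            x = f(k, x, u, w)         # system equation
            prob *= p
        total += prob * (cost + g_terminal(x))
    return total


# Hypothetical toy problem: x_{k+1} = x_k + u_k + w_k with quadratic costs.
f = lambda k, x, u, w: x + u + w
g = lambda k, x, u, w: x * x + u * u
g_terminal = lambda x: x * x
disturbances = [(-1, 0.5), (1, 0.5)]
N = 2
policy = [lambda x: -x] * N           # mu_k(x) = -x at every stage

J = policy_cost(1, policy, f, g, g_terminal, disturbances, N)
```

Enumeration costs grow exponentially with $N$, so this is only practical for very short horizons; it is meant to show what the expectation in $J_\pi(x_0)$ ranges over.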
Quadratic cost: $x_N' Q x_N + \sum_{k=0}^{N-1} ( x_k' Q x_k + u_k' R u_k )$
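As a worked instance of the quadratic cost, the following Python sketch evaluates a terminal term plus the running state and control terms along a given trajectory. The helper names `quad_form` and `quadratic_cost`, and the numbers in the example, are assumptions made up for illustration.

```python
def quad_form(v, M):
    """Return v' M v for a vector v (list) and a square matrix M (list of rows)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def quadratic_cost(states, controls, Q, R):
    """Terminal cost x_N' Q x_N plus the sum over k of x_k' Q x_k + u_k' R u_k.

    states holds x_0..x_N (N+1 entries); controls holds u_0..u_{N-1} (N entries).
    """
    if len(states) != len(controls) + 1:
        raise ValueError("need exactly one more state than controls")
    running = sum(quad_form(x, Q) + quad_form(u, R)
                  for x, u in zip(states[:-1], controls))
    return running + quad_form(states[-1], Q)


# Hypothetical data: 2-dimensional state, 1-dimensional control, N = 2.
Q = [[1.0, 0.0], [0.0, 2.0]]
R = [[0.5]]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
controls = [[2.0], [0.0]]

cost = quadratic_cost(states, controls, Q, R)  # 1 + 2 + 2 + 0 + 3 = 8
```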
[Figure: transition graph from the initial state through nodes A, C, AB, AC, CA, CD, ABC, ACB, ACD, CAB, CAD, CDA, with numeric arc costs.]
[Figure: the same graph, highlighting a Stage 2 subproblem.]
[Figure: the same graph, highlighting a Stage 1 subproblem.]
[Figure: the same graph, highlighting the Stage 0 subproblem.]
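The stage-by-stage subproblems can be sketched in Python as a backward pass over a layered graph: solve the last-stage subproblems first, then reuse their costs for earlier stages. The node names follow the labels that appear in the graph figure, but the successor structure, every arc cost, and every terminal cost below are made-up assumptions, since the figure's actual numbers are not recoverable here.

```python
# Hypothetical arc costs: arcs[node][successor] = cost of that transition.
arcs = {
    "":   {"A": 5, "C": 3},      # "" stands for the initial state
    "A":  {"AB": 2, "AC": 3},
    "C":  {"CA": 4, "CD": 6},
    "AB": {"ABC": 3},
    "AC": {"ACB": 4, "ACD": 3},
    "CA": {"CAB": 2, "CAD": 4},
    "CD": {"CDA": 3},
}
# Hypothetical terminal costs at the last-stage nodes.
terminal_cost = {"ABC": 6, "ACB": 1, "ACD": 3, "CAB": 1, "CAD": 3, "CDA": 2}


def solve_backward(arcs, terminal_cost):
    """Return (cost_to_go, best_successor) for every node of a layered graph."""
    J = dict(terminal_cost)
    best = {}
    # Longer labels belong to later stages, so process the last stage first.
    for node in sorted(arcs, key=len, reverse=True):
        succ, cost = min(arcs[node].items(),
                         key=lambda item: item[1] + J[item[0]])
        J[node] = cost + J[succ]
        best[node] = succ
    return J, best


J, best = solve_backward(arcs, terminal_cost)

# Recover an optimal path by following best successors from the initial state.
path = [""]
while path[-1] in best:
    path.append(best[path[-1]])
```

With these invented costs the pass solves the Stage 2 subproblems (AB, AC, CA, CD), then Stage 1 (A, C), then Stage 0, and each stage only reads cost-to-go values already computed for the next stage.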
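Principle of optimality: let $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_{N-1}^*\}$ be an optimal policy
Consider the tail subproblem that starts at state $x_k$ at time $k$ and minimizes the cost from time $k$ to time $N$, with controls applied at stages $k, \ldots, N-1$
Then the tail $\{\mu_k^*, \mu_{k+1}^*, \ldots, \mu_{N-1}^*\}$ of the optimal policy is optimal for the tail subproblem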
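DP algorithm: start with $J_N(x_N) = g_N(x_N)$ and go backwards, for $k = N-1, \ldots, 0$:
$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\{ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \}$
where $g_k$ is the cost at stage $k$ and $g_N$ the terminal cost
Let $\mu_k^*(x_k)$ minimize in the right side above for each $x_k$ and $k$. Then the policy $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ is optimal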
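The backward recursion can be sketched in Python for a problem with finitely many states, controls, and disturbance values. The function `dp_backward` and the small inventory-style example (stock capped at 2, ordering cost `u`, quadratic holding/shortage cost, zero terminal cost) are illustrative assumptions rather than material from the slides.

```python
def dp_backward(states, controls, disturbances, f, g, g_terminal, N):
    """Finite-horizon DP: J[k][x] = min over u of E_w[g + J[k+1][next state]].

    controls(k, x) returns the admissible controls at state x;
    disturbances is a list of (w, probability) pairs.
    Returns (J, mu): cost-to-go tables and minimizing controls per stage.
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = g_terminal(x)
    for k in range(N - 1, -1, -1):          # go backwards in time
        for x in states:
            best_u, best_q = None, float("inf")
            for u in controls(k, x):
                q = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                        for w, p in disturbances)
                if q < best_q:
                    best_u, best_q = u, q
            J[k][x], mu[k][x] = best_q, best_u
    return J, mu


# Hypothetical inventory example: stock x in {0, 1, 2}, order u, demand w.
states = [0, 1, 2]
controls = lambda k, x: range(0, 2 - x + 1)           # keep x + u <= 2
disturbances = [(0, 0.1), (1, 0.7), (2, 0.2)]
f = lambda k, x, u, w: max(0, x + u - w)              # unmet demand is lost
g = lambda k, x, u, w: u + (x + u - w) ** 2
g_terminal = lambda x: 0.0

J, mu = dp_backward(states, controls, disturbances, f, g, g_terminal, N=3)
```

Each stage performs one pass over all (state, control, disturbance) triples, which is what makes the method exact but expensive as the state and control sets grow.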
◮ Discretization needed
◮ Exponential growth of the computation with the dimensions of the state and control
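A rough back-of-the-envelope sketch of the exponential growth, assuming a uniform grid with the same number of points along every state and control dimension (the function name and the choice of 10 points per dimension are assumptions for illustration):

```python
def grid_size(points_per_dim, state_dim, control_dim):
    """Number of (state, control) grid pairs a naive DP sweep visits per stage."""
    return points_per_dim ** (state_dim + control_dim)


# 10 grid points per dimension, one control dimension, growing state dimension.
sizes = {n: grid_size(10, n, 1) for n in (1, 2, 4, 8)}
# sizes == {1: 100, 2: 1000, 4: 100000, 8: 1000000000}
```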
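One-step lookahead with an approximate cost-to-go $\tilde J_{k+1}$ in place of $J_{k+1}$: $\min_{u_k \in U_k(x_k)} E\{ g_k(x_k, u_k, w_k) + \tilde J_{k+1}(f_k(x_k, u_k, w_k)) \}$, where $g_k$ is the cost at stage $k$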
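One common way such a minimization is used in approximate DP is one-step lookahead: at the current state, pick the control that minimizes the expected stage cost plus an approximate cost-to-go of the next state. The Python sketch below assumes that reading; `one_step_lookahead`, the function `J_tilde`, and the toy problem are hypothetical.

```python
def one_step_lookahead(k, x, controls, disturbances, f, g, J_tilde):
    """Return the control minimizing E_w[g(k,x,u,w) + J_tilde(k+1, next state)]."""
    def q_value(u):
        return sum(p * (g(k, x, u, w) + J_tilde(k + 1, f(k, x, u, w)))
                   for w, p in disturbances)
    return min(controls(k, x), key=q_value)


# Hypothetical toy problem with a made-up cost-to-go approximation.
f = lambda k, x, u, w: x + u + w
g = lambda k, x, u, w: u * u
J_tilde = lambda k, x: 2 * x * x
controls = lambda k, x: [-2, -1, 0, 1, 2]
disturbances = [(-1, 0.5), (1, 0.5)]

u = one_step_lookahead(0, 2, controls, disturbances, f, g, J_tilde)
```

Only the current state is evaluated, so no table over the whole state space is needed; the quality of the resulting control depends on how good `J_tilde` is.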
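Obtain sample state-control pairs $(x_k^s, u_k^s)$, $s = 1, \ldots, q$, such that for each $s$, $u_k^s$ is a "good" control at state $x_k^s$
Least squares fit: $\min_{r_k} \sum_{s=1}^{q} \| u_k^s - \tilde\mu_k(x_k^s, r_k) \|^2$
where $\tilde\mu_k(x_k^s, r_k)$ is an "approximation architecture" with parameter vector $r_k$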
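A minimal Python sketch of such a fit, assuming a scalar state and control and a linear architecture `mu(x, r) = r[0] + r[1] * x` (this architecture, the function name, and the sample data are assumptions chosen to keep the example closed-form):

```python
def fit_linear_policy(samples):
    """Least squares fit of r[0] + r[1] * x to sample (state, control) pairs."""
    q = len(samples)
    mean_x = sum(x for x, _ in samples) / q
    mean_u = sum(u for _, u in samples) / q
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in samples)
    slope = sxu / sxx                      # requires at least two distinct states
    return (mean_u - slope * mean_x, slope)


# Hypothetical samples generated from the "good" control u = 1 - 2x.
samples = [(0.0, 1.0), (1.0, -1.0), (2.0, -3.0), (3.0, -5.0)]
r = fit_linear_policy(samples)
mu_tilde = lambda x: r[0] + r[1] * x       # fitted policy, usable at unseen states
```

Richer architectures (for example, feature-based or neural network parametrizations) replace the closed-form solution with a numerical least squares solver, but the objective being minimized has the same form.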