[CMU16-745] Lecture 6 Deterministic Optimal Control Introduction

Source: CMU 16-745 Study Notes, taught by Prof. Zac Manchester

Previous notes: Lecture 5 Optimization Part 3

Contents

  • Review
    • Constrained Optimization
  • Deterministic Optimal Control Introduction
    • Deterministic Optimal Control
      • (1) Continuous-Time Formulation
      • (2) Discrete-Time Formulation
    • Pontryagin’s Minimum Principle
      • Some Notes:
  • References

Review

Constrained Optimization

  • Augmented Lagrangian
  • Merit Functions/Line Search

Deterministic Optimal Control Introduction

Deterministic Optimal Control

(1) Continuous-Time Formulation

$$\min_{x(t),\,u(t)} J(x(t), u(t)) = \int_{t_0}^{t_f} \ell(x(t), u(t)) \, dt + \ell_F(x(t_f)) \quad \text{s.t.} \quad \dot{x}(t) = f(x(t), u(t))$$

  • State: $x(t)$
  • Control input: $u(t)$
  • Cost function: $J(x(t), u(t))$
  • Stage cost: $\ell(x(t), u(t))$
  • Terminal cost: $\ell_F(x(t_f))$
  • Dynamics constraint: $\dot{x}(t) = f(x(t), u(t))$
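
For concreteness, here is one hypothetical instance of this template (my illustrative choice, not from the lecture): a double integrator with state $x = (x_1, x_2)$ (position and velocity), a quadratic effort penalty, and a quadratic terminal cost with weight $\rho > 0$:

$$\min_{x(t),\,u(t)} \int_{0}^{t_f} \tfrac{1}{2} u(t)^2 \, dt + \tfrac{\rho}{2} \left\| x(t_f) \right\|^2 \quad \text{s.t.} \quad \dot{x}(t) = \begin{bmatrix} x_2(t) \\ u(t) \end{bmatrix}$$

Here $\ell(x, u) = \tfrac{1}{2} u^2$, $\ell_F(x) = \tfrac{\rho}{2} \|x\|^2$, and $f$ is the double-integrator dynamics.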

This is an infinite-dimensional optimization problem in the following sense:
$$u(t) = \lim_{N \to \infty} u_{1:N}$$

Solutions are open-loop trajectories.
We will focus on the discrete-time setting, which leads to tractable algorithms.

(2) Discrete-Time Formulation

$$\min_{x_{1:N},\, u_{1:N-1}} J(x_{1:N}, u_{1:N-1}) = \sum_{k=1}^{N-1} \ell(x_k, u_k) + \ell_F(x_N) \quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k)$$

Additional constraints are often included as well, for example:

  • Torque limits: $u_{\min} \leq u_k \leq u_{\max}$
  • Obstacle/safety constraints: $c(x_k) \leq 0$

This is a finite-dimensional optimization problem.
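
As a minimal sketch of what these ingredients look like in code (assuming an Euler-discretized double integrator and quadratic costs, all illustrative choices rather than the lecture's), the cost is evaluated by rolling the dynamics forward from a given initial state:

```python
import numpy as np

h = 0.1  # time step (illustrative)

def f(x, u):
    """Discrete dynamics x_{k+1} = f(x_k, u_k): an Euler-discretized double integrator."""
    return np.array([x[0] + h * x[1], x[1] + h * u[0]])

def stage_cost(x, u):
    return 0.5 * (x @ x) + 0.05 * (u @ u)

def terminal_cost(x):
    return 10.0 * (x @ x)

def total_cost(x0, U):
    """Roll out from x0 under U = [u_1, ..., u_{N-1}] and accumulate
    J = sum_k l(x_k, u_k) + l_F(x_N); the dynamics constraint holds by construction."""
    x, J = x0, 0.0
    for u in U:
        J += stage_cost(x, u)
        x = f(x, u)
    return J + terminal_cost(x)

# N = 21 knot points x_1, ..., x_N, hence 20 controls u_1, ..., u_{N-1}.
print(total_cost(np.array([1.0, 0.0]), np.zeros((20, 1))))
```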

  • Knot points: the samples $x_k$, $u_k$
  • Continuous to discrete: use integration (e.g., Runge-Kutta; see the sketch below)
  • Discrete to continuous: use interpolation (e.g., cubic splines)
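
Both directions fit in a few lines of code. A sketch, again with the double integrator standing in for the real dynamics (the function names here are mine):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def f_cont(x, u):
    """Continuous-time dynamics x_dot = f(x, u); a double integrator as a stand-in."""
    return np.array([x[1], u[0]])

def rk4_step(f, x, u, h):
    """One explicit 4th-order Runge-Kutta step with a zero-order hold on u."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Continuous to discrete: the discrete dynamics x_{k+1} = f_d(x_k, u_k).
h = 0.1
f_d = lambda x, u: rk4_step(f_cont, x, u, h)

# Discrete to continuous: interpolate the knot points back into a trajectory.
X = [np.array([1.0, 0.0])]
for _ in range(10):
    X.append(f_d(X[-1], np.array([-0.5])))
x_of_t = CubicSpline(h * np.arange(len(X)), np.array(X), axis=0)
print(x_of_t(0.25))  # state estimate between knot points
```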

Pontryagin’s Minimum Principle

Also known as the maximum principle when maximizing a reward instead of minimizing a cost.
It gives first-order necessary conditions for deterministic optimal control problems.
We can form the Lagrangian:
$$\mathcal{L} = \sum_{k=1}^{N-1} \left[ \ell(x_k, u_k) + \lambda_{k+1}^\top \left( f(x_k, u_k) - x_{k+1} \right) \right] + \ell_F(x_N)$$

The Hamiltonian:
$$\mathcal{H}(x, u, \lambda) = \ell(x, u) + \lambda^\top f(x, u)$$

Plugging into $\mathcal{L}$ and re-indexing so that each $-\lambda_k^\top x_k$ term is grouped with the Hamiltonian at step $k$:
$$\mathcal{L} = \mathcal{H}(x_1, u_1, \lambda_2) + \sum_{k=2}^{N-1} \left[ \mathcal{H}(x_k, u_k, \lambda_{k+1}) - \lambda_k^\top x_k \right] + \ell_F(x_N) - \lambda_N^\top x_N$$

Taking derivatives with respect to x k x_k xk and λ k \lambda_k λk:

  1. For $\lambda_k$:
    $$\frac{\partial \mathcal{L}}{\partial \lambda_k} = \frac{\partial \mathcal{H}}{\partial \lambda_k} - x_k = f(x_{k-1}, u_{k-1}) - x_k = 0$$

  2. For $x_k$:
    $$\frac{\partial \mathcal{L}}{\partial x_k} = \frac{\partial \mathcal{H}}{\partial x_k} - \lambda_k^\top = \frac{\partial \ell}{\partial x_k} + \lambda_{k+1}^\top \frac{\partial f}{\partial x_k} - \lambda_k^\top = 0$$

For the $N$-th state:
$$\frac{\partial \mathcal{L}}{\partial x_N} = \frac{\partial \ell_F}{\partial x_N} - \lambda_N^\top = 0$$

For $u_k$, the optimal control minimizes the Hamiltonian over the admissible set $\mathcal{U}$:
$$u_k = \arg\min_{\tilde{u}} \mathcal{H}(x_k, \tilde{u}, \lambda_{k+1}) \quad \text{s.t.} \quad \tilde{u} \in \mathcal{U}$$
When there are no input constraints ($\mathcal{U} = \mathbb{R}^m$), this reduces to the stationarity condition $\frac{\partial \mathcal{H}}{\partial u_k} = 0$.
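
As a worked instance of this condition (with illustrative LQR-style ingredients, not taken from the lecture), let $\ell(x, u) = \tfrac{1}{2} x^\top Q x + \tfrac{1}{2} u^\top R u$ and $f(x, u) = A x + B u$ with $R \succ 0$ and $\mathcal{U} = \mathbb{R}^m$. Then
$$\nabla_u \mathcal{H} = R u + B^\top \lambda_{k+1} = 0 \quad \Longrightarrow \quad u_k = -R^{-1} B^\top \lambda_{k+1},$$
so the minimizing control follows in closed form from the co-state.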

Summary

  • $x_{k+1} = \nabla_\lambda \mathcal{H}(x_k, u_k, \lambda_{k+1}) = f(x_k, u_k)$
  • $\lambda_k = \nabla_x \mathcal{H}(x_k, u_k, \lambda_{k+1}) = \nabla_x \ell(x_k, u_k) + \left( \frac{\partial f}{\partial x_k} \right)^\top \lambda_{k+1}$
  • $u_k = \arg\min_{\tilde{u}} \mathcal{H}(x_k, \tilde{u}, \lambda_{k+1}) \quad \text{s.t.} \quad \tilde{u} \in \mathcal{U}$
  • $\lambda_N = \frac{\partial \ell_F}{\partial x_N}$

This is where the shooting method comes from.
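
A minimal sketch of that idea, reusing the LQR-style toy ingredients above (weights, step size, and iteration count are all arbitrary illustrative choices): roll the state forward, roll the co-state backward, use $\nabla_u \mathcal{H}$ as the gradient with respect to each control, and take a plain gradient step on the control sequence:

```python
import numpy as np

h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])   # df/dx for the Euler double integrator
B = np.array([[0.0], [h]])             # df/du
Q = np.eye(2)                          # stage state weight
R = 0.1 * np.eye(1)                    # stage control weight
Qf = 10.0 * np.eye(2)                  # terminal weight

def shooting_step(x0, U, alpha=0.01):
    """One gradient-descent step on the control sequence U (shape (N-1, 1))."""
    # Forward pass: roll out x_{k+1} = f(x_k, u_k).
    X = [x0]
    for u in U:
        X.append(A @ X[-1] + B @ u)
    # Backward pass: lam_N = Qf x_N, then lam_k = grad_x l + (df/dx)^T lam_{k+1}.
    lam = Qf @ X[-1]
    grad = np.zeros_like(U)
    for k in range(len(U) - 1, -1, -1):
        grad[k] = R @ U[k] + B.T @ lam   # grad_u H(x_k, u_k, lam_{k+1})
        lam = Q @ X[k] + A.T @ lam       # grad_x H(x_k, u_k, lam_{k+1})
    return U - alpha * grad

x0 = np.array([1.0, 0.0])
U = np.zeros((20, 1))
for _ in range(500):
    U = shooting_step(x0, U)
# U now approximates the optimal open-loop control sequence for this toy problem.
```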

Continuous-time version

$$\dot{x}(t) = \nabla_\lambda \mathcal{H}(x(t), u(t), \lambda(t)) = f(x(t), u(t))$$
$$-\dot{\lambda}(t) = \nabla_x \mathcal{H}(x(t), u(t), \lambda(t)) = \nabla_x \ell(x(t), u(t)) + \left( \frac{\partial f}{\partial x} \right)^\top \lambda(t)$$
$$u(t) = \arg\min_{\tilde{u}} \mathcal{H}(x(t), \tilde{u}, \lambda(t)) \quad \text{s.t.} \quad \tilde{u} \in \mathcal{U}$$
$$\lambda(t_f) = \frac{\partial \ell_F}{\partial x} \bigg|_{t_f}$$

Note that this is a two-point boundary value problem: $x$ is pinned at the initial time $t_0$ while $\lambda$ is pinned at the final time $t_f$.

Some Notes:

  • Historically, many algorithms were based on integrating the continuous ODEs forward (for the state) and backward (for the co-state) to do gradient descent on $u(t)$.
  • These methods are called indirect methods or shooting methods.
  • In continuous time, $\lambda(t)$ is called the co-state trajectory.
  • These methods have largely fallen out of favor as computers have improved.

References

[1] Lecture 6 Deterministic Optimal Control (study notes)
[2] [Optimal Control (CMU 16-745)] Lecture 6 Deterministic Optimal Control Introduction
[3] CMU Optimal Control 16-745 video from Bilibili
[4] CMU Optimal Control 16-745

Thank you for reading these study notes! They record what I have learned. If you spot any mistakes or inaccuracies, please feel free to contact me so I can correct them, and we can all improve together!
