Optimal Control 2: Solving Optimal Control Problems with the Calculus of Variations

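  • Introduction
  • 1. Optimal control with a fixed terminal time
    • 1.1 Necessary conditions: fixed terminal time, fixed terminal state
    • 1.2 Necessary conditions: fixed terminal time, free terminal state
    • 1.3 Necessary conditions: fixed terminal time, constrained terminal state
  • 2. Optimal control with a free terminal time
    • 2.1 Free terminal time, fixed terminal state ($\delta x_1$ is gone, but the state is merely fixed, not constrained)
    • 2.2 Free terminal time, free terminal state ($\delta x_1$ is back, but $\psi$ is still absent)
    • 2.3 Free terminal time, constrained terminal state (this time $\psi$ is back)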

Introduction

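The previous post gave a rough account of the necessary conditions for a functional to attain an extremum in various cases. That "functional" was rather abstract, though. This post makes the problem concrete: how do we use the calculus of variations to solve an optimal control problem? That is, minimize
$$J=\varphi\left[x(t_1),t_1\right]+\int_{t_0}^{t_1}{L(x,u,t)}\,dt$$
where $\varphi(\cdot)$ is the terminal-state penalty, $u$ is the control input, and the system must satisfy the differential-equation constraint $\dot{x}=f(x,u,t)$ at every instant.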

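As in the previous post, the cases are treated one by one. Note that in every optimal control problem considered here, the initial time and the initial state are known. This is both reasonable and necessary (if you don't know when or where you start, there isn't much of a game to play...).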

1. Optimal control with a fixed terminal time

This class of problems is stated mathematically as:
$$\begin{aligned} u^* &= \arg\min_{u}\left\{\varphi\left[x(t_1)\right]+\int_{t_0}^{t_1}{L\left(x,u,t\right)}dt\right\}\\ & s.t.\quad\dot{x}=f(x,u,t),\quad x(t_0)=x_0,\quad\psi\left[x(t_1)\right]=0 \end{aligned} \tag{1}$$
This is an extremum problem for a functional subject to equality constraints. It can be solved by introducing the Lagrange multipliers $\gamma$ and $\lambda(t)$, constructing an augmented functional $J_a$, and defining a Hamiltonian function.

The augmented functional is constructed as follows, writing $x_1=x(t_1)$:
$$J_a = \varphi\left(x_1\right)+\gamma\psi(x_1)+\int_{t_0}^{t_1}{\left\{L\left(x,u,t\right)+\lambda\left[f(x,u,t)-\dot{x}\right]\right\}}dt \tag{2}$$
Define the Hamiltonian function as:
$$H(x,u,\lambda,t)=L(x,u,t)+\lambda f(x,u,t) \tag{3}$$
Substituting the Hamiltonian into (2) and integrating by parts once gives:
$$\begin{aligned} J_a &= \varphi\left(x_1\right)+\gamma\psi(x_1)+\int_{t_0}^{t_1}{\left[H(x,u,\lambda,t)-\lambda\dot{x}\right]}dt\\ &=\varphi\left(x_1\right)+\gamma\psi(x_1)-\left.\lambda x\right|_{t_0}^{t_1}+\int_{t_0}^{t_1}{\left[H(x,u,\lambda,t)+\dot{\lambda}x\right]}dt \end{aligned} \tag{4}$$
We now need the variation of $J_a$. Note that $J_a$ is varied only with respect to $x$ and $u$ (through $\delta x$ and $\delta u$), not with respect to $\lambda$ or $\gamma$. I'll derive this once here; the remaining cases are similar.
$$\begin{aligned} J_a(x+\delta x,u+\delta u) &= \varphi(x_1+\delta x_1)+\gamma\psi(x_1+\delta x_1)-\left.\lambda (x+\delta x)\right|_{t_0}^{t_1}\\ &+\int_{t_0}^{t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)+\dot{\lambda}(x+\delta x)\right]}dt \end{aligned} \tag{5}$$
That is too long to handle in one go, so denote the first line of (5) by $J_{a_1}(x+\delta x,u+\delta u)=\tilde{J_{a_1}}$ and the second line by $J_{a_2}(x+\delta x,u+\delta u)=\tilde{J_{a_2}}$. Since the initial state is fixed, $\delta x(t_0)=0$, so to first order
$$\begin{aligned} \tilde{J_{a_1}} &= \varphi(x_1+\delta x_1)+\gamma\psi(x_1+\delta x_1)-\left.\lambda x\right|_{t_0}^{t_1}-\lambda(t_1)\delta x_1\\ &=\varphi(x_1)+\left.\frac{\partial\varphi}{\partial x}\right|_{x=x_1}\delta x_1+\gamma\psi(x_1)+\left.\gamma\frac{\partial\psi}{\partial x}\right|_{x=x_1}\delta x_1-\left.\lambda x\right|_{t_0}^{t_1}-\lambda(t_1)\delta x_1 \end{aligned} \tag{6}$$
$$\begin{aligned} \tilde{J_{a_2}} &= \int_{t_0}^{t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)+\dot{\lambda}(x+\delta x)\right]}dt\\ &=\int_{t_0}^{t_1}{\left[H(x,u,\lambda,t)+\dot{\lambda}x\right]}dt+\int_{t_0}^{t_1}{\left[\frac{\partial H}{\partial x}\delta x+\frac{\partial H}{\partial u}\delta u +\dot{\lambda}\delta x\right]}dt \end{aligned} \tag{7}$$
The variation of $J_a$ is therefore:
$$\begin{aligned} \delta J_a&=\tilde{J_{a_1}}+\tilde{J_{a_2}}-J_a\\ &=\left[\frac{\partial\varphi}{\partial x} + \gamma\frac{\partial\psi}{\partial x} - \lambda\right]_{t=t_1,\,x=x_1}\delta x_1\\ &+\int_{t_0}^{t_1}{\left\{\left[\frac{\partial H}{\partial x}+\dot{\lambda}\right]\delta x+\frac{\partial H}{\partial u}\delta u\right\}}dt \end{aligned} \tag{8}$$
Naturally, for (8) to vanish for arbitrary $\delta x$, $\delta u$ and $\delta x_1$, the following necessary conditions must hold:
$$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}+\dot{\lambda}^*(t)=0 \tag{9}$$
$$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{10}$$
$$\left.\left[\frac{\partial\varphi\left[x^*(t)\right]}{\partial x} + \gamma^*\frac{\partial\psi\left[x^*(t)\right]}{\partial x} - \lambda^*(t)\right]\right|_{t=t_1}=0 \tag{11}$$
In addition, from the definition of the Hamiltonian in (3), the state equation can be written as
$$\dot{x}^*=f\left(x^*,u^*,t\right)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \tag{12}$$
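
To see (8) at work numerically, here is a minimal Python sketch. It assumes a scalar example that is not part of the derivation above (dynamics $f=-x+u$, running cost $L=\frac12(x^2+u^2)$, terminal penalty $\varphi=\frac12x_1^2$, no terminal constraint so $\gamma=0$) and a forward-Euler discretization. If $\lambda$ is propagated backward from the terminal condition (11) using the costate equation (9), then (8) says the gradient of the cost with respect to each control sample is $\partial H/\partial u$ times the step size. The code compares that with a finite-difference gradient. All function names are made up for this illustration.

```python
# Assumed toy problem (not from the post): f = -x + u, L = (x^2 + u^2)/2, phi = x1^2/2.
def f(x, u): return -x + u
def f_x(x, u): return -1.0
def f_u(x, u): return 1.0
def L(x, u): return 0.5 * (x * x + u * u)
def L_x(x, u): return x
def L_u(x, u): return u
def phi(x): return 0.5 * x * x
def phi_x(x): return x

def rollout(x0, us, h):
    """Forward-Euler integration of x' = f(x, u)."""
    xs = [x0]
    for u in us:
        xs.append(xs[-1] + h * f(xs[-1], u))
    return xs

def cost(x0, us, h):
    """Discretized J = phi(x(t1)) + integral of L."""
    xs = rollout(x0, us, h)
    return phi(xs[-1]) + h * sum(L(x, u) for x, u in zip(xs, us))

def adjoint_gradient(x0, us, h):
    """Gradient of the discretized cost via the costate recursion."""
    xs = rollout(x0, us, h)
    lam = phi_x(xs[-1])                      # terminal condition, eq. (11) with gamma = 0
    grad = [0.0] * len(us)
    for k in reversed(range(len(us))):
        x, u = xs[k], us[k]
        grad[k] = h * (L_u(x, u) + lam * f_u(x, u))    # h * dH/du
        lam = lam + h * (L_x(x, u) + lam * f_x(x, u))  # backward step of lam' = -dH/dx
    return grad

def finite_difference_gradient(x0, us, h, eps=1e-6):
    """Central differences on the discretized cost, one control sample at a time."""
    grad = []
    for k in range(len(us)):
        up = list(us); up[k] += eps
        dn = list(us); dn[k] -= eps
        grad.append((cost(x0, up, h) - cost(x0, dn, h)) / (2 * eps))
    return grad

N = 50
h = 1.0 / N
us = [0.3 * (k % 5) - 0.5 for k in range(N)]   # an arbitrary, non-optimal control
g_adj = adjoint_gradient(1.0, us, h)
g_fd = finite_difference_gradient(1.0, us, h)
max_gap = max(abs(a - b) for a, b in zip(g_adj, g_fd))
print(max_gap)
```

The two gradients should agree up to round-off, because the backward recursion is the exact adjoint of the Euler scheme. At an optimum the control equation (10) makes this gradient vanish.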

1.1 Necessary conditions: fixed terminal time, fixed terminal state

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{13}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{14}$$
  3. Boundary conditions (the $\psi$ function is gone, and so is $\delta x_1$):
    $$x^*(t_0)=x_0,\quad x^*(t_1)=x_1 \tag{15}$$
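
As a quick sanity check of these conditions, consider a toy problem chosen here purely for illustration (it does not come from the derivation above): minimize $J=\frac12\int_0^1u^2\,dt$ subject to $\dot{x}=u$, $x(0)=0$, $x(1)=1$. Then

$$\begin{aligned} &H=\tfrac12u^2+\lambda u,\\ &\dot{\lambda}^*=-\frac{\partial H}{\partial x}=0\ \Rightarrow\ \lambda^*(t)=c,\\ &\frac{\partial H}{\partial u}=u^*+\lambda^*=0\ \Rightarrow\ u^*(t)=-c,\\ &x^*(t)=-ct,\quad x^*(1)=1\ \Rightarrow\ c=-1. \end{aligned}$$

So $u^*(t)=1$, $x^*(t)=t$ and $J^*=\frac12$. No transversality condition is needed in this case: the two boundary conditions alone fix the integration constants.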

1.2 Necessary conditions: fixed terminal time, free terminal state

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{16}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{17}$$
  3. Transversality condition (the terminal state is free, so the $\psi$ term is gone):
    $$\begin{aligned} &\frac{\partial\varphi\left[x^*(t_1)\right]}{\partial x} = \lambda^*(t_1)\\ &x^*(t_0)=x_0 \end{aligned} \tag{18}$$
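
These conditions form a two-point boundary value problem: $x$ is known at $t_0$, $\lambda$ is known only at $t_1$. A common way to solve it is single shooting, sketched below in plain Python under assumptions of my own choosing: $\dot{x}=u$, $L=\frac12u^2$, $\varphi=\frac12s\,x(t_1)^2$, so that $H=\frac12u^2+\lambda u$ and $u=-\lambda$. The helper names (`rk4_step`, `transversality_residual`, `secant`) are hypothetical. The code guesses $\lambda(t_0)$, integrates the canonical equations forward, and adjusts the guess until the transversality residual $\lambda(t_1)-\partial\varphi/\partial x$ vanishes.

```python
def rk4_step(deriv, y, t, h):
    """One classical Runge-Kutta step for y' = deriv(t, y), y given as a list."""
    k1 = deriv(t, y)
    k2 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = deriv(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def canonical(t, y):
    """Canonical equations for the assumed problem H = u^2/2 + lam*u."""
    x, lam = y
    u = -lam              # control equation: dH/du = u + lam = 0
    return [u, 0.0]       # x' = dH/dlam = u,  lam' = -dH/dx = 0

def transversality_residual(lam0, x0=1.0, t0=0.0, t1=1.0, s=1.0, steps=100):
    """Integrate from t0 to t1 and return lam(t1) - dphi/dx evaluated at x(t1)."""
    h = (t1 - t0) / steps
    y, t = [x0, lam0], t0
    for _ in range(steps):
        y = rk4_step(canonical, y, t, h)
        t += h
    x1, lam1 = y
    return lam1 - s * x1

def secant(g, a, b, tol=1e-12, max_iter=50):
    """Secant iteration for a scalar root of g."""
    ga, gb = g(a), g(b)
    for _ in range(max_iter):
        if abs(gb) < tol or gb == ga:
            break
        a, b, ga = b, b - gb * (b - a) / (gb - ga), gb
        gb = g(b)
    return b

lam0 = secant(transversality_residual, 0.0, 1.0)
print(lam0)
```

For this particular problem the answer can also be worked out by hand: $\lambda$ is constant, $x(t_1)=x_0-\lambda(t_1-t_0)$, and the transversality condition gives $\lambda=s\,x_0/(1+s(t_1-t_0))$, which is $0.5$ for the default values. Because the canonical equations are linear here, the residual is linear in the guess and the secant iteration lands on the root after one update; a nonlinear problem would generally need more iterations and a reasonable starting guess.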

1.3 Necessary conditions: fixed terminal time, constrained terminal state

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{19}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{20}$$
  3. Transversality condition (the terminal state is constrained by $\psi$, so $\psi$ is back):
    $$\begin{aligned} &\frac{\partial\varphi\left[x^*(t_1)\right]}{\partial x}+\gamma^*\frac{\partial \psi\left[x^*(t_1)\right]}{\partial x} = \lambda^*(t_1)\\ &x^*(t_0)=x_0,\quad \psi\left[x^*(t_1)\right]=0 \end{aligned} \tag{21}$$
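
One way to read this transversality condition, sketched for the scalar case under the assumption that the constraint is simply $\psi\left[x(t_1)\right]=x(t_1)-c$ for some given constant $c$ (a hypothetical choice for illustration):

$$\frac{\partial\psi}{\partial x}=1\quad\Rightarrow\quad\gamma^*=\lambda^*(t_1)-\frac{\partial\varphi\left[x^*(t_1)\right]}{\partial x}$$

Here the condition places no restriction on $\lambda^*(t_1)$; it only determines $\gamma^*$ after the fact. The remaining boundary conditions are $x^*(t_0)=x_0$ and $x^*(t_1)=c$, which is exactly the fixed-terminal-state case of 1.1. The multiplier $\gamma$ does more work when the state is a vector and $\psi$ leaves some directions of $x(t_1)$ free.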

2. Optimal control with a free terminal time

Fine, I'll write the derivation out once more, so that I can come back and look it up if I ever forget. Compared with the fixed-terminal-time case, a free terminal time adds one more variation, the terminal-time variation $\delta t_1$:
$$\begin{aligned} J_a(x+\delta x,u+\delta u,t_1+\delta t_1)&=\varphi(x_1+\delta x_1,t_1+\delta t_1)+\gamma\psi(x_1+\delta x_1,t_1+\delta t_1)\\ &+\int_{t_0}^{t_1+\delta t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)-\lambda(\dot{x}+\delta \dot{x})\right]}dt \end{aligned} \tag{22}$$
As before, denote the first line of (22) by $\tilde{J_{a_1}}$ and the second line by $\tilde{J_{a_2}}$, and write $\delta\tilde{J_{a_1}}$, $\delta\tilde{J_{a_2}}$ for their variations. To first order,
$$\begin{aligned} \delta\tilde{J_{a_1}} &= \varphi(x_1+\delta x_1,t_1+\delta t_1)+\gamma\psi(x_1+\delta x_1,t_1+\delta t_1)-\varphi(x_1,t_1)-\gamma\psi(x_1,t_1) \\ &= \left.\frac{\partial\varphi}{\partial x}\right|_{x=x_1}\delta x_1+\frac{\partial\varphi}{\partial t_1}\delta t_1+\gamma\left.\frac{\partial\psi}{\partial x}\right|_{x=x_1}\delta x_1+\gamma\frac{\partial\psi}{\partial t_1}\delta t_1 \end{aligned} \tag{23}$$
Similarly,
$$\begin{aligned} \delta\tilde{J_{a_2}} &= \int_{t_0}^{t_1+\delta t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)-\lambda(\dot{x}+\delta \dot{x})\right]}dt-\int_{t_0}^{t_1}{\left[H(x,u,\lambda,t)-\lambda\dot{x}\right]}dt\\ &= \int_{t_0}^{t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)-\lambda(\dot{x}+\delta \dot{x})-H(x,u,\lambda,t)+\lambda\dot{x}\right]}dt\\ &+\int_{t_1}^{t_1+\delta t_1}{\left[H(x+\delta x,u+\delta u,\lambda,t)-\lambda(\dot{x}+\delta \dot{x})\right]}dt\\ &=\int_{t_0}^{t_1}{\left[\frac{\partial H}{\partial x}\delta x+\frac{\partial H}{\partial u}\delta u-\lambda\delta \dot{x}\right]}dt+\left[H(x_1+\theta\delta x_1,u+\theta\delta u,\lambda,t_1)-\lambda\dot{x}_1\right]\delta t_1\\ &=-\lambda(t_1)\delta x(t_1)+\int_{t_0}^{t_1}{\left[\left(\frac{\partial H}{\partial x}+\dot{\lambda}\right)\delta x+\frac{\partial H}{\partial u}\delta u\right]}dt+\left[H(x_1,u,\lambda,t_1)-\lambda\dot{x}_1\right]\delta t_1 \end{aligned} \tag{24}$$
The bracketed $\delta t_1$ term comes from applying the mean value theorem to the integral over $[t_1,t_1+\delta t_1]$; the last step integrates $-\lambda\delta\dot{x}$ by parts using $\delta x(t_0)=0$ and keeps first-order terms only.
Here we need to recall the approximation that goes with the second figure of the previous post:
$$\delta x(t_1)=\delta x_1-\dot{x}(t_1)\cdot\delta t_1$$
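
Since the figure is not reproduced here, a small numerical check may help. The sketch below uses a made-up nominal trajectory $x(t)=t^2$ and a made-up perturbation $\epsilon\sin t$ (both are assumptions for illustration only). It distinguishes $\delta x(t_1)$, the change in the state at the old terminal time, from $\delta x_1$, the change in the terminal state itself once the terminal time has also moved by $\delta t_1$, and confirms that the relation holds up to second-order terms.

```python
import math

def x_nominal(t):
    return t * t                      # assumed nominal trajectory

def x_dot_nominal(t):
    return 2.0 * t

def x_perturbed(t, eps):
    return x_nominal(t) + eps * math.sin(t)   # assumed perturbed trajectory

t1, dt1, eps = 1.0, 1e-4, 1e-4

dx_at_t1 = x_perturbed(t1, eps) - x_nominal(t1)        # delta x(t1): same time, new curve
dx1 = x_perturbed(t1 + dt1, eps) - x_nominal(t1)       # delta x1: new terminal point
residual = dx_at_t1 - (dx1 - x_dot_nominal(t1) * dt1)  # should be second-order small

print(dx_at_t1, dx1, residual)
```

Both variations are of order $10^{-4}$ here, while the residual is of order $10^{-8}$, i.e. of the size of products such as $\delta t_1^2$ and $\epsilon\,\delta t_1$.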
Substituting this approximation for $\delta x(t_1)$ into (24) gives
$$\begin{aligned} \delta J_a &= \delta\tilde{J_{a_1}}+\delta\tilde{J_{a_2}}\\ &= \left.\frac{\partial\varphi}{\partial x}\right|_{x=x_1}\delta x_1+\frac{\partial\varphi}{\partial t_1}\delta t_1+\gamma\left.\frac{\partial\psi}{\partial x}\right|_{x=x_1}\delta x_1+\gamma\frac{\partial\psi}{\partial t_1}\delta t_1\\ &-\lambda(t_1)\delta x_1+\lambda(t_1)\dot{x}_1\delta t_1+\int_{t_0}^{t_1}{\left[\left(\frac{\partial H}{\partial x}+\dot{\lambda}\right)\delta x+\frac{\partial H}{\partial u}\delta u\right]}dt+\left[H(x_1,u,\lambda,t_1)-\lambda\dot{x}_1\right]\delta t_1\\ &=\left[\frac{\partial\varphi}{\partial x_1}+\gamma\frac{\partial\psi}{\partial x_1}-\lambda(t_1)\right]\delta x_1+\left[\frac{\partial\varphi}{\partial t_1}+\gamma\frac{\partial\psi}{\partial t_1}+H(t_1)\right]\delta t_1\\ &+\int_{t_0}^{t_1}{\left[\left(\frac{\partial H}{\partial x}+\dot{\lambda}\right)\delta x+\frac{\partial H}{\partial u}\delta u\right]}dt \end{aligned} \tag{25}$$
As before, optimality requires every part of this variation (the $\delta x_1$ term, the $\delta t_1$ term and the integral) to vanish identically. The cases are discussed one by one below.

2.1 Free terminal time, fixed terminal state ($\delta x_1$ is gone, but the state is merely fixed, not constrained)

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{26}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{27}$$
  3. Boundary conditions:
    $$x^*(t_0)=x_0,\quad x^*(t_1)=x_1 \tag{28}$$
  4. Terminal condition on the Hamiltonian (the $\psi$ function is gone):
    $$H(t_1)=-\frac{\partial\varphi}{\partial t_1} \tag{29}$$
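
The terminal condition on the Hamiltonian can be checked against a brute-force search over $t_1$. The sketch below assumes a toy problem of my own choosing, not taken from the derivation: minimize $\int_0^{t_1}(1+\frac12u^2)\,dt$ with $\dot{x}=u$, $x(0)=0$, $x(t_1)=1$ and $\varphi=0$. For any fixed $t_1$, the fixed-time conditions give a constant control $u=1/t_1$ with $\lambda=-u$, so the cost is a function of $t_1$ alone. The code minimizes that function with a golden-section search (a hypothetical helper written for this example) and then evaluates $H(t_1)$, which should be close to $-\partial\varphi/\partial t_1=0$.

```python
import math

def cost_for_terminal_time(t1):
    """Cost of the best fixed-time control u = 1/t1 for the assumed toy problem."""
    u = 1.0 / t1
    return t1 * (1.0 + 0.5 * u * u)

def terminal_hamiltonian(t1):
    """H = L + lam*f at t1, with L = 1 + u^2/2, f = u and lam = -u."""
    u = 1.0 / t1
    lam = -u
    return 1.0 + 0.5 * u * u + lam * u

def golden_section_min(func, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

t1_opt = golden_section_min(cost_for_terminal_time, 0.1, 5.0)
print(t1_opt, terminal_hamiltonian(t1_opt))
```

By hand, $H(t_1)=1-\frac{1}{2t_1^2}=0$ gives $t_1^*=1/\sqrt{2}$, which is where the search ends up.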

2.2 Free terminal time, free terminal state ($\delta x_1$ is back, but $\psi$ is still absent)

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{30}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{31}$$
  3. Transversality condition:
    $$x^*(t_0)=x_0,\quad \lambda^*(t_1)=\frac{\partial \varphi}{\partial x^*(t_1)} \tag{32}$$
  4. Terminal condition on the Hamiltonian (the $\psi$ function is gone):
    $$H(t_1)=-\frac{\partial\varphi}{\partial t_1} \tag{33}$$
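
A small worked example may make the interplay of the last two conditions clearer. The problem is chosen here purely for illustration: $\dot{x}=u$, $x(0)=0$, $L=\frac12u^2$ and $\varphi=\frac12\left(x(t_1)-1\right)^2+\frac18t_1$, with both $x(t_1)$ and $t_1$ free.

$$\begin{aligned} &H=\tfrac12u^2+\lambda u,\quad \dot{\lambda}^*=0\ \Rightarrow\ \lambda^*=c,\quad u^*=-c,\quad x^*(t)=-ct,\\ &\lambda^*(t_1)=\frac{\partial\varphi}{\partial x^*(t_1)}=x^*(t_1)-1\ \Rightarrow\ c=-\frac{1}{1+t_1},\\ &H(t_1)=-\frac{\partial\varphi}{\partial t_1}\ \Rightarrow\ -\tfrac12c^2=-\tfrac18\ \Rightarrow\ c^2=\tfrac14. \end{aligned}$$

Since $c<0$ for $t_1>0$, this gives $c=-\frac12$, hence $t_1^*=1$, $u^*=\frac12$, $x^*(t_1)=\frac12$ and $J^*=\frac18+\frac18+\frac18=\frac38$. The transversality condition ties $\lambda$ to the terminal state, and the Hamiltonian condition then pins down the terminal time.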

2.3 Free terminal time, constrained terminal state (this time $\psi$ is back)

  1. Canonical equations:
    $$\begin{aligned} & \dot{\lambda}^*(t)=-\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial x}\\ & \dot{x}^*(t)=\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial \lambda} \end{aligned} \tag{34}$$
  2. Control equation:
    $$\frac{\partial H\left(x^*,u^*,\lambda^*,t\right)}{\partial u}=0 \tag{35}$$
  3. Transversality condition:
    $$x^*(t_0)=x_0,\quad \psi\left[x^*(t_1),t_1\right]=0,\quad \lambda^*(t_1)=\frac{\partial\varphi}{\partial x^*(t_1)}+\gamma^*\frac{\partial\psi}{\partial x^*(t_1)} \tag{36}$$
  4. Terminal condition on the Hamiltonian (the $\psi$ function is back):
    $$H(t_1)=-\frac{\partial\varphi}{\partial t_1}-\gamma^*\frac{\partial \psi}{\partial t_1} \tag{37}$$
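
As an illustration with a constraint that depends on $t_1$, consider hitting a moving target. The problem is an assumption made for this example only: $\dot{x}=u$, $x(0)=0$, $L=\frac12u^2$, $\varphi=0$ and $\psi\left[x(t_1),t_1\right]=x(t_1)-(1+t_1)=0$.

$$\begin{aligned} &H=\tfrac12u^2+\lambda u,\quad \dot{\lambda}^*=0\ \Rightarrow\ \lambda^*=c,\quad u^*=-c,\quad x^*(t)=-ct,\\ &\lambda^*(t_1)=\gamma^*\frac{\partial\psi}{\partial x^*(t_1)}=\gamma^*\ \Rightarrow\ \gamma^*=c,\\ &H(t_1)=-\gamma^*\frac{\partial\psi}{\partial t_1}=\gamma^*\ \Rightarrow\ -\tfrac12c^2=c\ \Rightarrow\ c=-2,\\ &\psi=0\ \Rightarrow\ 2t_1=1+t_1\ \Rightarrow\ t_1^*=1. \end{aligned}$$

The root $c=0$ is discarded because $u=0$ can never reach the target. The result is $u^*=2$, $x^*(t_1)=2$, $\gamma^*=-2$ and $J^*=2$, which agrees with minimizing $J(t_1)=\frac{(1+t_1)^2}{2t_1}$ directly.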

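This post has now covered the necessary conditions for optimal control in the following six cases:

  1. Fixed terminal time, fixed terminal state
  2. Fixed terminal time, free terminal state
  3. Fixed terminal time, constrained terminal state
  4. Free terminal time, fixed terminal state
  5. Free terminal time, free terminal state
  6. Free terminal time, constrained terminal state

Together with the previous post, this essentially wraps up the derivation of necessary conditions for optimal control via the calculus of variations.
If any of the derivations are wrong, corrections are welcome~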
