Abstract: This article mainly follows *Fundamentals of Statistical Signal Processing*; page numbers are noted after some of the properties. Starting from properties of the joint Gaussian distribution, it derives the scalar and vector forms of the Kalman filter, with emphasis on the formulas for the nonzero-mean case. Some of the properties below are given names for convenience; these are not necessarily standard academic terminology.
Discrete-time system
$$
\begin{aligned}
x(k) &= Ax(k-1)+Bu(k-1)+v(k-1) \\
y(k) &= Cx(k)+w(k)
\end{aligned}
$$

where $v$ and $w$ are zero-mean Gaussian white noise with covariance matrices $V$ and $W$, respectively.
The Kalman filter recursion is:
$$
\begin{aligned}
\hat{x}(k|k-1) &= A\hat{x}(k-1)+Bu(k-1) \\
\hat{x}(k) &= \hat{x}(k|k-1)+K(k)[y(k)-C\hat{x}(k|k-1)] \\
P(k|k-1) &= AP(k-1)A^{\top}+V \\
P(k) &= [I-K(k)C]\,P(k|k-1) \\
K(k) &= P(k|k-1)C^{\top}[CP(k|k-1)C^{\top}+W]^{-1}
\end{aligned}
$$
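As a concrete reference, here is a minimal NumPy sketch of one predict/update cycle of the recursion above; the matrices $A,B,C,V,W$, the input $u$, and the initial estimate are assumed given by the application, not fixed by this article.

```python
import numpy as np

def kalman_step(x_hat, P, u, y, A, B, C, V, W):
    """One predict/update cycle of the recursion above."""
    # Predict: x_hat(k|k-1), P(k|k-1)
    x_pred = A @ x_hat + B @ u
    P_pred = A @ P @ A.T + V
    # Gain: K(k) = P(k|k-1) C^T [C P(k|k-1) C^T + W]^{-1}
    S = C @ P_pred @ C.T + W
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Update: x_hat(k), P(k)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_pred)) - K @ C) @ P_pred
    return x_new, P_new
```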
Bayesian minimum mean square error estimator (Bmse)
Having observed $x$, the minimum mean square error estimator $\hat{\theta}$ of $\theta$ is ([p255])
$$
\hat{\theta}=\text{E}(\theta|x)
$$
The main idea of the proof is to choose $\hat{\theta}$ so that the mean square error is minimized. Proof:
$$
\begin{aligned}
\text{Bmse}(\hat{\theta}) &= \text{E}[(\theta-\hat{\theta})^2]
= \iint(\theta-\hat{\theta})^2 p(x,\theta)\,\text{d}x\,\text{d}\theta
= \int\left[\int(\theta-\hat{\theta})^2 p(\theta|x)\,\text{d}\theta\right] p(x)\,\text{d}x \\
J &= \int(\theta-\hat{\theta})^2 p(\theta|x)\,\text{d}\theta \\
\frac{\partial J}{\partial\hat{\theta}} &= -2\int\theta p(\theta|x)\,\text{d}\theta
+2\hat{\theta}\int p(\theta|x)\,\text{d}\theta = 0 \\
\hat{\theta} &= \frac{2\displaystyle\int\theta p(\theta|x)\,\text{d}\theta}
{2\displaystyle\int p(\theta|x)\,\text{d}\theta}=\text{E}(\theta|x)
\end{aligned}
$$
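A quick Monte Carlo sanity check of this result, under an assumed toy model $\theta\sim N(0,1)$, $x=\theta+n$ with independent $n\sim N(0,1)$, for which $\text{E}(\theta|x)=x/2$: the conditional-mean estimator should beat, for example, $\hat{\theta}=x$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(1_000_000)          # theta ~ N(0, 1)
x = theta + rng.standard_normal(1_000_000)      # x = theta + n, n ~ N(0, 1)

mse_cond = np.mean((theta - x / 2) ** 2)        # theta_hat = E(theta|x) = x/2
mse_raw = np.mean((theta - x) ** 2)             # theta_hat = x, for comparison
print(mse_cond, mse_raw)                        # ~0.5 vs ~1.0
```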
Zero-mean application theorem (scalar form)
Let $x$ and $y$ be jointly Gaussian random variables. Then ([p264])
$$
\begin{aligned}
\text{E}(y|x) &= \text{E}y+\frac{\text{Cov}(x,y)}{\text{D}x}(x-\text{E}x) \\
\text{D}(y|x) &= \text{D}y-\frac{\text{Cov}^2(x,y)}{\text{D}x}
\end{aligned}
$$
The two equations can equivalently be written as
$$
\begin{aligned}
\frac{\hat{y}-\text{E}y}{\sqrt{\text{D}y}} &= \rho\,\frac{x-\text{E}x}{\sqrt{\text{D}x}} \\
\text{D}(y|x) &= \text{D}y\,(1-\rho^2)
\end{aligned}
$$
Proof:
$$
\begin{aligned}
\hat{y} &= ax+b \\
J &= \text{E}(y-\hat{y})^2 = \text{E}[y^2-2y(ax+b)+(ax+b)^2] \\
&= \text{E}y^2-2\text{E}xy\cdot a-2b\text{E}y+\text{E}x^2\cdot a^2+2\text{E}x\cdot ab+b^2 \\
\text{d}J &= (-2\text{E}xy+2a\text{E}x^2+2b\text{E}x)\,\text{d}a
+(-2\text{E}y+2a\text{E}x+2b)\,\text{d}b \\
\frac{\partial J}{\partial b} &= -2\text{E}y+2a\text{E}x+2b = 0
\quad\Rightarrow\quad b = \text{E}y-a\text{E}x \\
\frac{\partial J}{\partial a} &= -2\text{E}xy+2a\text{E}x^2+2b\text{E}x = 0 \\
a\text{E}x^2 &= \text{E}xy-b\text{E}x = \text{E}xy-\text{E}x\text{E}y+a(\text{E}x)^2
\quad\Rightarrow\quad a = \frac{\text{Cov}(x,y)}{\text{D}x} \\
\hat{y} &= ax+b = ax+\text{E}y-a\text{E}x
= \text{E}y+\frac{\text{Cov}(x,y)}{\text{D}x}(x-\text{E}x)
\end{aligned}
$$
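Both formulas of the theorem can be checked numerically by conditioning samples on a narrow bin around a chosen $x_0$; all distribution parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2_000_000
Ex, Ey, sx, sy, rho = 1.0, 2.0, 1.0, 0.5, 0.8   # illustrative parameters
x = Ex + sx * rng.standard_normal(N)
y = Ey + sy * (rho * (x - Ex) / sx
               + np.sqrt(1 - rho**2) * rng.standard_normal(N))

x0 = 1.5
sel = np.abs(x - x0) < 0.01                     # approximate conditioning on x = x0
print(y[sel].mean(), Ey + rho * sy / sx * (x0 - Ex))   # E(y|x0), both ways
print(y[sel].var(), sy**2 * (1 - rho**2))              # D(y|x), both ways
```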
This theorem is valid only for random variables whose relationship is linear, the Gaussian case included (exactly where the linearity requirement enters is not yet clear to me). For example, consider the two joint uniform distributions
$$
\begin{aligned}
& f(x,y)=2,\quad \{0<x<1,\ 0<y<x\} \\
& f(x,y)=3,\quad \{0<x<1,\ x^2<y<\sqrt{x}\}
\end{aligned}
$$
The theorem holds for the first but fails for the second because of the nonlinearity; that is, the linear Bayesian estimator of $y$ given the data $x$ is not the optimal estimator. The linear Bayesian estimator and the optimal Bayesian estimator are, respectively,
$$
\begin{aligned}
\hat{y} &= \text{E}y+\frac{\text{Cov}(x,y)}{\text{D}x}(x-\text{E}x)
= \frac{133x+9}{153} \\
\hat{y} &= \text{E}(y|x)=\int_{-\infty}^{\infty} y f(y|x)\,\text{d}y
= \frac{x-x^4}{2(\sqrt{x}-x^2)}
\end{aligned}
$$
The detailed derivation of the optimal Bayesian estimator (the conditional expectation) follows:
$$
\begin{aligned}
f(x) &= \int_{a(x)}^{b(x)} f(x,y)\,\text{d}y
= \int_{x^2}^{\sqrt{x}} 3\,\text{d}y = 3(\sqrt{x}-x^2),\quad \{x\in(0,1)\} \\
f(y|x) &= \frac{f(x,y)}{f(x)}
= \frac{3}{3(\sqrt{x}-x^2)}
= \frac{1}{\sqrt{x}-x^2},\quad \{y\in(x^2,\sqrt{x})\} \\
\text{E}(y|x) &= \int_{a(x)}^{b(x)} y f(y|x)\,\text{d}y
= \int_{x^2}^{\sqrt{x}} \frac{y}{\sqrt{x}-x^2}\,\text{d}y
= \left[\frac{y^2}{2(\sqrt{x}-x^2)}\right]_{x^2}^{\sqrt{x}}
= \frac{x-x^4}{2(\sqrt{x}-x^2)}
\end{aligned}
$$
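The second example can be verified by rejection sampling: the least-squares line should recover the slope $133/153$ and intercept $9/153$, while the binned conditional mean should follow $(x-x^4)/(2(\sqrt{x}-x^2))$. This is a sketch, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=(5_000_000, 2))
x, y = u[:, 0], u[:, 1]
keep = (y > x**2) & (y < np.sqrt(x))      # keep points where f(x,y) = 3
x, y = x[keep], y[keep]

a, b = np.polyfit(x, y, 1)                # empirical linear Bayesian estimator
print(a, 133 / 153, b, 9 / 153)

x0 = 0.5                                  # binned conditional mean at x0
sel = np.abs(x - x0) < 0.005
print(y[sel].mean(), (x0 - x0**4) / (2 * (np.sqrt(x0) - x0**2)))
```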
Zero-mean application theorem (vector form)
Let $\boldsymbol{x}$ and $\boldsymbol{y}$ be jointly Gaussian random vectors, with $\boldsymbol{x}$ of size $m\times1$ and $\boldsymbol{y}$ of size $n\times1$, and block covariance matrix
$$
\mathbf{C}=\begin{bmatrix}
\mathbf{C}_{xx} & \mathbf{C}_{xy} \\
\mathbf{C}_{yx} & \mathbf{C}_{yy}
\end{bmatrix}
$$
Then
$$
\text{E}(\boldsymbol{y}|\boldsymbol{x})=\text{E}(\boldsymbol{y})
+\mathbf{C}_{yx}\mathbf{C}_{xx}^{-1}(\boldsymbol{x}-\text{E}(\boldsymbol{x}))
$$
where $\mathbf{C}_{xy}$ denotes $\text{Cov}(\boldsymbol{x},\boldsymbol{y})$. Proof:
(the omitted steps are expanded below)
$$
\begin{aligned}
\hat{y} &= Ax+B \\
J &= \text{E}(y-\hat{y})^\top(y-\hat{y})
= \text{E}(y^\top y-2y^\top\hat{y}+\hat{y}^\top\hat{y}) \\
K &= y^\top y-2y^\top\hat{y}+\hat{y}^\top\hat{y} \\
\text{d}K &= \text{d}\left(-2y^\top(Ax+B)+(Ax+B)^\top(Ax+B)\right) \\
&= -2y^\top(\text{d}A\,x+\text{d}B)
+\text{d}\left(x^\top A^\top Ax+2B^\top Ax+B^\top B\right) \\
&= -2xy^\top\text{d}A-2y^\top\text{d}B+2xx^\top A^\top\text{d}A
+2x^\top A^\top\text{d}B+2xB^\top\text{d}A+2B^\top\text{d}B \\
&= (-2xy^\top+2xx^\top A^\top+2xB^\top)\,\text{d}A
+(-2y^\top+2x^\top A^\top+2B^\top)\,\text{d}B \\
\frac{\partial J}{\partial B} &= -2y+2Ax+2B = 0
\quad\Rightarrow\quad B = \text{E}y-A\text{E}x \\
\frac{\partial J}{\partial A} &= -2yx^\top+2Axx^\top+2Bx^\top = 0 \\
\text{E}(Axx^\top) &= \text{E}(yx^\top-Bx^\top)
= \text{E}\left(yx^\top-(\text{E}y-A\text{E}x)x^\top\right)
= \text{E}yx^\top-\text{E}y\text{E}x^\top+A\text{E}x\text{E}x^\top \\
A &= (\text{E}yx^\top-\text{E}y\text{E}x^\top)
(\text{E}(xx^\top)-\text{E}x\text{E}x^\top)^{-1}
= C_{yx}C_{xx}^{-1} \\
\hat{y} &= C_{yx}C_{xx}^{-1}x+\text{E}y-C_{yx}C_{xx}^{-1}\text{E}x
= \text{E}y+C_{yx}C_{xx}^{-1}(x-\text{E}x)
\end{aligned}
$$
For the matrix calculus, see 矩阵求导术(上). The derivation above omits the trace symbol $\text{tr}$, so take care when reading it. Also, since the three operators $\text{d}$, $\text{E}$, and $\text{tr}$ are all linear, their order of application can be exchanged, and $\text{E}$ is likewise omitted in the derivation.
The matrix differential and trace identities used above are derived in detail as follows:
$$
\begin{aligned}
\text{d}(B^\top B) &= \text{tr}(\text{d}B^\top B+B^\top\text{d}B)
= \text{tr}(\text{d}B^\top B)+\text{tr}(B^\top\text{d}B)
= \text{tr}(B^\top\text{d}B)+\text{tr}(B^\top\text{d}B)
= \text{tr}(2B^\top\text{d}B) \\
\text{d}(x^\top A^\top Ax) &= \text{tr}(x^\top\text{d}(A^\top A)x)
= \text{tr}(x^\top 2A^\top\text{d}A\,x)
= \text{tr}(2xx^\top A^\top\text{d}A) \\
\text{d}(B^\top Ax) &= \text{tr}(\text{d}B^\top Ax+B^\top\text{d}A\,x)
= \text{tr}(x^\top A^\top\text{d}B+xB^\top\text{d}A)
\end{aligned}
$$
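As a numeric sanity check of the vector-form theorem, a least-squares regression of $y$ on $x$ should recover $A=C_{yx}C_{xx}^{-1}$; the random mixing matrix used below to manufacture a jointly Gaussian pair is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, N = 3, 2, 1_000_000
L = rng.standard_normal((m + n, m + n))          # random mixing matrix
z = rng.standard_normal((N, m + n)) @ L.T + 5.0  # nonzero means on purpose
x, y = z[:, :m], z[:, m:]

xm, ym = x.mean(0), y.mean(0)
Cxx = (x - xm).T @ (x - xm) / N                  # sample C_xx
Cyx = (y - ym).T @ (x - xm) / N                  # sample C_yx

coef, *_ = np.linalg.lstsq(np.c_[x, np.ones(N)], y, rcond=None)
print(coef[:m].T)                                # regression matrix of y on x
print(Cyx @ np.linalg.inv(Cxx))                  # theorem: C_yx C_xx^{-1}
```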
Projection theorem (orthogonality principle)
When a random variable is estimated by a linear combination of data samples, the estimate is optimal when the error between the estimate and the true value is orthogonal to every data sample; that is, the data samples $x$ and the optimal estimate $\hat{y}$ satisfy
$$
\text{E}[(y-\hat{y})x^\top(n)]=0,\quad n=0,1,\cdots,N-1
$$
Zero-mean random variables satisfy the axioms of an inner product space: define the length of a variable as $||x||=\sqrt{\text{E}x^2}$, the inner product $(x,y)$ of variables $x$ and $y$ as $\text{E}(xy)$, and the angle between two variables via the correlation coefficient $\rho$. When $\text{E}(xy)=0$, the variables $x$ and $y$ are said to be orthogonal.
When the means are nonzero, define the length as $||x||=\sqrt{\text{D}x}$, the inner product $(x,y)$ as $\text{Cov}(x,y)$, and the angle again via the correlation coefficient $\rho$. The nonzero-mean case is my own conjecture and few references spell it out, yet the Kalman filter derivation deals with nonzero means throughout.
Mapping $x$ and $y$ onto the data, with $x(0)$ in the role of $x$, $x(1)$ in the role of $y$, and $\hat{x}(1|0)$ in the role of $\hat{y}$, gives
$$
\text{E}[(x(1)-\hat{x}(1|0))^\top x(0)]=0
$$
where
$$
\widetilde{x}(k|k-1) = x(k)-\hat{x}(k|k-1)
$$
is called the innovation; it is orthogonal to the old data $x(0)$.
Scalar-form proof:
$$
\begin{aligned}
\text{E}[x(\hat{y}-y)] &= \text{E}\left[x\text{E}y
+\frac{\text{Cov}(x,y)}{\text{D}x}(x^2-x\text{E}x)-xy\right] \\
&= \text{E}x\text{E}y
+\frac{\text{Cov}(x,y)}{\text{D}x}(\text{E}x^2-(\text{E}x)^2)-\text{E}xy \\
&= \text{Cov}(x,y)+\text{E}x\text{E}y-\text{E}xy=0
\end{aligned}
$$
Vector-form proof:
$$
\begin{aligned}
\text{E}[(\hat{y}-y)x^\top]
&= \text{E}[\text{E}y\,x^\top+C_{yx}C_{xx}^{-1}(x-\text{E}x)x^\top-yx^\top] \\
&= \text{E}y\text{E}x^\top+\text{E}[C_{yx}C_{xx}^{-1}xx^\top]
-C_{yx}C_{xx}^{-1}\text{E}x\text{E}x^\top-\text{E}yx^\top \\
&= -C_{yx}+C_{yx}C_{xx}^{-1}(\text{E}xx^\top-\text{E}x\text{E}x^\top) \\
&= -C_{yx}+C_{yx}C_{xx}^{-1}C_{xx} = 0
\end{aligned}
$$
Since $\text{E}(y-\hat{y})=0$, the orthogonality condition also holds exactly in the nonzero-mean case. Viewing the estimate as a projection, the projection theorem gives another formula,
$$
\text{E}[(y-\hat{y})\hat{y}]=0
$$
Proof (writing $k=\text{Cov}(x,y)/\text{D}x$ for the slope of the linear estimator):
$$
\begin{aligned}
\text{E}[\hat{y}(\hat{y}-y)]
&= \text{E}[(\text{E}y+kx-k\text{E}x)(\text{E}y+kx-k\text{E}x-y)] \\
&= (\text{E}y)^{2}+k\text{E}x\text{E}y-k\text{E}x\text{E}y-(\text{E}y)^{2}
+k\text{E}x\text{E}y+k^{2}\text{E}x^2-k^{2}(\text{E}x)^{2}-k\text{E}xy \\
&\quad -k\text{E}x\text{E}y-k^{2}(\text{E}x)^{2}+k^{2}(\text{E}x)^{2}+k\text{E}x\text{E}y \\
&= k^{2}\text{D}x-k\,\text{Cov}(x,y)
= \frac{[\text{Cov}(x,y)]^2}{[\text{D}x]^2}\text{D}x
-\frac{\text{Cov}(x,y)}{\text{D}x}\text{Cov}(x,y) = 0
\end{aligned}
$$
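Both orthogonality relations can be confirmed numerically for a nonzero-mean pair; the distribution parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = 3.0 + rng.standard_normal(1_000_000)              # nonzero-mean data
y = 1.0 + 0.7 * x + 0.5 * rng.standard_normal(1_000_000)

k = np.mean((x - x.mean()) * (y - y.mean())) / x.var()  # Cov(x,y)/Dx
y_hat = y.mean() + k * (x - x.mean())                   # linear estimator
print(np.mean((y_hat - y) * x))      # E[(y_hat - y) x]    ~ 0
print(np.mean((y - y_hat) * y_hat))  # E[(y - y_hat) y_hat] ~ 0
```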
Additivity of conditional expectation
$$
\text{E}[y_1+y_2|x]=\text{E}[y_1|x]+\text{E}[y_2|x]
$$
Conditional additivity under independence
If $x_1$ and $x_2$ are independent, then
$$
\text{E}[y|x_1,x_2]=\text{E}[y|x_1]+\text{E}[y|x_2]-\text{E}y
$$
Proof:
Let $x=[x_1^\top,x_2^\top]^\top$; independence gives $C_{x_1x_2}=O$, so
$$
\begin{aligned}
C_{xx}^{-1} &= \begin{bmatrix}
C_{x_1x_1} & C_{x_1x_2} \\
C_{x_2x_1} & C_{x_2x_2}
\end{bmatrix}^{-1}
= \begin{bmatrix}
C_{x_1x_1}^{-1} & O \\
O & C_{x_2x_2}^{-1}
\end{bmatrix} \\
C_{yx} &= \begin{bmatrix} C_{yx_1} & C_{yx_2} \end{bmatrix} \\
\text{E}(y|x) &= \text{E}y+C_{yx}C_{xx}^{-1}(x-\text{E}x) \\
&= \text{E}y+\begin{bmatrix} C_{yx_1} & C_{yx_2} \end{bmatrix}
\begin{bmatrix}
C_{x_1x_1}^{-1} & O \\
O & C_{x_2x_2}^{-1}
\end{bmatrix}
\begin{bmatrix}
x_1-\text{E}x_1 \\
x_2-\text{E}x_2
\end{bmatrix} \\
&= \text{E}[y|x_1]+\text{E}[y|x_2]-\text{E}y
\end{aligned}
$$
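A numeric check of this additivity, under an assumed toy model with independent $x_1,x_2$: the sum of the two single-variable estimators minus $\text{E}y$ should match the direct two-variable formula.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000
x1 = 1.0 + rng.standard_normal(N)               # x1, x2 independent
x2 = 2.0 + rng.standard_normal(N)
y = 3.0 + x1 + 2.0 * x2 + rng.standard_normal(N)

def lin_est(y, x, x0):
    """Scalar linear estimator E[y|x=x0] from sample moments."""
    k = np.mean((x - x.mean()) * (y - y.mean())) / x.var()
    return y.mean() + k * (x0 - x.mean())

x10, x20 = 1.5, 2.5
lhs = lin_est(y, x1, x10) + lin_est(y, x2, x20) - y.mean()

X = np.c_[x1, x2]                               # direct two-variable formula
Cxx = np.cov(X.T)
Cyx = np.array([np.cov(y, x1)[0, 1], np.cov(y, x2)[0, 1]])
rhs = y.mean() + Cyx @ np.linalg.solve(Cxx, np.array([x10, x20]) - X.mean(0))
print(lhs, rhs)                                 # should agree closely
```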
Conditional additivity without independence (innovation theorem)
If $x_1$ and $x_2$ are not independent, use the projection theorem to take the component $\widetilde{x}_2$ of $x_2$ that is independent of $x_1$; it satisfies
$$
\text{E}[y|x_1,x_2]
=\text{E}[y|x_1,\widetilde{x}_2]
=\text{E}[y|x_1]+\text{E}[y|\widetilde{x}_2]-\text{E}y
$$
where $\widetilde{x}_2=x_2-\hat{x}_2=x_2-\text{E}(x_2|x_1)$; by the projection theorem, $x_1$ and $\widetilde{x}_2$ are independent, and $\widetilde{x}_2$ is called the innovation.
Proof:
(each expression below computes the pieces first and then assembles the whole; it may be easier to read from the bottom up)
(in the scalar case, $C_{x_1x_2}=C_{x_2x_1}$)
$$
\begin{aligned}
\text{Cov}(y,\hat{x}_2)
&= \text{E}\left[y\left(\text{E}x_2
+\frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}(x_1-\text{E}x_1)\right)\right]
-\text{E}y\,\text{E}\hat{x}_2 \\
&= \text{E}y\text{E}x_2+\frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}
(\text{E}x_1y-\text{E}x_1\text{E}y)-\text{E}y\text{E}x_2
= \frac{C_{x_1x_2}C_{yx_1}}{\text{D}x_1} \\
\text{Cov}(x_2,\hat{x}_2)
&= \text{E}x_2\hat{x}_2-\text{E}x_2\text{E}\hat{x}_2
= \text{E}\left[x_2\left(\text{E}x_2
+\frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}(x_1-\text{E}x_1)\right)\right]-(\text{E}x_2)^2 \\
&= \frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}(\text{E}x_2x_1-\text{E}x_2\text{E}x_1)
= \frac{C_{x_1x_2}^2}{\text{D}x_1} \\
\text{D}\hat{x}_2
&= \text{D}\left(\text{E}x_2
+\frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}(x_1-\text{E}x_1)\right)
= \frac{C_{x_1x_2}^2}{(\text{D}x_1)^2}\text{D}x_1
= \frac{C_{x_1x_2}^2}{\text{D}x_1} \\
\text{D}\widetilde{x}_2
&= \text{D}x_2+\text{D}\hat{x}_2-2\,\text{Cov}(x_2,\hat{x}_2)
= \text{D}x_2-\frac{C_{x_1x_2}^2}{\text{D}x_1} \\
\text{E}[y|x_1]+\text{E}[y|\widetilde{x}_2]-\text{E}y
&= \text{E}y+\frac{C_{x_1y}}{\text{D}x_1}(x_1-\text{E}x_1)
+\frac{\text{Cov}(y,\widetilde{x}_2)}{\text{D}\widetilde{x}_2}
(\widetilde{x}_2-\text{E}\widetilde{x}_2) \\
&= \text{OMIT}+\frac{\text{Cov}(y,x_2)-\text{Cov}(y,\hat{x}_2)}
{\text{D}\widetilde{x}_2}(x_2-\hat{x}_2) \\
&= \text{OMIT}+\frac{C_{yx_2}-\dfrac{C_{x_1x_2}C_{yx_1}}{\text{D}x_1}}
{\text{D}x_2-\dfrac{C_{x_1x_2}^2}{\text{D}x_1}}
\left(x_2-\text{E}x_2-\frac{\text{Cov}(x_2,x_1)}{\text{D}x_1}(x_1-\text{E}x_1)\right) \\
&= \text{OMIT}+\frac{C_{yx_2}\text{D}x_1-C_{x_1x_2}C_{yx_1}}
{\text{D}x_1\text{D}x_2-C_{x_1x_2}^2}
\left(x_2-\text{E}x_2-\frac{C_{x_1x_2}}{\text{D}x_1}(x_1-\text{E}x_1)\right) \\
&= \text{E}y+A(x_1-\text{E}x_1)+B(x_2-\text{E}x_2)
\end{aligned}
$$
where $\text{OMIT}$ stands for the expression
$$
\text{E}y+\frac{C_{x_1y}}{\text{D}x_1}(x_1-\text{E}x_1)
$$
and
$$
\begin{aligned}
B &= \frac{C_{yx_2}\text{D}x_1-C_{x_1x_2}C_{yx_1}}
{\text{D}x_1\text{D}x_2-C_{x_1x_2}^2} \\
A &= \frac{C_{x_1y}}{\text{D}x_1}-B\frac{C_{x_1x_2}}{\text{D}x_1}
= \frac{\dfrac{C_{x_1y}}{\text{D}x_1}(\text{D}x_1\text{D}x_2-C_{x_1x_2}^2)
-\dfrac{C_{x_1x_2}}{\text{D}x_1}(C_{yx_2}\text{D}x_1-C_{x_1x_2}C_{yx_1})}
{\text{D}x_1\text{D}x_2-C_{x_1x_2}^2}
= \frac{C_{x_1y}\text{D}x_2-C_{x_1x_2}C_{yx_2}}
{\text{D}x_1\text{D}x_2-C_{x_1x_2}^2}
\end{aligned}
$$
These equal the corresponding $A$ and $B$ in the other expression
$$
\begin{aligned}
\text{E}(y|x_1,x_2)
&= \text{E}y+\begin{bmatrix} C_{yx_1} & C_{yx_2} \end{bmatrix}
\begin{bmatrix}
\text{D}x_1 & C_{x_1x_2} \\
C_{x_2x_1} & \text{D}x_2
\end{bmatrix}^{-1}
\begin{bmatrix}
x_1-\text{E}x_1 \\
x_2-\text{E}x_2
\end{bmatrix} \\
&= \text{E}y+\frac{\begin{bmatrix} C_{yx_1} & C_{yx_2} \end{bmatrix}}
{\text{D}x_1\text{D}x_2-C^2_{x_1x_2}}
\begin{bmatrix}
\text{D}x_2 & -C_{x_1x_2} \\
-C_{x_1x_2} & \text{D}x_1
\end{bmatrix}
\begin{bmatrix}
x_1-\text{E}x_1 \\
x_2-\text{E}x_2
\end{bmatrix}
\end{aligned}
$$
as matching the coefficients of $(x_1-\text{E}x_1)$ and $(x_2-\text{E}x_2)$ shows.
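The coefficient matching can also be verified symbolically; a short sympy sketch (assuming sympy is available):

```python
import sympy as sp

D1, D2, C12, Cy1, Cy2 = sp.symbols('D1 D2 C12 Cy1 Cy2')

# Innovation route
B = (Cy2 * D1 - C12 * Cy1) / (D1 * D2 - C12**2)
A = Cy1 / D1 - B * C12 / D1

# Direct route: [Cy1 Cy2] times the inverse of the 2x2 covariance matrix
direct = sp.Matrix([[Cy1, Cy2]]) * sp.Matrix([[D1, C12], [C12, D2]]).inv()
print(sp.simplify(A - direct[0]))    # 0
print(sp.simplify(B - direct[1]))    # 0
```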
This example serves as groundwork for understanding where the individual Kalman filter formulas come from, for instance why there has to be an $\hat{x}(k|k-1)$ between $x(k)$ and $x(k-1)$. Consider the model
$$
x(k)=A+w(k)
$$
where $A$ is the parameter to be estimated, $w(k)$ is Gaussian white noise with zero mean and variance $\sigma^2$, and $x(k)$ is the observation. Note that $A$ itself is a random variable with a prior: before measuring, we guess, say, that $A$ should be around 10 and very likely within 7 to 13, so we assume $A\sim N(10,1)$ and then start measuring.
Initially $\hat{x}(0)=x(0)$. Then, to estimate the $k=1$ value $\text{E}[x(1)|x(1),x(0)]$ from $x(0)$ and $x(1)$, we need the conditional additivity of the joint Gaussian; but since $x(1)$ and $x(0)$ are not independent, the projection theorem is used to construct the two independent variables $x(0)$ and $\widetilde{x}(1|0)$, from which $\hat{x}(1)$ is computed:
$$
\begin{aligned}
\hat{x}(1) &= \text{E}[x(1)|x(1),x(0)]
= \text{E}[x(1)|x(0),\widetilde{x}(1|0)] \\
&= \text{E}[x(1)|x(0)]+\text{E}[x(1)|\widetilde{x}(1|0)]-\text{E}x(1)
\end{aligned}
$$
where $\text{E}[x(1)|x(0)]=\hat{x}(1|0)$,
$$
\begin{aligned}
\hat{x}(1|0) &= \text{E}x(1)+\frac{\text{Cov}(x(1),x(0))}{\text{D}x(0)}(x(0)-\text{E}x(0)) \\
&= A+\frac{\text{E}(A+w(1))(A+w(0))-\text{E}(A+w(1))\text{E}(A+w(0))}
{\text{E}(A+w(0))^2-[\text{E}(A+w(0))]^2}(x(0)-A) \\
&= A
\end{aligned}
$$
Here $\hat{x}(k|k-1)$ has appeared, along with the other unknown term $\text{E}[x(1)|\widetilde{x}(1|0)]$. By the zero-mean application theorem,
$$
\text{E}[x(1)|\widetilde{x}(1|0)]
= \text{E}x(1)+\frac{\text{Cov}(x(1),\widetilde{x}(1|0))}
{\text{D}\widetilde{x}(1|0)}(\widetilde{x}(1|0)-\text{E}\widetilde{x}(1|0))
$$
where
$$
\begin{aligned}
\widetilde{x}(1|0) &= x(1)-\hat{x}(1|0) \\
\text{E}\widetilde{x}(1|0) &= \text{E}(x-\hat{x}) = 0 \\
\text{D}\widetilde{x}(1|0) &= \text{E}(x(1)-\hat{x}(1|0))^2
= \text{E}(A+w(1))^2
= a^2P(0)+\sigma^2
= P(1|0)
\end{aligned}
$$
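To see the recursion run on this toy model, here is a sketch that repeatedly applies the innovation update $m \leftarrow m + K(x(k)-m)$ with gain $K=P/(P+\sigma^2)$; this is the general recursion with state transition 1, $C=1$, and $V=0$. The values of $A$ and $\sigma$ are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
A_true, sigma = 11.2, 2.0            # assumed for illustration
m, P = 10.0, 1.0                     # prior: A ~ N(10, 1)

for k in range(20):
    xk = A_true + sigma * rng.standard_normal()  # observation x(k) = A + w(k)
    K = P / (P + sigma**2)                       # gain
    m = m + K * (xk - m)                         # update with innovation x(k) - m
    P = (1 - K) * P                              # shrink the posterior variance
    print(k, round(m, 3), round(P, 4))
```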
Scalar form
$$
\begin{aligned}
\hat{x}(k|k-1) &= \text{E}[x(k)|Y(k-1)]
= \text{E}[Ax(k-1)+Bu(k-1)+v(k-1)|Y(k-1)] \\
&= A\text{E}[x(k-1)|Y(k-1)]+B\text{E}[u(k-1)|Y(k-1)]+\text{E}[v(k-1)|Y(k-1)] \\
&= A\hat{x}(k-1)+Bu(k-1) \\
\widetilde{x}(k|k-1) &= x(k)-\hat{x}(k|k-1)
= [Ax(k-1)+Bu(k-1)+v(k-1)]-[A\hat{x}(k-1)+Bu(k-1)] \\
&= A\widetilde{x}(k-1)+v(k-1) \\
\hat{y}(k|k-1) &= \text{E}[y(k)|Y(k-1)]
= \text{E}[Cx(k)+w(k)|Y(k-1)]
= C\hat{x}(k|k-1)
\end{aligned}
$$

(Here $Y(k-1)$ denotes the observations up to time $k-1$; the input $u(k-1)$ is known, and $v(k-1)$ is zero-mean and independent of $Y(k-1)$, so its conditional expectation vanishes.)
By conditional additivity,
$$
\begin{aligned}
\hat{x}(k) &= \text{E}[x(k)|Y(k)]
= \text{E}[x(k)|Y(k-1),y(k)]
= \text{E}[x(k)|Y(k-1),\widetilde{y}(k|k-1)] \\
&= \text{E}[x(k)|Y(k-1)]+\text{E}[x(k)|\widetilde{y}(k|k-1)]-\text{E}x(k) \\
&= \hat{x}(k|k-1)+\text{E}[x(k)|\widetilde{y}(k|k-1)]-\text{E}x(k)
\end{aligned}
$$