Two central problems for dynamical systems:
$$
\left\{\begin{aligned}
&\text{Learning}\\
&\text{Inference}:p(Z|X)\ \left\{\begin{aligned}
&\text{decoding}\to \text{HMM}\quad{\scriptstyle(\text{Viterbi})}\\
&\text{prob of evidence}\to p(O|\lambda)\quad{\scriptstyle(\text{forward/backward})}\\
&\text{filtering}:\ p(z_t|x_1,x_2,\cdots,x_t)\\
&\text{smoothing}:\ p(z_t|x_1,x_2,\cdots,x_T)\quad{\scriptstyle(\text{given all the data})}\\
&\text{prediction}:\left\{\begin{aligned}&p(z_t|x_1,x_2,\cdots,x_{t-1})\\&p(x_t|x_1,x_2,\cdots,x_{t-1})\end{aligned}\right.
\end{aligned}\right.
\end{aligned}\right.
$$
The HMM applies when the hidden variables take discrete values. For continuous hidden variables, the linear dynamical system (LDS) is commonly used to describe the linear-Gaussian case, while particle filters handle the non-linear, non-Gaussian case.
The LDS is also known as the Kalman filter. Its linearity appears in two places: between the hidden variables at consecutive time steps, and between the hidden variable and the observation:
$$
\left.\begin{aligned}
z_t&=A\cdot z_{t-1}+B+\varepsilon\\
x_t&=C\cdot z_t+D+\delta
\end{aligned}\right\}\ {\color{blue}\text{linear + noise}}
\qquad
\left.\begin{aligned}
\varepsilon&\sim\mathcal{N}(0,Q)\\
\delta&\sim\mathcal{N}(0,R)
\end{aligned}\right\}\ {\color{blue}\text{Gaussian noise}}
$$
From the properties of linear-Gaussian models:
$$
\begin{aligned}
p(z_t|z_{t-1})&=\mathcal{N}(A\cdot z_{t-1}+B,\ Q)\\
p(x_t|z_t)&=\mathcal{N}(C\cdot z_t+D,\ R)\\
z_1&\sim\mathcal{N}(\mu_1,\Sigma_1)
\end{aligned}
$$
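To make the generative model concrete, here is a minimal simulation sketch in Python/NumPy. The dimensionality and the specific values of $A,B,C,D,Q,R$ below are hypothetical, chosen only for illustration (a constant-velocity state observed through position only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D state / 1-D observation LDS parameters
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # state transition
B = np.zeros(2)                         # state offset
C = np.array([[1.0, 0.0]])              # emission: observe position only
D = np.zeros(1)                         # emission offset
Q = 0.01 * np.eye(2)                    # transition noise covariance
R = np.array([[0.5]])                   # emission noise covariance
mu1, Sigma1 = np.zeros(2), np.eye(2)    # prior on z_1

T = 50
z = rng.multivariate_normal(mu1, Sigma1)    # z_1 ~ N(mu1, Sigma1)
xs = []
for t in range(T):
    # x_t = C z_t + D + delta,  delta ~ N(0, R)
    x = C @ z + D + rng.multivariate_normal(np.zeros(1), R)
    xs.append(x)
    # z_{t+1} = A z_t + B + eps,  eps ~ N(0, Q)
    z = A @ z + B + rng.multivariate_normal(np.zeros(2), Q)
xs = np.array(xs)    # observations x_{1:T}, shape (T, 1)
```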
The LDS is chiefly concerned with the filtering problem: the marginal posterior $p(z_t|x_1,x_2,\cdots,x_t)$. As with the forward algorithm in the HMM, we need a recursion. Abbreviating $x_1,x_2,\cdots,x_t$ as $x_{1:t}$, we have:
$$
\begin{aligned}
{\color{blue}p(z_t|x_{1:t})}&=\frac{p(x_{1:t},z_t)}{p(x_{1:t})}\\
&\propto{\color{blue}p(x_{1:t},z_t)}=p(x_t|x_{1:t-1},z_t)\,p(x_{1:t-1},z_t)\\
&=p(x_t|z_t)\,p(x_{1:t-1},z_t)\\
&=p(x_t|z_t)\underbrace{p(z_t|x_{1:t-1})}_{\color{blue}\text{prediction}}\underbrace{p(x_{1:t-1})}_{\color{blue}\text{constant}}\\
&\propto p(x_t|z_t)\,p(z_t|x_{1:t-1})
\end{aligned}
$$
Here $p(x_t|x_{1:t-1},z_t)=p(x_t|z_t)$ because, given $z_t$, the observation $x_t$ is conditionally independent of all earlier observations. For the prediction term above:
$$
\begin{aligned}
p(z_t|x_{1:t-1})&=\int_{z_{t-1}}p(z_t,z_{t-1}|x_{1:t-1})\,dz_{t-1}\\
&=\int_{z_{t-1}}p(z_t|z_{t-1},x_{1:t-1})\,p(z_{t-1}|x_{1:t-1})\,dz_{t-1}\\
&=\int_{z_{t-1}}p(z_t|z_{t-1})\,{\color{blue}p(z_{t-1}|x_{1:t-1})}\,dz_{1-1:}
\end{aligned}
$$
This gives a recursion for ${\color{blue}p(z_t|x_{1:t})}$, which splits into two parts: a prediction step that propagates the previous posterior through the transition model, and an update step that incorporates the new observation $x_t$.
This is really an online process:
$$
\begin{aligned}
&t=1\ \left\{\begin{aligned}&p(z_1|x_1)\longrightarrow \text{update}\\&p(z_2|x_1)\longrightarrow \text{prediction}\end{aligned}\right.\\[6pt]
&t=2\ \left\{\begin{aligned}&p(z_2|x_1,x_2)\longrightarrow \text{update}\\&p(z_3|x_1,x_2)\longrightarrow \text{prediction}\end{aligned}\right.\\
&\quad\vdots\\
&t=T\ \left\{\begin{aligned}&p(z_T|x_1,\cdots,x_T)\longrightarrow \text{update}\\&p(z_{T+1}|x_1,\cdots,x_T)\longrightarrow \text{prediction}\end{aligned}\right.
\end{aligned}
$$
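This alternation maps directly onto code. Below is a minimal filtering-loop sketch in NumPy, reusing the hypothetical parameters `A, B, C, D, Q, R, mu1, Sigma1` and observations `xs` from the simulation sketch above; the update step uses the standard Kalman-gain form of the Gaussian posterior, which is exactly the closed form worked out at the end of this section.

```python
import numpy as np

def kalman_filter(xs, A, B, C, D, Q, R, mu1, Sigma1):
    """Return the means/covariances of the filtered posteriors p(z_t | x_{1:t})."""
    mus, Sigmas = [], []
    # The prior on z_1 plays the role of the first "prediction"
    mu_pred, Sigma_pred = mu1, Sigma1
    for x in xs:
        # Update: condition the predicted Gaussian on the new observation x_t
        S = C @ Sigma_pred @ C.T + R               # innovation covariance
        K = Sigma_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
        mu = mu_pred + K @ (x - (C @ mu_pred + D))
        Sigma = Sigma_pred - K @ C @ Sigma_pred
        mus.append(mu)
        Sigmas.append(Sigma)
        # Prediction: p(z_{t+1} | x_{1:t}) = N(A mu + B, A Sigma A^T + Q)
        mu_pred = A @ mu + B
        Sigma_pred = A @ Sigma @ A.T + Q
    return mus, Sigmas

mus, Sigmas = kalman_filter(xs, A, B, C, D, Q, R, mu1, Sigma1)
```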
Under the linear-Gaussian assumptions, the recursion has a closed-form solution:
$$
\begin{aligned}
p(z_t|x_{1:t-1})&=\int_{z_{t-1}}p(z_t|z_{t-1})\,p(z_{t-1}|x_{1:t-1})\,dz_{t-1}\\
&=\int_{z_{t-1}}\mathcal{N}(Az_{t-1}+B,\ Q)\,\mathcal{N}(\mu_{t-1},\Sigma_{t-1})\,dz_{t-1}
\end{aligned}
$$
The second Gaussian is the result of the previous step's update, so by the marginalization property of linear-Gaussian models:
$$
\boxed{\begin{aligned}
p(x)&=\mathcal N(x\mid\mu,\ \Lambda^{-1})\\
p(y|x)&=\mathcal N(y\mid Ax+b,\ L^{-1})\\
p(y)&=\mathcal N(y\mid A\mu+b,\ L^{-1}+A\Lambda^{-1}A^T)
\end{aligned}}
$$
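As a quick numerical sanity check of this boxed property, the following sketch (with arbitrary test parameters) samples $x\sim p(x)$ and $y\sim p(y|x)$ and compares the empirical mean and covariance of $y$ against the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary test parameters for the linear-Gaussian property
mu = np.array([1.0, -1.0])
Lam_inv = np.array([[1.0, 0.3],
                    [0.3, 0.5]])       # Cov(x) = Lambda^{-1}
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
b = np.array([0.5, 0.0])
L_inv = 0.2 * np.eye(2)                # Cov(y|x) = L^{-1}

n = 200_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
y = x @ A.T + b + rng.multivariate_normal(np.zeros(2), L_inv, size=n)

print(y.mean(axis=0))                  # ~ A mu + b = [2.5, 0.0]
print(np.cov(y.T))                     # ~ L^{-1} + A Lambda^{-1} A^T
print(L_inv + A @ Lam_inv @ A.T)
```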
The integral can then be written out directly: $p(z_t|x_{1:t-1})=\mathcal{N}(A\mu_{t-1}+B,\ Q+A\Sigma_{t-1}A^T)$.
For the update $p(z_t|x_{1:t})\propto p(x_t|z_t)\,p(z_t|x_{1:t-1})$, the linear-Gaussian properties (conditioning rather than marginalization this time) likewise let us write out this Gaussian directly.
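Carrying out that computation (a standard result, stated here for completeness rather than derived above) gives the familiar Kalman update. Writing the prediction as $p(z_t|x_{1:t-1})=\mathcal N(\bar\mu_t,\ \bar\Sigma_t)$:

$$
\begin{aligned}
K_t&=\bar\Sigma_t C^T\left(C\bar\Sigma_t C^T+R\right)^{-1}\\
\mu_t&=\bar\mu_t+K_t\left(x_t-C\bar\mu_t-D\right)\\
\Sigma_t&=(I-K_t C)\,\bar\Sigma_t
\end{aligned}
$$

so that $p(z_t|x_{1:t})=\mathcal N(\mu_t,\Sigma_t)$; these are exactly the formulas used in the filtering-loop sketch above.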