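The HMM applies when the latent variable is discrete. When the latent variable is continuous, the Kalman filter (Kalman Filtering) is commonly used for state variables that follow a linear Gaussian model, and the particle filter (Particle Filter) for state variables that are nonlinear and non-Gaussian.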
Linearity refers to two relations: between the latent variable at the previous time step and at the current one, and between the latent variable and the observed variable. They can be written as:
$$
\begin{aligned}
z_{t} &= A\cdot z_{t-1}+B+\varepsilon \\
x_{t} &= C\cdot z_{t}+D+\delta \\
\varepsilon &\sim N(0,Q)\\
\delta &\sim N(0,R)
\end{aligned}
$$
By analogy with the parameters of an HMM, we can write down counterparts of the transition probability, the emission probability, and the initial probability:
$$
\begin{aligned}
P(z_{t}|z_{t-1}) &= N(z_{t}|A\cdot z_{t-1}+B,Q)\\
P(x_{t}|z_{t}) &= N(x_{t}|C\cdot z_{t}+D,R)\\
z_{1} &\sim N(\mu _{1},\Sigma _{1})
\end{aligned}
$$
The full set of parameters is:
$$
\theta =(A,B,C,D,Q,R,\mu _{1},\Sigma _{1})
$$
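To make the generative model concrete, here is a minimal sketch that samples a trajectory from it with NumPy. The function name `simulate_lgssm` and the parameter values (a position/velocity state observed through its first component) are illustrative assumptions, not something taken from the text.

```python
import numpy as np

def simulate_lgssm(A, B, C, D, Q, R, mu1, Sigma1, T, seed=0):
    """Sample z_1..z_T and x_1..x_T from the linear Gaussian model."""
    rng = np.random.default_rng(seed)
    dz, dx = A.shape[0], C.shape[0]
    z = np.zeros((T, dz))
    x = np.zeros((T, dx))
    for t in range(T):
        if t == 0:
            # z_1 ~ N(mu_1, Sigma_1)
            z[t] = rng.multivariate_normal(mu1, Sigma1)
        else:
            # z_t = A z_{t-1} + B + eps,  eps ~ N(0, Q)
            z[t] = A @ z[t - 1] + B + rng.multivariate_normal(np.zeros(dz), Q)
        # x_t = C z_t + D + delta,  delta ~ N(0, R)
        x[t] = C @ z[t] + D + rng.multivariate_normal(np.zeros(dx), R)
    return z, x

# Illustrative (assumed) parameters: 2-D latent state, 1-D observation.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.zeros(2)
C = np.array([[1.0, 0.0]])
D = np.zeros(1)
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
mu1 = np.zeros(2)
Sigma1 = np.eye(2)

z, x = simulate_lgssm(A, B, C, D, Q, R, mu1, Sigma1, T=50)
```

Only `x` would be available to a filter; `z` is kept here so that estimates can later be compared against the true latent states.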
Among the various inference problems, Kalman filtering is mainly concerned with the Filtering problem, that is, computing the marginal posterior:
$$
P(z_{t}|x_{1},x_{2},\cdots ,x_{t})
$$
As with the forward algorithm for HMMs, we look for a recursion:
$$
\begin{aligned}
P(z_{t}|x_{1},x_{2},\cdots ,x_{t}) &= \frac{P(x_{1},x_{2},\cdots ,x_{t},z_{t})}{P(x_{1},x_{2},\cdots ,x_{t})}\\
&\propto P(x_{1},x_{2},\cdots ,x_{t},z_{t})\\
&= \underset{P(x_{t}|z_{t})}{\underbrace{P(x_{t}|x_{1},x_{2},\cdots ,x_{t-1},z_{t})}}\cdot P(x_{1},x_{2},\cdots ,x_{t-1},z_{t})\\
&= P(x_{t}|z_{t})\cdot P(x_{1},x_{2},\cdots ,x_{t-1},z_{t})\\
&= P(x_{t}|z_{t})\cdot \underset{\text{Prediction problem}}{\underbrace{P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})}}\cdot P(x_{1},x_{2},\cdots ,x_{t-1})\\
&\propto P(x_{t}|z_{t})\cdot P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})
\end{aligned}
$$
In this result, $P(x_{t}|z_{t})$ is known (it is the emission distribution), and the other factor can be rewritten as follows:
$$
\begin{aligned}
P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1}) &= \int _{z_{t-1}}P(z_{t},z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})\mathrm{d}z_{t-1}\\
&= \int _{z_{t-1}}\underset{P(z_{t}|z_{t-1})}{\underbrace{P(z_{t}|z_{t-1},x_{1},x_{2},\cdots ,x_{t-1})}}\cdot \underset{\text{Filtering problem}}{\underbrace{P(z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})}}\mathrm{d}z_{t-1}\\
&= \int _{z_{t-1}}P(z_{t}|z_{t-1})\cdot P(z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})\mathrm{d}z_{t-1}
\end{aligned}
$$
We have therefore found the recursion for the Filtering problem (the leading $C$ here is a normalizing constant that does not depend on $z_{t}$, not the emission matrix $C$):
$$
{\color{Red}{P(z_{t}|x_{1},x_{2},\cdots ,x_{t})}}=C\cdot P(x_{t}|z_{t})\cdot \int _{z_{t-1}}P(z_{t}|z_{t-1})\cdot {\color{Red}{P(z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})}}\mathrm{d}z_{t-1}
$$
The Filtering problem is therefore solved by the following steps:
$$
\begin{aligned}
t=1 &\left\{\begin{matrix} P(z_{1}|x_{1})\rightarrow \text{update}\\ P(z_{2}|x_{1})\rightarrow \text{prediction} \end{matrix}\right.\\
t=2 &\left\{\begin{matrix} P(z_{2}|x_{1},x_{2})\rightarrow \text{update}\\ P(z_{3}|x_{1},x_{2})\rightarrow \text{prediction} \end{matrix}\right.\\
&\;\;\vdots \\
t &\left\{\begin{matrix} P(z_{t}|x_{1},x_{2},\cdots ,x_{t})\rightarrow \text{update}\\ P(z_{t+1}|x_{1},x_{2},\cdots ,x_{t})\rightarrow \text{prediction} \end{matrix}\right.
\end{aligned}
$$
This is clearly an online procedure: each step needs only the previous filtering result and the newest observation.
From the derivation above, the Filtering problem is computed by iterating the following two steps:
Step 1, Prediction:
$$
P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})=\int _{z_{t-1}}P(z_{t}|z_{t-1})\cdot P(z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})\mathrm{d}z_{t-1}
$$
Step 2, Update:
$$
P(z_{t}|x_{1},x_{2},\cdots ,x_{t})=C\cdot P(x_{t}|z_{t})\cdot P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})
$$
Because every factor is Gaussian and the model is linear, the products and integrals above again yield Gaussians (up to normalization), so we can write:
$$
\begin{aligned}
\text{Prediction}:\quad & P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})=N(z_{t}|\mu _{t}^{*},\Sigma _{t}^{*})\\
\text{Update}:\quad & P(z_{t}|x_{1},x_{2},\cdots ,x_{t})=N(z_{t}|\mu _{t},\Sigma _{t})
\end{aligned}
$$
Substituting the Gaussian forms gives (in the Update line, the leading $C$ is the normalizing constant, while the $C$ inside the Gaussian is the emission matrix):
$$
\begin{aligned}
\text{Prediction}:\quad & N(z_{t}|\mu _{t}^{*},\Sigma _{t}^{*})=\int _{z_{t-1}}N(z_{t}|A\cdot z_{t-1}+B,Q)\cdot N(z_{t-1}|\mu _{t-1},\Sigma _{t-1})\,\mathrm{d}z_{t-1}\\
\text{Update}:\quad & N(z_{t}|\mu _{t},\Sigma _{t})=C\cdot N(x_{t}|C\cdot z_{t}+D,R)\cdot N(z_{t}|\mu _{t}^{*},\Sigma _{t}^{*})
\end{aligned}
$$
The remaining steps rely on the result obtained in Part 6 of 高斯分布|机器学习推导系列(二) (Gaussian Distribution, Machine Learning Derivation Series, Part 2): given $P(x)$ and $P(y|x)$, find $P(y)$ and $P(x|y)$. Here we simply apply those formulas.
First, in the Prediction step:
$$
\underset{P(y)}{\underbrace{P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})}}=\int _{z_{t-1}}\underset{P(y|x)}{\underbrace{P(z_{t}|z_{t-1})}}\cdot \underset{P(x)}{\underbrace{P(z_{t-1}|x_{1},x_{2},\cdots ,x_{t-1})}}\mathrm{d}z_{t-1}
$$
Substituting into the formula for $P(y)$ gives:
$$
\begin{aligned}
\mu _{t}^{*} &= A\mu _{t-1}+B\\
\Sigma _{t}^{*} &= Q+A\Sigma _{t-1}A^{T}
\end{aligned}
$$
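The Prediction formulas can be checked numerically. The sketch below, under assumed scalar values for $A$, $B$, $Q$, $\mu_{t-1}$ and $\Sigma_{t-1}$ (chosen only for illustration), evaluates the Prediction integral on a grid and compares its mean and variance with the closed form. The helper names `gauss_pdf` and `predict` are hypothetical.

```python
import numpy as np

def gauss_pdf(v, mean, var):
    """Density of a scalar Gaussian N(v | mean, var)."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def predict(mu_prev, Sigma_prev, A, B, Q):
    """Closed-form Prediction step: mu* = A mu + B, Sigma* = Q + A Sigma A^T."""
    mu_star = A @ mu_prev + B
    Sigma_star = Q + A @ Sigma_prev @ A.T
    return mu_star, Sigma_star

# Assumed scalar example values (not from the text).
a, b, q = 0.9, 0.5, 0.3
mu_prev, var_prev = 1.0, 2.0

mu_star, Sigma_star = predict(
    np.array([mu_prev]), np.array([[var_prev]]),
    np.array([[a]]), np.array([b]), np.array([[q]]),
)

# Numerical version of the integral over z_{t-1} (simple Riemann sum).
grid = np.linspace(-15.0, 15.0, 1001)
step = grid[1] - grid[0]
# trans[i, j] = N(z_t = grid[i] | a * grid[j] + b, q)
trans = gauss_pdf(grid[:, None], a * grid[None, :] + b, q)
prior = gauss_pdf(grid, mu_prev, var_prev)
pred = trans @ prior * step          # P(z_t | x_1..x_{t-1}) on the grid

num_mean = np.sum(grid * pred) * step
num_var = np.sum((grid - num_mean) ** 2 * pred) * step

print(mu_star[0], num_mean)          # both close to 0.9 * 1.0 + 0.5
print(Sigma_star[0, 0], num_var)     # both close to 0.3 + 0.9**2 * 2.0
```

The grid integration is only practical in low dimensions; the point of the closed form is that the same result is obtained from two matrix expressions.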
In the Update step:
$$
\underset{P(x|y)}{\underbrace{P(z_{t}|x_{1},x_{2},\cdots ,x_{t})}}=C\cdot \underset{P(y|x)}{\underbrace{P(x_{t}|z_{t})}}\cdot \underset{P(x)}{\underbrace{P(z_{t}|x_{1},x_{2},\cdots ,x_{t-1})}}
$$
Substituting into the formula for $P(x|y)$ also yields the result; the calculation is fairly involved, so it is omitted here.
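For reference, the standard linear Gaussian conditioning result, applied with $P(x)=N(z_{t}|\mu _{t}^{*},\Sigma _{t}^{*})$ and $P(y|x)=N(x_{t}|C\cdot z_{t}+D,R)$, is usually written in the following form. The symbol $K_{t}$ (commonly called the Kalman gain) is introduced here for convenience and does not appear in the original text; $C$ below is the emission matrix.
$$
\begin{aligned}
K_{t} &= \Sigma _{t}^{*}C^{T}\left(C\Sigma _{t}^{*}C^{T}+R\right)^{-1}\\
\mu _{t} &= \mu _{t}^{*}+K_{t}\left(x_{t}-C\mu _{t}^{*}-D\right)\\
\Sigma _{t} &= \left(I-K_{t}C\right)\Sigma _{t}^{*}
\end{aligned}
$$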
Note that $x_{1},x_{2},\cdots ,x_{t-1}$ are simply treated as known here, after which the forms $p(x)$, $p(y|x)$, and so on can be applied.
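Also, it is enough here to obtain the results by applying the formulas directly; the explicit Prediction and Update formulas that come out of this are not used any further.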
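Putting the two steps together, the following is a minimal sketch of the full filtering loop in NumPy. It assumes the usual Kalman gain form of the Update step (the Gaussian conditioning result whose derivation the text omits), and it treats the prior $N(\mu_{1},\Sigma_{1})$ as the "prediction" for $t=1$. The function name `kalman_filter` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def kalman_filter(x, A, B, C, D, Q, R, mu1, Sigma1):
    """Return filtered means mu_t and covariances Sigma_t for t = 1..T."""
    T, dz = x.shape[0], A.shape[0]
    mus = np.zeros((T, dz))
    Sigmas = np.zeros((T, dz, dz))
    I = np.eye(dz)
    # The prior on z_1 plays the role of the prediction at t = 1.
    mu_star, Sigma_star = mu1, Sigma1
    for t in range(T):
        # Update: condition N(z_t | mu*, Sigma*) on the observation x_t.
        S = C @ Sigma_star @ C.T + R                # covariance of x_t given the past
        K = np.linalg.solve(S, C @ Sigma_star).T    # K = Sigma* C^T S^{-1}
        mu = mu_star + K @ (x[t] - C @ mu_star - D)
        Sigma = (I - K @ C) @ Sigma_star
        mus[t], Sigmas[t] = mu, Sigma
        # Prediction: push the filtered Gaussian through the transition model.
        mu_star = A @ mu + B
        Sigma_star = Q + A @ Sigma @ A.T
    return mus, Sigmas

# Toy scalar example with assumed values, small enough to check by hand.
x = np.array([[2.0], [3.0]])
mus, Sigmas = kalman_filter(
    x,
    A=np.array([[1.0]]), B=np.zeros(1),
    C=np.array([[1.0]]), D=np.zeros(1),
    Q=np.array([[0.5]]), R=np.array([[1.0]]),
    mu1=np.zeros(1), Sigma1=np.array([[1.0]]),
)
print(mus.ravel())     # filtered means, approximately [1. 2.]
print(Sigmas.ravel())  # filtered variances, approximately [0.5 0.5]
```

In the toy example the gain is $0.5$ at both steps, so each filtered mean lands halfway between the predicted mean and the observation. The loop touches each observation once, which matches the online nature of the procedure.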