This article is mainly based on shuhuai008's whiteboard derivation video on Bilibili: 粒子滤波_98min (Particle Filter, 98 min).
Index of the full series of notes: 机器学习-白板推导系列笔记 (Machine Learning: Whiteboard Derivation Series Notes).
A dynamic model adds a time index to a probabilistic graphical model: the samples are no longer independent and identically distributed but depend on one another. A dynamic model is essentially a mixture model. The sample sequence we see is the observation sequence, and behind every observation there is a latent variable, called the system state; for this reason dynamic models are also called state space models.
A dynamic model rests on two assumptions:
- Homogeneous Markov assumption: for the latent variables, given $z_t$, $z_{t+1}$ is independent of $z_{t-1}$.
- Observation independence assumption: given $z_t$, the observation $x_t$ depends only on $z_t$.
From these we obtain two equations: one relating $z_t$ to $z_{t-1}$, the other relating $x_t$ to $z_t$:
$$z_t = g(z_{t-1}, u, \varepsilon)$$

$$x_t = h(z_t, u, \delta)$$
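To make $g$ and $h$ concrete, here is a minimal Python sketch of a toy nonlinear state-space model (the particular forms of $g$ and $h$ below are a common benchmark in the particle-filtering literature, chosen purely for illustration; they are not from the source). Later sketches in this note reuse this toy model.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(z_prev):
    # Hypothetical nonlinear transition z_t = g(z_{t-1}, u, eps);
    # the control input u is omitted, and eps ~ N(0, Q) with Q = 1.
    return 0.5 * z_prev + 25.0 * z_prev / (1.0 + z_prev ** 2) + rng.normal(0.0, 1.0)

def h(z):
    # Hypothetical nonlinear emission x_t = h(z_t, u, delta), with delta ~ N(0, R), R = 1.
    return z ** 2 / 20.0 + rng.normal(0.0, 1.0)

# Simulate latent states z_1..z_T and observations x_1..x_T.
T = 50
z = np.empty(T)
x = np.empty(T)
z[0] = rng.normal(0.0, 1.0)
x[0] = h(z[0])
for t in range(1, T):
    z[t] = g(z[t - 1])
    x[t] = h(z[t])
```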
In the HMM, the parameters are
$$\lambda = (\pi, A, B)$$
where $A$ is the state transition matrix, corresponding to the function $z_t = g(z_{t-1}, u, \varepsilon)$, and $B$ corresponds to the function $x_t = h(z_t, u, \delta)$.
In the Kalman Filter, we make the following assumptions:
$$z_t = A \cdot z_{t-1} + B + \varepsilon, \quad \varepsilon \sim N(0, Q)$$

$$x_t = C \cdot z_t + D + \delta, \quad \delta \sim N(0, R)$$
The main task is the filtering problem, i.e., computing the marginal probability
$$P(z_t \mid x_1, x_2, \cdots, x_t)$$
This splits into two steps. The first is the prediction step, which amounts to giving $z_t$ a prior:
$$P(z_t \mid x_1, x_2, \cdots, x_{t-1}) = \int_{z_{t-1}} P(z_t \mid z_{t-1}) \cdot P(z_{t-1} \mid x_1, x_2, \cdots, x_{t-1}) \, \mathrm{d}z_{t-1}$$
The second is the update step, which amounts to computing the posterior of $z_t$:
$$P(z_t \mid x_1, x_2, \cdots, x_t) \propto P(x_t \mid z_t) \cdot P(z_t \mid x_1, x_2, \cdots, x_{t-1})$$
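As a concrete illustration of the two steps in the linear-Gaussian case, here is a minimal scalar Kalman-filter sketch (the parameter values and the test sequence are assumptions for the example, not from the source):

```python
import numpy as np

def kalman_filter(xs, A=1.0, B=0.0, C=1.0, D=0.0, Q=1.0, R=1.0,
                  mu0=0.0, s0=1.0):
    """Scalar Kalman filter: returns the means of P(z_t | x_1,...,x_t)."""
    mu, s = mu0, s0                 # moments of the current posterior
    means = []
    for x in xs:
        # Prediction: prior P(z_t | x_{1:t-1}) = N(mu_pred, s_pred).
        mu_pred = A * mu + B
        s_pred = A * s * A + Q
        # Update: multiply by the likelihood P(x_t | z_t) and renormalize.
        K = s_pred * C / (C * s_pred * C + R)        # Kalman gain
        mu = mu_pred + K * (x - (C * mu_pred + D))
        s = (1.0 - K * C) * s_pred
        means.append(mu)
    return np.array(means)

# Example: filter a noisy random-walk observation sequence.
rng = np.random.default_rng(0)
xs = np.cumsum(rng.normal(0.0, 1.0, size=100))
print(kalman_filter(xs)[-5:])
```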
For the detailed derivation, see: 白板推导系列笔记(十五)-卡曼滤波 (Whiteboard Derivation Series Notes XV: Kalman Filter).
The Kalman filter handles the linear case; the particle filter handles the nonlinear case.
Because the Kalman filter is a linear Gaussian model, an analytic solution can be obtained by iterating prediction and update. For the nonlinear models that the particle filter targets, there is no distribution with properties as convenient as the Gaussian, so no analytic solution is available; to solve the filtering problem we must resort to sampling.
In the Bayesian setting, the task is to infer the latent variable $Z$ given $X$, i.e., to compute $P(Z \mid X)$; the Monte Carlo method approximates this posterior by sampling.
The central quantity is the expectation $E$:
$$E_{z|x}[f(z)] = \int f(z) \, p(z \mid x) \, \mathrm{d}z \approx \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)})$$
where the $N$ samples $z^{(1)}, z^{(2)}, \cdots, z^{(N)}$ are drawn as $z^{(i)} \sim p(z \mid x)$.
The difficulty now is that we cannot sample from $p(z \mid x)$ directly: either $p(z \mid x)$ is too complex, or the dimensionality is too high. We therefore turn to importance sampling: introduce a relatively simple distribution $q(z)$ that we can sample from directly, and rewrite the expectation $E$ above:
$$E_{z|x}[f(z)] = \int f(z) \, p(z \mid x) \, \mathrm{d}z = \int f(z) \frac{p(z \mid x)}{q(z \mid x)} q(z \mid x) \, \mathrm{d}z \approx \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)}) \frac{p(z^{(i)} \mid x_1, x_2, \cdots, x_t)}{q(z^{(i)} \mid x_1, x_2, \cdots, x_t)}$$
where $z^{(i)} \sim q(z \mid x), \; i = 1, 2, \cdots, N$; $q(z \mid x)$ is the proposal distribution; and

$$w^{(i)} = \frac{p(z^{(i)} \mid x_1, x_2, \cdots, x_t)}{q(z^{(i)} \mid x_1, x_2, \cdots, x_t)}$$

is the importance weight.
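A minimal numerical sketch of this estimator, assuming for illustration that the target $p$ is a standard normal (which we pretend we cannot sample from), the proposal $q$ is a wider normal, and $f(z) = z^2$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000

z = rng.normal(0.0, 2.0, size=N)                    # z^(i) ~ q = N(0, 2^2)
w = norm.pdf(z, 0.0, 1.0) / norm.pdf(z, 0.0, 2.0)   # w^(i) = p(z^(i)) / q(z^(i))

est = np.mean(z ** 2 * w)    # approximates E_p[z^2] = 1
print(est)                   # ~ 1.0
```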
Introducing the shorthand $x_{1:t} = x_1, x_2, \cdots, x_t$, the weight for the filtering problem is
$$w_t^{(i)} = \frac{p(z_t^{(i)} \mid x_{1:t})}{q(z_t^{(i)} \mid x_{1:t})}$$
So at $t = 1$ we compute $w_1^{(i)}, \; i = 1, 2, \cdots, N$, i.e., $w_1^{(1)}, w_1^{(2)}, \cdots, w_1^{(N)}$; at $t = 2$ we compute $w_2^{(i)}, \; i = 1, 2, \cdots, N$, i.e., $w_2^{(1)}, w_2^{(2)}, \cdots, w_2^{(N)}$; and so on.
It is easy to see that computing the weights takes $N$ evaluations at every time step, and $p(z_t^{(i)} \mid x_{1:t})$ is itself very hard to compute, so we need to simplify the computation of $w$.
SIS (Sequential Importance Sampling) is introduced precisely to find a recursion between $w_t^{(i)}$ and $w_{t-1}^{(i)}$. Instead of computing $p(z_t \mid x_{1:t})$ directly, we work with the joint posterior $p(z_{1:t} \mid x_{1:t})$, of which the filtering distribution is the marginal. So for $w_t^{(i)}$ we have:
$$w_t^{(i)} \propto \frac{p(z_{1:t} \mid x_{1:t})}{q(z_{1:t} \mid x_{1:t})}$$
First, look at the numerator:
$$\begin{aligned}
p(z_{1:t} \mid x_{1:t}) &= \frac{p(z_{1:t}, x_{1:t})}{\underset{C}{\underbrace{p(x_{1:t})}}} = \frac{1}{C}\, p(z_{1:t}, x_{1:t}) \\
&= \frac{1}{C}\, p(x_t \mid z_{1:t}, x_{1:t-1}) \cdot p(z_{1:t}, x_{1:t-1}) \\
&= \frac{1}{C}\, p(x_t \mid z_t) \cdot p(z_{1:t}, x_{1:t-1}) \\
&= \frac{1}{C}\, p(x_t \mid z_t) \cdot p(z_t \mid z_{1:t-1}, x_{1:t-1}) \cdot p(z_{1:t-1}, x_{1:t-1}) \\
&= \frac{1}{C}\, p(x_t \mid z_t) \cdot p(z_t \mid z_{t-1}) \cdot p(z_{1:t-1}, x_{1:t-1}) \\
&= \frac{1}{C}\, p(x_t \mid z_t) \cdot p(z_t \mid z_{t-1}) \cdot p(z_{1:t-1} \mid x_{1:t-1}) \cdot \underset{D}{\underbrace{p(x_{1:t-1})}} \\
&= \frac{D}{C}\, p(x_t \mid z_t) \cdot p(z_t \mid z_{t-1}) \cdot {\color{red}{p(z_{1:t-1} \mid x_{1:t-1})}}
\end{aligned}$$

Here the observation independence assumption gives $p(x_t \mid z_{1:t}, x_{1:t-1}) = p(x_t \mid z_t)$, and the homogeneous Markov assumption gives $p(z_t \mid z_{1:t-1}, x_{1:t-1}) = p(z_t \mid z_{t-1})$.
Then look at the denominator.
Assume the proposal factorizes as

$$q(z_{1:t} \mid x_{1:t}) = q(z_t \mid z_{1:t-1}, x_{1:t}) \cdot q(z_{1:t-1} \mid x_{1:t-1})$$
Combining the two, we can derive:
$$w_t^{(i)} \propto \frac{p(z_{1:t} \mid x_{1:t})}{q(z_{1:t} \mid x_{1:t})} \propto \frac{p(x_t \mid z_t) \cdot p(z_t \mid z_{t-1}) \cdot {\color{blue}{p(z_{1:t-1} \mid x_{1:t-1})}}}{q(z_t \mid z_{1:t-1}, x_{1:t}) \cdot {\color{blue}{q(z_{1:t-1} \mid x_{1:t-1})}}} = \frac{p(x_t \mid z_t) \cdot p(z_t \mid z_{t-1})}{q(z_t \mid z_{1:t-1}, x_{1:t})} \cdot w_{t-1}^{(i)}$$
So once we have computed the $N$ values $w_1^{(i)}$ at time $t = 1$, we can obtain $w_2^{(i)}$ at $t = 2$ directly by plugging into the recursion above, which resolves the difficulty of computing $w$. Therefore,
$$\begin{aligned}
E_{z|x}[f(z)] &= \int f(z) \, p(z \mid x) \, \mathrm{d}z = \int f(z) \frac{p(z \mid x)}{q(z \mid x)} q(z \mid x) \, \mathrm{d}z \\
&\approx \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)}) \frac{p(z^{(i)} \mid x_1, x_2, \cdots, x_t)}{q(z^{(i)} \mid x_1, x_2, \cdots, x_t)} = \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)}) \, w^{(i)} = \sum_{i=1}^{N} f(z^{(i)}) \, \hat{w}^{(i)}
\end{aligned}$$

where $\hat{w}^{(i)} = w^{(i)} / \sum_j w^{(j)}$ is the normalized weight; the last two estimators agree asymptotically, since $\frac{1}{N} \sum_j w^{(j)} \to 1$.
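In practice $p(z \mid x)$ is usually known only up to the normalizing constant (the $C$ and $D$ above), so the raw weights are unnormalized and the last line is the one to use. Continuing the importance-sampling sketch above (`z` and `w` as computed there):

```python
# Self-normalized importance sampling: normalize the weights instead of
# dividing by N, so unknown normalizing constants cancel.
w_hat = w / np.sum(w)              # \hat{w}^(i): normalized weights, sum to 1
est_sn = np.sum(z ** 2 * w_hat)    # self-normalized estimate of E_p[z^2]
```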
Algorithm:
Premise: sampling at time $t-1$ is complete, i.e., $w_{t-1}^{(i)}$ is known.
At time $t$: for $i = 1, 2, \cdots, N$
$$z_t^{(i)} \sim q(z_t \mid z_{t-1}, x_{1:t})$$

$$w_t^{(i)} \propto w_{t-1}^{(i)} \cdot \frac{p(x_t \mid z_t^{(i)}) \cdot p(z_t^{(i)} \mid z_{t-1}^{(i)})}{q(z_t^{(i)} \mid z_{t-1}^{(i)}, x_{1:t})}$$
end
Normalize the $w_t^{(i)}$ so that $\sum_{i=1}^{N} w_t^{(i)} = 1$.
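A minimal sketch of one SIS step for the toy model above, with an illustrative Gaussian random-walk proposal (the proposal choice and its scale are assumptions for the example, not prescribed by the source):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def trans_pdf(z_t, z_prev):
    # p(z_t | z_{t-1}) of the toy model: N(g-mean, 1).
    mean = 0.5 * z_prev + 25.0 * z_prev / (1.0 + z_prev ** 2)
    return norm.pdf(z_t, mean, 1.0)

def lik_pdf(x_t, z_t):
    # p(x_t | z_t) of the toy model: N(z_t^2 / 20, 1).
    return norm.pdf(x_t, z_t ** 2 / 20.0, 1.0)

def sis_step(z_prev, w_prev, x_t, q_scale=2.0):
    """One SIS step: sample from the proposal, apply the weight recursion,
    then normalize. Proposal: q(z_t | z_{t-1}, x_{1:t}) = N(z_{t-1}, q_scale^2)."""
    z_t = z_prev + rng.normal(0.0, q_scale, size=z_prev.shape)
    q_pdf = norm.pdf(z_t, z_prev, q_scale)
    w_t = w_prev * lik_pdf(x_t, z_t) * trans_pdf(z_t, z_prev) / q_pdf
    return z_t, w_t / np.sum(w_t)    # normalize so the weights sum to 1
```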
Problem: weight degeneracy. As $t$ increases, most of the weights $w_t^{(i)}$ approach $0$, with the weight mass concentrating on a few particles.
Remedy:
Basic Particle Filter = SIS + Resampling
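Resampling counters weight degeneracy by drawing $N$ new particles with replacement according to the current weights, then resetting all weights to $1/N$. A minimal multinomial-resampling sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(z, w):
    """Multinomial resampling: high-weight particles get duplicated,
    low-weight ones get dropped; weights reset to uniform."""
    idx = rng.choice(len(z), size=len(z), p=w)   # w must sum to 1
    return z[idx], np.full(len(z), 1.0 / len(z))
```

In practice, resampling is often triggered only when the effective sample size $1 / \sum_i (w_t^{(i)})^2$ falls below a threshold, rather than at every step; that refinement is not covered here.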
How do we choose a suitable $q(z)$?
A common choice is
$$q(z_t \mid z_{1:t-1}, x_{1:t}) = p(z_t \mid z_{t-1})$$
Then,
$$w_t^{(i)} = w_{t-1}^{(i)} \cdot \frac{p(x_t \mid z_t^{(i)}) \cdot p(z_t^{(i)} \mid z_{t-1}^{(i)})}{q(z_t^{(i)} \mid z_{t-1}^{(i)}, x_{1:t})} = w_{t-1}^{(i)} \cdot \frac{p(x_t \mid z_t^{(i)}) \cdot p(z_t^{(i)} \mid z_{t-1}^{(i)})}{p(z_t^{(i)} \mid z_{t-1}^{(i)})} = w_{t-1}^{(i)} \cdot p(x_t \mid z_t^{(i)})$$
where
$$z_t^{(i)} \sim p(z_t \mid z_{t-1}^{(i)})$$
This algorithm is called the SIR Filter (Sampling-Importance-Resampling Filter) = SIS + Resampling + $\underset{=\, p(z_t \mid z_{t-1})}{\underbrace{q(z)}}$.
It is often summarized as "generate and test": generate via $z_t^{(i)} \sim p(z_t \mid z_{t-1})$, and test via $w_t^{(i)} = w_{t-1}^{(i)} \cdot {\color{blue}{p(x_t \mid z_t^{(i)})}}$.
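Putting the pieces together, here is a minimal SIR-filter sketch for the toy model above (again illustrative, not the source's code): each step generates particles from the transition, tests them with the likelihood, and resamples.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def sir_filter(xs, N=1000):
    """SIR filter for the toy model above; returns estimates of E[z_t | x_{1:t}]."""
    z = rng.normal(0.0, 1.0, size=N)    # initial particles
    means = []
    for x_t in xs:
        # generate: z_t^(i) ~ p(z_t | z_{t-1}^(i)).
        z = 0.5 * z + 25.0 * z / (1.0 + z ** 2) + rng.normal(0.0, 1.0, size=N)
        # test: since resampling reset the previous weights to 1/N,
        # w_t^(i) is proportional to p(x_t | z_t^(i)).
        w = norm.pdf(x_t, z ** 2 / 20.0, 1.0)
        w /= np.sum(w)
        means.append(np.sum(w * z))     # posterior-mean estimate at time t
        # resample: draw N particles with replacement according to w.
        z = z[rng.choice(N, size=N, p=w)]
    return np.array(means)
```

For instance, `sir_filter(x)` applied to the observations `x` simulated from the toy model earlier returns an estimate of the filtered mean trajectory.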
Next chapter: 白板推导系列笔记(十七)-条件随机场 (Whiteboard Derivation Series Notes XVII: Conditional Random Field).