【Practical】Probability Derivations in HMMs

  • This post derives the probability formulas that appear in HMMs. For background on HMMs, see 《隐马尔可夫模型》 (Hidden Markov Models).

Contents

  • Notation.
  • Forward procedure.
  • Backward procedure.
  • Forward-backward algorithm.
  • Viterbi Algorithm.

Notation.

  • Element of the state-transition probability matrix $A$: $A[i,j]=P(z_t=s_j \mid z_{t-1}=s_i)$
  • Element of the emission probability matrix $B$: $B[j,k]=P(x_t=v_k \mid z_t=s_j)$. In some formulas this probability is abbreviated as $B[j,*x_t]$.
  • State space: $S=\{s_1,s_2,\cdots\}$
  • Output space: $V=\{v_1,v_2,\cdots\}$
  • Initial state probability: $\pi_i=P(z_1=s_i)$
  • Full parameter set: $\theta=(A,B,\vec{\pi})$. A toy parameterization in code follows this list.
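
The derivations below can be made concrete with a small numerical example. The following is a minimal sketch assuming a toy two-state, three-symbol HMM; the specific numbers and the names `A`, `B`, `pi`, `x` are illustrative assumptions chosen only to match the notation above, not values from the original text.

```python
import numpy as np

# Toy HMM parameters (illustrative assumption): |S| = 2 states, |V| = 3 output symbols.
A = np.array([[0.7, 0.3],          # A[i, j] = P(z_t = s_j | z_{t-1} = s_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # B[j, k] = P(x_t = v_k | z_t = s_j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])          # pi[i] = P(z_1 = s_i)

# Observation sequence x_1, ..., x_T encoded as indices into V,
# so that B[:, x[t]] plays the role of B[j, *x_{t+1}] in the formulas.
x = [0, 2, 1, 2]
```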

Forward procedure.

  • Define the forward probability: $\alpha(i,t)=P(x_1,x_2,\cdots,x_t,z_t=s_i;\theta)$
  • Its boundary condition:
    $$\begin{aligned}\alpha(i,1)&=P(x_1,z_1=s_i;\theta)\\&=P(x_1\mid z_1=s_i;\theta)\cdot P(z_1=s_i)\\&=\pi_i\cdot B[i,*x_1]\end{aligned}$$
  • The recursion:
    $$\begin{aligned}\alpha(j,t+1)&=P(x_1,x_2,\cdots,x_{t+1},z_{t+1}=s_j;\theta)\\&=\sum_{i=1}^{|S|}P(x_1,x_2,\cdots,x_{t+1},z_t=s_i,z_{t+1}=s_j;\theta)\\&=\sum_{i=1}^{|S|}P(x_{t+1}\mid x_1,x_2,\cdots,x_t,z_t=s_i,z_{t+1}=s_j;\theta)\cdot P(z_{t+1}=s_j\mid z_t=s_i,x_1,x_2,\cdots,x_t;\theta)\cdot P(x_1,x_2,\cdots,x_t,z_t=s_i;\theta)\\&=\sum_{i=1}^{|S|}P(x_{t+1}\mid z_{t+1}=s_j;\theta)\cdot P(z_{t+1}=s_j\mid z_t=s_i;\theta)\cdot P(x_1,x_2,\cdots,x_t,z_t=s_i;\theta)\\&=\sum_{i=1}^{|S|}B[j,*x_{t+1}]\cdot A[i,j]\cdot\alpha(i,t)\end{aligned}$$
    At $t=T$ this gives $\alpha(i,T)=P(x_1,x_2,\cdots,x_T,z_T=s_i;\theta)$.
  • From the above we obtain (a NumPy sketch of this recursion follows the list):
    $$\begin{aligned}P(\vec x;\theta)&=P(x_1,x_2,\cdots,x_T;\theta)\\&=\sum_{i=1}^{|S|}P(x_1,x_2,\cdots,x_T,z_T=s_i;\theta)\\&=\sum_{i=1}^{|S|}\alpha(i,T)\end{aligned}$$
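
A minimal sketch of the forward recursion, assuming the toy `A`, `B`, `pi`, and `x` defined in the Notation section; `forward` is a hypothetical helper name, and `alpha[t, i]` stores $\alpha(i, t+1)$ with 0-based time indexing.

```python
def forward(A, B, pi, x):
    """Forward probabilities: alpha[t, i] = P(x_1, ..., x_{t+1}, z_{t+1} = s_i; theta)."""
    T, n_states = len(x), A.shape[0]
    alpha = np.zeros((T, n_states))
    alpha[0] = pi * B[:, x[0]]                      # boundary: alpha(i, 1) = pi_i * B[i, *x_1]
    for t in range(1, T):
        # recursion: alpha(j, t+1) = B[j, *x_{t+1}] * sum_i A[i, j] * alpha(i, t)
        alpha[t] = B[:, x[t]] * (alpha[t - 1] @ A)
    return alpha

alpha = forward(A, B, pi, x)
print("P(x) via forward pass:", alpha[-1].sum())    # sum_i alpha(i, T)
```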

Backward procedure.

  • Define the backward probability: $\beta(j,t)=P(x_{t+1},x_{t+2},\cdots,x_T\mid z_t=s_j;\theta)$
  • The recursion is initialized with $\beta(j,T)=1$.
  • The recursion:
    $$\begin{aligned}\beta(i,t-1)&=P(x_t,x_{t+1},\cdots,x_T\mid z_{t-1}=s_i;\theta)\\&=\sum_{j=1}^{|S|}P(x_t,x_{t+1},\cdots,x_T,z_t=s_j\mid z_{t-1}=s_i;\theta)\\&=\sum_{j=1}^{|S|}P(x_t\mid x_{t+1},x_{t+2},\cdots,x_T,z_t=s_j,z_{t-1}=s_i;\theta)\cdot P(x_{t+1},x_{t+2},\cdots,x_T\mid z_t=s_j,z_{t-1}=s_i;\theta)\cdot P(z_t=s_j\mid z_{t-1}=s_i;\theta)\\&=\sum_{j=1}^{|S|}P(x_t\mid z_t=s_j;\theta)\cdot P(x_{t+1},x_{t+2},\cdots,x_T\mid z_t=s_j;\theta)\cdot P(z_t=s_j\mid z_{t-1}=s_i;\theta)\\&=\sum_{j=1}^{|S|}B[j,*x_t]\cdot\beta(j,t)\cdot A[i,j]\end{aligned}$$
    The recursion terminates at $\beta(i,1)=P(x_2,x_3,\cdots,x_T\mid z_1=s_i;\theta)$.
  • From the above we obtain (see the sketch after this list):
    $$\begin{aligned}P(\vec x;\theta)&=P(x_1,x_2,\cdots,x_T;\theta)\\&=\sum_{i=1}^{|S|}P(x_1,x_2,\cdots,x_T,z_1=s_i;\theta)\\&=\sum_{i=1}^{|S|}P(x_1\mid x_2,x_3,\cdots,x_T,z_1=s_i;\theta)\cdot P(x_2,x_3,\cdots,x_T\mid z_1=s_i;\theta)\cdot P(z_1=s_i;\theta)\\&=\sum_{i=1}^{|S|}B[i,*x_1]\cdot\beta(i,1)\cdot\pi_i\end{aligned}$$
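
The matching sketch of the backward recursion, under the same assumed toy parameters; `beta[t, i]` stores $\beta(i, t+1)$ in 0-based time, and the final print reproduces the termination formula above.

```python
def backward(A, B, x):
    """Backward probabilities: beta[t, i] = P(x_{t+2}, ..., x_T | z_{t+1} = s_i; theta)."""
    T, n_states = len(x), A.shape[0]
    beta = np.zeros((T, n_states))
    beta[T - 1] = 1.0                               # boundary: beta(j, T) = 1
    for t in range(T - 2, -1, -1):
        # document recursion (1-based): beta(i, t-1) = sum_j B[j, *x_t] * beta(j, t) * A[i, j]
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

beta = backward(A, B, x)
print("P(x) via backward pass:", np.sum(pi * B[:, x[0]] * beta[0]))   # sum_i B[i, *x_1] * beta(i, 1) * pi_i
```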

Forward-backward algorithm.

  • Combining the forward and backward probabilities defined above, for any fixed $t\in\{1,2,\cdots,T-1\}$ we obtain (a numerical check of this identity follows below):
    $$\begin{aligned}P(\vec x;\theta)&=P(x_1,x_2,\cdots,x_T;\theta)\\&=\sum_{i=1}^{|S|}\sum_{j=1}^{|S|}P(x_1,x_2,\cdots,x_T,z_t=s_i,z_{t+1}=s_j;\theta)\\&=\sum_{i=1}^{|S|}\sum_{j=1}^{|S|}P(x_{t+2},x_{t+3},\cdots,x_T\mid x_1,x_2,\cdots,x_{t+1},z_t=s_i,z_{t+1}=s_j;\theta)\cdot P(x_{t+1}\mid x_1,x_2,\cdots,x_t,z_t=s_i,z_{t+1}=s_j;\theta)\cdot P(z_{t+1}=s_j\mid z_t=s_i,x_1,x_2,\cdots,x_t;\theta)\cdot P(x_1,x_2,\cdots,x_t,z_t=s_i;\theta)\\&=\sum_{i=1}^{|S|}\sum_{j=1}^{|S|}P(x_{t+2},x_{t+3},\cdots,x_T\mid z_{t+1}=s_j;\theta)\cdot P(x_{t+1}\mid z_{t+1}=s_j;\theta)\cdot P(z_{t+1}=s_j\mid z_t=s_i;\theta)\cdot P(x_1,x_2,\cdots,x_t,z_t=s_i;\theta)\\&=\sum_{i=1}^{|S|}\sum_{j=1}^{|S|}\beta(j,t+1)\cdot B[j,*x_{t+1}]\cdot A[i,j]\cdot\alpha(i,t)\end{aligned}$$
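
Since the identity holds for every intermediate $t$, the two sketches above can be cross-checked numerically: the double sum should give the same value of $P(\vec x;\theta)$ at every time step. A small check, assuming the `alpha` and `beta` arrays computed earlier:

```python
# P(x) = sum_{i,j} alpha(i, t) * A[i, j] * B[j, *x_{t+1}] * beta(j, t+1), for every t in 1..T-1.
for t in range(len(x) - 1):
    joint = alpha[t][:, None] * A * B[:, x[t + 1]][None, :] * beta[t + 1][None, :]
    print(f"t = {t + 1}: P(x) =", joint.sum())      # identical value for every t
```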

Viterbi Algorithm.

  • Define: $\delta(i,t)=\max_{z_1,z_2,\cdots,z_{t-1}}P(z_t=s_i,z_{t-1},\cdots,z_1,x_t,x_{t-1},\cdots,x_1;\theta)$
  • Boundary condition: $\delta(i,1)=P(z_1=s_i,x_1;\theta)=P(x_1\mid z_1=s_i;\theta)\cdot P(z_1=s_i;\theta)=B[i,*x_1]\cdot\pi_i$
  • The recursion:
    $$\begin{aligned}\delta(j,t+1)&=\max_{z_1,z_2,\cdots,z_t}P(z_{t+1}=s_j,z_t,\cdots,z_1,x_{t+1},x_t,\cdots,x_1;\theta)\\&=\max_{z_1,z_2,\cdots,z_t}P(x_{t+1}\mid z_{t+1}=s_j;\theta)\cdot P(z_{t+1}=s_j,z_t,\cdots,z_1,x_t,x_{t-1},\cdots,x_1;\theta)\\&=P(x_{t+1}\mid z_{t+1}=s_j;\theta)\cdot\max_{z_1,z_2,\cdots,z_t}P(z_{t+1}=s_j,z_t,\cdots,z_1,x_t,x_{t-1},\cdots,x_1;\theta)\\&=B[j,*x_{t+1}]\cdot\max_{i\in[1,|S|]}\Big[P(z_{t+1}=s_j\mid z_t=s_i;\theta)\cdot\max_{z_1,z_2,\cdots,z_{t-1}}P(z_t=s_i,z_{t-1},\cdots,z_1,x_t,x_{t-1},\cdots,x_1;\theta)\Big]\\&=B[j,*x_{t+1}]\cdot\max_{i\in[1,|S|]}\Big[A[i,j]\cdot\delta(i,t)\Big]\end{aligned}$$
  • At $t=T$ we obtain $\delta(i,T)=\max_{z_1,z_2,\cdots,z_{T-1}}P(z_T=s_i,z_{T-1},\cdots,z_1,x_T,x_{T-1},\cdots,x_1;\theta)$. Hence, for a given observation sequence, the joint probability of the most likely state sequence is $\max_{i\in[1,|S|]}\delta(i,T)$; the sequence itself is recovered by recording the maximizing $i$ at each step and backtracking from $\arg\max_i\delta(i,T)$ (see the sketch after this list).
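
A minimal Viterbi sketch under the same assumed toy parameters, including the backtracking step the last bullet refers to; `viterbi`, `delta`, and `psi` are hypothetical names, with `delta[t, i]` storing $\delta(i, t+1)$ and `psi` recording the maximizing predecessor state.

```python
def viterbi(A, B, pi, x):
    """Most likely state sequence for x and its joint probability max_i delta(i, T)."""
    T, n_states = len(x), A.shape[0]
    delta = np.zeros((T, n_states))
    psi = np.zeros((T, n_states), dtype=int)          # backpointers: best predecessor state
    delta[0] = pi * B[:, x[0]]                        # boundary: delta(i, 1) = pi_i * B[i, *x_1]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j] = delta(i, t) * A[i, j]
        psi[t] = scores.argmax(axis=0)                # maximizing i for each j
        delta[t] = B[:, x[t]] * scores.max(axis=0)    # delta(j, t+1) = B[j, *x_{t+1}] * max_i [...]
    # Backtrack from argmax_i delta(i, T) to recover the state sequence itself.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

states, prob = viterbi(A, B, pi, x)
print("most likely state sequence (0-based indices):", states, "with joint probability", prob)
```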
