1. Given an HMM $\lambda = (A, B, \pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, the evaluation problem is to compute the probability $P(O|\lambda)$ that the observation sequence $O$ occurs under the model $\lambda$.
2. The most direct approach is to compute it straight from the definition: enumerate every possible state sequence $S = (s_1, s_2, \cdots, s_T)$ of length $T$, compute the joint probability $P(O, S|\lambda)$ of each state sequence $S$ with the observation sequence $O = (o_1, o_2, \cdots, o_T)$, and then sum over all possible state sequences to obtain $P(O|\lambda)$.
(1) The probability of the state sequence $S = (s_1, s_2, \cdots, s_T)$ is:
$$P(S|\lambda) = \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}$$ ----(10.10)
(2) Given the state sequence $S = (s_1, s_2, \cdots, s_T)$, the conditional probability of the observation sequence $O = (o_1, o_2, \cdots, o_T)$ is:
$$P(O|S,\lambda) = b_{s_1}(o_1)\, b_{s_2}(o_2) \cdots b_{s_T}(o_T)$$ ----(10.11)
(3) The joint probability of $O$ and $S$ occurring together is:
$$P(O,S|\lambda) = P(O|S,\lambda)P(S|\lambda) = \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}\, b_{s_1}(o_1)\, b_{s_2}(o_2) \cdots b_{s_T}(o_T)$$ ----(10.12)
(4) Summing over all possible state sequences $S$ gives the probability of the observation sequence $O$:
$$P(O|\lambda) = \sum_{S} P(O|S,\lambda)P(S|\lambda) = \sum_{S} P(O,S|\lambda) = \sum_{s_1, s_2, \cdots, s_T} \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}\, b_{s_1}(o_1)\, b_{s_2}(o_2) \cdots b_{s_T}(o_T)$$ ----(10.13)
In effect this marginalizes out $S$: summing over all state sequences eliminates $S$ and leaves $P(O|\lambda)$.
The time complexity of this computation is $O(T N^T)$, which is far too expensive to be practical.
How is the time complexity $O(T N^T)$ obtained?
For the expression $P(O,S|\lambda) = P(O|S,\lambda)P(S|\lambda) = \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}\, b_{s_1}(o_1)\, b_{s_2}(o_2) \cdots b_{s_T}(o_T)$, evaluating the product for a single state sequence takes $O(T)$ multiplications; since the sum $\sum_{s_1, s_2, \cdots, s_T}$ ranges over $N^T$ state sequences, the overall time complexity is $O(T N^T)$.
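The direct enumeration can be sketched in a few lines of Python. This is an illustrative sketch, not code from the original post; the parameter values are those of the book's Example 10.2 (three hidden states, observations red = 0 and white = 1), and the function name `brute_force_prob` is my own:

```python
import itertools

# Toy HMM: parameters of the book's Example 10.2 (three boxes; red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]  # transition probabilities a_ij
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]                 # emission probabilities b_i(k)
pi = [0.2, 0.4, 0.4]                                      # initial distribution pi_i
O  = [0, 1, 0]                                            # observations (red, white, red)

def brute_force_prob(A, B, pi, O):
    """Direct evaluation of Eq. (10.13): sum P(O, S | lambda) over all N^T state sequences."""
    N, T = len(pi), len(O)
    total = 0.0
    for S in itertools.product(range(N), repeat=T):  # all N^T candidate state sequences
        p = pi[S[0]] * B[S[0]][O[0]]                 # pi_{s1} * b_{s1}(o1)
        for t in range(1, T):
            p *= A[S[t-1]][S[t]] * B[S[t]][O[t]]     # a_{s_{t-1} s_t} * b_{s_t}(o_t)
        total += p
    return total

p_direct = brute_force_prob(A, B, pi, O)
print(round(p_direct, 6))  # 0.130218
```

The inner product costs $O(T)$ and the loop runs $N^T$ times, which is exactly the $O(T N^T)$ count above.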
Note: my habit of writing transitions from $i$ to $j$ may be the reverse of the $a_{ji}$ appearing in some formulas; keep this in mind while reading. Either way the idea is the same; it is only a difference of notation.
Let us also use this figure and the recursion of the forward algorithm to discuss time complexity:
Because the forward algorithm's recursion reuses the result of the previous step at every step, the time complexity drops dramatically. Let us work out the forward algorithm's exact cost: at each time $t$ there are $N$ possible states, and computing each state via the recursion sums over the $N$ states of the previous step, so each time step costs $O(N^2)$; with $T$ time steps in total, the overall time complexity is $O(TN^2)$. Looking ahead: once you have read the backward algorithm, you will find that its time complexity is also $O(TN^2)$; we will return to this point later.
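The recursion just described can be sketched as follows. This is an illustrative sketch under the same assumed Example 10.2 parameters; the function name `forward_prob` is my own. Note the two nested loops over states inside the time loop, which is where the $O(TN^2)$ comes from:

```python
# Toy HMM: parameters of the book's Example 10.2 (red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]

def forward_prob(A, B, pi, O):
    """Forward algorithm: returns all alpha_t(i) and P(O | lambda) in O(T N^2) time."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                      # initialization: alpha_1(i) = pi_i b_i(o_1)
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                   # recursion: each step costs O(N^2)
        for j in range(N):
            s = sum(alpha[t-1][i] * A[i][j] for i in range(N))
            alpha[t][j] = s * B[j][O[t]]
    return alpha, sum(alpha[T-1])           # termination: P(O|lambda) = sum_i alpha_T(i)

alpha, p_forward = forward_prob(A, B, pi, O)
print(round(p_forward, 6))  # 0.130218
```

It returns the same value as the brute-force enumeration while touching each $(t, i, j)$ triple only once.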
If you have read this far, it has been a pile of theory, and without an example the understanding never feels deep enough. I suggest working through Example 10.2 in the book by hand; it will make the ideas much more concrete.
(2) Recursion: for $t = T-1, T-2, \cdots, 1$,
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad i = 1, 2, \cdots, N$$ ----(10.20)
Derivation of (10.20):
$$\beta_t(i)$$ ----(10.20.1)
$$= P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_t = q_i, \lambda)$$ ----(10.20.2)
$$= \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \cdots, o_T, s_{t+1} = q_j \mid s_t = q_i, \lambda)$$ ----(10.20.3)
$$= \sum_{j=1}^{N} P(s_{t+1} = q_j \mid s_t = q_i, \lambda)\, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j, s_t = q_i, \lambda)$$ ----(10.20.4)
$$= \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j, s_t = q_i, \lambda)\, a_{ij}$$ ----(10.20.5)
$$= \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_{t+1} = q_j, \lambda)\, a_{ij}$$ ----(10.20.6)
$$= \sum_{j=1}^{N} P(o_{t+1} \mid o_{t+2}, o_{t+3}, \cdots, o_T, s_{t+1} = q_j, \lambda)\, P(o_{t+2}, o_{t+3}, \cdots, o_T \mid s_{t+1} = q_j, \lambda)\, a_{ij}$$ ----(10.20.7)
$$= \sum_{j=1}^{N} P(o_{t+1} \mid s_{t+1} = q_j, \lambda)\, P(o_{t+2}, o_{t+3}, \cdots, o_T \mid s_{t+1} = q_j, \lambda)\, a_{ij}$$ ----(10.20.8)
$$= \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$ ----(10.20.9)
This is the recursion between $\beta_t(i)$ and $\beta_{t+1}(j)$, consistent with formula (10.20).
The step from (10.20.5) to (10.20.6) uses the homogeneous Markov assumption; the step from (10.20.7) to (10.20.8) uses the observation-independence assumption (the step from (10.20.6) to (10.20.7) is just the chain rule).
(3) Termination:
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$$ ----(10.21)
Derivation of (10.21):
$$P(O|\lambda)$$ ----(10.21.1)
$$= P(o_1, o_2, \cdots, o_T \mid \lambda)$$ ----(10.21.2)
$$= \sum_{i=1}^{N} P(o_1, o_2, \cdots, o_T, s_1 = q_i \mid \lambda)$$ ----(10.21.3)
$$= \sum_{i=1}^{N} P(o_1 \mid o_2, o_3, \cdots, o_T, s_1 = q_i, \lambda)\, P(o_2, o_3, \cdots, o_T, s_1 = q_i \mid \lambda)$$ ----(10.21.4)
$$= \sum_{i=1}^{N} P(o_1 \mid s_1 = q_i, \lambda)\, P(o_2, o_3, \cdots, o_T, s_1 = q_i \mid \lambda)$$ ----(10.21.5)
$$= \sum_{i=1}^{N} b_i(o_1)\, P(o_2, o_3, \cdots, o_T, s_1 = q_i \mid \lambda)$$ ----(10.21.6)
$$= \sum_{i=1}^{N} b_i(o_1)\, P(o_2, o_3, \cdots, o_T \mid s_1 = q_i, \lambda)\, P(s_1 = q_i \mid \lambda)$$ ----(10.21.7)
$$= \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$$ ----(10.21.8)
Here the step from (10.21.4) to (10.21.5) uses the observation-independence assumption.
As before, let us draw a figure for the backward algorithm to get a feel for the formulas above:
Using this figure, consider the time complexity of the backward algorithm.
You will now notice that the backward algorithm is just like the forward algorithm: every iteration reuses the result of the previous step, which greatly simplifies the computation. The backward algorithm's time complexity is likewise $O(TN^2)$; the analysis mirrors the forward algorithm, so we will not repeat it here.
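The backward pass just discussed can be sketched the same way. Again this is an illustrative sketch, not the post's own code, using the assumed Example 10.2 parameters; as a consistency check it should recover the same $P(O|\lambda)$ as the forward algorithm:

```python
# Toy HMM: parameters of the book's Example 10.2 (red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]

def backward_prob(A, B, pi, O):
    """Backward algorithm: beta_t(i) and P(O | lambda) via Eqs. (10.20)-(10.21)."""
    N, T = len(pi), len(O)
    beta = [[1.0] * N for _ in range(T)]      # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):            # recursion (10.20), O(N^2) per step
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    p = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))  # termination (10.21)
    return beta, p

beta, p_backward = backward_prob(A, B, pi, O)
print(round(p_backward, 6))  # 0.130218
```

The value 0.130218 matches the forward algorithm's result on the same example, which is exactly the point: both compute $P(O|\lambda)$ in $O(TN^2)$, one sweeping left to right, the other right to left.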
Using the forward and backward probabilities, we can derive formulas for the probability of a single state and of a pair of states.
Given the model parameters $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is denoted
$$\gamma_t(i) = P(s_t = q_i \mid O, \lambda) = \frac{P(s_t = q_i, O \mid \lambda)}{P(O \mid \lambda)}$$ ----(10.23)
From the definitions of the forward and backward probabilities:
$$\alpha_t(i)\beta_t(i) = P(s_t = q_i, O \mid \lambda)$$
How is this formula derived?
(1) Forward probability:
$$\alpha_t(i) = P(o_1, o_2, \cdots, o_t, s_t = q_i \mid \lambda)$$
(2) Backward probability:
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_t = q_i, \lambda)$$
(3) Derivation:
$$\alpha_t(i)\beta_t(i)$$
$$= P(o_1, o_2, \cdots, o_t, s_t = q_i \mid \lambda)\, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid s_t = q_i, \lambda)$$
$$= P(o_1, o_2, \cdots, o_t, s_t = q_i \mid \lambda)\, P(o_{t+1}, o_{t+2}, \cdots, o_T \mid o_1, o_2, \cdots, o_t, s_t = q_i, \lambda)$$
$$= P(o_1, o_2, \cdots, o_T, s_t = q_i \mid \lambda)$$
$$= P(s_t = q_i, O \mid \lambda)$$
where the second equality uses the observation-independence assumption (given $s_t$, the future observations do not depend on $o_1, \cdots, o_t$), and the third is the chain rule.
Finally, we obtain:
$$\gamma_t(i) = \frac{\alpha_t(i)\beta_t(i)}{P(O|\lambda)} = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\beta_t(j)}$$ ----(10.24)
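Formula (10.24) can be sketched directly from the forward and backward passes. This is an illustrative sketch (not from the original post) on the assumed Example 10.2 parameters; a useful sanity check is that $\gamma_t(i)$ sums to 1 over the states at every time step:

```python
# Toy HMM: parameters of the book's Example 10.2 (red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
N, T = len(pi), len(O)

# forward pass: alpha_t(i) = P(o_1..o_t, s_t = q_i | lambda)
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]] for j in range(N)])

# backward pass: beta_t(i) = P(o_{t+1}..o_T | s_t = q_i, lambda)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N)) for i in range(N)]

# gamma_t(i), Eq. (10.24): normalize alpha_t(i) * beta_t(i) over the states i
gamma = []
for t in range(T):
    unnorm = [alpha[t][i] * beta[t][i] for i in range(N)]  # = P(s_t = q_i, O | lambda)
    z = sum(unnorm)                                        # = P(O | lambda), same for every t
    gamma.append([u / z for u in unnorm])

print([round(sum(row), 6) for row in gamma])  # [1.0, 1.0, 1.0]
```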
Given the model parameters $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is denoted
$$\xi_t(i,j) = P(s_t = q_i, s_{t+1} = q_j \mid O, \lambda)$$ ----(10.25)
It can be computed from the forward and backward probabilities:
$$\xi_t(i,j) = \frac{P(s_t = q_i, s_{t+1} = q_j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{P(s_t = q_i, s_{t+1} = q_j, O \mid \lambda)}{\sum_{i=1}^{N}\sum_{j=1}^{N} P(s_t = q_i, s_{t+1} = q_j, O \mid \lambda)}$$
Moreover,
$$P(s_t = q_i, s_{t+1} = q_j, O \mid \lambda) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$
so
$$\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$ ----(10.26)
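Formula (10.26) can be sketched the same way. Again an illustrative sketch on the assumed Example 10.2 parameters, not the post's own code; since the denominator equals $P(O|\lambda)$, the $\xi_t(i,j)$ at each $t$ should sum to 1 over all pairs $(i,j)$:

```python
# Toy HMM: parameters of the book's Example 10.2 (red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
N, T = len(pi), len(O)

# forward / backward passes
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]] for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N)) for i in range(N)]

p_obs = sum(alpha[T-1])  # P(O | lambda), the denominator of (10.26)

# xi_t(i, j), Eq. (10.26): numerator alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_obs
        for j in range(N)] for i in range(N)] for t in range(T - 1)]

# sanity check: at each t, xi_t(i, j) sums to 1 over all (i, j)
print([round(sum(sum(row) for row in xi[t]), 6) for t in range(T - 1)])  # [1.0, 1.0]
```

Note that summing $\xi_t(i,j)$ over $j$ recovers $\gamma_t(i)$, since $\sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) = \beta_t(i)$ by the recursion (10.20).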
Summing $\gamma_t(i)$ and $\xi_t(i,j)$ over the time index $t$ yields several useful expected values.
(1) The expected number of times state $i$ is visited under observation $O$:
$$\sum_{t=1}^{T}\gamma_t(i)$$ ----(10.27)
(2) The expected number of transitions out of state $i$ under observation $O$:
$$\sum_{t=1}^{T-1}\gamma_t(i)$$ ----(10.28)
(3) The expected number of transitions from state $i$ to state $j$ under observation $O$:
$$\sum_{t=1}^{T-1}\xi_t(i,j)$$ ----(10.29)
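The three expected values above are plain sums over $t$ once $\gamma$ and $\xi$ are available. A self-contained sketch on the assumed Example 10.2 parameters (all names my own); since the $\gamma_t(i)$ at each $t$ sum to 1, the expected visits summed over all states must equal $T$:

```python
# Toy HMM: parameters of the book's Example 10.2 (red=0, white=1)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
N, T = len(pi), len(O)

# forward / backward passes
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]] for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N)) for i in range(N)]
p_obs = sum(alpha[T-1])  # P(O | lambda)

# gamma per (10.24) and xi per (10.26)
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_obs
        for j in range(N)] for i in range(N)] for t in range(T - 1)]

# expected values (10.27)-(10.29)
expected_visits      = [sum(gamma[t][i] for t in range(T)) for i in range(N)]      # times in i
expected_departures  = [sum(gamma[t][i] for t in range(T - 1)) for i in range(N)]  # leaves i
expected_transitions = [[sum(xi[t][i][j] for t in range(T - 1)) for j in range(N)]
                        for i in range(N)]                                         # i -> j

print(round(sum(expected_visits), 6))  # 3.0  (gammas sum to 1 at each t, and T = 3)
```

These sums are exactly the sufficient statistics reused later by the Baum-Welch re-estimation formulas.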