The following introduces two algorithms for the HMM prediction (decoding) problem: the approximation algorithm and the Viterbi algorithm.
The idea of the approximation algorithm: at each time $t$, choose the state $s_t^*$ that is individually most likely at that time, obtaining a state sequence $S = (s_1^*, s_2^*, \cdots, s_T^*)$ and taking it as the prediction. The concrete procedure is as follows:
Given the hidden Markov model $\lambda$ and the observation sequence $O$, the probability $\gamma_t(i)$ of being in state $q_i$ at time $t$ is
$$\gamma_{t}(i) = \frac{\alpha_{t}(i)\,\beta_{t}(i)}{P(O \mid \lambda)} = \frac{\alpha_{t}(i)\,\beta_{t}(i)}{\sum_{j=1}^{N}\alpha_{t}(j)\,\beta_{t}(j)} \tag{10.42}$$
The most likely state $s_t^*$ at each time $t$ is
$$s_{t}^* = \arg\max_{1 \leqslant i \leqslant N} \left[ \gamma_{t}(i) \right], \quad t = 1, 2, \cdots, T \tag{10.43}$$
This yields the state sequence $S = (s_1^*, s_2^*, \cdots, s_T^*)$.
The advantage of the approximation algorithm is that it is simple to compute. Its drawback is that there is no guarantee that the predicted state sequence is the most likely sequence as a whole, because the predicted sequence may contain parts that cannot actually occur; that is, adjacent states $i^*$ and $j^*$ with transition probability $a_{i^*j^*} = 0$ may appear. Nevertheless, the approximation algorithm is still useful.
Think about this for a moment: if the transition probability between state 3 and state 4 is 0, but state 3 and state 4 happen to be the most likely states at times $t$ and $t+1$ respectively, then the sequence produced by the approximation algorithm is simply not a valid path.
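To make the procedure concrete, here is a minimal NumPy sketch of the approximation algorithm. It computes the forward and backward variables $\alpha$ and $\beta$ from earlier in this series inline; the function name and array conventions (`A[j, i]` $= a_{ji}$, `B[i, o]` for the emission probability of symbol `o` in state `i`) are my own choices for illustration, not fixed by the text.

```python
import numpy as np

def approximate_decode(A, B, pi, obs):
    """Approximation algorithm: pick the individually most likely state at each time t.

    A:  (N, N) transition matrix, A[j, i] = a_{ji}
    B:  (N, M) emission matrix,  B[i, o] = probability of symbol o in state i
    pi: (N,)   initial state distribution
    obs: length-T sequence of observation symbol indices
    """
    N, T = A.shape[0], len(obs)

    # Forward pass: alpha[t, i] = P(o_1..o_t, s_t = q_i | lambda)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = alpha[t - 1] @ A * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(o_{t+1}..o_T | s_t = q_i, lambda)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # gamma[t, i] = alpha[t, i] * beta[t, i] / P(O | lambda)   -- eq. (10.42)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    return gamma.argmax(axis=1)          # eq. (10.43): most likely state at each t
```

Because each state is chosen independently of its neighbours, the returned sequence can contain transitions with $a_{i^*j^*} = 0$, which is exactly the weakness discussed above.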
The Viterbi algorithm solves the HMM prediction problem with dynamic programming: it uses dynamic programming to find the path of maximum probability, where each path corresponds to a state sequence. The concrete procedure is as follows:
First define a maximum path probability $\delta$: the maximum probability among all single paths that are in state $i$ at time $t$ is
$$\delta_{t}(i) = \max_{s_{1}, s_{2}, \cdots, s_{t-1}} P(s_{t} = q_{i}, s_{t-1}, \cdots, s_{1}, o_{t}, \cdots, o_{1} \mid \lambda), \quad i = 1, 2, \cdots, N \tag{10.44}$$
In plain terms: suppose we are in state $q_i$ at time $t$; then among all paths over times $1$ through $t-1$ that can reach state $q_i$ at time $t$, we want the one with the highest probability.
Having defined $\delta_t(i)$, and since we ultimately need to recover the path itself, the natural next step is a recurrence relating $\delta_{t+1}(i)$ to $\delta_t(j)$. The derivation gives:
$$\delta_{t+1}(i) = \max_{1 \leqslant j \leqslant N} \left[\delta_{t}(j)\, a_{ji}\right] b_{o_{t+1}}(i), \quad i = 1, 2, \cdots, N;\ t = 1, 2, \cdots, T-1 \tag{10.45}$$
The detailed derivation of (10.45) is as follows:
$$
\begin{aligned}
\delta_{t+1}(i) &= \max_{s_{1}, s_{2}, \cdots, s_{t}} P(s_{t+1} = q_{i}, s_{t}, \cdots, s_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\ \max_{s_{1}, \cdots, s_{t-1}} P(s_{t+1} = q_{i}, s_{t} = q_{j}, \cdots, s_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\ \max_{s_{1}, \cdots, s_{t-1}} P(s_{t+1} = q_{i}, s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1} \mid \lambda)\, P(o_{t+1} \mid s_{t+1} = q_{i}, s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1}, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\ \max_{s_{1}, \cdots, s_{t-1}} P(s_{t+1} = q_{i}, s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1} \mid \lambda)\, P(o_{t+1} \mid s_{t+1} = q_{i}, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\ \max_{s_{1}, \cdots, s_{t-1}} P(s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1} \mid \lambda)\, P(s_{t+1} = q_{i} \mid s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1}, \lambda)\, P(o_{t+1} \mid s_{t+1} = q_{i}, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\ \max_{s_{1}, \cdots, s_{t-1}} P(s_{t} = q_{j}, \cdots, s_{1}, o_{t}, \cdots, o_{1} \mid \lambda)\, P(s_{t+1} = q_{i} \mid s_{t} = q_{j}, \lambda)\, P(o_{t+1} \mid s_{t+1} = q_{i}, \lambda) \\
&= \max_{1 \leqslant j \leqslant N} \left[\delta_{t}(j)\, a_{ji}\right] b_{o_{t+1}}(i)
\end{aligned}
$$

The second line makes the maximization over $s_t$ explicit as a maximization over its value $q_j$; the fourth and sixth lines use the HMM's observation-independence and Markov assumptions to drop the extra conditioning, leaving exactly $\delta_t(j)$, $a_{ji}$, and $b_{o_{t+1}}(i)$.
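As a sanity check on recursion (10.45), the sketch below compares it against a brute-force maximization over all state prefixes on a tiny random model. All names and parameter values here are made up for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 3, 2, 4                                            # states, symbols, sequence length

A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)    # A[j, i] = a_{ji}
B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)    # B[i, o] = emission probability
pi = rng.random(N); pi /= pi.sum()
obs = rng.integers(0, M, size=T)

def delta_brute(t, i):
    """Maximum joint probability over all state prefixes ending in state i at (0-based) time t."""
    best = 0.0
    for prefix in itertools.product(range(N), repeat=t):
        states = list(prefix) + [i]
        p = pi[states[0]] * B[states[0], obs[0]]
        for k in range(1, t + 1):
            p *= A[states[k - 1], states[k]] * B[states[k], obs[k]]
        best = max(best, p)
    return best

# Recursion (10.45), written with 0-based time indices
delta = np.zeros((T, N))
delta[0] = pi * B[:, obs[0]]
for t in range(1, T):
    delta[t] = (delta[t - 1][:, None] * A).max(axis=0) * B[:, obs[t]]

brute = np.array([[delta_brute(t, i) for i in range(N)] for t in range(T)])
assert np.allclose(delta, brute)
```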
But you will notice a problem: $\delta_{t+1}(i)$ is only a probability value; it does not record which path achieves that maximum. So we define another variable $\psi$ to record, at each step, the predecessor state on the maximum-probability path:
$$\psi_{t}(i) = \arg\max_{1 \leqslant j \leqslant N} \left[\delta_{t-1}(j)\, a_{ji}\right]$$
Here the emission term does not depend on $j$ for a fixed state $i$ at a given time, so it has no effect on the argmax and is left out; $\psi_t(i)$ simply records the previous state $j$ on the maximum-probability path into state $i$ at time $t$.
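A single recursion step therefore updates $\delta$ and $\psi$ together. The short sketch below (made-up numbers, same array conventions as above) shows that the emission term enters $\delta$ but not the argmax that defines $\psi$.

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])       # A[j, i] = a_{ji} (illustrative values)
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # B[i, o] = emission probability of symbol o in state i
delta_prev = np.array([0.45, 0.10])          # delta_{t-1}(j) from the previous step
obs_t = 1                                    # index of the current observation

trans = delta_prev[:, None] * A              # trans[j, i] = delta_{t-1}(j) * a_{ji}
psi_t = trans.argmax(axis=0)                 # backpointer: best previous state j for each state i
delta_t = trans.max(axis=0) * B[:, obs_t]    # eq. (10.45): emission term applied after the max
print(psi_t, delta_t)
```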
Algorithm 10.5 (Viterbi algorithm)
Input: model $\lambda = (A, B, \pi)$ and observation sequence $O = (o_1, o_2, \cdots, o_T)$;
Output: the optimal path $I^* = (s_1^*, s_2^*, \cdots, s_T^*)$.
(1) Initialization: $\delta_1(i) = \pi_i\, b_{o_1}(i)$, $\psi_1(i) = 0$, $i = 1, 2, \cdots, N$.
(2) Recursion, for $t = 2, 3, \cdots, T$: $\delta_t(i) = \max_{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j)\, a_{ji}\right] b_{o_t}(i)$ and $\psi_t(i) = \arg\max_{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j)\, a_{ji}\right]$, $i = 1, 2, \cdots, N$.
(3) Termination: $P^* = \max_{1 \leqslant i \leqslant N}\delta_T(i)$, $s_T^* = \arg\max_{1 \leqslant i \leqslant N}\delta_T(i)$.
(4) Backtracking, for $t = T-1, T-2, \cdots, 1$: $s_t^* = \psi_{t+1}(s_{t+1}^*)$, which yields $I^* = (s_1^*, s_2^*, \cdots, s_T^*)$.
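Putting the pieces together, here is a sketch of Algorithm 10.5 in NumPy, followed by a check on the box-and-ball model (Example 10.3 in the book: three boxes, observations red, white, red). The function name and array conventions are my own; the expected result is the path $(3, 3, 3)$ with $P^* = 0.0147$.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi decoding: return the most probable state path (0-based) and its probability.

    A[j, i] = a_{ji}; B[i, o] = emission probability of symbol o in state i;
    pi = initial distribution; obs = sequence of observation symbol indices.
    """
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))                 # delta[t, i]: eq. (10.44)
    psi = np.zeros((T, N), dtype=int)        # psi[t, i]: best previous state (backpointer)

    delta[0] = pi * B[:, obs[0]]             # (1) initialization
    for t in range(1, T):                    # (2) recursion: eq. (10.45) plus backpointers
        trans = delta[t - 1][:, None] * A    # trans[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]

    path = np.zeros(T, dtype=int)            # (3) termination and (4) backtracking
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()


# Box-and-ball model (Example 10.3): observations red, white, red encoded as 0, 1, 0
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
path, prob = viterbi(A, B, pi, [0, 1, 0])
print(path + 1, prob)                        # expected: [3 3 3] and P* = 0.0147
```

The backtracking loop reads the stored $\psi$ values from time $T$ back to time 1, which is exactly step (4) above.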