[10.1 Algorithm Theory, Part 4: The Prediction Problem (Approximation Algorithm and Viterbi Algorithm)] Hidden Markov Algorithm: Formula Derivations from Li Hang's *Statistical Learning Methods* (《统计学习方法》)

10.4 Prediction Algorithms (solving the decoding problem: $\hat{S} = \arg\max_{S} P(S \mid O, \lambda)$)

Below we introduce the two prediction algorithms for HMMs: the approximation algorithm and the Viterbi algorithm.

10.4.1 Approximation Algorithm

The idea of the approximation algorithm: at each time $t$, select the state $s_t^*$ that is individually most likely at that time, and take the resulting state sequence $S = (s_1^*, s_2^*, \cdots, s_T^*)$ as the prediction. Concretely:
Given the hidden Markov model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_t(i)$:
$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} \tag{10.42}$$
The most likely state $s_t^*$ at each time $t$ is
$$s_t^* = \arg\max_{1 \leqslant i \leqslant N} \left[ \gamma_t(i) \right], \quad t = 1, 2, \cdots, T \tag{10.43}$$
which yields the state sequence $S = (s_1^*, s_2^*, \cdots, s_T^*)$.

The advantage of the approximation algorithm is that it is simple to compute. Its drawback is that it cannot guarantee the predicted state sequence is the most likely sequence *as a whole*, because the predicted sequence may contain parts that cannot actually occur: there may be adjacent predicted states $i^*$ and $j^*$ whose transition probability is $a_{i^*j^*} = 0$. Even so, the approximation algorithm is still useful.
Think this through carefully: if the transition probability from state 3 to state 4 is 0, but state 3 and state 4 are the individually most likely states at times $t$ and $t+1$ respectively, then the sequence produced by the approximation algorithm is simply invalid. The sketch below includes a check for exactly this failure mode.
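To make the procedure concrete, here is a minimal NumPy sketch (the function name and the zero-transition check are my own illustration, not from the book). It runs the standard forward and backward passes, forms $\gamma_t(i)$ as in (10.42), takes the per-step argmax as in (10.43), and flags any adjacent pair of chosen states joined by a zero-probability transition:

```python
import numpy as np

def approximate_decode(A, B, pi, obs):
    """Approximation algorithm (Sec. 10.4.1): pick the individually
    most likely state at each time step via the posteriors gamma_t(i)."""
    N, T = A.shape[0], len(obs)

    # Forward pass: alpha[t, i] = P(o_1..o_t, s_t = q_i | lambda)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(o_{t+1}..o_T | s_t = q_i, lambda)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Eq. (10.42): gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Eq. (10.43): independent argmax at every time step
    path = gamma.argmax(axis=1)

    # The failure mode discussed above: consecutive picks may be joined
    # by a zero-probability transition, making the whole path impossible.
    impossible = [t for t in range(T - 1) if A[path[t], path[t + 1]] == 0]
    return path, impossible
```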

10.4.2 Viterbi Algorithm

The Viterbi algorithm solves the HMM prediction problem with dynamic programming: it uses dynamic programming to find the path of maximum probability, where each path corresponds to a state sequence. Concretely:
First define the maximum path probability $\delta$: let $\delta_t(i)$ be the maximum probability over all single paths that are in state $q_i$ at time $t$:
$$\delta_t(i) = \max_{s_1, s_2, \cdots, s_{t-1}} P(s_t = q_i, s_{t-1}, \cdots, s_1, o_t, \cdots, o_1 \mid \lambda), \quad i = 1, 2, \cdots, N \tag{10.44}$$
The statement and formula above are easy to grasp with the figure below: if we are in state $q_i$ at time $t$, then among all paths over times $1$ to $t-1$ that can reach state $q_i$ at time $t$, we want the one with the largest probability.

[Figure 1: the paths over times $1$ to $t-1$ that reach state $q_i$ at time $t$]

Having obtained $\delta_t(i)$, and since we ultimately want to recover the path itself, the natural move is a recurrence relating $\delta_{t+1}(i)$ to $\delta_t(j)$. The derived relation is:
$$\delta_{t+1}(i) = \max_{1 \leqslant j \leqslant N} \left[ \delta_t(j)\, a_{ji} \right] b_{o_{t+1}}(i), \quad i = 1, 2, \cdots, N; \; t = 1, 2, \cdots, T-1 \tag{10.45}$$
The detailed derivation of (10.45) is as follows (split the maximization over $s_t$ into a maximization over the index $j$ with $s_t = q_j$, then apply the chain rule, the observation-independence assumption, and the Markov property):

$$
\begin{aligned}
\delta_{t+1}(i) &= \max_{s_1, s_2, \cdots, s_t} P(s_{t+1} = q_i, s_t, \cdots, s_1, o_{t+1}, \cdots, o_1 \mid \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\, \max_{s_1, \cdots, s_{t-1}} P(s_{t+1} = q_i, s_t = q_j, \cdots, s_1, o_{t+1}, \cdots, o_1 \mid \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\, \max_{s_1, \cdots, s_{t-1}} P(s_{t+1} = q_i, s_t = q_j, \cdots, s_1, o_t, \cdots, o_1 \mid \lambda)\, P(o_{t+1} \mid s_{t+1} = q_i, s_t = q_j, \cdots, s_1, o_t, \cdots, o_1, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\, \max_{s_1, \cdots, s_{t-1}} P(s_{t+1} = q_i, s_t = q_j, \cdots, s_1, o_t, \cdots, o_1 \mid \lambda)\, P(o_{t+1} \mid s_{t+1} = q_i, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\, \max_{s_1, \cdots, s_{t-1}} P(s_t = q_j, \cdots, s_1, o_t, \cdots, o_1 \mid \lambda)\, P(s_{t+1} = q_i \mid s_t = q_j, \cdots, s_1, o_t, \cdots, o_1, \lambda)\, P(o_{t+1} \mid s_{t+1} = q_i, \lambda) \\
&= \max_{1 \leqslant j \leqslant N}\, \max_{s_1, \cdots, s_{t-1}} P(s_t = q_j, \cdots, s_1, o_t, \cdots, o_1 \mid \lambda)\, P(s_{t+1} = q_i \mid s_t = q_j, \lambda)\, P(o_{t+1} \mid s_{t+1} = q_i, \lambda) \\
&= \max_{1 \leqslant j \leqslant N} \left[ \delta_t(j)\, a_{ji} \right] b_{o_{t+1}}(i)
\end{aligned}
$$

But notice a problem: $\delta_{t+1}(i)$ is only a probability value; it does not record the path that achieves the maximum. So we define another variable $\psi$ to store that maximum-probability path:
$$\psi_t(i) = \arg\max_{1 \leqslant j \leqslant N} \left[ \delta_{t-1}(j)\, a_{ji} \right]$$
Since $b_{o_t}(i)$ is a fixed value once $i$ and $t$ are fixed, it does not affect the argmax over $j$; this expression records the previous state $j$ on the maximum-probability path into state $q_i$ and stores it in $\psi_t(i)$.

Algorithm 10.5 (Viterbi algorithm)
Input: model $\lambda = (A, B, \pi)$ and observations $O = (o_1, o_2, \cdots, o_T)$;
Output: the optimal path $I^* = (s_1^*, s_2^*, \cdots, s_T^*)$.

  1. Initialization:
    $\delta_1(i) = \pi_i\, b_{o_1}(i), \quad i = 1, 2, \cdots, N$
    $\psi_1(i) = 0, \quad i = 1, 2, \cdots, N$
  2. Recursion. For $t = 2, 3, \cdots, T$:
    $\delta_t(i) = \max_{1 \leqslant j \leqslant N} \left[ \delta_{t-1}(j)\, a_{ji} \right] b_{o_t}(i), \quad i = 1, 2, \cdots, N$
    $\psi_t(i) = \arg\max_{1 \leqslant j \leqslant N} \left[ \delta_{t-1}(j)\, a_{ji} \right], \quad i = 1, 2, \cdots, N$
  3. Termination:
    $P^* = \max_{1 \leqslant i \leqslant N} \delta_T(i)$
    $s_T^* = \arg\max_{1 \leqslant i \leqslant N} \left[ \delta_T(i) \right]$
  4. Optimal-path backtracking. For $t = T-1, T-2, \cdots, 1$:
    $s_t^* = \psi_{t+1}(s_{t+1}^*)$
    This finally yields the optimal path $I^* = (s_1^*, s_2^*, \cdots, s_T^*)$.
    That was another big pile of theory; work through Example 10.3 in the book, it will help you digest all of this quickly. A runnable sketch checked against that example follows.
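For readers who prefer code, here is a minimal NumPy sketch of Algorithm 10.5 (the function name and the vectorized layout are my own; the book gives no code). It is run against the model and observation sequence of the book's Example 10.3, whose known answer is $P^* = 0.0147$ with optimal path $(3, 3, 3)$:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Algorithm 10.5: dynamic programming over delta/psi, then backtracking."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    # Step 1: initialization, delta_1(i) = pi_i * b_{o_1}(i)
    delta[0] = pi * B[:, obs[0]]

    # Step 2: recursion, delta_t(i) = max_j [delta_{t-1}(j) a_{ji}] * b_{o_t}(i)
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A    # trans[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)        # psi_t(i): best previous state j
        delta[t] = trans.max(axis=0) * B[:, obs[t]]

    # Step 3: termination
    best_prob = delta[-1].max()
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()

    # Step 4: backtracking, s_t* = psi_{t+1}(s_{t+1}*)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return best_prob, path

# Model from the book's Example 10.3 (ball-drawing model; observations: red=0, white=1)
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
p_star, path = viterbi(A, B, pi, [0, 1, 0])
print(p_star, path + 1)   # expected: 0.0147 [3 3 3], matching the book
```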

References

References for this HMM series of posts:

  1. Li Hang, *Statistical Learning Methods* (《统计学习方法》)
  2. YouTube: shuhuai008's HMM video course
  3. YouTube: Xu Yida (徐亦达), machine learning lectures on HMM and EM
  4. https://www.huaxiaozhuan.com/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0/chapters/15_HMM.html : Hidden Markov Model
  5. https://sm1les.com/2019/04/10/hidden-markov-model/ : The Hidden Markov Model (HMM) and its three basic problems
  6. https://www.cnblogs.com/skyme/p/4651331.html : Understanding HMM (Hidden Markov Model) in one article
  7. https://www.zhihu.com/question/55974064 : answer by 南屏晚钟

Thanks to the above authors for their contributions to this article; if anything infringes, contact me and the corresponding content will be removed.
