Hidden Markov Models (HMM) Explained (2): Python Implementation

Contents

      • Learning algorithms
        • Viterbi training
        • Baum-Welch algorithm
      • Python implementation

Code: released 6.1
For HMM basics, probability evaluation, and the prediction (decoding) algorithm, please see the previous article. Thanks for reading!

Learning Algorithms

Given an observation sequence $O=(o_1, o_2, \ldots, o_T)$ and emission densities $b_j(o_t)=\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})$, estimate the GMM-HMM parameters $\lambda$ so that $P(O \mid \lambda)$ is maximized. The parameters $\lambda$ include:

  • the initial state probability vector $\pi = (\pi_i)$

  • the transition probability matrix $A=[a_{ij}]_{N\times N}$

  • the GMM parameters $(c_{jm},\mu_{jm},\Sigma_{jm})$ of each state $j$, where $j=1,2,\ldots,N$ and $m=1,\ldots,M$ indexes the GMM components.

Viterbi Training

Suppose the state-observation alignment is known, i.e. each observation $o_t$ is assigned to one specific state. Such an alignment can be obtained with the Viterbi decoding algorithm or by manual annotation. Once the state of every observation is known:

$\pi_i$ can be obtained by maximum likelihood estimation:

  • $C(i)$ denotes the number of times the initial state is $i$
  • $\widehat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$

$a_{ij}$ can likewise be obtained by maximum likelihood estimation:

  • $C(i \rightarrow j)$ denotes the number of transitions from state $i$ to state $j$
  • $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$

The alignment also gives, for each state $j$, the set of observations $Z_{j}=(y_{1}, y_{2}, \ldots, y_{N})$ assigned to it; since every state owns one GMM, $Z_j$ is exactly the data used to train that state's GMM.
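As a concrete illustration of these count-based estimates, here is a minimal sketch (added here, not part of the original post) that assumes the aligned state sequences are already available as lists of 0-based state indices; the names `viterbi_counts` and `collect_observations` are made up for this example:

```python
import numpy as np

def viterbi_counts(state_seqs, N):
    """Hard-count ML estimates of pi and A from aligned state sequences."""
    init_counts = np.zeros(N)          # C(i): how often a sequence starts in state i
    trans_counts = np.zeros((N, N))    # C(i -> j): transition counts
    for seq in state_seqs:
        init_counts[seq[0]] += 1
        for s, s_next in zip(seq[:-1], seq[1:]):
            trans_counts[s, s_next] += 1
    pi_hat = init_counts / init_counts.sum()
    # assumes every state appears as the source of at least one transition
    A_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    return pi_hat, A_hat

def collect_observations(obs_seqs, state_seqs, N):
    """Group observations by their aligned state: Z_j for j = 0..N-1."""
    Z = [[] for _ in range(N)]
    for obs, states in zip(obs_seqs, state_seqs):
        for o, s in zip(obs, states):
            Z[s].append(o)
    return Z
```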

Question: producing the alignment with the Viterbi algorithm already requires the model parameters $\lambda$, so where does the initial $\lambda$ come from? (A common answer is to start from a flat/uniform or otherwise rough initialization.)

When the GMM has a single component, $b_{j}(o_{t})=\mathcal{N}(o_{t}; \mu_{j}, \Sigma_{j})$, and $|Z_j|$ denotes the number of elements in $Z_j$, then:

$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|}, \qquad \hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
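In numpy these two estimates could be written as follows; this is an added sketch that assumes `Z_j` holds the observations aligned to state $j$, one $D$-dimensional observation per row:

```python
import numpy as np

def fit_single_gaussian(Z_j):
    """ML estimates (mu_j, Sigma_j) from the observations aligned to state j."""
    Z_j = np.asarray(Z_j, dtype=float)      # shape (|Z_j|, D)
    mu_hat = Z_j.mean(axis=0)               # sum of o_t divided by |Z_j|
    diff = Z_j - mu_hat
    sigma_hat = diff.T @ diff / len(Z_j)    # sum of (o_t - mu)(o_t - mu)^T divided by |Z_j|
    return mu_hat, sigma_hat
```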

When the GMM has multiple components, $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, the GMM parameters can be updated with the EM algorithm (see the previous article).
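The previous article covers that EM update in detail; as a reminder, a compact (added) sketch using scipy might look like the following, where `em_fit_gmm` and its argument layout are illustrative rather than taken from the original code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_fit_gmm(Z_j, c, mu, sigma, n_iter=10):
    """EM re-estimation of one state's GMM (c_jm, mu_jm, Sigma_jm) from its aligned data Z_j.
    Z_j: (n, D) observations; c: (M,) weights; mu: (M, D) means; sigma: (M, D, D) covariances."""
    Z_j = np.asarray(Z_j, dtype=float)
    n, M = len(Z_j), len(c)
    for _ in range(n_iter):
        # E-step: responsibility of component m for each observation
        resp = np.stack([c[m] * multivariate_normal.pdf(Z_j, mu[m], sigma[m])
                         for m in range(M)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)        # shape (n, M)
        # M-step: update weights, means and covariances with soft counts
        Nm = resp.sum(axis=0)
        c = Nm / n
        mu = resp.T @ Z_j / Nm[:, None]
        for m in range(M):
            diff = Z_j - mu[m]
            sigma[m] = (resp[:, m, None] * diff).T @ diff / Nm[m]
    return c, mu, sigma
```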

Viterbi training procedure

  1. Initialize the GMM-HMM parameters $\lambda = (\pi_i, a_{ij}, \text{GMM parameters})$, where the GMM parameters of state $j$ are $(c_{jm},\mu_{jm},\Sigma_{jm})$
  2. Using the current $\lambda$ and the Viterbi algorithm, compute the state-observation alignment, i.e. the hidden state of every observation
  3. Update the parameters $\lambda$:
    • $\hat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$, where $C(i)$ is the number of times the initial state is $i$
    • $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$, where $C(i \rightarrow j)$ is the number of transitions from state $i$ to state $j$
    • update the GMM parameters with the EM algorithm described in the previous article
  4. Repeat steps 2 and 3 until convergence

Baum-Welch Algorithm

First consider the single-component (single Gaussian) case.

Viterbi training is an approximation: it only considers the single best alignment path. In reality, at every time $t$ each state $j$ occurs with some probability, rather than a hard alignment in which each time step belongs to exactly one state.

State occupation probability: given the model $\lambda$ and observations $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_{t}(i)$:
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$$
This probability can be plugged into the EM algorithm to learn the parameters $\lambda$:

  • E-step: estimate the state occupation probabilities
  • M-step: re-estimate (maximize) the parameters $\lambda$ based on the estimated state occupation probabilities

Question: prove that $\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$ holds.

Proof: omitting $\lambda$, and writing $o_1^t = o_1, o_2, \ldots, o_t$ and $O=o_1^T$:
$$\begin{aligned} P\left(i_{t}=q_{i}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, o_{1}^{t}, o_{t+1}^{T}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}, o_{1}^{t}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}\right) \\ &=\alpha_{t}(i)\, \beta_{t}(i) \\ P(O \mid \lambda) &=\sum_{j=1}^{N} \alpha_{T}(j) \\ P\left(i_{t}=q_{i} \mid O, \lambda\right) &=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{P(O \mid \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)} \end{aligned}$$


For a given state, summing its occupation probabilities over all time steps gives a "soft count". These soft counts are used to re-estimate the HMM parameters:
$$\hat{\mu}_{j}=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\, o_{t}}{\sum_{t=1}^{T} \gamma_{t}(j)}, \qquad \hat{\Sigma}_{j}=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\sum_{t=1}^{T} \gamma_{t}(j)}$$
Compare with the Viterbi training estimates (hard counts):
$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|}, \qquad \hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
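As an added sketch (not part of the original post), the soft-count updates above might look like this in numpy, assuming the forward/backward matrices `alphas` and `betas` both have shape `(N, T)` with time in natural order, and `obs` holds the observations as rows of shape `(T, D)`; the function names are illustrative:

```python
import numpy as np

def state_occupation(alphas, betas):
    """gamma[i, t] = P(i_t = q_i | O, lambda) = alpha_t(i) beta_t(i) / P(O | lambda)."""
    prob_O = alphas[:, -1].sum()        # P(O | lambda) = sum_i alpha_T(i)
    return alphas * betas / prob_O

def reestimate_gaussians(gamma, obs):
    """Soft-count re-estimation of (mu_j, Sigma_j) for every state j."""
    N, T = gamma.shape
    D = obs.shape[1]
    mu_hat = np.zeros((N, D))
    sigma_hat = np.zeros((N, D, D))
    for j in range(N):
        w = gamma[j]                    # soft counts gamma_t(j), t = 1..T
        mu_hat[j] = (w[:, None] * obs).sum(axis=0) / w.sum()
        diff = obs - mu_hat[j]
        sigma_hat[j] = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / w.sum()
    return mu_hat, sigma_hat
```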
The multi-component GMM case

When $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}\left(o_{t} ; \mu_{jm}, \Sigma_{jm}\right)$, define analogously $\zeta_{t}(j, k)$, the probability of being in state $q_j$ at time $t$ with the observation generated by the $k$-th GMM component, given the model $\lambda$ and observations $O$:
$$\begin{aligned} \zeta_{t}(j, k) &=P\left(i_{t}=q_{j}, m_{t}=k \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{j}, m_{t}=k, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\sum_{i} \alpha_{t-1}(i)\, a_{ij}\, c_{jk}\, b_{jk}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
Then
$$\begin{aligned} \hat{\mu}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{jk}\right)\left(o_{t}-\hat{\mu}_{jk}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \end{aligned}$$
Analogously to the state occupation probability, define $\xi_{t}(i, j)$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, given the model $\lambda$ and observations $O$:
$$\begin{aligned} \xi_{t}(i, j) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\alpha_{t}(i)\, a_{ij}\, b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
and note that $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$.
From these probabilities, the transition and initial probabilities follow:
$$\begin{aligned} \hat{a}_{ij} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
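A corresponding sketch (again added, with illustrative names) of $\xi_t(i,j)$ and the resulting $\hat{a}_{ij}$ and $\hat{\pi}_i$, written for discrete emissions so that $b_j(o_{t+1})$ is a table lookup `B[j][O[t+1]]`, as in the box-and-ball example later on; `alphas` and `betas` are assumed to have shape `(N, T)` with time in natural order:

```python
import numpy as np

def transition_occupation(alphas, betas, A, B, O):
    """xi[t, i, j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda), for t = 1..T-1."""
    N, T = alphas.shape
    prob_O = alphas[:, -1].sum()
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                xi[t, i, j] = alphas[i, t] * A[i][j] * B[j][O[t + 1]] * betas[j, t + 1] / prob_O
    return xi

def reestimate_transitions(xi):
    """a_hat[i][j] = sum_t xi_t(i, j) / sum_t gamma_t(i); pi_hat[i] = gamma_1(i)."""
    gamma = xi.sum(axis=2)                          # gamma_t(i) = sum_j xi_t(i, j)
    a_hat = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
    pi_hat = gamma[0]
    return a_hat, pi_hat
```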
Question: prove that $\xi_{t}(i, j)=\frac{\alpha_{t}(i)\, a_{ij}\, b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$.


Proof: from the recursion step of the forward algorithm, $\alpha_{t}(i)\, a_{ij}\, b_{j}\left(o_{t+1}\right)=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right)$,
and from the definition of the backward variable, $\beta_{t+1}(j)=P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right)$. Hence
$$\begin{aligned} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, o_{t+2}^{T} \mid \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right) \\ &=\alpha_{t}(i)\, a_{ij}\, b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \end{aligned}$$
Dividing by $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$ gives the claimed expression for $\xi_t(i,j)$.


Summary of the Baum-Welch Algorithm

  1. Initialize the GMM-HMM parameters $\lambda=\left(\pi_{i}, a_{ij},\left(c_{jm}, \mu_{jm}, \Sigma_{jm}\right)\right)$
  2. E-step: for all times $t$ and states $i$,
    • recursively compute the forward probabilities $\alpha_{t}(i)$ and backward probabilities $\beta_{t}(i)$
    • compute $\zeta_{t}(j, k)=\frac{\sum_{i} \alpha_{t-1}(i) a_{ij} c_{jk} b_{jk}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, $\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{ij} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$
  3. M-step: update the parameters

$$\begin{aligned} \hat{\mu}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{jk}\right)\left(o_{t}-\hat{\mu}_{jk}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \\ \hat{a}_{ij} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
  4. Repeat steps 2 and 3 until convergence
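The summary above is written for GMM emissions. For a discrete-emission HMM (the setting of the Python implementation below), the update of $(c_{jk}, \mu_{jk}, \Sigma_{jk})$ is replaced by a re-estimation of the emission matrix $B$. Below is an added, self-contained sketch of one Baum-Welch iteration for that discrete case and a single training sequence; the function name and layout are illustrative:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM.
    O: observation index sequence; pi: (N,); A: (N, N); B: (N, M).
    Returns updated (pi, A, B)."""
    pi, A, B = np.asarray(pi, float), np.asarray(A, float), np.asarray(B, float)
    T, N = len(O), len(pi)

    # E-step: forward and backward probabilities (time in natural order)
    alphas = np.zeros((N, T))
    betas = np.zeros((N, T))
    alphas[:, 0] = pi * B[:, O[0]]
    for t in range(1, T):
        alphas[:, t] = (alphas[:, t - 1] @ A) * B[:, O[t]]
    betas[:, T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        betas[:, t] = A @ (B[:, O[t + 1]] * betas[:, t + 1])
    prob_O = alphas[:, T - 1].sum()

    # gamma_t(i) and xi_t(i, j)
    gamma = alphas * betas / prob_O                   # shape (N, T)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alphas[:, t, None] * A * (B[:, O[t + 1]] * betas[:, t + 1])[None, :] / prob_O

    # M-step: re-estimate pi, A, B
    new_pi = gamma[:, 0]
    new_A = xi.sum(axis=0) / gamma[:, :-1].sum(axis=1)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[:, np.asarray(O) == k].sum(axis=1) / gamma.sum(axis=1)
    return new_pi, new_A, new_B
```

For multiple training sequences, the numerators and denominators of each update would be accumulated over all sequences before dividing.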

Python Implementation

Consider the box-and-ball model $\lambda=(A, B, \pi)$, with state set $Q=\{1,2,3\}$ and observation set $V=\{\text{red}, \text{white}\}$:
$$A=\begin{bmatrix}0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5\end{bmatrix}, \quad B=\begin{bmatrix}0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3\end{bmatrix}, \quad \pi=(0.2, 0.4, 0.4)^{T}$$
$T=3$, $O=(\text{red}, \text{white}, \text{red})$

  • Implement the forward and backward algorithms and use each to compute $P(O \mid \lambda)$
  • Implement the Viterbi algorithm to find the optimal state sequence (best path)

Forward Algorithm

Formulas:

  1. Initialization: $\alpha_1(i) = \pi_i b_i(o_1),\ i=1,2,\ldots,N$

  2. Recursion: for $t=1,2,\ldots,T-1$,
    $\alpha_{t+1}(i)=\left[\sum_{j=1}^{N} \alpha_{t}(j) a_{ji}\right] b_{i}\left(o_{t+1}\right)$

  3. Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$

import numpy as np


def forward_algorithm(O, HMM_model):
    """HMM Forward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)

    # alphas[i][t] = alpha_t(i), the forward probability of state i at time t
    alphas = np.zeros((N, T))
    for t in range(T):
        for i in range(N):
            if t == 0:
                # Initialization: alpha_1(i) = pi_i * b_i(o_1)
                alphas[i][t] = pi[i] * B[i][O[t]]
            else:
                # Recursion: alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] * b_i(o_{t+1})
                alphas[i][t] = np.dot(alphas[:, t-1], [a[i] for a in A]) * B[i][O[t]]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    prob = np.sum(alphas[:, T-1])

    return prob

Backward Algorithm

  1. Initialization: $\beta_{T}(i) = 1,\ i=1,2,\ldots,N$

  2. Recursion: for $t=T-1,T-2,\ldots,1$,
    $\beta_{t}(i)=\sum_{j=1}^{N} a_{ij} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j), \quad i=1,2, \cdots, N$

  3. Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \pi_{i} b_{i}\left(o_{1}\right) \beta_{1}(i)$

def backward_algorithm(O, HMM_model):
    """HMM Backward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    prob = 0.0
    # Note: time runs backwards along the second axis here,
    # i.e. betas[i][k] stores beta_{T-k}(i).
    betas = np.zeros((N, T))
    # Initialization: beta_T(i) = 1
    for i in range(N):
        betas[i][0] = 1
    # Recursion: beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
    for t in range(1, T):
        for i in range(N):
            for j in range(N):
                betas[i][t] += A[i][j] * B[j][O[T-t]] * betas[j][t-1]
    # Termination: P(O | lambda) = sum_i pi_i b_i(o_1) beta_1(i),
    # where beta_1(i) is stored in betas[i][-1]
    for i in range(N):
        prob += pi[i] * B[i][O[0]] * betas[i][-1]

    return prob

Viterbi Algorithm

Formulas:

$$\begin{aligned} \delta_{t+1}(i) &=\max _{i_{1}, i_{2}, \cdots, i_{t}} P\left(i_{t+1}=i, i_{t}, \cdots, i_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda\right) \\ &=\max _{1 \leqslant j \leqslant N}\left[\delta_{t}(j) a_{ji}\right] b_{i}\left(o_{t+1}\right), \quad i=1,2, \cdots, N ;\ t=1,2, \cdots, T-1 \end{aligned}$$

$$\psi_{t}(i)=\arg \max _{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j) a_{ji}\right], \quad i=1,2, \cdots, N$$

def Viterbi_algorithm(O, HMM_model):
    """Viterbi decoding.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Returns:
        best_prob: the probability of the best state sequence
        best_path: the best state sequence
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    best_prob, best_path = 0.0, []
    # deltas[t][i]: max probability over all paths ending in state i at time t
    # nodes[t][i]: the predecessor state that achieves deltas[t][i]
    deltas = np.zeros((T, N), dtype=np.float64)
    nodes = np.zeros((T, N), dtype=int)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    for i in range(N):
        deltas[0][i] = pi[i] * B[i][O[0]]
    # Recursion: delta_{t+1}(i) = max_j [delta_t(j) a_{ji}] * b_i(o_{t+1})
    for t in range(1, T):
        for i in range(N):
            tmp = [deltas[t-1][j] * A[j][i] for j in range(N)]
            nodes[t][i] = int(np.argmax(tmp))
            deltas[t][i] = tmp[nodes[t][i]] * B[i][O[t]]
    # Termination and backtracking
    best_path = np.zeros(T, dtype=int)
    best_path[T-1] = np.argmax(deltas[T-1])
    best_prob = deltas[T-1][best_path[T-1]]
    for t in range(T-2, -1, -1):
        best_path[t] = nodes[t+1][best_path[t+1]]
    # states are numbered from 1, while python indices start from 0
    best_path = [int(val) + 1 for val in best_path]

    return best_prob, best_path
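Putting the three functions together on the box-and-ball example (red encoded as 0, white as 1); this driver is an added sketch, and the expected values in the comments come from working the small example by hand:

```python
import numpy as np

if __name__ == "__main__":
    # Box-and-ball model: observations encoded as red = 0, white = 1
    pi = np.array([0.2, 0.4, 0.4])
    A = np.array([[0.5, 0.2, 0.3],
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5],
                  [0.4, 0.6],
                  [0.7, 0.3]])
    O = [0, 1, 0]                              # (red, white, red)
    HMM_model = (pi, A, B)

    print(forward_algorithm(O, HMM_model))     # expected: about 0.130218
    print(backward_algorithm(O, HMM_model))    # expected: same value as the forward pass
    print(Viterbi_algorithm(O, HMM_model))     # expected: (0.0147, [3, 3, 3])
```

The forward and backward probabilities should agree, which is a useful sanity check on both implementations.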
