The HMM Parameter Learning Problem

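There are two versions of the HMM parameter learning problem:

  1. Supervised learning: given an observation sequence $O=(o_1,\dots,o_T)$ and the corresponding state sequence $I=(i_1,\dots,i_T)$, estimate the parameters $\lambda=(A,B,\pi)$.

  2. Unsupervised learning: given only the observation sequence $O=(o_1,\dots,o_T)$, estimate the parameters $\lambda=(A,B,\pi)$.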


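Supervised learning (direct maximum likelihood estimation)

Supervised learning uses training data in which both the observation sequences and the corresponding hidden states are known. Counting how often each initial state, each state transition, and each state-observation pair occurs, and normalizing these counts to relative frequencies, gives the parameter estimates.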
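The counting procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `supervised_estimate`, the encoding of states and symbols as 0-based integers, the uniform fallback for states that never appear, and the toy data are all assumptions made for this example.

```python
def supervised_estimate(state_seqs, obs_seqs, n_states, n_symbols):
    """Estimate (A, B, pi) by relative frequencies from labeled sequences."""
    pi_count = [0.0] * n_states
    a_count = [[0.0] * n_states for _ in range(n_states)]
    b_count = [[0.0] * n_symbols for _ in range(n_states)]
    for states, obs in zip(state_seqs, obs_seqs):
        pi_count[states[0]] += 1                   # initial state of this sequence
        for t, (s, o) in enumerate(zip(states, obs)):
            b_count[s][o] += 1                     # state s emitted symbol o
            if t + 1 < len(states):
                a_count[s][states[t + 1]] += 1     # transition s -> next state

    def normalize(row):
        total = sum(row)
        # Assumption of this sketch: a state never seen gets a uniform row.
        return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)

    A = [normalize(row) for row in a_count]
    B = [normalize(row) for row in b_count]
    return A, B, normalize(pi_count)


# Hypothetical toy data: two labeled sequences, 2 states, 2 symbols.
A, B, pi = supervised_estimate(
    state_seqs=[[0, 0, 1], [1, 1, 0]],
    obs_seqs=[[0, 1, 1], [1, 0, 0]],
    n_states=2,
    n_symbols=2,
)
```

With real data, some form of smoothing is often added to the counts so that unseen events do not receive probability zero; that is a design choice outside the scope of this sketch.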


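Unsupervised learning (iterative estimation with the Baum-Welch algorithm)

The Baum-Welch algorithm is essentially the EM algorithm, an iterative method for learning the parameters of models with latent variables. Recall that the core of EM is the following relation between $\Theta^{(g+1)}$ and $\Theta^{(g)}$:

$$\Theta^{(g+1)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(g)}) = \arg\max_{\Theta} \int_{z} P(Z \mid X, \Theta^{(g)}) \log P(X, Z \mid \Theta)\, dz$$

The parameters are updated repeatedly, and each update is guaranteed not to decrease the log-likelihood.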

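In the unsupervised setting we only have the observation sequence $O=(o_1,\dots,o_T)$, and the state sequence $I$ is treated as an unobservable latent variable. The HMM is then a probabilistic model with latent variables:

$$P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda)\, P(I \mid \lambda)$$

Its parameters can therefore be estimated with the EM algorithm. Because the state sequence $I$ is discrete, the integral in the general EM formula becomes a sum, and the update rule for $\lambda$ is:

$$\lambda^{(g+1)} = \arg\max_{\lambda} Q(\lambda, \lambda^{(g)}) = \arg\max_{\lambda} \sum_{I} P(I \mid O, \lambda^{(g)}) \log P(O, I \mid \lambda)$$

Here $\lambda^{(g)}$ denotes the parameters obtained in the previous iteration, and $\lambda^{(g+1)}$ denotes the updated parameters of the next iteration.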

E-step

As stated above, the expectation to compute for the HMM is:

$$Q(\lambda, \lambda^{(g)}) = \sum_{I} P(I \mid O, \lambda^{(g)}) \log P(O, I \mid \lambda)$$

We have $P(I \mid O, \lambda^{(g)}) = \dfrac{P(O, I \mid \lambda^{(g)})}{P(O \mid \lambda^{(g)})}$. Since $\lambda^{(g)}$ is fixed, the factor $\dfrac{1}{P(O \mid \lambda^{(g)})}$ is a positive constant with respect to $\lambda$ and has no effect on the result of the $\arg\max$. Dropping this constant factor, the $Q$ function can be written as:

$$Q(\lambda, \lambda^{(g)}) = \sum_{I} P(O, I \mid \lambda^{(g)}) \log P(O, I \mid \lambda)$$

In the chapter on direct computation for the HMM probability computation problem, we obtained:

$$P(O, I \mid \lambda) = \pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}$$
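As a rough illustration of this joint probability, the sketch below evaluates $P(O, I \mid \lambda)$ for one state sequence and then recovers $P(O \mid \lambda)$ by summing over all $N^T$ state sequences. The function names `joint_prob` and `likelihood`, the 0-based integer encoding of states and symbols, and the toy parameter values are assumptions made for this example. The enumeration is exponential in $T$ and is only meant for very small cases.

```python
from itertools import product


def joint_prob(obs, states, A, B, pi):
    """P(O, I | lambda) = pi_{i_1} * prod_t b_{i_t}(o_t) * prod_t a_{i_t i_{t+1}}."""
    p = pi[states[0]]
    for t, (s, o) in enumerate(zip(states, obs)):
        p *= B[s][o]                       # emission factor b_{i_t}(o_t)
        if t + 1 < len(states):
            p *= A[s][states[t + 1]]       # transition factor a_{i_t i_{t+1}}
    return p


def likelihood(obs, A, B, pi):
    """P(O | lambda) as a brute-force sum over every state sequence I."""
    n_states = len(pi)
    return sum(
        joint_prob(obs, I, A, B, pi)
        for I in product(range(n_states), repeat=len(obs))
    )


# Hypothetical toy model with N = 2 states and M = 2 symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

p_joint = joint_prob([0, 1], [0, 1], A, B, pi)   # pi_0 * b_0(0) * a_01 * b_1(1)
p_obs = likelihood([0, 1, 0], A, B, pi)
```

A quick sanity check for such code is that the likelihoods of all possible observation sequences of a fixed length add up to 1.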

Substituting this into the $Q$ function and expanding gives Equation 1:

$$\begin{aligned}
Q(\lambda, \lambda^{(g)}) &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log\Big[\pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}\Big] \\
&= \sum_{I} P(O, I \mid \lambda^{(g)}) \log \pi_{i_1} + \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T} \log b_{i_t}(o_t) + \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
\end{aligned}$$
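The three-term split can be checked numerically on a tiny model. The sketch below computes the three summands separately by enumerating all state sequences. The names `joint_terms` and `q_terms`, the `(A, B, pi)` tuple layout, and the toy numbers are assumptions of this example, and all probabilities are assumed strictly positive so that the logarithms are defined.

```python
import math
from itertools import product


def joint_terms(obs, states, A, B, pi):
    """The three factors of P(O, I | lambda): initial, emission, transition."""
    init = pi[states[0]]
    emit = math.prod(B[s][o] for s, o in zip(states, obs))
    trans = math.prod(A[s][s_next] for s, s_next in zip(states, states[1:]))
    return init, emit, trans


def q_terms(obs, new, old):
    """The three summands of Q(lambda, lambda_g); new and old are (A, B, pi)."""
    n_states = len(old[2])
    q_pi = q_b = q_a = 0.0
    for I in product(range(n_states), repeat=len(obs)):
        weight = math.prod(joint_terms(obs, I, *old))    # P(O, I | lambda_g)
        init, emit, trans = joint_terms(obs, I, *new)    # factors under lambda
        q_pi += weight * math.log(init)
        q_b += weight * math.log(emit)     # log of a product = sum of the logs
        q_a += weight * math.log(trans)
    return q_pi, q_b, q_a


# Hypothetical toy parameters: old is lambda_g, new is a candidate lambda.
old = ([[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0.6, 0.4])
new = ([[0.6, 0.4], [0.5, 0.5]], [[0.8, 0.2], [0.3, 0.7]], [0.5, 0.5])
obs = [0, 1, 0]
q_pi, q_b, q_a = q_terms(obs, new, old)
```

Each summand depends on only one of $\pi$, $B$, $A$, which is why the maximization can be carried out term by term.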

M-step

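Equation 1 above expands into three terms. They contain, respectively, the elements $\pi_{i_1}$ of the initial state probability vector, the elements $b_{i_t}(o_t)$ of the observation probability matrix, and the elements $a_{i_t i_{t+1}}$ of the state transition probability matrix, so they can be used separately to estimate $\pi$, $B_{N \times M}$, and $A_{N \times N}$. We now maximize each term in turn to obtain the parameters for the next iteration.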

  • $\pi_{i_1}$

    $$\begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \log \pi_{i_1}
    &= \sum_{i_1} \cdots \sum_{i_T} \big[ P(O, I \mid \lambda^{(g)}) \log \pi_{i_1} \big] \\
    &= \sum_{i_1} \log \pi_{i_1} \Big[ \sum_{i_2} \cdots \sum_{i_T} P(O, i_1, i_2, \dots, i_T \mid \lambda^{(g)}) \Big] \\
    &= \sum_{i_1} \log \pi_{i_1}\, P(O, i_1 \mid \lambda^{(g)}) \\
    &= \sum_{i=1}^{N} \log \pi_i\, P(O, i_1 = q_i \mid \lambda^{(g)})
    \end{aligned}$$

    The initial state probabilities must satisfy $\sum_{i=1}^{N} \pi_i = 1$, so we construct the Lagrangian:

    $$L(\pi, \gamma) = \sum_{i=1}^{N} \log \pi_i\, P(O, i_1 = q_i \mid \lambda^{(g)}) - \gamma \Big( \sum_{i=1}^{N} \pi_i - 1 \Big)$$

    Taking the partial derivatives with respect to $\pi_i$ and $\gamma$ and setting them to zero:

    $$\frac{\partial L}{\partial \pi_i} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{\pi_i} - \gamma = 0$$

    $$\frac{\partial L}{\partial \gamma} = -\Big( \sum_{i=1}^{N} \pi_i - 1 \Big) = 0$$

    The first equation gives $\pi_i = P(O, i_1 = q_i \mid \lambda^{(g)}) / \gamma$. Summing over $i$ and using the constraint gives $\gamma = \sum_{j=1}^{N} P(O, i_1 = q_j \mid \lambda^{(g)}) = P(O \mid \lambda^{(g)})$, and therefore:

    $$\pi_i^{(g+1)} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{\sum_{j=1}^{N} P(O, i_1 = q_j \mid \lambda^{(g)})} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{P(O \mid \lambda^{(g)})}$$

  • $b_{i_t}(o_t)$

    $$\begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T} \log b_{i_t}(o_t)
    &= \sum_{I} \big[ P(O, I \mid \lambda^{(g)}) \log b_{i_1}(o_1) + \cdots + P(O, I \mid \lambda^{(g)}) \log b_{i_T}(o_T) \big] \\
    &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log b_{i_1}(o_1) + \cdots + \sum_{I} P(O, I \mid \lambda^{(g)}) \log b_{i_T}(o_T) \\
    &= \sum_{i=1}^{N} P(O, i_1 = q_i \mid \lambda^{(g)}) \log b_i(o_1) + \cdots + \sum_{i=1}^{N} P(O, i_T = q_i \mid \lambda^{(g)}) \log b_i(o_T) \\
    &= \sum_{i=1}^{N} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \log b_i(o_t)
    \end{aligned}$$

    Every row of the observation probability matrix sums to $1$, which gives $N$ constraints: $\sum_{k=1}^{M} b_i(v_k) = 1,\ i \in \{1, 2, \dots, N\}$. We therefore construct the Lagrangian:

    $$L(B, \gamma) = \sum_{i=1}^{N} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \log b_i(o_t) - \sum_{i=1}^{N} \gamma_i \Big( \sum_{k=1}^{M} b_i(v_k) - 1 \Big)$$

    Taking the partial derivatives with respect to $b_i(v_k)$ and $\gamma_i$ and setting them to zero:

    [Note] The derivative of $\log b_i(o_t)$ with respect to $b_i(v_k)$ is nonzero only when $o_t = v_k$. This is expressed with the indicator $I(o_t = v_k)$, which is unrelated to the state sequence $I$. Likewise, only the $i$-th constraint involves $b_i(v_k)$, so only $\gamma_i$ appears in the derivative.

    $$\frac{\partial L}{\partial b_i(v_k)} = \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})\, I(o_t = v_k)}{b_i(v_k)} - \gamma_i = 0$$

    $$\frac{\partial L}{\partial \gamma_i} = -\Big( \sum_{k=1}^{M} b_i(v_k) - 1 \Big) = 0$$

    Solving these jointly gives:

    $$\begin{aligned}
    b_i(v_k)^{(g+1)} &= \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})\, I(o_t = v_k)}{\sum_{k'=1}^{M} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})\, I(o_t = v_{k'})} \\
    &= \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})\, I(o_t = v_k)}{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})}
    \end{aligned}$$

  • $a_{i_t i_{t+1}}$

    $$\begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
    &= \sum_{I} \big[ P(O, I \mid \lambda^{(g)}) \log a_{i_1 i_2} + \cdots + P(O, I \mid \lambda^{(g)}) \log a_{i_{T-1} i_T} \big] \\
    &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log a_{i_1 i_2} + \cdots + \sum_{I} P(O, I \mid \lambda^{(g)}) \log a_{i_{T-1} i_T} \\
    &= \sum_{i=1}^{N} \sum_{j=1}^{N} P(O, i_1 = q_i, i_2 = q_j \mid \lambda^{(g)}) \log a_{ij} + \cdots + \sum_{i=1}^{N} \sum_{j=1}^{N} P(O, i_{T-1} = q_i, i_T = q_j \mid \lambda^{(g)}) \log a_{ij} \\
    &= \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)}) \log a_{ij}
    \end{aligned}$$

    Every row of the state transition probability matrix sums to $1$, which gives $N$ constraints: $\sum_{j=1}^{N} a_{ij} = 1,\ i \in \{1, 2, \dots, N\}$. We therefore construct the Lagrangian:

    $$L(A, \gamma) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)}) \log a_{ij} - \sum_{i=1}^{N} \gamma_i \Big( \sum_{j=1}^{N} a_{ij} - 1 \Big)$$

    Taking the partial derivatives with respect to $a_{ij}$ and $\gamma_i$ and setting them to zero (only the $i$-th constraint involves $a_{ij}$, so only $\gamma_i$ appears):

    $$\frac{\partial L}{\partial a_{ij}} = \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{a_{ij}} - \gamma_i = 0$$

    $$\frac{\partial L}{\partial \gamma_i} = -\Big( \sum_{j=1}^{N} a_{ij} - 1 \Big) = 0$$

    Solving these jointly gives:

    $$\begin{aligned}
    a_{ij}^{(g+1)} &= \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{\sum_{j'=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_{j'} \mid \lambda^{(g)})} \\
    &= \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{\sum_{t=1}^{T-1} P(O, i_t = q_i \mid \lambda^{(g)})}
    \end{aligned}$$
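The three update formulas can be tried out on a toy model. The sketch below performs one Baum-Welch update by enumerating every state sequence, which mirrors the sums over $I$ in the derivation directly. It is only an illustration under stated assumptions: the names `joint_prob` and `em_step`, the 0-based integer encoding, and the starting parameters are hypothetical, all probabilities are assumed strictly positive, and the enumeration costs $O(N^T)$. In practice these sums are computed with the forward-backward recursions rather than by enumeration.

```python
from itertools import product


def joint_prob(obs, states, A, B, pi):
    """P(O, I | lambda) for one state sequence I."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def em_step(obs, A, B, pi):
    """One update of (A, B, pi); also returns P(O | lambda_g) for the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    pi_w = [0.0] * n                        # accumulates P(O, i_1 = q_i | lambda_g)
    b_w = [[0.0] * m for _ in range(n)]     # sum_t P(O, i_t = q_i) * I(o_t = v_k)
    a_w = [[0.0] * n for _ in range(n)]     # sum_t P(O, i_t = q_i, i_{t+1} = q_j)
    lik = 0.0
    for I in product(range(n), repeat=T):
        w = joint_prob(obs, I, A, B, pi)
        lik += w
        pi_w[I[0]] += w
        for t in range(T):
            b_w[I[t]][obs[t]] += w
            if t < T - 1:
                a_w[I[t]][I[t + 1]] += w
    # Row-normalizing the accumulated weights yields the three update formulas.
    return ([normalize(r) for r in a_w], [normalize(r) for r in b_w],
            normalize(pi_w), lik)


# Hypothetical starting point and observation sequence.
obs = [0, 1, 0, 0, 1]
A, B, pi = [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0.6, 0.4]
history = []
for _ in range(5):
    A, B, pi, lik = em_step(obs, A, B, pi)
    history.append(lik)     # likelihood under the parameters before each update
```

The values in `history` should never decrease from one iteration to the next, which matches the EM guarantee stated earlier.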
