Li Hang, Statistical Learning Methods (统计学习方法), 2nd edition, Chapter 10 (with detailed derivations)
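A hidden Markov model (HMM) is a probabilistic model for sequential data. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states (the state sequence), and each state in turn generates an observation, yielding the observation sequence.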
Let $Q=\{q_1,q_2,\cdots,q_N\}$ be the set of all possible states and $V=\{v_1,v_2,\cdots,v_M\}$ the set of all possible observations;
$I=(i_1,i_2,\cdots,i_T)$ is a state sequence of length $T$, and $O=(o_1,o_2,\cdots,o_T)$ is the corresponding observation sequence;
$A=[a_{ij}]_{N\times N}$ is the state transition probability matrix, where $a_{ij}=P(i_{t+1}=q_j|i_t=q_i),\ i,j=1,2,\cdots,N$;
$B=[b_j(k)]_{N\times M}$ is the observation probability matrix, where $b_j(k)=P(o_t=v_k|i_t=q_j),\ j=1,2,\cdots,N,\ k=1,2,\cdots,M$;
$\pi=(\pi_i)$ is the initial state probability vector, where $\pi_i=P(i_1=q_i),\ i=1,2,\cdots,N$.
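The parameters $\lambda=(A,B,\pi)$ defined above can be stored as plain nested lists. The sketch below is plain Python with no NumPy. It checks the shapes and confirms that $\pi$ and every row of $A$ and $B$ are probability distributions. The function name `is_valid_hmm` and the toy values of `A`, `B` and `pi` are illustrative assumptions, not taken from the text. Indices are 0-based, so state $q_i$ corresponds to row `i-1`.

```python
from typing import List

Matrix = List[List[float]]


def is_valid_hmm(A: Matrix, B: Matrix, pi: List[float], tol: float = 1e-9) -> bool:
    """Check that lambda = (A, B, pi) has consistent shapes and stochastic rows."""
    N = len(pi)
    if N == 0 or len(A) != N or len(B) != N:
        return False
    if any(len(row) != N for row in A):
        return False                      # A must be N x N
    M = len(B[0])
    if M == 0 or any(len(row) != M for row in B):
        return False                      # B must be N x M
    # pi, every row a_i. of A and every row b_j(.) of B must be a distribution
    for dist in [pi, *A, *B]:
        if any(p < 0 for p in dist) or abs(sum(dist) - 1.0) > tol:
            return False
    return True


# Toy parameters (illustrative only): N = 3 states, M = 2 observation symbols
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]

print(is_valid_hmm(A, B, pi))  # True
```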
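The state sequence $I$ is determined by $A$ and $\pi$, and the observation sequence $O$ is determined by $B$ (given $I$). By this definition, an HMM makes the following assumptions: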
Homogeneity assumption (homogeneous Markov chain): the state transition matrix $A$ and the observation matrix $B$ do not depend on the time $t$, so $\lambda=(A,B,\pi)$ uniquely determines an HMM;
First-order Markov assumption (first-order Markov chain): the state at any time depends only on the state at the previous time:
$$P(i_t|i_{t-1},o_{t-1},\cdots,i_1,o_1)=P(i_t|i_{t-1}),\quad t=2,3,\cdots,T$$
Observation independence assumption: the observation at any time depends only on the state at that time:
$$P(o_t|o_T,i_T,o_{T-1},i_{T-1},\cdots,o_{t+1},i_{t+1},i_t,o_{t-1},i_{t-1},\cdots,o_1,i_1)=P(o_t|i_t)$$
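These assumptions make the generative process from the opening definition easy to simulate step by step. First draw $i_1$ from $\pi$. Then emit $o_t$ from row $i_t$ of $B$, and move to $i_{t+1}$ using row $i_t$ of $A$. The following is a minimal sketch. The helper name `sample_hmm`, the toy parameters and the 0-based indexing are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_hmm(A, B, pi, T: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Generate a state sequence I and an observation sequence O of length T (0-based indices)."""
    def draw(dist: Sequence[float]) -> int:
        # pick index k with probability dist[k]
        return rng.choices(range(len(dist)), weights=dist, k=1)[0]

    states, obs = [], []
    i = draw(pi)                    # i_1 ~ pi
    for t in range(T):
        states.append(i)
        obs.append(draw(B[i]))      # o_t depends only on i_t (observation independence)
        if t < T - 1:
            i = draw(A[i])          # i_{t+1} depends only on i_t (first-order Markov)
    return states, obs


# Toy parameters (illustrative only)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]

rng = random.Random(0)
I, O = sample_hmm(A, B, pi, T=10, rng=rng)
print(I, O)
```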
(Omitted.)
Given an HMM $\lambda$, the forward probability $\alpha_t(i)=P(o_1,o_2,\cdots,o_t,i_t=q_i)$ is the probability that the partial observation sequence up to time $t$ is $o_1,o_2,\cdots,o_t$ and the state at time $t$ is $q_i$.
$$
\begin{aligned}
\alpha_1(i)&=P(o_1,i_1=q_i)=P(o_1|i_1=q_i)P(i_1=q_i)=b_i(o_1)\pi_i\\
\alpha_{t+1}(i)&=P(o_1,o_2,\cdots,o_t,o_{t+1},i_{t+1}=q_i)\\
&=\sum_{j=1}^NP(o_1,o_2,\cdots,o_t,o_{t+1},i_t=q_j,i_{t+1}=q_i)\\
&=\sum_{j=1}^NP(o_{t+1}|o_1,o_2,\cdots,o_t,i_t=q_j,i_{t+1}=q_i)P(i_{t+1}=q_i|o_1,o_2,\cdots,o_t,i_t=q_j)P(o_1,o_2,\cdots,o_t,i_t=q_j)\\
&=\sum_{j=1}^NP(o_{t+1}|i_{t+1}=q_i)P(i_{t+1}=q_i|i_t=q_j)P(o_1,o_2,\cdots,o_t,i_t=q_j)\\
&=\sum_{j=1}^Nb_i(o_{t+1})a_{ji}\alpha_t(j)\\
P(O)&=P(o_1,o_2,\cdots,o_T)=\sum_{i=1}^NP(o_1,o_2,\cdots,o_T,i_T=q_i)=\sum_{i=1}^N\alpha_T(i)
\end{aligned}
$$
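The forward recursion above translates directly into code. Below is a minimal sketch in plain Python. It uses no numerical scaling, so it only suits short sequences: products of probabilities underflow as $T$ grows. The toy parameters and the observation sequence (symbols $v_1,v_2,v_1$ as 0-based indices) are assumed for illustration.

```python
from typing import List

Matrix = List[List[float]]


def forward(A: Matrix, B: Matrix, pi: List[float], O: List[int]) -> Matrix:
    """alpha[t][i] = P(o_1..o_{t+1}, state q_i at step t+1), with 0-based t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]      # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, len(O)):
        prev = alpha[-1]
        # alpha_{t+1}(i) = [sum_j alpha_t(j) a_ji] * b_i(o_{t+1})
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha


def likelihood_forward(A: Matrix, B: Matrix, pi: List[float], O: List[int]) -> float:
    return sum(forward(A, B, pi, O)[-1])                  # P(O) = sum_i alpha_T(i)


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]

print(likelihood_forward(A, B, pi, O))  # ≈ 0.130218
```

Each step of the loop costs $O(N^2)$, so the whole computation is $O(N^2T)$. Summing over every state sequence would instead cost $O(TN^T)$.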
Given an HMM $\lambda$, the backward probability is the probability that the partial observation sequence from time $t+1$ to $T$ is $o_{t+1},o_{t+2},\cdots,o_T$, conditioned on the state at time $t$ being $q_i$: $\beta_t(i)=P(o_{t+1},o_{t+2},\cdots,o_T|i_t=q_i)$.
$$
\begin{aligned}
\beta_T(i)&=1,\quad i=1,2,\cdots,N\\
\beta_t(i)&=P(o_{t+1},o_{t+2},\cdots,o_T|i_t=q_i)\\
&=\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j|i_t=q_i)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j,i_t=q_i)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T,i_t=q_i|i_{t+1}=q_j)P(i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)P(i_{t+1}=q_j)\quad\text{(d-separation)}\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1}|o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^NP(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j|i_t=q_i)P(i_t=q_i)\\
&=\sum_{j=1}^NP(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j|i_t=q_i)\\
&=\sum_{j=1}^Nb_j(o_{t+1})\beta_{t+1}(j)a_{ij},\quad t=T-1,T-2,\cdots,1,\ i=1,2,\cdots,N\\
P(O)&=\sum_{i=1}^N\beta_1(i)b_i(o_1)\pi_i
\end{aligned}
$$
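The backward recursion runs from $t=T-1$ down to $1$, starting from $\beta_T(i)=1$. Below is a minimal sketch in plain Python with 0-based indices and no scaling, so `beta[t]` holds $\beta_{t+1}$. The toy parameters are illustrative only. The final line evaluates $P(O)$ with the formula above, and it should agree with the forward result for the same inputs.

```python
from typing import List

Matrix = List[List[float]]


def backward(A: Matrix, B: Matrix, O: List[int]) -> Matrix:
    """beta[t][i] = P(o_{t+2}..o_T | state q_i at step t+1), with 0-based t."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def likelihood_backward(A: Matrix, B: Matrix, pi: List[float], O: List[int]) -> float:
    beta = backward(A, B, O)
    # P(O) = sum_i beta_1(i) b_i(o_1) pi_i
    return sum(beta[0][i] * B[i][O[0]] * pi[i] for i in range(len(pi)))


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]

print(likelihood_backward(A, B, pi, O))  # ≈ 0.130218
```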
Combining the forward and backward probabilities gives, for any $t=1,2,\cdots,T-1$:
$$
\begin{aligned}
P(O)&=P(o_1,o_2,\cdots,o_T)\\
&=\sum_{i=1}^N\sum_{j=1}^NP(o_1,o_2,\cdots,o_T,i_t=q_i,i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^NP(o_1,o_2,\cdots,o_t,i_t=q_i|i_{t+1}=q_j)P(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j)\quad\text{(d-separation)}\\
&=\sum_{i=1}^N\sum_{j=1}^NP(o_1,o_2,\cdots,o_t,i_t=q_i,i_{t+1}=q_j)P(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^NP(i_{t+1}=q_j|o_1,o_2,\cdots,o_t,i_t=q_i)P(o_1,o_2,\cdots,o_t,i_t=q_i)P(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^NP(i_{t+1}=q_j|i_t=q_i)P(o_1,o_2,\cdots,o_t,i_t=q_i)P(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^Na_{ij}\alpha_t(i)P(o_{t+1},o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^Na_{ij}\alpha_t(i)P(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j)\frac{1}{P(i_{t+1}=q_j)}\\
&=\sum_{i=1}^N\sum_{j=1}^Na_{ij}\alpha_t(i)P(o_{t+1}|o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(o_{t+2},\cdots,o_T,i_{t+1}=q_j)\frac{1}{P(i_{t+1}=q_j)}\\
&=\sum_{i=1}^N\sum_{j=1}^Na_{ij}\alpha_t(i)P(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^Na_{ij}\alpha_t(i)b_j(o_{t+1})\beta_{t+1}(j)
\end{aligned}
$$
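The identity above says that every split point $t$ yields the same $P(O)$. A quick numerical check is a useful way to validate a forward/backward implementation. The sketch below bundles compact copies of both recursions so that it runs on its own. The helper name `likelihood_at`, the toy parameters and the length-5 observation sequence are hypothetical, and indices are 0-based.

```python
from typing import List

Matrix = List[List[float]]


def forward(A, B, pi, O) -> Matrix:
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha


def backward(A, B, O) -> Matrix:
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def likelihood_at(A, B, alpha, beta, O, t: int) -> float:
    """sum_i sum_j alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j), 0-based t in [0, T-2]."""
    N = len(A)
    return sum(alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
               for i in range(N) for j in range(N))


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1, 1]

alpha, beta = forward(A, B, pi, O), backward(A, B, O)
values = [likelihood_at(A, B, alpha, beta, O, t) for t in range(len(O) - 1)]
print(values)  # the same P(O) for every split point t
```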
Given an HMM $\lambda$ and an observation sequence $O$, the probability that the state at time $t$ is $q_i$ is:
$$
\begin{aligned}
\gamma_t(i)&=P(i_t=q_i|O)=\frac{1}{P(O)}P(O|i_t=q_i)P(i_t=q_i)\\
&=\frac{1}{P(O)}P(o_1,\cdots,o_t|i_t=q_i)P(o_{t+1},\cdots,o_T|i_t=q_i)P(i_t=q_i)\quad\text{(d-separation)}\\
&=\frac{1}{P(O)}P(o_1,\cdots,o_t,i_t=q_i)P(o_{t+1},\cdots,o_T|i_t=q_i)\\
&=\frac{\alpha_t(i)\beta_t(i)}{P(O)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^NP(i_t=q_j,O)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\beta_t(j)}
\end{aligned}
$$
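The formula for $\gamma_t(i)$ needs only the forward and backward tables: multiply them elementwise and normalize each time step. Below is a minimal sketch under the same assumptions as before (plain Python, 0-based indices, no scaling, toy parameters). The helper name `posterior_states` is hypothetical. As a by-product, each normalizer $\sum_j\alpha_t(j)\beta_t(j)$ equals $P(O)$ for every $t$.

```python
from typing import List

Matrix = List[List[float]]


def forward(A, B, pi, O) -> Matrix:
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha


def backward(A, B, O) -> Matrix:
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def posterior_states(A, B, pi, O) -> Matrix:
    """gamma[t][i] = P(state q_i at step t+1 | O), with 0-based t."""
    alpha, beta = forward(A, B, pi, O), backward(A, B, O)
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        joint = [a * b for a, b in zip(a_t, b_t)]   # alpha_t(i) beta_t(i) = P(i_t = q_i, O)
        total = sum(joint)                          # sum_j alpha_t(j) beta_t(j) = P(O)
        gamma.append([x / total for x in joint])
    return gamma


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]

gamma = posterior_states(A, B, pi, O)
print(gamma)
```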
Given an HMM $\lambda$ and an observation sequence $O$, the probability that the state is $q_i$ at time $t$ and $q_j$ at time $t+1$ is:
$$
\begin{aligned}
\xi_t(i,j)&=P(i_t=q_i,i_{t+1}=q_j|O)\\
&=\frac{1}{P(O)}P(i_t=q_i,i_{t+1}=q_j,O)\\
&=\frac{a_{ij}\alpha_t(i)b_j(o_{t+1})\beta_{t+1}(j)}{P(O)}\\
&=\frac{a_{ij}\alpha_t(i)b_j(o_{t+1})\beta_{t+1}(j)}{\sum_{k=1}^N\sum_{l=1}^Na_{kl}\alpha_t(k)b_l(o_{t+1})\beta_{t+1}(l)}
\end{aligned}
$$
Some useful expectations. Given $O$, the expected number of times the state is $q_i$ is $\sum_{t=1}^T\gamma_t(i)$;
given $O$, the expected number of transitions out of state $q_i$ is $\sum_{t=1}^{T-1}\gamma_t(i)$;
given $O$, the expected number of transitions from state $q_i$ to state $q_j$ is $\sum_{t=1}^{T-1}\xi_t(i,j)$.
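$\xi_t(i,j)$ and the three expected counts above can be computed together from the forward and backward tables. The sketch below is self-contained, with compact copies of both recursions, and uses plain Python, 0-based indices, no scaling and toy parameters. The names `posteriors` and `expected_counts` are hypothetical. Two consistency properties follow from the definitions: $\sum_j\xi_t(i,j)=\gamma_t(i)$, and the expected $q_i\to q_j$ transitions summed over $j$ equal the expected transitions out of $q_i$.

```python
from typing import List

Matrix = List[List[float]]


def forward(A, B, pi, O) -> Matrix:
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha


def backward(A, B, O) -> Matrix:
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def posteriors(A, B, pi, O):
    """Return (gamma, xi): gamma[t][i] for t < T, xi[t][i][j] for t < T-1 (0-based)."""
    N, T = len(pi), len(O)
    alpha, beta = forward(A, B, pi, O), backward(A, B, O)
    p_obs = sum(alpha[-1])                                        # P(O)
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    # xi_t(i, j) = a_ij alpha_t(i) b_j(o_{t+1}) beta_{t+1}(j) / P(O)
    xi = [[[A[i][j] * alpha[t][i] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi


def expected_counts(gamma, xi):
    N = len(gamma[0])
    visits = [sum(g[i] for g in gamma) for i in range(N)]         # sum_{t=1}^{T} gamma_t(i)
    leaves = [sum(g[i] for g in gamma[:-1]) for i in range(N)]    # sum_{t=1}^{T-1} gamma_t(i)
    moves = [[sum(x[i][j] for x in xi) for j in range(N)]         # sum_{t=1}^{T-1} xi_t(i, j)
             for i in range(N)]
    return visits, leaves, moves


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1, 1]

gamma, xi = posteriors(A, B, pi, O)
visits, leaves, moves = expected_counts(gamma, xi)
print(visits, leaves, moves)
```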
(Omitted.)
Suppose the training data consist only of observation sequences $\{O_1,O_2,\cdots,O_S\}$, and we want to learn $\lambda=(\pi,A,B)$. Treat the observation sequence as the observed data $O$ and the state sequence as the unobserved (latent) data $I$. The HMM is then a probabilistic model with latent variables, $P(O|\lambda)=\sum_IP(O|I,\lambda)P(I|\lambda)$, and its parameters can be learned with the EM algorithm (in this setting known as the Baum-Welch algorithm).
First, determine the complete-data log-likelihood. The complete data are $(O,I)=(o_1,o_2,\cdots,o_T,i_1,i_2,\cdots,i_T)$, with log-likelihood $\log P(O,I|\lambda)$.
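E-step of the EM algorithm: compute the Q function $Q(\lambda,\overline{\lambda})$:
$$Q(\lambda,\overline{\lambda})=\sum_I\log P(O,I|\lambda)P(O,I|\overline{\lambda})$$
where $\overline{\lambda}$ is the current estimate of the model parameters and $\lambda$ is the parameter being maximized. Compared with the general EM Q function $\sum_I\log P(O,I|\lambda)P(I|O,\overline{\lambda})$, this form omits the constant factor $1/P(O|\overline{\lambda})$, which does not affect the maximization.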
By the observation independence and first-order Markov assumptions, the complete-data likelihood factorizes as:
$$
\begin{aligned}
P(O,I|\lambda)&=P(o_1|o_2,\dots,o_T,I,\lambda)P(o_2,\dots,o_T,I|\lambda)\\
&=P(o_1|i_1,\lambda)P(o_2,\dots,o_T,I|\lambda)\\
&=P(o_1|i_1,\lambda)P(o_2|i_2,\lambda)P(o_3,\dots,o_T,I|\lambda)=\cdots\\
&=P(o_1|i_1,\lambda)P(o_2|i_2,\lambda)\cdots P(o_T|i_T,\lambda)P(I|\lambda)\\
&=\prod_{t=1}^TP(o_t|i_t,\lambda)P(I|\lambda)\\
&=\prod_{t=1}^TP(o_t|i_t,\lambda)P(i_T|i_1,\cdots,i_{T-1},\lambda)P(i_1,\cdots,i_{T-1}|\lambda)\\
&=\prod_{t=1}^TP(o_t|i_t,\lambda)P(i_T|i_{T-1},\lambda)P(i_1,\cdots,i_{T-1}|\lambda)=\cdots\\
&=\prod_{t=1}^TP(o_t|i_t,\lambda)P(i_T|i_{T-1},\lambda)P(i_{T-1}|i_{T-2},\lambda)\cdots P(i_2|i_1,\lambda)P(i_1|\lambda)\\
&=\pi_{i_1}b_{i_1}(o_1)a_{i_1,i_2}b_{i_2}(o_2)a_{i_2,i_3}\cdots b_{i_{T-1}}(o_{T-1})a_{i_{T-1},i_T}b_{i_T}(o_T)
\end{aligned}
$$
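The last line of this factorization can be evaluated directly. Summing it over all $N^T$ state sequences gives $P(O|\lambda)=\sum_IP(O,I|\lambda)$. The brute-force sketch below does exactly that. It is only practical for tiny $N$ and $T$, but it is a handy reference for checking the forward algorithm. The names `complete_likelihood` and `likelihood_brute_force`, the toy parameters and the 0-based indices are assumptions for illustration.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def complete_likelihood(A: Matrix, B: Matrix, pi: List[float],
                        O: Sequence[int], I: Sequence[int]) -> float:
    """P(O, I | lambda) = pi_{i1} b_{i1}(o1) a_{i1,i2} b_{i2}(o2) ... a_{i(T-1),iT} b_{iT}(oT)."""
    p = pi[I[0]] * B[I[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
    return p


def likelihood_brute_force(A: Matrix, B: Matrix, pi: List[float], O: Sequence[int]) -> float:
    """P(O | lambda) as a sum over all N**T state sequences I."""
    N = len(pi)
    return sum(complete_likelihood(A, B, pi, O, I)
               for I in product(range(N), repeat=len(O)))


A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]

print(likelihood_brute_force(A, B, pi, O))  # ≈ 0.130218
```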
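M-step of the EM algorithm: maximize the Q function with respect to the model parameters. Taking the logarithm of the factorized $P(O,I|\lambda)$ and substituting it into the Q function gives: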
$$Q(\lambda,\overline{\lambda})=\sum_I\log\pi_{i_1}P(O,I|\overline{\lambda})+\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\Big)P(O,I|\overline{\lambda})+\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I|\overline{\lambda})$$
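The parameters $\pi$, $A$ and $B$ each appear in only one of the three terms, so each term can be maximized separately.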
$$
\begin{aligned}
\sum_I\log\pi_{i_1}P(O,I|\overline{\lambda})&=\sum_{i_1}\sum_{i_2}\cdots\sum_{i_T}\log\pi_{i_1}P(O,I|\overline{\lambda})\\
&=\sum_{i_1}\log\pi_{i_1}\sum_{i_2}\cdots\sum_{i_T}P(O,i_1,i_2,\cdots,i_T|\overline{\lambda})\\
&=\sum_{i_1}\log\pi_{i_1}P(O,i_1|\overline{\lambda})\\
&=\sum_{i=1}^N\log\pi_iP(O,i_1=q_i|\overline{\lambda})
\end{aligned}
$$
Since the $\pi_i$ satisfy $\sum_{i=1}^N\pi_i=1$, the method of Lagrange multipliers applies. Here $\gamma$ denotes the Lagrange multiplier, not $\gamma_t(i)$. The Lagrangian is:
$$\sum_{i=1}^N\log\pi_iP(O,i_1=q_i|\overline{\lambda})+\gamma\Big(\sum_{i=1}^N\pi_i-1\Big)$$
Set the partial derivative with respect to $\pi_i$ to zero:
$$
\begin{aligned}
&\frac{\partial}{\partial\pi_i}\Big[\sum_{i=1}^N\log\pi_iP(O,i_1=q_i|\overline{\lambda})+\gamma\Big(\sum_{i=1}^N\pi_i-1\Big)\Big]=0\\
&\frac{1}{\pi_i}P(O,i_1=q_i|\overline{\lambda})+\gamma=0\\
&P(O,i_1=q_i|\overline{\lambda})+\gamma\pi_i=0\\
&\gamma=\gamma\sum_{i=1}^N\pi_i=-\sum_{i=1}^NP(O,i_1=q_i|\overline{\lambda})=-P(O|\overline{\lambda})\quad\text{(summing the previous line over } i\text{)}
\end{aligned}
$$
Substituting $\gamma$ back gives $\pi_i=\frac{P(O,i_1=q_i|\overline{\lambda})}{P(O|\overline{\lambda})}=\gamma_1(i)$.
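The Lagrange-multiplier result says that $\sum_ic_i\log\pi_i$ with positive weights $c_i=P(O,i_1=q_i|\overline{\lambda})$ is maximized on the probability simplex at $\pi_i=c_i/\sum_jc_j$. The sketch below checks this numerically: random points on the simplex never beat the closed form. The weights `c` are hypothetical numbers, and the names `objective` and `closed_form` are illustrative.

```python
import math
import random
from typing import List


def objective(c: List[float], p: List[float]) -> float:
    """sum_i c_i log p_i, the pi-dependent part of the Q function."""
    return sum(ci * math.log(pi_) for ci, pi_ in zip(c, p))


def closed_form(c: List[float]) -> List[float]:
    """Maximizer on the simplex from the Lagrange condition: p_i = c_i / sum_j c_j."""
    s = sum(c)
    return [ci / s for ci in c]


c = [0.03, 0.07, 0.02]          # hypothetical values of P(O, i_1 = q_i | lambda_bar)
best = closed_form(c)

rng = random.Random(0)
gaps = []
for _ in range(1000):
    raw = [rng.random() + 1e-6 for _ in c]
    total = sum(raw)
    p = [r / total for r in raw]                      # a random point on the simplex
    gaps.append(objective(c, p) - objective(c, best))

print(best, max(gaps))  # max(gaps) should never be positive
```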
For the second term:
$$\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\Big)P(O,I|\overline{\lambda})=\sum_{t=1}^{T-1}\sum_{i=1}^N\sum_{j=1}^N\log a_{ij}P(O,i_t=q_i,i_{t+1}=q_j|\overline{\lambda})$$
Using the constraint $\sum_{j=1}^Na_{ij}=1$ and applying the method of Lagrange multipliers in the same way gives:
$$a_{ij}=\frac{\sum_{t=1}^{T-1}P(O,i_t=q_i,i_{t+1}=q_j|\overline{\lambda})}{\sum_{t=1}^{T-1}P(O,i_t=q_i|\overline{\lambda})}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$
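Given $\gamma$ and $\xi$ computed under $\overline{\lambda}$, the updates $\pi_i=\gamma_1(i)$ and $a_{ij}=\sum_{t=1}^{T-1}\xi_t(i,j)/\sum_{t=1}^{T-1}\gamma_t(i)$ are simple ratios of expected counts. Below is a minimal sketch with a hypothetical helper `update_pi_A`. It uses hand-made posteriors for $T=2$, $N=2$, chosen only so that $\sum_j\xi_1(i,j)=\gamma_1(i)$ and $\sum_i\xi_1(i,j)=\gamma_2(j)$ hold.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def update_pi_A(gamma: Matrix, xi: List[Matrix]) -> Tuple[List[float], Matrix]:
    """pi_i = gamma_1(i);  a_ij = sum_{t<T} xi_t(i,j) / sum_{t<T} gamma_t(i)  (0-based lists)."""
    N = len(gamma[0])
    new_pi = list(gamma[0])
    new_A = []
    for i in range(N):
        denom = sum(g[i] for g in gamma[:-1])                # expected transitions out of q_i
        new_A.append([sum(x[i][j] for x in xi) / denom for j in range(N)])
    return new_pi, new_A


# Hand-made posteriors for T = 2, N = 2 (hypothetical, mutually consistent numbers)
xi = [[[0.1, 0.3],
       [0.2, 0.4]]]             # xi_1(i, j)
gamma = [[0.4, 0.6],            # gamma_1(i) = sum_j xi_1(i, j)
         [0.3, 0.7]]            # gamma_2(j) = sum_i xi_1(i, j)

new_pi, new_A = update_pi_A(gamma, xi)
print(new_pi, new_A)
```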
For the third term:
$$\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I|\overline{\lambda})=\sum_{t=1}^T\sum_{i=1}^N\log b_i(o_t)P(O,i_t=q_i|\overline{\lambda})$$
Using the constraint $\sum_{k=1}^Mb_i(k)=1$, apply the method of Lagrange multipliers:
$$
\begin{aligned}
&\frac{\partial}{\partial b_i(k)}\Big[\sum_{t=1}^T\sum_{i=1}^N\log b_i(o_t)P(O,i_t=q_i|\overline{\lambda})+\gamma\Big(\sum_{k=1}^Mb_i(k)-1\Big)\Big]=0\\
&\sum_{t=1}^T\frac{1}{b_i(o_t)}\frac{\partial b_i(o_t)}{\partial b_i(k)}P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^T\frac{1}{b_i(o_t)}I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^T\frac{1}{b_i(k)}I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^TI(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma b_i(k)=0\\
&\sum_{k=1}^M\sum_{t=1}^TI(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\sum_{k=1}^M\gamma b_i(k)=0\\
&\sum_{t=1}^TP(O,i_t=q_i|\overline{\lambda})\sum_{k=1}^MI(o_t=v_k)+\sum_{k=1}^M\gamma b_i(k)=0\\
&\gamma=-\sum_{t=1}^TP(O,i_t=q_i|\overline{\lambda})\\
&b_i(k)=\frac{\sum_{t=1}^TI(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})}{\sum_{t=1}^TP(O,i_t=q_i|\overline{\lambda})}=\frac{\sum_{t=1}^TI(o_t=v_k)\gamma_t(i)}{\sum_{t=1}^T\gamma_t(i)}
\end{aligned}
$$
where $I(o_t=v_k)$ is the indicator function, equal to 1 if $o_t=v_k$ and 0 otherwise.
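Putting the E-step ($\gamma$, $\xi$) and the three M-step updates together gives one Baum-Welch iteration. Below is a minimal sketch for a single observation sequence, in plain Python with no scaling, so it is only suitable for short sequences. The function names, the initial guess `A0`, `B0`, `pi0` and the training sequence `O` are hypothetical. Because this is EM, $P(O|\lambda)$ should not decrease from one iteration to the next; the recorded `history` lets you check that.

```python
from typing import List

Matrix = List[List[float]]


def forward(A, B, pi, O) -> Matrix:
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        alpha.append([sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha


def backward(A, B, O) -> Matrix:
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def baum_welch_step(A, B, pi, O):
    """One EM iteration on a single sequence; returns (new_A, new_B, new_pi, P(O | old lambda))."""
    N, M, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(A, B, pi, O), backward(A, B, O)
    p_obs = sum(alpha[-1])                                   # P(O | lambda_bar)
    # E-step: gamma_t(i) and xi_t(i, j) under the current parameters
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: the closed-form maximizers derived above
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_A, new_B, new_pi, p_obs


def baum_welch(A, B, pi, O, n_iter: int = 20):
    history = []                                             # P(O | lambda) before each update
    for _ in range(n_iter):
        A, B, pi, p_obs = baum_welch_step(A, B, pi, O)
        history.append(p_obs)
    return A, B, pi, history


# Hypothetical initial guess (N = 2 states, M = 3 symbols) and training sequence
A0 = [[0.6, 0.4], [0.3, 0.7]]
B0 = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pi0 = [0.5, 0.5]
O = [0, 1, 2, 2, 1, 0, 0, 2, 1, 2]

A, B, pi, history = baum_welch(A0, B0, pi0, O)
print(history[0], history[-1])
```

With several training sequences $O_1,\cdots,O_S$, the expected counts in the numerators and denominators would be summed over the sequences before dividing. That extension is not shown here.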