Reading Notes on Li Hang's Statistical Learning Methods, 2nd Edition, Chapter 10 (with Detailed Derivations)

Hidden Markov Models


1. Basic Concepts

A hidden Markov model (HMM) is a probabilistic model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable sequence of states (the state sequence), and each state then generates an observation, producing the observation sequence.

$Q=\{q_1,q_2,\cdots,q_N\}$ is the set of all possible states, and $V=\{v_1,v_2,\cdots,v_M\}$ is the set of all possible observations;

$I=(i_1,i_2,\cdots,i_T)$ is a state sequence of length $T$, and $O=(o_1,o_2,\cdots,o_T)$ is the corresponding observation sequence;

$A=[a_{ij}]_{N\times N}$ is the state transition probability matrix, where $a_{ij}=P(i_{t+1}=q_j|i_t=q_i),\ i,j=1,2,\cdots,N$;

$B=[b_j(k)]_{N\times M}$ is the observation probability matrix, where $b_j(k)=P(o_t=v_k|i_t=q_j),\ j=1,2,\cdots,N,\ k=1,2,\cdots,M$;

$\pi=(\pi_i)$ is the initial state probability vector, where $\pi_i=P(i_1=q_i),\ i=1,2,\cdots,N$.

The state sequence $I$ is determined by $A$ and $\pi$, and the observation sequence $O$ is determined by $B$. From the definition, the HMM makes the following assumptions:

  • Homogeneity assumption (homogeneous Markov chain): the state transition probability matrix $A$ and the observation probability matrix $B$ do not depend on the time $t$, so $\lambda=(A,B,\pi)$ uniquely determines an HMM;

  • First-order Markov assumption (first-order Markov chain): the state at any time depends only on the state at the previous time:

$$P(i_t|i_{t-1},o_{t-1},\cdots,i_1,o_1)=P(i_t|i_{t-1}),\quad t=1,2,\cdots,T$$

  • Observation independence assumption: the observation at any time depends only on the state at that time:

$$P(o_t|o_T,i_T,o_{T-1},i_{T-1},\cdots,o_{t+1},i_{t+1},i_t,o_{t-1},i_{t-1},\cdots,o_1,i_1)=P(o_t|i_t)$$
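The generative process in the definition (sample $i_1\sim\pi$, then alternately emit $o_t\sim b_{i_t}(\cdot)$ and transition $i_{t+1}\sim a_{i_t,\cdot}$) can be sketched in a few lines of Python. This is an illustrative sketch, not from the book; the matrices follow the book's three-box example:

```python
import random

def generate(A, B, pi, T, rng=random):
    """Sample a (state sequence, observation sequence) pair of length T from an HMM."""
    states, obs = [], []
    i = rng.choices(range(len(pi)), weights=pi)[0]        # i_1 ~ pi
    for _ in range(T):
        states.append(i)
        obs.append(rng.choices(range(len(B[i])), weights=B[i])[0])  # o_t ~ b_{i_t}(.)
        i = rng.choices(range(len(A[i])), weights=A[i])[0]          # i_{t+1} ~ a_{i_t,.}
    return states, obs

# 3 hidden states, 2 observation symbols (numbers from the book's box-and-ball example)
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
I_seq, O_seq = generate(A, B, pi, T=5)
```

Note that only `O_seq` would be visible to a learner; `I_seq` is the hidden data.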

2. Probability Computation

Direct computation

Direct computation enumerates all possible state sequences, $P(O)=\sum_I P(O|I)P(I)$, which costs $O(TN^T)$ and is infeasible in practice; the forward and backward algorithms below reduce this to $O(N^2T)$.

Forward algorithm

Given an HMM $\lambda$, the forward probability $\alpha_t(i)=P(o_1,o_2,\cdots,o_t,i_t=q_i)$ is the probability that, up to time $t$, the observation sequence is $o_1,o_2,\cdots,o_t$ and the state at time $t$ is $q_i$.

$$\begin{aligned}
\alpha_1(i)&=P(o_1,i_1=q_i)=P(o_1|i_1=q_i)P(i_1=q_i)=b_i(o_1)\pi_i\\
\alpha_{t+1}(i)&=P(o_1,o_2,\cdots,o_t,o_{t+1},i_{t+1}=q_i)\\
&=\sum_{j=1}^N P(o_1,o_2,\cdots,o_t,o_{t+1},i_t=q_j,i_{t+1}=q_i)\\
&=\sum_{j=1}^N P(o_{t+1}|o_1,\cdots,o_t,i_t=q_j,i_{t+1}=q_i)\,P(i_{t+1}=q_i|o_1,\cdots,o_t,i_t=q_j)\,P(o_1,\cdots,o_t,i_t=q_j)\\
&=\sum_{j=1}^N P(o_{t+1}|i_{t+1}=q_i)\,P(i_{t+1}=q_i|i_t=q_j)\,P(o_1,\cdots,o_t,i_t=q_j)\\
&=\sum_{j=1}^N b_i(o_{t+1})\,a_{ji}\,\alpha_t(j)\\
P(O)&=P(o_1,o_2,\cdots,o_T)=\sum_{i=1}^N P(o_1,\cdots,o_T,i_T=q_i)=\sum_{i=1}^N\alpha_T(i)
\end{aligned}$$
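The recursion above can be checked numerically. Here is a minimal pure-Python sketch of the forward pass; the model is the book's three-box example, for which the book reports $P(O)\approx 0.13022$:

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(o_1, ..., o_{t+1}, i_{t+1} = q_i)  (0-based t)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # alpha_{t+1}(i) = b_i(o_{t+1}) * sum_j a_{ji} alpha_t(j)
        alpha.append([B[i][obs[t]] * sum(prev[j] * A[j][i] for j in range(N))
                      for i in range(N)])
    return alpha

# Three-box model from the book; observation symbols: 0 = red, 1 = white
A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
p_obs = sum(forward(A, B, pi, O)[-1])    # P(O) = sum_i alpha_T(i)
```

Running this gives $P(O)\approx 0.130218$, matching the book's worked example.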

Backward algorithm

Given an HMM $\lambda$, the backward probability $\beta_t(i)=P(o_{t+1},o_{t+2},\cdots,o_T|i_t=q_i)$ is the probability that, conditional on the state at time $t$ being $q_i$, the observations from time $t+1$ to time $T$ are $o_{t+1},o_{t+2},\cdots,o_T$.

$$\begin{aligned}
\beta_T(i)&=1,\quad i=1,2,\cdots,N\\
\beta_t(i)&=P(o_{t+1},o_{t+2},\cdots,o_T|i_t=q_i)\\
&=\sum_{j=1}^N P(o_{t+1},\cdots,o_T,i_{t+1}=q_j|i_t=q_i)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1},\cdots,o_T,i_{t+1}=q_j,i_t=q_i)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1},\cdots,o_T,i_t=q_i|i_{t+1}=q_j)P(i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)P(i_{t+1}=q_j)\quad(\text{d-separation})\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1},\cdots,o_T,i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1}|o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j)P(i_t=q_i|i_{t+1}=q_j)\\
&=\frac{1}{P(i_t=q_i)}\sum_{j=1}^N P(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j|i_t=q_i)P(i_t=q_i)\\
&=\sum_{j=1}^N P(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j|i_t=q_i)\\
&=\sum_{j=1}^N b_j(o_{t+1})\,\beta_{t+1}(j)\,a_{ij},\quad t=T-1,T-2,\cdots,1,\ i=1,2,\cdots,N\\
P(O)&=\sum_{i=1}^N \beta_1(i)\,b_i(o_1)\,\pi_i
\end{aligned}$$
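The backward recursion, run right-to-left, reaches the same $P(O)$ through the termination step $P(O)=\sum_i\pi_i b_i(o_1)\beta_1(i)$. A minimal sketch on the same three-box model used above:

```python
def backward(A, B, obs):
    """beta[t][i] = P(o_{t+2}, ..., o_T | i_{t+1} = q_i)  (0-based t)."""
    N = len(A)
    beta = [[1.0] * N]                                   # beta_T(i) = 1
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        # beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
beta = backward(A, B, O)
# termination: P(O) = sum_i pi_i b_i(o_1) beta_1(i)
p_obs = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(len(pi)))
```

The result again comes out to $P(O)\approx 0.130218$, the same value the forward pass produces.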

Relationship between forward and backward probabilities

$$\begin{aligned}
P(O)&=P(o_1,o_2,\cdots,o_T)\\
&=\sum_{i=1}^N\sum_{j=1}^N P(o_1,\cdots,o_T,i_t=q_i,i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N P(o_1,\cdots,o_t,i_t=q_i|i_{t+1}=q_j)P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)P(i_{t+1}=q_j)\quad(\text{d-separation})\\
&=\sum_{i=1}^N\sum_{j=1}^N P(o_1,\cdots,o_t,i_t=q_i,i_{t+1}=q_j)P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N P(i_{t+1}=q_j|o_1,\cdots,o_t,i_t=q_i)P(o_1,\cdots,o_t,i_t=q_i)P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N P(i_{t+1}=q_j|i_t=q_i)P(o_1,\cdots,o_t,i_t=q_i)P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,P(o_{t+1},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,P(o_{t+1},\cdots,o_T,i_{t+1}=q_j)\frac{1}{P(i_{t+1}=q_j)}\\
&=\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,P(o_{t+1}|o_{t+2},\cdots,o_T,i_{t+1}=q_j)P(o_{t+2},\cdots,o_T,i_{t+1}=q_j)\frac{1}{P(i_{t+1}=q_j)}\\
&=\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,P(o_{t+1}|i_{t+1}=q_j)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j)\\
&=\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,b_j(o_{t+1})\,\beta_{t+1}(j)
\end{aligned}$$
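Notably, this identity holds for every choice of $t$, so it makes a good numerical sanity check on both recursions. A sketch (not from the book) combining the two passes on the same toy model:

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([B[i][obs[t]] * sum(prev[j] * A[j][i] for j in range(N))
                      for i in range(N)])
    return alpha

def backward(A, B, obs):
    N = len(A)
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
alpha, beta = forward(A, B, pi, O), backward(A, B, O)
p_obs = sum(alpha[-1])
# P(O) = sum_{i,j} a_ij alpha_t(i) b_j(o_{t+1}) beta_{t+1}(j), for every t < T
checks = [sum(A[i][j] * alpha[t][i] * B[j][O[t + 1]] * beta[t + 1][j]
              for i in range(3) for j in range(3))
          for t in range(len(O) - 1)]
```

All entries of `checks` should equal `p_obs` up to rounding, regardless of which $t$ was used.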

Computation of some other probabilities and expectations

  1. Given an HMM $\lambda$ and an observation sequence $O$, the probability that the state at time $t$ is $i_t=q_i$:

$$\begin{aligned}
\gamma_t(i)&=P(i_t=q_i|O)=\frac{1}{P(O)}P(O|i_t=q_i)P(i_t=q_i)\\
&=\frac{1}{P(O)}P(o_1,\cdots,o_t|i_t=q_i)P(o_{t+1},\cdots,o_T|i_t=q_i)P(i_t=q_i)\quad(\text{d-separation})\\
&=\frac{1}{P(O)}P(o_1,\cdots,o_t,i_t=q_i)P(o_{t+1},\cdots,o_T|i_t=q_i)\\
&=\frac{\alpha_t(i)\beta_t(i)}{P(O)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{i=1}^N P(i_t=q_i,O)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{i=1}^N\alpha_t(i)\beta_t(i)}
\end{aligned}$$
    Given an HMM $\lambda$ and an observation sequence $O$, the probability that the state is $i_t=q_i$ at time $t$ and $i_{t+1}=q_j$ at time $t+1$:

$$\begin{aligned}
\xi_t(i,j)&=P(i_t=q_i,i_{t+1}=q_j|O)\\
&=\frac{1}{P(O)}P(i_t=q_i,i_{t+1}=q_j,O)\\
&=\frac{a_{ij}\,\alpha_t(i)\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O)}\\
&=\frac{a_{ij}\,\alpha_t(i)\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^N\sum_{j=1}^N a_{ij}\,\alpha_t(i)\,b_j(o_{t+1})\,\beta_{t+1}(j)}
\end{aligned}$$

  2. Some expectations: under observation $O$, the expected number of times state $q_i$ occurs is $\sum_{t=1}^T\gamma_t(i)$;

    the expected number of transitions out of state $q_i$ is $\sum_{t=1}^{T-1}\gamma_t(i)$;

    the expected number of transitions from state $q_i$ to state $q_j$ is $\sum_{t=1}^{T-1}\xi_t(i,j)$.
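The quantities $\gamma_t(i)$ and $\xi_t(i,j)$ fall straight out of one forward and one backward pass. A sketch on the same toy model as above (not from the book), including the consistency check $\gamma_t(i)=\sum_j\xi_t(i,j)$ for $t<T$:

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([B[i][obs[t]] * sum(prev[j] * A[j][i] for j in range(N))
                      for i in range(N)])
    return alpha

def backward(A, B, obs):
    N = len(A)
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0]
N, T = 3, len(O)
alpha, beta = forward(A, B, pi, O), backward(A, B, O)
p_obs = sum(alpha[-1])
# gamma_t(i) = alpha_t(i) beta_t(i) / P(O)
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
# xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O)
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
        for j in range(N)] for i in range(N)] for t in range(T - 1)]
# the three expectations listed above
exp_in_state   = [sum(gamma[t][i] for t in range(T)) for i in range(N)]
exp_out_of     = [sum(gamma[t][i] for t in range(T - 1)) for i in range(N)]
exp_i_to_j     = [[sum(xi[t][i][j] for t in range(T - 1)) for j in range(N)]
                  for i in range(N)]
```

Each row of `gamma` is a probability distribution over states, and each `xi[t]` sums to 1 over all $(i,j)$ pairs.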

3. Learning Algorithms

Supervised learning

When the training data contains both observation sequences and their state sequences, the parameters can be estimated directly by maximum likelihood, i.e. by frequency counts of the observed transitions, emissions, and initial states.

The Baum-Welch algorithm

Given training data consisting of $S$ observation sequences $\{O_1,O_2,\cdots,O_S\}$, we learn $\lambda=(\pi,A,B)$. Treating the observation sequence as observed data $O$ and the state sequence as unobservable data $I$, the HMM can be viewed as a probabilistic model with latent variables, $P(O|\lambda)=\sum_I P(O|I,\lambda)P(I|\lambda)$, and can therefore be learned with the EM algorithm.

  1. Determine the log-likelihood of the complete data. The complete data is $(O,I)=(o_1,o_2,\cdots,o_T,i_1,i_2,\cdots,i_T)$, and its log-likelihood is $\log P(O,I|\lambda)$.

  2. E-step of EM: compute the Q function $Q(\lambda,\overline{\lambda})$.

$Q(\lambda,\overline{\lambda})=\sum_I\log P(O,I|\lambda)\,P(O,I|\overline{\lambda})$, where $\overline{\lambda}$ is the current estimate of the model parameters and $\lambda$ is the parameter to be maximized.

$$\begin{aligned}
P(O,I|\lambda)&=P(o_1|o_2,\cdots,o_T,I,\lambda)P(o_2,\cdots,o_T,I|\lambda)\\
&=P(o_1|i_1,\lambda)P(o_2,\cdots,o_T,I|\lambda)\\
&=P(o_1|i_1,\lambda)P(o_2|i_2,\lambda)P(o_3,\cdots,o_T,I|\lambda)=\cdots\\
&=P(o_1|i_1,\lambda)P(o_2|i_2,\lambda)\cdots P(o_T|i_T,\lambda)P(I|\lambda)\\
&=\prod_{t=1}^T P(o_t|i_t,\lambda)\,P(I|\lambda)\\
&=\prod_{t=1}^T P(o_t|i_t,\lambda)\,P(i_T|i_1,\cdots,i_{T-1},\lambda)P(i_1,\cdots,i_{T-1}|\lambda)\\
&=\prod_{t=1}^T P(o_t|i_t,\lambda)\,P(i_T|i_{T-1},\lambda)P(i_1,\cdots,i_{T-1}|\lambda)=\cdots\\
&=\prod_{t=1}^T P(o_t|i_t,\lambda)\,P(i_T|i_{T-1},\lambda)P(i_{T-1}|i_{T-2},\lambda)\cdots P(i_2|i_1,\lambda)P(i_1|\lambda)\\
&=\pi_{i_1}b_{i_1}(o_1)a_{i_1,i_2}b_{i_2}(o_2)a_{i_2,i_3}\cdots b_{i_{T-1}}(o_{T-1})a_{i_{T-1},i_T}b_{i_T}(o_T)
\end{aligned}$$

  3. M-step of EM: maximize the Q function to obtain the model parameters.

$$Q(\lambda,\overline{\lambda})=\sum_I\log\pi_{i_1}P(O,I|\overline{\lambda})+\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\Big)P(O,I|\overline{\lambda})+\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I|\overline{\lambda})$$

Maximize each of the three terms separately.

  • First term:

$$\begin{aligned}
\sum_I\log\pi_{i_1}P(O,I|\overline{\lambda})&=\sum_{i_1}\sum_{i_2}\cdots\sum_{i_T}\log\pi_{i_1}P(O,I|\overline{\lambda})\\
&=\sum_{i_1}\log\pi_{i_1}\sum_{i_2}\cdots\sum_{i_T}P(O,i_1,i_2,\cdots,i_T|\overline{\lambda})\\
&=\sum_{i_1}\log\pi_{i_1}P(O,i_1|\overline{\lambda})\\
&=\sum_{i=1}^N\log\pi_i\,P(O,i_1=q_i|\overline{\lambda})
\end{aligned}$$
Since $\pi_i$ satisfies the constraint $\sum_{i=1}^N\pi_i=1$, we apply the method of Lagrange multipliers; the corresponding Lagrangian is:

$$\sum_{i=1}^N\log\pi_i\,P(O,i_1=q_i|\overline{\lambda})+\gamma\Big(\sum_{i=1}^N\pi_i-1\Big)$$
Taking the partial derivative with respect to $\pi_i$:

$$\begin{aligned}
&\frac{\partial}{\partial\pi_i}\Big[\sum_{i=1}^N\log\pi_i\,P(O,i_1=q_i|\overline{\lambda})+\gamma\Big(\sum_{i=1}^N\pi_i-1\Big)\Big]=0\\
&\frac{1}{\pi_i}P(O,i_1=q_i|\overline{\lambda})+\gamma=0\\
&P(O,i_1=q_i|\overline{\lambda})+\gamma\pi_i=0\\
&\gamma=\gamma\sum_{i=1}^N\pi_i=-\sum_{i=1}^N P(O,i_1=q_i|\overline{\lambda})=-P(O|\overline{\lambda})
\end{aligned}$$
Substituting back gives $\pi_i=\dfrac{P(O,i_1=q_i|\overline{\lambda})}{P(O|\overline{\lambda})}=\gamma_1(i)$.

  • Second term:

$$\sum_I\Big(\sum_{t=1}^{T-1}\log a_{i_t,i_{t+1}}\Big)P(O,I|\overline{\lambda})=\sum_{t=1}^{T-1}\sum_{i=1}^N\sum_{j=1}^N\log a_{ij}\,P(O,i_t=q_i,i_{t+1}=q_j|\overline{\lambda})$$
Using the constraint $\sum_{j=1}^N a_{ij}=1$ and applying Lagrange multipliers in the same way gives:

$$a_{ij}=\frac{\sum_{t=1}^{T-1}P(O,i_t=q_i,i_{t+1}=q_j|\overline{\lambda})}{\sum_{t=1}^{T-1}P(O,i_t=q_i|\overline{\lambda})}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$

  • Third term:

$$\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)\Big)P(O,I|\overline{\lambda})=\sum_{t=1}^T\sum_{i=1}^N\log b_i(o_t)\,P(O,i_t=q_i|\overline{\lambda})$$
Using the constraint $\sum_{k=1}^M b_i(k)=1$ (with $M$ the number of observation symbols), apply Lagrange multipliers:

$$\begin{aligned}
&\frac{\partial}{\partial b_i(k)}\Big[\sum_{t=1}^T\sum_{i=1}^N\log b_i(o_t)P(O,i_t=q_i|\overline{\lambda})+\gamma\Big(\sum_{k=1}^M b_i(k)-1\Big)\Big]=0\\
&\sum_{t=1}^T\frac{1}{b_i(o_t)}\frac{\partial b_i(o_t)}{\partial b_i(k)}P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^T\frac{1}{b_i(o_t)}I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^T\frac{1}{b_i(k)}I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma=0\\
&\sum_{t=1}^T I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\gamma b_i(k)=0\\
&\sum_{k=1}^M\sum_{t=1}^T I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})+\sum_{k=1}^M\gamma b_i(k)=0\\
&\sum_{t=1}^T P(O,i_t=q_i|\overline{\lambda})\sum_{k=1}^M I(o_t=v_k)+\sum_{k=1}^M\gamma b_i(k)=0\\
&\gamma=-\sum_{t=1}^T P(O,i_t=q_i|\overline{\lambda})\\
&b_i(k)=\frac{\sum_{t=1}^T I(o_t=v_k)P(O,i_t=q_i|\overline{\lambda})}{\sum_{t=1}^T P(O,i_t=q_i|\overline{\lambda})}=\frac{\sum_{t=1}^T I(o_t=v_k)\gamma_t(i)}{\sum_{t=1}^T\gamma_t(i)}
\end{aligned}$$
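Putting the three M-step updates together, one Baum-Welch iteration on a single observation sequence can be sketched as follows. The model values are illustrative placeholders; the E-step reuses the forward/backward recursions derived earlier:

```python
def baum_welch_step(A, B, pi, obs):
    """One EM iteration: E-step computes gamma/xi, M-step applies the
    closed-form updates pi_i = gamma_1(i), a_ij, b_i(k) derived above.
    Returns the updated parameters and P(O | old parameters)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    # E-step: forward pass
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([B[i][obs[t]] * sum(prev[j] * A[j][i] for j in range(N))
                      for i in range(N)])
    # E-step: backward pass
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    p_obs = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: the three closed-form updates
    pi_new = gamma[0]
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    B_new = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return A_new, B_new, pi_new, p_obs

A  = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O  = [0, 1, 0, 0, 1]
A1, B1, pi1, p0 = baum_welch_step(A, B, pi, O)
_, _, _, p1 = baum_welch_step(A1, B1, pi1, O)   # likelihood after one update
```

Since this is exact EM, the likelihood is non-decreasing: `p1 >= p0`. In practice one iterates until the increase falls below a tolerance, and uses scaling or log-space arithmetic for long sequences, since raw $\alpha_t$, $\beta_t$ underflow quickly.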
