统计学习方法 (Li Hang) – Hidden Markov Model – Formula Derivations

  • State sequence probability $P(I \mid \lambda)$
  • Observation sequence probability given the state sequence $P(O \mid I, \lambda)$
  • Complete-data probability $P(O, I \mid \lambda)$
  • Observation sequence probability with the state sequence unknown $P(O \mid \lambda)$

State sequence probability $P(I \mid \lambda)$

The probability $P(I \mid \lambda)$ of a state sequence $I = (i_1, i_2, \cdots, i_T)$ of length $T$ can be written as
$$
P(I \mid \lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{T-1} i_T} \tag{1}
$$

The derivation is as follows:
step1:
$$
P(I \mid \lambda) = P(i_1, i_2, \cdots, i_T \mid \lambda)
$$
The first thing to note is that $P(I \mid \lambda)$ is by nature the joint probability of the states $I = (i_1, i_2, \cdots, i_T)$, which is why it can be written in the form on the right-hand side.

step2:
By the relation between joint and conditional probability
$$
P(AB) = P(A \mid B)\, P(B) \tag{2}
$$
the probability $P(i_1, i_2, \cdots, i_T \mid \lambda)$ can be expanded step by step:
$$
\begin{aligned}
P(i_1, i_2, \cdots, i_T \mid \lambda) &= P(i_T \mid i_{T-1}, \cdots, i_1, \lambda)\, P(i_{T-1}, i_{T-2}, \cdots, i_1 \mid \lambda) \\
P(i_{T-1}, i_{T-2}, \cdots, i_1 \mid \lambda) &= P(i_{T-1} \mid i_{T-2}, \cdots, i_1, \lambda)\, P(i_{T-2}, i_{T-3}, \cdots, i_1 \mid \lambda) \\
&\;\;\vdots \\
P(i_2, i_1 \mid \lambda) &= P(i_2 \mid i_1, \lambda)\, P(i_1 \mid \lambda)
\end{aligned} \tag{3}
$$

Applying (2) repeatedly in this way expands $P(i_1, i_2, \cdots, i_T \mid \lambda)$ into
$$
P(i_1, i_2, \cdots, i_T \mid \lambda) = P(i_T \mid i_{T-1}, \cdots, i_1, \lambda)\, P(i_{T-1} \mid i_{T-2}, \cdots, i_1, \lambda) \cdots P(i_2 \mid i_1, \lambda)\, P(i_1 \mid \lambda) \tag{4}
$$

step3:
By definition, the hidden Markov model rests on two basic assumptions; the homogeneous Markov assumption is one of them and can be stated as follows:

Homogeneous Markov assumption
The state of the hidden Markov chain at any time $t$ depends only on the state at the previous time; it is independent of the states and observations at all other times, and also independent of the time $t$ itself:
$$
P(i_t \mid i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(i_t \mid i_{t-1}), \quad t = 1, 2, \cdots, T \tag{5}
$$
(The above follows 统计学习方法, 李航.)

By the homogeneity of the hidden Markov chain:
$$
\begin{aligned}
P(i_T \mid i_{T-1}, \cdots, i_1, \lambda) &= P(i_T \mid i_{T-1}, \lambda) \\
P(i_{T-1} \mid i_{T-2}, \cdots, i_1, \lambda) &= P(i_{T-1} \mid i_{T-2}, \lambda) \\
&\;\;\vdots \\
P(i_3 \mid i_2, i_1, \lambda) &= P(i_3 \mid i_2, \lambda)
\end{aligned} \tag{6}
$$

step4:
Substituting (6) into (4) gives
$$
P(I \mid \lambda) = P(i_1, i_2, \cdots, i_T \mid \lambda) = P(i_T \mid i_{T-1}, \lambda)\, P(i_{T-1} \mid i_{T-2}, \lambda) \cdots P(i_2 \mid i_1, \lambda)\, P(i_1 \mid \lambda) \tag{7}
$$

Since $P(i_1 \mid \lambda) = \pi_{i_1}$ (the initial state probability) and $P(i_t \mid i_{t-1}, \lambda) = a_{i_{t-1} i_t}$ (the transition probability), we obtain
$$
P(I \mid \lambda) = \pi_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{T-1} i_T} \tag{8}
$$
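
To make (8) concrete, here is a minimal numeric sketch in Python; the two-state model, the transition matrix `A`, and the initial distribution `pi` are made-up illustration values, not taken from the book:

```python
import numpy as np

# Made-up two-state HMM parameters, purely for illustration.
pi = np.array([0.6, 0.4])           # initial distribution, pi[i] = pi_i
A = np.array([[0.7, 0.3],           # transition matrix, A[i, j] = a_{ij}
              [0.4, 0.6]])

def state_sequence_prob(I, pi, A):
    """P(I | lambda) = pi_{i_1} * a_{i_1 i_2} * ... * a_{i_{T-1} i_T}, Eq. (8)."""
    p = pi[I[0]]
    for prev, cur in zip(I[:-1], I[1:]):
        p *= A[prev, cur]
    return p

# P(I = (0, 1, 1) | lambda) = 0.6 * 0.3 * 0.6 = 0.108
print(state_sequence_prob([0, 1, 1], pi, A))
```

The same toy parameters are reused in the sketches further below.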

Observation sequence probability given the state sequence $P(O \mid I, \lambda)$

For a fixed state sequence $I = (i_1, i_2, \cdots, i_T)$, the probability $P(O \mid I, \lambda)$ of the corresponding observation sequence $O = (o_1, o_2, \cdots, o_T)$ is

$$
P(O \mid I, \lambda) = b_{i_1}(o_1)\, b_{i_2}(o_2) \cdots b_{i_T}(o_T) \tag{2.1}
$$

The derivation is as follows:

step1:
$$
P(O \mid I, \lambda) = P(o_1, o_2, \cdots, o_T \mid I, \lambda) \tag{2.2}
$$
The first thing to note is that $P(O \mid I, \lambda)$ is by nature the joint probability of the observations $O = (o_1, o_2, \cdots, o_T)$ conditioned on $I$, which is why it can be written in the form on the right-hand side.

step2:
By the relation between joint and conditional probability $P(AB) = P(A \mid B)\, P(B)$,
the probability $P(o_1, o_2, \cdots, o_T \mid I, \lambda)$ can be expanded step by step:
$$
\begin{aligned}
P(o_1, o_2, \cdots, o_T \mid I, \lambda) &= P(o_T \mid o_{T-1}, o_{T-2}, \cdots, o_1, I, \lambda)\, P(o_{T-1}, o_{T-2}, \cdots, o_1 \mid I, \lambda) \\
P(o_{T-1}, o_{T-2}, \cdots, o_1 \mid I, \lambda) &= P(o_{T-1} \mid o_{T-2}, o_{T-3}, \cdots, o_1, I, \lambda)\, P(o_{T-2}, o_{T-3}, \cdots, o_1 \mid I, \lambda) \\
&\;\;\vdots \\
P(o_2, o_1 \mid I, \lambda) &= P(o_2 \mid o_1, I, \lambda)\, P(o_1 \mid I, \lambda)
\end{aligned} \tag{2.3}
$$

Substituting the above into (2.2), $P(o_1, o_2, \cdots, o_T \mid I, \lambda)$ becomes

$$
P(o_1, o_2, \cdots, o_T \mid I, \lambda) = P(o_T \mid o_{T-1}, \cdots, o_1, I, \lambda)\, P(o_{T-1} \mid o_{T-2}, \cdots, o_1, I, \lambda) \cdots P(o_2 \mid o_1, I, \lambda)\, P(o_1 \mid I, \lambda) \tag{2.4}
$$

step3:
By definition, the other basic assumption of the hidden Markov model is the observation independence assumption, which can be stated as follows:

Observation independence assumption
The observation at any time depends only on the state of the Markov chain at that time; it is independent of all other states and observations:
$$
P(o_t \mid i_T, o_T, \cdots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(o_t \mid i_t), \quad t = 1, 2, \cdots, T \tag{2.5}
$$
(The above follows 统计学习方法, 李航.)

By the observation independence of the hidden Markov model:
$$
\begin{aligned}
P(o_T \mid o_{T-1}, o_{T-2}, \cdots, o_1, I, \lambda) &= P(o_T \mid i_T, \lambda) \\
P(o_{T-1} \mid o_{T-2}, o_{T-3}, \cdots, o_1, I, \lambda) &= P(o_{T-1} \mid i_{T-1}, \lambda) \\
&\;\;\vdots \\
P(o_2 \mid o_1, I, \lambda) &= P(o_2 \mid i_2, \lambda) \\
P(o_1 \mid I, \lambda) &= P(o_1 \mid i_1, \lambda)
\end{aligned} \tag{2.6}
$$

step4:
Substituting (2.6) into (2.4) gives
$$
P(O \mid I, \lambda) = P(o_1, o_2, \cdots, o_T \mid I, \lambda) = P(o_T \mid i_T, \lambda)\, P(o_{T-1} \mid i_{T-1}, \lambda) \cdots P(o_2 \mid i_2, \lambda)\, P(o_1 \mid i_1, \lambda) \tag{2.7}
$$

Since $P(o_t \mid i_t, \lambda) = b_{i_t}(o_t)$ (the emission probability), we obtain
$$
P(O \mid I, \lambda) = b_{i_1}(o_1)\, b_{i_2}(o_2) \cdots b_{i_T}(o_T) \tag{2.8}
$$
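
Likewise, (2.8) is a single product over emission probabilities. A minimal sketch, with a made-up emission matrix `B` over three observation symbols (again illustration values only):

```python
import numpy as np

# Made-up emission matrix: B[i, k] = b_i(k), 2 states x 3 observation symbols.
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

def obs_prob_given_states(O, I, B):
    """P(O | I, lambda) = b_{i_1}(o_1) * ... * b_{i_T}(o_T), Eq. (2.8)."""
    p = 1.0
    for state, obs in zip(I, O):
        p *= B[state, obs]
    return p

# P(O = (0, 2, 1) | I = (0, 1, 1), lambda) = 0.5 * 0.6 * 0.3 = 0.09
print(obs_prob_given_states([0, 2, 1], [0, 1, 1], B))
```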

Complete-data probability $P(O, I \mid \lambda)$

The complete data is defined as $\{I, O\}$, where $I$ is the state sequence and $O$ is the observation sequence.
The probability of the complete data, $P(O, I \mid \lambda)$, is
$$
P(O, I \mid \lambda) = P(O \mid I, \lambda)\, P(I \mid \lambda) = \pi_{i_1} b_{i_1}(o_1)\, a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)
$$

The derivation is immediate: multiply (1) and (2.1) together and interleave the factors by time step.
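
Combining the two sketches above gives the complete-data probability directly; `pi`, `A`, and `B` are the same made-up illustration values:

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def complete_data_prob(O, I, pi, A, B):
    """P(O, I | lambda) = pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) ... b_{i_T}(o_T)."""
    p = pi[I[0]] * B[I[0], O[0]]
    for t in range(1, len(I)):
        p *= A[I[t - 1], I[t]] * B[I[t], O[t]]
    return p

# Equals the product of the two previous results: 0.108 * 0.09 = 0.00972
print(complete_data_prob([0, 2, 1], [0, 1, 1], pi, A, B))
```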

Observation sequence probability with the state sequence unknown $P(O \mid \lambda)$

The observation sequence probability $P(O \mid \lambda)$ can be expanded as
$$
P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda)\, P(I \mid \lambda) = \sum_{i_1, i_2, \cdots, i_T} \pi_{i_1} b_{i_1}(o_1)\, a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)
$$

The derivation is as follows:
step1:
Expand the observation sequence probability $P(O \mid \lambda)$ by the law of total probability, marginalizing over all possible state sequences $I$:
$$
P(O \mid \lambda) = \sum_{I} P(O, I \mid \lambda) = \sum_{I} P(O \mid I, \lambda)\, P(I \mid \lambda)
$$

step2:
Substituting (1) and (2.1) into the sum gives the result.
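
A brute-force sketch of this sum, enumerating all $N^T$ state sequences with the same made-up parameters; this is exponential in $T$ and exists only to mirror the formula (the forward algorithm computes the same value in $O(N^2 T)$ time):

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def observation_prob(O, pi, A, B):
    """P(O | lambda) = sum over all I of P(O | I, lambda) * P(I | lambda)."""
    N, T = len(pi), len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):  # all N^T state sequences
        p = pi[I[0]] * B[I[0], O[0]]
        for t in range(1, T):
            p *= A[I[t - 1], I[t]] * B[I[t], O[t]]
        total += p
    return total

print(observation_prob([0, 2, 1], pi, A, B))
```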
