These notes are compiled from the books by Li Hang and Zhou Zhihua.
The variables of a hidden Markov model (HMM) fall into two groups. The first group consists of the state variables $\{y_1, y_2, \ldots, y_T\}$, where $y_i \in \mathbb{Y}$ is the system state at time $i$. These state variables are usually hidden and unobservable, so they are also called hidden variables. The second group consists of the observation variables $\{x_1, x_2, \ldots, x_T\}$, where $x_i \in \mathbb{X}$ is the observation at time $i$. Each state variable $y_i$ switches among $N$ possible states, that is, the state space is $\mathbb{Y} = \{s_1, s_2, \ldots, s_N\}$. Each observation variable takes one of $M$ possible values, that is, the observation space is $\mathbb{X} = \{o_1, o_2, \ldots, o_M\}$.
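The figure shows the dependencies among the variables. At any time, the value of the observation variable depends only on the state variable at that time: $x_t$ is determined by $y_t$ alone and is independent of the values of the other state variables. Likewise, the state $y_t$ at time $t$ depends only on the state $y_{t-1}$ at time $t-1$ and is independent of the remaining state variables. This is the "Markov chain" property: the next state of the system is determined by the current state alone and does not depend on any earlier state.
Beyond this structure, specifying a hidden Markov model requires three sets of parameters: the initial state probabilities, the state transition probabilities, and the output observation probabilities.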
Initial state probabilities: the probabilities of each state for the state variable $y_1$ at the initial time, written $\Pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i = P(y_1 = s_i)$, $1 \leqslant i \leqslant N$, is the probability that the initial state of the model is $s_i$. Below, this is also written $\pi_{y_1}$, where the subscript $y_1$ stands for the index $i$ of the state taken by $y_1$.
State transition probabilities: the probabilities of moving between states, written as the matrix $\mathbf{A} = [a_{ij}]_{N \times N}$, where $a_{ij} = P(y_{t+1} = s_j \mid y_t = s_i)$, $1 \leqslant i, j \leqslant N$, is the probability that, if the state at any time $t$ is $s_i$, the state at the next time is $s_j$. The subscripts $ij$ are used flexibly below: when $i$ is written as $y_t$, it means the state variable takes the value $y_t = s_i$; when $j$ is written as $y_{t+1}$, it means $y_{t+1} = s_j$.
Output observation probabilities: the probabilities of each observation value given the current state, written as the matrix $\mathbf{B} = [b_{ij}]_{N \times M}$, where $b_{ij} = P(x_t = o_j \mid y_t = s_i)$, $1 \leqslant i \leqslant N$, $1 \leqslant j \leqslant M$, is the probability that, if the state at any time $t$ is $s_i$, the value $o_j$ is observed. The subscripts $ij$ are again used flexibly below: when $i$ is written as $y_t$, it means the state variable at that time is $y_t = s_i$; when $j$ is written as $x_t$, it means the observation variable at that time is $x_t = o_j$.
Once the state space $\mathbb{Y}$, the observation space $\mathbb{X}$, and the three sets of parameters above are fixed, a hidden Markov model $\lambda = [\mathbf{A}, \mathbf{B}, \Pi]$ is determined. It generates an observation sequence $\{x_1, x_2, \ldots, x_T\}$ by the following process:
1) Set $t = 1$ and choose the initial state $y_1$ according to the initial state probabilities $\Pi$;
2) Choose the observation value $x_t$ according to the state $y_t$ and the output observation probabilities $\mathbf{B}$;
3) Move to the next state according to the state $y_t$ and the state transition matrix $\mathbf{A}$, that is, determine $y_{t+1}$;
4) If $t < T$, set $t = t + 1$ and go back to step 2; otherwise stop.
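The four steps above can be sketched as a small sampler. This is only an illustration under stated assumptions: the function name `sample_hmm`, the 0-based integer encoding of states and observations, and the parameter values are all chosen here for demonstration and are not part of the original notes.

```python
import random

def sample_hmm(A, B, pi, T, rng):
    """Draw (states, observations) of length T from an HMM.

    States and observations are encoded as 0-based indices
    (an assumption of this sketch).
    """
    n_states = len(pi)
    n_obs = len(B[0])
    states, observations = [], []
    # Step 1: choose the initial state y_1 from the initial distribution pi.
    y = rng.choices(range(n_states), weights=pi)[0]
    for t in range(T):
        # Step 2: choose the observation x_t from row y of B.
        x = rng.choices(range(n_obs), weights=B[y])[0]
        states.append(y)
        observations.append(x)
        # Steps 3 and 4: if more steps remain, move to y_{t+1} using row y of A.
        if t < T - 1:
            y = rng.choices(range(n_states), weights=A[y])[0]
    return states, observations

# Illustrative parameters: 3 hidden states, 2 observation values.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]

states, observations = sample_hmm(A, B, pi, T=5, rng=random.Random(0))
print(states, observations)
```

The sketch skips the final transition because $y_{T+1}$ is never used; otherwise it follows the listed steps one for one.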
The three basic problems of hidden Markov models:
1) Given a model $\lambda = [\mathbf{A}, \mathbf{B}, \Pi]$, how do we compute the probability $P(\mathbf{x} \mid \lambda)$ of an observation sequence $\mathbf{x} = \{x_1, x_2, \cdots, x_T\}$, that is, how do we evaluate how well the model matches the observation sequence?
2) Given a model $\lambda = [\mathbf{A}, \mathbf{B}, \Pi]$ and an observation sequence $\mathbf{x} = \{x_1, x_2, \cdots, x_T\}$, how do we find the state sequence $\mathbf{y} = \{y_1, y_2, \cdots, y_T\}$ that best matches the observations, that is, how do we infer the hidden states from the observations?
3) Given an observation sequence $\mathbf{x} = \{x_1, x_2, \cdots, x_T\}$, how do we adjust the model parameters $\lambda = [\mathbf{A}, \mathbf{B}, \Pi]$ so that the probability $P(\mathbf{x} \mid \lambda)$ of the sequence is maximized, that is, how do we train the model?
Suppose the model $\lambda = [\mathbf{A}, \mathbf{B}, \Pi]$ and the observation sequence $\mathbf{x} = \{x_1, x_2, \cdots, x_T\}$ are given. For a state sequence $\mathbf{y} = \{y_1, y_2, \cdots, y_T\}$, the state variable $y_t$ at each time can take any of the $N$ values $\{s_1, s_2, \ldots, s_N\}$. The direct method enumerates all possible state sequences $\mathbf{y}$ ($N^T$ of them) and sums over them to obtain $P(\mathbf{x} \mid \lambda)$.
The probability of a particular state sequence $\mathbf{y} = \{y_1, y_2, \cdots, y_T\}$ is
$$P(\mathbf{y} \mid \lambda) = \pi_{y_1} a_{y_1 y_2} a_{y_2 y_3} \cdots a_{y_{T-1} y_T},$$
and the probability of the observation sequence $\mathbf{x} = \{x_1, x_2, \cdots, x_T\}$ given that state sequence is
$$P(\mathbf{x} \mid \mathbf{y}, \lambda) = b_{y_1 x_1} b_{y_2 x_2} \cdots b_{y_T x_T}.$$
The joint probability of $\mathbf{x}$ and $\mathbf{y}$ is therefore
$$P(\mathbf{x}, \mathbf{y} \mid \lambda) = P(\mathbf{x} \mid \mathbf{y}, \lambda) P(\mathbf{y} \mid \lambda) = \pi_{y_1} b_{y_1 x_1} a_{y_1 y_2} b_{y_2 x_2} \cdots a_{y_{T-1} y_T} b_{y_T x_T}.$$
Summing over all $N^T$ state sequences $\mathbf{y}$ gives the probability of the observation sequence $\mathbf{x}$:
$$P(\mathbf{x} \mid \lambda) = \sum_{\mathbf{y}} P(\mathbf{x} \mid \mathbf{y}, \lambda) P(\mathbf{y} \mid \lambda) = \sum_{y_1, y_2, \cdots, y_T} \pi_{y_1} b_{y_1 x_1} a_{y_1 y_2} b_{y_2 x_2} \cdots a_{y_{T-1} y_T} b_{y_T x_T}.$$
The direct method is extremely expensive, of order $O(T N^T)$. An efficient alternative is the forward-backward algorithm.
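To make the cost of the direct method concrete, here is a minimal brute-force sketch that enumerates every state sequence and sums the joint probabilities. The function name `brute_force_likelihood`, the 0-based encoding, and the parameter values are assumptions made for illustration.

```python
from itertools import product

def brute_force_likelihood(A, B, pi, obs):
    """Compute P(x | lambda) by summing P(x, y | lambda) over all N^T state sequences."""
    n_states = len(pi)
    total = 0.0
    for y in product(range(n_states), repeat=len(obs)):
        # Joint probability: pi_{y1} b_{y1 x1} a_{y1 y2} b_{y2 x2} ... a_{y(T-1) yT} b_{yT xT}
        p = pi[y[0]] * B[y[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[y[t - 1]][y[t]] * B[y[t]][obs[t]]
        total += p
    return total

# Illustrative parameters: 3 hidden states, 2 observation values (0 and 1).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

likelihood = brute_force_likelihood(A, B, pi, obs)
print(round(likelihood, 6))  # 0.130218
```

With $N = 3$ and $T = 3$ the loop visits only 27 sequences, but the count grows as $N^T$, which is why this approach is useful only as a correctness check on tiny inputs.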
Forward probability: given a hidden Markov model $\lambda$, the forward probability is the probability that the partial observation sequence up to time $t$ is $x_1, x_2, \cdots, x_t$ and the state at time $t$ is $s_i$, written
$$\alpha_t(i) = P(x_1, x_2, \cdots, x_t, y_t = s_i \mid \lambda).$$
The forward probabilities $\alpha_t(i)$ and the observation sequence probability $P(\mathbf{x} \mid \lambda)$ can be computed recursively.
Algorithm:
1) Initialization: $\alpha_1(i) = \pi_i b_{i x_1}$, $i = 1, 2, \cdots, N$.
2) Recursion: for $t = 1, 2, \cdots, T - 1$, $\alpha_{t+1}(i) = \left[ \sum\limits_{j=1}^{N} \alpha_t(j) a_{ji} \right] b_{i x_{t+1}}$, $i = 1, 2, \cdots, N$.
3) Termination: $P(\mathbf{x} \mid \lambda) = \sum\limits_{i=1}^{N} \alpha_T(i)$.
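The three steps of the forward algorithm translate almost line for line into code. The sketch below is one possible rendering, not a reference implementation: the name `forward`, the list-of-lists layout with `alpha[t][i]` 0-based in both indices, and the parameter values are assumptions for illustration.

```python
def forward(A, B, pi, obs):
    """Return (alpha, likelihood), where alpha[t][i] corresponds to alpha_{t+1}(i+1)."""
    n_states = len(pi)
    # 1) Initialization: alpha_1(i) = pi_i * b_{i x_1}
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n_states)]]
    # 2) Recursion: alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] * b_{i x_{t+1}}
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n_states)) * B[i][obs[t]]
            for i in range(n_states)
        ])
    # 3) Termination: P(x | lambda) = sum_i alpha_T(i)
    return alpha, sum(alpha[-1])

# Illustrative parameters: 3 hidden states, 2 observation values (0 and 1).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

alpha, likelihood = forward(A, B, pi, obs)
print(round(likelihood, 6))  # 0.130218
```

Each time step costs $N^2$ multiplications, since every one of the $N$ new values sums over $N$ previous ones. For long sequences the values of `alpha` underflow, so practical code usually rescales each step or works in log space; that is omitted here.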
Example:
Consider the box-and-ball model $\lambda = (\mathbf{A}, \mathbf{B}, \Pi)$ with state set $S = \{1, 2, 3\}$ and observation set $O = \{\text{red}, \text{white}\}$, where
$$\mathbf{A} = \left[ \begin{matrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{matrix} \right], \quad \mathbf{B} = \left[ \begin{matrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{matrix} \right], \quad \Pi = (0.2, 0.4, 0.4)^T.$$
Let $T = 3$ and the observation sequence be $\mathbf{x} = \{\text{red}, \text{white}, \text{red}\}$.
With rows and columns labeled ($o_1$ = red, $o_2$ = white), $\mathbf{A}$ and $\mathbf{B}$ read:
$$\begin{matrix} & s_1 & s_2 & s_3 \\ s_1 & 0.5 & 0.2 & 0.3 \\ s_2 & 0.3 & 0.5 & 0.2 \\ s_3 & 0.2 & 0.3 & 0.5 \end{matrix} \qquad \begin{matrix} & o_1 & o_2 \\ s_1 & 0.5 & 0.5 \\ s_2 & 0.4 & 0.6 \\ s_3 & 0.7 & 0.3 \end{matrix}$$
1) Compute the initial values ($t = 1$):
$$\begin{aligned} \alpha_1(1) &= \pi_1 b_{1 x_1} = 0.2 \times 0.5 = 0.10 \\ \alpha_1(2) &= \pi_2 b_{2 x_1} = 0.4 \times 0.4 = 0.16 \\ \alpha_1(3) &= \pi_3 b_{3 x_1} = 0.4 \times 0.7 = 0.28 \end{aligned}$$
2) Recursion
$t = 2$:
$$\begin{aligned} \alpha_2(1) &= \left[ \sum_{i=1}^{3} \alpha_1(i) a_{i1} \right] b_{1 x_2} = (0.10 \times 0.5 + 0.16 \times 0.3 + 0.28 \times 0.2) \times 0.5 = 0.077 \\ \alpha_2(2) &= \left[ \sum_{i=1}^{3} \alpha_1(i) a_{i2} \right] b_{2 x_2} = (0.10 \times 0.2 + 0.16 \times 0.5 + 0.28 \times 0.3) \times 0.6 = 0.1104 \\ \alpha_2(3) &= \left[ \sum_{i=1}^{3} \alpha_1(i) a_{i3} \right] b_{3 x_2} = (0.10 \times 0.3 + 0.16 \times 0.2 + 0.28 \times 0.5) \times 0.3 = 0.0606 \end{aligned}$$
$t = 3$:
$$\begin{aligned} \alpha_3(1) &= \left[ \sum_{i=1}^{3} \alpha_2(i) a_{i1} \right] b_{1 x_3} = (0.077 \times 0.5 + 0.1104 \times 0.3 + 0.0606 \times 0.2) \times 0.5 = 0.04187 \\ \alpha_3(2) &= \left[ \sum_{i=1}^{3} \alpha_2(i) a_{i2} \right] b_{2 x_3} = (0.077 \times 0.2 + 0.1104 \times 0.5 + 0.0606 \times 0.3) \times 0.4 = 0.035512 \\ \alpha_3(3) &= \left[ \sum_{i=1}^{3} \alpha_2(i) a_{i3} \right] b_{3 x_3} = (0.077 \times 0.3 + 0.1104 \times 0.2 + 0.0606 \times 0.5) \times 0.7 = 0.052836 \end{aligned}$$
3) Termination:
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{3} \alpha_3(i) = 0.04187 + 0.035512 + 0.052836 = 0.130218$$
Java implementation of this part.
Backward probability: given a hidden Markov model $\lambda$, the backward probability is the probability that, given the state at time $t$ is $s_i$, the observation sequence from time $t + 1$ to $T$ is $x_{t+1}, x_{t+2}, \cdots, x_T$, written
$$\beta_t(i) = P(x_{t+1}, x_{t+2}, \cdots, x_T \mid y_t = s_i, \lambda).$$
The backward probabilities $\beta_t(i)$ and the observation sequence probability $P(\mathbf{x} \mid \lambda)$ can be computed recursively.
1) Initialization: $\beta_T(i) = 1$, $i = 1, 2, \cdots, N$.
2) Recursion: for $t = T - 1, T - 2, \cdots, 1$, $\beta_t(i) = \sum\limits_{j=1}^{N} a_{ij} b_{j x_{t+1}} \beta_{t+1}(j)$, $i = 1, 2, \cdots, N$.
3) Termination: $P(\mathbf{x} \mid \lambda) = \sum\limits_{i=1}^{N} \pi_i b_{i x_1} \beta_1(i)$.
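The backward recursion can be sketched in the same style as the forward one. As before, the name `backward`, the 0-based `beta[t][i]` layout, and the parameter values are assumptions made for illustration.

```python
def backward(A, B, pi, obs):
    """Return (beta, likelihood), where beta[t][i] corresponds to beta_{t+1}(i+1)."""
    n_states = len(pi)
    T = len(obs)
    # 1) Initialization: beta_T(i) = 1
    beta = [[1.0] * n_states for _ in range(T)]
    # 2) Recursion, from t = T-1 down to 1 (0-based: T-2 down to 0):
    #    beta_t(i) = sum_j a_{ij} b_{j x_{t+1}} beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n_states)
            )
    # 3) Termination: P(x | lambda) = sum_i pi_i b_{i x_1} beta_1(i)
    likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(n_states))
    return beta, likelihood

# Illustrative parameters: 3 hidden states, 2 observation values (0 and 1).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

beta, likelihood = backward(A, B, pi, obs)
print(round(likelihood, 6))  # 0.130218
```

Note that $\Pi$ and the first emission enter only at the termination step; the recursion itself uses $\mathbf{A}$, $\mathbf{B}$, and the observations from $x_2$ onward.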
Example:
The model is the same as above.
1) Compute the initial values ($t = 3$):
$$\beta_3(1) = 1, \quad \beta_3(2) = 1, \quad \beta_3(3) = 1$$
2) Recursion
$t = 2$:
$$\begin{aligned} \beta_2(1) &= \sum_{j=1}^{3} a_{1j} b_{j x_3} \beta_3(j) = 0.5 \times 0.5 \times 1 + 0.2 \times 0.4 \times 1 + 0.3 \times 0.7 \times 1 = 0.54 \\ \beta_2(2) &= \sum_{j=1}^{3} a_{2j} b_{j x_3} \beta_3(j) = 0.3 \times 0.5 \times 1 + 0.5 \times 0.4 \times 1 + 0.2 \times 0.7 \times 1 = 0.49 \\ \beta_2(3) &= \sum_{j=1}^{3} a_{3j} b_{j x_3} \beta_3(j) = 0.2 \times 0.5 \times 1 + 0.3 \times 0.4 \times 1 + 0.5 \times 0.7 \times 1 = 0.57 \end{aligned}$$
$t = 1$:
$$\begin{aligned} \beta_1(1) &= \sum_{j=1}^{3} a_{1j} b_{j x_2} \beta_2(j) = 0.5 \times 0.5 \times 0.54 + 0.2 \times 0.6 \times 0.49 + 0.3 \times 0.3 \times 0.57 = 0.2451 \\ \beta_1(2) &= \sum_{j=1}^{3} a_{2j} b_{j x_2} \beta_2(j) = 0.3 \times 0.5 \times 0.54 + 0.5 \times 0.6 \times 0.49 + 0.2 \times 0.3 \times 0.57 = 0.2622 \\ \beta_1(3) &= \sum_{j=1}^{3} a_{3j} b_{j x_2} \beta_2(j) = 0.2 \times 0.5 \times 0.54 + 0.3 \times 0.6 \times 0.49 + 0.5 \times 0.3 \times 0.57 = 0.2277 \end{aligned}$$
3) Termination:
$$P(\mathbf{x} \mid \lambda) = 0.2 \times 0.5 \times 0.2451 + 0.4 \times 0.4 \times 0.2622 + 0.4 \times 0.7 \times 0.2277 = 0.130218$$
The forward and the backward algorithm both have a computational cost of order $O(N^2 T)$. The two can be written in a unified form as
$$P(\mathbf{x} \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_{j x_{t+1}} \beta_{t+1}(j), \quad t = 1, 2, \cdots, T - 1.$$
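The unified formula says the same likelihood is obtained at every cut point $t$. The sketch below checks this numerically; the helper names (`forward`, `backward`, `unified_likelihood`), the 0-based indexing, and the parameter values are assumptions made for illustration.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(x_1..x_{t+1}, y_{t+1} = s_{i+1} | lambda), 0-based t and i."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                      for i in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i] = P(x_{t+2}..x_T | y_{t+1} = s_{i+1}, lambda), 0-based t and i."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta

def unified_likelihood(alpha, beta, A, B, obs, t):
    """P(x | lambda) evaluated at the 0-based cut point t, with 0 <= t <= T-2."""
    n = len(A)
    return sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
               for i in range(n) for j in range(n))

# Illustrative parameters: 3 hidden states, 2 observation values (0 and 1).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]

alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
values = [unified_likelihood(alpha, beta, A, B, obs, t) for t in range(len(obs) - 1)]
print([round(v, 6) for v in values])  # [0.130218, 0.130218]
```

Agreement across all cut points is a convenient sanity check when implementing both recursions, since an indexing mistake in either one usually breaks it.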
Java implementation of this part.
1. Given the model $\lambda$ and the observations $\mathbf{x}$, the probability of being in state $s_i$ at time $t$ is written $\gamma_t(i) = P(y_t = s_i \mid \mathbf{x}, \lambda)$.
From the forward and backward probabilities, $\alpha_t(i) \beta_t(i) = P(y_t = s_i, \mathbf{x} \mid \lambda)$. Hence
$$\gamma_t(i) = P(y_t = s_i \mid \mathbf{x}, \lambda) = \frac{P(y_t = s_i, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum\limits_{j=1}^{N} \alpha_t(j) \beta_t(j)}.$$
2. Given the model $\lambda$ and the observations $\mathbf{x}$, the probability of being in state $s_i$ at time $t$ and in state $s_j$ at time $t + 1$ is written $\xi_t(i, j) = P(y_t = s_i, y_{t+1} = s_j \mid \mathbf{x}, \lambda)$. From the forward and backward probabilities, $\alpha_t(i) a_{ij} b_{j x_{t+1}} \beta_{t+1}(j) = P(y_t = s_i, y_{t+1} = s_j, \mathbf{x} \mid \lambda)$. Hence
$$\xi_t(i, j) = \frac{P(y_t = s_i, y_{t+1} = s_j, \mathbf{x} \mid \lambda)}{P(\mathbf{x} \mid \lambda)} = \frac{\alpha_t(i) a_{ij} b_{j x_{t+1}} \beta_{t+1}(j)}{\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_t(i) a_{ij} b_{j x_{t+1}} \beta_{t+1}(j)}.$$
3. Summing $\gamma_t(i)$ and $\xi_t(i, j)$ over time $t$ gives some useful expected values:
1) The expected number of times state $s_i$ occurs under the observations $\mathbf{x}$ is $\sum\limits_{t=1}^{T} \gamma_t(i)$.
2) The expected number of transitions out of state $s_i$ under the observations $\mathbf{x}$ is $\sum\limits_{t=1}^{T-1} \gamma_t(i)$. The sum stops at $T - 1$ because no transition leaves the final time step.
3) The expected number of transitions from state $s_i$ to state $s_j$ under the observations $\mathbf{x}$ is $\sum\limits_{t=1}^{T-1} \xi_t(i, j)$.
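The quantities $\gamma_t(i)$, $\xi_t(i, j)$, and the three expected counts can be computed directly from the forward and backward tables. The sketch below is illustrative only: the function name `posteriors`, the compact helpers, the 0-based indexing, and the parameter values are assumptions, and the sums over transitions run over $t = 1, \ldots, T - 1$, the range on which $\xi_t$ is defined.

```python
def forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                      for i in range(n)])
    return alpha

def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta

def posteriors(A, B, pi, obs):
    """Return (gamma, xi): gamma[t][i] for t in 0..T-1, xi[t][i][j] for t in 0..T-2."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])  # P(x | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return gamma, xi

# Illustrative parameters: 3 hidden states, 2 observation values (0 and 1).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]
n, T = len(pi), len(obs)

gamma, xi = posteriors(A, B, pi, obs)
# 1) Expected number of times each state occurs.
expected_visits = [sum(gamma[t][i] for t in range(T)) for i in range(n)]
# 2) Expected number of transitions out of each state.
expected_out = [sum(gamma[t][i] for t in range(T - 1)) for i in range(n)]
# 3) Expected number of transitions from state i to state j.
expected_trans = [[sum(xi[t][i][j] for t in range(T - 1)) for j in range(n)]
                  for i in range(n)]
print(expected_visits)
```

Two identities make useful checks: each `gamma[t]` sums to 1, and summing `xi[t][i][j]` over `j` recovers `gamma[t][i]`.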
These quantities will be used in training the HMM.
Next post: "The EM Algorithm"