For a brief introduction to HMMs, see the previous post: Viterbi Algorithm and Chinese Part-of-Speech Tagging (Part 1): Hidden Markov Models
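This post addresses the second class of HMM problems: given the model and the output sequence, find the most likely state sequence. The method used is the Viterbi algorithm.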
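The Viterbi algorithm is a dynamic programming algorithm that finds the optimal path through the directed graph of a trellis (lattice) network. "Optimal" means the path with the largest probability. Equivalently, it is the shortest path when each edge is weighted by the negative logarithm of its probability.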
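In the graph of a hidden Markov chain:
- Nodes represent states, with one column of nodes per time step.
- Edges between nodes represent state transitions.
- Edge weights are the transition probabilities.
- Each node also contributes the emission probability of that time step's observation.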
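The link between "shortest path" and "most probable path" is the usual negative-log trick. A path's probability is a product of edge probabilities. Taking $-\log$ turns each probability into a non-negative edge length, so the product becomes a sum, and the maximum-probability path becomes the minimum-length path.

Below is a minimal Python sketch with made-up edge probabilities. The names `path_probability` and `path_length` are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

def path_probability(edge_probs):
    """Probability of one path: the product of its edge probabilities."""
    return math.prod(edge_probs)

def path_length(edge_probs):
    """Length of the same path when each edge is weighted by -log(p)."""
    return sum(-math.log(p) for p in edge_probs)

# Two arbitrary three-edge paths through a trellis
a = [0.7, 0.8, 0.15]
b = [0.2, 0.3, 0.5]

# The more probable path is also the shorter one under -log weights
for name, path in (("a", a), ("b", b)):
    print(name, path_probability(path), path_length(path))
```

Working with sums of log-probabilities also avoids the numerical underflow caused by multiplying many small probabilities on long sequences.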
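Consider the whole graph, from the start point $S$ to the end state $E$. Suppose the optimal path passes through node $x_{ij}$ (state $x_j$ at time $i$). Then both of its parts, $S\to x_{ij}$ and $x_{ij}\to E$, must themselves be optimal paths. If either part were not optimal, replacing it with the optimal path for that part would give a better overall path, which is a contradiction. This optimal-substructure property is what makes dynamic programming applicable.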
To set up the recursion for an HMM, the known quantities in this problem are:
- Hidden state set: $X=\{x_1,x_2,\dots,x_m\}$
- Observation sequence: $O=\{O_1,O_2,\dots,O_n\}$
- Initial state probabilities: $P(x_i\mid S)$
- Transition probabilities: $P(x_j\mid x_i)$, the probability of moving from $x_i$ to $x_j$
- Emission probabilities: $P(O_t\mid x_i)$, the probability of emitting $O_t$ in state $x_i$
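The quantity we compute is $P_{ij}$: the probability of the most likely state path that ends in state $x_j$ at time $i$ while producing the outputs $O_1,\dots,O_i$. Each $P_{ij}$ is obtained from the values at time $i-1$ by choosing the best predecessor state, and that predecessor is recorded as a backpointer.

After the last time step, the state with the largest probability is the final state of the optimal path. Following the backpointers from it back to time 1 yields the hidden state sequence.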
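With the notation above, the initialization, recursion and backtracking can be written as follows. The backpointer $\psi_{ij}$ (the predecessor index chosen for $x_j$ at time $i$) and $\hat{s}_i$ (the index of the chosen state at time $i$) are symbols introduced here for illustration.

$$
\begin{aligned}
P_{1j} &= P(x_j \mid S)\,P(O_1 \mid x_j) \\
P_{ij} &= \max_{1\le k\le m}\bigl[P_{i-1,k}\,P(x_j \mid x_k)\bigr]\,P(O_i \mid x_j), \qquad i=2,\dots,n \\
\psi_{ij} &= \arg\max_{1\le k\le m} P_{i-1,k}\,P(x_j \mid x_k) \\
\hat{s}_n &= \arg\max_{1\le j\le m} P_{nj}, \qquad \hat{s}_{i-1} = \psi_{i,\hat{s}_i}, \quad i=n,\dots,2
\end{aligned}
$$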
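Now return to the doctor example from the previous post.
The doctor infers the patient's condition (health, low fever, high fever) from how the patient describes feeling (normal, dizzy, cold).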
Initial state distribution:
Health | Low fever | High fever |
---|---|---|
0.7 | 0.2 | 0.1 |
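Probability that the patient's condition changes between two consecutive days, i.e., the transition probabilities (row: previous day, column: next day):
Previous \ Next | Health | Low fever | High fever |
---|---|---|---|
Health | 0.8 | 0.15 | 0.05 |
Low fever | 0.4 | 0.3 | 0.3 |
High fever | 0.2 | 0.5 | 0.3 |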
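Probability of each reported feeling given the patient's condition, i.e., the emission probabilities:
Condition \ Feeling | Normal | Dizzy | Cold |
---|---|---|---|
Health | 0.8 | 0.1 | 0.1 |
Low fever | 0.2 | 0.4 | 0.4 |
High fever | 0.1 | 0.3 | 0.6 |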
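The three tables can be transcribed into Python dictionaries. Checking that every row sums to 1 is a cheap guard against transcription errors. In this sketch the constant names and the English labels are illustrative choices.

```python
import math

STATES = ("health", "low fever", "high fever")
OBSERVATIONS = ("normal", "dizzy", "cold")

# Initial state distribution P(x_i | S)
START = {"health": 0.7, "low fever": 0.2, "high fever": 0.1}

# Transition probabilities: TRANS[previous][next]
TRANS = {
    "health":     {"health": 0.8, "low fever": 0.15, "high fever": 0.05},
    "low fever":  {"health": 0.4, "low fever": 0.3,  "high fever": 0.3},
    "high fever": {"health": 0.2, "low fever": 0.5,  "high fever": 0.3},
}

# Emission probabilities: EMIT[state][observation]
EMIT = {
    "health":     {"normal": 0.8, "dizzy": 0.1, "cold": 0.1},
    "low fever":  {"normal": 0.2, "dizzy": 0.4, "cold": 0.4},
    "high fever": {"normal": 0.1, "dizzy": 0.3, "cold": 0.6},
}

def check_distributions():
    """Raise ValueError if any distribution does not sum to 1 (up to rounding)."""
    rows = [START, *TRANS.values(), *EMIT.values()]
    for row in rows:
        if not math.isclose(sum(row.values()), 1.0):
            raise ValueError(f"row does not sum to 1: {row}")

check_distributions()
```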
The patient's reported feelings over three days are: {Day 1: normal, Day 2: cold, Day 3: cold}
$$P_{1,\text{health}}=0.8\times 0.7=0.56$$
$$P_{1,\text{low fever}}=0.2\times 0.2=0.04$$
$$P_{1,\text{high fever}}=0.1\times 0.1=0.01$$
Day 1 has no earlier day, so no backpointers are recorded.

Day 2 (observation: cold):
$$P_{2,\text{health}}=\max\{0.56\times0.8,\;0.04\times0.4,\;0.01\times0.2\}\times0.1=0.0448$$
$$P_{2,\text{low fever}}=\max\{0.56\times0.15,\;0.04\times0.3,\;0.01\times0.5\}\times0.4=0.0336$$
$$P_{2,\text{high fever}}=\max\{0.56\times0.05,\;0.04\times0.3,\;0.01\times0.3\}\times0.6=0.0168$$
In all three maxima the largest term comes from health ($0.448$, $0.084$ and $0.028$), so every day-2 state has health as its backpointer.

Day 3 (observation: cold):
$$P_{3,\text{health}}=\max\{0.0448\times0.8,\;0.0336\times0.4,\;0.0168\times0.2\}\times0.1=0.003584$$
$$P_{3,\text{low fever}}=\max\{0.0448\times0.15,\;0.0336\times0.3,\;0.0168\times0.5\}\times0.4=0.004032$$
$$P_{3,\text{high fever}}=\max\{0.0448\times0.05,\;0.0336\times0.3,\;0.0168\times0.3\}\times0.6=0.006048$$
The largest terms are $0.03584$ (from health) for health, and $0.01008$ (from low fever) for both low fever and high fever. The day-3 backpointers are therefore health, low fever and low fever respectively.

Backtracking works as follows:
- Since $P_{3,\text{high fever}}>P_{3,\text{low fever}}>P_{3,\text{health}}$, the final state is high fever.
- Its backpointer gives low fever on day 2.
- The backpointer of low fever on day 2 gives health on day 1.

The patient's most likely condition sequence is therefore {health, low fever, high fever}, with probability $0.006048$.

Choosing the most probable state on each day separately would give {health, health, high fever} instead. That path has probability only $0.56\times0.8\times0.1\times0.05\times0.6=0.001344$.
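To check the hand calculation, below is a minimal Python sketch of the Viterbi recursion with backpointers. It is followed by a brute-force search over all $3^3=27$ state sequences for comparison. The function names `viterbi` and `brute_force` and the English labels are illustrative. The sketch multiplies raw probabilities, which is fine for three days; on long sequences log-probabilities would be the usual choice.

```python
import itertools

STATES = ("health", "low fever", "high fever")
START = {"health": 0.7, "low fever": 0.2, "high fever": 0.1}
TRANS = {  # TRANS[previous][next]
    "health":     {"health": 0.8, "low fever": 0.15, "high fever": 0.05},
    "low fever":  {"health": 0.4, "low fever": 0.3,  "high fever": 0.3},
    "high fever": {"health": 0.2, "low fever": 0.5,  "high fever": 0.3},
}
EMIT = {  # EMIT[state][observation]
    "health":     {"normal": 0.8, "dizzy": 0.1, "cold": 0.1},
    "low fever":  {"normal": 0.2, "dizzy": 0.4, "cold": 0.4},
    "high fever": {"normal": 0.1, "dizzy": 0.3, "cold": 0.6},
}

def viterbi(observations, states, start, trans, emit):
    """Return (best_path, best_prob, table); table[t][s] is P_{t+1, s}."""
    # Initialization: initial probability times emission of the first output
    table = [{s: start[s] * emit[s][observations[0]] for s in states}]
    backpointers = [{}]  # no predecessor on the first day
    for obs in observations[1:]:
        prev = table[-1]
        column, pointers = {}, {}
        for s in states:
            # Best predecessor k maximizes P_{t-1,k} * P(s | k)
            best_prev = max(states, key=lambda k: prev[k] * trans[k][s])
            column[s] = prev[best_prev] * trans[best_prev][s] * emit[s][obs]
            pointers[s] = best_prev
        table.append(column)
        backpointers.append(pointers)
    # Termination: most probable final state, then follow the backpointers
    last = max(states, key=lambda s: table[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    path.reverse()
    return path, table[-1][last], table

def brute_force(observations, states, start, trans, emit):
    """Score every state sequence (m**n of them) and return the best one."""
    best_path, best_prob = None, -1.0
    for seq in itertools.product(states, repeat=len(observations)):
        p = start[seq[0]] * emit[seq[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= trans[seq[t - 1]][seq[t]] * emit[seq[t]][observations[t]]
        if p > best_prob:
            best_path, best_prob = list(seq), p
    return best_path, best_prob

obs = ["normal", "cold", "cold"]
path, prob, table = viterbi(obs, STATES, START, TRANS, EMIT)
print(path, prob)  # ['health', 'low fever', 'high fever'], probability about 0.006048
print(brute_force(obs, STATES, START, TRANS, EMIT))
```

The brute-force search costs $m^n$ path evaluations, while the Viterbi recursion needs only about $n\cdot m^2$ multiplications. This saving is the practical payoff of the optimal-substructure argument.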
For the implementation, see: Viterbi Algorithm and Chinese Part-of-Speech Tagging (Part 3): POS Tagging Implementation