维特比(viterbi)算法与中文词性标注(二)

维特比(viterbi)算法

对于HMM模型的相关简介:维特比(viterbi)算法与中文词性标注(一)
——隐马尔科夫模型

问题描述

针对HMM模型的第二类问题,根据模型及输出序列,判断状态序列;使用的方法即为维特比(viterbi)算法

简介

一种动态规划算法,以求出篱笆网络的有向图最短路径
维特比(viterbi)算法与中文词性标注(二)_第1张图片
对于隐马尔科夫链,图的节点代表状态,节点间的路径代表状态转移,路径的权值代表状态转移的概率

动态规划属性

最优子结构

对于整个图而言,从初始点S到终结状态E,假设最优路径在第i时刻时需要转换至 x i j x_{ij} xij,则可以采用单独计算 S → x i j S\to x_{ij} Sxij x i j → E x_{ij}\to E xijE的最优路径,若不采用二者之间任意一个的最优路径,则可以将未采用的部分替换为该部分的最优路径,使得结果更佳。

递归求解方案

对于HMM模型中的递归结构,针对问题描述,目前的已知条件是:

隐含状态集合 X = { x 1 , x 2 , . . . , x m } X=\{x_1,x_2,...,x_m\} X={x1,x2,...,xm}

输出值序列 O = { O 1 , O 2 , . . . , O n } O=\{O_1,O_2,...,O_n\} O={O1,O2,...,On}

初始状态序列 P ( x i ∣ S ) P(x_i|S) P(xiS)

转移概率序列 P ( x j ∣ x i ) P(x_j|x_i) P(xjxi),从 x i x_i xi转换为 x j x_j xj的概率

发射概率序列 P ( O t ∣ x i ) P(O_t|x_i) P(Otxi)

最终所求的是第i时刻在输出 O i O_i Oi的情况下出现状态 x j x_j xj的概率 P i j P_{ij} Pij

  • 若i=1,则出现 P 1 j P_{1j} P1j的概率取决于初始概率,发射概率
  • 若i>1,则出现 P i j P_{ij} Pij的概率取决于i-1时刻的状态,状态转移概率,发射概率
    P i j = { P ( O i ∣ x j ) × P ( x j ∣ S ) , i = 1 ; P ( O i ∣ x j ) × max ⁡ t = 1 m P ( x j ∣ x t ) P ( i − 1 ) t , i > 1. P_{ij}= \left\{ \begin{array}{lr} P(O_i|x_j)\times P(x_j|S), & i=1;\\ P(O_i|x_j)\times\max_{t=1}^mP(x_j|x_t)P_{(i-1)t}, & i>1.\\ \end{array} \right. Pij={P(Oixj)×P(xjS),P(Oixj)×maxt=1mP(xjxt)P(i1)t,i=1;i>1.

构造最优解

根据每一时刻计算出的所有状态的概率,比较得出最大概率的状态,作为隐含状态序列。

举例说明

对于上一篇博客中医生看病的例子

医生通过病人对自身身体感受的描述(正常,头晕,冷)来判断病人的病情(健康(health),低烧(low fever),高烧(high fever))

初始状态序列

健康 低烧 高烧
0.7 0.2 0.1

病人在两天之间病情转换概率,即转移概率序列

前\后 健康 低烧 高烧
健康 0.8 0.15 0.05
低烧 0.4 0.3 0.3
高烧 0.2 0.5 0.3

病人在相应病情下的身体感受概率,即发射概率序列

病情\感受 正常 头晕
健康 0.8 0.1 0.1
低烧 0.2 0.4 0.4
高烧 0.1 0.3 0.6

已知病人三天的身体感受序列如下:{Day1:正常,Day2:冷,Day3:冷}

Day1

P 1 , h e a l t h = 0.8 × 0.7 = 0.56 P_{1,health}=0.8\times 0.7=0.56 P1,health=0.8×0.7=0.56

P 1 , l o w f e v e r = 0.2 × 0.2 = 0.04 P_{1,low fever}=0.2\times 0.2=0.04 P1,lowfever=0.2×0.2=0.04

P 1 , h i g h f e v e r = 0.1 × 0.1 = 0.01 P_{1,high fever}=0.1\times 0.1=0.01 P1,highfever=0.1×0.1=0.01

P 1 , h e a l t h > P 1 , l o w f e v e r > P 1 , h i g h f e v e r P_{1,health}>P_{1,low fever}>P_{1,high fever} P1,health>P1,lowfever>P1,highfever

故第一天病人状态为健康

Day2

P 2 , h e a l t h = max ⁡ { 0.56 × 0.8 , 0.04 × 0.4 , 0.01 × 0.2 } × 0.1 = 0.0448 P_{2,health}=\max\{0.56\times0.8,0.04\times0.4,0.01\times0.2\}\times0.1=0.0448 P2,health=max{0.56×0.8,0.04×0.4,0.01×0.2}×0.1=0.0448

P 2 , l o w f e v e r = max ⁡ { 0.56 × 0.15 , 0.04 × 0.3 , 0.01 × 0.5 } × 0.4 = 0.0336 P_{2,low fever}=\max\{0.56\times0.15,0.04\times0.3,0.01\times0.5\}\times0.4=0.0336 P2,lowfever=max{0.56×0.15,0.04×0.3,0.01×0.5}×0.4=0.0336

P 2 , h i g h f e v e r = max ⁡ { 0.56 × 0.05 , 0.04 × 0.3 , 0.01 × 0.3 } × 0.6 = 0.0168 P_{2,high fever}=\max\{0.56\times0.05,0.04\times0.3,0.01\times0.3\}\times0.6=0.0168 P2,highfever=max{0.56×0.05,0.04×0.3,0.01×0.3}×0.6=0.0168

P 2 , h e a l t h > P 2 , l o w f e v e r > P 2 , h i g h f e v e r P_{2,health}>P_{2,low fever}>P_{2,high fever} P2,health>P2,lowfever>P2,highfever

故第二天病人状态为健康

Day3

P 3 , h e a l t h = max ⁡ { 0.0448 × 0.8 , 0.0336 × 0.4 , 0.0168 × 0.2 } × 0.1 = 0.003584 P_{3,health}=\max\{0.0448\times0.8,0.0336\times0.4,0.0168\times0.2\}\times0.1=0.003584 P3,health=max{0.0448×0.8,0.0336×0.4,0.0168×0.2}×0.1=0.003584

P 3 , l o w f e v e r = max ⁡ { 0.0448 × 0.15 , 0.0336 × 0.3 , 0.0168 × 0.5 } × 0.4 = 0.004032 P_{3,low fever}=\max\{0.0448\times0.15,0.0336\times0.3,0.0168\times0.5\}\times0.4=0.004032 P3,lowfever=max{0.0448×0.15,0.0336×0.3,0.0168×0.5}×0.4=0.004032

P 3 , h i g h f e v e r = max ⁡ { 0.0448 × 0.05 , 0.0336 × 0.3 , 0.0168 × 0.3 } × 0.6 = 0.006048 P_{3,high fever}=\max\{0.0448\times0.05,0.0336\times0.3,0.0168\times0.3\}\times0.6=0.006048 P3,highfever=max{0.0448×0.05,0.0336×0.3,0.0168×0.3}×0.6=0.006048

P 3 , h i g h f e v e r > P 3 , l o w f e v e r > P 3 , h e a l t h P_{3,high fever}>P_{3,low fever}>P_{3,health} P3,highfever>P3,lowfever>P3,health

故第三天病人状态为高烧

故病人病情状态序列{健康,健康,高烧}

维特比(viterbi)算法与中文词性标注(三)——词性标注实现

参考文献
[1]一文搞懂HMM(隐马尔可夫模型)

[2]HMM模型和Viterbi算法

[3]简单理解viterbi算法

你可能感兴趣的:(机器学习)