A Brief Introduction to hmmlearn

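Hidden Markov models (HMMs) were first described in the second half of the 1960s, in a series of statistical papers by Leonard E. Baum and other authors. Their first application was in speech recognition.

In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences, DNA sequences in particular. They have since become an indispensable technique in bioinformatics.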

This article draws on:
[1] "用hmmlearn学习隐马尔科夫模型HMM" (Learning Hidden Markov Models with hmmlearn)
[2] The official hmmlearn documentation

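0. Contents

  • 1. hmmlearn
  • 2. MultinomialHMM
    • 2.1 Defining and using a model
    • 2.2 Predicting hidden states with the Viterbi algorithm
    • 2.3 Computing the probability of an observation sequence
  • 3. Training and data preparation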

1. hmmlearn

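hmmlearn was once part of the scikit-learn project and is now a standalone Python package that can be installed directly with pip. It implements unsupervised learning and inference for hidden Markov models. The official documentation is at https://hmmlearn.readthedocs.io/en/stable/. Its supervised counterpart is seqlearn.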

pip3 install hmmlearn

hmmlearn provides three models:

Name | Description | Observations
hmm.GaussianHMM | Hidden Markov Model with Gaussian emissions | continuous
hmm.GMMHMM | Hidden Markov Model with Gaussian mixture emissions | continuous
hmm.MultinomialHMM | Hidden Markov Model with multinomial (discrete) emissions | discrete

2. MultinomialHMM

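In the hmmlearn release described here, the class is declared as:

class hmmlearn.hmm.MultinomialHMM(n_components=1, startprob_prior=1.0, transmat_prior=1.0,
    algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='ste', init_params='ste')

Note: in recent hmmlearn releases, the one-symbol-per-time-step model used throughout this article is provided by hmm.CategoricalHMM, and MultinomialHMM has been redefined to model vectors of multinomial counts. If the examples below misbehave on a newer installation, substitute CategoricalHMM.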

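The more commonly used parameters are:

  • n_components: (int) number of hidden states.
  • n_iter: (int, optional) maximum number of training (EM) iterations.
  • tol: (float, optional) convergence threshold. EM stops when the gain in log-likelihood falls below this value.
  • verbose: (bool, optional) when True, a convergence report is printed to sys.stderr for each iteration, showing the log-likelihood (score) and its change from the previous iteration.
  • init_params: (string, optional) controls which parameters are initialized before training: 's' for startprob, 't' for transmat, 'e' for emissionprob. An empty string "" means training starts entirely from the parameters supplied by the user.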
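How n_iter and tol interact can be pictured as a simple stopping rule. The sketch below is a plain-Python illustration of the rule as the parameter descriptions state it, not hmmlearn's own code; the function name em_should_stop and the list of per-iteration log-likelihoods are hypothetical.

```python
def em_should_stop(log_likelihoods, n_iter=10, tol=0.01):
    """Decide whether EM training should stop.

    log_likelihoods holds one log-likelihood per completed iteration
    (hypothetical bookkeeping, assumed for this illustration).
    """
    # Stop once the iteration budget is used up.
    if len(log_likelihoods) >= n_iter:
        return True
    # Stop once the latest gain in log-likelihood drops below tol.
    if len(log_likelihoods) >= 2:
        gain = log_likelihoods[-1] - log_likelihoods[-2]
        if gain < tol:
            return True
    return False


print(em_should_stop([-10.0, -9.0]))    # gain of 1.0 is above tol: keep going
print(em_should_stop([-10.0, -9.995]))  # gain of 0.005 is below tol: stop
```

A smaller tol or a larger n_iter lets training run longer; neither guarantees a better model, because EM only finds a local optimum.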

2.1 Defining and using a model

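import numpy as np
from hmmlearn import hmm

# Hidden states: which box the ball was drawn from.
states = ["box 1", "box 2", "box3"]
n_states = len(states)

# Observable symbols: the colour of the ball that was drawn.
observations = ["red", "white"]
n_observations = len(observations)

# Probability of starting in each box.
start_probability = np.array([0.2, 0.4, 0.4])

# transition_probability[i, j]: probability of moving from box i to box j.
transition_probability = np.array([
  [0.5, 0.2, 0.3],
  [0.3, 0.5, 0.2],
  [0.2, 0.3, 0.5]
])

# emission_probability[i, k]: probability of drawing colour k from box i.
emission_probability = np.array([
  [0.5, 0.5],
  [0.4, 0.6],
  [0.7, 0.3]
])

# Set the parameters by hand instead of learning them with fit().
model = hmm.MultinomialHMM(n_components=n_states, n_iter=20, tol=0.001)
model.startprob_ = start_probability
model.transmat_ = transition_probability
model.emissionprob_ = emission_probability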
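The three arrays assigned above are only meaningful if the start vector and every row of the two matrices sum to 1. A minimal plain-Python sanity check, assuming the parameters are given as nested lists, might look like the following; the helper name check_hmm_params is hypothetical and not part of hmmlearn.

```python
import math


def check_hmm_params(start, trans, emit, tol=1e-9):
    """Raise ValueError unless the parameters describe a valid discrete HMM."""
    n = len(start)
    if not math.isclose(sum(start), 1.0, abs_tol=tol):
        raise ValueError("start probabilities must sum to 1")
    if len(trans) != n or any(len(row) != n for row in trans):
        raise ValueError("transition matrix must be n_states x n_states")
    if len(emit) != n:
        raise ValueError("emission matrix needs one row per hidden state")
    for name, matrix in (("transition", trans), ("emission", emit)):
        for i, row in enumerate(matrix):
            if any(p < 0 for p in row):
                raise ValueError(f"{name} row {i} has a negative entry")
            if not math.isclose(sum(row), 1.0, abs_tol=tol):
                raise ValueError(f"{name} row {i} must sum to 1")
    return True


# The values of the box-and-ball example pass the check.
check_hmm_params(
    [0.2, 0.4, 0.4],
    [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
    [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]],
)
```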

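2.2 Predicting hidden states with the Viterbi algorithm

decode returns two values: the log probability of the most likely hidden-state sequence, and the sequence itself. The documentation calls the first value "the log probability"; it is a natural logarithm, ln(prob).

seen = np.array([[0, 1, 0]]).T  # one observation per row, as a column vector
logprob, box = model.decode(seen, algorithm="viterbi")
print("The ball picked:", ", ".join(observations[i] for i in seen.ravel()))
print("The hidden box:", ", ".join(states[i] for i in box))

Output:

The ball picked: red, white, red
The hidden box: box3, box3, box3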
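To see where this result comes from, the Viterbi recursion can be written out by hand. The sketch below is a plain-Python illustration in log space using the parameters from section 2.1; it is not hmmlearn's implementation, and it assumes all probabilities are strictly positive so that every logarithm is defined.

```python
import math

# Parameters copied from the box-and-ball example in section 2.1.
start = [0.2, 0.4, 0.4]
trans = [[0.5, 0.2, 0.3],
         [0.3, 0.5, 0.2],
         [0.2, 0.3, 0.5]]
emit = [[0.5, 0.5],
        [0.4, 0.6],
        [0.7, 0.3]]


def viterbi(obs, start, trans, emit):
    """Return (log probability, best state path) for a list of symbol indices."""
    n = len(start)
    # delta[i]: log probability of the best path so far that ends in state i.
    delta = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
    backptr = []
    for o in obs[1:]:
        prev, delta, step = delta, [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: prev[i] + math.log(trans[i][j]))
            step.append(best_i)
            delta.append(prev[best_i] + math.log(trans[best_i][j])
                         + math.log(emit[j][o]))
        backptr.append(step)
    # Trace back from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return max(delta), path


logprob, path = viterbi([0, 1, 0], start, trans, emit)
print(path)  # [2, 2, 2], i.e. the third box at every step
```

The best path has probability 0.4 × 0.7 × 0.5 × 0.3 × 0.5 × 0.7 = 0.0147, so the returned log probability is ln(0.0147).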

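2.3 Computing the probability of an observation sequence

score returns the log-likelihood of the observation sequence under the model, again as a natural logarithm:

print(model.score(seen))

Output (shown to 11 decimal places):

-2.03854530992

This corresponds to a probability of about 0.13 for observing red, white, red.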
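This value can be reproduced with the forward algorithm, which sums the probability of the observations over all hidden-state paths. The following is a minimal plain-Python sketch using the parameters from section 2.1, not hmmlearn's implementation; it works with raw probabilities, which is fine for three observations but would underflow on long sequences.

```python
import math

# Parameters copied from the box-and-ball example in section 2.1.
start = [0.2, 0.4, 0.4]
trans = [[0.5, 0.2, 0.3],
         [0.3, 0.5, 0.2],
         [0.2, 0.3, 0.5]]
emit = [[0.5, 0.5],
        [0.4, 0.6],
        [0.7, 0.3]]


def forward_loglik(obs, start, trans, emit):
    """Natural log of P(observations | model), computed by the forward algorithm."""
    n = len(start)
    # alpha[i]: probability of the observations so far, ending in state i.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return math.log(sum(alpha))


loglik = forward_loglik([0, 1, 0], start, trans, emit)
print(round(loglik, 6))  # -2.038545
```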

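3. Training and data preparation

import numpy as np
from hmmlearn import hmm

states = ["box 1", "box 2", "box3"]
n_states = len(states)

observations = ["red", "white"]
n_observations = len(observations)
model = hmm.MultinomialHMM(n_components=n_states, n_iter=20, tol=0.01)

# Three independent observation sequences, one symbol index per row.
D1 = [[1], [0], [0], [0], [1], [1], [1]]
D2 = [[1], [0], [0], [0], [1], [1], [1], [0], [1], [1]]
D3 = [[1], [0], [0]]

# fit expects all sequences stacked into one array, plus the length of each,
# so that they are not treated as a single long sequence.
X = np.concatenate([D1, D2, D3])
lengths = [len(D1), len(D2), len(D3)]

# Initialization is random, so the fitted values differ between runs
# unless random_state is set.
model.fit(X, lengths)
print(model.startprob_)
print(model.transmat_)
print(model.emissionprob_)
print(model.score(X, lengths))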
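Real data usually arrives as labelled sequences rather than ready-made index columns. The sketch below shows one way to turn such sequences into the stacked rows and per-sequence lengths used above. It is plain Python; the helper name encode_sequences and the sample data are hypothetical.

```python
def encode_sequences(sequences, vocabulary):
    """Map symbol sequences to stacked one-column rows plus per-sequence lengths."""
    index = {symbol: i for i, symbol in enumerate(vocabulary)}
    rows, lengths = [], []
    for seq in sequences:
        # One row per observation, holding that symbol's integer index.
        rows.extend([index[symbol]] for symbol in seq)
        lengths.append(len(seq))
    return rows, lengths


# Hypothetical raw data: two sequences of ball colours.
raw = [["white", "red", "red"], ["red", "white"]]
rows, lengths = encode_sequences(raw, ["red", "white"])
print(rows)     # [[1], [0], [0], [0], [1]]
print(lengths)  # [3, 2]
```

The result can then be passed on as model.fit(np.array(rows), lengths).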
