1. HMM: Hidden Markov Model

(1) A weather example

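Suppose we cannot observe the weather (sunny, cloudy or rainy) directly and can only see how wet the ground is, graded into three levels: A (very wet), B (moderately wet) and C (dry). Having observed the ground every day for a week (AABBCBA), can we infer the weather for that week?
As described above, there are two kinds of states: the ground-wetness states, i.e. the observation states (A, B, C), and the weather, i.e. the latent (hidden) states (sunny, cloudy, rainy).
Suppose we already know the state transition probability matrix of the weather. What we still need is the relationship between the weather and the ground wetness, which then lets us predict the weather.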

(2) Hidden Markov model

Understanding the HMM

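The random sequence of states generated by an HMM is called the state sequence; each state generates one observation, and the resulting random sequence of observations is called the observation sequence.
An HMM is therefore a doubly stochastic process: a hidden Markov chain over a set of states (the weather) plus a random observation sequence emitted from it (the ground wetness).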
The HMM is a discrete time model: for each point in time t, we have one hidden state that generates one observed event for that time point t.


Conditions (assumptions) of an HMM

Assumption 1: The hidden states follow a Markov process, i.e., the states over time are not independent of one another, but the current state depends only on the previous state (and not on earlier states).
This is simply the basic assumption of a Markov model (MM).
Assumption 2: The distribution generating the observation depends only on the underlying hidden state, i.e., the observation at a given time point depends only on the hidden state at that same time point.
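In symbols, the two assumptions read as follows. This is a sketch in my own notation: i_t is the hidden state and q_t the observation at time t (matching I and Q used in the three problems below), a_{ij} is an entry of the transition matrix A, b_j(k) an entry of the emission matrix B, and π_i the initial probability of state i. Together the assumptions let the joint probability of a state path and an observation sequence factor into one-step terms:

```latex
\begin{aligned}
&P(i_t \mid i_{t-1}, q_{t-1}, \ldots, i_1, q_1) = P(i_t \mid i_{t-1}) && \text{(Assumption 1)}\\
&P(q_t \mid i_T, q_T, \ldots, i_{t+1}, q_{t+1}, i_t, i_{t-1}, q_{t-1}, \ldots, i_1, q_1) = P(q_t \mid i_t) && \text{(Assumption 2)}\\
&P(I, Q \mid \lambda) = \pi_{i_1}\, b_{i_1}(q_1) \prod_{t=2}^{T} a_{i_{t-1} i_t}\, b_{i_t}(q_t)
\end{aligned}
```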

Emission matrix (emission distribution matrix)

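For each hidden state, the emission matrix gives the probability of each possible observation.
For example, on a sunny day the probabilities of the ground being A, B and C are 0.1, 0.3 and 0.6; on a cloudy day 0.2, 0.5 and 0.3; and on a rainy day 0.5, 0.3 and 0.2.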
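As a minimal sketch in R (the language used later in this post), these emission probabilities can be stored as a matrix whose rows are hidden states and whose columns are observations. The English state labels and variable names are my own; the last line looks up, for each day of the week AABBCBA, how likely that day's observation is under each kind of weather.

```r
# Emission matrix for the weather example: rows = hidden weather states,
# columns = observed ground-wetness grades (A = very wet, B = moderately wet, C = dry).
weather_states <- c("Sunny", "Cloudy", "Rainy")
wetness_grades <- c("A", "B", "C")
weather_emission <- matrix(c(0.1, 0.3, 0.6,   # Sunny
                             0.2, 0.5, 0.3,   # Cloudy
                             0.5, 0.3, 0.2),  # Rainy
                           nrow = 3, byrow = TRUE,
                           dimnames = list(weather_states, wetness_grades))

# Each row is a probability distribution over the grades, so it must sum to 1.
stopifnot(isTRUE(all.equal(unname(rowSums(weather_emission)), rep(1, 3))))

# Emission probability of each day's observation under every weather state.
week <- strsplit("AABBCBA", "")[[1]]
weather_emission[, week]
```

Reading down a column shows which weather best explains that single day; for example, a dry ground (C) is most likely on a sunny day. An HMM goes further by also weighing how the weather tends to change from day to day.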


The three elements of an HMM

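In summary, together with what we know about Markov models (MM), an HMM is described by three sets of parameters: the initial state distribution π, the state transition matrix A and the emission matrix B, written together as λ=(A,B,π).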
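In addition, an HMM has a set of hidden states and a set of possible observations.
The observations are usually categorical.
As for the hidden states, I think that in practice we often do not know in advance what they mean; instead we interpret them from the observations and domain knowledge. For example, if I treat the weather as the observation, the hidden states could be the plum-rain season or the typhoon season, or even whether the Sun is in a good or bad mood.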
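A sketch of λ=(A,B,π) for the weather example as a single R list. The text does not give numbers for the transition matrix or the initial distribution, so the values of A and pi0 below are hypothetical and for illustration only; B reuses the emission probabilities from the example above. is_valid_hmm is a hypothetical helper that checks that every row is a probability distribution and that the state labels line up.

```r
weather_states <- c("Sunny", "Cloudy", "Rainy")
wetness_grades <- c("A", "B", "C")

# Hypothetical transition matrix: A[i, j] = P(tomorrow = j | today = i)
A <- matrix(c(0.6, 0.3, 0.1,
              0.3, 0.4, 0.3,
              0.2, 0.4, 0.4),
            nrow = 3, byrow = TRUE, dimnames = list(weather_states, weather_states))
# Emission matrix from the weather example
B <- matrix(c(0.1, 0.3, 0.6,
              0.2, 0.5, 0.3,
              0.5, 0.3, 0.2),
            nrow = 3, byrow = TRUE, dimnames = list(weather_states, wetness_grades))
# Hypothetical initial state distribution (named pi0 to avoid masking base::pi)
pi0 <- c(Sunny = 0.5, Cloudy = 0.3, Rainy = 0.2)

# lambda = (A, B, pi) bundled into one object
weather_hmm <- list(A = A, B = B, pi = pi0)

# Hypothetical helper: are all distributions valid and are the labels consistent?
is_valid_hmm <- function(hmm, tol = 1e-8) {
  rows_ok <- function(m) all(m >= 0) && all(abs(rowSums(m) - 1) < tol)
  rows_ok(hmm$A) && rows_ok(hmm$B) &&
    all(hmm$pi >= 0) && abs(sum(hmm$pi) - 1) < tol &&
    identical(rownames(hmm$A), colnames(hmm$A)) &&
    identical(rownames(hmm$A), rownames(hmm$B)) &&
    identical(names(hmm$pi), rownames(hmm$A))
}

is_valid_hmm(weather_hmm)
```

Keeping the state and observation names as dimnames lets later code index by label (for example B["Rainy", "A"]) instead of by position, which avoids silent ordering mistakes.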

Three kinds of problems an HMM can solve (key point!)

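(1) The evaluation (probability computation) problem

Given the model λ=(A,B,π) and an observation sequence Q={q1,q2,...,qT}, compute the probability P(Q|λ) of observing Q under λ.
For example, given the three elements of the weather/ground-wetness HMM and a sequence of ground-wetness observations, compute how likely that sequence is.
Method: the forward-backward algorithm (either the forward or the backward pass alone gives P(Q|λ))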
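A minimal sketch of the forward algorithm in R for the evaluation problem. alpha[t, j] is the probability of seeing the first t observations and being in state j at time t; summing the last row gives P(Q|λ). The transition matrix A and initial distribution pi0 are hypothetical (the post gives no numbers for them), B is the emission matrix from the weather example, and forward_prob is a name I made up.

```r
weather_states <- c("Sunny", "Cloudy", "Rainy")
A <- matrix(c(0.6, 0.3, 0.1,
              0.3, 0.4, 0.3,
              0.2, 0.4, 0.4), nrow = 3, byrow = TRUE,
            dimnames = list(weather_states, weather_states))   # hypothetical values
B <- matrix(c(0.1, 0.3, 0.6,
              0.2, 0.5, 0.3,
              0.5, 0.3, 0.2), nrow = 3, byrow = TRUE,
            dimnames = list(weather_states, c("A", "B", "C")))  # from the example
pi0 <- c(Sunny = 0.5, Cloudy = 0.3, Rainy = 0.2)               # hypothetical values

# Forward algorithm: alpha[t, j] = P(q_1, ..., q_t, i_t = j | lambda)
forward_prob <- function(obs, A, B, pi0) {
  n_t <- length(obs)
  alpha <- matrix(0, nrow = n_t, ncol = nrow(A), dimnames = list(NULL, rownames(A)))
  alpha[1, ] <- pi0 * B[, obs[1]]                        # initialisation
  for (t in seq_len(n_t)[-1]) {
    # induction: sum over previous states, then emit the t-th observation
    alpha[t, ] <- (alpha[t - 1, ] %*% A) * B[, obs[t]]
  }
  list(prob = sum(alpha[n_t, ]), alpha = alpha)          # termination
}

week <- strsplit("AABBCBA", "")[[1]]
res <- forward_prob(week, A, B, pi0)
res$prob   # P(Q | lambda) for the week AABBCBA
```

The cost is O(T·N²) instead of the O(N^T) of summing over every state path. For long sequences the products underflow, so practical implementations rescale alpha at each step or work with log probabilities.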

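(2) The learning problem

Given an observation sequence Q={q1,q2,...,qT}, estimate the model parameters λ=(A,B,π) so that P(Q|λ) is maximized.
For example, given a sequence of ground-wetness observations, estimate the three elements of the weather/ground-wetness HMM that make that sequence most likely.
Methods: maximum likelihood, Expectation Maximization (for HMMs, the Baum-Welch algorithm), and Bayesian estimation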
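The simplest of these methods is plain maximum likelihood in the supervised case, where the hidden state sequence of the training data is known: the estimates of A and B are then just normalised transition and emission counts. The sketch below assumes such labelled data (the "true" weather sequence is invented), and estimate_hmm_mle is a hypothetical name. π is left out because a single sequence contains only one initial state. When the states are hidden, Baum-Welch (EM) uses the same normalised-count formulas but with expected counts computed by the forward-backward algorithm, iterating until convergence.

```r
# Supervised maximum-likelihood estimation of A and B from a labelled sequence.
estimate_hmm_mle <- function(states_seq, obs_seq, states, symbols) {
  # Count transitions i_{t-1} -> i_t
  trans_counts <- table(factor(head(states_seq, -1), levels = states),
                        factor(tail(states_seq, -1), levels = states))
  # Count emissions i_t -> q_t
  emit_counts <- table(factor(states_seq, levels = states),
                       factor(obs_seq, levels = symbols))
  # Divide each row by its total; pmax() avoids 0/0 for states never seen
  # (such rows stay all-zero and are not valid distributions).
  normalise_rows <- function(m) {
    m <- unclass(m)
    sweep(m, 1, pmax(rowSums(m), 1), "/")
  }
  list(A = normalise_rows(trans_counts), B = normalise_rows(emit_counts))
}

# Hypothetical labelled week: invented true weather plus the observed ground wetness
true_weather <- c("Rainy", "Rainy", "Cloudy", "Cloudy", "Sunny", "Cloudy", "Rainy")
observed     <- strsplit("AABBCBA", "")[[1]]
fit <- estimate_hmm_mle(true_weather, observed,
                        states  = c("Sunny", "Cloudy", "Rainy"),
                        symbols = c("A", "B", "C"))
fit$A
fit$B
```

With only seven days many estimates are 0 or 1, which illustrates why real applications need much more data or smoothing (pseudo-counts).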

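(3) The prediction (decoding) problem

Given the model λ=(A,B,π) and an observation sequence Q={q1,q2,...,qT}, find the state sequence I that maximizes the conditional probability P(I|Q,λ).
For example, given a week of ground-wetness observations and the three elements of the weather/ground-wetness HMM, estimate the most likely weather for each day of that week.
Method: the Viterbi algorithm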
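A compact sketch of the Viterbi algorithm in R, working in log space to avoid underflow. delta[t, j] holds the log probability of the best state path that ends in state j at time t, and psi records the predecessor on that path, so the best path is recovered by backtracking from the end. It uses the same hypothetical A and pi0 as the forward-algorithm sketch and the emission matrix from the weather example; viterbi_path is my own name.

```r
weather_states <- c("Sunny", "Cloudy", "Rainy")
A <- matrix(c(0.6, 0.3, 0.1,
              0.3, 0.4, 0.3,
              0.2, 0.4, 0.4), nrow = 3, byrow = TRUE,
            dimnames = list(weather_states, weather_states))   # hypothetical values
B <- matrix(c(0.1, 0.3, 0.6,
              0.2, 0.5, 0.3,
              0.5, 0.3, 0.2), nrow = 3, byrow = TRUE,
            dimnames = list(weather_states, c("A", "B", "C")))  # from the example
pi0 <- c(Sunny = 0.5, Cloudy = 0.3, Rainy = 0.2)               # hypothetical values

viterbi_path <- function(obs, A, B, pi0) {
  n_t <- length(obs); n_s <- nrow(A)
  delta <- matrix(-Inf, n_t, n_s)  # best log-probability of a path ending in state j at time t
  psi   <- matrix(0L, n_t, n_s)    # index of the best predecessor of state j at time t
  delta[1, ] <- log(pi0) + log(B[, obs[1]])
  for (t in seq_len(n_t)[-1]) {
    for (j in seq_len(n_s)) {
      scores      <- delta[t - 1, ] + log(A[, j])
      psi[t, j]   <- which.max(scores)
      delta[t, j] <- max(scores) + log(B[j, obs[t]])
    }
  }
  # Backtrack from the best final state
  path <- integer(n_t)
  path[n_t] <- which.max(delta[n_t, ])
  for (t in rev(seq_len(n_t - 1))) path[t] <- psi[t + 1, path[t + 1]]
  list(states = rownames(A)[path], log_prob = max(delta[n_t, ]))
}

week <- strsplit("AABBCBA", "")[[1]]
best <- viterbi_path(week, A, B, pi0)
best$states   # most likely weather for each day, under these hypothetical parameters
```

Viterbi has the same structure as the forward algorithm, with the sum over previous states replaced by a max, which is why both run in O(T·N²).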

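In my view, in experimental work the observations are relatively easy to obtain, so we typically learn a model from them (estimating its parameters) and then use the fitted model to predict the hidden state sequence behind other observed sequences.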

2. Hands-on R code

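Suppose a particular kind of DNA sequence contains two types of region (hidden states): AT-rich and GC-rich.
The former is rich in A and T bases and the latter in C and G bases; the base probabilities in each region (the emission matrix) are the values set in Step 1 below.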

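We also know the initial state distribution and the state transition matrix, and use them to randomly generate a base sequence that follows this model.
Step 1: prepare the transition matrix (TM) and the emission matrix (EM)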

states              <- c("AT-rich", "GC-rich") # Define the names of the states
ATrichprobs         <- c(0.7, 0.3)             # Set the probabilities of switching states, where the previous state was "AT-rich"
GCrichprobs         <- c(0.1, 0.9)             # Set the probabilities of switching states, where the previous state was "GC-rich"
thetransitionmatrix <- matrix(c(ATrichprobs, GCrichprobs), 2, 2, byrow = TRUE) # Create a 2 x 2 matrix
rownames(thetransitionmatrix) <- states
colnames(thetransitionmatrix) <- states
thetransitionmatrix     

nucleotides         <- c("A", "C", "G", "T")   # Define the alphabet of nucleotides
ATrichstateprobs    <- c(0.39, 0.1, 0.1, 0.41) # Set the values of the probabilities, for the AT-rich state
GCrichstateprobs    <- c(0.1, 0.41, 0.39, 0.1) # Set the values of the probabilities, for the GC-rich state
theemissionmatrix <- matrix(c(ATrichstateprobs, GCrichstateprobs), 2, 4, byrow = TRUE) # Create a 2 x 4 matrix
rownames(theemissionmatrix) <- states
colnames(theemissionmatrix) <- nucleotides
theemissionmatrix
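Two quick checks on these matrices, as a sketch: every row should sum to 1, and the diagonal of the transition matrix determines how long the chain tends to stay in each region. Under the Markov assumption the run length in state k is geometric with mean 1/(1 - a_kk), and the long-run share of each state is the stationary distribution, i.e. the left eigenvector of the transition matrix for eigenvalue 1. The short names tm and em are mine; the values are those of Step 1.

```r
# Same values as Step 1
states <- c("AT-rich", "GC-rich")
tm <- matrix(c(0.7, 0.3, 0.1, 0.9), 2, 2, byrow = TRUE, dimnames = list(states, states))
em <- matrix(c(0.39, 0.1, 0.1, 0.41, 0.1, 0.41, 0.39, 0.1), 2, 4, byrow = TRUE,
             dimnames = list(states, c("A", "C", "G", "T")))

# Every row of both matrices must be a probability distribution
stopifnot(all(abs(rowSums(tm) - 1) < 1e-12), all(abs(rowSums(em) - 1) < 1e-12))

# Expected number of consecutive positions spent in each state (geometric run length)
expected_run <- 1 / (1 - diag(tm))
expected_run

# Long-run share of each state: left eigenvector of tm for eigenvalue 1,
# rescaled so that its entries sum to 1
ev <- eigen(t(tm))
stationary <- Re(ev$vectors[, 1]) / sum(Re(ev$vectors[, 1]))
names(stationary) <- states
stationary
```

With these values, AT-rich stretches last about 3.3 positions on average and GC-rich stretches about 10, and roughly a quarter of a long generated sequence is expected to be AT-rich.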

Step 2: write the function

# Function to generate a DNA sequence, given a HMM and the length of the sequence to be generated.
# The state names and the nucleotide alphabet are taken from the row/column names of the matrices.
generatehmmseq <- function(transitionmatrix, emissionmatrix, initialprobs, seqlength)
{
  nucleotides     <- colnames(emissionmatrix)   # The alphabet of nucleotides
  states          <- rownames(transitionmatrix) # The names of the states
  mysequence      <- character(seqlength)       # Vector for storing the new sequence
  mystates        <- character(seqlength)       # Vector for storing the state that generated each position
  # Choose the state for the first position in the sequence:
  firststate      <- sample(states, 1, replace = TRUE, prob = initialprobs)
  # Get the probabilities of the current nucleotide, given that we are in the state "firststate":
  probabilities   <- emissionmatrix[firststate, ]
  # Choose the nucleotide for the first position in the sequence:
  firstnucleotide <- sample(nucleotides, 1, replace = TRUE, prob = probabilities)
  mysequence[1]   <- firstnucleotide         # Store the nucleotide for the first position of the sequence
  mystates[1]     <- firststate              # Store the state that the first position in the sequence was generated by

  # seq_len(seqlength)[-1] is empty when seqlength == 1 (2:seqlength would wrongly give c(2, 1))
  for (i in seq_len(seqlength)[-1])
  {
    prevstate    <- mystates[i-1]           # Get the state that the previous nucleotide in the sequence was generated by
    # Get the probabilities of the current state, given that the previous nucleotide was generated by state "prevstate"
    stateprobs   <- transitionmatrix[prevstate, ]
    # Choose the state for the ith position in the sequence:
    state        <- sample(states, 1, replace = TRUE, prob = stateprobs)
    # Get the probabilities of the current nucleotide, given that we are in the state "state":
    probabilities <- emissionmatrix[state, ]
    # Choose the nucleotide for the ith position in the sequence:
    nucleotide   <- sample(nucleotides, 1, replace = TRUE, prob = probabilities)
    mysequence[i] <- nucleotide             # Store the nucleotide for the current position of the sequence
    mystates[i]  <- state                   # Store the state that the current position in the sequence was generated by
  }

  for (i in seq_along(mysequence))
  {
    print(paste("Position", i, ", State", mystates[i], ", Nucleotide = ", mysequence[i]))
  }
  # Also return the result (invisibly) so that it can be stored and analysed further
  invisible(data.frame(position = seq_along(mysequence), state = mystates, nucleotide = mysequence))
}

Step 3: simulate a sequence from given initial state probabilities

theinitialprobs <- c(0.5, 0.5)
generatehmmseq(thetransitionmatrix, theemissionmatrix, theinitialprobs, 30)
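Because a simulated sequence comes with its true states, it is a convenient test case for the prediction problem. The sketch below is my own code (simulate_hmm and viterbi_decode are hypothetical names): it simulates a longer sequence with the Step 1 values, decodes it with the Viterbi algorithm and reports the fraction of positions whose region was recovered. set.seed only makes the run reproducible.

```r
# Model values from Step 1 (short names tm/em are my own)
states <- c("AT-rich", "GC-rich")
tm <- matrix(c(0.7, 0.3, 0.1, 0.9), 2, 2, byrow = TRUE, dimnames = list(states, states))
em <- matrix(c(0.39, 0.1, 0.1, 0.41, 0.1, 0.41, 0.39, 0.1), 2, 4, byrow = TRUE,
             dimnames = list(states, c("A", "C", "G", "T")))
init <- c(0.5, 0.5)

# Simulate states and nucleotides, returning both instead of printing them
simulate_hmm <- function(tm, em, init, n) {
  path <- character(n); obs <- character(n)
  for (i in seq_len(n)) {
    p <- if (i == 1) init else tm[path[i - 1], ]
    path[i] <- sample(rownames(tm), 1, prob = p)
    obs[i]  <- sample(colnames(em), 1, prob = em[path[i], ])
  }
  data.frame(state = path, nucleotide = obs, stringsAsFactors = FALSE)
}

# Viterbi decoding in log space: most probable state path given the nucleotides
viterbi_decode <- function(obs, tm, em, init) {
  n <- length(obs); k <- nrow(tm)
  delta <- matrix(-Inf, n, k); psi <- matrix(0L, n, k)
  delta[1, ] <- log(init) + log(em[, obs[1]])
  for (t in seq_len(n)[-1]) {
    for (j in seq_len(k)) {
      scores <- delta[t - 1, ] + log(tm[, j])
      psi[t, j] <- which.max(scores)
      delta[t, j] <- max(scores) + log(em[j, obs[t]])
    }
  }
  path <- integer(n)
  path[n] <- which.max(delta[n, ])
  for (t in rev(seq_len(n - 1))) path[t] <- psi[t + 1, path[t + 1]]
  rownames(tm)[path]
}

set.seed(1)
sim <- simulate_hmm(tm, em, init, 1000)
decoded <- viterbi_decode(sim$nucleotide, tm, em, init)
mean(decoded == sim$state)   # fraction of positions whose region was recovered
```

Decoding is not perfect: short AT-rich stretches inside GC-rich regions are easily absorbed, because a single A or T is also fairly likely to be emitted by the GC-rich state.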


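This R code may be of limited practical use in itself, but it helps us understand concretely how an HMM works. In real applications, HMM analyses are usually done with dedicated R packages; there are quite a few of them, and I will pick several to work through as examples.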


HMM: Hidden Markov Models for Beginners (Part 2)
