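1. HMM: Hidden Markov model
- Suppose the weather (cloudy, sunny, or rainy) cannot be observed directly; only the wetness of the ground is visible (say it is graded A, B, or C for very wet, moderately wet, and dry). Having observed the ground for a whole week (AABBCBA), can we infer that week's weather?
- As described above, there are two kinds of states: the observed states (ground wetness: A, B, C) and the latent states (weather: cloudy, sunny, rainy).
- The state-transition probability matrix of the weather is already known. What remains is the relationship between the weather and the ground wetness, which is what lets us infer the weather.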
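(2) Hidden Markov model
- The random sequence of states generated by an HMM is called the state sequence. Each state generates one observation, and the resulting random sequence of observations is called the observation sequence.
- An HMM is a doubly stochastic process: a hidden Markov chain over a set of states (the weather) together with a random observation sequence (the ground wetness).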
The HMM is a discrete time model: for each point in time t, we have one hidden state that generates one observed event for that time point t.
The hidden states follow a Markov process, i.e., the states over time are not independent of one another, but the current state depends on the previous state only (and not on earlier states).
The distribution generating the observation depends only on the underlying hidden state at that time point.
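The two assumptions above can be written as a single factorization. As a sketch, using the notation that appears later in these notes (model λ=(A,B,π), state sequence I=(i1,...,iT), observation sequence Q=(q1,...,qT)), and writing a_{ij} for the entries of the transition matrix A and b_i(q) for the emission probability of observation q in state i (these entry names are my own shorthand):

$$
P(Q, I \mid \lambda) = \pi_{i_1}\, b_{i_1}(q_1) \prod_{t=2}^{T} a_{i_{t-1} i_t}\, b_{i_t}(q_t)
$$

Each factor a_{i_{t-1} i_t} uses only the previous state (the Markov property), and each factor b_{i_t}(q_t) uses only the current state (the observation assumption).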
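Emission matrix (emission distribution matrix)
- For each hidden state, it gives the probability of every possible observation.
- In addition, the model includes the set of hidden states and the set of possible observation values.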
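To make these elements concrete, here is a minimal sketch of the weather example as R objects. Every probability value below is hypothetical and chosen only for illustration; the names `A`, `B`, and `pi0` follow the λ=(A,B,π) notation used in these notes.

```r
# Hypothetical weather HMM; every probability below is made up for illustration.
states  <- c("cloudy", "sunny", "rainy")   # set of hidden states
symbols <- c("A", "B", "C")                # set of observations: very wet, moderately wet, dry

# A: state-transition matrix, A[i, j] = P(state j at time t+1 | state i at time t)
A <- matrix(c(0.5, 0.3, 0.2,
              0.3, 0.6, 0.1,
              0.4, 0.2, 0.4),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# B: emission matrix, B[i, k] = P(observing symbol k | hidden state i)
B <- matrix(c(0.20, 0.50, 0.30,
              0.05, 0.25, 0.70,
              0.70, 0.25, 0.05),
            nrow = 3, byrow = TRUE, dimnames = list(states, symbols))

# pi0: initial state distribution
pi0 <- c(cloudy = 0.3, sunny = 0.5, rainy = 0.2)

# Each row of A and B, and pi0 itself, is a probability distribution.
stopifnot(all(abs(rowSums(A) - 1) < 1e-12),
          all(abs(rowSums(B) - 1) < 1e-12),
          abs(sum(pi0) - 1) < 1e-12)
```

Naming the rows and columns lets later code index the matrices by state or symbol name, for example `B["rainy", "A"]`.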
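- Given the model λ=(A,B,π) and an observation sequence Q={q1,q2,...,qT}, compute the probability P(Q|λ) of observing Q under the model λ.
- For example, given the three HMM elements for weather and ground wetness, plus a set of ground-wetness observations, compute how probable that set of observations is.
- Method: the forward-backward algorithm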
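The evaluation problem above can be sketched in a few lines of R. For P(Q|λ) only the forward pass is needed, so that is all this sketch implements. The function name `forward_prob` and all probability values are assumptions made for illustration, and the recursion is unscaled, so it is only suitable for short sequences.

```r
# Hypothetical weather HMM; all numbers are illustrative, not estimated from data.
states  <- c("cloudy", "sunny", "rainy")
symbols <- c("A", "B", "C")
A <- matrix(c(0.5, 0.3, 0.2,
              0.3, 0.6, 0.1,
              0.4, 0.2, 0.4),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))
B <- matrix(c(0.20, 0.50, 0.30,
              0.05, 0.25, 0.70,
              0.70, 0.25, 0.05),
            nrow = 3, byrow = TRUE, dimnames = list(states, symbols))
pi0 <- c(cloudy = 0.3, sunny = 0.5, rainy = 0.2)

# Forward pass: returns P(Q | lambda) for a character vector of observations.
forward_prob <- function(A, B, pi0, obs) {
  # alpha_1(i) = pi_i * b_i(q_1)
  alpha <- pi0 * B[, obs[1]]
  for (t in seq_along(obs)[-1]) {
    # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(q_t)
    alpha <- as.vector(alpha %*% A) * B[, obs[t]]
  }
  sum(alpha)
}

obs <- strsplit("AABBCBA", "")[[1]]   # the week of observations from the example
print(forward_prob(A, B, pi0, obs))
```

A quick sanity check is that the probabilities of all possible observation sequences of a fixed length add up to 1.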
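- Given an observation sequence Q={q1,q2,...,qT}, estimate the parameters of the model λ=(A,B,π) so that the probability P(Q|λ) of the observation sequence under the model is maximized.
- For example, given a set of ground-wetness observations, estimate the three HMM elements for weather and ground wetness so that this set of observations becomes as likely as possible.
- Method: maximum likelihood, expectation maximization (the Baum-Welch algorithm), or Bayesian estimation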
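As a rough illustration of the learning problem, the sketch below implements one Baum-Welch (EM) update and applies it a few times to the example week. The function name `baum_welch_step`, the starting values, and the number of iterations are all assumptions; the recursions are unscaled, so this only works for short sequences, and seven observations are far too few to estimate a three-state model reliably.

```r
# One Baum-Welch (EM) update for a discrete HMM (unscaled; short sequences only).
baum_welch_step <- function(A, B, pi0, obs) {
  n <- nrow(A)
  T_len <- length(obs)
  stopifnot(T_len >= 2)

  # Forward pass: alpha[t, i] = P(q_1..q_t, state i at time t)
  alpha <- matrix(0, T_len, n)
  alpha[1, ] <- pi0 * B[, obs[1]]
  for (t in 2:T_len) {
    alpha[t, ] <- as.vector(alpha[t - 1, ] %*% A) * B[, obs[t]]
  }

  # Backward pass: beta[t, i] = P(q_{t+1}..q_T | state i at time t)
  beta <- matrix(1, T_len, n)
  for (t in (T_len - 1):1) {
    beta[t, ] <- as.vector(A %*% (B[, obs[t + 1]] * beta[t + 1, ]))
  }

  lik   <- sum(alpha[T_len, ])    # P(Q | lambda) under the current parameters
  gamma <- alpha * beta / lik     # gamma[t, i] = P(state i at time t | Q)

  # Expected number of i -> j transitions, summed over t
  xi_sum <- matrix(0, n, n)
  for (t in 1:(T_len - 1)) {
    xi_sum <- xi_sum + outer(alpha[t, ], B[, obs[t + 1]] * beta[t + 1, ]) * A / lik
  }

  # M-step: re-estimate the parameters from the expected counts
  A_new <- xi_sum / colSums(gamma[-T_len, , drop = FALSE])
  B_new <- sapply(colnames(B), function(k) colSums(gamma[obs == k, , drop = FALSE])) /
    colSums(gamma)
  dimnames(A_new) <- dimnames(A)
  dimnames(B_new) <- dimnames(B)
  list(A = A_new, B = B_new, pi0 = setNames(gamma[1, ], rownames(A)),
       loglik = log(lik))         # log-likelihood of the parameters passed in
}

# Hypothetical starting guess for the weather model.
states  <- c("cloudy", "sunny", "rainy")
symbols <- c("A", "B", "C")
A <- matrix(c(0.5, 0.3, 0.2,  0.3, 0.6, 0.1,  0.4, 0.2, 0.4),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))
B <- matrix(c(0.20, 0.50, 0.30,  0.05, 0.25, 0.70,  0.70, 0.25, 0.05),
            nrow = 3, byrow = TRUE, dimnames = list(states, symbols))
pi0 <- c(cloudy = 0.3, sunny = 0.5, rainy = 0.2)

obs <- strsplit("AABBCBA", "")[[1]]
model <- list(A = A, B = B, pi0 = pi0)
loglik <- numeric(10)
for (iter in 1:10) {
  model <- baum_welch_step(model$A, model$B, model$pi0, obs)
  loglik[iter] <- model$loglik
}
print(round(loglik, 4))   # EM should never decrease the log-likelihood
```

EM only finds a local maximum, so the result depends on the starting guess; in practice one would try several starting points and use a scaled or log-space implementation.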
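- Given the model λ=(A,B,π) and an observation sequence Q={q1,q2,...,qT}, find the state sequence I that maximizes the conditional probability P(I|Q,λ).
- For example, given a week of ground-wetness observations and the three HMM elements for weather and ground wetness, estimate the most likely weather for that week.
- Method: the Viterbi algorithm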
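The decoding problem above can be sketched as follows. This is a minimal log-space Viterbi implementation; the function name `viterbi` and all probability values are assumptions made for illustration, so the decoded weather has no real-world meaning.

```r
# Hypothetical weather HMM; all numbers are illustrative.
states  <- c("cloudy", "sunny", "rainy")
symbols <- c("A", "B", "C")
A <- matrix(c(0.5, 0.3, 0.2,  0.3, 0.6, 0.1,  0.4, 0.2, 0.4),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))
B <- matrix(c(0.20, 0.50, 0.30,  0.05, 0.25, 0.70,  0.70, 0.25, 0.05),
            nrow = 3, byrow = TRUE, dimnames = list(states, symbols))
pi0 <- c(cloudy = 0.3, sunny = 0.5, rainy = 0.2)

# Viterbi decoding: returns the most probable hidden state sequence for obs.
viterbi <- function(A, B, pi0, obs) {
  n <- nrow(A)
  T_len <- length(obs)
  delta <- matrix(-Inf, T_len, n)  # best log-probability of a path ending in state j at time t
  psi   <- matrix(0L, T_len, n)    # back-pointer: best predecessor of state j at time t
  delta[1, ] <- log(pi0) + log(B[, obs[1]])
  for (t in seq_len(T_len)[-1]) {
    for (j in seq_len(n)) {
      cand <- delta[t - 1, ] + log(A[, j])
      psi[t, j]   <- which.max(cand)
      delta[t, j] <- max(cand) + log(B[j, obs[t]])
    }
  }
  # Trace the back-pointers from the best final state.
  path <- integer(T_len)
  path[T_len] <- which.max(delta[T_len, ])
  for (t in rev(seq_len(T_len - 1))) {
    path[t] <- psi[t + 1, path[t + 1]]
  }
  rownames(A)[path]
}

obs <- strsplit("AABBCBA", "")[[1]]   # the week of observations from the example
print(viterbi(A, B, pi0, obs))
```

Working with log-probabilities avoids numerical underflow on long sequences. When several paths tie, `which.max` picks the first, so the returned path is one of the maximizers.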
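A particular DNA sequence is known to contain two kinds of region (hidden states): AT-rich and GC-rich.
The former is rich in A and T bases and the latter in G and C bases; the per-state base probabilities (the emission matrix) are shown in the figure below.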
states <- c("AT-rich", "GC-rich") # Define the names of the states
ATrichprobs <- c(0.7, 0.3) # Set the probabilities of switching states, where the previous state was "AT-rich"
GCrichprobs <- c(0.1, 0.9) # Set the probabilities of switching states, where the previous state was "GC-rich"
thetransitionmatrix <- matrix(c(ATrichprobs, GCrichprobs), 2, 2, byrow = TRUE) # Create a 2 x 2 matrix
rownames(thetransitionmatrix) <- states
colnames(thetransitionmatrix) <- states
nucleotides <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
ATrichstateprobs <- c(0.39, 0.1, 0.1, 0.41) # Set the values of the probabilities, for the AT-rich state
GCrichstateprobs <- c(0.1, 0.41, 0.39, 0.1) # Set the values of the probabilities, for the GC-rich state
theemissionmatrix <- matrix(c(ATrichstateprobs, GCrichstateprobs), 2, 4, byrow = TRUE) # Create a 2 x 4 matrix
rownames(theemissionmatrix) <- states
colnames(theemissionmatrix) <- nucleotides
- Step 2: write the function
# Function to generate a DNA sequence, given a HMM and the length of the sequence to be generated.
generatehmmseq <- function(transitionmatrix, emissionmatrix, initialprobs, seqlength)
{
    nucleotides <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
    states <- c("AT-rich", "GC-rich") # Define the names of the states
    mysequence <- character() # Create a vector for storing the new sequence
    mystates <- character() # Create a vector for storing the state that each position was generated by
    # Choose the state for the first position in the sequence:
    firststate <- sample(states, 1, replace=TRUE, prob=initialprobs)
    # Get the probabilities of the current nucleotide, given that we are in the state "firststate":
    probabilities <- emissionmatrix[firststate,]
    # Choose the nucleotide for the first position in the sequence:
    firstnucleotide <- sample(nucleotides, 1, replace=TRUE, prob=probabilities)
    mysequence[1] <- firstnucleotide # Store the nucleotide for the first position of the sequence
    mystates[1] <- firststate # Store the state that the first position in the sequence was generated by
    # seq_len(seqlength)[-1] is 2, 3, ..., seqlength, and is empty when seqlength is 1
    for (i in seq_len(seqlength)[-1])
    {
        prevstate <- mystates[i-1] # Get the state that the previous nucleotide in the sequence was generated by
        # Get the probabilities of the current state, given that the previous nucleotide was generated by state "prevstate"
        stateprobs <- transitionmatrix[prevstate,]
        # Choose the state for the ith position in the sequence:
        state <- sample(states, 1, replace=TRUE, prob=stateprobs)
        # Get the probabilities of the current nucleotide, given that we are in the state "state":
        probabilities <- emissionmatrix[state,]
        # Choose the nucleotide for the ith position in the sequence:
        nucleotide <- sample(nucleotides, 1, replace=TRUE, prob=probabilities)
        mysequence[i] <- nucleotide # Store the nucleotide for the current position of the sequence
        mystates[i] <- state # Store the state that the current position in the sequence was generated by
    }
    # Print the generated state and nucleotide for every position:
    for (i in seq_along(mysequence))
    {
        nucleotide <- mysequence[i]
        state <- mystates[i]
        print(paste("Position", i, ", State", state, ", Nucleotide = ", nucleotide))
    }
}
- Step 3: set the initial state probabilities and generate a sequence
theinitialprobs <- c(0.5, 0.5)
generatehmmseq(thetransitionmatrix, theemissionmatrix, theinitialprobs, 30)