【math】Hidden Markov Model: An Overview

Table of Contents

  • Introduction to Hidden Markov Model
    • Introduction
      • Markov chain
      • Hidden Markov Model(HMM)
    • Three Questions
      • Q1: evaluation problem -- Forward algorithm
      • Q2: decoding problem -- Viterbi algorithm
      • Q3: learning problem -- Baum-Welch algorithm
    • Application

Introduction to Hidden Markov Model

Introduction

  • Markov chains were first introduced in 1906 by Andrey Markov
  • HMM was developed by L. E. Baum and coworkers in the 1960s
  • HMM is the simplest dynamic Bayesian network and a directed graphical model
  • Applications: speech recognition, PageRank (Google), DNA analysis, …

Markov chain

A Markov chain is “a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event”.

The state space and time can each be either discrete (t = 0, 1, 2, …) or continuous (t ≥ 0). (We will focus on Markov chains with discrete state space and discrete time.)

Example for Markov chain:

  • one-step transition matrix P
  • the 5-step transition matrix is P^5
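The n-step rule above can be checked in a few lines of numpy. The 2-state chain below is made up for illustration, since the original example matrix only appears in the figure:

```python
import numpy as np

# Hypothetical 2-state chain (e.g. Sunny/Rainy); the row-stochastic matrix P
# holds P[i, j] = Pr(next state j | current state i). Values are illustrative.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The n-step transition matrix is the n-th matrix power P^n; entry (i, j)
# is the probability of reaching state j from state i in exactly n steps.
P5 = np.linalg.matrix_power(P, 5)

print(P5)
print(P5.sum(axis=1))  # each row is still a probability distribution
```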

Hidden Markov Model (HMM)

HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states.

States: x = (x1, x2, x3);
Observations: y = (y1, y2, y3);
Transition matrix: A = (a_ij);
Emission matrix: B = (b_ij)



Three Questions

  • Given the model λ = [A, B, π], how do we calculate the probability of producing the observation Y = {y_1, y_2, …, y_n}? In other words, how do we evaluate how well the model matches the observation?
  • Given the model λ = [A, B, π] and the observation Y = {y_1, y_2, …, y_n}, how do we find the most probable state sequence X = {x_1, x_2, …, x_n}? In other words, how do we infer the hidden states from the observation?
  • Given the observation Y = {y_1, y_2, …, y_n}, how do we adjust the model parameters λ = [A, B, π] to maximize the probability P(Y | λ)? In other words, how do we train the model to describe the observation more accurately?

Q1: evaluation problem – Forward algorithm

Q1: how to evaluate the matching degree between the model and the observation? (forward algorithm)

  • probability of observing the first event y_1: α_1(i) = π_i b_i(y_1)
  • probability of observing event y_{t+1} (t ≥ 1): α_{t+1}(j) = b_j(y_{t+1}) Σ_i α_t(i) a_ij
  • total probability: P(Y | λ) = Σ_i α_T(i)
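The recursion above takes only a few lines of numpy. The model values below are illustrative, not taken from the original figures:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(Y | lambda) for observation indices `obs`.

    pi -- initial state distribution, shape (K,)
    A  -- transition matrix, A[i, j] = P(x_{t+1}=j | x_t=i), shape (K, K)
    B  -- emission matrix, B[i, k] = P(y=k | x=i), shape (K, M)
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(y_1)
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]  # alpha_{t+1}(j) = b_j(y) * sum_i alpha_t(i) a_ij
    return alpha.sum()                 # P(Y | lambda) = sum_i alpha_T(i)

# Tiny illustrative HMM (made-up values):
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(forward(pi, A, B, [0, 1, 2]))  # probability of observing symbols 0, 1, 2
```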

Q2: decoding problem – Viterbi algorithm

Q2: how to infer the hidden states from the observation? (Viterbi algorithm)

Observation Y = (y_1, y_2, …, y_T), initial probabilities π = (π_1, π_2, …, π_K), transition matrix A, emission matrix B.
The Viterbi algorithm finds the optimal solution by dynamic programming with backtracking: at each step it retains the best partial path ending in each state, then recovers the optimal state path by backtracking.

Example:

  • Given the observation Y = {'Normal', 'Cold', 'Dizzy'} and the model λ = [A, B, π], what is the most probable hidden state sequence X = {x_1, x_2, x_3}?
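A minimal implementation of the backtracking described above. The parameter values are assumed from the classic Healthy/Fever version of this example; the actual values in the original figures may differ:

```python
import numpy as np

# Assumed parameters: hidden states Healthy/Fever, observations
# normal/cold/dizzy (classic textbook values, not from the figures).
states = ["Healthy", "Fever"]
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],        # transitions between Healthy/Fever
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # emissions: normal, cold, dizzy
              [0.1, 0.3, 0.6]])

def viterbi(pi, A, B, obs):
    """Most probable hidden-state path for observation indices `obs`."""
    K, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])  # log-probs avoid underflow
    psi = np.zeros((T, K), dtype=int)          # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)    # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

obs = [0, 1, 2]  # 'Normal', 'Cold', 'Dizzy'
print([states[i] for i in viterbi(pi, A, B, obs)])
```

With these assumed parameters the decoded path is Healthy, Healthy, Fever.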

Q3: learning problem – Baum-Welch algorithm

Q3: how to train the model to describe the observation more accurately? (Baum-Welch algorithm)

The Baum–Welch algorithm uses the well-known EM algorithm to find the maximum-likelihood estimate of the parameters of a hidden Markov model given a set of observed feature vectors.
The Baum–Welch algorithm (also known as the forward–backward algorithm) approximates the optimal parameters through iteration.


  • (1) Likelihood function:
    P(Y, I | λ)
  • (2) Expectation step (E-step) of the EM algorithm:
    Q(λ, λ̂) = Σ_I log P(Y, I | λ) · P(Y, I | λ̂)
  • (3) Maximization step (M-step) of the EM algorithm:
    max_λ Q(λ, λ̂)

To carry out the maximization, use the Lagrange-multiplier method and take the partial derivatives of the Lagrangian function with respect to each parameter.
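Putting the E-step and M-step together, here is a minimal numpy sketch of one Baum–Welch update for a small discrete HMM (illustrative starting values; the hmmlearn library linked in the references provides a production implementation):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM (Baum-Welch) update of (pi, A, B) for a single observation
    sequence of symbol indices `obs`. A sketch for small discrete HMMs."""
    K, M, T = len(pi), B.shape[1], len(obs)

    # E-step: forward (alpha) and backward (beta) probabilities.
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[-1].sum()          # P(Y | lambda)
    gamma = alpha * beta / likelihood     # gamma[t, i] = P(x_t = i | Y)
    xi = np.zeros((T - 1, K, K))          # xi[t, i, j] = P(x_t = i, x_{t+1} = j | Y)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= likelihood

    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.array([gamma[np.array(obs) == k].sum(axis=0) for k in range(M)]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Illustrative starting parameters and a short observation sequence.
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.7, 0.3], [0.2, 0.8]])
obs = [0, 0, 1, 1, 0, 1, 1, 1]
for _ in range(5):
    pi, A, B, ll = baum_welch_step(pi, A, B, obs)
    print(ll)  # the likelihood is non-decreasing across iterations
```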

Application

CpG island:

  • In the human genome wherever the dinucleotide CG occurs, the C nucleotide is typically chemically modified by methylation.
  • CpG islands occur around the promoters or 'start' regions of genes.
  • A CpG island is typically a few hundred to a few thousand bases long.
    Three questions for CpG islands:
  • Given the model of distinguished the CpG island, how to calculate the probability of the observation sequence ?
  • Given a short stretch of genomic sequence, how would we decide if it comes from a CpG island or not ?
  • Given a long piece of sequence, how would we find the CpG islands in it?
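For the second question (deciding whether a short stretch comes from a CpG island), a common approach following Durbin et al. is a log-odds score between two first-order Markov chain models, one trained inside CpG islands (+) and one outside (-). The transition probabilities below are illustrative stand-ins, not trained values:

```python
import numpy as np

bases = {"A": 0, "C": 1, "G": 2, "T": 3}
# Illustrative transition probabilities (rows/cols in order A, C, G, T);
# real values would be estimated from labelled genomic sequence.
P_plus = np.array([[0.18, 0.27, 0.43, 0.12],    # CpG-island model (+)
                   [0.17, 0.37, 0.27, 0.19],
                   [0.16, 0.34, 0.38, 0.12],
                   [0.08, 0.36, 0.38, 0.18]])
P_minus = np.array([[0.30, 0.21, 0.28, 0.21],   # background model (-)
                    [0.32, 0.30, 0.08, 0.30],
                    [0.25, 0.24, 0.30, 0.21],
                    [0.18, 0.24, 0.29, 0.29]])

def log_odds(seq):
    """Log-odds score log2 P(seq | +) - log2 P(seq | -); positive favours CpG."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        i, j = bases[a], bases[b]
        score += np.log2(P_plus[i, j] / P_minus[i, j])
    return score

print(log_odds("CGCGCG"))  # CG-rich: positive score, looks like a CpG island
print(log_odds("ATATAT"))  # AT-rich: negative score, looks like background
```

The third question, finding CpG islands inside a long sequence, is where the full HMM (with +/- hidden states and the Viterbi decoding above) takes over from this simple two-model comparison.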

reference:

  • Markov chain Definition
  • A Revealing Introduction to Hidden Markov Models
  • Top 10 Algorithms in Data Mining
  • An Introduction to Hidden Markov Models for Biological Sequences
  • python-hmmlearn-example
  • Viterbi algorithm
  • Baum-Welch blog
  • Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh and G. Mitchison

To be continued…
