Understanding Kaldi HCLG in Depth

1. Main Topics Covered


1.1 WFST Key Concepts

  1. determinization
  2. minimization
  3. composition
  4. equivalence
  5. epsilon-free
  6. functional
  7. on-demand algorithm
  8. weight-pushing
  9. epsilon removal
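Of the operations listed above, composition is the one the HCLG recipe relies on most. As a rough illustration only, the sketch below composes two epsilon-free weighted transducers stored in plain Python dictionaries, adding weights along matched arcs as in the tropical semiring. The dictionary layout and the function name compose are assumptions made for this example; they are not the OpenFst or Kaldi API, and epsilon handling is deliberately left out.

```python
from collections import deque

def compose(a, b):
    """Compose two epsilon-free transducers (hypothetical dict layout).

    Each FST is {"start": state, "finals": set_of_states,
                 "arcs": {state: [(ilabel, olabel, weight, next_state), ...]}}.
    An arc of `a` pairs with an arc of `b` when a's output label equals b's input label.
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        arcs[state] = []
        if qa in a["finals"] and qb in b["finals"]:
            finals.add(state)
        for ia, oa, wa, na in a["arcs"].get(qa, []):
            for ib, ob, wb, nb in b["arcs"].get(qb, []):
                if oa == ib:
                    nxt = (na, nb)
                    # Weights add along a matched pair (tropical semiring).
                    arcs[state].append((ia, ob, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

# Toy transducers: A rewrites "a" to "b", B rewrites "b" to "c".
A = {"start": 0, "finals": {1}, "arcs": {0: [("a", "b", 1.0, 1)]}}
B = {"start": 0, "finals": {1}, "arcs": {0: [("b", "c", 2.0, 1)]}}
AB = compose(A, B)
print(AB["arcs"][AB["start"]])
```

Only state pairs reachable from the start pair are built, which is also the idea behind the on-demand variants of the algorithm.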

1.2 HMM Key Concepts

  1. Markov Chain
  2. Hidden Markov Model
  3. Forward-backward algorithm
  4. Viterbi algorithm
  5. E-M for mixture of Gaussians
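As a small illustration of the Viterbi algorithm from the list above, here is a minimal sketch for a discrete HMM. The function name viterbi, the two-state model, and all probabilities are illustrative values chosen for this example, not anything taken from Kaldi. A practical decoder would work with log-probabilities to avoid underflow.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_state_path, probability_of_that_path) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor state)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        col = {}
        for s in states:
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            col[s] = (p * emit_p[s][o], r)
        best.append(col)
    # Backtrace from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for col in reversed(best[1:]):
        path.append(col[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Toy model with made-up numbers.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```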
2. HCLG

L.fst: The Phonetic Dictionary FST


L maps monophone sequences to words.

The file L.fst is the Finite State Transducer form of the lexicon with phone symbols on the input and word symbols on the output.
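To make the phone-in, word-out layout concrete, the following sketch turns a tiny pronunciation lexicon into arcs in an OpenFst-style text layout (source state, destination state, input label, output label). It is a simplified illustration under stated assumptions: the helper lexicon_to_arcs and the toy lexicon are hypothetical, the word is emitted on the first phone with <eps> on the rest, and optional silence and pronunciation probabilities are ignored.

```python
def lexicon_to_arcs(lexicon):
    """lexicon: list of (word, [phones]). Returns (arcs, final_state).

    Every pronunciation starts and ends in state 0, so words can follow one another.
    """
    loop_state = 0
    next_state = 1
    arcs = []
    for word, phones in lexicon:
        src = loop_state
        for i, phone in enumerate(phones):
            # Output the word on the first phone, epsilon afterwards.
            out = word if i == 0 else "<eps>"
            if i == len(phones) - 1:
                dst = loop_state
            else:
                dst = next_state
                next_state += 1
            arcs.append((src, dst, phone, out))
            src = dst
    return arcs, loop_state

toy_lexicon = [("cat", ["k", "ae", "t"]), ("a", ["ah"])]
arcs, final_state = lexicon_to_arcs(toy_lexicon)
for arc in arcs:
    print(*arc)
print(final_state)
```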

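L_disambig.fst: The Phonetic Dictionary with Disambiguation Symbols FST

The lexicon with disambiguation symbols (#1, #2, ...) appended to pronunciations. These symbols keep L o G determinizable when several words share a pronunciation or when one pronunciation is a prefix of another.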
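The sketch below illustrates when a pronunciation receives a disambiguation symbol. It is a simplified approximation of the idea, not a port of Kaldi's own lexicon script: the function add_disambig and the toy lexicon are hypothetical, and a symbol is appended whenever a pronunciation is shared by several words or is a proper prefix of another pronunciation.

```python
from collections import Counter

def add_disambig(lexicon):
    """lexicon: list of (word, [phones]). Returns a new list with #k symbols appended."""
    prons = [tuple(p) for _, p in lexicon]
    count = Counter(prons)
    # Collect every proper prefix of every pronunciation.
    prefixes = set()
    for p in prons:
        for n in range(1, len(p)):
            prefixes.add(p[:n])
    used = Counter()  # how many symbols were handed out per phone sequence
    result = []
    for (word, _), p in zip(lexicon, prons):
        if count[p] > 1 or p in prefixes:
            used[p] += 1
            result.append((word, list(p) + [f"#{used[p]}"]))
        else:
            result.append((word, list(p)))
    return result

toy_lexicon = [
    ("red", ["r", "eh", "d"]),
    ("read", ["r", "eh", "d"]),          # homophone of "red"
    ("a", ["ah"]),                       # prefix of "about"
    ("about", ["ah", "b", "aw", "t"]),
]
disambiguated = add_disambig(toy_lexicon)
for word, phones in disambiguated:
    print(word, " ".join(phones))
```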


G.fst: The Language Model FST

G is an FSA (acceptor) that encodes the grammar; it can be built from an n-gram language model.
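As an illustration of how an n-gram model can become an acceptor, the sketch below converts a toy bigram table into arcs whose input and output labels are identical, with cost -log(p). Everything here is an assumption for the example: the function bigram_to_acceptor, the probabilities, and the one-state-per-history layout. Backoff arcs, which a real n-gram G would need, are omitted.

```python
import math

def bigram_to_acceptor(bigram_probs, start="<s>", end="</s>"):
    """bigram_probs: {(previous_word, word): probability}.

    Returns (arcs, finals): arcs are (src, dst, ilabel, olabel, cost) and
    finals maps a final state to its final cost. One state per history word.
    """
    state_of = {start: 0}

    def state(history):
        if history not in state_of:
            state_of[history] = len(state_of)
        return state_of[history]

    arcs, finals = [], {}
    for (prev, word), p in sorted(bigram_probs.items()):
        cost = -math.log(p)
        src = state(prev)
        if word == end:
            finals[src] = cost  # ending the sentence becomes a final cost
        else:
            # Acceptor: the same symbol on the input and output side.
            arcs.append((src, state(word), word, word, cost))
    return arcs, finals

toy_bigrams = {
    ("<s>", "hello"): 1.0,
    ("hello", "world"): 0.5,
    ("hello", "</s>"): 0.5,
    ("world", "</s>"): 1.0,
}
g_arcs, g_finals = bigram_to_acceptor(toy_bigrams)
for arc in g_arcs:
    print(*arc)
```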


C.fst: The Context FST

C maps triphone sequences to monophones.

Composing with C expands the phones into context-dependent phones.
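The expansion into context-dependent phones can be pictured as sliding a window over the phone sequence. The sketch below does this for integer phone ids. The function expand_context and its parameters are hypothetical, and padding the edges with id 0 is an assumption made for this example.

```python
def expand_context(phones, width=3, central=1, pad=0):
    """Return one context window (a tuple of phone ids) per phone in the sequence.

    width   : number of phones in the window (3 for triphones)
    central : index of the central phone inside the window
    pad     : id used where the window runs past either end (an assumption here)
    """
    left = central
    right = width - central - 1
    padded = [pad] * left + list(phones) + [pad] * right
    return [tuple(padded[i:i + width]) for i in range(len(phones))]

windows = expand_context([12, 15, 21])
print(windows)
# With width=1 and central=0 the windows are just the monophones themselves.
print(expand_context([12, 15, 21], width=1, central=0))
```

Each phone in the input yields exactly one window, so the context-dependent sequence has the same length as the monophone sequence.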


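H.fst: The HMM FST

H maps sequences of HMM states (represented as transition-ids in Kaldi-speak) to context-dependent triphones.

Expands out the HMMs. On the right (output side) are the context-dependent phones and on the left (input side) are the transition-ids, each of which encodes a pdf-id together with other information.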


HCLG.fst: final graph




To summarize:


Graph construction order: G -> L -> C -> H

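          G: an acceptor (its input and output symbols are the same) that encodes the grammar or language model
          L: the lexicon; its output symbols are words and its input symbols are phones
          C: context dependency; its output symbols are phones and its input symbols represent context-dependent phones

              e.g. vector<int32> ctx_window = { 12, 15, 21 };
                      meaning: the phone with id = 15 is the central phone, the left phone has id = 12, and the right phone has id = 21

          H: contains the HMM definitions; its output symbols represent context-dependent phones and its input symbols are transition-ids (ids that encode the pdf-id together with other information)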
 
          Here asl == "add-self-loops",
          rds == "remove-disambiguation-symbols",
          and H' is H without the self-loops:

         HCLG = asl(min(rds(det(H' o min(det(C o min(det(L o G))))))))
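The nested formula is easier to read when split into stages (LG, then CLG, then HCLG). The sketch below is purely symbolic: each operation only wraps a string, so it shows the order in which the operations are applied rather than performing any real FST computation. The helper names op and compose are hypothetical.

```python
def op(name):
    """Return a function that wraps its argument as name(...), purely symbolically."""
    return lambda x: f"{name}({x})"

det, minimize, rds, asl = op("det"), op("min"), op("rds"), op("asl")

def compose(a, b):
    return f"{a} o {b}"

# Stage 1: lexicon composed with grammar, then determinized and minimized.
LG = minimize(det(compose("L", "G")))
# Stage 2: add context dependency.
CLG = minimize(det(compose("C", LG)))
# Stage 3: add the HMM structure (without self-loops), determinize,
# remove disambiguation symbols, minimize, then add the self-loops back.
HCLG = asl(minimize(rds(det(compose("H'", CLG)))))

print(LG)
print(CLG)
print(HCLG)
```

Read from the inside out, the stages show that disambiguation symbols are removed only after the final determinization, and self-loops are added last.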



