1. What the FSTs mean

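1.1 This article briefly shows how to visualize an FST with fstprint and fstdraw, two command-line tools from OpenFst, the library that Kaldi depends on. It assumes you have already generated the .fst files.

1.2 The official Kaldi documentation describes in detail how each .fst is produced, that is, how the decoding graph is built. In short, the individual .fst files mean the following: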

(1) L.fst: the pronunciation lexicon (phonetic dictionary), with phone symbols as input and word symbols as output, as shown in Figure 1.

Figure 1 Structure of L.fst

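L_disambig.fst is the lexicon with disambiguation symbols, introduced to resolve ambiguity. Ambiguity arises in two cases. First, one word's pronunciation is a prefix of another's: if cat and cats are in the same lexicon, cat needs "k ae t #1". Second, words are homophones: red becomes "r eh d #1" and read becomes "r eh d #2". See Mohri's papers for details.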
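The two cases above can be sketched in a few lines of Python. This is a simplified illustration, not Kaldi's own script: the function name add_disambig and the toy lexicon are made up for this example, and the numbering rule (one counter per pronunciation) is an assumption chosen so that the output matches the examples in the text.

```python
from collections import defaultdict


def add_disambig(lexicon):
    """Append '#k' to pronunciations that are ambiguous.

    lexicon: list of (word, [phones]) pairs (hypothetical toy format).
    A pronunciation is treated as ambiguous if it is shared by several
    words (homophones) or is a proper prefix of another pronunciation.
    """
    prons = [tuple(phones) for _, phones in lexicon]

    # How many words share each pronunciation.
    count = defaultdict(int)
    for p in prons:
        count[p] += 1

    # Every proper prefix of every pronunciation.
    prefixes = {p[:i] for p in prons for i in range(1, len(p))}

    next_id = defaultdict(int)  # per-pronunciation symbol counter
    result = []
    for word, phones in lexicon:
        p = tuple(phones)
        if count[p] > 1 or p in prefixes:
            next_id[p] += 1
            result.append((word, list(phones) + [f"#{next_id[p]}"]))
        else:
            result.append((word, list(phones)))
    return result


lexicon = [
    ("cat", ["k", "ae", "t"]),
    ("cats", ["k", "ae", "t", "s"]),
    ("red", ["r", "eh", "d"]),
    ("read", ["r", "eh", "d"]),
]

for word, phones in add_disambig(lexicon):
    print(word, " ".join(phones))
```

With this toy lexicon, cat and the two homophones receive a symbol, while cats is left unchanged because its pronunciation is neither shared nor a prefix of another entry.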

(2) G.fst: the language model. It is mostly an FSA (finite state acceptor, i.e. every arc has the same input and output label), as shown in Figure 2.

Figure 2 Structure of G.fst (built from a 1-gram grammar for command-word recognition; the disambiguation symbol #0 has not been added)

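(3) C.fst: the context-dependency model. It relates triphone sequences to monophones, expanding phones into context-dependent phones. Because C.fst is inconvenient to build, Kaldi normally does not create it as a standalone file; instead it is combined directly with L_disambig.fst and G.fst, guided by the decision tree, to produce CLG.fst.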

(4) H.fst: the HMM (Hidden Markov Model) structure. Note that in the conventional FST recipe H.fst takes acoustic states as input and emits context-dependent phones as output, whereas Kaldi changes this: "In the conventional FST recipe, the H transducer is the transducer that has, on its output, context dependent phones, and on its input, symbols representing acoustic states. In our case, the symbol on the input of H (or HCLG) is not the acoustic state (in our terminology, the pdf-id) but instead something we call the transition-id".

(5) HCLG.fst: the final decoding graph, produced by the script mkgraph.sh (graph compilation).

This script creates a fully expanded decoding graph (HCLG) that represents the language-model, pronunciation dictionary (lexicon), context-dependency, and HMM structure in our model. The output is a Finite State Transducer that has word-ids (or context dependent phones) on the output, and pdf-ids on the input (these are indexes that resolve to Gaussian Mixture Models).

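It is built according to the following formula:

HCLG = asl(min(rds(det(H' o min(det(C o min(det(L o G))))))))

Here o denotes composition, det determinization, min minimization, rds removal of disambiguation symbols, and asl addition of self-loops. The final structure is shown in Figure 3.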

Figure 3 HCLG.fst graph
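To make the composition operator o in the formula more concrete, here is a minimal Python sketch of composing two weighted transducers. It rests on several assumptions: both machines are epsilon-free, weights are added along a path (as in the tropical semiring), and the dictionary layout, the compose function and the toy L are invented for this illustration. The toy L gives each word a one-phone pronunciation so that no epsilon arcs are needed; real lexicons need the full epsilon-aware algorithm that OpenFst implements.

```python
from collections import deque


def compose(a, b):
    """Compose epsilon-free transducers a and b (weights are added).

    Each machine is a dict with keys:
      "start":  start state
      "arcs":   list of (src, ilabel, olabel, weight, dst)
      "finals": dict mapping final state -> final weight
    States of the result are pairs (state_in_a, state_in_b).
    """
    start = (a["start"], b["start"])
    arcs, finals = [], {}
    queue, seen = deque([start]), {start}

    while queue:
        qa, qb = queue.popleft()
        # A pair is final only if both components are final.
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for sa, ilabel, mid, wa, da in a["arcs"]:
            if sa != qa:
                continue
            for sb, mid_b, olabel, wb, db in b["arcs"]:
                # a's output label must match b's input label.
                if sb != qb or mid_b != mid:
                    continue
                dst = (da, db)
                arcs.append(((qa, qb), ilabel, olabel, wa + wb, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return {"start": start, "arcs": arcs, "finals": finals}


# Toy L: phones in, words out (hypothetical one-phone pronunciations).
L = {
    "start": 0,
    "arcs": [(0, "Y", "YES", 0.0, 0), (0, "N", "NO", 0.0, 0)],
    "finals": {0: 0.0},
}
# Toy G: a word acceptor with the weights of the yesno example.
G = {
    "start": 0,
    "arcs": [(0, "YES", "YES", 2.30258512, 0), (0, "NO", "NO", 2.30258512, 0)],
    "finals": {0: 2.30258512},
}

LG = compose(L, G)
for arc in LG["arcs"]:
    print(arc)
```

The result reads phones on its input side and carries the language-model weights from G, which is the role L o G plays inside the formula.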

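2. Visualization

2.1 Dependencies

OpenFst provides a set of commands for operating on FSTs. Printing a binary .fst file as text needs nothing beyond OpenFst, but turning an FST into an image requires Graphviz, which renders the DOT language. Install it with: sudo apt-get install graphviz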

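2.2 Commands

fstprint and fstdraw are the two basic commands for visualization. They live in ~/kaldi-master/tools/openfst/bin (run export PATH={directory containing the commands}:$PATH to add them to the PATH of the current shell session).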

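(1) fstprint prints an FST: it converts a binary .fst into text form. Basic usage (see fstprint --help for all options):

fstprint [--isymbols=xxxx --osymbols=xxxx] XXX.fst

The options --isymbols and --osymbols give the input and output symbol tables. Both may be omitted, in which case the labels are printed as integer ids.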

(2) fstdraw draws an FST. Its output is a DOT file, which the dot command converts to JPG or PDF. Basic usage (see fstdraw --help for all options):

fstdraw [--isymbols=phones.txt --osymbols=words.txt] XXX.fst | dot -Tjpg > XXX.jpg

fstdraw [--isymbols=phones.txt --osymbols=words.txt] XXX.fst | dot -Tpdf > XXX.pdf

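Note: make sure the input and output symbol tables match the FST, because different FSTs have different inputs and outputs. For example, L.fst takes phones.txt as input and words.txt as output, whereas G.fst uses words.txt for both.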
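What a symbol table does can be illustrated with a short Python sketch. A symbol table file such as words.txt is plain text with one "symbol id" pair per line, and the tools use it to replace integer labels with readable names. The table contents below are hypothetical (the ids in a real words.txt may differ), and load_symbols and label_arc are helper names invented for this example.

```python
def load_symbols(text):
    """Parse 'symbol id' lines into a dict mapping id -> symbol."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[int(parts[1])] = parts[0]
    return table


def label_arc(arc_line, isyms, osyms):
    """Replace the integer labels of one printed arc with symbols.

    arc_line has the form: src dst ilabel olabel [weight]
    """
    src, dst, ilabel, olabel, *weight = arc_line.split()
    fields = [src, dst, isyms[int(ilabel)], osyms[int(olabel)], *weight]
    return "\t".join(fields)


# Hypothetical symbol table; real ids may differ.
WORDS_TXT = """<eps> 0
NO 1
YES 2
"""

words = load_symbols(WORDS_TXT)
# G.fst is an acceptor over words, so the same table serves both sides.
print(label_arc("0 0 2 2 2.30258512", words, words))
```

Passing the wrong table (for instance phones.txt for G.fst) would either fail on a missing id or, worse, silently print misleading names, which is why the note above matters.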

2.3 Example: visualizing G.fst from ~/kaldi-master/egs/yesno/s5/data/lang with fstprint and fstdraw

fstprint --isymbols=words.txt --osymbols=words.txt G.fst

0 0 NO NO 2.30258512
0 0 YES YES 2.30258512
0 2.30258512
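The printed text is easy to read programmatically. The following Python sketch parses the three lines above: an arc line has the fields source state, destination state, input label, output label and an optional weight, and a final-state line has the state and an optional weight. The function name parse_fstprint is made up for this example, and the sketch does not handle output printed in acceptor mode. The conversion to probabilities assumes the weights are negative natural logarithms, which is the usual convention for Kaldi language-model graphs.

```python
import math

FSTPRINT_OUTPUT = """0 0 NO NO 2.30258512
0 0 YES YES 2.30258512
0 2.30258512
"""


def parse_fstprint(text):
    """Split fstprint transducer output into arcs and final states."""
    arcs, finals = [], {}
    for line in text.splitlines():
        f = line.split()
        if len(f) in (4, 5):
            # Arc: src dst ilabel olabel [weight]
            weight = float(f[4]) if len(f) == 5 else 0.0
            arcs.append((int(f[0]), int(f[1]), f[2], f[3], weight))
        elif len(f) in (1, 2):
            # Final state: state [weight]
            finals[int(f[0])] = float(f[1]) if len(f) == 2 else 0.0
    return arcs, finals


arcs, finals = parse_fstprint(FSTPRINT_OUTPUT)
for src, dst, ilabel, olabel, weight in arcs:
    # Assumption: weight = -ln(probability).
    print(f"{src} -> {dst}  {ilabel}:{olabel}  p = {math.exp(-weight):.4f}")
for state, weight in finals.items():
    print(f"final state {state}  p = {math.exp(-weight):.4f}")
```

Under that assumption each weight of 2.30258512 corresponds to a probability of about 0.1.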

fstdraw --isymbols=words.txt --osymbols=words.txt G.fst | dot -Tjpg > G.jpg

[Figure: G.fst as rendered by fstdraw]
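For readers curious about the intermediate DOT text that sits between the FST and the image, the Python sketch below writes a small DOT description for the G.fst arcs of this example. It is only an approximation for illustration: the to_dot function is invented here, and the styling (bold start state, double circle for final states, left-to-right layout) loosely imitates common FST drawings rather than reproducing the exact output of fstdraw.

```python
def to_dot(arcs, finals, start=0):
    """Render arcs and final states as Graphviz DOT text.

    arcs:   list of (src, dst, ilabel, olabel, weight)
    finals: dict mapping final state -> final weight
    """
    lines = ["digraph FST {", "  rankdir = LR;"]
    states = {start} | set(finals) | {s for a in arcs for s in (a[0], a[1])}
    for s in sorted(states):
        shape = "doublecircle" if s in finals else "circle"
        style = ", style = bold" if s == start else ""
        label = f"{s}/{finals[s]:g}" if s in finals else str(s)
        lines.append(f'  {s} [label = "{label}", shape = {shape}{style}];')
    for src, dst, ilabel, olabel, weight in arcs:
        lines.append(f'  {src} -> {dst} [label = "{ilabel}:{olabel}/{weight:g}"];')
    lines.append("}")
    return "\n".join(lines)


# Arcs and final state taken from the fstprint output of this example.
g_arcs = [(0, 0, "NO", "NO", 2.30258512), (0, 0, "YES", "YES", 2.30258512)]
g_finals = {0: 2.30258512}

print(to_dot(g_arcs, g_finals))
```

Saving the printed text to a file and running dot -Tpdf on it should give a one-state graph with two self-loops, similar in spirit to the figure above.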

