Common Kaldi tools (how they work)

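Tool usage guide: http://blog.csdn.net/zjm750617105/article/details/52540823 . If you only want to know how to use the tools, that link is enough and you need not read any further. What follows supplements that article, mainly items 3 to 7.
3. Inspect the generated GMM models, e.g. the monophone and triphone models.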

kaldi/src/gmmbin/gmm-copy --binary=false 0.mdl -
Here 0.mdl is the initial model and 40.mdl is the final model. Let's first look at the topology part of 0.mdl:

<TransitionModel>
<Topology>
<TopologyEntry>
<ForPhones>
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.75 <Transition> 5 0.25 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>

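This is the initial transition model. First comes the entry for the phones from 16 onward, which have 3 emitting states. Written as a matrix, the initial model is:
       S-1    S0     S1     S2     S3
S-1    0      1      0      0      0
S0     0      0.75   0.25   0      0
S1     0      0      0.75   0.25   0
S2     0      0      0      0.75   0.25
S3     0      0      0      0      0
This is the extended three-state HMM. S-1 is the start state, which can be read as "entering this phone"; S3 is the end state, which can be read as "leaving this phone". The phone is normally considered finished once the S2->S3 transition is taken, so the S3 row is all zeros, which matches what we already knew.
Phones 1 to 15 are the silence phones, each with 5 emitting states; -1 denotes the start state and 5 the end state. Their initial transition matrix is built in the same way from the second TopologyEntry.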
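The matrix view can be reproduced mechanically from the two topology entries. The following is a minimal Python sketch, not part of Kaldi: the dictionaries simply restate the `<State>` lines printed above, and the extra S-1 row and column are added the same way as in the hand-written matrix.

```python
# Topology entries copied from the 0.mdl listing: state -> list of (next_state, prob).
NON_SILENCE = {
    0: [(0, 0.75), (1, 0.25)],
    1: [(1, 0.75), (2, 0.25)],
    2: [(2, 0.75), (3, 0.25)],
    3: [],  # final state, no outgoing transitions
}
SILENCE = {
    0: [(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)],
    1: [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25)],
    2: [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25)],
    3: [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25)],
    4: [(4, 0.75), (5, 0.25)],
    5: [],  # final state, no outgoing transitions
}


def transition_matrix(topology):
    """Return a dense matrix with an extra entry state S-1 in row/column 0."""
    n = len(topology) + 1  # +1 for the entry state S-1
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0  # S-1 always moves to S0
    for state, arcs in topology.items():
        for next_state, prob in arcs:
            matrix[state + 1][next_state + 1] = prob
    return matrix


for name, topo in (("non-silence", NON_SILENCE), ("silence", SILENCE)):
    print(name)
    for row in transition_matrix(topo):
        print(" ".join(f"{p:4.2f}" for p in row))
```

Every row except the final state's sums to 1, which is a quick sanity check on a hand-edited topo file.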
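The next tag is <Triples>. Here I borrow the simpler phone model from the original link; my own model has too many phones, and pasting all of them would be messy.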

<Triples> 84 
1 0 0 
1 1 1 
1 2 2 
2 0 3 
2 1 4 
2 2 5 
3 0 6 
3 1 7 
3 2 8 
4 0 9 
4 1 10 
4 2 11 
5 0 12 
5 1 13 
5 2 14 
6 0 15 
6 1 16 
6 2 17 
7 0 18 
7 1 19 
7 2 20 
8 0 21 
8 1 22 
8 2 23 
9 0 24 
9 1 25 
9 2 26 
10 0 27 
10 1 28 
10 2 29 
11 0 30 
11 1 31 
11 2 32 
12 0 33 
12 1 34 
12 2 35 
13 0 36 
13 1 37 
13 2 38 
14 0 39 
14 1 40 
14 2 41 
15 0 42 
15 1 43 
15 2 44 
16 0 45 
16 1 46 
16 2 47 
17 0 48 
17 1 49 
17 2 50 
18 0 51 
18 1 52 
18 2 53 
19 0 54 
19 1 55 
19 2 56 
20 0 57 
20 1 58 
20 2 59 
21 0 60 
21 1 61 
21 2 62 
22 0 63 
22 1 64 
22 2 65 
23 0 66 
23 1 67 
23 2 68 
24 0 69 
24 1 70 
24 2 71 
25 0 72 
25 1 73 
25 2 74 
26 0 75 
26 1 76 
26 2 77 
27 0 78 
27 1 79 
27 2 80 
28 0 81 
28 1 82 
28 2 83 
</Triples>

Each line should mean "which state of which phone". For example, 28 2 83 says that state 2 of phone 28 has state (senone) index 83, which is also the pdf-id.
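To make the (phone, hmm-state, pdf-id) reading concrete, here is a small Python sketch that parses a few of the lines above into a lookup table. The helper name `parse_triples` is my own, and the closed-form pattern checked at the end holds only for this particular listing (3 states per phone, no pdf sharing); it is not a general Kaldi rule.

```python
# A few lines copied from the <Triples> listing above.
TRIPLES_TEXT = """
1 0 0
1 1 1
1 2 2
2 0 3
2 1 4
2 2 5
28 0 81
28 1 82
28 2 83
"""


def parse_triples(text):
    """Map (phone, hmm_state) -> pdf_id."""
    table = {}
    for line in text.strip().splitlines():
        phone, hmm_state, pdf_id = (int(tok) for tok in line.split())
        table[(phone, hmm_state)] = pdf_id
    return table


table = parse_triples(TRIPLES_TEXT)
print(table[(28, 2)])

# In this listing the pdf-id is simply 3 * (phone - 1) + hmm_state.
for (phone, hmm_state), pdf_id in table.items():
    assert pdf_id == 3 * (phone - 1) + hmm_state
```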
Next is the <LogProbs> tag:

[ 0 -0.6931472 -0.6931472 -0.6931472 -0.6931472 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 ]

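This vector has 169 entries. Where does that come from? An HMM has three sets of parameters (initial state probabilities, the state transition matrix, and the emission probabilities), and this is obviously none of them directly. So what is it? The only way to find out is to read the source code.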
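Before going to the source, the numbers themselves give a hint: they look like natural logarithms of the transition probabilities. The sketch below checks that. The layout it then tests (one leading 0 because transition-ids start at 1, followed by one log-probability per transition) is my reading, stated here as an assumption; the count of four leading -0.6931472 entries is read off the vector above.

```python
import math

# The three distinct non-zero values that appear in the vector above.
values = [-0.6931472, -0.2876821, -1.386294]
for v in values:
    print(f"exp({v}) = {math.exp(v):.4f}")

# Assumed layout: one leading 0, then one log-probability per transition.
leading = 1
half_half = 4                                   # the four -0.6931472 entries
pairs = (169 - leading - half_half) // 2        # (log 0.75, log 0.25) pairs
total = leading + half_half + 2 * pairs
print(pairs, total)
```

Under that reading, 168 transitions is exactly 2 per state for the 84 states of the borrowed 28-phone model, which would account for the 169.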

#!/bin/bash
if [ -f path.sh ]; then . ./path.sh; fi
. parse_options.sh || exit 1;
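# sets.int covers the 259 phones; counting their states gives 15*5+244*3=807 states
# in total, which corresponds to the 28 phones and 84 states above. I still need to
# run this myself and look at the intermediate results; forgive me for explaining
# with two data sets. The file looks like this, for example:
#   1 2 3 4 5
#   6 7 8 9 10
#   11 12 13 14 15
#   16 17 18 19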
shared_phones_opt="--shared-phones=data/lang_nosp/phones/sets.int"
sdata="data/train/split10"
lang="data/lang_nosp"
JOB=1
cmvn_opts="--norm-vars=true"
feat_dim=39
dir="exp/mono0a"
feats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/$JOB/utt2spk scp:$sdata/$JOB/cmvn.scp scp:$sdata/$JOB/feats.scp ark:- | add-deltas ark:- ark:- |"
/home/zjm/kaldi/src/gmmbin/gmm-init-mono $shared_phones_opt "--train-feats=$feats subset-feats --n=10 ark:- ark:-|" \
$lang/topo  $feat_dim $dir/init.mdl $dir/tree

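OK, now we have the command and its arguments. The script above is the gmm-init step pulled out of steps/train_mono.sh as a standalone script; in my test it produced an init.mdl identical to 0.mdl. That is not the point, though. The point is how gmm-init-mono.cc produces it.
So let's paste the code and take a look: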

// gmmbin/gmm-init-mono.cc
int main(int argc, char *argv[]) {
  try {
    using namespace kaldi;
    using kaldi::int32;
    const char *usage =
        "Initialize monophone GMM.\n"
        "Usage:  gmm-init-mono <topology-in> <dim> <model-out> <tree-out> \n"
        "e.g.: \n"
        " gmm-init-mono topo 39 mono.mdl mono.tree\n";

    bool binary = true;
    std::string train_feats;
    std::string shared_phones_rxfilename;
    BaseFloat perturb_factor = 0.0;
    ParseOptions po(usage);
    po.Register("binary", &binary, "Write output in binary mode");
    po.Register("train-feats", &train_feats,
                "rspecifier for training features [used to set mean and variance]");
    po.Register("shared-phones", &shared_phones_rxfilename,
                "rxfilename containing, on each line, a list of phones whose pdfs should be shared.");
    po.Register("perturb-factor", &perturb_factor,
                "Perturb the means using this fraction of standard deviation.");
    po.Read(argc, argv);

    if (po.NumArgs() != 4) {
      po.PrintUsage();
      exit(1);
    }
    std::string topo_filename = po.GetArg(1);
    int dim = atoi(po.GetArg(2).c_str());
    KALDI_ASSERT(dim> 0 && dim < 10000);
    std::string model_filename = po.GetArg(3);
    std::string tree_filename = po.GetArg(4);

    Vector<BaseFloat> glob_inv_var(dim);
    glob_inv_var.Set(1.0);
    Vector<BaseFloat> glob_mean(dim);
    glob_mean.Set(1.0);

    if (train_feats != "") {
      double count = 0.0;
      Vector<double> var_stats(dim);
      Vector<double> mean_stats(dim);
      SequentialDoubleMatrixReader feat_reader(train_feats);
      for (; !feat_reader.Done(); feat_reader.Next()) {
        // feat_reader iterates over the features of many utterances: one matrix
        // per utterance, with one row per frame.
        const Matrix<double> &mat = feat_reader.Value();
        for (int32 i = 0; i < mat.NumRows(); i++) {
          count += 1.0;  // count the total number of frames (rows) in this split
          var_stats.AddVec2(1.0, mat.Row(i));  // accumulate x^2 per dimension (a 39-dim vector)
          mean_stats.AddVec(1.0, mat.Row(i));  // accumulate x per dimension (a 39-dim vector)
        }
      }
      if (count == 0) { KALDI_ERR << "no features were seen."; }
      var_stats.Scale(1.0/count);  // multiply every element by 1.0 / count
      mean_stats.Scale(1.0/count);
      var_stats.AddVec2(-1.0, mean_stats);  // variance = E[x^2] - mean^2
      if (var_stats.Min() <= 0.0)
        KALDI_ERR << "bad variance";
      var_stats.InvertElements();  // take the reciprocal of every element
      glob_inv_var.CopyFromVec(var_stats);
      glob_mean.CopyFromVec(mean_stats);
    }

    HmmTopology topo;
    bool binary_in;
    Input ki(topo_filename, &binary_in);
    topo.Read(ki.Stream(), binary_in);

    const std::vector<int32> &phones = topo.GetPhones();

    std::vector<int32> phone2num_pdf_classes (1+phones.back());
    for (size_t i = 0; i < phones.size(); i++)
      phone2num_pdf_classes[phones[i]] = topo.NumPdfClasses(phones[i]);

    // Now the tree [not really a tree at this point]:
    ContextDependency *ctx_dep = NULL;
    if (shared_phones_rxfilename == "") {  // No sharing of phones: standard approach.
      ctx_dep = MonophoneContextDependency(phones, phone2num_pdf_classes);
    } else {
      std::vector<std::vector<int32> > shared_phones;
      ReadSharedPhonesList(shared_phones_rxfilename, &shared_phones);
      // ReadSharedPhonesList crashes on error.
      ctx_dep = MonophoneContextDependencyShared(shared_phones, phone2num_pdf_classes);
    }

    int32 num_pdfs = ctx_dep->NumPdfs();

    AmDiagGmm am_gmm;
    DiagGmm gmm;
    gmm.Resize(1, dim);
    {  // Initialize the gmm.
      Matrix<BaseFloat> inv_var(1, dim);
      inv_var.Row(0).CopyFromVec(glob_inv_var);
      Matrix<BaseFloat> mu(1, dim);
      mu.Row(0).CopyFromVec(glob_mean);
      Vector<BaseFloat> weights(1);
      weights.Set(1.0);
      gmm.SetInvVarsAndMeans(inv_var, mu);
      gmm.SetWeights(weights);
      gmm.ComputeGconsts();
    }

    for (int i = 0; i < num_pdfs; i++)
      am_gmm.AddPdf(gmm);

    if (perturb_factor != 0.0) {
      for (int i = 0; i < num_pdfs; i++)
        am_gmm.GetPdf(i).Perturb(perturb_factor);
    }

    // Now the transition model:
    TransitionModel trans_model(*ctx_dep, topo);

    {
      Output ko(model_filename, binary);
      trans_model.Write(ko.Stream(), binary);
      am_gmm.Write(ko.Stream(), binary);
    }

    // Now write the tree.
    ctx_dep->Write(Output(tree_filename, binary).Stream(),
                   binary);

    delete ctx_dep;
    return 0;
  } catch(const std::exception &e) {
    std::cerr << e.what();
    return -1;
  }
}
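The statistics part of the listing (the loop over feat_reader) boils down to a global per-dimension mean and inverse variance, which every initial single-Gaussian pdf then shares. Here is a minimal pure-Python sketch of the same arithmetic; the function name and the toy 2-dimensional frames are made up for illustration and stand in for the 39-dimensional features.

```python
def global_mean_and_inv_var(frames):
    """frames: list of equal-length feature vectors, one per frame."""
    if not frames:
        raise ValueError("no features were seen")
    dim = len(frames[0])
    count = float(len(frames))
    sum_x = [0.0] * dim    # plays the role of mean_stats
    sum_x2 = [0.0] * dim   # plays the role of var_stats
    for frame in frames:
        for d, x in enumerate(frame):
            sum_x[d] += x
            sum_x2[d] += x * x
    mean = [s / count for s in sum_x]
    # variance = E[x^2] - mean^2, as in the C++ code
    var = [s2 / count - m * m for s2, m in zip(sum_x2, mean)]
    if min(var) <= 0.0:
        raise ValueError("bad variance")
    inv_var = [1.0 / v for v in var]
    return mean, inv_var


# Toy data: three frames of a 2-dimensional feature.
frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mean, inv_var = global_mean_and_inv_var(frames)
print(mean, inv_var)
```

Since the same mean and inverse variance are copied into every pdf, all GMMs in 0.mdl start out identical unless --perturb-factor is set.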

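In 0.mdl the number of DiagGMMs is 396, the LogProbs vector has dimension 805, and the total number of states is 807. I do not understand how these numbers are related. Or is there a unit even smaller than the senone? Anyone interested can trace it back to the source: http://kaldi-asr.org/doc/index.html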
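Some of these numbers can at least be cross-checked with simple counting. The sketch below is only arithmetic on the topology printed earlier, under two assumptions of mine: that every phone has its own transitions, and that phones listed on the same line of sets.int share their pdfs (which is what the --shared-phones option suggests), with the 15 silence phones grouped into 3 sets of 5 as in the sets.int example.

```python
# Outgoing transitions per emitting state, read off the topology in 0.mdl.
SILENCE_ARCS = [4, 4, 4, 4, 2]   # phones 1..15, 5 emitting states each
NON_SILENCE_ARCS = [2, 2, 2]     # phones 16..259, 3 emitting states each

num_sil, num_nonsil = 15, 244

num_states = num_sil * len(SILENCE_ARCS) + num_nonsil * len(NON_SILENCE_ARCS)
num_transitions = num_sil * sum(SILENCE_ARCS) + num_nonsil * sum(NON_SILENCE_ARCS)
print("states:", num_states)
print("transitions:", num_transitions)

# If pdfs are shared within each phone set, 396 GMMs would decompose as
# 3 silence sets * 5 pdf-classes + N non-silence sets * 3 pdf-classes.
num_gmms = 396
nonsil_sets, remainder = divmod(num_gmms - 3 * 5, 3)
print("non-silence sets:", nonsil_sets, "remainder:", remainder)
```

Under these assumptions the 807 states are reproduced and the 396 GMMs are at least consistent with pdf sharing (fewer pdfs than states). The largest transition-id seen in the alignments further down (1602) also fits below the 1734 transitions counted here. I cannot reproduce the 805 this way, so that figure may be worth rechecking against the file.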

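4. Below is the explanation for item 4, which may help in understanding item 3:
One more important point: in Kaldi, p.d.f.'s are identified by numeric identifiers starting from 0 (we call these pdf-ids); unlike in HTK, they do not have names. The .mdl file does not contain enough information to map between context-dependent phones and pdf-ids. To see that mapping, look at the tree file. Type:
~/kaldi/src/bin/copy-tree --binary=false tree - > tree.txt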

ContextDependency 1 0 ToPdf SE 0 [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 ]
{ SE 0 [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 ]
{ SE 0 [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ]
{ SE 0 [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ]
{ SE 0 [ 1 2 3 4 5 6 7 8 9 10 ]
{ SE 0 [ 1 2 3 4 5 ]
 { TE -1 5 ( CE 0 CE 1 CE 2 CE 3 CE 4 )
   TE -1 5 ( CE 5 CE 6 CE 7 CE 8 CE 9 )
   } 
SE 0 [ 11 12 13 14 15 ] 
 { TE -1 5 ( CE 10 CE 11 CE 12 CE 13 CE 14 )
   TE -1 3 ( CE 15 CE 16 CE 17 )
   }  
}  
SE 0 [ 20 21 22 23 24 25 26 27 ]
{ SE 0 [ 20 21 22 23 ]
  { TE -1 3 ( CE 18 CE 19 CE 20 )
    TE -1 3 ( CE 21 CE 22 CE 23 )
    }  
SE 0 [ 28 29 30 31 ] 
  { TE -1 3 ( CE 24 CE 25 CE 26 )
    TE -1 3 ( CE 27 CE 28 CE 29 )
    }
}

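Reading this is exhausting, with SE, TE, CE and so many brackets (and today is the Mid-Autumn Festival, too). By chance I found something useful, which is a good reason to browse the official site more often; it comes from dannie's lecture 2.
Run the following in the mono0a directory, then open the PDF: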

/home/zjm/kaldi/src/bin/draw-tree ../../data/lang_nosp/phones.txt tree | dot -Tps -Gsize=8,10.5 | ps2pdf - tree.pdf

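For some reason I could not upload the image: the size and format are fine, yet the site keeps reporting a wrong format. The command above does run, so have a look at the PDF yourself. I will describe it briefly.
There are two cases. One is a node whose children lead directly to leaf nodes:
                          phone=?
   SIL SIL_B SIL_E SIL_I SIL_S (condition)      NO (condition)
            PdfClass=?                            PdfClass=?
      0,1,2,3,4 (condition)                 0,1,2,3,4 (condition)
      s0 s1 s2 s3 s4                        s5 s6 s7 s8 s9
The corresponding file format is:
SE 0 [ 1 2 3 4 5 ] # condition: is phone in [SIL SIL_B SIL_E SIL_I SIL_S]?
{ TE -1 5 ( CE 0 CE 1 CE 2 CE 3 CE 4 ) # Yes
TE -1 5 ( CE 5 CE 6 CE 7 CE 8 CE 9 ) # No
}
SE seems to denote a condition (a split).
TE seems to denote the condition at the leaf level; I do not know what -1 means, and 5 means that the phone has 5 states (pdf-classes).
CE denotes the value of a leaf node.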
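For what it is worth, in Kaldi's event-map code SE, TE and CE correspond to SplitEventMap, TableEventMap and ConstantEventMap, and as far as I know the key -1 is the special key for the pdf-class, while key 0 is the phone itself in a monophone tree. The Python sketch below evaluates the fragment above under that reading. The nested-tuple representation and the `lookup` function are my own illustration, not Kaldi's data structures.

```python
# Nested tuples mirroring the fragment above:
#   ("SE", key, yes_set, yes_branch, no_branch)  split on membership of event[key]
#   ("TE", key, [children...])                   table lookup on event[key]
#   ("CE", pdf_id)                               leaf holding a pdf-id
TREE = (
    "SE", 0, {1, 2, 3, 4, 5},
    ("TE", -1, [("CE", i) for i in range(0, 5)]),
    ("TE", -1, [("CE", i) for i in range(5, 10)]),
)


def lookup(node, event):
    """event maps a key (0 = phone, -1 = pdf-class) to its integer value."""
    kind = node[0]
    if kind == "CE":
        return node[1]
    if kind == "TE":
        _, key, children = node
        return lookup(children[event[key]], event)
    if kind == "SE":
        _, key, yes_set, yes_branch, no_branch = node
        branch = yes_branch if event[key] in yes_set else no_branch
        return lookup(branch, event)
    raise ValueError(f"unknown node type: {kind}")


print(lookup(TREE, {0: 3, -1: 2}))   # phone 3 (in the SIL set), pdf-class 2
print(lookup(TREE, {0: 7, -1: 4}))   # phone 7 (not in the set), pdf-class 4
```

The first call lands on CE 2 and the second on CE 9, which matches reading the bracketed text by hand.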
5. Below is the explanation for item 5: the result of Viterbi alignment in the forced-alignment stage.

~/kaldi/src/bin/copy-int-vector "ark:gunzip -c ali.1.gz|" ark,t:- | head -n 1

Result:

10002_20131215_500 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 17 17 17 704 703 703 703 703 703 703 703 703 703 706 705 708 707 1502 1501 1504 1506 1352 1351 1351 1351 1354 1353 1356 1022 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1024 1026 1025 4 1 1 1 1 1 1 16 18 17 776 775 778 780 1022 1021 1024 1023 1026 728 730 729 729 729 729 729 729 729 729 732 731 731 731 566 565 565 565 565 568 570 569 992 991 991 991 991 994 993 996 995 398 397 397 397 397 397 397 397 397 397 397 400 399 402 401 401 401 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 

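This is a single utterance: 10002_20131215_500 is the utterance id, followed by a sequence of ids. They look like phone or state ids, but we have only 259 phones and 807 states, and many of the values are larger than that. They are in fact transition-ids. What exactly those are will become clear with the next command.
Hold on, I found another handy command: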

gunzip -c ali.1.gz > ali.1
/home/zjm/kaldi/src/bin/show-alignments ../../data/lang_nosp/phones.txt 40.mdl ark:ali.1

The result is:

10002_20131215_500  [ 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 17 17 17 ] [ 704 703 703 703 703 703 703 703 703 703 706 705 708 707 ] [ 1502 1501 1504 1506 ] [ 1352 1351 1351 1351 1354 1353 1356 ] [ 1022 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1024 1026 1025 ] [ 4 1 1 1 1 1 1 16 18 17 ] [ 776 775 778 780 ] [ 1022 1021 1024 1023 1026 ] [ 728 730 729 729 729 729 729 729 729 729 732 731 731 731 ] [ 566 565 565 565 565 568 570 569 ] [ 992 991 991 991 991 994 993 996 995 ] [ 398 397 397 397 397 397 397 397 397 397 397 400 399 402 401 401 401 ] [ 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 ]
10002_20131215_500  SIL                                                                                              x_B                                                         in_E                    y_B                                    i_E                                                                                                                                   SIL                        n_B                 i_E                          h_B                                                         ao_E                                m_B                                     a_E                                                                     SIL                                                                

10002_20131215_501  [ 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 17 ] [ 776 778 780 ] [ 1022 1021 1024 1026 ] [ 1472 1471 1471 1471 1471 1474 1473 1476 1475 1475 ] [ 1070 1072 1074 ] [ 368 367 367 370 369 369 372 371 371 ] [ 1502 1501 1504 1506 ] [ 680 679 679 682 681 681 681 681 684 683 683 683 ] [ 1574 1576 1578 ] [ 1352 1351 1354 1353 1356 ] [ 1022 1021 1021 1021 1021 1021 1021 1021 1021 1021 1021 1024 1026 ] [ 1472 1471 1471 1471 1471 1474 1476 1475 ] [ 1598 1600 1602 ] [ 992 991 991 991 991 994 996 ] [ 638 640 642 ] [ 1352 1354 1356 1355 1355 1355 ] [ 590 589 589 592 591 591 591 591 594 593 593 593 593 593 593 593 593 593 ] [ 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 18 ]
10002_20131215_501  SIL                                                                                                                n_B             i_E                     z_B                                                   uei_E              j_B                                     in_E                    sh_B                                                eng_E              y_B                          i_E                                                                  z_B                                         en_E               m_B                             o_E             y_B                               iang_E                                                                      SIL    

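That is much easier to read, and it makes things easy at decoding time. The key question, though, is how Viterbi is used to produce the alignment (forced alignment); I will add that after reading the source code.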
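Because each transition-id in an alignment covers one frame, the show-alignments output can be turned into per-phone frame counts. A small Python sketch follows; the two input strings are an excerpt of the first utterance above (its second to fourth segments), and the function name `segments` is my own.

```python
import re

# Excerpt of the show-alignments output above (segments 2 to 4 of the first utterance).
ALIGNMENT = (
    "10002_20131215_500 "
    "[ 704 703 703 703 703 703 703 703 703 703 706 705 708 707 ] "
    "[ 1502 1501 1504 1506 ] "
    "[ 1352 1351 1351 1351 1354 1353 1356 ]"
)
PHONES = "10002_20131215_500 x_B in_E y_B"


def segments(alignment_line, phone_line):
    """Pair each bracketed run of transition-ids with its phone label."""
    groups = re.findall(r"\[([^\]]*)\]", alignment_line)
    labels = phone_line.split()[1:]  # drop the utterance id
    result = []
    for label, group in zip(labels, groups):
        ids = [int(tok) for tok in group.split()]
        result.append((label, len(ids), ids))
    return result


segs = segments(ALIGNMENT, PHONES)
for label, num_frames, ids in segs:
    print(f"{label:5s} {num_frames:3d} frames  {ids}")
```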

6. How do we look up these transition-ids? Again, in 0.mdl:

~/kaldi/src/bin/show-transitions ../../data/lang_nosp/phones.txt 0.mdl | head -n 30

The result is:

Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.25 [self-loop]
 Transition-id = 2 p = 0.25 [0 -> 1]
 Transition-id = 3 p = 0.25 [0 -> 2]
 Transition-id = 4 p = 0.25 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
 Transition-id = 5 p = 0.25 [self-loop]
 Transition-id = 6 p = 0.25 [1 -> 2]
 Transition-id = 7 p = 0.25 [1 -> 3]
 Transition-id = 8 p = 0.25 [1 -> 4]
Transition-state 3: phone = SIL hmm-state = 2 pdf = 2
 Transition-id = 9 p = 0.25 [2 -> 1]
 Transition-id = 10 p = 0.25 [self-loop]
 Transition-id = 11 p = 0.25 [2 -> 3]
 Transition-id = 12 p = 0.25 [2 -> 4]
Transition-state 4: phone = SIL hmm-state = 3 pdf = 3
 Transition-id = 13 p = 0.25 [3 -> 1]
 Transition-id = 14 p = 0.25 [3 -> 2]
 Transition-id = 15 p = 0.25 [self-loop]
 Transition-id = 16 p = 0.25 [3 -> 4]
Transition-state 5: phone = SIL hmm-state = 4 pdf = 4
 Transition-id = 17 p = 0.75 [self-loop]
 Transition-id = 18 p = 0.25 [4 -> 5]
Transition-state 6: phone = SIL_B hmm-state = 0 pdf = 0
 Transition-id = 19 p = 0.25 [self-loop]
 Transition-id = 20 p = 0.25 [0 -> 1]
 Transition-id = 21 p = 0.25 [0 -> 2]
 Transition-id = 22 p = 0.25 [0 -> 3]
7. Now let's look at the final model, 40.mdl, this time also passing in 40.occs:
/home/zjm/kaldi/src/bin/show-transitions ../../data/lang_nosp/phones.txt 40.mdl 40.occs | head -n 30
Result:
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.945521 count of pdf = 1.32797e+06 [self-loop]
 Transition-id = 2 p = 0.01 count of pdf = 1.32797e+06 [0 -> 1]
 Transition-id = 3 p = 0.01 count of pdf = 1.32797e+06 [0 -> 2]
 Transition-id = 4 p = 0.0344834 count of pdf = 1.32797e+06 [0 -> 3]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
 Transition-id = 5 p = 0.917364 count of pdf = 140612 [self-loop]
 Transition-id = 6 p = 0.0626407 count of pdf = 140612 [1 -> 2]
 Transition-id = 7 p = 0.01 count of pdf = 140612 [1 -> 3]
 Transition-id = 8 p = 0.01 count of pdf = 140612 [1 -> 4]
Transition-state 3: phone = SIL hmm-state = 2 pdf = 2
 Transition-id = 9 p = 0.01 count of pdf = 317582 [2 -> 1]
 Transition-id = 10 p = 0.935994 count of pdf = 317582 [self-loop]
 Transition-id = 11 p = 0.0440095 count of pdf = 317582 [2 -> 3]
 Transition-id = 12 p = 0.01 count of pdf = 317582 [2 -> 4]
Transition-state 4: phone = SIL hmm-state = 3 pdf = 3
 Transition-id = 13 p = 0.01 count of pdf = 520223 [3 -> 1]
 Transition-id = 14 p = 0.01 count of pdf = 520223 [3 -> 2]
 Transition-id = 15 p = 0.874817 count of pdf = 520223 [self-loop]
 Transition-id = 16 p = 0.105187 count of pdf = 520223 [3 -> 4]
Transition-state 5: phone = SIL hmm-state = 4 pdf = 4
 Transition-id = 17 p = 0.670804 count of pdf = 179121 [self-loop]
 Transition-id = 18 p = 0.329196 count of pdf = 179121 [4 -> 5]
Transition-state 6: phone = SIL_B hmm-state = 0 pdf = 0
 Transition-id = 19 p = 0.25 count of pdf = 1.32797e+06 [self-loop]
 Transition-id = 20 p = 0.25 count of pdf = 1.32797e+06 [0 -> 1]
 Transition-id = 21 p = 0.25 count of pdf = 1.32797e+06 [0 -> 2]
 Transition-id = 22 p = 0.25 count of pdf = 1.32797e+06 [0 -> 3]
Transition-state 7: phone = SIL_B hmm-state = 1 pdf = 1
 Transition-id = 23 p = 0.25 count of pdf = 140612 [self-loop]

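We can see that the final HMM states are identified not by the pdf as before but by the transition-id. But why are the "count of pdf" values that follow all the same?
Three-level hierarchy:
phone
hmm-state (pdf)
transition-id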
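This hierarchy can be read straight out of the show-transitions output. The Python sketch below parses a few lines copied from the 0.mdl listing above into a table from transition-id to (phone, hmm-state, pdf). The regular expressions and the `parse` helper are my own and assume only the line format shown in this post.

```python
import re

# A few lines copied from the show-transitions output for 0.mdl.
SHOW_TRANSITIONS = """\
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.25 [self-loop]
 Transition-id = 2 p = 0.25 [0 -> 1]
Transition-state 5: phone = SIL hmm-state = 4 pdf = 4
 Transition-id = 17 p = 0.75 [self-loop]
 Transition-id = 18 p = 0.25 [4 -> 5]
"""

STATE_RE = re.compile(
    r"Transition-state (\d+): phone = (\S+) hmm-state = (\d+) pdf = (\d+)")
# ".*" skips the optional "count of pdf = ..." text printed when occs are given.
ID_RE = re.compile(r"Transition-id = (\d+) p = (\S+) .*\[(.+)\]")


def parse(text):
    """Map transition-id -> dict describing its phone, hmm-state, pdf and arc."""
    table = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = STATE_RE.match(line)
        if m:
            current = (m.group(2), int(m.group(3)), int(m.group(4)))
            continue
        m = ID_RE.match(line)
        if m and current is not None:
            phone, hmm_state, pdf = current
            table[int(m.group(1))] = {
                "phone": phone,
                "hmm_state": hmm_state,
                "pdf": pdf,
                "p": float(m.group(2)),
                "arc": m.group(3),
            }
    return table


table = parse(SHOW_TRANSITIONS)
print(table[18])
```

All transition-ids under one transition-state share the same pdf, so any per-pdf quantity printed next to them repeats on each of those lines.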

OK, there are still parts of this that I do not fully understand; I will dig deeper when I have time.
