The Kaldi speech recognition system: the voxforge recipe

   First, a quick introduction to VoxForge. VoxForge is a website that collects speech recordings, and its corpus can be downloaded for free. You can fetch it either with a batch-download tool or with the getdata.sh script in kaldi/egs/voxforge/s5. The data is about 12.6 GB, so make sure your Linux system has enough free space; it is worth estimating this up front and reserving at least 20 GB for the experiment. It also helps to have a second machine, because the download itself takes a long time. With our poor network connection it took roughly two days.
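For reference, the download step looks roughly like this from the recipe directory. This is a sketch from memory: the exact behaviour of the script and the variable that controls where the archives are stored should be checked against the getdata.sh and path.sh in your own checkout.

    cd kaldi-trunk/egs/voxforge/s5
    df -h .          # confirm roughly 20 GB are free before starting
    ./getdata.sh     # fetches the VoxForge archives (about 12.6 GB)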

   That covers the data preparation and the main things to watch out for; knowing them in advance should keep the experiment from getting stuck early. After downloading, place the data under the extracted directory, set up the working path, and you can start the experiment.
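Concretely, "set up the working path" means telling the recipe where the directory containing extracted/ lives. A minimal sketch, assuming the DATA_ROOT variable this recipe reads from path.sh (the variable name and the path below are assumptions; verify against your own path.sh):

    # edit path.sh in kaldi-trunk/egs/voxforge/s5, or export before running:
    export DATA_ROOT=/home/you/voxforge   # hypothetical path; extracted/ must sit directly under it
    ./run.sh                              # dictionary/LM preparation, MFCCs, training and decoding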

   Along the way you will hit all kinds of problems. Take them calmly: launching the scripts is simple, but this recipe really is a different matter.

    Problem 1: FLAC decompressor needed but not found! 

            Fix: sudo apt-get install flac

   Problem 2: checking for Fortran 77 libraries of .....

            Fix: sudo apt-get install gfortran

   Problem 3: Python, numpy, swig, pcre, G2P and a few other dependencies need to be installed.

           Fix: install them patiently one by one. The G2P installation kept failing for me; installing it separately, on its own, seemed to fix it.
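For the record, on an Ubuntu system of that vintage the first four can usually be covered with apt packages roughly like the ones below (the package names are assumptions; adjust for your distribution). The G2P here is the Sequitur G2P tool the recipe uses to generate pronunciations for out-of-vocabulary words; I ended up building it separately from its own source tree rather than relying on the recipe's automatic install step.

    sudo apt-get install python python-dev python-numpy swig libpcre3-dev
    # Sequitur G2P: build it inside its own source directory, e.g.
    #     python setup.py install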

Note: after an error, it is best to delete everything generated so far before rerunning, so you never have to wonder whether old output is being overwritten or not. Not doing this is where I lost the most time.
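What worked for me was wiping the generated directories before every fresh attempt. A minimal sketch, assuming the output layout seen in the log below (data/, exp/, and an MFCC directory under work/); the downloaded corpus under DATA_ROOT is left untouched:

    cd kaldi-trunk/egs/voxforge/s5
    rm -rf data exp work/mfcc   # careful: removes all prepared data, features and models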

 

   Finally, here is part of the output. It is only partial because I ran the recipe in a virtual machine, on CPU only; at this data size the machine ran for a very long time before I finally pulled the plug.

Partial output follows:

--- Preparing pronunciations for OOV words ...
stack usage:  1245
--- Prepare phone lists ...
--- Adding SIL to the lexicon ...
*** Dictionary preparation finished!
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> data/local/dict/silence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ...
--> data/local/dict/extra_questions.txt is empty (this is OK)
**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
fstaddselfloops 'echo 162 |' 'echo 13155 |'
prepare_lang.sh: validating output directory
Checking data/lang/phones.txt ...
--> data/lang/phones.txt is OK

Checking words.txt: #0 ...
--> data/lang/words.txt has "#0"
--> data/lang/words.txt is OK

Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> 5 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK

Checking data/lang/phones/disambig.{txt, int, csl} ...
--> 6 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK

Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> 156 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang/phones/silence.{txt, int, csl} ...
--> 5 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK

Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang/phones/roots.{txt, int} ...
--> 40 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK

Checking data/lang/phones/sets.{txt, int} ...
--> 40 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK

Checking data/lang/phones/extra_questions.{txt, int} ...
--> 9 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK

Checking data/lang/phones/word_boundary.{txt, int} ...
--> 161 entry/entries in data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.{txt, int} are OK

Checking disjoint: silence.txt, nosilenct.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK

Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK

Checking topo ...
--> data/lang/topo's nonsilence section is OK
--> data/lang/topo's silence section is OK
--> data/lang/topo is OK

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang/phones/word_boundary.txt is OK

Checking word_boundary.int and disambig.int
--> generating a 88 words sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 60 words sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK

Checking data/lang/oov.{txt, int} ...
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK

--> SUCCESS
=== Preparing train and test data ...
--- Preparing the grammar transducer (G.fst) for testing ...
arpa2fst -
\data\
Processing 1-grams
Processing 2-grams
Connected 0 states without outgoing arcs.
remove_oovs.pl: removed 0 lines.
fstisstochastic data/lang_test/G.fst
-1.07288e-06 -0.179793
*** Succeeded in formatting data.
steps/make_mfcc.sh --cmd run.pl --nj 2 data/train exp/make_mfcc/train /home/fazz/wbgtest/kaldi-trunk/egs/voxforge/s5/work/mfcc
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for train
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train /home/fazz/wbgtest/kaldi-trunk/egs/voxforge/s5/work/mfcc
Succeeded creating CMVN stats for train
steps/make_mfcc.sh --cmd run.pl --nj 2 data/test exp/make_mfcc/test /home/fazz/wbgtest/kaldi-trunk/egs/voxforge/s5/work/mfcc
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test /home/fazz/wbgtest/kaldi-trunk/egs/voxforge/s5/work/mfcc
Succeeded creating CMVN stats for test
utils/subset_data_dir.sh: reducing #utt from 50970 to 1000
steps/train_mono.sh --nj 2 --cmd run.pl data/train.1k data/lang exp/mono
steps/train_mono.sh: Initializing monophone system.
steps/train_mono.sh: Compiling training graphs
steps/train_mono.sh: Aligning data equally (pass 0)
steps/train_mono.sh: Pass 1
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 2
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 3
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 4
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 5
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 6
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 7
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 8
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 9
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 10
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 11
steps/train_mono.sh: Pass 12
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 13
steps/train_mono.sh: Pass 14
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 15
steps/train_mono.sh: Pass 16
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 17
steps/train_mono.sh: Pass 18
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 19
steps/train_mono.sh: Pass 20
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 21
steps/train_mono.sh: Pass 22
steps/train_mono.sh: Pass 23
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 24
steps/train_mono.sh: Pass 25
steps/train_mono.sh: Pass 26
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 27
steps/train_mono.sh: Pass 28
steps/train_mono.sh: Pass 29
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 30
steps/train_mono.sh: Pass 31
steps/train_mono.sh: Pass 32
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 33
steps/train_mono.sh: Pass 34
steps/train_mono.sh: Pass 35
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 36
steps/train_mono.sh: Pass 37
steps/train_mono.sh: Pass 38
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 39
163 warnings in exp/mono/log/acc.*.*.log
1 warnings in exp/mono/log/update.*.log
1357 warnings in exp/mono/log/align.*.*.log
Done
fstminimizeencoded
fsttablecompose data/lang_test/L_disambig.fst data/lang_test/G.fst
fstdeterminizestar --use-log=true
fstisstochastic data/lang_test/tmp/LG.fst
0.00048393 -1.07637
[info]: LG not stochastic.
fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=data/lang_test/phones/disambig.int --write-disambig-syms=data/lang_test/tmp/disambig_ilabels_1_0.int data/lang_test/tmp/ilabels_1_0
fstisstochastic data/lang_test/tmp/CLG_1_0.fst
0.00048393 -1.07637
[info]: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/mono/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_1_0 exp/mono/tree exp/mono/final.mdl
fstminimizeencoded
fstdeterminizestar --use-log=true
fsttablecompose exp/mono/graph/Ha.fst data/lang_test/tmp/CLG_1_0.fst
fstrmsymbols exp/mono/graph/disambig_tid.int
fstrmepslocal
fstisstochastic exp/mono/graph/HCLGa.fst
0.000513345 -1.07603
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/mono/final.mdl
steps/decode.sh --config conf/decode.config --nj 2 --cmd run.pl exp/mono/graph data/test exp/mono/decode
decode.sh: feature type is delta
exp/mono/decode/wer_10
%WER 61.01 [ 5644 / 9251, 283 ins, 1228 del, 4133 sub ]
%SER 96.50 [ 993 / 1029 ]
exp/mono/decode/wer_11
%WER 61.81 [ 5718 / 9251, 252 ins, 1386 del, 4080 sub ]
%SER 96.60 [ 994 / 1029 ]
exp/mono/decode/wer_12
%WER 62.52 [ 5784 / 9251, 220 ins, 1553 del, 4011 sub ]
%SER 96.60 [ 994 / 1029 ]
exp/mono/decode/wer_13
%WER 63.36 [ 5861 / 9251, 195 ins, 1725 del, 3941 sub ]
%SER 96.60 [ 994 / 1029 ]
exp/mono/decode/wer_14
%WER 64.79 [ 5994 / 9251, 178 ins, 1914 del, 3902 sub ]
%SER 96.79 [ 996 / 1029 ]
exp/mono/decode/wer_15
%WER 65.55 [ 6064 / 9251, 161 ins, 2021 del, 3882 sub ]
%SER 97.18 [ 1000 / 1029 ]
exp/mono/decode/wer_16
%WER 66.62 [ 6163 / 9251, 131 ins, 2163 del, 3869 sub ]
%SER 97.28 [ 1001 / 1029 ]
exp/mono/decode/wer_17
%WER 67.88 [ 6280 / 9251, 117 ins, 2320 del, 3843 sub ]
%SER 97.57 [ 1004 / 1029 ]
exp/mono/decode/wer_18
%WER 68.61 [ 6347 / 9251, 105 ins, 2401 del, 3841 sub ]
%SER 97.67 [ 1005 / 1029 ]
exp/mono/decode/wer_19
%WER 69.60 [ 6439 / 9251, 104 ins, 2512 del, 3823 sub ]
%SER 97.76 [ 1006 / 1029 ]
exp/mono/decode/wer_20
%WER 70.40 [ 6513 / 9251, 100 ins, 2609 del, 3804 sub ]
%SER 97.86 [ 1007 / 1029 ]
exp/mono/decode/wer_9
%WER 60.73 [ 5618 / 9251, 345 ins, 1109 del, 4164 sub ]
%SER 96.60 [ 994 / 1029 ]
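Rather than eyeballing every wer_N file, Kaldi's recipes include a small helper for picking the best scoring weight; something along these lines (best_wer.sh sits under utils/ in the recipe directory, as far as I recall) prints only the lowest WER per decode directory, which for the log above is the 60.73% line from wer_9:

    for d in exp/*/decode*; do
      [ -d "$d" ] && grep WER "$d"/wer_* | utils/best_wer.sh
    done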

 

There is more after this; what is shown above should correspond to only the first, monophone model. A look at the run.sh script makes that clear. Finally, I hope you approach the experiment with a calm mind, and if you run into problems you are welcome to contact me. My QQ: 354475072.

In addition, here are links to the earlier posts in this series, which should help you get up to speed with Kaldi faster:

 1. An introduction to the Kaldi speech recognition toolkit

 2. The Kaldi speech recognition system: installation, continued

 3. The Kaldi speech recognition system: a worked example
