kaldi安装及yesno实例

Kaldi是一个非常强大的语音识别工具库,主要由Daniel Povey开发和维护。目前支持GMM-HMM、SGMM-HMM、DNN-HMM等多种语音识别的模型的训练和预测。其中DNN-HMM中的神经网络还可以由配置文件自定义,DNN、CNN、TDNN、LSTM以及Bidirectional-LSTM等神经网络结构均可支持。

目前在Github上这个项目依旧非常活跃,可以在 https://github.com/kaldi-asr/kaldi 下载代码,以及在 http://kaldi-asr.org/ 查看它的文档。

下载以及安装

与其他开源软件一样,首先Clone它在Github上的代码

$ git clone https://github.com/kaldi-asr/kaldi

Clone下来之后按照INSTALL文件的指示,需要先完成tools文件夹下的编译安装,然后再去编译src下的内容。因此,先去tools文件夹:
$ cd kaldi/tools
在tools文件夹下依旧有一个INSTALL,我们根据它的指示,一步一步完成安装。首先,需要运行extras/check_dependencies.sh这个脚本来检查一些依赖的环境是否存在并且正确配置。

$  extras/check_dependencies.sh
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: we recommend that you run (our best guess):
  sudo apt-get install automake autoconf libtool subversion
You should probably do: 
  sudo apt-get install libatlas3-base
/bin/sh is linked to dash, and currently some of the scripts will not run properly.  We recommend to run:
  sudo ln -s -f bash /bin/sh

这个输出的结果不同的Linux会不相同(我的是Ubuntu 16.04)。根据check_dependencies.sh输出结果的提示,安装缺的包,以及配置正确的环境

$ sudo apt-get install automake autoconf libtool subversion
$ sudo apt-get install libatlas3-base
$ sudo ln -s -f bash /bin/sh

然后再重新运行一遍check_dependencies.sh

$ extras/check_dependencies.sh
extras/check_dependencies.sh: all OK.

如果输出以上结果,那么我们可以继续安装了

$ make -j 16

其中-j 16是开16个job同时进行编译,这个可以根据CPU内核的数量进行指定。确定没有错误后切换到src文件夹

$ cd ../src

这里面也包含了一个INSTALL文件,按照里面的步骤编译和安装

$ ./configure

运行完成后在最后一行可以看到SUCCESS,如果没有的话那应该是哪个步骤出问题了,可以去检查一下上面几个步骤是否有错误

$ make depend
$ make -j 16

检查一下编译是否有错误,如果没有错误的话make脚本会在屏幕的最后一行输出Done。至此Kaldi的编译安装完成了,可以愉快的开始训练模型了。

运行yesno实例
步骤和结果如下:
1.直接运行./run.sh。因为run.sh里面可以直接下载。
测试呈现在linux上的结果:
book@book-desktop:~/kaldi/egs/yesno/s5$ sudo ./run.sh
[sudo] password for book:
–2017-07-03 15:20:32– http://www.openslr.org/resources/1/waves_yesno.tar.gz
Resolving www.openslr.org (www.openslr.org)… 35.184.122.207
Connecting to www.openslr.org (www.openslr.org)|35.184.122.207|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 4703754 (4.5M) [application/x-gzip]
Saving to: 鈥榳aves_yesno.tar.gz鈥

waves_yesno.tar.gz 100%[===================>] 4.49M 148KB/s in 45s
Data preparation succeeded
Dictionary preparation succeeded
Preparing train and test data
Preparing word lists etc.
fstaddselfloops ‘echo 4 |’ ‘echo 4 |’
Preparing language models for test
arpa2fst -
\data\
Processing 1-grams
Connected 0 states without outgoing arcs.
fstisstochastic data/lang_test_tg/G.fst
1.20397 0
Succeeded in formatting data.
Succeeded creating MFCC features for train_yesno
Succeeded creating MFCC features for test_yesno
Computing cepstral mean and variance statistics
Initializing monophone system.
Compiling training graphs
Aligning data equally (pass 0)
Pass 1
Aligning data
Pass 2
Aligning data
Pass 3
Aligning data
Pass 4
Aligning data
Pass 5
Aligning data
Pass 6
Aligning data
Pass 7
Aligning data
Pass 8
Aligning data
Pass 9
Aligning data
Pass 10
Aligning data
Pass 11
Pass 12
Aligning data
Pass 13
Pass 14
Aligning data
Pass 15
Pass 16
Aligning data
Pass 17
Pass 18
Aligning data
Pass 19
Pass 20
Aligning data
Pass 21
Pass 22
Pass 23
Aligning data
Pass 24
Pass 25
Pass 26
Aligning data
Pass 27
Pass 28
Pass 29
Aligning data
Pass 30
Pass 31
Pass 32
Aligning data
Pass 33
Pass 34
Pass 35
Aligning data
Pass 36
Pass 37
Pass 38
Aligning data
Pass 39
1 warnings in exp/mono0a/log/update.3.log
1 warnings in exp/mono0a/log/update.7.log
Done
fstminimizeencoded
fstdeterminizestar –use-log=true
fsttablecompose data/lang_test_tg/L_disambig.fst data/lang_test_tg/G.fst
fstisstochastic data/lang_test_tg/tmp/LG.fst
1.20412 -2.34608e-05
warning: LG not stochastic.
fstcomposecontext –context-size=1 –central-position=0 –read-disambig-syms=data/lang_test_tg/tmp/disambig_phones.list –write-disambig-syms=data/lang_test_tg/tmp/disambig_ilabels_1_0.list data/lang_test_tg/tmp/ilabels_1_0
fstisstochastic data/lang_test_tg/tmp/CLG_1_0.fst
1.20412 -2.34608e-05
warning: CLG not stochastic.
make-h-transducer –disambig-syms-out=exp/mono0a/graph_tgpr/disambig_tid.list –transition-scale=1.0 data/lang_test_tg/tmp/ilabels_1_0 exp/mono0a/tree exp/mono0a/final.mdl
fstminimizeencoded
fsttablecompose exp/mono0a/graph_tgpr/Ha.fst data/lang_test_tg/tmp/CLG_1_0.fst
fstdeterminizestar –use-log=true
fstrmsymbols exp/mono0a/graph_tgpr/disambig_tid.list
fstrmepslocal
fstisstochastic exp/mono0a/graph_tgpr/HCLGa.fst
1.20412 -2.34608e-05
HCLGa is not stochastic
add-self-loops –self-loop-scale=0.1 –reorder=true exp/mono0a/final.mdl

%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] xp/mono0a/decode_test_yesno/wer_10

实验前可以看一下说明文件。

你可能感兴趣的:(语音识别)