Installing Kaldi on Ubuntu

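What is Kaldi?
Kaldi is a speech recognition toolkit, freely available under the Apache License.
Note that Kaldi is only a toolkit, not a ready-made speech recognition framework: if you want to do speech recognition, you still have to write the surrounding framework yourself.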

Here is a comparison of open-source ASR software:
https://en.wikipedia.org/wiki/List_of_speech_recognition_software

At the time of writing, that comparison shows Kaldi as the only one that uses a DNN for the acoustic model.

Installing Kaldi is straightforward and nearly foolproof; the official site provides detailed instructions.
http://kaldi-asr.org/doc/install.html
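
As a rough sketch of what that page walks through, the Python script below lists the usual from-source build steps and, by default, only prints them. The repository URL, the directory layout (tools/ and src/), and the step order are assumptions based on the INSTALL files shipped with Kaldi, not something stated in this post; check them against the official page before running anything for real.

```python
import subprocess
from pathlib import Path

# Assumption: GitHub location of the Kaldi sources (not given in this post).
KALDI_REPO = "https://github.com/kaldi-asr/kaldi.git"


def build_steps(root: Path):
    """Return (working directory, command) pairs for a from-source build."""
    kaldi = root / "kaldi"
    return [
        (root, ["git", "clone", KALDI_REPO, str(kaldi)]),
        # tools/: check prerequisites, then build the bundled third-party tools.
        (kaldi / "tools", ["extras/check_dependencies.sh"]),
        (kaldi / "tools", ["make"]),
        # src/: configure and build Kaldi itself.
        (kaldi / "src", ["./configure"]),
        (kaldi / "src", ["make", "depend"]),
        (kaldi / "src", ["make"]),
    ]


def run(root: Path, dry_run: bool = True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, cmd in build_steps(root):
        print(f"(cd {cwd} && {' '.join(cmd)})")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run(Path.cwd(), dry_run=True)
```

Keeping dry_run=True makes it safe to review the commands first; passing dry_run=False stops at the first failing step because of check=True.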

The third-party libraries it needs deserve a closer look:
Software packages installed by Kaldi

The following tools and libraries come with installation scripts in the tools/ directory so you won’t have to install them yourself (note: this is a non-exhaustive list).

  1. OpenFst: we compile against this and use it heavily. // A finite-state machine library; in implementation terms, each machine is just a directed graph. Developed at Google.
  2. IRSTLM: this is a language modeling toolkit. Some of the example scripts require it but it is not tightly integrated with Kaldi; we can convert any ARPA-format language model to an FST.
    The IRSTLM build process requires automake, aclocal, and libtoolize (the corresponding packages are automake and libtool).
    Note: some of the example scripts now use SRILM; we make it easy to install that, although you still have to register online to download it.
  3. sph2pipe: this is for converting sph format files into other formats such as wav. It’s needed for the example scripts that use LDC data.
  4. sclite: this is for scoring and is not necessary as we have our own, simple scoring program (compute-wer.cc).
  5. ATLAS, the linear algebra library. This is only needed for the headers; in typical setups we expect that ATLAS will be on your system. However, if it is not already on your system you can compile ATLAS as long as your machine does not have CPU throttling enabled. // a linear algebra library
  6. CLAPACK, the linear algebra library (we download the headers). This is useful only on systems where you don’t have ATLAS and are instead compiling with CLAPACK.
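
Item 4 above mentions scoring. The usual scoring metric is word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function below is a minimal illustrative sketch of that definition, not Kaldi's compute-wer implementation; the name word_error_rate is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.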

Some prerequisites:
This tutorial assumes that you know the basics of speech recognition using the HMM-GMM approach. One brief introduction that is available online is: M. Gales and S. Young (2007). “The Application of Hidden Markov Models in Speech Recognition.” Foundations and Trends in Signal Processing 1(3): 195-304. The HTK Book is also a good resource. However, unless you have a strong mathematical background and are extremely dedicated, we discourage trying to learn about speech recognition outside an institutional setting. The intended audience for this tutorial is either speech recognition researchers, or graduates or advanced undergraduates who are studying this area anyway.

We assume that you know C++, and have at least some familiarity with shell scripting, preferably using bash or a similar shell. This tutorial assumes you are using a UNIX-like environment or Cygwin (although Kaldi will not necessarily compile and run in all such environments).

Also, importantly, the tutorial assumes you have access to the data on the Resource Management (RM) CDs from the Linguistic Data Consortium (LDC), in the original form as distributed by the LDC. That is, we assume this data is sitting on your system somewhere. We obtained this as catalog number LDC93S3A. It is also available in two separate pieces. Be careful because there was previously a different distribution of the RM data with a different layout.

The system requirements are fairly basic. We assume that you have tools including wget, git, svn, awk, perl and so on, or that you know how to install them. The most difficult part of the installation process relates to the math library ATLAS; if this is not already installed as a library on your system you will have to compile it, and this requires that CPU throttling be turned off, which may require root privileges. We provide scripts and detailed instructions for all installation steps. When scripts fail, read the output carefully because it tries to provide guidance as to how to fix problems. Please inform us if there are problems at any point, however minor; see Other Kaldi-related resources (and how to get help).
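
A quick way to confirm the tools named above are present before starting is to look each one up on the PATH. This is a small sketch using only the Python standard library; the list covers just the five tools the paragraph names ("and so on" is left open), and the helper name missing_tools is made up for this example.

```python
import shutil

# Tool names taken from the paragraph above; extend the list as needed.
REQUIRED_TOOLS = ["wget", "git", "svn", "awk", "perl"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that an executable with that name exists; it says nothing about versions, so the dependency checks that ship with Kaldi remain the authoritative test.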

We try to provide some idea of how long it should take to execute each step of the tutorial. If there is a limited amount of time available to complete the tutorial, we recommend trying to keep to the posted schedule, if necessary by skipping steps and avoiding following links to more information that we provide in the text. This will help ensure that you get a balanced overview. You can always review the material in more detail later on. If this tutorial is to be given in a classroom setting, it is important that someone run through the tutorial on the relevant system beforehand in order to verify that all the prerequisites are installed.
