CMUSphinx Learn - Sphinx-4 Application Programmer's Guide

Overview

Sphinx-4 is a pure Java speech recognition library. Its configuration is very flexible, and carrying out a speech recognition job requires instantiating quite a lot of objects that depend on each other; throughout this article we call them collectively the "object graph". Fortunately, most of these objects can be instantiated automatically, and for the few that require manual setup Sphinx-4 provides high-level interfaces and a context class that remove the need to set up each parameter of the object graph separately.

 

Basic Usage

There are several high-level recognition interfaces in Sphinx-4:

  • LiveSpeechRecognizer (live speech recognition)
  • BatchSpeechRecognizer (batch recognition of audio files)
  • SpeechAligner (alignment of text with speech)

For most speech recognition jobs the high-level interfaces should be enough. Basically, you only have to set up four attributes:

  • Acoustic model.
  • Dictionary.
  • Grammar/Language model.
  • Source of speech.

The first three attributes are set up using a Configuration object, which is then passed to a recognizer. How the speech source is specified depends on the concrete recognizer; it is usually passed as a method parameter.

 

Configuration

Configuration is used to supply required and optional attributes to a recognizer.

 

Configuration configuration = new Configuration();
 
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d");
// Set language model.
configuration.setLanguageModelPath("models/language/en-us.lm.dmp");
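
The paths above use two styles: a "resource:" prefix and a plain relative path. The following is a minimal sketch, using only the JDK, of how such a location string could be turned into a URL. It assumes that "resource:" means a classpath lookup and that anything else is a file path. LocationResolver and its methods are hypothetical names for illustration and are not part of Sphinx-4.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not a Sphinx-4 class.
final class LocationResolver {
    private static final String RESOURCE_PREFIX = "resource:";

    // True if the location uses the "resource:" prefix.
    static boolean isClasspathResource(String location) {
        return location.startsWith(RESOURCE_PREFIX);
    }

    // Strips the prefix, e.g. "resource:/a/b" becomes "/a/b".
    static String resourceName(String location) {
        return location.substring(RESOURCE_PREFIX.length());
    }

    // Resolves the location to a URL. For classpath resources the result
    // is null when nothing with that name is on the classpath.
    static URL resolve(String location) {
        if (isClasspathResource(location)) {
            return LocationResolver.class.getResource(resourceName(location));
        }
        try {
            // A plain path is treated as a file and converted to a file: URL.
            return new File(location).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad location: " + location, e);
        }
    }
}
```

With this sketch, a plain path such as "models/language/en-us.lm.dmp" resolves to a file: URL relative to the working directory, while a "resource:" location only resolves if the model is actually on the classpath.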
 

LiveSpeechRecognizer

LiveSpeechRecognizer uses a microphone as the speech source.

LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();
// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();
 

BatchSpeechRecognizer

BatchSpeechRecognizer uses an audio file as the speech source.

BatchSpeechRecognizer recognizer = new BatchSpeechRecognizer(configuration);
recognizer.startRecognition(new File("speech.wav").toURI().toURL());
SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
 

SpeechAligner

SpeechAligner time-aligns text with audio speech.

SpeechAligner aligner = new SpeechAligner(configuration);
// A bare file name has no protocol, so it is converted to a file: URL first.
aligner.align(new File("101-42.wav").toURI().toURL(), "one oh one four two");
 

SpeechResult

SpeechResult provides access to various parts of the recognition result, such as the recognized utterance, the list of words with time stamps, the recognition lattice and so forth.

// Print utterance string without filler words.
System.out.println(result.getUtterance(false));
// Save lattice in a graphviz format.
result.getLattice().dumpDot("lattice.dot", "lattice");
 

API Extension

If you need something more sophisticated than the recognition interfaces provided by Sphinx-4, you can write your own. In that case some internal classes can be helpful. Let's quickly go over them.

 

Context

Context wraps low-level manipulations of the underlying configuration into logical methods. For example, when setting up the acoustic model it is crucial to set the low- and high-pass filters correctly. Context#setAcousticModel(String) automatically extracts this information from the provided model and makes the necessary changes in the configuration.

 

Another important function of Context is access to the object graph. It can fetch components of the graph by their class. Basically, you will always need a Recognizer instance as the primary class that carries out recognition, plus a few secondary instances responsible for various aspects of recognition, such as the microphone or audio file interface.

 

Context context = new Context(configuration);
// Use microphone input.
context.useMicrophone();
 
// Get required instances. 
Recognizer recognizer = context.getInstance(Recognizer.class);
Microphone microphone = context.getInstance(Microphone.class);
 
// Start recognition.
recognizer.allocate();
microphone.startRecording();
Result result = recognizer.recognize();
microphone.stopRecording();
recognizer.deallocate();
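
The idea of fetching graph components by class, as getInstance does above, can be illustrated with a small class-keyed registry. This is only a sketch of the general lookup pattern using the JDK; ComponentRegistry is a hypothetical name, and nothing here is taken from the actual Context implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that stores one component per class.
final class ComponentRegistry {
    private final Map<Class<?>, Object> components = new HashMap<>();

    // Registers an instance under its class token.
    <T> void register(Class<T> type, T instance) {
        components.put(type, instance);
    }

    // Returns the instance registered for the given class,
    // or fails loudly if no such component exists.
    <T> T getInstance(Class<T> type) {
        Object found = components.get(type);
        if (found == null) {
            throw new IllegalArgumentException(
                "No component registered for " + type.getName());
        }
        // Class.cast keeps the lookup type-safe without unchecked casts.
        return type.cast(found);
    }
}

final class ComponentRegistryDemo {
    public static void main(String[] args) {
        ComponentRegistry registry = new ComponentRegistry();
        // Stand-in components; a real graph would hold recognizer objects.
        registry.register(StringBuilder.class, new StringBuilder("audio source"));
        registry.register(Integer.class, 16000);

        StringBuilder source = registry.getInstance(StringBuilder.class);
        int sampleRate = registry.getInstance(Integer.class);
        System.out.println(source + " @ " + sampleRate + " Hz");
    }
}
```

Keying the map by Class lets the caller receive a correctly typed object without casting, which is why a call such as getInstance(Recognizer.class) can be assigned directly to a Recognizer variable.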
 

AbstractSpeechRecognizer

AbstractSpeechRecognizer contains boilerplate code that is common to the existing speech recognizer implementations.

 

XML Configuration

Even though it is still possible to configure an application the old way, XML configuration is deprecated and is subject to removal in future releases. If you have a custom XML configuration and want to switch to the new API, you will have to implement your own interface as described above and provide the path to your configuration as an argument of Context:

Context context = new Context("file:custom.config.xml", configuration);
 

Additional Information

  • Non-wiki Sphinx-4 Documentation home page
  • Sphinx-4 Frequently Asked Questions
