Getting Started with CMU Sphinx Speech Recognition: Building an Application with Sphinx-4

  Sphinx-4 is a pure-Java speech recognition library. It provides an API for quick and easy speech recognition using CMUSphinx acoustic models. Besides transcription, Sphinx-4 can also be used for speaker identification, model adaptation, and time-aligning audio against a transcription. Sphinx-4 supports not only English but many other languages as well.

How to Use Sphinx-4 in a Project

Importing the Sphinx-4 Libraries

  Using Sphinx-4 is very easy if your project is built with Apache Maven or Gradle. With Gradle, add the following to build.gradle:

repositories {
    mavenLocal()
    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
}

dependencies {
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-core', version:'5prealpha-SNAPSHOT'
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-data', version:'5prealpha-SNAPSHOT'
}

  With Maven, you need to add the snapshot repository to pom.xml:


...
  <repositories>
    <repository>
      <id>snapshots-repo</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>
...

Then add the sphinx4-core dependency:


<dependency>
  <groupId>edu.cmu.sphinx</groupId>
  <artifactId>sphinx4-core</artifactId>
  <version>5prealpha-SNAPSHOT</version>
</dependency>

Example

  Once the Sphinx-4 packages have been imported, we can use Sphinx-4 in the project. The Sphinx project provides the following demo for reference:

package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {       

    public static void main(String[] args) throws Exception {

        Configuration configuration = new Configuration();

        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        InputStream stream = new FileInputStream(new File("test.wav"));

        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}

Configuration

  Let's focus on the configuration, which sets up the three models required by the Sphinx recognition process. For English recognition you can download ready-made models from the official site; to recognize domain-specific text you need to build these three models yourself. The Sphinx project also provides tutorials on building them, which we will cover later.

Configuration configuration = new Configuration();

// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");

// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
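
  Besides the statistical language model, the Configuration can also point the recognizer at a JSGF grammar, which is often more convenient for small, fixed vocabularies. Below is a minimal sketch based on the official tutorial; the grammar path "resource:/com/example/grammars" and the grammar name "digits" (i.e. a digits.gram file on the classpath) are placeholders for illustration:

// Sketch: using a JSGF grammar instead of a statistical language model.
// The grammar location and name below are hypothetical placeholders.
Configuration configuration = new Configuration();

configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");

configuration.setGrammarPath("resource:/com/example/grammars");
configuration.setGrammarName("digits");
configuration.setUseGrammar(true);
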
Data Sources

1. LiveSpeechRecognizer
  LiveSpeechRecognizer uses the device's microphone as the audio source. It is set up as follows:

LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();

// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();
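
  Putting these calls together, a minimal sketch of continuous recognition from the microphone could look like the following; stopping on the keyword "exit" is just an illustrative convention, not part of the API:

LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

// Start recognition, pruning previously cached data.
recognizer.startRecognition(true);

SpeechResult result;
while ((result = recognizer.getResult()) != null) {
    String hypothesis = result.getHypothesis();
    System.out.println(hypothesis);

    // Hypothetical stop condition for this sketch.
    if (hypothesis.equalsIgnoreCase("exit")) {
        break;
    }
}
recognizer.stopRecognition();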

2. StreamSpeechRecognizer
  StreamSpeechRecognizer uses an InputStream as the audio source, which can be a file, a network socket, or an existing byte array. It is set up as follows:

StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
recognizer.startRecognition(new FileInputStream("speech.wav"));

SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
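
  Since startRecognition() accepts any InputStream, audio held in memory can be fed in the same way. A minimal sketch, assuming the byte array already contains audio in one of the supported PCM formats (requires java.io.ByteArrayInputStream, java.nio.file.Files and java.nio.file.Paths):

// Sketch: recognize audio from an in-memory byte array.
byte[] audioBytes = Files.readAllBytes(Paths.get("speech.wav"));
recognizer.startRecognition(new ByteArrayInputStream(audioBytes));

SpeechResult result;
while ((result = recognizer.getResult()) != null) {
    System.out.println(result.getHypothesis());
}
recognizer.stopRecognition();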

3. SpeechAligner
  SpeechAligner time-aligns audio with a known text (transcription):

SpeechAligner aligner = new SpeechAligner(configuration);
aligner.align(new File("101-42.wav").toURI().toURL(), "one oh one four two");
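
  The aligner returns the recognized words together with their time stamps. A slightly fuller sketch; the file name and transcript are placeholders, and the List<WordResult> return type is assumed from the official aligner demo (requires java.io.File, java.net.URL and java.util.List imports):

// Sketch: print each aligned word with its time frame.
// Assumes align() returns List<WordResult>, as in the official aligner demo.
URL audioUrl = new File("101-42.wav").toURI().toURL();
List<WordResult> results = aligner.align(audioUrl, "one oh one four two");
for (WordResult wr : results) {
    System.out.println(wr);
}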

4. SpeechResult
  SpeechResult provides access to the recognition results, such as the recognized hypothesis and the timing of individual words:

// Print utterance string without filler words.
System.out.println(result.getHypothesis());

// Get individual words and their times.
for (WordResult r : result.getWords()) {
    System.out.println(r);
}

// Save lattice in a graphviz format.
result.getLattice().dumpDot("lattice.dot", "lattice");
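
  If alternative hypotheses are needed as well, SpeechResult also exposes an n-best interface; the getNbest(int) call below is based on the 5prealpha API, but treat it as an assumption and check the Javadoc of the version you use:

// Sketch: print up to 5 alternative hypotheses (assumes SpeechResult.getNbest(int)).
for (String alternative : result.getNbest(5)) {
    System.out.println(alternative);
}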

  Also note that the audio must be encoded in one of the following two formats:

1. RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

2. RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz

  The Sphinx decoder does not support other audio formats; if the audio is not in one of the two formats above, recognition may return no results at all. In other words, the audio must be converted into one of these two formats before decoding. For example, to decode speech sampled at 8000 Hz, call:

configuration.setSampleRate(8000);
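
  Before handing a file to the recognizer, it can be useful to verify its format with the standard javax.sound.sampled API. This check is not part of Sphinx-4; it is just a small self-contained sketch:

import java.io.File;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Checks that test.wav is 16-bit mono PCM at 8000 or 16000 Hz.
public class WavFormatCheck {
    public static void main(String[] args) throws Exception {
        AudioFormat format = AudioSystem.getAudioFileFormat(new File("test.wav")).getFormat();
        float rate = format.getSampleRate();
        boolean supported = format.getChannels() == 1
                && format.getSampleSizeInBits() == 16
                && (rate == 8000f || rate == 16000f);
        System.out.println(supported
                ? "Format looks suitable for Sphinx-4"
                : "Convert to 16-bit mono 8/16 kHz PCM first: " + format);
    }
}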

  We can iterate over the recognition results of the file as follows:

while ((result = recognizer.getResult()) != null) {
    System.out.println(result.getHypothesis());
}

sphinx4-samples also provides the following examples:

  • Transcriber - demonstrates how to transcribe a file
  • Dialog - demonstrates how to lead a dialog with a user
  • SpeakerID - speaker identification
  • Aligner - demonstration of audio to transcription timestamping

Reference: https://cmusphinx.github.io/wiki/tutorialsphinx4/
