HoloLens Development Notes - Speech Recognition (Dictation)

HoloLens supports three forms of voice input:

  • Voice Command
  • Dictation
  • Grammar Recognizer

An earlier post, HoloLens Development Notes - Speech Recognition (Voice Command), covered Voice Command. This post covers dictation.

Dictation


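Dictation converts speech to text (Speech to Text). On HoloLens it is mainly useful wherever text has to be typed, for example when searching in Edge. HoloLens normally has no physical keyboard, so typing on the virtual keyboard means turning your head to place the Gaze cursor on the desired key and then confirming that key with the air-tap Gesture, one character at a time, which is very inconvenient.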


Entering text by voice instead is far more efficient.

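Dictation converts the user's speech into text input. It also provides hypothesis (interim) results and events you can register for. The Start() and Stop() methods enable and disable dictation recognition. When you are done with the recognizer, call Dispose() to release the resources it holds. If you do not, the garbage collector will eventually release them, but at an additional performance cost.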

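Keep the following in mind when using dictation:

  1. The Microphone capability must be enabled for your app. In Unity, go to Edit -> Project Settings -> Player -> Windows Store -> Publishing Settings -> Capabilities and make sure Microphone is checked.
  2. HoloLens must be connected to Wi-Fi, because dictation recognition only works with a network connection. For the same reason, the InternetClient capability should also be checked in the same Capabilities list.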
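Both requirements can also be checked at runtime before the recognizer is started. The following is a minimal sketch and not part of the original script: the class name `DictationPreflight` and the method `CanStartDictation` are hypothetical. Note that `Application.internetReachability` only reports whether some network is reachable, not whether the speech service itself can be reached, and an empty `Microphone.devices` list may mean either that no device exists or that the app cannot see it.

```csharp
using UnityEngine;

// Hypothetical helper (not from the original post): checks the two
// preconditions listed above before dictation is started.
public static class DictationPreflight
{
    // Returns true when a microphone is visible and a network is reachable.
    // On failure, 'reason' holds a short message suitable for display.
    public static bool CanStartDictation(out string reason)
    {
        // Microphone.devices lists the capture devices Unity can see.
        if (Microphone.devices.Length == 0)
        {
            reason = "No microphone is visible. Check the Microphone capability.";
            return false;
        }

        // NotReachable means no network route at all (e.g. Wi-Fi is off).
        if (Application.internetReachability == NetworkReachability.NotReachable)
        {
            reason = "No network connection. Dictation needs Wi-Fi.";
            return false;
        }

        reason = string.Empty;
        return true;
    }
}
```

A caller could run `DictationPreflight.CanStartDictation(out reason)` at the top of `Awake()` and show `reason` in the UI instead of starting the recognizer when it returns false.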

MicrophoneManager.cs

using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityEngine.Windows.Speech;

public class MicrophoneManager : MonoBehaviour
{
    [Tooltip("A text area for the recognizer to display the recognized strings.")]
    public Text DictationDisplay;

    private DictationRecognizer dictationRecognizer;

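    // Caches the text recognized so far, i.e. what is currently displayed in the text box.
    // Initialized here so the event handlers can use it as soon as dictation starts.
    private readonly StringBuilder textSoFar = new StringBuilder();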

    void Awake()
    {
        // Create a new DictationRecognizer and assign it to the dictationRecognizer field.
        dictationRecognizer = new DictationRecognizer();

        // DictationHypothesis fires while the user is talking. As the recognizer listens,
        // it provides the text of what it has heard so far.
        dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

        // DictationResult fires after the user pauses, typically at the end of a sentence.
        // The full recognized string is returned here.
        dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

        // DictationComplete fires when the recognizer stops, whether because Stop() was called,
        // a timeout occurred, or some other error happened.
        dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

        // DictationError fires when an error occurs, typically because there is no network
        // connection or the connection dropped during recognition.
        dictationRecognizer.DictationError += DictationRecognizer_DictationError;

        // Shut down the PhraseRecognitionSystem, which controls the KeywordRecognizers.
        // Dictation can only be started after keyword recognition has been shut down.
        PhraseRecognitionSystem.Shutdown();

        // Start dictation recognition.
        dictationRecognizer.Start();
    }

    /// <summary>
    /// This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
    /// </summary>
    /// <param name="text">The currently hypothesized recognition.</param>
    private void DictationRecognizer_DictationHypothesis(string text)
    {
        // Set DictationDisplay text to be textSoFar and new hypothesized text
        // We don't want to append to textSoFar yet, because the hypothesis may have changed on the next event
        DictationDisplay.text = textSoFar.ToString() + " " + text + "...";
    }

    /// <summary>
    /// This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
    /// </summary>
    /// <param name="text">The text that was heard by the recognizer.</param>
    /// <param name="confidence">A representation of how confident (rejected, low, medium, high) the recognizer is of this recognition.</param>
    private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
    {
        // Append the latest text to textSoFar, closing the sentence so that
        // consecutive results do not run together.
        textSoFar.Append(text + ". ");

        // Set the DictationDisplay text to textSoFar.
        DictationDisplay.text = textSoFar.ToString();
    }

    /// <summary>
    /// This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
    /// Typically, this will simply return "Complete". In this case, we check to see if the recognizer timed out.
    /// </summary>
    /// <param name="cause">An enumerated reason for the session completing.</param>
    private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
    {
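        // If a timeout occurs, the user has been silent for too long.
        // The default timeout with initial silence is 5 seconds: if nothing is heard
        // in the first 5 seconds after dictation starts, the session times out.
        // The default timeout after a recognition is 20 seconds: if a result was
        // recognized but nothing is heard for the next 20 seconds, the session times out.
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            // Passing null (or an empty string) selects the default microphone.
            Microphone.End(null);

            DictationDisplay.text = "Dictation has timed out. Please press the record button again.";

            // ResetAfterTimeout is expected to be implemented by another component on this
            // GameObject; DontRequireReceiver avoids an error when no such component exists.
            SendMessage("ResetAfterTimeout", SendMessageOptions.DontRequireReceiver);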
        }
    }

    /// <summary>
    /// This event is fired when an error occurs.
    /// </summary>
    /// <param name="error">The string representation of the error reason.</param>
    /// <param name="hresult">The int representation of the hresult.</param>
    private void DictationRecognizer_DictationError(string error, int hresult)
    {
        // Set the DictationDisplay text to the error string.
        DictationDisplay.text = error + "\nHRESULT: " + hresult;
    }

    // Stop the recognizer, unregister the handlers, and release its resources.
    void OnDestroy()
    {
        dictationRecognizer.Stop();
        dictationRecognizer.DictationHypothesis -= DictationRecognizer_DictationHypothesis;
        dictationRecognizer.DictationResult -= DictationRecognizer_DictationResult;
        dictationRecognizer.DictationComplete -= DictationRecognizer_DictationComplete;
        dictationRecognizer.DictationError -= DictationRecognizer_DictationError;
        dictationRecognizer.Dispose();
    }

}

HoloLens can run only one kind of speech recognition at a time, so the KeywordRecognizer must be shut down before dictation recognition can be used.
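If an app uses both voice commands and dictation, it has to switch between the two. The sketch below shows one possible way to do that, assuming the keyword recognizers should resume once dictation has fully stopped. The component name `SpeechModeSwitcher` and the methods `BeginDictation` and `EndDictation` are hypothetical; `PhraseRecognitionSystem.Status`, `Shutdown()` and `Restart()` are the Unity calls used to pause and resume keyword recognition.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component (not from the original post) that switches
// between keyword recognition and dictation.
public class SpeechModeSwitcher : MonoBehaviour
{
    private DictationRecognizer dictationRecognizer;

    void Awake()
    {
        dictationRecognizer = new DictationRecognizer();
        dictationRecognizer.DictationComplete += OnDictationComplete;
    }

    // Call this when the user wants to dictate text.
    public void BeginDictation()
    {
        // Keyword recognition and dictation cannot run at the same time.
        if (PhraseRecognitionSystem.Status == SpeechSystemStatus.Running)
        {
            PhraseRecognitionSystem.Shutdown();
        }

        if (dictationRecognizer.Status != SpeechSystemStatus.Running)
        {
            dictationRecognizer.Start();
        }
    }

    // Call this when the user is done dictating.
    public void EndDictation()
    {
        if (dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            dictationRecognizer.Stop();
        }
    }

    // Fired once dictation has stopped (Stop(), timeout, or error),
    // which is when keyword recognition can be resumed.
    private void OnDictationComplete(DictationCompletionCause cause)
    {
        PhraseRecognitionSystem.Restart();
    }

    void OnDestroy()
    {
        dictationRecognizer.DictationComplete -= OnDictationComplete;
        dictationRecognizer.Dispose();
    }
}
```

Restarting in the DictationComplete handler rather than directly after Stop() is a design choice in this sketch: the handler also runs after a timeout or an error, so voice commands come back in those cases too.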

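DictationRecognizer has two timeouts:

  1. If the recognizer starts and hears no audio within the first 5 seconds, it times out.
  2. If the recognizer has recognized a result but then hears no audio for 20 seconds, it times out.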
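The 5 and 20 second values are defaults. DictationRecognizer exposes them as the `InitialSilenceTimeoutSeconds` and `AutoSilenceTimeoutSeconds` properties, so they can be changed before the recognizer is started. The sketch below is illustrative only: the component name `DictationTimeoutExample` is hypothetical, the values 10 and 30 are arbitrary, and it assumes no keyword recognizers are running in the scene.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component (not from the original post) that lengthens
// both dictation timeouts before starting the recognizer.
public class DictationTimeoutExample : MonoBehaviour
{
    private DictationRecognizer dictationRecognizer;

    void Start()
    {
        dictationRecognizer = new DictationRecognizer();

        // Time allowed before the first audio is heard (default: 5 seconds).
        dictationRecognizer.InitialSilenceTimeoutSeconds = 10f;

        // Time allowed in silence after a recognition (default: 20 seconds).
        dictationRecognizer.AutoSilenceTimeoutSeconds = 30f;

        dictationRecognizer.DictationComplete += OnDictationComplete;

        // Assumes the PhraseRecognitionSystem is not running.
        dictationRecognizer.Start();
    }

    private void OnDictationComplete(DictationCompletionCause cause)
    {
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            Debug.Log("Dictation timed out after a period of silence.");
        }
    }

    void OnDestroy()
    {
        dictationRecognizer.DictationComplete -= OnDictationComplete;
        dictationRecognizer.Dispose();
    }
}
```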
