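Adding Chinese Voice Control to a Unity Game with OLAMI (Part 1)

(Reposting is welcome. Original article: http://blog.csdn.net/speeds3/article/details/76209152)

Games keep getting more detailed, yet their controls are trending toward simplicity. At the same time, a player only has so many fingers, and overly complex controls are hard to pick up, so voice control is an attractive option for games. This article tries adding Chinese voice control to a Unity game.

The projects in Unity's official tutorials are compact but look good and ship with a full set of assets. In the end I picked tanks-tutorial for this experiment.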

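Downloading and modifying the project

First, download the project as the tutorial describes and add the tank movement and shooting code. At this point it already qualifies as a "game": you can drive the tank around the map and fire shells. There are no enemies to shoot at, but that is enough for the goal of controlling movement and shooting by voice. I enlarged the map a little and lowered the tank's speed so that it does not reach the edge of the map after a few moves.

(Figure: changing the speed)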

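Preparing the semantic understanding service

Now the voice features can be added. The OLAMI website offers a C# sample split into two parts, cloud-speech-recognition and natural-language-understanding. The former is, as its name suggests, speech recognition. The latter is natural language understanding and is further divided into speech-input and text-input, but speech-input is empty; according to the readme, that part is already included in cloud-speech-recognition. Since speech recognition itself is not the focus here, I treat the two the same way: one handles understanding of speech, which is what the game needs, and the other handles understanding of text, which is handy for testing.

Drag SpeechApiSample.cs and NluApiSample.cs into Unity; with minor changes they can be used directly.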

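Adding a voice control interface to the movement and shooting scripts

The plan is to mix voice and keyboard input, with keyboard input able to interrupt a voice command. That requires some state: whether movement or turning is currently driven by voice, plus the target angle of a voice turn and the angle turned so far. The code is as follows: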

TankMovement.cs

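  // Angle already turned under voice control
  private float turnAmount = 0f;
  // Target angle requested by voice control
  private float turnTarget = 0f;
  // Whether movement is currently driven by voice
  private bool voiceMove;
  // Whether turning is currently driven by voice
  private bool voiceTurn;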

  private void Update () {
        // Store the value of both input axes.
        // Keyboard input takes priority and cancels any active voice command.
        float movement = Input.GetAxis (m_MovementAxisName);
        if (movement != 0) {
            voiceMove = false;
            m_MovementInputValue = movement;
        } else if (!voiceMove) {
            m_MovementInputValue = 0f;
        }

        float turn = Input.GetAxis (m_TurnAxisName);
        if (turn != 0) {
            voiceTurn = false;
            // Drop the pending voice turn so Turn() stops tracking it.
            turnTarget = 0f;
            turnAmount = 0f;
            m_TurnInputValue = turn;
        } else if (!voiceTurn) {
            m_TurnInputValue = 0f;
        }
        EngineAudio ();
    }

  private void Turn () {
        // Determine the number of degrees to be turned based on the input, speed and time between frames.
        float turn = m_TurnInputValue * m_TurnSpeed * Time.deltaTime;

        if (turnTarget != 0) {
            turnAmount += turn;
            if (turnTarget > 0) {
                if (turnAmount > turnTarget) {
                    m_TurnInputValue = 0f;
                    turnTarget = 0f;
                    turnAmount = 0f;
                    voiceTurn = false;
                }
            } else {
                if (turnAmount < turnTarget) {
                    m_TurnInputValue = 0f;
                    turnTarget = 0f;
                    turnAmount = 0f;
                    voiceTurn = false;
                }
            }
        }

        // Make this into a rotation in the y axis.
        Quaternion turnRotation = Quaternion.Euler (0f, turn, 0f);

        // Apply this rotation to the rigidbody's rotation.
        m_Rigidbody.MoveRotation (m_Rigidbody.rotation * turnRotation);
    }

    public void VoiceMove(float movement) {
        if (movement != 0) {
            voiceMove = true;
            m_MovementInputValue = movement;
        } else {
            voiceMove = false;
            m_MovementInputValue = 0f;
        }
    }

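    public void VoiceTurn(float turn) {
        if (turn == 0) {
            // A zero angle cancels any voice turn in progress.
            voiceTurn = false;
            turnTarget = 0f;
            turnAmount = 0f;
            return;
        }
        // Start a fresh turn: positive angles turn right, negative angles turn left.
        turnTarget = turn;
        turnAmount = 0f;
        voiceTurn = true;
        m_TurnInputValue = turn > 0 ? 1.0f : -1.0f;
    }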

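Turning differs slightly from moving. For movement it is enough to keep the simulated axis value at 1, but a turn has to stop at a given angle, so Turn contains some extra handling.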
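The angle bookkeeping described above can also be sketched in isolation. The class below is a hypothetical helper (TurnTracker, Begin, Cancel and Step are names invented for this illustration, not part of the tutorial project). Unlike the Turn code in the article, which stops on the frame after the target has been passed, this sketch clamps the last step so the total never exceeds the requested angle.

```csharp
using System;

// Hypothetical helper, not part of the tutorial project:
// tracks the progress of a voice-commanded turn.
public class TurnTracker {
    private float target;  // signed degrees requested; 0 means idle
    private float turned;  // signed degrees turned so far

    public bool Active { get { return target != 0f; } }

    // Start a new turn and forget any previous progress.
    public void Begin(float degrees) { target = degrees; turned = 0f; }

    // Abort the current turn, for example when the keyboard takes over.
    public void Cancel() { target = 0f; turned = 0f; }

    // Returns the signed number of degrees to rotate this frame,
    // clamped so the accumulated total never exceeds the target.
    public float Step(float turnSpeed, float deltaTime) {
        if (!Active) return 0f;
        float remaining = target - turned;
        float step = Math.Sign(target) * turnSpeed * deltaTime;
        if (Math.Abs(step) >= Math.Abs(remaining)) {
            step = remaining;  // final partial step lands on the target
            Cancel();
        } else {
            turned += step;
        }
        return step;
    }
}
```

In the tank script, the value returned by Step would take the place of the turn variable passed to Quaternion.Euler, with m_TurnSpeed and Time.deltaTime as arguments.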

TankShooting is simpler; just add one method:

public void VoiceFire() {
    m_CurrentLaunchForce = m_MaxLaunchForce / 2;
    Fire ();
}

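Because voice input itself takes time, no cooldown code is added here, and the launch force is fixed at half of the maximum.

To make this easy to call after recording or text input, the voice control is wrapped in TankVoiceControl, and the script is attached to the tank.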

TankVoiceControl.cs

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class TankVoiceControl : MonoBehaviour {

    TankMovement move;

    TankShooting shooting;

    // Use this for initialization
    void Start () {
        move = GetComponent<TankMovement> ();
        shooting = GetComponent<TankShooting> ();
    }

    // Update is called once per frame
    void Update () {

    }

    public void VoiceMove(float movement) {
        move.VoiceMove (movement);
    }

    public void VoiceTurn(float turn) {
        move.VoiceTurn (turn);
    }

    public void VoiceFire() {
        shooting.VoiceFire ();
    }

    // Handle the semantics parsed by OLAMI
    public void ProcessSemantic(Semantic sem) {
        if (sem.app == "game") {
            // Ignore results without a modifier instead of failing on [0]
            if (sem.modifier == null || sem.modifier.Length == 0) {
                return;
            }
            string modifier = sem.modifier [0];
            Slot[] slots = sem.slots;
            switch (modifier) {
            case "move":
                {
                    // Default is "0": float.Parse rejects the C# literal form "0f"
                    string move = "0";
                    foreach (Slot slot in slots) {
                        if (slot.name == "movement") {
                            move = slot.value;
                        }
                    }
                    VoiceMove (float.Parse (move));
                }
                break;
            case "stop":
                {
                    VoiceMove (0f);
                }
                break;
            case "leftturn":
                {
                    string turn = "0";
                    foreach (Slot slot in slots) {
                        if (slot.name == "turn") {
                            turn = slot.value;
                        }
                    }
                    // Left turns are passed on as negative angles
                    VoiceTurn (0 - float.Parse (turn));
                }
                break;
            case "rightturn":
                {
                    string turn = "0";
                    foreach (Slot slot in slots) {
                        if (slot.name == "turn") {
                            turn = slot.value;
                        }
                    }
                    VoiceTurn (float.Parse (turn));
                }
                break;
            case "fire":
                {
                    VoiceFire ();
                }
                break;
            }
            return;
        }
    }
}

The ProcessSemantic method handles the semantics returned by the OLAMI API.
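The slot lookup and float.Parse call repeat in three branches, and float.Parse throws when a slot value is not a plain number. A minimal sketch of a shared lookup follows. SlotReader and GetFloat are hypothetical names introduced for this illustration, and the sketch assumes slot values arrive as ordinary decimal strings such as "90"; what the service actually returns for spoken numbers is not covered here.

```csharp
using System.Globalization;

// Mirrors the name/value fields of the Slot class in VoiceController.cs;
// leave this class out when pasting the helper into the project.
public class Slot {
    public string name;
    public string value;
}

public static class SlotReader {
    // Hypothetical helper: returns the numeric value of the named slot, or
    // fallback when the slot is missing or its value is not a plain number.
    public static float GetFloat(Slot[] slots, string name, float fallback) {
        if (slots == null) return fallback;
        foreach (Slot slot in slots) {
            if (slot.name != name) continue;
            float parsed;
            // InvariantCulture keeps "1.5" parsing the same on every machine.
            if (float.TryParse(slot.value, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out parsed)) {
                return parsed;
            }
        }
        return fallback;
    }
}
```

With such a helper, the "leftturn" branch would reduce to VoiceTurn (-SlotReader.GetFloat (slots, "turn", 0f)).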

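Adding semantics on the OLAMI platform

In fact I wrote the semantics before ProcessSemantic, but planning the semantics first and then adding them on OLAMI works just as well.

(Figure: adding the semantics)

After adding them, don't forget to publish, then configure the newly added NLI module on the application management page.

Testing semantic parsing with text

Now we can test whether the semantics work. Add an InputField to the scene and, in its On End Edit callback, call the GetRecognitionResult method of NluApiSample. Some wrapping is needed along the way.

The On End Edit callback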

public void OnSubmitText(string text) {
        string result = VoiceService.GetInstance().sendText (text);
        VoiceResult voiceResult = JsonUtility.FromJson<VoiceResult> (result);
        if (voiceResult.status.Equals ("ok")) {
            Nli[] nlis = voiceResult.data.nli;
            if (nlis != null && nlis.Length != 0) {
                foreach (Nli nli in nlis) {
                    if (nli.type == "game") {
                        // Only the first semantic of the first "game" result is acted on
                        foreach (Semantic sem in nli.semantic) {
                            voiceControl.ProcessSemantic (sem);
                            return;
                        }
                    }
                }
            }
        }
    }

The sendText method of VoiceService

public string sendText(string text) {
        return nluApi.GetRecognitionResult ("nli", text);
    }

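Save the scripts and test. Text semantic understanding is very fast: although the result is fetched through an HTTP request, I could not perceive any delay on my machine, and the tank turned and moved smoothly.

Adding recording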

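Unity provides a Microphone class for microphone access, which yields an AudioClip object directly. Here recording starts when F1 is pressed and ends when it is released. The recording length is provisionally set to 5 seconds. Because the OLAMI API accepts PCM recordings in WAV format, I use a WavUtility found on GitHub for the conversion.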
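To make "PCM recording in WAV format" concrete, here is a minimal sketch of such a conversion in plain C#. It is not the WavUtility implementation; PcmWav and Encode are hypothetical names. The sketch assumes mono samples in the range [-1, 1] (in Unity these could be read with AudioClip.GetData) and 16-bit output, with the sample rate passed in to match the 16000 Hz used when starting the microphone.

```csharp
using System;
using System.IO;
using System.Text;

public static class PcmWav {
    // Hypothetical helper: encodes mono samples in [-1, 1] as a
    // 16-bit PCM WAV file held in memory.
    public static byte[] Encode(float[] samples, int sampleRate) {
        const short channels = 1;
        const short bitsPerSample = 16;
        int dataSize = samples.Length * 2;  // two bytes per 16-bit sample
        using (var stream = new MemoryStream(44 + dataSize))
        using (var writer = new BinaryWriter(stream)) {
            // RIFF header; BinaryWriter emits little-endian values
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);     // bytes following this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));
            // fmt chunk
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                // size of the PCM fmt chunk
            writer.Write((short)1);          // audio format 1 = PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * channels * bitsPerSample / 8);  // byte rate
            writer.Write((short)(channels * bitsPerSample / 8));      // block align
            writer.Write(bitsPerSample);
            // data chunk
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);
            foreach (float s in samples) {
                float clamped = Math.Max(-1f, Math.Min(1f, s));
                writer.Write((short)(clamped * short.MaxValue));
            }
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

A five-second clip at 16000 Hz holds 80000 samples, so the resulting byte array is 44 header bytes plus 160000 data bytes.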

VoiceController.cs

using System.Collections;
using System.Collections.Generic;
using UnityEngine.UI;
using UnityEngine;
using System;
using System.Threading;

public class VoiceController : MonoBehaviour {
    AudioClip audioclip;

    bool recording;

    [SerializeField]
    TankVoiceControl voiceControl;

    // Use this for initialization
    void Start () {
    }

    // Update is called once per frame
    void Update () {
        if (Input.GetKeyDown (KeyCode.F1)) {
            recording = true;
        } else if (Input.GetKeyUp(KeyCode.F1)) {
            recording = false;
        }
    }

    void LateUpdate() {
        if (recording) {
            if (!Microphone.IsRecording (null)) {
                // Start recording: default device, no looping, 5 seconds, 16 kHz
                audioclip = Microphone.Start (null, false, 5, 16000);
            }
        } else {
            if (Microphone.IsRecording(null)) {
                Microphone.End (null);
                if (audioclip != null) {
                    // WavUtility has methods that must run on the main thread, so the conversion has to happen here
                    byte[] audiodata = WavUtility.FromAudioClip (audioclip);
                    // Send the recording from a new thread to reduce stutter on the main thread
                    Thread thread = new Thread (new ParameterizedThreadStart(process));
                    thread.Start ((object) audiodata);
                }
            }

        }
    }

    void process(object obj) {
        byte[] audiodata = (byte[]) obj;
        string result = VoiceService.GetInstance ().sendSpeech (audiodata);
        audioclip = null;
        Debug.Log (result);
        VoiceResult voiceResult = JsonUtility.FromJson<VoiceResult> (result);
        if (voiceResult.status.Equals ("ok")) {
            Nli[] nlis = voiceResult.data.nli;
            if (nlis != null && nlis.Length != 0) {
                foreach (Nli nli in nlis) {
                    if (nli.type == "game") {
                        foreach (Semantic sem in nli.semantic) {
                            voiceControl.ProcessSemantic (sem);
                        }
                    }
                }
            }
        }
    }
}

// The classes below are used to parse the JSON data.
[Serializable]
public class VoiceResult {
    public VoiceData data;
    public string status;
}

[Serializable]
public class VoiceData {
    public Nli[] nli;
}

[Serializable]
public class Nli {
    public DescObj desc;
    public Semantic[] semantic;
    public string type;
}

[Serializable]
public class DescObj {
    public string result;
    public int status;
}

[Serializable]
public class Semantic {
    public string app;
    public string input;
    public Slot[] slots;
    public string[] modifier;
    public string customer;
}

[Serializable]
public class Slot {
    public string name;
    public string value;
    public string[] modifier;
}
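One point about the process method: it runs on a background thread, while most Unity APIs, such as creating objects, may only be called from the main thread. A common pattern is to queue the work and run it from Update. The sketch below shows that pattern in plain C#. MainThreadQueue, Enqueue and Drain are hypothetical names, and the sketch assumes a scripting runtime that provides System.Collections.Concurrent; on older runtimes a Queue guarded by a lock would serve the same purpose.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: collects work from background threads and runs it on
// whichever thread calls Drain (in Unity, a MonoBehaviour's Update method).
public class MainThreadQueue {
    private readonly ConcurrentQueue<Action> pending = new ConcurrentQueue<Action>();

    // Safe to call from any thread.
    public void Enqueue(Action action) {
        pending.Enqueue(action);
    }

    // Runs every queued action on the calling thread and returns
    // how many actions were executed.
    public int Drain() {
        int executed = 0;
        Action action;
        while (pending.TryDequeue(out action)) {
            action();
            executed++;
        }
        return executed;
    }
}
```

In VoiceController, process would enqueue a delegate that calls voiceControl.ProcessSemantic for each semantic, and Update would call Drain once per frame.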

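Testing

Now start the game and try voice control. On my machine it takes about one to two seconds from the end of recording until the tank starts to act. Still, after saying "forward" or "backward" you no longer have to hold a key down, which feels quite good. You can also say "turn left 1800 degrees" (左转1800度) and watch the tank spin around foolishly.

Summary

Overall, even though the semantic understanding is online, OLAMI can be used in games for scenarios without strict real-time requirements, such as automatically running forward. Its speed on text semantic understanding is surprisingly good. If speech recognition were faster, for example through an offline package, I believe voice control would find wider use. I will keep improving this game, so stay tuned.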

Appendix

Playable build download:
Link: http://pan.baidu.com/s/1pLDgq9t Password: dmxx

Source code download:
Link: http://pan.baidu.com/s/1qYWcuYC Password: gh3n