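The Media Resource Control Protocol (MRCP) is a communication protocol through which a speech server provides speech services, such as speech recognition and speech synthesis, to its clients.
This article explains how to connect an outbound-call system to MRCP (for example iFlytek MRCP). First, a short introduction to MRCP.
In MRCP's basic architecture, the server side supports many media resources, and these resources cover various media types. MRCP defines six media resource types: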
basicsynth: basic speech synthesis
speechsynth: standard speech synthesis
dtmfrecog: DTMF recognition
speechrecog: speech recognition
recorder: audio recording
speakverify: speaker verification (voiceprint matching)
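As a small illustration of the list above: in MRCPv2 each control channel is named by a Channel-Identifier header of the form `<channel-id>@<resource-type>`, where the suffix is one of the six resource types. The sketch below splits such an identifier and checks the suffix. The dictionary and the helper name `split_channel_identifier` are made up for this example and are not part of any library.

```python
# The six MRCPv2 media resource types and what each one is for.
RESOURCE_TYPES = {
    "basicsynth": "basic speech synthesis",
    "speechsynth": "standard speech synthesis",
    "dtmfrecog": "DTMF recognition",
    "speechrecog": "speech recognition",
    "recorder": "audio recording",
    "speakverify": "speaker verification (voiceprint matching)",
}


def split_channel_identifier(value):
    """Split '<channel-id>@<resource-type>' and validate the resource type."""
    channel_id, sep, resource = value.strip().partition("@")
    if not sep or not channel_id:
        raise ValueError("not a channel identifier: %r" % value)
    if resource not in RESOURCE_TYPES:
        raise ValueError("unknown resource type: %r" % resource)
    return channel_id, resource


if __name__ == "__main__":
    cid, res = split_channel_identifier("ede2ac36452811ec@speechsynth")
    print(cid, res, "->", RESOURCE_TYPES[res])
```

A truncated identifier, for instance one cut off in a log excerpt, is rejected with a ValueError instead of being silently accepted.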
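The purpose of MRCP is clear from these types: it takes full advantage of SIP to solve the problems of managing media and controlling sessions. From SIP's point of view, the attributes of the session it manages are not the most important thing; what matters more is locating media resources and providing an integration point. Because SIP offers a lookup service for media resource servers, an MRCP client can learn which capabilities a media resource supports.
Put differently, mrcp-server implements both SIP and RTP. It acts as the SIP server and FreeSWITCH connects to it as the client, while FreeSWITCH streams its RTP audio to mrcp-server in real time.
In plain terms, MRCP adds a piece of middleware that does the speech recognition (ASR) and speech synthesis (TTS) work. mrcp-server talks to FreeSWITCH, and FreeSWITCH talks to the actual caller.
This article uses media bug to tap the media, and connects unimrcp-server, built on the iFlytek SDK, to FreeSWITCH.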
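We start from material that is already available online. It is currently the most-starred Chinese project on GitHub for connecting MRCP to FreeSWITCH, but it has a few pitfalls that still need to be worked around.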
1. Deploy and build mrcp-server. After compiling the dependencies, run the following command:
./configure --with-apr=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr --with-apr-util=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr-util
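The paths after --with-apr and --with-apr-util are where I installed the dependencies, and they match the layout of the source tree. Adjust them to your own layout, otherwise the build may keep failing no matter how long you try.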
2. Configure mrcp-server and FreeSWITCH
FreeSWITCH needs to be configured in two places.
a. Add a new file, conf/mrcp_profiles/unimrcpserver-mrcp-v2.xml:
<include>
  <profile name="unimrcpserver-mrcp-v2" version="2">
    <param name="client-ip" value="127.0.0.1"/>
    <param name="client-port" value="9060"/>
    <param name="server-ip" value="192.168.0.190"/>
    <param name="server-port" value="8060"/>
    <param name="sip-transport" value="udp"/>
    <param name="rtp-ip" value="192.168.0.190"/>
    <param name="rtp-port-min" value="4000"/>
    <param name="rtp-port-max" value="5000"/>
    <param name="codecs" value="PCMU PCMA L16/96/8000"/>
    <param name="speechsynth" value="speechsynthesizer"/>
    <param name="speechrecog" value="speechrecognizer"/>
    <synthparams>
    </synthparams>
    <recogparams>
      <param name="start-input-timers" value="false"/>
    </recogparams>
  </profile>
</include>
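As described above, the client is FreeSWITCH acting as the SIP user agent, so set client-ip and client-port to your own FreeSWITCH IP and SIP port.
server-ip and server-port are the SIP address configured for your mrcp-server.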
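A quick sanity check of the profile values can catch typos before FreeSWITCH loads the file. The sketch below parses a trimmed copy of the profile with the standard library. `load_profile_params` and `check_profile` are hypothetical helpers, and the checks are arbitrary illustrative choices, not rules taken from FreeSWITCH or UniMRCP.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the profile above; only the params checked here are kept.
PROFILE_XML = """
<include>
  <profile name="unimrcpserver-mrcp-v2" version="2">
    <param name="client-ip" value="127.0.0.1"/>
    <param name="client-port" value="9060"/>
    <param name="server-ip" value="192.168.0.190"/>
    <param name="server-port" value="8060"/>
    <param name="rtp-port-min" value="4000"/>
    <param name="rtp-port-max" value="5000"/>
  </profile>
</include>
"""


def load_profile_params(xml_text):
    """Return the direct <param> children of the first <profile> as a dict."""
    profile = ET.fromstring(xml_text).find("profile")
    return {p.get("name"): p.get("value") for p in profile.findall("param")}


def check_profile(params):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    for key in ("client-ip", "client-port", "server-ip", "server-port"):
        if not params.get(key):
            problems.append("missing " + key)
    low = int(params.get("rtp-port-min", 0))
    high = int(params.get("rtp-port-max", 0))
    if not 0 < low < high <= 65535:
        problems.append("bad RTP port range %d-%d" % (low, high))
    same_host = params.get("client-ip") == params.get("server-ip")
    if same_host and params.get("client-port") == params.get("server-port"):
        problems.append("client and server share the same SIP address")
    return problems


if __name__ == "__main__":
    print(check_profile(load_profile_params(PROFILE_XML)) or "no obvious problems")
```

Because `findall("param")` only looks at direct children, the params nested under recogparams and synthparams are left out of the check.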
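3. Download the matching iFlytek SDK
Be sure to tick two capabilities: speech dictation (ASR) and speech synthesis (TTS).
Replace the corresponding directory, plugins/third-party/xfyun, with the downloaded SDK, and remember to change the appkey in the code.
Otherwise TTS synthesis runs into this problem: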
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: ede2ac36452811ec@speechsynth
2021-11-14 16:57:43:392587 [WARN] [xfyun] 正在合成 ...
2021-11-14 16:57:43:543463 [WARN] [xfyun] QTTSAudioGet failed, error code: 10407.
2021-11-14 16:57:43:551022 [INFO] Process SPEAK-COMPLETE Event [1]
2021-11-14 16:57:43:551051 [NOTICE] State Transition SPEAKING -> IDLE
2021-11-14 16:57:43:551089 [INFO] Send MRCPv2 Data 192.168.0.190:1544 <-> 192.168.0.190:41578 [122 bytes]
MRCP/2.0 122 SPEAK-COMPLETE 1 COMPLETE
Channel-Identifier: ede2ac36452811ec@speec
Error 10407 is a permission problem, so remember to replace the asterisks after appid= in the xfyun_login method. Nothing else needs to change.
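To make the two MRCP lines in the log above easier to read: an MRCPv2 start line is a request (version, length, method, request-id), a response (version, length, request-id, status-code, request-state) or an event (version, length, event-name, request-id, request-state). The sketch below classifies a start line on that basis. `parse_start_line` is a hypothetical helper written for this article, not a UniMRCP function.

```python
def parse_start_line(line):
    """Classify an MRCPv2 start line as a request, response or event."""
    parts = line.split()
    if len(parts) not in (4, 5) or parts[0] != "MRCP/2.0":
        raise ValueError("not an MRCPv2 start line: %r" % line)
    length = int(parts[1])
    if len(parts) == 4:
        # request-line: version length method request-id
        return {"kind": "request", "length": length,
                "name": parts[2], "request_id": int(parts[3])}
    if parts[2].isdigit():
        # response-line: version length request-id status-code request-state
        return {"kind": "response", "length": length,
                "request_id": int(parts[2]), "status": int(parts[3]),
                "state": parts[4]}
    # event-line: version length event-name request-id request-state
    return {"kind": "event", "length": length, "name": parts[2],
            "request_id": int(parts[3]), "state": parts[4]}


if __name__ == "__main__":
    print(parse_start_line("MRCP/2.0 83 1 200 IN-PROGRESS"))
    print(parse_start_line("MRCP/2.0 122 SPEAK-COMPLETE 1 COMPLETE"))
```

Read this way, the log shows the server accepting request 1 with status 200 and then sending a SPEAK-COMPLETE event for the same request almost immediately, which matches the QTTSAudioGet failure logged in between.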
Right after that, the following problem shows up:
2021-11-14 16:41:52:936062 [NOTICE] Create RTP Termination Factory 192.168.0.190:[5000,6000]
2021-11-14 16:41:52:936073 [INFO] Register RTP Termination Factory [RTP-Factory-1]
2021-11-14 16:41:52:936086 [INFO] Load Plugin [XFyun-Recog-1] [/usr/local/unimrcp/plugin/xfyunrecog.so]
2021-11-14 16:41:52:936626 [WARN] Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:936653 [INFO] Load Plugin [XFyun-Synth-1] [/usr/local/unimrcp/plugin/xfyunsynth.so]
2021-11-14 16:41:52:937044 [WARN] Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:937076 [INFO] Register RTP Settings [RTP-Settings-1]
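The project is good, but I don't know why nobody has updated the code even though the problem has been reported. Maybe it is only half open source (just kidding).
The fix is to add -lstdc++ to the link flags in the Makefiles: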
./third-party/xfyun/samples/sch_translate_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/iat_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/ise_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/tts_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./xfyun-recog/xfyunrecog.la:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.in:359: -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/Makefile:359: -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/.libs/xfyunrecog.lai:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.am:8: -lmsc -ldl -lpthread -lrt -lstdc++
Once this is configured, startup should no longer show the errors.
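If the error persists, it can help to look for link lines that still lack the flag. The sketch below flags every line that mentions -lmsc but not -lstdc++. `lines_missing_stdcxx` is a hypothetical helper, and matching on -lmsc is only a heuristic based on the listing above.

```python
import sys


def lines_missing_stdcxx(text):
    """Return (line_number, line) pairs that link -lmsc without -lstdc++."""
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Substring checks also cope with quoted values such as
        # dependency_libs=' ... -lstdc++' in libtool files.
        if "-lmsc" in line and "-lstdc++" not in line:
            missing.append((number, line.strip()))
    return missing


if __name__ == "__main__":
    # Usage: python check_flags.py Makefile.am Makefile.in ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            for number, line in lines_missing_stdcxx(handle.read()):
                print("%s:%d: %s" % (path, number, line))
```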
Now add the dialplan route. It uses Python instead of Lua:
<extension name="mrcq_demo">
  <condition field="destination_number" expression="^5001$">
    <action application="set" data="RECORD_TITLE=Recording ${destination_number} ${caller_id_number} ${strftime(%Y-%m-%d %H:%M)}"/>
    <action application="set" data="RECORD_COPYRIGHT=(c) 2011"/>
    <action application="set" data="RECORD_SOFTWARE=FreeSWITCH"/>
    <action application="set" data="RECORD_ARTIST=FreeSWITCH"/>
    <action application="set" data="RECORD_COMMENT=FreeSWITCH"/>
    <action application="set" data="RECORD_DATE=${strftime(%Y-%m-%d %H:%M)}"/>
    <action application="set" data="RECORD_STEREO=true"/>
    <action application="record_session" data="$${base_dir}/recordings/archive/${strftime(%Y-%m-%d-%H-%M-%S)}_${destination_number}_${caller_id_number}_${call_uuid}.wav"/>
    <action application="answer"/>
    <action application="sleep" data="2000"/>
    <action application="python" data="mrcp"/>
  </condition>
</extension>
The Python script that data="mrcp" points to:
# encoding=utf-8
from freeswitch import *


def handler1(session, args):
    # Alternative handler: bridge the call to a local user.
    call_addr = 'user/1018'
    session.execute("bridge", call_addr)


def handler(session, args):
    # uuid = "ggg"
    # console_log("1", "... test from my python program\n")
    # session = PySession(uuid)
    session.answer()
    # Speak through the unimrcp TTS engine with the "xiaofang" voice.
    session.set_tts_params("unimrcp", "xiaofang")
    session.speak("你好啊,我爱你,中国,哎你你,爱你")
    # Play a prompt, then recognize yes/no through unimrcp.
    session.execute("play_and_detect_speech", "say:please say yes or no. please say no or yes. please say something! detect:unimrcp {start-input-timers=false,no-input-timeout=5000,recognition-timeout=5000}builtin:grammar/boolean?language=en-US;y=1;n=2")
    session.hangup()
That is all it takes to try out a simple demo.
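The long argument passed to play_and_detect_speech in the demo has a fixed layout: the prompt, a space, `detect:` plus the engine name, a space, the options in braces, and then the grammar with no space before it. A small builder makes that layout explicit. `build_detect_args` is a hypothetical helper that only reproduces the string format used above.

```python
def build_detect_args(prompt, engine, params, grammar):
    """Build '<prompt> detect:<engine> {k=v,...}<grammar>'.

    `params` is a list of (name, value) pairs so the order is preserved.
    """
    options = ",".join("%s=%s" % (name, value) for name, value in params)
    return "%s detect:%s {%s}%s" % (prompt, engine, options, grammar)


if __name__ == "__main__":
    print(build_detect_args(
        "say:please say yes or no",
        "unimrcp",
        [("start-input-timers", "false"),
         ("no-input-timeout", 5000),
         ("recognition-timeout", 5000)],
        "builtin:grammar/boolean?language=en-US;y=1;n=2",
    ))
```

Building the string in one place avoids the easy mistake of dropping the space between the prompt and `detect:` when the prompt is a variable.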
As an improvement, here is a simple interactive robot. The Python code is as follows:
#encoding=utf-8
import json
import tempfile
import requests
import xml.etree.ElementTree as ET
import freeswitch as fs
from freeswitch import *
# `UNI_ENGINE`: unimrcp engine
# In Python, `+` is optional for quoted string concatenation, ^_^
UNI_ENGINE = 'detect:unimrcp {start-input-timers=false,' \
'no-input-timeout=5000,recognition-timeout=5000}'
# this will be ignored by baidu ASR, and `chat-empty` is also available
UNI_GRAMMAR = 'builtin:grammar/boolean?language=en-US;y=1;n=2'


def asr2text(result):
    """fetch recognized text from asr result (xml)"""
    root = ET.fromstring(result)
    node = root.find('.//input[@mode="speech"]')
    text = None
    if node is not None and node.text:
        # node.text is unicode
        text = node.text.encode('utf-8')
    return text


def handler1(session, args):
    call_addr = 'user/1018'
    session.execute("bridge", call_addr)


def handler(session, args):
    fs.consoleLog('info', '>>> start chatbot service')
    session.answer()
    # First, ask the proxy what the opening sentence should be.
    answer_sound = Synthesizer()('你好啊,baby。')
    while session.ready():
        # Play the answer sound and detect user input in a loop.
        # A space must separate the file name from "detect:".
        session.execute('play_and_detect_speech',
                        answer_sound + ' ' + UNI_ENGINE + UNI_GRAMMAR)
        asr_result = session.getVariable('detect_speech_result')
        if asr_result is None:
            # None means the session closed or recognition timed out.
            fs.consoleLog('CRIT', '>>> ASR NONE')
            break
        try:
            text = asr2text(asr_result)
        except Exception as e:
            fs.consoleLog('CRIT', '>>> ASR result parse failed \n%s' % e)
            continue
        fs.consoleLog('CRIT', '>>> ASR result is %s' % text)
        # Decode to unicode so len() counts characters, not bytes.
        if text is None or len(unicode(text, encoding='utf-8')) < 2:
            fs.consoleLog('CRIT', '>>> ASR result TOO SHORT')
            # answer_sound = sound_query('inaudible')
            answer_sound = Synthesizer()('不好意思,我没有听清您的话,请再说一次。')
            continue
        # Chat with the robot (disabled here, so the caller's text is echoed).
        # text = Robot()(text)
        fs.consoleLog('CRIT', 'Robot result is %s' % text)
        if not text:
            text = '不好意思,我刚才迷失在人生的道路上了。请问您还需要什么帮助?'
        # Speech synthesis for the next prompt.
        answer_sound = Synthesizer()(text)
    # Session close.
    fs.msleep(800)
    session.hangup()
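

import uuid


class Synthesizer:
    """Send text to the local TTS service and save the audio to a file."""

    def __init__(self):
        # Kept from the original; __call__ below writes to its own file.
        self.audiofile = tempfile.NamedTemporaryFile(prefix='session_', suffix='.wav')

    def __call__(self, text):
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        # POST the text as a multipart form field; the response body is the audio.
        audio = requests.post("http://127.0.0.1:8001/tts_text",
                              files=dict(text=(None, text))).content
        # Assumes the service returns WAV data; the .wav suffix tells
        # FreeSWITCH which file format to use for playback.
        filename = "/tmp/" + str(uuid.uuid1()) + ".wav"
        with open(filename, "wb") as f:
            # The response body is raw bytes, so write it without decoding.
            f.write(audio)
        return filename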
The speech synthesis (TTS) here uses an offline model that we trained ourselves.
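The asr2text helper in the listing is easier to follow with a concrete input. The sketch below is a standalone Python 3 variant (the listing itself is Python 2) run against a made-up result. `SAMPLE_RESULT` is a hypothetical NLSML-style document, and the XML a real ASR plugin returns may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition result; real output depends on the ASR plugin.
SAMPLE_RESULT = (
    '<?xml version="1.0"?>'
    '<result>'
    '<interpretation confidence="99">'
    '<instance>你好</instance>'
    '<input mode="speech">你好</input>'
    '</interpretation>'
    '</result>'
)


def asr_to_text(result):
    """Return the recognized text from an XML result, or None if absent."""
    try:
        root = ET.fromstring(result)
    except ET.ParseError:
        return None
    # Same lookup as asr2text: the first <input mode="speech"> element.
    node = root.find('.//input[@mode="speech"]')
    if node is None or not node.text:
        return None
    return node.text.strip()


if __name__ == "__main__":
    print(asr_to_text(SAMPLE_RESULT))
```

Unlike the original helper, this variant returns None for malformed XML instead of raising, so the caller only has to handle one failure case.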
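I hope this article gives you a sense of what MRCP is for and helps you get an MRCP integration working. We also have a fully self-developed stack covering FreeSWITCH, ASR and TTS, all of it now available in the private-deployment version, which is secure and quick to set up. In later posts I will explain outbound call centers and telephony network infrastructure in more detail. If you liked this, feel free to follow me, and leave a comment or send a message if you have questions.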