Open https://cmusphinx.github.io/wiki/download/
Install the packages in the following order:
1)Sphinxbase
2)Sphinxtrain
3)Pocketsphinx
After extracting each archive, enter its directory and run:
./configure
make
make install
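The three builds above can be scripted. The following is a minimal sketch, not an official installer. It assumes each archive was extracted into a directory named <package>-5prealpha under the current directory (the sphinxtrain directory name is a guess), and DRY_RUN is a hypothetical switch that prints the commands instead of running them.

```shell
#!/bin/sh
# Build the three packages in the order listed above.
# ASSUMPTION: each archive was extracted into ./<name>-5prealpha;
# adjust the names to match what you actually downloaded.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_all() {
    for pkg in sphinxbase sphinxtrain pocketsphinx; do
        dir="$pkg-5prealpha"
        run cd "$dir" || return 1
        run ./configure || return 1
        run make || return 1
        run make install || return 1   # may need root privileges
        run cd .. || return 1
    done
}
```

Usage: run DRY_RUN=1 first to review the commands, then call build_all without it.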
Once installation succeeds, verify pocketsphinx_continuous speech recognition with a sample audio file:
1)cd pocketsphinx-5prealpha/test/data/cards
2)pocketsphinx_continuous -infile 005.wav > 005.result
3)cat 005.result
eight of spades for up close seven of hearts
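Pocketsphinx includes an English model by default. For Chinese recognition, first download the Mandarin model:
1)Open https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
2)Select Mandarin
After downloading and extracting, there is no required location for the files. I put them here (alongside the bundled English model):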
pocketsphinx-5prealpha/model/zh-cn
README zh_cn.cd_cont_5000 zh_cn.dic zh_cn.lm.bin
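Before pointing pocketsphinx at the model, it can help to confirm that the extracted files are all present. A small sketch, assuming the three file names listed above; missing_model_files is a hypothetical helper name.

```shell
# Print the name of each Mandarin model file that is missing.
# $1 is the model directory, e.g. pocketsphinx-5prealpha/model/zh-cn.
missing_model_files() {
    for f in zh_cn.cd_cont_5000 zh_cn.dic zh_cn.lm.bin; do
        [ -e "$1/$f" ] || echo "$f"
    done
    return 0
}
```

Usage: missing_model_files pocketsphinx-5prealpha/model/zh-cn prints nothing when everything is in place.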
Load the Chinese model and dictionary for Chinese recognition:
pocketsphinx_continuous -adcdev plughw:0,0 -inmic yes -hmm zh_cn.cd_cont_5000 -lm zh_cn.lm.bin -dict zh_cn.dic
It did not work; the following error appeared:
Error opening audio device plughw:0,0 for capture: Connection refused
FATAL: "continuous.c", line 245: Failed to open audio device
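I searched through many posts on Google for a long time without finding a fix, so I had to dig into the code.
(Update 2020-05-18: Later, while working with WebRTC, I also ran into a case where opening the audio device failed. The lesson was to check first whether the current user is root. If so, switch to a non-root user and try again; if that works, you can skip the rest.)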
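The root check mentioned in the update can be sketched as follows. The function names are hypothetical; the only external command used is id -u, which prints the numeric user id.

```shell
# Succeeds (exit status 0) when the given numeric user id is root (uid 0).
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Print a hint before trying to open the capture device.
warn_if_root() {
    if is_root_uid "$(id -u)"; then
        echo "running as root: retry as a non-root user if the audio device fails to open"
    fi
}
```

Usage: call warn_if_root before starting pocketsphinx_continuous.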
1)In continuous.c, the failure is in the call to ad_open_dev.
This API has a separate implementation for each Linux audio service, such as alsa, oss, and pulseaudio.
Which one pocketsphinx uses depends on the build. (See the configure script in sphinxbase.)
a) If pulse/pulseaudio.h exists on the system:
ad_files="ad_pulse.lo"
b) If alsa/asoundlib.h exists on the system:
ad_files="ad_alsa.lo"
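To predict which backend a build would pick, the a)/b) order above can be imitated with plain file checks. This is a simplified sketch and not the real sphinxbase configure script, which tests headers with the compiler and knows about more backends. pick_ad_backend and its include-root parameter are hypothetical.

```shell
# Simplified imitation of the header check described above.
# $1 is the include root to inspect (defaults to /usr/include).
pick_ad_backend() {
    inc="${1:-/usr/include}"
    if [ -f "$inc/pulse/pulseaudio.h" ]; then
        echo "ad_pulse.lo"      # pulseaudio headers found first
    elif [ -f "$inc/alsa/asoundlib.h" ]; then
        echo "ad_alsa.lo"       # only alsa headers found
    else
        echo "unknown"          # other cases are not modelled here
    fi
}
```

Usage: pick_ad_backend with no argument inspects /usr/include.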
On my local Debian 10 system, pulseaudio had been installed earlier, so my local pocketsphinx build used pulseaudio. I reinstalled it:
sudo apt-get remove pulseaudio
sudo apt-get install pulseaudio
The error was still there.
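2)Select alsa and rebuild sphinxbase
Since the problem remained unsolved, I decided to get it working with alsa first.
a) One option is to edit the sphinxbase configure script, remove the pulseaudio-related parts, then rebuild and reinstall sphinxbase.
b) My own approach: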
cd /usr/include
mv pulse pulse.bk # hide the headers so the sphinxbase rebuild cannot find them
Log from rebuilding sphinxbase:
checking pulse/pulseaudio.h usability... no
checking pulse/pulseaudio.h presence... no
checking for pulse/pulseaudio.h... no
checking alsa/asoundlib.h usability... yes
checking alsa/asoundlib.h presence... yes
checking for alsa/asoundlib.h... yes
checking for snd_pcm_open in -lasound... yes
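Renaming /usr/include/pulse by hand leaves the headers hidden until someone renames them back. A hedged variant of the same trick restores the directory once the build command finishes. with_hidden_dir is a hypothetical helper, and it does not protect against the shell being killed midway.

```shell
# Temporarily rename a directory, run a command, then restore the directory.
# Usage: with_hidden_dir <dir> <command> [args...]
with_hidden_dir() {
    dir="$1"
    shift
    mv "$dir" "$dir.bk" || return 1
    "$@"                            # run the build while the directory is hidden
    status=$?
    mv "$dir.bk" "$dir" || return 1
    return "$status"                # report the command's own exit status
}
```

Usage, from the sphinxbase source directory and with enough privileges to rename the headers: with_hidden_dir /usr/include/pulse sh -c './configure && make'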
Finally, confirm that pocketsphinx uses alsa:
/usr/src/pocket_sphinx/sphinxbase-5prealpha/src/libsphinxad/ad_alsa.lo
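One way to make this check repeatable is to list the ad_*.lo objects in the build tree. A minimal sketch, assuming the src/libsphinxad layout shown in the path above; built_ad_backends is a hypothetical helper name.

```shell
# List the audio backend objects present in a sphinxbase build tree.
# $1 is the sphinxbase source directory, e.g.
# /usr/src/pocket_sphinx/sphinxbase-5prealpha
built_ad_backends() {
    for f in "$1"/src/libsphinxad/ad_*.lo; do
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}
```

If the output contains ad_alsa.lo and not ad_pulse.lo, the rebuild picked up alsa.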
3)Check that alsa audio capture and playback work
a) List the capture devices
arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: I82801AAICH [Intel 82801AA-ICH], device 0: Intel ICH [Intel 82801AA-ICH]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: I82801AAICH [Intel 82801AA-ICH], device 1: Intel ICH - MIC ADC [Intel 82801AA-ICH - MIC ADC]
Subdevices: 1/1
Subdevice #0: subdevice #0
b) List the playback devices
aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: I82801AAICH [Intel 82801AA-ICH], device 0: Intel ICH [Intel 82801AA-ICH]
Subdevices: 1/1
Subdevice #0: subdevice #0
c) Record audio
arecord -D hw:I82801AAICH -fS16_LE -d10 -c2 -r48000 myrecord.wav
Recording WAVE 'myrecord.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
d) Play back the recording
aplay -D hw:I82801AAICH -fS16_LE -c2 -r48000 myrecord.wav
Playing WAVE 'myrecord.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
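The card and device numbers from arecord -l are what the plughw:<card>,<device> argument refers to. The sketch below turns that listing into device names that could be passed to -adcdev. It assumes the English output format shown above, and list_capture_devices is a hypothetical helper name.

```shell
# Read `arecord -l` output on stdin and print one plughw:<card>,<device>
# name per capture device.
list_capture_devices() {
    sed -n -E 's/^card ([0-9]+):.*, device ([0-9]+):.*/plughw:\1,\2/p'
}
```

Usage: arecord -l | list_capture_devices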
4)With alsa confirmed working, run pocketsphinx_continuous with the Chinese language model and dictionary
pocketsphinx_continuous -adcdev plughw:0,0 -inmic yes -hmm zh_cn.cd_cont_5000 -lm zh_cn.lm.bin -dict zh_cn.dic
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 192^3 * 2 bytes (13824 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 886272 bytes (865 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 886272 bytes (865 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 1354 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 37305
INFO: ngram_search_fwdtree.c(333): Created 1229 root, 37177 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Apr 1 2020, AT: 13:20:08
INFO: continuous.c(252): Ready...
INFO: continuous.c(261): Listening...
…
INFO: ngram_search.c(1250): lattice start node .0 end node .645
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 1000 nodes, 3639 links
INFO: ps_lattice.c(1380): Bestpath score: -23275
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:645:666) = -1277610
INFO: ps_lattice.c(1441): Joint P(O,S) = -1346879 P(S|O) = -69269
INFO: ngram_search.c(872): bestpath 0.08 CPU 0.013 xRT
INFO: ngram_search.c(875): bestpath 0.09 wall 0.013 xRT
你好
Problem solved! However, the recognition quality is not ideal; you need to train your own model.