Contents
Introduction
Purpose
Preparation
Install sphinxbase and read its README
Install pocketsphinx and read its README
Install sphinxtrain and read its README
Run the demo
References
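Introduction
Sphinx originated as Kai-Fu Lee's PhD thesis at CMU (Carnegie Mellon University). It is a large-vocabulary, speaker-independent, continuous English speech recognition system. The following overview is taken from https://cmusphinx.github.io/wiki/download/: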
CMU Sphinx toolkit has a number of packages for different tasks and applications. It’s sometimes confusing what to choose. To cleanup, here is the list
- Pocketsphinx — recognizer library written in C.
- Sphinxtrain — acoustic model training tools
- Sphinxbase — support library required by Pocketsphinx and Sphinxtrain
- Sphinx4 — adjustable, modifiable recognizer written in Java
We recommend you to use the latest available releases:
- sphinxbase-5prealpha
- pocketsphinx-5prealpha
- sphinxtrain-5prealpha
- sphinx4-5prealpha
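Purpose
I want to implement speech recognition (mainly isolated-word recognition) on an embedded device. Because Sphinx's recognizer library is written in C and is therefore easy to port to embedded platforms, it was my first choice among the currently popular speech recognition toolkits. Here is my plan for getting started:
1. Install the software, get to know its modules, and run the bundled demo to get a rough overall picture;
2. Read the tutorial and translate it, to deepen my understanding and catch mistakes;
3. Analyze in depth how Sphinx implements preprocessing, the acoustic model, the language model (since the goal is isolated-word recognition, the language model will not be my focus for now) and decoding.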
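Preparation
Platform: Ubuntu 16.04 64-bit (Python 2.7, Perl 5.22)
Download sphinxbase-5prealpha.tar.gz, sphinxtrain-5prealpha.tar.gz and pocketsphinx-5prealpha.tar.gz, and extract them.
PS: One problem I ran into deserves a special note. It concerns the microphone (audio) library: when sphinxbase is built, its configure step adapts to the audio libraries it finds on the system, so install the PulseAudio development package before installing sphinxbase: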
$ sudo apt-get install libpulse-dev
Install sphinxbase and read its README
Linux/Unix installation
-------------------------------------------------------------------------------

sphinxbase is used by other modules. The convention requires the
physical layout of the code looks like this:

.
├── package/
└── sphinxbase/

So if you get the file from a distribution, you might want to rename
sphinxbase-X.X to sphinxbase by typing:

$ mv sphinxbase-X.X sphinxbase

(X.X being the version of sphinxbase)

If you downloaded directly from the Subversion repository, you need to create
the "configure" file by typing:

$ ./autogen.sh

If you downloaded a release version or if you have already run
"autogen.sh", you can build simply by running:

$ ./configure
$ make

If you are compiling for a platform without floating-point arithmetic, you
should instead use:

$ ./configure --enable-fixed --without-lapack
$ make

You can also check the validity of the package by typing:

$ make check

... and then install it with (might require permissions):

$ make install
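According to the README, a platform without an FPU needs sphinxbase configured with `--enable-fixed`, i.e. fixed-point (integer) arithmetic. To get a feel for what that means, here is a minimal Python 3 sketch of the pre-emphasis filter y[n] = x[n] - alpha*x[n-1] (alpha = 0.97, the `-alpha` default shown later in the option list), computed once with floats and once with only integer multiply and shift. The Q15 format and the function names are my own illustrative assumptions. sphinxbase's internal fixed-point representation is not necessarily the same.

```python
# Illustrative sketch: pre-emphasis in floating point vs. Q15 fixed point.
# Q15 is an assumed format for this example, not necessarily sphinxbase's.

Q = 15                 # number of fractional bits
ONE_Q15 = 1 << Q       # 1.0 in Q15 == 32768


def to_q15(x):
    """Convert a real coefficient to a Q15 integer (0.97 -> 31785)."""
    return int(round(x * ONE_Q15))


def preemphasis_float(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], computed with floats."""
    out, prev = [], 0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out


def preemphasis_fixed(samples, alpha=0.97):
    """Same filter using only an integer multiply and a shift.

    In C on an embedded target, a * prev needs a 32-bit accumulator.
    """
    a = to_q15(alpha)
    out, prev = [], 0
    for x in samples:
        out.append(x - ((a * prev) >> Q))  # >> Q divides by 2**15, rounding toward -inf
        prev = x
    return out


if __name__ == "__main__":
    pcm = [0, 1000, -2000, 32767, -32768, 123]  # 16-bit PCM samples
    for f, i in zip(preemphasis_float(pcm), preemphasis_fixed(pcm)):
        print("%10.2f %7d" % (f, i))
```

For these inputs the integer result stays within about one unit of the float result; that small error is the price of replacing a floating-point multiply with an integer multiply and shift.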
Install pocketsphinx and read its README
Linux/Unix installation
------------------------------------------------------------------------------

In a unix-like environment (such as linux, solaris etc):

* Build and optionally install SphinxBase. If you want to use
  fixed-point arithmetic, you **must** configure SphinxBase with the
  `--enable-fixed` option.

* If you downloaded directly from the CVS repository, you need to do
  this at least once to generate the "configure" file:

```
$ ./autogen.sh
```

* If you downloaded the release version, or ran `autogen.sh` at least
  once, then compile and install:

```
$ ./configure
$ make clean all
$ make check
$ sudo make install
```
Install sphinxtrain and read its README
Linux/Unix Installation:
==============================================================================

This distribution now uses GNU autoconf to find out basic information
about your system, and should compile on most Unix and Unix-like
systems, and certainly on Linux. To build, simply run

./configure
make
make install

This should configure everything automatically. The code has been tested with gcc.
Also, check the section title "All Platforms" above.
PS: The build also requires some extra dependencies, such as bison.
All Platforms:
==============================================================================

You will need Perl to use the scripts provided. Linux usually comes
with some version of Perl. If you do not have Perl installed, please
check:

http://www.perl.org

where you can download it for free. For Windows, a popular version,
ActivePerl, is available from ActiveState at:

http://www.activestate.com/Products/ActivePerl/

For some advanced techniques (which are not enabled by default) you
will need Python with NumPy and SciPy. Python can be obtained from:

http://www.python.org/download/

Packages for NumPy and SciPy can be obtained from:

http://scipy.org/Download
Run the demo
Try running pocketsphinx:
$ pocketsphinx_continuous
pocketsphinx_continuous: error while loading shared libraries: libpocketsphinx.so.1: cannot open shared object file: No such file or directory
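This error occurs because the directory where pocketsphinx installed its shared libraries is not on my system's default library search path. The sphinxbase installation had already printed a hint about this: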
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the 'LD_RUN_PATH' environment variable
during linking
- use the '-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to '/etc/ld.so.conf'
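Following the last option, edit /etc/ld.so.conf (this requires root):
$ sudo vim /etc/ld.so.conf
Append the following line to the file:
/usr/local/lib
(/usr/local/lib/pkgconfig holds pkg-config .pc files rather than shared libraries, so it does not need to be listed.)
Then rebuild the dynamic linker cache:
$ sudo ldconfig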
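To confirm the library is now visible, `ldconfig -p | grep pocketsphinx` lists the matching entries in the linker cache. The same check can be scripted. Below is a minimal Python 3 sketch, assuming the library name `libpocketsphinx.so.1` from the error message above. It asks `ctypes.util.find_library`, which on Linux consults the ldconfig cache first, and separately scans `LD_LIBRARY_PATH`, the first option in the libtool message. The helper names `candidate_paths` and `find_in_ld_library_path` are hypothetical.

```python
# Sketch: check where the dynamic loader could find libpocketsphinx.
import ctypes.util
import os
import sys


def candidate_paths(soname, ld_library_path):
    """Join `soname` onto each non-empty entry of an LD_LIBRARY_PATH-style string."""
    return [os.path.join(d, soname) for d in ld_library_path.split(":") if d]


def find_in_ld_library_path(soname, ld_library_path=None):
    """Return the first existing candidate path, or None."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for path in candidate_paths(soname, ld_library_path):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    soname = sys.argv[1] if len(sys.argv) > 1 else "libpocketsphinx.so.1"
    # On Linux, find_library tries the ldconfig cache (built from the
    # directories in /etc/ld.so.conf) before other fallbacks.
    print("ldconfig cache :", ctypes.util.find_library("pocketsphinx"))
    print("LD_LIBRARY_PATH:", find_in_ld_library_path(soname))
```

If the first line prints None even after `sudo ldconfig`, the library directory is still missing from the linker configuration.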
Run pocketsphinx again:
$ pocketsphinx_continuous
ERROR: "cmd_ln.c", line 682: No arguments given, available options are:
Arguments list definition:
[NAME] [DEFLT] [DESCR]
-adcdev Name of audio device to use for input.
-agc none Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh 2.0 Initial threshold for automatic gain control
-allphone Perform phoneme decoding with phonetic lm
-allphone_ci no Perform phoneme decoding with phonetic lm and context-independent units only
-alpha 0.97 Preemphasis parameter
-argfile Argument file giving extra arguments.
-ascale 20.0 Inverse of acoustic model scale for confidence score calculation
-aw 1 Inverse weight applied to acoustic scores.
-backtrace no Print results and backtraces to log.
-beam 1e-48 Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath yes Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw 9.5 Language model probability weight for bestpath search
-ceplen 13 Number of components in the input feature vector
-cmn live Cepstral mean normalization scheme ('live', 'batch', or 'none')
-cmninit 40,3,-1 Initial values (comma-separated) for cepstral mean when 'live' is used
-compallsen no Compute all senone scores in every frame (can be faster when there are many senones)
-debug Verbosity level for debugging messages
-dict Main pronunciation dictionary (lexicon) input file
-dictcase no Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither no Add 1/2-bit noise
-doublebw no Use double bandwidth filters (same center freq)
-ds 1 Frame GMM computation downsampling ratio
-fdict Noise word pronunciation dictionary input file
-feat 1s_c_d_dd Feature stream type, depends on the acoustic model
-featparams File containing feature extraction parameters.
-fillprob 1e-8 Filler word transition probability
-frate 100 Frame rate
-fsg Sphinx format finite state grammar file
-fsgusealtpron yes Add alternate pronunciations to FSG
-fsgusefiller yes Insert filler words at each state.
-fwdflat yes Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam 1e-64 Beam width applied to every frame in second-pass flat search
-fwdflatefwid 4 Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw 8.5 Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin 25 Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam 7e-29 Beam width applied to word exits in second-pass flat search
-fwdtree yes Run forward lexicon-tree search (1st pass)
-hmm Directory containing acoustic model files.
-infile Audio file to transcribe.
-inmic no Transcribe audio from microphone.
-input_endian little Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf JSGF grammar file
-keyphrase Keyphrase to spot
-kws A file with keyphrases to spot, one per line
-kws_delay 10 Delay to wait for best detection score
-kws_plp 1e-1 Phone loop probability for keyphrase spotting
-kws_threshold 1 Threshold for p(hyp)/p(alternatives) ratio
-latsize 5000 Initial backpointer table size
-lda File containing transformation matrix to be applied to features (single-stream features only)
-ldadim 0 Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter 0 Length of sin-curve for liftering, or 0 for no liftering.
-lm Word trigram language model input file
-lmctl Specify a set of language model
-lmname Which language model in -lmctl to use by default
-logbase 1.0001 Base in which all log-likelihoods calculated
-logfn File to write log messages in
-logspec no Write out logspectral files instead of cepstra
-lowerf 133.33334 Lower edge of filters
-lpbeam 1e-40 Beam width applied to last phone in words
-lponlybeam 7e-29 Beam width applied to last phone in single-phone words
-lw 6.5 Language model probability weight
-maxhmmpf 30000 Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf -1 Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef Model definition input file
-mean Mixture gaussian means input file
-mfclogdir Directory to log feature files to
-min_endfr 0 Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw Senone mixture weights input file (uncompressed)
-mixwfloor 0.0000001 Senone mixture weights floor (applied to data from -mixw file)
-mllr MLLR transformation to apply to means and variances
-mmap yes Use memory-mapped I/O (if possible) for model files
-ncep 13 Number of cep coefficients
-nfft 512 Size of FFT
-nfilt 40 Number of filter banks
-nwpen 1.0 New word transition penalty
-pbeam 1e-48 Beam width applied to phone transitions
-pip 1.0 Phone insertion penalty
-pl_beam 1e-10 Beam width applied to phone loop search for lookahead
-pl_pbeam 1e-10 Beam width applied to phone loop transitions for lookahead
-pl_pip 1.0 Phone insertion penalty for phone loop
-pl_weight 3.0 Weight for phoneme lookahead penalties
-pl_window 5 Phoneme lookahead window size, in frames
-rawlogdir Directory to log raw audio files to
-remove_dc no Remove DC offset from each frame
-remove_noise yes Remove noise with spectral subtraction in mel-energies
-remove_silence yes Enables VAD, removes silence frames from processing
-round_filters yes Round mel filter frequencies to DFT points
-samprate 16000 Sampling rate
-seed -1 Seed for random number generator; if less than zero, pick our own
-sendump Senone dump (compressed mixture weights) input file
-senlogdir Directory to log senone score files to
-senmgau Senone to codebook mapping input file (usually not needed)
-silprob 0.005 Silence word transition probability
-smoothspec no Write out cepstral-smoothed logspectral files
-svspec Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-time no Print word times in file transcription.
-tmat HMM state transition matrix input file
-tmatfloor 0.0001 HMM state transition probability floor (applied to -tmat file)
-topn 4 Maximum number of top Gaussians to use in scoring.
-topn_beam 0 Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule Start rule for JSGF (first public rule is default)
-transform legacy Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area yes Normalize mel filters to unit area
-upperf 6855.4976 Upper edge of filters
-uw 1.0 Unigram weight
-vad_postspeech 50 Num of silence frames to keep after from speech to silence.
-vad_prespeech 20 Num of speech frames to keep before silence to speech.
-vad_startspeech 10 Num of speech frames to trigger vad from silence to speech.
-vad_threshold 2.0 Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var Mixture gaussian variances input file
-varfloor 0.0001 Mixture gaussian variance floor (applied to data from -var file)
-varnorm no Variance normalize each utterance (only if CMN == current)
-verbose no Show input filenames
-warp_params Parameters defining the warping function
-warp_type inverse_linear Warping function type (or shape)
-wbeam 7e-29 Beam width applied to word exits
-wip 0.65 Word insertion penalty
-wlen 0.025625 Hamming window length
INFO: continuous.c(295): Specify '-infile <file.wav>' to recognize from file or '-inmic yes' to recognize from microphone.
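As the message suggests, you can pass a WAV file with -infile, or use the hardware microphone with -inmic yes. Let's start with a known file from the Speech Commands dataset (v0.02) whose content is the word 'yes'.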
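Before feeding a file to `-infile`, it helps to check that it matches the front-end settings in the option list above: `-samprate 16000`, and (my assumption) 16-bit mono PCM. The same options (`-frate 100`, `-wlen 0.025625`, `-nfft 512`) also fix how the audio is cut into frames. The Python 3 sketch below does both. All function names are hypothetical, and `num_frames` is a simplified model without padding, so sphinxbase's actual frame count may differ slightly.

```python
# Sketch: check a WAV file against the assumed front-end settings and
# estimate how many analysis frames it yields.
import io
import sys
import wave


def check_wav(source, samprate=16000):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(source, "rb") as w:
        if w.getframerate() != samprate:
            problems.append("sample rate %d, expected %d" % (w.getframerate(), samprate))
        if w.getnchannels() != 1:
            problems.append("%d channels, expected mono" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("%d-bit samples, expected 16-bit" % (8 * w.getsampwidth()))
    return problems


def frame_layout(samprate=16000, frate=100, wlen=0.025625, nfft=512):
    """Return (frame shift, window length) in samples."""
    shift = samprate // frate          # 160 samples = 10 ms between frames
    win = int(round(wlen * samprate))  # 410 samples, about 25.6 ms
    if win > nfft:
        raise ValueError("window is longer than the FFT size")
    return shift, win


def num_frames(n_samples, shift, win):
    """Number of full windows that fit (simple model, no padding)."""
    return 0 if n_samples < win else 1 + (n_samples - win) // shift


def make_silent_wav(rate, channels, sampwidth=2, seconds=1.0):
    """Build an in-memory WAV of silence, handy for trying check_wav()."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (int(rate * seconds) * channels * sampwidth))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    path = sys.argv[1]
    print(check_wav(path) or "format OK")
    with wave.open(path, "rb") as w:
        n = w.getnframes()
    shift, win = frame_layout()
    print(n, "samples ->", num_frames(n, shift, win), "frames")
```

With these settings a 1-second, 16 kHz clip gives 16000 samples and, in this simple model, 98 frames. Each 410-sample window is zero-padded to the 512-point FFT, giving 16000 / 512 = 31.25 Hz per FFT bin.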
$ pocketsphinx_continuous -infile ~/Downloads/speech_commands_v0.02/yes/11099149_nohash_0.wav
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen no no
-debug 0
-dict /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/share/pocketsphinx/model/en-us/en-us
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Nov 28 2018, AT: 11:11:04
INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
INFO: cmn_live.c(138): Update to < 47.94 -7.35 -17.00 -0.68 4.35 -11.16 -8.48 0.35 1.64 0.72 -1.44 0.13 -1.01 >
INFO: ngram_search_fwdtree.c(1550): 1659 words recognized (17/fr)
INFO: ngram_search_fwdtree.c(1552): 334753 senones evaluated (3348/fr)
INFO: ngram_search_fwdtree.c(1556): 1940903 channels searched (19409/fr), 66488 1st, 55406 last
INFO: ngram_search_fwdtree.c(1559): 3538 words for which last channels evaluated (35/fr)
INFO: ngram_search_fwdtree.c(1561): 149275 candidate words for entering last phone (1492/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.98 CPU 0.977 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.98 wall 0.977 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 57 words
INFO: ngram_search_fwdflat.c(948): 1119 words recognized (11/fr)
INFO: ngram_search_fwdflat.c(950): 76953 senones evaluated (770/fr)
INFO: ngram_search_fwdflat.c(952): 82863 channels searched (828/fr)
INFO: ngram_search_fwdflat.c(954): 4353 words searched (43/fr)
INFO: ngram_search_fwdflat.c(957): 2924 word transitions (29/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.06 CPU 0.060 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.06 wall 0.060 xRT
INFO: ngram_search.c(1250): lattice start node.0 end node.61
INFO: ngram_search.c(1276): Eliminated 2 nodes before end node
INFO: ngram_search.c(1381): Lattice has 226 nodes, 547 links
INFO: ps_lattice.c(1380): Bestpath score: -2266
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:61:98) = -190798
INFO: ps_lattice.c(1441): Joint P(O,S) = -211857 P(S|O) = -21059
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.001 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.001 xRT
yes
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 0.98 CPU 0.987 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.98 wall 0.987 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.06 CPU 0.061 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.06 wall 0.061 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.001 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.001 xRT
As you can see, pocketsphinx correctly recognized 'yes'.
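Several numbers in this log can be checked by hand, which is useful when estimating what an embedded target would need. The Python 3 sketch below reproduces three of them:

- the memory sizes reported by dict.c and dict2pid.c;
- the audio length implied by the xRT (real-time factor) figures;
- the lattice posterior, where P(S|O) = P(O,S) / P(O) becomes a subtraction in the log domain.

It assumes the lattice scores are integers in log base 1.0001, the `-logbase` value. The option list describes `-ascale 20` as an acoustic scale used for confidence computation, so treat the converted probability as a rough confidence rather than a calibrated probability. The function names are hypothetical.

```python
# Sketch: sanity-check a few numbers printed in the log above.
# Assumption: lattice scores are integers in log base 1.0001 (-logbase).
import math

LOGBASE = 1.0001


def kib(n_bytes):
    """Bytes -> KiB, truncated (the log prints 148176 bytes as 144 KiB)."""
    return n_bytes // 1024


def audio_seconds(cpu_seconds, xrt):
    """xRT = processing time / audio duration, so duration = time / xRT."""
    return cpu_seconds / xrt


def log_posterior(joint, normalizer):
    """log P(S|O) = log P(O,S) - log P(O), i.e. Bayes' rule in the log domain."""
    return joint - normalizer


def to_probability(score, base=LOGBASE):
    """Convert a log-base-`base` score into an ordinary probability."""
    return math.exp(score * math.log(base))


if __name__ == "__main__":
    # dict.c: "Allocating 138824 * 32 bytes (4338 KiB) for word entries"
    print(kib(138824 * 32))
    # dict2pid.c: "Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones"
    print(kib(42 ** 3 * 2))
    # fwdtree: "0.98 CPU 0.977 xRT" (the CPU time is rounded in the log)
    print(round(audio_seconds(0.98, 0.977), 2))
    # ps_lattice.c: Joint P(O,S) = -211857, Normalizer P(O) = -190798
    s = log_posterior(-211857, -190798)
    print(s, round(to_probability(s), 3))
```

The subtraction reproduces the logged P(S|O) = -21059 exactly, and the xRT figures imply roughly one second of audio. The dictionary allocation scales with the number of entries (134728 words here), while the triphone table depends only on the model's 42 CI phones. A small isolated-word dictionary would therefore shrink the first figure but not the second.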
That is a smooth start. Next, let's try -inmic for real-time recognition.
$ pocketsphinx_continuous -inmic yes
...
ad_oss.c(115): Failed to open audio device(/dev/dsp): No such file or directory
FATAL: "continuous.c", line 245: Failed to open audio device
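This means Sphinx could not find an audio device, even though I was running on a laptop with a built-in default microphone. The following explanation pointed me to the cause:
$ sudo apt-get install libpulse-dev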
This is important! The sphinxbase configure script will detect your audio environment and compile this into sphinxbase. On Ubuntu Pulseaudio is the default, and believe me, the fallback, sphinx ALSA support, is hard to get working. The FAQ strongly encourages us to stick with Pulseaudio. If you compile audio in the wrong way, you'll get all sorts of errors, for example "Failed to open audio device(/dev/dsp): No such file or directory".
And try to use Sphinx on a machine with just one microphone, like a laptop. Telling Sphinx to use another mic than the "default" one is a nightmare as well, like, searching for a webcam mic in arecord -L, passing that in as -adcdev etc.
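In other words, when sphinxbase is configured, its configure script detects the audio environment and compiles the matching audio backend into sphinxbase, so the PulseAudio development package (libpulse-dev) must be installed before configuring: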
$ sudo apt-get install libpulse-dev
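Then repeat the sphinxbase build steps, starting again from ./configure so that PulseAudio is detected. Because pocketsphinx depends on sphinxbase, repeat the pocketsphinx build steps as well.
Try again: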
$ pocketsphinx_continuous -inmic yes
This time the demo works: it continuously picks up active speech from the surroundings in real time and recognizes it.
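The option list describes the counters behind this behavior: `-vad_startspeech 10` speech frames switch from silence to speech, and `-vad_postspeech 50` silence frames switch back. The Python 3 sketch below shows how such counters turn per-frame speech/non-speech decisions into utterances. It is my own simplified illustration, not sphinxbase's implementation. It assumes the per-frame decision (which the option list says is based on the `-vad_threshold` log-ratio) is already given, and it ignores the `-vad_prespeech` padding. `vad_segments` is a hypothetical name.

```python
# Simplified VAD hangover logic (illustration only, not sphinxbase's code).
# Input: one boolean per 10 ms frame (-frate 100); True = frame judged as speech.

def vad_segments(is_speech, start_speech=10, post_speech=50):
    """Return half-open (start, end) frame ranges of detected utterances.

    start_speech: consecutive speech frames needed to enter speech (-vad_startspeech)
    post_speech:  consecutive silence frames needed to leave speech (-vad_postspeech)
    """
    segments = []
    in_speech = False
    run = 0      # length of the run that could trigger the next state change
    start = 0
    for i, speech in enumerate(is_speech):
        if not in_speech:
            run = run + 1 if speech else 0
            if run == start_speech:
                in_speech, start, run = True, i - start_speech + 1, 0
        else:
            run = run + 1 if not speech else 0
            if run == post_speech:
                segments.append((start, i - post_speech + 1))
                in_speech, run = False, 0
    if in_speech:  # input ended while still inside an utterance
        segments.append((start, len(is_speech)))
    return segments


if __name__ == "__main__":
    frames = [False] * 20 + [True] * 30 + [False] * 60
    print(vad_segments(frames))
```

At 100 frames per second the defaults open an utterance after 0.1 s of speech and close it after 0.5 s of silence. A pause shorter than 0.5 s therefore does not split an utterance.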
Test: say 'no' into the mic.
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(120): Update from < 19.86 6.95 -5.77 -2.05 -0.02 -1.30 8.82 7.85 7.79 2.40 -2.33 -0.96 2.92 >
INFO: cmn_live.c(138): Update to < 20.22 7.71 -4.99 -1.71 0.59 -0.58 8.82 8.31 7.59 2.11 -2.61 -1.33 2.58 >
INFO: ngram_search_fwdtree.c(1550): 1918 words recognized (21/fr)
INFO: ngram_search_fwdtree.c(1552): 353678 senones evaluated (3887/fr)
INFO: ngram_search_fwdtree.c(1556): 2759511 channels searched (30324/fr), 62153 1st, 69480 last
INFO: ngram_search_fwdtree.c(1559): 3984 words for which last channels evaluated (43/fr)
INFO: ngram_search_fwdtree.c(1561): 254404 candidate words for entering last phone (2795/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.36 CPU 1.498 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 3.96 wall 4.355 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 68 words
INFO: ngram_search_fwdflat.c(948): 1297 words recognized (14/fr)
INFO: ngram_search_fwdflat.c(950): 68988 senones evaluated (758/fr)
INFO: ngram_search_fwdflat.c(952): 90656 channels searched (996/fr)
INFO: ngram_search_fwdflat.c(954): 4603 words searched (50/fr)
INFO: ngram_search_fwdflat.c(957): 2897 word transitions (31/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.05 CPU 0.056 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.05 wall 0.056 xRT
INFO: ngram_search.c(1250): lattice start node.0 end node.50
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 299 nodes, 2168 links
INFO: ps_lattice.c(1380): Bestpath score: -2024
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:50:89) = -151318
INFO: ps_lattice.c(1441): Joint P(O,S) = -185659 P(S|O) = -34341
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.005 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.005 xRT
no
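Both runs print cmn_live.c "Update from < ... >" / "Update to < ... >" lines. In the file run, the "from" vector is exactly the `-cmninit` value in the configuration dump, and every vector has 13 entries (`-ncep 13`). This is the cepstral mean that live CMN subtracts from each frame and re-estimates as audio arrives. The Python 3 sketch below illustrates the idea with a simple running mean. The window size of 500 frames and the rescaling rule are assumptions for illustration; sphinxbase's cmn_live.c uses its own update schedule. `live_cmn` is a hypothetical name.

```python
# Simplified live CMN sketch (not sphinxbase's cmn_live.c): subtract a running
# mean from each cepstral frame and update the mean as frames arrive.

def live_cmn(frames, init_mean, window=500):
    """Return (normalized frames, final mean estimate).

    init_mean plays the role of -cmninit: it is treated as if it had already
    been observed over `window` frames. When the frame count exceeds
    2 * window, the running sum is rescaled so that old audio loses weight.
    """
    total = [m * window for m in init_mean]
    count = window
    normalized = []
    for frame in frames:
        mean = [t / count for t in total]
        normalized.append([c - m for c, m in zip(frame, mean)])
        total = [t + c for t, c in zip(total, frame)]
        count += 1
        if count > 2 * window:
            total = [t * window / count for t in total]  # keeps the mean, drops weight
            count = window
    return normalized, [t / count for t in total]
```

For example, `live_cmn([[1.0, 2.0]] * 1000, [0.0, 0.0], window=10)` starts by subtracting the initial mean of zero. Its estimate then converges to [1.0, 2.0], so later frames come out close to zero. This is the same kind of drift that the "Update from" / "Update to" lines report.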
References
- CMUSphinx: https://cmusphinx.github.io/
- CMUSphinx wiki: https://cmusphinx.github.io/wiki/
- Building an application with PocketSphinx: https://cmusphinx.github.io/wiki/tutorialpocketsphinx/