Speech Recognition: Getting Started with CMUSphinx (Part 1)

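Contents

Introduction

Goals

Preparation

Installing sphinxbase and reading its README

Installing pocketsphinx and reading its README

Installing sphinxtrain and reading its README

Running the demo

References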


Introduction

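Sphinx grew out of Kai-Fu Lee's doctoral dissertation at CMU (Carnegie Mellon University). It is a large-vocabulary, speaker-independent, continuous speech recognition system for English. The overview below is quoted from https://cmusphinx.github.io/wiki/download/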

CMU Sphinx toolkit has a number of packages for different tasks and applications. It’s sometimes confusing what to choose. To cleanup, here is the list

  • Pocketsphinx — recognizer library written in C.
  • Sphinxtrain — acoustic model training tools
  • Sphinxbase — support library required by Pocketsphinx and Sphinxtrain
  • Sphinx4 — adjustable, modifiable recognizer written in Java

We recommend you to use the latest available releases:

  • sphinxbase-5prealpha
  • pocketsphinx-5prealpha
  • sphinxtrain-5prealpha
  • sphinx4-5prealpha

Goals

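I want to implement speech recognition on an embedded platform (mainly isolated-word recognition). Because Sphinx is written in C and is therefore comparatively easy to port to embedded targets, it is the first toolkit I picked among the currently popular speech recognition toolkits. Here is my plan for getting started:

1. Install the software, get to know its modules, and run the bundled demo to form a rough overall impression;

2. Read the tutorial and translate it, both to deepen my understanding and to catch mistakes;

3. Study in depth how Sphinx implements each stage of speech recognition: preprocessing, the acoustic model, the language model (since my goal is isolated-word recognition, the language model will not be my focus for now), and decoding.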

Preparation

Platform: Ubuntu 16.04, 64-bit (Python 2.7, Perl 5.22)

Download sphinxbase-5prealpha.tar.gz, sphinxtrain-5prealpha.tar.gz, and pocketsphinx-5prealpha.tar.gz, then extract them.
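A minimal sketch of the extraction step, assuming the three archives sit in the current directory and that each one unpacks into a directory named after the archive (for example sphinxbase-5prealpha). The function name unpack_sphinx is made up for illustration. It also renames the sphinxbase directory to plain sphinxbase, which is the layout the sphinxbase README quoted further down asks for.

```shell
#!/bin/sh
# Sketch only: unpack_sphinx is a hypothetical helper, not part of CMU Sphinx.
# Assumes each <name>-<version>.tar.gz unpacks into a directory <name>-<version>.
unpack_sphinx() {
    version=$1
    for name in sphinxbase pocketsphinx sphinxtrain; do
        tar -xzf "$name-$version.tar.gz" || return 1
    done
    # The sphinxbase README asks for a sibling directory called "sphinxbase".
    mv "sphinxbase-$version" sphinxbase
}

# Usage, run from the download directory:
#   unpack_sphinx 5prealpha
```

Only sphinxbase is renamed here, because that is the only rename the README mentions.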

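PS: One problem I ran into deserves a special note. libpulse-dev is the library used for microphone input, and the sphinxbase configure step adapts the build to the audio environment it finds, so install the PulseAudio development package before installing sphinxbase.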

sudo apt-get install libpulse-dev

Installing sphinxbase and reading its README

Linux/Unix installation
-------------------------------------------------------------------------------

sphinxbase is used by other modules.  The convention requires the
physical layout of the code looks like this:

    .
    ├── package/
    └── sphinxbase/

So if you get the file from a distribution, you might want to rename
sphinxbase-X.X to sphinxbase by typing:

    $ mv sphinxbase-X.X sphinxbase    (X.X being the version of sphinxbase)

If you downloaded directly from the Subversion repository, you need to create
the "configure" file by typing:

    $ ./autogen.sh

If you downloaded a release version or if you have already run
"autogen.sh", you can build simply by running:

    $ ./configure
    $ make

If you are compiling for a platform without floating-point arithmetic, you
should instead use:

    $ ./configure --enable-fixed --without-lapack
    $ make

You can also check the validity of the package by typing:

    $ make check

... and then install it with (might require permissions):

    $ make install

Installing pocketsphinx and reading its README

Linux/Unix installation
------------------------------------------------------------------------------

In a unix-like environment (such as linux, solaris etc):

 * Build and optionally install SphinxBase. If you want to use
   fixed-point arithmetic, you **must** configure SphinxBase with the
   `--enable-fixed` option.

 * If you downloaded directly from the CVS repository, you need to do
   this at least once to generate the "configure" file:

   ```
   $ ./autogen.sh
   ```
 * If you downloaded the release version, or ran `autogen.sh` at least
   once, then compile and install:

   ```
   $ ./configure
   $ make clean all
   $ make check
   $ sudo make install
   ```

Installing sphinxtrain and reading its README

Linux/Unix Installation:
==============================================================================

This distribution now uses GNU autoconf to find out basic information
about your system, and should compile on most Unix and Unix-like
systems, and certainly on Linux.  To build, simply run

    ./configure
    make
    make install

This should configure everything automatically. The code has been tested with gcc.

Also, check the section title "All Platforms" above.

PS: During installation, Sphinx needs some build dependencies, such as bison.
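A small sketch for checking build prerequisites up front instead of discovering them one configure error at a time. The helper name missing_tools and the particular tool list are illustrative assumptions: gcc, make, perl and bison are the tools named in this post or in the quoted READMEs, and the READMEs remain the authority on what is actually required.

```shell
#!/bin/sh
# Sketch only: missing_tools is a hypothetical helper.
# Prints, one per line, each given command name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example list; extend it as needed for your system.
missing=$(missing_tools gcc make perl bison)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
fi
```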

All Platforms:
==============================================================================

You will need Perl to use the scripts provided. Linux usually comes
with some version of Perl. If you do not have Perl installed, please
check:

http://www.perl.org

where you can download it for free. For Windows, a popular version,
ActivePerl, is available from ActiveState at:

http://www.activestate.com/Products/ActivePerl/

For some advanced techniques (which are not enabled by default) you
will need Python with NumPy and SciPy.  Python can be obtained from:

http://www.python.org/download/

Packages for NumPy and SciPy can be obtained from:

http://scipy.org/Download

Running the demo

Try running pocketsphinx:

$ pocketsphinx_continuous

pocketsphinx_continuous: error while loading shared libraries: libpocketsphinx.so.1: cannot open shared object file: No such file or directory 

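This error occurs because the shared library built by pocketsphinx is installed in a directory that is not on my system's default library search path. The sphinxbase installation log had already printed a hint about this: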

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

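We follow the last suggestion and edit /etc/ld.so.conf (root privileges are required).

$ sudo vim /etc/ld.so.conf

Append the following line to the file:

/usr/local/lib

(I originally also appended /usr/local/lib/pkgconfig, but that directory only holds pkg-config .pc files, not shared libraries, so the dynamic linker has no use for it.)

Then refresh the linker cache:

$ sudo ldconfig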
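The libtool message above also lists LD_LIBRARY_PATH as an option. As a per-session alternative that needs no root access, here is a sketch. prepend_path is a hypothetical helper, and /usr/local/lib is assumed to be where make install placed the libraries (the default prefix).

```shell
#!/bin/sh
# Sketch only: prepend_path is a hypothetical helper.
# Prints the colon-separated list $2 with directory $1 put in front,
# unless $1 is already an entry, in which case $2 is printed unchanged.
prepend_path() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *)          printf '%s\n' "$dir${current:+:$current}" ;;
    esac
}

# Affects only the current shell session and the programs started from it.
LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

Unlike the /etc/ld.so.conf change, this has to be repeated in every new shell (or added to a shell startup file).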

 

Try running pocketsphinx again:

ERROR: "cmd_ln.c", line 682: No arguments given, available options are:
Arguments list definition:
[NAME]            [DEFLT]        [DESCR]
-adcdev                    Name of audio device to use for input.
-agc            none        Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh        2.0        Initial threshold for automatic gain control
-allphone                Perform phoneme decoding with phonetic lm
-allphone_ci        no        Perform phoneme decoding with phonetic lm and context-independent units only
-alpha            0.97        Preemphasis parameter
-argfile                Argument file giving extra arguments.
-ascale            20.0        Inverse of acoustic model scale for confidence score calculation
-aw            1        Inverse weight applied to acoustic scores.
-backtrace        no        Print results and backtraces to log.
-beam            1e-48        Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath        yes        Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw        9.5        Language model probability weight for bestpath search
-ceplen            13        Number of components in the input feature vector
-cmn            live        Cepstral mean normalization scheme ('live', 'batch', or 'none')
-cmninit        40,3,-1        Initial values (comma-separated) for cepstral mean when 'live' is used
-compallsen        no        Compute all senone scores in every frame (can be faster when there are many senones)
-debug                    Verbosity level for debugging messages
-dict                    Main pronunciation dictionary (lexicon) input file
-dictcase        no        Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither            no        Add 1/2-bit noise
-doublebw        no        Use double bandwidth filters (same center freq)
-ds            1        Frame GMM computation downsampling ratio
-fdict                    Noise word pronunciation dictionary input file
-feat            1s_c_d_dd    Feature stream type, depends on the acoustic model
-featparams                File containing feature extraction parameters.
-fillprob        1e-8        Filler word transition probability
-frate            100        Frame rate
-fsg                    Sphinx format finite state grammar file
-fsgusealtpron        yes        Add alternate pronunciations to FSG
-fsgusefiller        yes        Insert filler words at each state.
-fwdflat        yes        Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam        1e-64        Beam width applied to every frame in second-pass flat search
-fwdflatefwid        4        Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw        8.5        Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin        25        Window of frames in lattice to search for successor words in fwdflat search 
-fwdflatwbeam        7e-29        Beam width applied to word exits in second-pass flat search
-fwdtree        yes        Run forward lexicon-tree search (1st pass)
-hmm                    Directory containing acoustic model files.
-infile                    Audio file to transcribe.
-inmic            no        Transcribe audio from microphone.
-input_endian        little        Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf                    JSGF grammar file
-keyphrase                Keyphrase to spot
-kws                    A file with keyphrases to spot, one per line
-kws_delay        10        Delay to wait for best detection score
-kws_plp        1e-1        Phone loop probability for keyphrase spotting
-kws_threshold        1        Threshold for p(hyp)/p(alternatives) ratio
-latsize        5000        Initial backpointer table size
-lda                    File containing transformation matrix to be applied to features (single-stream features only)
-ldadim            0        Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter            0        Length of sin-curve for liftering, or 0 for no liftering.
-lm                    Word trigram language model input file
-lmctl                    Specify a set of language model
-lmname                    Which language model in -lmctl to use by default
-logbase        1.0001        Base in which all log-likelihoods calculated
-logfn                    File to write log messages in
-logspec        no        Write out logspectral files instead of cepstra
-lowerf            133.33334    Lower edge of filters
-lpbeam            1e-40        Beam width applied to last phone in words
-lponlybeam        7e-29        Beam width applied to last phone in single-phone words
-lw            6.5        Language model probability weight
-maxhmmpf        30000        Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf            -1        Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef                    Model definition input file
-mean                    Mixture gaussian means input file
-mfclogdir                Directory to log feature files to
-min_endfr        0        Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw                    Senone mixture weights input file (uncompressed)
-mixwfloor        0.0000001    Senone mixture weights floor (applied to data from -mixw file)
-mllr                    MLLR transformation to apply to means and variances
-mmap            yes        Use memory-mapped I/O (if possible) for model files
-ncep            13        Number of cep coefficients
-nfft            512        Size of FFT
-nfilt            40        Number of filter banks
-nwpen            1.0        New word transition penalty
-pbeam            1e-48        Beam width applied to phone transitions
-pip            1.0        Phone insertion penalty
-pl_beam        1e-10        Beam width applied to phone loop search for lookahead
-pl_pbeam        1e-10        Beam width applied to phone loop transitions for lookahead
-pl_pip            1.0        Phone insertion penalty for phone loop
-pl_weight        3.0        Weight for phoneme lookahead penalties
-pl_window        5        Phoneme lookahead window size, in frames
-rawlogdir                Directory to log raw audio files to
-remove_dc        no        Remove DC offset from each frame
-remove_noise        yes        Remove noise with spectral subtraction in mel-energies
-remove_silence        yes        Enables VAD, removes silence frames from processing
-round_filters        yes        Round mel filter frequencies to DFT points
-samprate        16000        Sampling rate
-seed            -1        Seed for random number generator; if less than zero, pick our own
-sendump                Senone dump (compressed mixture weights) input file
-senlogdir                Directory to log senone score files to
-senmgau                Senone to codebook mapping input file (usually not needed)
-silprob        0.005        Silence word transition probability
-smoothspec        no        Write out cepstral-smoothed logspectral files
-svspec                    Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-time            no        Print word times in file transcription.
-tmat                    HMM state transition matrix input file
-tmatfloor        0.0001        HMM state transition probability floor (applied to -tmat file)
-topn            4        Maximum number of top Gaussians to use in scoring.
-topn_beam        0        Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule                Start rule for JSGF (first public rule is default)
-transform        legacy        Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area        yes        Normalize mel filters to unit area
-upperf            6855.4976    Upper edge of filters
-uw            1.0        Unigram weight
-vad_postspeech        50        Num of silence frames to keep after from speech to silence.
-vad_prespeech        20        Num of speech frames to keep before silence to speech.
-vad_startspeech    10        Num of speech frames to trigger vad from silence to speech.
-vad_threshold        2.0        Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var                    Mixture gaussian variances input file
-varfloor        0.0001        Mixture gaussian variance floor (applied to data from -var file)
-varnorm        no        Variance normalize each utterance (only if CMN == current)
-verbose        no        Show input filenames
-warp_params                Parameters defining the warping function
-warp_type        inverse_linear    Warping function type (or shape)
-wbeam            7e-29        Beam width applied to word exits
-wip            0.65        Word insertion penalty
-wlen            0.025625    Hamming window length

INFO: continuous.c(295): Specify '-infile <file.wav>' to recognize from file or '-inmic yes' to recognize from microphone.

As the hint says, we can pass a WAV file with -infile or use the hardware microphone with -inmic. We start with a known file that contains the single word 'yes'.

$ pocketsphinx_continuous -infile ~/Downloads/speech_commands_v0.02/yes/11099149_nohash_0.wav

 

INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
Current configuration:
[NAME]            [DEFLT]        [VALUE]
-agc            none        none
-agcthresh        2.0        2.000000e+00
-allphone                
-allphone_ci        no        no
-alpha            0.97        9.700000e-01
-ascale            20.0        2.000000e+01
-aw            1        1
-backtrace        no        no
-beam            1e-48        1.000000e-48
-bestpath        yes        yes
-bestpathlw        9.5        9.500000e+00
-ceplen            13        13
-cmn            live        batch
-cmninit        40,3,-1        41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen        no        no
-debug                    0
-dict                    /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
-dictcase        no        no
-dither            no        no
-doublebw        no        no
-ds            1        1
-fdict                    
-feat            1s_c_d_dd    1s_c_d_dd
-featparams                
-fillprob        1e-8        1.000000e-08
-frate            100        100
-fsg                    
-fsgusealtpron        yes        yes
-fsgusefiller        yes        yes
-fwdflat        yes        yes
-fwdflatbeam        1e-64        1.000000e-64
-fwdflatefwid        4        4
-fwdflatlw        8.5        8.500000e+00
-fwdflatsfwin        25        25
-fwdflatwbeam        7e-29        7.000000e-29
-fwdtree        yes        yes
-hmm                    /usr/local/share/pocketsphinx/model/en-us/en-us
-input_endian        little        little
-jsgf                    
-keyphrase                
-kws                    
-kws_delay        10        10
-kws_plp        1e-1        1.000000e-01
-kws_threshold        1        1.000000e+00
-latsize        5000        5000
-lda                    
-ldadim            0        0
-lifter            0        22
-lm                    /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin
-lmctl                    
-lmname                    
-logbase        1.0001        1.000100e+00
-logfn                    
-logspec        no        no
-lowerf            133.33334    1.300000e+02
-lpbeam            1e-40        1.000000e-40
-lponlybeam        7e-29        7.000000e-29
-lw            6.5        6.500000e+00
-maxhmmpf        30000        30000
-maxwpf            -1        -1
-mdef                    
-mean                    
-mfclogdir                
-min_endfr        0        0
-mixw                    
-mixwfloor        0.0000001    1.000000e-07
-mllr                    
-mmap            yes        yes
-ncep            13        13
-nfft            512        512
-nfilt            40        25
-nwpen            1.0        1.000000e+00
-pbeam            1e-48        1.000000e-48
-pip            1.0        1.000000e+00
-pl_beam        1e-10        1.000000e-10
-pl_pbeam        1e-10        1.000000e-10
-pl_pip            1.0        1.000000e+00
-pl_weight        3.0        3.000000e+00
-pl_window        5        5
-rawlogdir                
-remove_dc        no        no
-remove_noise        yes        yes
-remove_silence        yes        yes
-round_filters        yes        yes
-samprate        16000        1.600000e+04
-seed            -1        -1
-sendump                
-senlogdir                
-senmgau                
-silprob        0.005        5.000000e-03
-smoothspec        no        no
-svspec                    0-12/13-25/26-38
-tmat                    
-tmatfloor        0.0001        1.000000e-04
-topn            4        4
-topn_beam        0        0
-toprule                
-transform        legacy        dct
-unit_area        yes        yes
-upperf            6855.4976    6.800000e+03
-uw            1.0        1.000000e+00
-vad_postspeech        50        50
-vad_prespeech        20        20
-vad_startspeech    10        10
-vad_threshold        2.0        2.000000e+00
-var                    
-varfloor        0.0001        1.000000e-04
-varnorm        no        no
-verbose        no        no
-warp_params                
-warp_type        inverse_linear    inverse_linear
-wbeam            7e-29        7.000000e-29
-wip            0.65        6.500000e-01
-wlen            0.025625    2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Nov 28 2018, AT: 11:11:04

INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12  5.09  2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42  1.17 >
INFO: cmn_live.c(138): Update to   < 47.94 -7.35 -17.00 -0.68  4.35 -11.16 -8.48  0.35  1.64  0.72 -1.44  0.13 -1.01 >
INFO: ngram_search_fwdtree.c(1550):     1659 words recognized (17/fr)
INFO: ngram_search_fwdtree.c(1552):   334753 senones evaluated (3348/fr)
INFO: ngram_search_fwdtree.c(1556):  1940903 channels searched (19409/fr), 66488 1st, 55406 last
INFO: ngram_search_fwdtree.c(1559):     3538 words for which last channels evaluated (35/fr)
INFO: ngram_search_fwdtree.c(1561):   149275 candidate words for entering last phone (1492/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.98 CPU 0.977 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.98 wall 0.977 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 57 words
INFO: ngram_search_fwdflat.c(948):     1119 words recognized (11/fr)
INFO: ngram_search_fwdflat.c(950):    76953 senones evaluated (770/fr)
INFO: ngram_search_fwdflat.c(952):    82863 channels searched (828/fr)
INFO: ngram_search_fwdflat.c(954):     4353 words searched (43/fr)
INFO: ngram_search_fwdflat.c(957):     2924 word transitions (29/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.06 CPU 0.060 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.06 wall 0.060 xRT
INFO: ngram_search.c(1250): lattice start node .0 end node .61
INFO: ngram_search.c(1276): Eliminated 2 nodes before end node
INFO: ngram_search.c(1381): Lattice has 226 nodes, 547 links
INFO: ps_lattice.c(1380): Bestpath score: -2266
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:61:98) = -190798
INFO: ps_lattice.c(1441): Joint P(O,S) = -211857 P(S|O) = -21059
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.001 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.001 xRT
yes
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 0.98 CPU 0.987 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.98 wall 0.987 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.06 CPU 0.061 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.06 wall 0.061 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.001 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.001 xRT

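The output shows that pocketsphinx correctly recognized 'yes': the bare yes line among the INFO messages is the recognition result.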
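Two optional conveniences for file-based runs, sketched under assumptions. First, the configuration dump above shows -samprate 16000, so a recording with a different sample rate is likely to decode poorly; wav_sample_rate reads the rate from the header, assuming a canonical WAV file whose fmt chunk starts at byte 12. Second, the -logfn option from the argument list can send the log messages elsewhere so that, as far as I can tell, only the recognition result is left on the terminal. Both function names and the POCKETSPHINX_BIN variable are invented for this example.

```shell
#!/bin/sh
# Sketch only: both helpers and POCKETSPHINX_BIN are hypothetical names.

# Print the sample rate stored in bytes 24-27 (little-endian) of a WAV header.
# Assumes the "fmt " chunk immediately follows the 12-byte RIFF/WAVE preamble.
wav_sample_rate() {
    od -An -tu1 -j24 -N4 "$1" |
        awk '{ print $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 }'
}

# Decode one file, writing log messages to /dev/null via -logfn.
recognize_quietly() {
    "${POCKETSPHINX_BIN:-pocketsphinx_continuous}" -infile "$1" -logfn /dev/null
}

# Usage:
#   [ "$(wav_sample_rate yes.wav)" = 16000 ] && recognize_quietly yes.wav
```

Reading the four bytes one at a time keeps the result independent of the host machine's byte order.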

That is a smooth start. Next, let's try -inmic for real-time recognition.

$ pocketsphinx_continuous -inmic yes

...

ad_oss.c(115): Failed to open audio device(/dev/dsp): No such file or directory
FATAL: "continuous.c", line 245: Failed to open audio device

This means Sphinx could not find an audio device, even though I was running it on a laptop that has a default built-in microphone.

sudo apt-get install libpulse-dev

This is important! The sphinxbase configure script will detect your audio environment and compile this into sphinxbase. On Ubuntu Pulseaudio is the default, and believe me, the fallback, sphinx ALSA support, is hard to get working. The FAQ strongly encourages us to stick with Pulseaudio. If you compile audio in the wrong way, you'll get all sorts of errors, for example "Failed to open audio device(/dev/dsp): No such file or directory".
And try to use Sphinx on a machine with just one microphone, like a laptop. Telling Sphinx to use another mic than the "default" one is a nightmare as well, like, searching for a webcam mic in arecord -L, passing that in as -adcdev etc.

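In other words, when sphinxbase is built, its configure script detects the audio environment and compiles the matching audio backend into sphinxbase, so libpulse-dev has to be installed before running configure. The ad_oss.c and /dev/dsp in the error above suggest that my build had fallen back to the OSS backend.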

sudo apt-get install libpulse-dev

Then repeat the sphinxbase installation steps. Because pocketsphinx depends on sphinxbase, repeat the pocketsphinx installation steps as well.
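The rebuild can be scripted. The sketch below assumes the sources were unpacked side by side into directories named sphinxbase and pocketsphinx (adjust to your own directory names) and reuses the configure, make and make install commands from the READMEs quoted earlier. rebuild_pkg, run and the DRY_RUN switch are hypothetical conveniences: with DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: rebuild_pkg, run and DRY_RUN are hypothetical names.

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Reconfigure, rebuild and reinstall one source directory.
# The subshell body keeps the cd from leaking into the caller.
rebuild_pkg() (
    run cd "$1" &&
    run ./configure &&
    run make clean all &&
    run sudo make install
)

# sphinxbase first, because pocketsphinx depends on it:
#   rebuild_pkg sphinxbase && rebuild_pkg pocketsphinx
# Preview without executing anything:
#   DRY_RUN=1 rebuild_pkg sphinxbase
```

Re-running ./configure is the important part, since that is the step that detects the newly installed PulseAudio headers.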

Try again:

$ pocketsphinx_continuous -inmic yes

This time the demo works: it detects speech activity around the microphone in real time and recognizes it.

Test: say 'no' into the microphone.

INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(120): Update from < 19.86  6.95 -5.77 -2.05 -0.02 -1.30  8.82  7.85  7.79  2.40 -2.33 -0.96  2.92 >
INFO: cmn_live.c(138): Update to   < 20.22  7.71 -4.99 -1.71  0.59 -0.58  8.82  8.31  7.59  2.11 -2.61 -1.33  2.58 >
INFO: ngram_search_fwdtree.c(1550):     1918 words recognized (21/fr)
INFO: ngram_search_fwdtree.c(1552):   353678 senones evaluated (3887/fr)
INFO: ngram_search_fwdtree.c(1556):  2759511 channels searched (30324/fr), 62153 1st, 69480 last
INFO: ngram_search_fwdtree.c(1559):     3984 words for which last channels evaluated (43/fr)
INFO: ngram_search_fwdtree.c(1561):   254404 candidate words for entering last phone (2795/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.36 CPU 1.498 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 3.96 wall 4.355 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 68 words
INFO: ngram_search_fwdflat.c(948):     1297 words recognized (14/fr)
INFO: ngram_search_fwdflat.c(950):    68988 senones evaluated (758/fr)
INFO: ngram_search_fwdflat.c(952):    90656 channels searched (996/fr)
INFO: ngram_search_fwdflat.c(954):     4603 words searched (50/fr)
INFO: ngram_search_fwdflat.c(957):     2897 word transitions (31/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.05 CPU 0.056 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.05 wall 0.056 xRT
INFO: ngram_search.c(1250): lattice start node .0 end node .50
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 299 nodes, 2168 links
INFO: ps_lattice.c(1380): Bestpath score: -2024
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:50:89) = -151318
INFO: ps_lattice.c(1441): Joint P(O,S) = -185659 P(S|O) = -34341
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.005 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.005 xRT
no

References

https://cmusphinx.github.io/

https://cmusphinx.github.io/wiki/

Building an application with PocketSphinx (https://cmusphinx.github.io/wiki/tutorialpocketsphinx/)

 
