A Simple Tutorial on Extracting MFCCs with openSMILE (Mac)

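openSMILE is a tool designed specifically for extracting audio features. Introductions and installation guides are easy to find online, so I won't repeat them here. While working out how to use openSMILE, though, I found very few tutorials on actually using it, so I'm writing up my own experience on this blog in the hope that it saves others some detours.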

When I tried to install it, it kept conflicting with Visual Studio, so I installed it on a Mac instead and wrote a shell script to do the feature extraction.

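In openSMILE, the parameters that determine which features you extract all live in the configuration (.conf) file you pass to it: the frame size, the number of MFCC coefficients, whether deltas are included, and so on. Beginners can simply start from one of the official configuration files and modify it to extract the features their own model needs.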

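openSMILE stores the extracted features by default in its "compatibility HTK format". I didn't find a convenient Python function for reading this file type, so in the end I used the readhtk() function from the MATLAB VOICEBOX toolbox. I'll describe how the model itself is built in the next post.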
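Whichever reader you end up using, it can help to sanity-check an output file's header from the shell first. The sketch below assumes the standard HTK parameter-file layout (a 12-byte big-endian header holding a 4-byte frame count, a 4-byte sample period in 100 ns units, a 2-byte frame size in bytes, and a 2-byte parameter-kind code). `htk_header` is a hypothetical helper name of my own, not part of openSMILE.

```shell
#!/bin/bash
# Print the four header fields of an HTK parameter file.
# Assumption: the file uses the standard 12-byte big-endian HTK header.
# htk_header is a hypothetical helper, not something openSMILE ships.
htk_header() {
    local file=$1
    # Read the first 12 bytes as unsigned decimal numbers, one per byte,
    # and load them into the function's positional parameters.
    set -- $(od -An -tu1 -N12 "$file")
    if [ "$#" -ne 12 ]; then
        echo "not enough header bytes in $file" >&2
        return 1
    fi
    # Combine the bytes most-significant first (big-endian).
    echo "nSamples=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
    echo "sampPeriod=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))"
    echo "sampSize=$(( ($9 << 8) | ${10} ))"
    echo "parmKind=$(( (${11} << 8) | ${12} ))"
}
```

For MFCC12_0_D_A.conf (coefficients 0 to 12 plus deltas and accelerations, with a 10 ms frame step), one would expect sampSize=156 (39 four-byte floats) and sampPeriod=100000, provided the file really follows this layout.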

Let's first look at the official configuration file, MFCC12_0_D_A.conf:


///////////////////////////////
///// > openSMILE configuration file to extract MFCC features <  //////
/////   HTK target kind: MFCC_0_D_A, numCeps=12                  //////////
/////                                                            //////
///// (c) 2013-2016 audEERING.                                   //////////
/////     All rights reserverd. See file COPYING for details.    //////
///////////////////////////////



///////////////////////////////
;
; This section is always required in openSMILE configuration files
;   it configures the componentManager and gives a list of all components which are to be loaded
; The order in which the components are listed should match 
;   the order of the data flow for most efficient processing
;
///////////////////////////////
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory

\{shared/standard_wave_input.conf.inc}

[componentInstances:cComponentManager]
 ; audio framer
instance[frame].type=cFramer
 ; speech pre-emphasis (on a per frame basis as HTK does it)
instance[pe].type=cVectorPreemphasis
 ; apply a window function to pre-emphasised frames
instance[win].type=cWindower
 ; transform to the frequency domain using FFT
instance[fft].type=cTransformFFT
 ; compute magnitude of the complex fft from the previous component
instance[fftmag].type=cFFTmagphase
 ; compute Mel-bands from magnitude spectrum
instance[melspec].type=cMelspec
 ; compute MFCC from Mel-band spectrum
instance[mfcc].type=cMfcc
 ; compute delta coefficients from mfcc and energy
instance[delta].type=cDeltaRegression
 ; compute acceleration coefficients from delta coefficients of mfcc and energy
instance[accel].type=cDeltaRegression

; run single threaded (nThreads=1)
; NOTE: a single thread is more efficient for processing small files, since multi-threaded processing involves more 
;       overhead during startup, which will make the system slower in the end
nThreads=1
; do not show any internal dataMemory level settings 
; (if you want to see them set the value to 1, 2, 3, or 4, depending on the amount of detail you wish)
printLevelStats=0


/////////////////////////////////
/////////   component configuration  ////////////
/////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with 
;  SMILExtract -H
; or
;  SMILExtract -H configTypeName (= componentTypeName)
/////////////////////////////////

[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
noPostEOIprocessing = 1
copyInputName = 1
frameSize = 0.0250
frameStep = 0.010
frameMode = fixed
frameCenterSpecial = left

[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k = 0.97
de = 0

[win:cWindower]
reader.dmLevel=framespe
writer.dmLevel=winframes
copyInputName = 1
processArrayFields = 1
 ; hamming window
winFunc = ham
 ; no gain, no offset
gain = 1.0
offset = 0

[fft:cTransformFFT]
reader.dmLevel=winframes
writer.dmLevel=fft
copyInputName = 1
processArrayFields = 1
inverse = 0
 ; for compatibility with 2.2.0 and older versions
zeroPadSymmetric = 0

[fftmag:cFFTmagphase]
reader.dmLevel=fft
writer.dmLevel=fftmag
copyInputName = 1
processArrayFields = 1
inverse = 0
magnitude = 1
phase = 0

[melspec:cMelspec]
reader.dmLevel=fftmag
writer.dmLevel=melspec
copyInputName = 1
processArrayFields = 1
; htk compatible sample value scaling
htkcompatible = 1
nBands = 26
; use power spectrum instead of magnitude spectrum
usePower = 1
lofreq = 0
hifreq = 8000
specScale = mel
inverse = 0

[mfcc:cMfcc]
reader.dmLevel=melspec
writer.dmLevel=ft0
copyInputName = 1
processArrayFields = 1
firstMfcc = 0
lastMfcc  = 12
cepLifter = 22.0
htkcompatible = 1


[delta:cDeltaRegression]
reader.dmLevel=ft0
writer.dmLevel=ft0de
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1

[accel:cDeltaRegression]
reader.dmLevel=ft0de
writer.dmLevel=ft0dede
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1

  //////////////////////////
 ///////  data output configuration  //////
//////////////////////////

[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat

[audspec_lldconcat:cVectorConcat]
reader.dmLevel = ft0;ft0de;ft0dede
writer.dmLevel = lld
includeSingleElementFields = 1

\{shared/standard_data_output_lldonly.conf.inc}

//---------------------- END -------------------------///

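The file falls roughly into three parts:

  • The first part declares the components of the MFCC pipeline and the names by which the rest of the file refers to them;
  • The second part sets the parameter values for each component;
  • The third part configures the result output and can usually be left unchanged.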
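Editing the second part by hand is straightforward, but the change can also be scripted so that the official file stays untouched. The following is only a sketch: `set_conf_option` is a hypothetical helper name, and it assumes the option is written as a `key = value` line starting in the first column, as in the listing above.

```shell
#!/bin/bash
# Copy an openSMILE config and change one "key = value" line in the copy.
# set_conf_option is a hypothetical helper. It rewrites EVERY line that
# starts with the given key, so use it only for keys that occur once in
# the file (in MFCC12_0_D_A.conf: lastMfcc, frameSize, frameStep, nBands).
set_conf_option() {
    local src=$1 dst=$2 key=$3 value=$4
    # Match the key at the start of a line, optional spaces, then "=".
    sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$src" > "$dst"
}

# Example call (file names are placeholders):
#   set_conf_option config/MFCC12_0_D_A.conf config/my_mfcc.conf frameSize 0.020
```

Keeping the copy in the same directory as the original is probably the safest choice, since the file pulls in other files through its `\{shared/...}` include lines.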

Once the configuration file is set up, the extraction itself is very simple.

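First open a blank document and set the input directory, the output directory, and the directory where openSMILE is installed. In this example every file in the input directory is in .wav format, so the step of checking the file format is skipped. The script then changes to the openSMILE directory and loops over the files in the input directory, extracting features from each one and writing the result to the output directory under the name xx.mfcc.htk. Save the file with the .sh extension; to use it, just drag it into a Terminal window and run it.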

The example code is as follows.

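#!/bin/bash
# Make sure SMILExtract can be found on PATH (e.g. set in ~/.bash_profile)
PATH=$PATH:$HOME/bin

# Input directory (.wav files only) and output directory
dir=/Users/lemon/Documents/wan/test
OPATH=/Users/lemon/Documents/wanzhi/test_mfcc

# openSMILE installation directory
os=/Users/wan/Downloads/opensmile-2.3.0

# The config path below is relative to the openSMILE directory
cd "$os" || exit 1
mkdir -p "$OPATH"

# Loop over a glob instead of parsing ls, so file names with spaces survive
for wav in "$dir"/*; do
    # Strip the directory and the .wav suffix to get xx from xx.wav
    name=$(basename "$wav" .wav)
    SMILExtract -C config/MFCC12_0_D_A.conf -I "$wav" -O "$OPATH/$name.mfcc.htk"
    echo "$name.wav is extracted"
done

echo "work finished!"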

If we now look in the output directory, we find the extracted MFCC features stored there in .htk format.
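One step the script skips is checking that each input file really is a WAV file. If your input directory is not guaranteed to be clean, a small check along the following lines could be placed in front of the SMILExtract call. This is a sketch, not part of openSMILE: `is_wav` is a hypothetical helper name, and it assumes a canonical RIFF/WAVE header ("RIFF" in bytes 1 to 4, "WAVE" in bytes 9 to 12).

```shell
#!/bin/bash
# Return success if the file starts with a RIFF/WAVE header.
# is_wav is a hypothetical helper; it inspects only the first 12 bytes
# and does not validate the rest of the file.
is_wav() {
    local file=$1
    [ -f "$file" ] || return 1
    # Bytes 1-4 should read "RIFF" and bytes 9-12 should read "WAVE".
    # tr strips NUL bytes so the shell does not warn about them.
    [ "$(head -c 4 "$file" | tr -d '\000')" = "RIFF" ] &&
        [ "$(tail -c +9 "$file" | head -c 4 | tr -d '\000')" = "WAVE" ]
}
```

Inside the loop it would be used as `is_wav "$wav" || continue`, so that anything that is not a WAV file is skipped rather than handed to SMILExtract.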

The next post will explain how to import this data into MATLAB for classification.
