Extracting Audio Features by Calling the API of the DLL Built from openSMILE

Contents

I. openSMILE

II. How to call SMILEapi

1. Create an instance

2. Initialize

3. Set the callback

4. Write audio data

5. Configuration file changes

III. Full code


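I recently tried using openSMILE for audio feature extraction and went through some references and documentation along the way. These are my notes.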

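I. openSMILE

openSMILE: official website

GitHub project: https://github.com/audeering/opensmile

Documentation: openSMILE Documentation

After downloading the source, build it. On Windows, use CMake to generate the project, compile it, and then run the install step.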


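In the install directory, the include folder holds the header files. SMILEapi.lib in the lib folder is the library to link against (for a DLL build this is normally the import library), and SMILEapi.dll in the bin directory is the dynamic library.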

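II. How to call SMILEapi

Open the header file SMILEapi.h. It does not declare many functions, because the details of feature extraction are all specified in the configuration file.

Calling steps:

1. Create an instance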

smileobj_t* m_pSmileObj = smile_new();

2. Initialize

Initialization mainly consists of loading the configuration file:

std::string configfile = "./config/MFCC12_E_D_A_.conf";

smileres_t ret = smile_initialize(m_pSmileObj, configfile.c_str(), 0, nullptr);
if (ret == SMILE_SUCCESS) {
	std::cout << "smile init succeed" << std::endl;
}
else {
	std::cout << "smile_initialize failed: " << ret << std::endl;
}

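The configuration file that configfile points to needs a few modifications before the call flow works end to end. They are described in step 5.

3. Set the callback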

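bool external_sink_callback(const float* data, long vectorSize, void* param)
{
	// Listing truncated in the source after "vectorSize: "; the lines below are a reconstruction.
	std::cout << "vectorSize: " << vectorSize << std::endl;
	return true;
}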

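The call that registers the callback (smile_extsink_set_data_callback in SMILEapi.h) takes two arguments that matter most: the callback function itself and the name of the sink component, "externalSink" in this article.

Once the callback is set, whenever the processing pipeline built from the configuration file reaches "externalSink", the data is passed to the callback function.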
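As an illustration of how the callback's param argument can be used, the sketch below copies each received feature vector into a container instead of printing its size. FrameStore and collect_frame are hypothetical names introduced for this example, and the registration call shown in the trailing comment is an assumption based on the declaration in SMILEapi.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the feature vectors received from the sink.
struct FrameStore {
    std::vector<std::vector<float>> frames;
};

// Same signature as external_sink_callback. `param` is the user pointer given
// at registration time; this sketch assumes it points to a FrameStore.
bool collect_frame(const float* data, long vectorSize, void* param)
{
    assert(param != nullptr && "register the callback with a FrameStore*");
    if (data == nullptr || vectorSize <= 0) {
        return false;
    }
    auto* store = static_cast<FrameStore*>(param);
    // Copy the values: the pointer is assumed to be valid only during the call.
    store->frames.emplace_back(data, data + vectorSize);
    return true;
}

// Assumed registration (check SMILEapi.h for the exact declaration):
//   FrameStore store;
//   smile_extsink_set_data_callback(m_pSmileObj, "externalSink", collect_frame, &store);
```

The callback presumably runs on whichever thread is executing the openSMILE pipeline, so a program that reads the store from another thread would need to protect it with a mutex.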

4. Write audio data

smileres_t ret = smile_extaudiosource_write_data(m_pSmileObj, "externalSource", SrcData, length);

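Equally important here is the name of the component being written to, "externalSource" in this article. A component with that name must be created in the configuration file.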
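Audio usually arrives in pieces, so the write tends to sit inside a loop. A minimal sketch of such a loop follows, assuming 16-bit mono samples as in the configuration used in this article, and assuming that the length argument of smile_extaudiosource_write_data is a byte count. feed_pcm16 and its write parameter are hypothetical helpers, kept independent of SMILEapi.h so the chunking logic can be tried on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits 16-bit PCM samples into chunks and hands each chunk to `write`.
// `write(const void* data, int lengthBytes)` is a caller-supplied callable that
// returns true when the chunk was accepted.
// Returns the number of samples that were accepted.
template <typename WriteFn>
std::size_t feed_pcm16(const std::vector<std::int16_t>& samples,
                       std::size_t samplesPerChunk, WriteFn write)
{
    assert(samplesPerChunk > 0);
    std::size_t offset = 0;
    while (offset < samples.size()) {
        const std::size_t n = std::min(samplesPerChunk, samples.size() - offset);
        const int lengthBytes = static_cast<int>(n * sizeof(std::int16_t));
        if (!write(samples.data() + offset, lengthBytes)) {
            break;  // stop at the first rejected chunk; the caller may retry later
        }
        offset += n;
    }
    return offset;
}
```

In a real program the callable could be a lambda such as `[&](const void* p, int len) { return smile_extaudiosource_write_data(m_pSmileObj, "externalSource", p, len) == SMILE_SUCCESS; }`. SMILEapi.h also declares smile_run, which executes the pipeline (typically on a separate thread), and smile_extaudiosource_set_external_eoi for signalling the end of input. Neither is covered in the steps above, so check the header for their exact usage.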

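5. Configuration file changes

The configuration file is a modified copy of config/mfcc/MFCC12_E_D_A.conf.

a) Add the external source component cExternalAudioSource to the component instance list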

 [componentInstances:cComponentManager]
instance[externalSource].type=cExternalAudioSource

b) Add the cExternalAudioSource settings to the component configuration section

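///////////////////////////////////////////////////////////////////////////
//////////////////   component configuration  ////////////////////////////
///////////////////////////////////////////////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with 
;  SMILExtract -H
; or
;  SMILExtract -H configTypeName (= componentTypeName)
///////////////////////////////////////////////////////////////////////////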


[externalSource:cExternalAudioSource]
writer.dmLevel=wave
blocksize = 8000
blocksize_sec =0.50
sampleRate = 8000
channels = 1
nBits = 16
nBPS = 0
fieldName = pcm

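The options available for cExternalAudioSource are listed in the official documentation. They cover the audio sample rate, number of channels, bit depth, and similar parameters.

c) In the component instance list of the data output part, add instance[externalSink].type = cExternalSink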

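///////////////////////////////////////////////////////////////////////////
//////////////////   data output configuration  //////////////////////////
///////////////////////////////////////////////////////////////////////////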

[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat
instance[externalSink].type = cExternalSink

d) Add the section [externalSink:cExternalSink]. The data on the level named in reader.dmLevel, here lld, is what gets passed to the callback function.

[externalSink:cExternalSink]
reader.dmLevel = lld
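The sampleRate, channels, and nBits settings of the source component determine the byte layout the caller has to supply when writing audio. The arithmetic can be sketched as follows. PcmFormat and the two helper functions are hypothetical names for this illustration and are not part of openSMILE.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the sampleRate / channels / nBits options of
// [externalSource:cExternalAudioSource].
struct PcmFormat {
    int sampleRate;  // samples per second per channel
    int channels;
    int nBits;       // bits per sample
};

// Bytes occupied by one sample frame (one sample for every channel).
constexpr std::size_t bytes_per_frame(const PcmFormat& f)
{
    return static_cast<std::size_t>(f.channels) * static_cast<std::size_t>(f.nBits / 8);
}

// Bytes needed to hold `milliseconds` of audio in this format.
constexpr std::size_t bytes_for_duration(const PcmFormat& f, int milliseconds)
{
    return bytes_per_frame(f) * static_cast<std::size_t>(f.sampleRate) *
           static_cast<std::size_t>(milliseconds) / 1000;
}

// The values used in this article: 8 kHz, mono, 16 bit.
constexpr PcmFormat kConfigFormat{8000, 1, 16};
static_assert(bytes_per_frame(kConfigFormat) == 2, "16-bit mono: 2 bytes per frame");
static_assert(bytes_for_duration(kConfigFormat, 500) == 8000, "0.5 s at 8 kHz");
```

With these settings, half a second of audio is 4000 samples, or 8000 bytes.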

The complete modified configuration file ./config/MFCC12_E_D_A_.conf is as follows:


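///////////////////////////////////////////////////////////////////////////
///////// > openSMILE configuration file to extract MFCC features <  //////
/////////   HTK target kind: MFCC_E_D_A, numCeps=12                  //////
/////////                                                            //////
/////////  * written 2009 by Florian Eyben *                         //////
/////////                                                            //////
///////// (c) audEERING UG (haftungsbeschränkt),                     //////
/////////     All rights reserved.                                   //////
///////////////////////////////////////////////////////////////////////////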



///
;
; This section is always required in openSMILE configuration files
;   it configures the componentManager and gives a list of all components which are to be loaded
; The order in which the components are listed should match 
;   the order of the data flow for most efficient processing
;
///
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory



[componentInstances:cComponentManager]
instance[externalSource].type=cExternalAudioSource
 ; audio framer
instance[frame].type=cFramer
 ; speech pre-emphasis (on a per frame basis as HTK does it)
instance[pe].type=cVectorPreemphasis
 ; apply a window function to pre-emphasised frames
instance[win].type=cWindower
 ; transform to the frequency domain using FFT
instance[fft].type=cTransformFFT
 ; compute magnitude of the complex fft from the previous component
instance[fftmag].type=cFFTmagphase
 ; compute Mel-bands from magnitude spectrum
instance[melspec].type=cMelspec
 ; compute MFCC from Mel-band spectrum
instance[mfcc].type=cMfcc
 ; compute log-energy from raw signal frames 
 ; (not windowed, not pre-emphasised: that's the way HTK does it)
instance[energy].type=cEnergy
 ; concat mfcc and energy, so we can compute delta and acceleration 
 ; coefficients of both features at the same time
instance[cat].type=cVectorConcat
 ; compute delta coefficients from mfcc and energy
instance[delta].type=cDeltaRegression
 ; compute acceleration coefficients from delta coefficients of mfcc and energy
instance[accel].type=cDeltaRegression

; run single threaded (nThreads=1)
; NOTE: a single thread is more efficient for processing small files, since multi-threaded processing involves more 
;       overhead during startup, which will make the system slower in the end
nThreads=1
; do not show any internal dataMemory level settings 
; (if you want to see them set the value to 1, 2, 3, or 4, depending on the amount of detail you wish)
printLevelStats=3


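///////////////////////////////////////////////////////////////////////////
//////////////////   component configuration  ////////////////////////////
///////////////////////////////////////////////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with 
;  SMILExtract -H
; or
;  SMILExtract -H configTypeName (= componentTypeName)
///////////////////////////////////////////////////////////////////////////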


[externalSource:cExternalAudioSource]
writer.dmLevel=wave
blocksize = 8000
blocksize_sec =0.50
sampleRate = 8000
channels = 1
nBits = 16
nBPS = 0
fieldName = pcm

[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
noPostEOIprocessing = 1
copyInputName = 1
frameSize = 0.04
frameStep = 0.02
frameMode = fixed
frameCenterSpecial = left

[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k = 0.97
de = 0

[win:cWindower]
reader.dmLevel=framespe
writer.dmLevel=winframes
copyInputName = 1
processArrayFields = 1
 ; hamming window
winFunc = ham
 ; no gain, no offset
gain = 1.0
offset = 0

[fft:cTransformFFT]
reader.dmLevel=winframes
writer.dmLevel=fft
copyInputName = 1
processArrayFields = 1
inverse = 0
 ; for compatibility with 2.2.0 and older versions
zeroPadSymmetric = 0

[fftmag:cFFTmagphase]
reader.dmLevel=fft
writer.dmLevel=fftmag
copyInputName = 1
processArrayFields = 1
inverse = 0
magnitude = 1
phase = 0

[melspec:cMelspec]
reader.dmLevel=fftmag
writer.dmLevel=melspec
copyInputName = 1
processArrayFields = 1
; htk compatible sample value scaling
htkcompatible = 1
nBands = 26
; use power spectrum instead of magnitude spectrum
usePower = 1
lofreq = 0
hifreq = 8000
specScale = mel
inverse = 0

[mfcc:cMfcc]
reader.dmLevel=melspec
writer.dmLevel=mfcc
copyInputName = 1
processArrayFields = 1
firstMfcc = 1
lastMfcc  = 12
cepLifter = 22.0
htkcompatible = 1

[energy:cEnergy]
reader.dmLevel=frames
writer.dmLevel=energy
nameAppend = energy
copyInputName = 1
processArrayFields = 0
htkcompatible=1
rms = 0
log = 1


[cat:cVectorConcat]
reader.dmLevel=mfcc;energy
writer.dmLevel=ft0
copyInputName = 1
processArrayFields = 0

[delta:cDeltaRegression]
reader.dmLevel=ft0
writer.dmLevel=ft0de
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1

[accel:cDeltaRegression]
reader.dmLevel=ft0de
writer.dmLevel=ft0dede
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1

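///////////////////////////////////////////////////////////////////////////
//////////////////   data output configuration  //////////////////////////
///////////////////////////////////////////////////////////////////////////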

[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat
instance[externalSink].type = cExternalSink

[audspec_lldconcat:cVectorConcat]
reader.dmLevel = ft0;ft0de;ft0dede
writer.dmLevel = lld
includeSingleElementFields = 1

[externalSink:cExternalSink]
reader.dmLevel = lld
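To sanity-check what the callback should receive, the frame sizes and the length of each lld vector can be worked out from the values in this configuration. The sketch below assumes that cEnergy with rms = 0 and log = 1 contributes a single value per frame. All constant names are hypothetical and exist only for this calculation.

```cpp
#include <cassert>

// Values copied from the configuration file above.
constexpr int kSampleRate = 8000;    // [externalSource] sampleRate
constexpr double kFrameSize = 0.04;  // [frame] frameSize, in seconds
constexpr double kFrameStep = 0.02;  // [frame] frameStep, in seconds
constexpr int kFirstMfcc = 1;        // [mfcc] firstMfcc
constexpr int kLastMfcc = 12;        // [mfcc] lastMfcc

// Frame length and hop in samples, rounded to the nearest integer.
constexpr int kSamplesPerFrame = static_cast<int>(kFrameSize * kSampleRate + 0.5);
constexpr int kSamplesPerStep = static_cast<int>(kFrameStep * kSampleRate + 0.5);

// Level ft0: 12 MFCCs plus one log-energy value (assumption, see above).
constexpr int kStaticDims = (kLastMfcc - kFirstMfcc + 1) + 1;
// Level lld: ft0, ft0de and ft0dede concatenated.
constexpr int kLldDims = 3 * kStaticDims;

// Highest frequency representable at this sample rate.
constexpr int kNyquistHz = kSampleRate / 2;

static_assert(kSamplesPerFrame == 320, "40 ms at 8 kHz");
static_assert(kSamplesPerStep == 160, "20 ms at 8 kHz");
static_assert(kLldDims == 39, "13 static + 13 delta + 13 acceleration");
static_assert(kNyquistHz == 4000, "half the sample rate");
```

Under these assumptions the vectorSize seen in the callback should be 39, with one vector produced every 20 ms of audio. The configuration sets hifreq = 8000 in [melspec:cMelspec], which is above the 4000 Hz Nyquist frequency for an 8000 Hz sample rate, so that value may be worth checking against the component documentation.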

III. Full code

The code and a test program have been packaged as a VS2015 project, available at:

opensmileTest.rar (CSDN download, machine learning document resources)
