Contents
I. openSMILE
II. Calling SMILEapi
1. Create an instance
2. Initialization
3. Set the callback
4. Write audio data
5. Configuration file changes
III. Detailed code
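I recently tried using openSMILE for audio feature extraction, went through some references and documentation, and am recording my notes here.
openSMILE: official website
GitHub project: https://github.com/audeering/opensmile
Documentation: openSMILE Documentation
After downloading, build it. On Windows, generate the project with CMake, compile it, then install.
After installation, the include folder contains the header files, the lib folder contains the static link library SMILEapi.lib, and the bin folder contains the dynamic library SMILEapi.dll.
Open the header file SMILEapi.h: it does not expose many functions, because the details of how openSMILE extracts features are all specified in the configuration file.
Calling steps
1. Create an instance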
smileobj_t* m_pSmileObj = smile_new();
2. Initialization
Initialization mainly consists of loading the configuration file:
std::string configfile = "./config/MFCC12_E_D_A_.conf";
// load the configuration file; no extra options are passed (0, nullptr)
smileres_t ret = smile_initialize(m_pSmileObj, configfile.c_str(), 0, nullptr);
if (ret == SMILE_SUCCESS) {
    std::cout << "smile init succeeded" << std::endl;
}
else {
    std::cout << "smile init failed: " << ret << std::endl;
}
The configuration file that configfile points to needs a few modifications before the whole call flow works.
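3. Set the callback
bool external_sink_callback(const float* data, long vectorSize, void* param)
{
    std::cout << "vectorSize: " << vectorSize << std::endl;
    // the rest of the original snippet is truncated; returning true is an assumption
    return true;
}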
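Two arguments matter most when setting the callback: the callback function itself and the name of the component the callback is attached to, "externalSink" here.
Once the callback is set, whenever the processing pipeline built from the configuration file reaches "externalSink", the data is passed to the callback function.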
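Printing the vector size is enough for a first test, but in practice the feature vectors usually need to be kept somewhere. The following is a minimal sketch of one way to do that through the param pointer. FeatureCollector and collect_features are hypothetical names, not part of SMILEapi. The registration call shown in the trailing comment is assumed from the declaration of smile_extsink_set_data_callback in SMILEapi.h, and what openSMILE does with the callback's return value is not covered here, so treat both as assumptions to check against your copy of the header.

```cpp
#include <vector>

// Hypothetical helper type (not part of SMILEapi): stores every feature
// vector that is delivered to the callback.
struct FeatureCollector {
    std::vector<std::vector<float>> frames;
};

// Same signature as external_sink_callback. `param` is assumed to be the
// pointer that was supplied when the callback was registered.
bool collect_features(const float* data, long vectorSize, void* param)
{
    if (data == nullptr || vectorSize <= 0 || param == nullptr) {
        return false;  // nothing usable was delivered
    }
    auto* collector = static_cast<FeatureCollector*>(param);
    collector->frames.emplace_back(data, data + vectorSize);  // copy one vector
    return true;
}

// Assumed registration (verify against SMILEapi.h):
//   FeatureCollector collector;
//   smile_extsink_set_data_callback(m_pSmileObj, "externalSink",
//                                   collect_features, &collector);
```

If the pipeline runs on a different thread from the code that reads collector.frames, access to the collector would need to be synchronized, for example with a mutex.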
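4. Write audio data
int ret = smile_extaudiosource_write_data(m_pSmileObj, "externalSource", SrcData, length);
Equally important here is the name of the component being written to, "externalSource" above. This component must be created in the configuration file.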
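When the audio arrives as one large buffer, it can be fed to the source component piece by piece. The sketch below is a hypothetical helper (write_in_chunks is not part of SMILEapi) that splits a byte buffer into pieces and hands each one to a caller-supplied function. The usage in the trailing comment assumes that length is measured in bytes and that SMILE_SUCCESS signals an accepted write, so check both against SMILEapi.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: feeds `pcm` to `write` in pieces of at most
// `chunkBytes` bytes. `write` stands in for a wrapper around
// smile_extaudiosource_write_data and returns true when a piece is accepted.
// Returns the number of bytes that were accepted.
std::size_t write_in_chunks(const std::vector<std::uint8_t>& pcm,
                            std::size_t chunkBytes,
                            const std::function<bool(const void*, int)>& write)
{
    if (chunkBytes == 0) {
        return 0;
    }
    std::size_t offset = 0;
    while (offset < pcm.size()) {
        const std::size_t n = std::min(chunkBytes, pcm.size() - offset);
        if (!write(pcm.data() + offset, static_cast<int>(n))) {
            break;  // stop at the first rejected piece; the caller may retry
        }
        offset += n;
    }
    return offset;
}

// Assumed usage with the objects from this post:
//   write_in_chunks(buffer, 8000, [&](const void* p, int len) {
//       return smile_extaudiosource_write_data(m_pSmileObj, "externalSource",
//                                              p, len) == SMILE_SUCCESS;
//   });
```

Returning the accepted byte count lets the caller resume from that offset if a write is rejected.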
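5. Configuration file changes
The configuration file is a modified version of config/mfcc/MFCC12_E_D_A.conf.
a) In the component instance list, add the external source component cExternalAudioSource: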
[componentInstances:cComponentManager]
instance[externalSource].type=cExternalAudioSource
b) In the component configuration part, add a section for cExternalAudioSource:
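////////////////////////////////////////////////////////////
// component configuration
////////////////////////////////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with
; SMILExtract -H
; or
; SMILExtract -H configTypeName (= componentTypeName)
////////////////////////////////////////////////////////////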
[externalSource:cExternalAudioSource]
writer.dmLevel=wave
blocksize = 8000
blocksize_sec =0.50
sampleRate = 8000
channels = 1
nBits = 16
nBPS = 0
fieldName = pcm
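The members available in the cExternalAudioSource section are listed in the official documentation. They cover audio parameters such as sample rate, channel count, and bit depth.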
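The buffers written to the source component have to match the format declared in this section. As a small worked example, with sampleRate = 8000, channels = 1 and nBits = 16, half a second of audio is 4000 samples, which is 8000 bytes of raw PCM. pcm_bytes below is a hypothetical helper for this arithmetic and assumes uncompressed, interleaved PCM.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of `seconds` of raw interleaved PCM.
std::size_t pcm_bytes(int sampleRate, int channels, int nBits, double seconds)
{
    // samples per channel, truncated to a whole number
    const auto samplesPerChannel = static_cast<std::size_t>(sampleRate * seconds);
    return samplesPerChannel
         * static_cast<std::size_t>(channels)
         * static_cast<std::size_t>(nBits / 8);  // bytes per sample
}
```

For the values in this section, pcm_bytes(8000, 1, 16, 0.5) gives 8000.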
c) In the data output part of the component instance list, add instance[externalSink].type = cExternalSink:
/// data output configuration //
//
[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat
instance[externalSink].type = cExternalSink
d) Add the component section [externalSink:cExternalSink]. The data on the level named by reader.dmLevel, here lld, is what gets passed to the callback function.
[externalSink:cExternalSink]
reader.dmLevel = lld
After these changes, the complete configuration file ./config/MFCC12_E_D_A_.conf looks like this:
////////////////////////////////////////////////////////////
// > openSMILE configuration file to extract MFCC features <
// HTK target kind: MFCC_E_D_A, numCeps=12
//
// * written 2009 by Florian Eyben *
//
// (c) audEERING UG (haftungsbeschränkt),
// All rights reserved.
////////////////////////////////////////////////////////////
;
; This section is always required in openSMILE configuration files
; it configures the componentManager and gives a list of all components which are to be loaded
; The order in which the components are listed should match
; the order of the data flow for most efficient processing
;
///
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory
[componentInstances:cComponentManager]
instance[externalSource].type=cExternalAudioSource
; audio framer
instance[frame].type=cFramer
; speech pre-emphasis (on a per frame basis as HTK does it)
instance[pe].type=cVectorPreemphasis
; apply a window function to pre-emphasised frames
instance[win].type=cWindower
; transform to the frequency domain using FFT
instance[fft].type=cTransformFFT
; compute magnitude of the complex fft from the previous component
instance[fftmag].type=cFFTmagphase
; compute Mel-bands from magnitude spectrum
instance[melspec].type=cMelspec
; compute MFCC from Mel-band spectrum
instance[mfcc].type=cMfcc
; compute log-energy from raw signal frames
; (not windowed, not pre-emphasised: that's the way HTK does it)
instance[energy].type=cEnergy
; concat mfcc and energy, so we can compute delta and acceleration
; coefficients of both features at the same time
instance[cat].type=cVectorConcat
; compute delta coefficients from mfcc and energy
instance[delta].type=cDeltaRegression
; compute acceleration coefficients from delta coefficients of mfcc and energy
instance[accel].type=cDeltaRegression
; run single threaded (nThreads=1)
; NOTE: a single thread is more efficient for processing small files, since multi-threaded processing involves more
; overhead during startup, which will make the system slower in the end
nThreads=1
; do not show any internal dataMemory level settings
; (if you want to see them set the value to 1, 2, 3, or 4, depending on the amount of detail you wish)
printLevelStats=3
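////////////////////////////////////////////////////////////
// component configuration
////////////////////////////////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with
; SMILExtract -H
; or
; SMILExtract -H configTypeName (= componentTypeName)
////////////////////////////////////////////////////////////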
[externalSource:cExternalAudioSource]
writer.dmLevel=wave
blocksize = 8000
blocksize_sec =0.50
sampleRate = 8000
channels = 1
nBits = 16
nBPS = 0
fieldName = pcm
[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
noPostEOIprocessing = 1
copyInputName = 1
frameSize = 0.04
frameStep = 0.02
frameMode = fixed
frameCenterSpecial = left
[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k = 0.97
de = 0
[win:cWindower]
reader.dmLevel=framespe
writer.dmLevel=winframes
copyInputName = 1
processArrayFields = 1
; hamming window
winFunc = ham
; no gain, no offset
gain = 1.0
offset = 0
[fft:cTransformFFT]
reader.dmLevel=winframes
writer.dmLevel=fft
copyInputName = 1
processArrayFields = 1
inverse = 0
; for compatibility with 2.2.0 and older versions
zeroPadSymmetric = 0
[fftmag:cFFTmagphase]
reader.dmLevel=fft
writer.dmLevel=fftmag
copyInputName = 1
processArrayFields = 1
inverse = 0
magnitude = 1
phase = 0
[melspec:cMelspec]
reader.dmLevel=fftmag
writer.dmLevel=melspec
copyInputName = 1
processArrayFields = 1
; htk compatible sample value scaling
htkcompatible = 1
nBands = 26
; use power spectrum instead of magnitude spectrum
usePower = 1
lofreq = 0
hifreq = 8000
specScale = mel
inverse = 0
[mfcc:cMfcc]
reader.dmLevel=melspec
writer.dmLevel=mfcc
copyInputName = 1
processArrayFields = 1
firstMfcc = 1
lastMfcc = 12
cepLifter = 22.0
htkcompatible = 1
[energy:cEnergy]
reader.dmLevel=frames
writer.dmLevel=energy
nameAppend = energy
copyInputName = 1
processArrayFields = 0
htkcompatible=1
rms = 0
log = 1
[cat:cVectorConcat]
reader.dmLevel=mfcc;energy
writer.dmLevel=ft0
copyInputName = 1
processArrayFields = 0
[delta:cDeltaRegression]
reader.dmLevel=ft0
writer.dmLevel=ft0de
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
[accel:cDeltaRegression]
reader.dmLevel=ft0de
writer.dmLevel=ft0dede
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
//
/// data output configuration //
//
[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat
instance[externalSink].type = cExternalSink
[audspec_lldconcat:cVectorConcat]
reader.dmLevel = ft0;ft0de;ft0dede
writer.dmLevel = lld
includeSingleElementFields = 1
[externalSink:cExternalSink]
reader.dmLevel = lld
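As a sanity check on what the callback should receive with this configuration: the mfcc component outputs coefficients 1 to 12, the energy component adds one log-energy value, and the delta and acceleration components each produce a vector of the same length, so the lld level should carry 13 × 3 = 39 values per frame. With sampleRate = 8000, frameSize = 0.04 and frameStep = 0.02 correspond to 320 and 160 samples. The helpers below are hypothetical and only reproduce this arithmetic; they do not query openSMILE.

```cpp
#include <cmath>

// Hypothetical helper: expected number of values per "lld" vector.
int lld_vector_size(int firstMfcc, int lastMfcc)
{
    const int staticCount = (lastMfcc - firstMfcc + 1) + 1;  // MFCCs + log energy
    return staticCount * 3;  // static + delta + acceleration
}

// Hypothetical helper: a frame length given in seconds, expressed in samples.
long samples_per_frame(double seconds, int sampleRate)
{
    return std::lround(seconds * sampleRate);
}
```

If the vectorSize printed by the callback differs from lld_vector_size(1, 12), the reader.dmLevel of externalSink or the upstream components are probably not configured as listed above.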
The code and a test program have been packaged as a VS2015 project. Code location:
opensmileTest.rar (CSDN download, machine learning document resources)