melbankm、mfcc_m、melcepst、cepstralFeatureExtractor、mfcc、HelperComputePitchAndMFCC、 melSpectrogram
function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
%MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
%
% Inputs:
% p number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
% n length of fft
% fs sample rate in Hz
% fl low end of the lowest filter as a fraction of fs [default = 0]
% fh high end of highest filter as a fraction of fs [default = 0.5]
% w any sensible combination of the following:
% 可取代Mel频率的选项:
% 'b' = bark scale instead of mel
% 'e' = erb-rate scale
% 'l' = log10 Hz frequency scale
% 'f' = linear frequency scale
%
% 'c' = fl/fh specify centre of low and high filters
% 'h' = fl/fh are in Hz instead of fractions of fs
% 'H' = fl/fh are in mel/erb/bark/log10
%
% 滤波器形状:
% 't' = triangular shaped filters in mel/erb/bark domain (default)
% 'n' = hanning shaped filters in mel/erb/bark domain
% 'm' = hamming shaped filters in mel/erb/bark domain
%
% 'z' = highest and lowest filters taper down to zero [default]
% 'y' = lowest filter remains at 1 down to 0 frequency and
% highest filter remains at 1 up to nyquist freqency
%
% 'u' = scale filters to sum to unity
%
% 's' = single-sided: do not double filters to account for negative frequencies
%
% 输出滤波器组的响应曲线:
% 'g' = plot idealized filters [default if no output arguments present]
%
% Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
% Some people instead define an asymmetric triangular filter in the frequency domain.
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: x a sparse matrix containing the filterbank amplitudes
% If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
% otherwise size(x)=[p,1+floor(n/2)]
% Note that the peak filter values equal 2 to account for the power
% in the negative FFT frequencies.
% mc the filterbank centre frequencies in mel/erb/bark滤波器中心频率
% mn the lowest fft bin with a non-zero coefficient
% mx the highest fft bin with a non-zero coefficient
% Note: you must specify both or neither of mn and mx.mn与mx必须同时指定或者不指定
%
% =============================!用法举例(MFCC流程)==============================
%
% (a) Calcuate the Mel-frequency Cepstral Coefficients
%
% f=rfft(s); % rfft() returns only 1+floor(n/2) coefficients去除虚数部分
% x=melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted
% z=log(x*abs(f).^2); % multiply x by the power spectrum
% c=dct(z); % take the DCT
%
% (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
%
% f=fft(s); % n is the fft length, p is the number of filters wanted
% [x,mc,na,nb]=melbankm(p,n,fs); % na:nb gives the fft bins that are needed
% z=log(x*(f(na:nb)).*conj(f(na:nb)));
%
% (c) Plot the calculated filterbanks
%
% plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency
%
% (d) Plot the idealized filterbanks (without output sampling)
%
% melbankm(p,n,fs);
该函数只是设计滤波器组,属于MFCC处理的一部分。
bank=melbankm(p,frameSize,fs,0,0.5,'m');
% 归一化Mel滤波器组系数
bank=full(bank);
bank=bank/max(bank( : ));
% 归一化倒谱提升窗口:对MFCC系数中某些谱线进行增强
w = 1 + 6 * sin(pi * [1:p2] ./ p2);
w = w/max(w);
需要修正的地方:
只有一阶差分系数;
滤波器选择后并不能只截取想要的部分;
归一化Mel滤波器组系数、归一化倒谱提升窗口有待考证。
function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
%
% Simple use: (1) c=melcepst(s,fs) % calculate mel cepstrum with 12 coefs, 256 sample frames
% (2) c=melcepst(s,fs,'E0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
% s speech signal
% fs sample rate in Hz (default 11025)
% w mode string (see below)
% nc number of cepstral coefficients excluding 0'th coefficient [default 12] MFCC维数设定
% p number of filters in filterbank [default: floor(3*log(fs)) = approx 2.1 per ocatave] 滤波器数量
% n length of frame in samples [default power of 2 < (0.03*fs)] 帧长
% inc frame increment [default n/2] 帧移
% fl low end of the lowest filter as a fraction of fs [default = 0] 滤波器最低频率
% fh high end of highest filter as a fraction of fs [default = 0.5] 滤波器最高频率,通过fs归一化
%
% w any sensible combination of the following:
% 时域窗函数:
% 'R' rectangular window in time domain
% 'N' Hanning window in time domain
% 'M' Hamming window in time domain (default)
%
% 频域窗函数:
% 't' triangular shaped filters in mel domain (default)
% 'n' hanning shaped filters in mel domain
% 'm' hamming shaped filters in mel domain
%
%
% 'p' filters act in the power domain
% 'a' filters act in the absolute magnitude domain (default)
%
% MFCC除12维基本参数之外的选择:
% '0' include 0'th order cepstral coefficient
% 'E' include log energy
% 'd' include delta coefficients (dc/dt)
% 'D' include delta-delta coefficients (d^2c/dt^2)
%
% 滤波器频率设置:
% 'z' highest and lowest filters taper down to zero (default)
% 'y' lowest filter remains at 1 down to 0 frequency and
% highest filter remains at 1 up to nyquist freqency
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the
% first element of each row followed by the delta and then the delta-delta
% coefficients.
% tc fractional time in samples at the centre of each frame
% with the first sample being 1.
% ==================================设置默认参数=================================
if nargin<2 fs=11025; end% 滤波器的最高频率
if nargin<3 w='M'; end% hamming窗
if nargin<4 nc=12; end% MFCC维数
if nargin<5 p=floor(3*log(fs)); end% p个滤波器
if nargin<6 n=pow2(floor(log2(0.03*fs))); end% n是一帧FFT后数据的长度
if nargin<9
fh=0.5;% 滤波器的最高频率,用fs归一化
if nargin<8
fl=0;% 设计滤波器的最低频率
if nargin<7
inc=floor(n/2);
end
end
end
if isempty(w)
w='M';
end
if any(w=='R')
[z,tc]=enframe(s,n,inc);
elseif any (w=='N')
[z,tc]=enframe(s,hanning(n),inc);
else
[z,tc]=enframe(s,hamming(n),inc);
end
% =================================!理论核心部分=================================
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);% m为滤波器的频域响应
pw=f(a:b,:).*conj(f(a:b,:));% 计算帧能量
pth=max(pw(:))*1E-20;
if any(w=='p')
y=log(max(m*pw,pth));
else
ath=sqrt(pth);
y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';% 得到13维系数
nf=size(c,1);
nc=nc+1;
if p>nc
c(:,nc+1:end)=[];% 当滤波器个数比所需维数多的时候,就将后面滤波器获得的参数删去
elseif p<nc
c=[c zeros(nf,nc-p)];% 滤波器个数少的时候,用0补齐
end
if ~any(w=='0')
c(:,1)=[];
nc=nc-1;
end
if any(w=='E')
c=[log(max(sum(pw),pth)).' c];
nc=nc+1;
end
% ===============================计算一阶和二阶差分==============================
if any(w=='D')
vf=(4:-1:-4)/60;
af=(1:-1:-1)/2;
ww=ones(5,1);
cx=[c(ww,:); c; c(nf*ww,:)];
vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
vx(1:8,:)=[];
ax=reshape(filter(af,1,vx(:)),nf+2,nc);
ax(1:2,:)=[];
vx([1 nf+2],:)=[];
if any(w=='d')
c=[c vx ax];
else
c=[c ax];
end
elseif any(w=='d')
vf=(4:-1:-4)/60;
ww=ones(4,1);
cx=[c(ww,:); c; c(nf*ww,:)];
vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
vx(1:8,:)=[];
c=[c vx];
end
% =======================如果不输出任何参数,就会输出语谱图==========================
if nargout<1
[nf,nc]=size(c);
% t=((0:nf-1)*inc+(n-1)/2)/fs;
ci=(1:nc)-any(w=='0')-any(w=='E');
imh = imagesc(tc/fs,ci,c.');
axis('xy');
xlabel('Time (s)');
ylabel('Mel-cepstrum coefficient');
map = (0:63)'/63;
colormap([map map map]);
colorbar;
end
if ~any(w=='0')
c(:,1)=[];
nc=nc-1;
end
如果不需要’0’阶系数,就将第一列删除,并得到13-1=12维数据,说明DCT后得到的是13维数据,默认将第一个元素,即0阶倒谱系数删去。第一维比后12维都大很多(直流项?)。
test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';
[x, fs] = audioread(test);
n=pow2(floor(log2(0.03*fs)));
inc=floor(n/2);
f = enframe(x,hamming(n),inc);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');
[coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');
参数设置中有FilterBankNormalization,选项为:Area,Bandwidth(默认),None,用于滤波器组的权重分配。
cepstralFeatureExtractor类的部分代码:
classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine
%cepstralFeatureExtractor Cepstral Feature Extractor
% cepFeatures = cepstralFeatureExtractor returns a System object,
% cepFeatures, that calculates cepstral features. Columns of the input
% are treated as individual channels.
%
% cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a
% cepstralFeatureExtractor System object, cepFeatures, with each
% specified property name set to the specified value. You can specify
% additional name-value pair arguments in any order as
% (Name1,Value1,...NameN,ValueN).
%
% step method syntax内置的step()函数:
%
% [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral
% coefficients, the delta, and the delta-delta. The log energy is also
% returned in the COEFFS output based on the LogEnergy property. The
% DELTA and DELTADELTA are initialized as zero-vectors. X must be a
% real-valued, double-precision or single-precision matrix. Each column
% of X is treated as an independent channel.
%
% System objects may be called directly like a function instead of using
% the step method. For example, y = step(obj,x) and y = obj(x) are
% equivalent.
% 对象可以直接作为函数使用,所以step()与obj()功能一致
%
% cepstralFeatureExtractor methods:
% step - See above description for use of this method
% release - Allow property values and input characteristics to change
% clone - Create cepstralFeatureExtractor object with same property
% values
% isLocked - Locked status (logical)
% <a href="matlab:help matlab.System/reset ">reset</a> - Reset the internal states to initial conditions
% getFilters - Get filterbank used to calculate the cepstral
% coefficients
%
% cepstralFeatureExtractor properties:
% FilterBank - Filter bank ('Mel'/'Gammatone')
% InputDomain - Domain of input signal
% NumCoeffs - Number of coefficients to return
% FFTLength - FFT length
% LogEnergy - Log energy usage ('Append'/'Replace'/'Ignore')
% SampleRate - Sample rate (Hz)
%
% Advanced properties:
% BandEdges - Band edges of mel filter bank (Hz)
% FilterBankNormalization - Normalize filter bank
% FilterBankDesignDomain - Domain for mel filter bank design
% FrequencyRange - Gammatone filter bank frequency range
%#codegen
properties
%SampleRate Input sample rate (Hz)
% Specify the sampling rate of the input in Hertz as a real, finite
% numeric scalar. The default is 16000 Hz. This property is
% tunable.
SampleRate = 16000;
end
properties (Constant, Hidden)
% SampleRateSet is used to setup the choices for SampleRate
SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...
'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});
end
properties (Nontunable)
%BandEdges Band edges of Mel filter bank (Hz)
% Specify the band edges of the mel filter bank as a monotonically
% increasing vector in the range [0,fs/2]. The number of band edges
% must be in the range [4,160]. The default band edges are spaced
% linearly for the first ten and then logarithmically thereafter.
% This property applies when FilterBank is 'Mel'.
% 只有是Mel的时候,BandEdges属性才有用
BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();
%FFTLength FFT length 默认FFT长度是输入的行数,所以做好分帧!
FFTLength = [];
%NumCoeffs Number of coefficients to return 默认MFCC维数13
NumCoeffs = 13;
%InputDomain Domain of the input signal 默认输入数据是时域的
InputDomain = 'Time';
%FilterBankNormalization Filter bank normalization 默认以带宽设置滤波器权重
FilterBankNormalization = 'Bandwidth';
%LogEnergy Log energy usage 默认log能量参数是有的
LogEnergy = 'Append';
end
---------------------------------------------------------略-----------------------------------------------------------
end
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
function varargout = mfcc(x, fs, varargin)
%MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal
% coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral
% coefficients over time for the audio input. Columns of the input are
% treated as individual channels. coeffs is returned as an L-by-M-by-N
% array.
% L - Number of frames the audio signal is partitioned into.
% This is determined by the WINDOWLENGTH and OVERLAPLENGTH
% properties.
% M - Number coefficients returned per frame.
% This is determined by the NUMCOEFFS property.
% N - Number of channels.
%
% 'WindowLength' defaults to round(0.030 * fs).
% 'OverlapLength' defaults to round(fs*0.02).
% 'NumCoeffs' If not specified, the number of coefficients is 13.
% 'FFTLength' By default, the FFT length is set to the WINDOWLENGTH.
% 'DeltaWindowLength' The default is 2.
% coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log
% energy is used. Specify log energy as a character vector:
% 'Append' - Adds the log-energy as the first element of the
% returned coefficients vector. This is the default
% setting.
% 'Replace' - Replaces the zeroth coefficient (first element of
% coeffs) with the log-energy.
% 'Ignore' - Ignores and does not return the log-energy.
% =========================验证输入数据的格式=============================
validateRequiredInputs(x, fs)
params = audio.internal.MFCCValidator(fs,size(x,1),varargin{:});% 输入默认的参数
hopLength = params.WindowLength - params.OverlapLength;% 帧移
% ==========================创建mfcc提取object============================
mfccObject = cepstralFeatureExtractor( ...
'SampleRate', fs, ...
'FFTLength', params.FFTLength, ...
'NumCoeffs', params.NumCoeffs, ...
'LogEnergy', params.LogEnergy);
% ====================验证所需要的mfcc维数比滤波器个数少===================
numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;
coder.internal.errorIf(numValidBands < params.NumCoeffs, ...
'audio:mfcc:BadNumCoeffs', ...
numValidBands,fs);
% ==========================mfcc参数获取=================================
[nRow,nChan] = size(x);% 一般都是单通道,audiorea读取到的是一列数据
N = params.WindowLength;
numHops = floor((nRow-N)/hopLength) + 1;
y = audio.internal.buffer(x,N,hopLength);
c = mfccObject(y);% mfccObject是cepstralFeatureExtractor类,所以,与cepstralFeatureExtractor求解方法一样
c2 = reshape(c , size(c,1) , size(c,2)/nChan , nChan );
coeffs = permute(c2 , [2 1 3]);% 将第1维与第2维转置,因为cepstralFeatureExtractor得到的特征是列排的
varargout{1} = coeffs;
%=========================一阶差分====================================
if nargout > 1
delta = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);
varargout{2} = delta;
end
% ============================二阶差分=================================
if nargout > 2
deltaDelta = audio.internal.computeDelta(delta,params.DeltaWindowLength);
varargout{3} = deltaDelta;
end
% -------------------------------------------------------------------------
% Output sample stamp -----------------------------------------------------
if nargout > 3
varargout{4} = ...
cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);
end
end
% -------------------------------------------------------------------------
% Validate required inputs
% -------------------------------------------------------------------------
function validateRequiredInputs(x,fs)
validateattributes(x,{'single','double'},...
{'nonempty','2d','real'}, ...
'mfcc','audioIn')
validateattributes(fs,{'single','double'}, ...
{'nonempty','positive','real','scalar','nonnan','finite'}, ...
'mfcc','fs');
end
默认有40个滤波器,得到14维参数,相当于melcepst中的’E0’,只是melcepst的最低频率从0Hz开始;delta与deltaDelta的第一行都是0;loc是每一帧的开始位置。
如何使delta与deltaDelta的首行不为0?设置DeltaWindowLength
参数即可。
从滤波器组设置可以看出,每个滤波器的起点是上个滤波器带宽的中点。
melSpectrogram
output的第一维是Number of bandpass filters in filterbank,默认为32个滤波器;第二维是Number of frames in spectrogram,即帧数。
它不可以计算差分,只是spectrogram的一个小分支,若取40个滤波器,得到的结果与mfcc相近,只是需要转置一下。
实现方式 | MFCC | 频谱图 |
---|---|---|
mfcc | ||
cepstralFeatureExtractor | ||
melcepst | ||
melSpectrogram |
可见,cepstralFeatureExtractor与mfcc所用算法基本一致,只是cepstralFeatureExtractor分帧求取,melcepst与它们的第2维数据有数量级的差异,暂时认为是滤波器归一化的原因。在mfcc中,log能量是作为额外系数默认附加的,通常Matlab会提供最好的性能,所以暂时按默认选项进行。melSpectrogram默认32个滤波器,mfcc默认40个滤波器,且melSpectrogram不能计算差分,所以mfcc总的来说,更合适作为以后的计算使用。
https://www.jianshu.com/p/1c2742096382