For background on homomorphic signal processing, the cepstrum, and the pitch period, see this undergraduate thesis: link
$\hat{x}(n)$: complex cepstrum
$c(n)$: (real) cepstrum
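For reference, the two quantities are conventionally defined as follows for a sequence $x(n)$ with discrete-time Fourier transform $X(e^{j\omega})$. The symbol $X(e^{j\omega})$ is notation introduced here for illustration; it does not appear in the original post.

$$
\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \ln X(e^{j\omega})\, e^{j\omega n}\, d\omega,
\qquad
c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \ln\left|X(e^{j\omega})\right|\, e^{j\omega n}\, d\omega
$$

For a real $x(n)$, $c(n)$ is the even part of $\hat{x}(n)$, that is $c(n) = \tfrac{1}{2}\left[\hat{x}(n) + \hat{x}(-n)\right]$, which is why the code below only needs the log magnitude and never the phase.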
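function [pitchf] = getPitch(audio,Fs,time,overlap)
% Estimate the pitch frequency of each frame of an audio signal: fp (Hz) = Fs/Np
% time: frame length in seconds; overlap: fraction of a frame shared with the next one
% Steps: framing, Hamming window, cepstrum c(n), peak search for Np
% Np is the number of samples between the main peak at the cepstrum origin
% and the next dominant peak, which is searched for inside the pitch range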
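len = length(audio);
N = round(time*Fs); % samples per frame (rounded so it can be used as an index)
mixN = floor(overlap*N); % samples shared by adjacent frames
frames = floor(len/(N-mixN)); % total number of frames
LNp = floor(Fs/1000); % shortest pitch period searched: 1 ms (1000 Hz), in samples
HNp = floor(Fs/50); % longest pitch period searched: 20 ms (50 Hz), in samples
pitchf = zeros([1 (frames-1)]); % 0 marks a frame with no detected pitch
start = 1;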
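for m=1:(frames-1) % estimate the pitch period of each frame
    tail = start+N-1; % last sample of this frame
    if tail>len % clamp to the end of the signal
        tail = len;
    end
    tmp = audio(start:tail,1); % extract one frame (first channel only)
    x = tmp.*hamming(length(tmp));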
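    lgS = log(abs(fft(x)) + eps); % log-magnitude spectrum; eps avoids log(0) on silent frames
    cn = real(ifft(lgS)); % cepstrum c(n) of x(n); real() drops round-off imaginary parts
    lenc = ceil(length(cn)/2); % c(n) is circularly even, so the first half is enough
    if HNp > lenc % keep the search range inside the first half
        HNp = lenc;
    end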
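    c = cn(LNp:HNp); % search for Np inside the plausible range
    [maxcn,idx] = max(c); % largest cepstral peak in that range
    if maxcn>0.08 % threshold of 0.08; frames below it keep pitchf(m) = 0
        pitchN = LNp+idx-2; % cn(k) holds quefrency k-1, so c(idx) is quefrency LNp+idx-2
        pitchf(m) = Fs/pitchN;
        formantcn = cn(1:pitchN); % low-quefrency part, used for the formant estimate
    end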
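    % Formants of frame 50, plotted only if that frame was detected as voiced
    % (otherwise formantcn and pitchN would be stale or undefined)
    if m==50 && pitchf(m)>0
        formant = Formant(formantcn,pitchN);
        xi = (0:floor(pitchN/2)-1)*Fs/pitchN; % bin k sits at (k-1)*Fs/pitchN; the spectrum is symmetric, so half suffices
        figure; % separate window, so other plots are not overwritten
        plot(xi,formant(1:floor(pitchN/2)));
        xlabel('Frequency (Hz)');ylabel('Smoothed log magnitude');
        t = title("Frame 50: formants vs. frequency");
        t.FontSize = 16;
    end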
    start = start + (N-mixN); % first sample of the next frame (hop = N-mixN)
end
end
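To sanity-check the peak search and the index-to-quefrency mapping, one option is to run the same steps on a single synthetic frame whose period is known. The sketch below is not part of the original code: the sampling rate, the 200 Hz impulse train, and all variable names are assumptions chosen for illustration, and the Hamming window is written out explicitly so that no toolbox is needed.

```matlab
% Single-frame sanity check on a synthetic signal (all values are assumptions).
Fs = 8000;                 % assumed sampling rate, Hz
f0 = 200;                  % assumed true pitch, Hz
T0 = Fs/f0;                % true pitch period: 40 samples
N  = round(0.03*Fs);       % 30 ms frame: 240 samples

x = zeros(N,1);
x(1:T0:end) = 1;           % impulse train standing in for a voiced frame

w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));    % Hamming window, written out
c = real(ifft(log(abs(fft(x.*w)) + eps)));   % real cepstrum of the windowed frame

LNp = floor(Fs/1000);                % shortest period searched (1 ms)
HNp = min(floor(Fs/50), floor(N/2)); % longest period searched, kept in the first half
[~, idx] = max(c(LNp+1:HNp+1));      % c(k+1) holds quefrency k
Np = LNp + idx - 1;                  % peak quefrency in samples

fprintf('estimated period: %d samples, pitch: %.1f Hz\n', Np, Fs/Np);
```

If the mapping is right, the reported period should land at or next to T0. An estimate that is consistently off by a sample or two usually points to an indexing slip rather than to the signal.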
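function [formant] = Formant(formantcn,pitchN)
% Estimate the formant envelope of one frame from its cepstrum
% formantcn: cepstrum of the frame (its first pitchN samples)
% pitchN: pitch period of the frame, in samples
% Pipeline: formantcn -> window -> FFT -> magnitude -> logarithm -> median smoothing
% After these steps, the frequencies of the peaks are the formant frequencies
% Method adapted from: https://www.docin.com/p-715554902.html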
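xn = formantcn(:).*hamming(pitchN); % window
tmp = 20*log10(abs(fft(xn)) + eps); % FFT -> magnitude -> dB (log10, not natural log); eps avoids log(0)
formant = zeros([1 pitchN]);
formant(1:2) = tmp(1:2); % the two samples at each edge are copied unsmoothed
formant(end-1:end) = tmp(end-1:end);
for k = 3:(pitchN-2) % smoothing: weighted sum of three sliding 3-point medians
    md1 = median(tmp(k-2:k));
    md2 = median(tmp(k-1:k+1));
    md3 = median(tmp(k:k+2));
    formant(k) = md1*0.25 + md2*0.5 + md3*0.25;
end
end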
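For comparison, the textbook way to get a spectral envelope from the cepstrum is low-quefrency liftering: keep only the cepstral samples below the pitch peak and transform them back, which yields a smoothed log-magnitude spectrum directly, with no second logarithm. This is a different technique from the Formant function above, shown only as a minimal sketch. The test signal, the resonator coefficients, the cutoff L, and every variable name are assumptions.

```matlab
% Spectral envelope by low-quefrency liftering (a sketch; all names are hypothetical).
Fs = 8000;  N = 240;  T0 = 40;             % assumed rate, frame length, pitch period
x = zeros(N,1);
x(1:T0:end) = 1;                           % impulse-train excitation
x = filter(1, [1 -1.2 0.8], x);            % assumed resonance near 1.06 kHz, standing in for a formant

w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));  % Hamming window, written out
c = real(ifft(log(abs(fft(x.*w)) + eps))); % real cepstrum of the frame

L = T0 - 10;                               % lifter cutoff below the pitch peak (assumption)
lifter = zeros(N,1);
lifter(1:L) = 1;                           % keep quefrencies 0 .. L-1
lifter(N-L+2:N) = 1;                       % and their mirror images, so the result stays real
envelope = real(fft(c.*lifter));           % smoothed log-magnitude spectrum (natural log)

f = (0:N/2-1)'*Fs/N;                       % frequency axis for the first half
[~, k] = max(envelope(1:N/2));             % strongest envelope peak
fprintf('largest envelope peak near %.0f Hz\n', f(k));
```

The cutoff L trades smoothness for detail: a smaller L gives a smoother envelope, while an L at or above the pitch period lets the harmonic ripple back in.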
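% Driver script. If everything lives in one script file, MATLAB releases
% before R2024a require the function definitions to come after this code.
[audio,Fs] = audioread("summer.wav");
figure;
plot(audio);
xlabel('Sample index (n)');ylabel('Sample value');
titlename = "Hello, Summer";
t = title(titlename);
t.FontSize = 16;
pitchf = getPitch(audio,Fs,0.03,0.6); % 30 ms frames, 60% overlap
figure;
stem(pitchf,'.');
xlabel('Frame index (n)');ylabel('Pitch frequency (Hz)');
t = title("Pitch frequency");
t.FontSize = 16;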
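The stem plot above is indexed by frame number and draws unvoiced frames as zeros. A possible refinement, sketched here under assumptions, is to convert frame indices to seconds and plot only the voiced frames. The sampling rate and the pitchf values below are made up for illustration; in practice both come from the script above.

```matlab
% Frame index -> time axis (a sketch; Fs and pitchf are made-up stand-ins).
Fs = 16000;                      % assumed sampling rate, Hz
time = 0.03;  overlap = 0.6;     % same framing arguments as the getPitch call
N = round(time*Fs);              % 480 samples per frame
hop = N - floor(overlap*N);      % 192 samples between frame starts

pitchf = [0 0 210 205 0 198];    % made-up per-frame pitch values, 0 = unvoiced
t = ((0:numel(pitchf)-1)*hop + N/2)/Fs;   % centre of each frame, in seconds
voiced = pitchf > 0;             % frames where a pitch was detected

stem(t(voiced), pitchf(voiced), '.');
xlabel('Time (s)'); ylabel('Pitch frequency (Hz)');
```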
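The formant results are not very good. The audio itself may be partly to blame, but this should be read as a rough, make-do implementation.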