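Preface
I. Recording personal training and test speech
II. Recognizing speech
1. Extracting MFCC feature parameters
2. Computing the distance between speech templates with the dynamic time warping (DTW) algorithm
Summary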
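The code for recording a training utterance is as follows:
fs = 16000; % sampling rate in Hz
duration = 2; % recording length in seconds
n = duration*fs; % number of samples
t = (1:n)/fs; % time axis in seconds (handy for plotting)
recObj = audiorecorder(fs,16,1); % 16-bit, mono
recordblocking(recObj, duration);
y = getaudiodata(recObj);
ymax = max(abs(y)); % peak amplitude
y = y/ymax; % normalize to a peak of 1
audiowrite('x.wav',y,fs); % save the recording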
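Recording the input (test) speech
The code is as follows:
fs = 16000; % sampling rate in Hz, same as for the training recording
duration = 2; % recording length in seconds
n = duration*fs; % number of samples
t = (1:n)/fs; % time axis in seconds (handy for plotting)
recObj = audiorecorder(fs,16,1); % 16-bit, mono
recordblocking(recObj,duration);
y = getaudiodata(recObj);
ymax = max(abs(y)); % peak amplitude
y = y/ymax; % normalize to a peak of 1
audiowrite('x1.wav',y,fs); % save the recording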
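Both recording snippets divide by the peak amplitude, which yields NaN samples if the microphone captured pure silence. The sketch below shows one possible guard. The helper name normalizePeak is hypothetical and not part of the original post, and the toy vector stands in for the output of getaudiodata.

```matlab
% Hypothetical helper, not part of the original post.
% Scales y so that its largest absolute sample becomes 1. The eps floor
% keeps the divisor non-zero, so an all-zero signal stays all zeros.
normalizePeak = @(y) y / max(max(abs(y(:))), eps);

y = [0.1; -0.5; 0.25];                % toy signal instead of a real recording
yn = normalizePeak(y);                % peak is 0.5, so every sample is doubled
silent = normalizePeak(zeros(4, 1));  % silent input: zeros, no NaN

disp(yn.');
disp(silent.');
```

In the recording scripts, the two lines that compute ymax and divide by it could then be replaced by a single call such as y = normalizePeak(y).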
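The code is as follows (example):
% x is the speech signal, e.g. as returned by audioread.
% melbankm and enframe come from the VOICEBOX toolbox.
frameSize = 256; % frame length; must equal the FFT length passed to melbankm
inc = 80; % frame shift; assumed value, not given in the original post
% normalized mel filter bank coefficients (24 filters, 24 x 129 matrix)
bank = melbankm(24,frameSize,16000,0,0.5,'m');
bank = full(bank);
bank = bank/max(bank(:));
% DCT coefficients, 12 x 24
for k = 1:12
    n = 0:23;
    dctcoef(k,:) = cos((2*n+1)*k*pi/(2*24));
end
% normalized cepstral liftering window
w = 1+6*sin(pi*[1:12]./12);
w = w/max(w);
% pre-emphasis filter
xx = double(x);
xx = filter([1 -0.9375],1,xx);
% split the speech signal into frames (one frame per row)
xx = enframe(xx,frameSize,inc);
n2 = fix(frameSize/2)+1; % number of non-redundant FFT bins (129)
% compute the MFCC parameters of each frame
for i = 1:size(xx,1)
    y = xx(i,:);
    s = y'.*hamming(frameSize); % window length must match the frame length
    t = abs(fft(s));
    t = t.^2; % power spectrum
    c1 = dctcoef*log(bank*t(1:n2));
    c2 = c1.*w';
    m(i,:) = c2';
end
% first-order difference (delta) coefficients
dtm = zeros(size(m));
for i = 3:size(m,1)-2
    dtm(i,:) = -2*m(i-2,:)-m(i-1,:)+m(i+1,:)+2*m(i+2,:);
end
dtm = dtm/3;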
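The fragment above ends once dtm has been computed and does not show how the static and delta coefficients are used afterwards. A common choice is to concatenate them and drop the two frames at each end, whose deltas are never filled in. Whether the author's routine does this is not stated, so the last step below is an assumption. The toy matrix m replaces real MFCCs so the result is easy to check by hand.

```matlab
% Toy MFCC matrix: 7 frames (rows) x 12 coefficients (columns).
% Frame i holds the value i in every column.
m = (1:7).' * ones(1, 12);

% Same first-order difference as in the fragment above.
dtm = zeros(size(m));
for i = 3:size(m,1)-2
    dtm(i,:) = -2*m(i-2,:) - m(i-1,:) + m(i+1,:) + 2*m(i+2,:);
end
dtm = dtm / 3;

% Assumed final step (not shown in the original post): join static and
% delta coefficients, then drop the first two and last two frames,
% whose delta rows are still zero.
ccc = [m dtm];
ccc = ccc(3:size(m,1)-2, :);

disp(size(ccc));
```

For the 7-frame toy input this leaves 3 frames of 24 values each, and every delta entry equals 10/3 because the input rises by exactly 1 per frame.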
The code is as follows (example):
% load the nine reference templates 1.wav ... 9.wav
for j = 1:9
    fname = sprintf('%d.wav',j);
    [x,fs] = audioread(fname);
    ref(j).mfcc = mfcc(x,fs);
end
% load the test utterance
i = 1;
[x,fs] = audioread('13.wav');
test(i).mfcc = mfcc(x,fs);
% DTW distance between the test utterance and every template
dist = zeros(1,9);
for j = 1:9
    dist(i,j) = dtw(test(i).mfcc,ref(j).mfcc);
end
% the template with the smallest distance is the recognition result;
% idx is its index (a separate name, so the loop variable i is not overwritten)
[d,idx] = min(dist(i,1:9));
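The matching code calls dtw without showing what it computes. As an illustration only, the sketch below implements the basic DTW recurrence with a Euclidean frame distance and the three standard predecessor steps. The function name simpleDtw is hypothetical, and it is not necessarily the dtw the author used, which may apply different step constraints or a different frame layout.

```matlab
function d = simpleDtw(a, b)
% simpleDtw  Hypothetical helper, not the dtw used in the original post.
% a, b: feature matrices with one frame per row and the same number of columns.
% d:    accumulated Euclidean frame distance along the cheapest warping path.
    na = size(a, 1);
    nb = size(b, 1);
    D = inf(na + 1, nb + 1);   % accumulated cost with an extra border row/column
    D(1, 1) = 0;               % empty prefix against empty prefix costs nothing
    for i = 1:na
        for j = 1:nb
            cost = norm(a(i,:) - b(j,:));   % local distance between two frames
            % cheapest predecessor: diagonal step, step in a only, step in b only
            best = min([D(i, j), D(i, j+1), D(i+1, j)]);
            D(i+1, j+1) = cost + best;
        end
    end
    d = D(na + 1, nb + 1);
end
```

Saved as simpleDtw.m, it could be tried on small inputs: simpleDtw([0;1;2], [0;1;1;2]) gives 0 because the second sequence is only a time-stretched copy of the first, while simpleDtw([0;0], [1;1]) gives 2.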
I am still learning how to train larger-scale speech recognition systems.