Speech Recognition Algorithm Design Based on MFCC+DTW: Matlab + C Code (Fully Fixed-Point Accelerated) Version

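Speech recognition mainly involves three key areas: feature extraction, statistical modeling, and recognition (matching). This article implements speech recognition with MFCC features and DTW matching, and starts with a brief introduction to both.
Matlab code download: https://download.csdn.net/download/weixin_44584198/88347255?spm=1001.2014.3001.5503
C code (fully fixed-point accelerated) download: https://download.csdn.net/download/weixin_44584198/88347257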

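1. MFCC+DTW Basic Theory

MFCC (Mel Frequency Cepstral Coefficients) is a common feature extraction algorithm for speech signals, widely used in speech recognition and speech processing. The processing steps are: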

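  1. Preprocessing: preprocess the input speech signal. This usually includes removing silent segments, voice activity detection, and pre-emphasis.
  2. Framing: cut the preprocessed signal into short-time frames, usually 10 to 30 ms each. Adjacent frames normally overlap, with 50% being a common overlap ratio. (The Matlab code in this article uses 256-sample frames at 8 kHz, i.e. 32 ms, with no overlap.)
  3. Windowing: apply a window function (such as a Hann or Hamming window; the C code in this article uses a Hamming window) to each frame to reduce the spectral leakage caused by the discontinuities at the frame edges.
  4. Fourier transform: apply the fast Fourier transform (FFT) to each frame to convert the time-domain signal into the frequency domain.
  5. Mel filter bank: in the frequency domain, apply a bank of mel filters that divides the spectrum into a series of mel bands, modeling the frequency resolution of human hearing.
  6. Log compression: take the logarithm (usually the natural logarithm) of each mel filter output. This compresses the dynamic range, approximating the roughly logarithmic way the ear perceives loudness.
  7. Discrete cosine transform: apply the discrete cosine transform (DCT) to the log mel spectrum, converting the band energies into cepstral coefficients.
  8. Feature selection: keep a subset of the cepstral coefficients as the MFCC feature vector. The lower-order coefficients are usually kept because they describe the spectral envelope and carry most of the speech information, while the higher-order ones mostly capture fine detail and noise.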
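
Steps 1 and 2 above can be sketched in C as follows. This is only a minimal illustration: the function names and the pre-emphasis coefficient passed by the caller (0.97 is a common choice) are assumptions, not taken from the downloadable project.

```c
#include <stddef.h>

/* Pre-emphasis: y[n] = x[n] - alpha * x[n-1], with x[-1] taken as 0.
 * alpha is supplied by the caller; 0.97 is an assumed, commonly used value. */
void preemphasis(const double *x, double *y, size_t n, double alpha)
{
    double prev = 0.0;
    for (size_t i = 0; i < n; i++) {
        double cur = x[i];          /* read first so x and y may alias */
        y[i] = cur - alpha * prev;
        prev = cur;
    }
}

/* Number of complete frames of frame_len samples when advancing by hop
 * samples per frame. hop == frame_len means no overlap. */
size_t frame_count(size_t n, size_t frame_len, size_t hop)
{
    if (frame_len == 0 || hop == 0 || n < frame_len)
        return 0;
    return 1 + (n - frame_len) / hop;
}
```

For the 51200-sample buffer used later in this article, frame_count(51200, 256, 256) gives 200 frames, which matches floor(length(wav_data)/N_frame) in hello_fun_top.m; a 50% overlap (hop of 128) would give 399 frames.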

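DTW (Dynamic Time Warping) compares two feature sequences of different lengths by warping the time axis so that they align. Its main advantage is this flexibility along the time axis, which lets it cope with different speaking rates. DTW is also easy to adapt: it needs no model training on speech samples, only a stored reference template, so it can be applied to new speech data directly without a time-consuming training phase.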
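
To make the idea concrete, here is a minimal sketch of textbook DTW in C. It is the plain, unconstrained form with a squared Euclidean frame distance, not the slope-constrained variant implemented in dtw.m later in this article; the function names and the row-major layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Squared Euclidean distance between two dim-dimensional feature vectors. */
static double frame_dist(const double *a, const double *b, size_t dim)
{
    double s = 0.0;
    for (size_t k = 0; k < dim; k++) {
        double d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

/* Unconstrained DTW between test (n x dim) and ref (m x dim), row-major.
 * Returns the accumulated cost of the best path from (0,0) to (n-1,m-1),
 * or -1.0 on invalid input or allocation failure. */
double dtw_distance(const double *test, size_t n,
                    const double *ref, size_t m, size_t dim)
{
    if (n == 0 || m == 0 || dim == 0)
        return -1.0;
    double *prev = malloc(m * sizeof *prev);
    double *cur = malloc(m * sizeof *cur);
    if (!prev || !cur) { free(prev); free(cur); return -1.0; }

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            double d = frame_dist(test + i * dim, ref + j * dim, dim);
            if (i == 0 && j == 0) {
                cur[j] = d;
            } else {
                double best;
                if (i == 0) {
                    best = cur[j - 1];            /* only a horizontal step */
                } else if (j == 0) {
                    best = prev[j];               /* only a vertical step */
                } else {
                    best = prev[j];
                    if (prev[j - 1] < best) best = prev[j - 1];
                    if (cur[j - 1] < best) best = cur[j - 1];
                }
                cur[j] = d + best;
            }
        }
        double *tmp = prev; prev = cur; cur = tmp;  /* cur row becomes prev */
    }
    double result = prev[m - 1];
    free(prev);
    free(cur);
    return result;
}
```

Only two rows of the cost matrix are kept, so memory use grows with the reference length rather than with the product of both lengths, which matters on a small MCU.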

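Voice activity detection (VAD): the VAD algorithm processes the audio signal and extracts the portions that contain speech. It works from per-frame features (such as the short-time zero-crossing rate and short-time energy) and thresholds: by classifying each audio frame as active or inactive, it determines the start and end points of each speech segment. The algorithm in this article implements VAD on the power spectrum data it has already computed (principle shown below).
[Figure 1: principle of the power-spectrum-based VAD]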
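
The hysteresis logic can be sketched as a small state machine in C. The thresholds (start above 400, keep going above 20, end after 4 quiet frames) mirror the values used in the code later in this article; the type and function names are hypothetical, and the frame power is assumed to be computed elsewhere.

```c
/* Hysteresis VAD sketch; names are hypothetical. */
enum vad_state { VAD_SILENCE = 0, VAD_SPEECH = 1, VAD_SEGMENT_END = 2 };

struct vad {
    enum vad_state state;
    int quiet_left;      /* quiet frames still tolerated before ending */
};

#define VAD_START_THRESHOLD 400.0
#define VAD_KEEP_THRESHOLD   20.0
#define VAD_HANGOVER_FRAMES   4

void vad_init(struct vad *v)
{
    v->state = VAD_SILENCE;
    v->quiet_left = VAD_HANGOVER_FRAMES;
}

/* Feed the power of one frame; returns the state after this frame.
 * After VAD_SEGMENT_END has been returned, the next call rearms the detector. */
enum vad_state vad_update(struct vad *v, double frame_power)
{
    if (v->state == VAD_SEGMENT_END)
        vad_init(v);

    if (frame_power > VAD_KEEP_THRESHOLD) {
        /* Loud enough to count as activity: reset the hangover counter. */
        v->quiet_left = VAD_HANGOVER_FRAMES;
        if (v->state == VAD_SILENCE && frame_power > VAD_START_THRESHOLD)
            v->state = VAD_SPEECH;
    } else if (v->state == VAD_SPEECH) {
        /* Quiet frame during speech: tolerate a few before ending. */
        if (v->quiet_left > 0)
            v->quiet_left--;
        else
            v->state = VAD_SEGMENT_END;
    }
    return v->state;
}
```

Using two thresholds keeps a brief dip in energy inside a word from splitting the utterance, while the higher start threshold keeps low-level noise from opening a segment.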

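The recognition block diagram of the whole system is shown below:
[Figure 2: recognition block diagram of the whole system]
The recognition process for one speech segment is as follows (timings are estimated for an MCU platform running at 50 MHz without floating-point acceleration):
[Figure 3: recognition process for one speech segment, with timing estimates]
Performance comparison:
[Figure 4: performance comparison]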

2. Matlab Implementation of MFCC+DTW

Main script recognition_wxp.m: reads the audio data and estimates the running time of the algorithm.

clear;close all;
clc;
% Sampling rate actually used for processing: my_fs
my_fs=8000;

% [wav_data,fs]=audioread("merge.wav");
load Voice_data_1
fs=8000;
wav_data=Voice_data_1'/32768;
% Downsample to 8 kHz (plain decimation; no change here because fs equals my_fs)
wav_data=wav_data(1:fs/my_fs:end);
wav_data=wav_data';
% Zero-pad to 51200 samples
MaxLengthData=51200;
wav_data=[wav_data zeros(1,MaxLengthData-length(wav_data))];
% Plot the audio data
plot(wav_data)
% Convert the audio data to signed 16-bit
wav_data_short=ceil(wav_data*32767);
% Play the audio
sound(wav_data,8000);
% A speech segment is considered finished after end_cnt silent frames
global end_cnt
end_cnt=4;
% Time the run
tic
for i=1:1:1
    hello_fun_top(wav_data);
end
toc

Function hello_fun_top.m: the top-level function. It performs the initial data processing and handles the speech and non-speech segments:

function [] = hello_fun_top(wav_data)
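% Index into the distance history (one entry per detected speech segment)
dist_ind=1;
% Each frame is fixed at 256 samples
N_frame=256;
% Trained parameter template: hi VeriSilicon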
train_data=[-0.346487792276900,-1.40059809625545,-4.41959884388216,-8.48010517470577,-10.1563616372905,-7.83369012654417,-4.33333586849326,-2.96012177654879,-3.14158613425379,-2.48508781888993,-0.937603394384175,-0.0672717297338245;3.47173817984233,2.80302616388370,-2.42307496356610,-9.01554510792411,-10.4633991889052,-6.16962553032179,-2.28216621437324,-2.08751015283687,-2.72513654189198,-1.84957388103955,-0.646441429938092,-0.145779342456221;4.71660978520528,1.13955656623149,-5.37084129611757,-8.91496409642569,-7.86341913513536,-5.18822572092777,-3.52198818905299,-2.81615061090971,-1.99982194243341,-0.827357392054028,0.0628652218551304,0.181323329849240;3.40539873644088,0.694792490673434,-4.46109055139396,-7.27784002705807,-5.92975497186088,-3.20317707506988,-2.09595775867707,-2.35478535793317,-2.27143548883744,-1.42593901126683,-0.449846427039349,0.0725120416106929;-0.482276063159477,-0.653921631056499,-1.34142316333990,-2.32251138055431,-2.42389195504733,-1.74486870219226,-1.88744892549279,-3.04860671421533,-3.29484947230030,-1.75734700243898,-0.0969100690700208,0.237522468293704;-5.09283151614853,-5.84220920053041,-4.39860496655632,-2.71645720810805,-2.83291714727149,-4.59835955453183,-6.17290539954239,-6.33961024569723,-5.20637164218657,-3.30975025866925,-1.34476448946671,-0.178497290175860;-7.03159513997222,-6.71726414389574,-2.47933142419532,1.42605435253041,1.30640441747211,-1.68272417966983,-4.11103476041911,-4.96403786683175,-4.84533794451010,-3.52143190041928,-1.34762559872198,-0.0681421691831556;-8.79754842421281,-9.18063908523648,-4.91045498887341,-0.689373989770212,-0.756259248031253,-3.12994905404126,-3.54478137372699,-2.41742351531491,-2.41099056808390,-2.79553063372757,-1.76902163057785,-0.382302277986526;-10.2359222656664,-9.55185948343906,-3.14607333709229,2.36090905174150,1.50949916239800,-3.11486454725529,-5.22630359946415,-3.78782678509071,-2.33318469769639,-2.09497576477884,-1.54775924786337,-0.481253448761232;-10.5867990463412,-11.0560933448
733,-6.89029809856297,-3.34847068618553,-3.78052255315306,-5.62679540895853,-5.25415521784573,-3.51929409083408,-2.88230566301155,-2.75936748477054,-1.68070872769370,-0.414677706729609;-9.09710246763254,-9.00457548594006,-4.18183710488182,-0.316306724403971,-1.79200434402207,-5.46483147597634,-5.84902677606135,-3.53111258217141,-2.41367895158650,-2.53232491848859,-1.76994227671839,-0.526345075447937;-6.20785406931988,-3.00775928807175,2.36656292935327,3.24489309432851,-0.416721259278301,-3.40525481979209,-3.12784463581975,-1.54346066471865,-0.807837163947345,-0.867568190187987,-0.899453879053944,-0.519988945673566;-3.72917693440528,-0.575634540367284,3.30932902205347,2.09580217455325,-2.86601030621307,-5.12660512922101,-2.64082371137209,0.416679462809215,0.838468530846086,-0.114988023141514,-0.556705064861166,-0.393599401237234;0.377670051976313,2.97906327983534,2.50883257010591,-2.63879965328752,-6.11675462353044,-4.22033172026864,-1.26702036406206,-0.864942146171124,-1.23385681304984,-0.857013560676071,-0.665355318444790,-0.538689474967369;5.24762789000734,1.96243233397669,-4.73186288442174,-8.33922657948775,-6.83816429421120,-4.54505299393654,-4.31582123778340,-3.91337901979114,-2.05749196665034,-0.970958903392067,-1.14637282258610,-0.649492872802199;7.83005542354891,0.775970066991494,-8.34841165587985,-9.53056294682028,-6.02817690644319,-5.07150189852252,-5.39370316836607,-3.45601904609045,-1.59459157735755,-2.16113969307143,-2.46191425973908,-0.842349830627002;7.08070617763245,2.64073601114737,-5.69258189831835,-9.45455146558611,-7.18253261654048,-4.65723808165221,-4.49769006884498,-3.82351580896533,-2.14764313884999,-2.06783122869298,-2.56311994065719,-1.08162510319291;5.30338615240640,3.84073359954042,-2.86965701561013,-9.45059307429258,-9.14536813045573,-3.31073773412743,0.842637412801565,0.554699961295544,-1.40505297471787,-2.71879670656333,-2.61320344628649,-0.999916927376488;3.10569915548717,3.38090850739264,-0.0794951139648172,-5.45824320989512,-7.061507
99389493,-3.08038324641694,1.68369028337882,2.53819959536040,-0.0523811710277615,-2.67112703199853,-2.87794263198372,-1.06975185750731;1.70533213757194,5.10058050577560,3.10037095421733,-4.93592461850606,-7.91868930233325,-1.49079972331970,4.32021067477619,2.36585968811023,-2.14815672841159,-3.31723408984283,-2.13617179365666,-0.802246214988725;2.15704666336311,4.46602394969611,2.24922250180016,-4.21318183576888,-6.13782019486470,-0.623839657129102,3.95047873098438,1.76898044838797,-2.75404039103051,-4.16008734345524,-2.77399773450878,-0.913026046768560;2.62676683894223,2.51655765011151,-0.447043834907020,-3.02554441368889,-0.894453250477268,3.04324323006874,2.38002643430364,-2.19269348998201,-4.88709715531977,-4.16625856620542,-2.41536228795465,-0.796116297666870;1.15319777182983,1.20379596659796,0.553402496136700,0.667210116098151,2.43525149119830,3.40937078893473,1.00312147023887,-2.91567714381015,-4.66867992350319,-3.84307269319891,-2.25733395413927,-0.766128678430922;-2.09243197223717,1.22002347047954,3.57247114239196,1.61707174270082,-0.358673492207187,0.632026988285516,0.789335616061468,-2.26610351813221,-4.80776113884556,-3.97642442363588,-1.80680938358159,-0.497481477081157;-5.54613328818907,-7.63086919832646,-4.85914732396804,0.358949645177525,0.853466114964847,-3.04282336620253,-4.21728283272533,-1.72303890477661,-0.651338725731605,-1.59186557659911,-1.42589456141306,-0.366720574402728;-5.44390113089781,-6.27208757222090,-2.91238200487901,1.88796818003593,3.15180836029830,1.29255202721880,0.321336867762490,0.324637967670616,-0.697757641614151,-1.43717944911415,-0.516026768550734,0.171512203995564;-8.24182209314572,-9.02929398137949,-4.86981068144915,0.105123093706848,0.643568606182415,-2.16445019420603,-3.36805541969622,-2.39991654083974,-2.05513561522542,-2.23240056524466,-1.31485165447946,-0.213956337567454;-8.72789349417727,-9.90979684028138,-5.42807067860905,0.472751858429949,1.27806647201230,-2.02858821225129,-3.03239097322009,-0.790892093576028,0.34
4032389544138,-0.507991663270559,-0.770296087619528,-0.187722929832367;-3.84960237909830,-2.37086003233349,1.00289902544100,2.60989200770772,1.41184187374441,-0.396985948166406,-1.31494248842683,-2.10220389887235,-3.05097072657737,-2.96834027280220,-1.64113867436586,-0.400286957854115;3.56957255231716,-0.907973826920979,-6.71371895214135,-6.14196532026549,0.249528211023774,3.84164433365373,0.201550308328275,-5.32125925861299,-6.45021434529576,-3.94430122091966,-1.62429837394415,-0.452311761567839;3.63045505634974,-1.03776462853281,-6.26072879668372,-5.26794203218961,-0.177852453028166,2.19002221965325,-0.334573789080238,-4.45196479067683,-6.30798501786690,-4.77208759268115,-1.82349853146501,-0.158118926993572;1.98938006447405,-0.0325816587295955,-3.49591809478721,-4.13046466198497,-0.321305315516148,3.49827535222856,2.01139639528249,-3.43549159009661,-6.53605170510628,-4.69466574497830,-1.40508363773093,-0.0428434925012992;-4.48973718459526,-2.13325914936847,1.44961496350584,2.68604358838788,2.93043778125031,3.53894068632447,2.34822862488275,-1.27952091060754,-3.90962705052606,-3.30515740008223,-1.34147312192545,-0.236182286059822;-6.36897974647100,-5.20752602007130,-0.264239386595744,3.30798323658032,1.95215277798776,-1.65918921388790,-3.08564613254258,-2.42389932804449,-2.10759095598133,-2.00535254256951,-1.04301529423882,-0.124809978974030;-1.36437736238015,-2.75965610589315,-1.84371737172631,2.64940042742208,6.79350155064443,5.73144216193387,0.129029066257984,-4.75156394110257,-5.43010919243582,-3.11350087694235,-0.875065855215043,-0.0990510648992851;-8.81648521793255,-7.11108554793183,-0.127908890224114,5.54434949255312,5.08534162982704,0.848148475780311,-1.66364044777122,-1.16697871371409,-0.469342719441197,-0.938976578898557,-1.25874360103974,-0.616022553856458;-10.1541417146605,-10.5959100698306,-6.02277992170454,-1.15830698399157,-0.280383365990228,-2.45220504841123,-3.60629806172453,-2.35285903038172,-0.832716488009152,-0.594893321530385,-0.814842186520405
,-0.461775497198708;-5.18089825991546,-7.35981835192460,-6.67852777520248,-3.94279145138424,-2.09917290400226,-1.48608755734041,-0.899668151954474,-1.44838349287662,-3.61682577312480,-4.34497594830270,-2.20350678451463,-0.230366558843506;-7.83027920338372,-9.20733215234427,-6.67906009719959,-2.63520534472246,-0.231992616499862,0.210943870338659,-0.0256876599756042,-0.942419317014597,-2.38011858026205,-2.57125698128354,-1.09657027470888,-0.00328097196321000;-7.94594519271502,-8.91529622678431,-7.14331946074132,-5.48919217886401,-5.06081872502042,-4.46408640782126,-2.94431621686490,-1.77302556222995,-1.65953506389284,-1.42181313320038,-0.433515993219803,0.162225631388930;-2.97138256096333,-2.38744661054110,-2.19282430509071,-3.71311193812218,-4.27869755001121,-2.56146147753729,-1.39843537912454,-2.64958818934978,-4.10602475624858,-3.44399534304378,-1.43953221161365,-0.110386890285081;-8.09029159381245,-7.49635361518510,-2.52078188345283,1.48493740550135,0.577383440365449,-2.77032911433443,-4.04086240274354,-3.23767523282871,-2.80182601130324,-2.56257331040581,-1.36825479249201,-0.204044011561308];
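% 16-bit fixed-point version of the trained template (hi VeriSilicon), scaled by 256
train_data_fixed=ceil(train_data*256)';

% MFCC buffer for the current speech segment (up to 100 frames x 12 coefficients)
cc=zeros(100,12);
% MFCC state: 0 = frame is silent, 1 = frame contains speech, 2 = a speech segment has ended
valid_sit=0;
% MFCC frame counter
mfcc_count=0;
for ind=1:1:floor(length(wav_data)/N_frame)
%   Take one frame of data; wav_data_in is the frame just obtained
	wav_data_in=wav_data((ind-1)*N_frame+1:ind*N_frame);
%   Extract the MFCC vector of this frame
	[cc,valid_sit,mfcc_count] = hello_fun(wav_data_in,valid_sit,cc,mfcc_count,ind);
%   valid_sit==2 means a speech segment has ended (back to silence)
    if valid_sit==2
        valid_sit=0;
%       Keep only the MFCC rows of the segment that just ended
        cc=cc(1:mfcc_count*1,:);
%       Run the DTW algorithm against the template
        dist=dtw(cc,train_data);
%       Clear the buffer once the computation is done
        cc=zeros(100,12);
        mfcc_count=0;
%       Record the result history
        disp(dist)
        dist_arrary(dist_ind)=dist;
        dist_ind=dist_ind+1;
    end
end
end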

Function hello_fun.m: the top level of the recognition algorithm. It handles each frame according to whether it contains speech:

function [cc,valid_sit,mfcc_count] = hello_fun(wav_data,valid_sit,cc,mfcc_count,ind)

global end_cnt
% Compute the power in the frequency domain and compare it with the thresholds
wav_data_fd=fft_power(wav_data);
vads_power=sum(wav_data_fd(6:end-6).*wav_data_fd(6:end-6));
% VAD with hysteresis: speech starts when vads_power exceeds 400 and ends
% once vads_power has not exceeded 20 for end_cnt consecutive frames
if vads_power>20
    end_cnt=4;
    % vads_power above 400: speech starts
    if valid_sit==0&&vads_power>400
        valid_sit=1;
        cc(1*mfcc_count+1:1*mfcc_count+1,:)=wxp_mfcc(wav_data);
        mfcc_count=mfcc_count+1;
    % vads_power above 20: speech continues
    elseif valid_sit==1
        cc(1*mfcc_count+1:1*mfcc_count+1,:)=wxp_mfcc(wav_data);
        mfcc_count=mfcc_count+1;        
    end
   % Speech too long: end the segment immediately
   if mfcc_count>25600/256
    valid_sit=2;
   end
   
else 
    % vads_power not above 20 for end_cnt consecutive frames: speech has ended
    if valid_sit==1
        if end_cnt>0
            end_cnt=end_cnt-1;
            cc(1*mfcc_count+1:1*mfcc_count+1,:)=wxp_mfcc(wav_data);
            mfcc_count=mfcc_count+1;
        else
            valid_sit=2;
            % Drop the trailing quiet frames that were appended while waiting
             mfcc_count=mfcc_count-4+end_cnt;
        end
    end
end
end

wxp_mfcc.m contains the main MFCC feature extraction code:

function cc=wxp_mfcc(AggrK)

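% Pre-emphasis
[AggrK] = fixed_filter(AggrK);
% Windowing
AggrK=fixed_mfcc_window(AggrK);
% Frequency-domain transform
[fft_AggrK] = fft_power(AggrK');
% Remaining processing steps
% M is the number of filters, N is the number of samples per frame
M=12; N=256;
% Normalized mel filter bank coefficients
bank=wxp_melbankm(M,N,8000,0,0.5,'m');
bank=full(bank);
bank=bank/max(bank(:));
% DCT coefficients, 12 x M (note that the denominator uses 2*24, not 2*M)
for i=1:12
  j=0:M-1;
  dctcoef(i,:)=cos((2*j+1)*i*pi/(2*24));
end
% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);
% Power spectrum (the small offset avoids log(0))
S=((fft_AggrK))+1e-15;
% Pass the power spectrum through the filter bank
P=bank*S(1:N/2+1,:);
% Take the logarithm, then apply the discrete cosine transform
D=dctcoef*log(P);
% Apply the cepstral liftering window
m=(D.*w')';
% Delta coefficients (computed but not included in the output)
dtm=-2*m(3-2,:);
dtm=dtm/3;
cc=[m];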
end

DTW implementation (dtw.m):

function dist = dtw(test, ref)
% global x y_min y_max
% global t r
% global D d
% global m n

t = test;
r = ref;
n = size(t,1);
m = size(r,1);

d = zeros(m,1);
D =  ones(m,1) * 65535;
D(1) = 0;

% If the two templates differ too much in length, matching fails
if (2*m-n<3) | (2*n-m<2)
	dist = 65535;
	return
end

% Compute the boundaries of the matching regions
xa = round((2*m-n)/3);
xb = round((2*n-m)*2/3);

if xb>xa
	% xb>xa: match over the following three regions
	%        1   :xa
	%        xa+1:xb
	%        xb+1:n
	for x = 1:xa
		y_max = 2*x;
		y_min = round(0.5*x);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
	for x = (xa+1):xb
		y_max = round(0.5*(x-n)+m);
		y_min = round(0.5*x);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
	for x = (xb+1):n
		y_max = round(0.5*(x-n)+m);
		y_min = round(2*(x-n)+m);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
elseif xa>xb
	% xa>xb: match over the following three regions
	%        1   :xb
	%        xb+1:xa
	%        xa+1:n
	for x = 1:xb
		y_max = 2*x;
		y_min = round(0.5*x);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
	for x = (xb+1):xa
		y_max = 2*x;
		y_min = round(2*(x-n)+m);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
	for x = (xa+1):n
		y_max = round(0.5*(x-n)+m);
		y_min = round(2*(x-n)+m);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
elseif xa==xb
	% xa==xb: match over the following two regions
	%        1   :xa
	%        xa+1:n
	for x = 1:xa
		y_max = 2*x;
		y_min = round(0.5*x);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
	for x = (xa+1):n
		y_max = round(0.5*(x-n)+m);
		y_min = round(2*(x-n)+m);
		[D]=wxp_wrap(D,y_min,y_max,x,t,r);
	end
end

% Return the matching score
dist = D(m);
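
The region split in dtw.m can be read as a parallelogram bounded by lines of slope 2 and 1/2 through the start point and the end point (n, m). The C sketch below computes the same band with min and max, before the rounding that dtw.m applies, so values at the region boundaries may differ by one cell. The struct and function names are hypothetical.

```c
/* Real-valued search band for test frame x (1-based) when matching a test
 * sequence of n frames against a reference of m frames, with the local path
 * slope limited to the range [1/2, 2]. */
struct dtw_band { double y_min; double y_max; };

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

struct dtw_band dtw_band_at(int x, int n, int m)
{
    struct dtw_band b;
    /* Upper edge: slope 2 from the origin, slope 1/2 into the end point. */
    b.y_max = min2(2.0 * x, 0.5 * (x - n) + m);
    /* Lower edge: slope 1/2 from the origin, slope 2 into the end point. */
    b.y_min = max2(0.5 * x, 2.0 * (x - n) + m);
    return b;
}
```

The two upper lines cross at x = (2m-n)/3 and the two lower lines cross at x = 2(2n-m)/3, which are the unrounded xa and xb in dtw.m; that is why the Matlab code switches formulas at those columns.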

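3. Results of the Matlab Implementation

The recognition target of the original code is "hi VeriSilicon".
Voice_data_1 is the correct utterance; the result is 2104.
Voice_data_0 is an incorrect utterance; the result is 11072.

Note that the computed result is a difference (distance) value: the smaller the result, the more confidently the target phrase can be considered recognized.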
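
Turning the distance into a yes/no decision only needs a threshold. The C sketch below shows one way to do it; the function name is hypothetical and the threshold is a tuning parameter that the original project does not specify (any value between the two results above, for example 5000, would separate these two recordings, but it would need checking against more data).

```c
/* Sentinel returned by dtw.m when the lengths are too different to match. */
#define DTW_NO_MATCH 65535L

/* Returns 1 when the DTW distance indicates the target phrase, else 0.
 * threshold is a hypothetical tuning parameter, not a value from the
 * original project. */
int is_target_phrase(long dist, long threshold)
{
    if (dist < 0 || dist >= DTW_NO_MATCH)
        return 0;
    return dist < threshold;
}
```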

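4. Fully Fixed-Point C Implementation of MFCC+DTW

You can simply download the project and run it. The main functions are shown here.
Fixed-point windowing algorithm, wxp_window.h: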

#include "hello_fun_top.h"
#include "wxp_mfcc.h"
#include "wxp_window.h"
void fixed_mfcc_window_fixpt(int AggrK[256], short AggrK2[256])
{
  int32_T i;
  static const uint16_T b0[256] = { 5242U, 5252U, 5279U, 5325U, 5389U, 5471U,
    5571U, 5690U, 5826U, 5981U, 6153U, 6343U, 6551U, 6776U, 7018U, 7278U, 7555U,
    7849U, 8159U, 8486U, 8829U, 9189U, 9564U, 9955U, 10362U, 10783U, 11220U,
    11671U, 12137U, 12617U, 13110U, 13618U, 14138U, 14671U, 15217U, 15775U,
    16345U, 16927U, 17519U, 18123U, 18737U, 19361U, 19995U, 20638U, 21291U,
    21951U, 22620U, 23297U, 23981U, 24672U, 25370U, 26073U, 26782U, 27497U,
    28216U, 28940U, 29667U, 30398U, 31132U, 31869U, 32607U, 33348U, 34089U,
    34832U, 35575U, 36317U, 37059U, 37800U, 38540U, 39278U, 40013U, 40746U,
    41475U, 42201U, 42922U, 43639U, 44351U, 45057U, 45758U, 46452U, 47140U,
    47820U, 48493U, 49158U, 49814U, 50462U, 51101U, 51730U, 52349U, 52958U,
    53556U, 54144U, 54719U, 55283U, 55835U, 56375U, 56902U, 57416U, 57916U,
    58403U, 58876U, 59334U, 59778U, 60207U, 60621U, 61020U, 61403U, 61771U,
    62122U, 62457U, 62776U, 63078U, 63363U, 63632U, 63883U, 64117U, 64333U,
    64532U, 64713U, 64877U, 65022U, 65150U, 65259U, 65350U, 65423U, 65478U,
    65515U, 65533U, 65533U, 65515U, 65478U, 65423U, 65350U, 65259U, 65150U,
    65022U, 64877U, 64713U, 64532U, 64333U, 64117U, 63883U, 63632U, 63363U,
    63078U, 62776U, 62457U, 62122U, 61771U, 61403U, 61020U, 60621U, 60207U,
    59778U, 59334U, 58876U, 58403U, 57916U, 57416U, 56902U, 56375U, 55835U,
    55283U, 54719U, 54144U, 53556U, 52958U, 52349U, 51730U, 51101U, 50462U,
    49814U, 49158U, 48493U, 47820U, 47140U, 46452U, 45758U, 45057U, 44351U,
    43639U, 42922U, 42201U, 41475U, 40746U, 40013U, 39278U, 38540U, 37800U,
    37059U, 36317U, 35575U, 34832U, 34089U, 33348U, 32607U, 31869U, 31132U,
    30398U, 29667U, 28940U, 28216U, 27497U, 26782U, 26073U, 25370U, 24672U,
    23981U, 23297U, 22620U, 21951U, 21291U, 20638U, 19995U, 19361U, 18737U,
    18123U, 17519U, 16927U, 16345U, 15775U, 15217U, 14671U, 14138U, 13618U,
    13110U, 12617U, 12137U, 11671U, 11220U, 10783U, 10362U, 9955U, 9564U, 9189U,
    8829U, 8486U, 8159U, 7849U, 7555U, 7278U, 7018U, 6776U, 6551U, 6343U, 6153U,
    5981U, 5826U, 5690U, 5571U, 5471U, 5389U, 5325U, 5279U, 5252U, 5242U };
  /*      hamming_256=hamming(256); */
  for (i = 0; i < 256; i++) {
    AggrK2[i] = (int16_T)((AggrK[i] * b0[i]) >> 16);
  }
}
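
The table b0 is consistent with hamming(256) scaled by 65536 (its first entry, 5242, corresponds to 0.08 x 65536), so each multiply-and-shift applies a window coefficient in unsigned Q16 format. The sketch below isolates that single operation; the function name is hypothetical, and unlike the listing above it uses a 64-bit intermediate and integer division, which is an assumption made here to keep the behavior well defined for negative and large samples.

```c
#include <stdint.h>

/* Multiply a sample by an unsigned Q16 window coefficient (65536 == 1.0).
 * The 64-bit intermediate cannot overflow for any 32-bit sample, and the
 * division truncates toward zero for negative samples. */
int32_t q16_window_mul(int32_t sample, uint16_t coeff_q16)
{
    int64_t product = (int64_t)sample * (int64_t)coeff_q16;
    return (int32_t)(product / 65536);
}
```

For example, a sample of 1000 at the window edge (coefficient 5242) becomes 79, and at the window center (coefficient 65533) it becomes 999. With the >> 16 form used in the listing, negative products are typically rounded toward minus infinity instead, so results for negative samples can differ by one.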

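The remaining functions are not listed here to avoid padding the article; download the project to view them.

5. Results of the Fully Fixed-Point C Implementation

Look at the main function: the call hello_fun_top(Voice_data_1);
selects which audio data is recognized. To recognize other data, just change this array.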

/* Include Files */
#include "stdio.h"
#include <windows.h>
#include "main.h"
#include "hello_fun_top.h"
/*
 * Arguments    : void
 * Return Type  : void
 */
void main_hello_fun(void)
{
  unsigned long CNT1,CNT2;
  //c_wav_data_false3
  //c_wav_data_true_slow1
  /* Call the entry-point 'hello_fun'. */
  double run_time;
  LARGE_INTEGER time_start;	// start time
  LARGE_INTEGER time_over;	// end time
  double dqFreq;		// timer frequency as a double
  LARGE_INTEGER f;	// timer frequency
  QueryPerformanceFrequency(&f);
  dqFreq=(double)f.QuadPart;
  QueryPerformanceCounter(&time_start);	// start timing
  // run the recognition
  hello_fun_top(Voice_data_1);

  QueryPerformanceCounter(&time_over);	// stop timing
  run_time=1000000*(time_over.QuadPart-time_start.QuadPart)/dqFreq;

  printf("\nrun_time: %fus\n",run_time);

  // printf("%f",fs_tmp);
}

/*
 * Arguments    : void
 * Return Type  : int
 */
int main(void)
{
  main_hello_fun();

  return 0;
}

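By default, hello_fun_top() takes 16-bit signed input data in an array of 25600 samples. To change this, you must both add the new data array in main.h and update the parameters in wxp_config.h (the data length to process is wave_data_remain = 100 frames of 256 samples each, 25600 samples in total):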

#ifndef WXP_CONFIG_H
#define WXP_CONFIG_H

/* Include Files */
#include <stddef.h>
#include <stdlib.h>
#include "rtwtypes.h"
#include "hello_fun_types.h"

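// Power-spectrum threshold for deciding whether speech is present
#define VADS_THRESHOLD 20 
// Power-spectrum threshold for deciding that speech has started
#define VADS_BEGIN_THRESHOLD 400
// Overlong speech is truncated; the maximum number of frames is MFCC_CNT_MAX
#define MFCC_CNT_MAX 100 
// Number of frames to wait after speech disappears
#define FRAME_END_CNT_MAX 4 
// Dimensions of the CC matrix
#define CC_SIZE00 12
#define CC_SIZE01 100

// Length of the data to process, in frames
#define wave_data_remain 100

// Length of the trained MFCC template, in frames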
#define Train_SIZE01 42

static const short Mfcc_Template[Train_SIZE01*CC_SIZE00] = {-88,889,1208,872,-123,-1303,-1800,-2252,-2620,-2710,-2328,-1589,-954,97,1344,2005,1813,1358,796,437,553,673,296,-535,-1419,-1393,-2109,-2234,-985,914,930,510,-1149,-1630,-349,-2257,-2599,-1326,-2004,-2034,-760,-2071,-358,718,292,178,-167,-1495,-1719,-2350,-2445,-2830,-2305,-769,-147,763,503,199,677,984,866,1306,1144,645,309,313,-1953,-1605,-2311,-2536,-606,-232,-265,-8,-546,-1333,-706,-1820,-2712,-1884,-2357,-2282,-611,-1919,-1131,-620,-1374,-1142,-343,-1126,-634,-1257,-805,-1763,-1070,606,848,643,-1211,-2137,-1457,-734,-20,794,576,-114,142,915,-1243,-745,-1246,-1389,257,-1718,-1602,-894,372,-67,-471,-32,-1541,-1709,-1709,-1828,-561,-645,-2170,-2307,-2282,-1863,-594,-695,366,-176,605,-857,-80,831,537,-675,-2134,-2439,-2420,-2419,-1397,-1263,-1078,-774,171,414,92,484,27,122,669,-1572,-1348,-1057,688,847,679,1420,-296,-1009,-674,-1405,-950,381,-2600,-2678,-2013,-1518,-620,-725,335,-193,387,-967,-458,-106,-733,-1565,-1750,-1543,-1838,-2341,-1807,-2027,-1571,-228,624,-91,219,807,165,328,362,64,-45,-82,751,500,1740,1302,-71,-537,-59,-1295,-1095,148,-2005,-1579,-1328,-820,-446,-1177,-430,-801,-797,-1440,-1398,-871,-1312,-1080,-1163,-1298,-1192,-847,-788,-381,-159,780,873,162,-778,331,-554,-519,-101,984,561,896,906,-424,1468,218,-627,-380,55,-1142,-655,-709,-1109,-584,-901,-536,-483,-1580,-1052,-907,-1337,-1345,-1497,-800,-676,-324,-1104,-1380,-1151,216,432,1106,1012,610,257,203,-1079,83,-862,-776,-336,52,-85,515,602,-789,34,-425,-923,-230,-6,-753,-357,-1034,-757,-534,-720,-602,-780,-1622,-1270,-618,-969,-900,-903,-395,107,-221,-1001,-884,-978,143,650,606,453,-561,-746,-580,-441,84,-614,-202,-538,-1362,-1139,-879,-327,-620,-1216,-298,-602,-370,-241,-453,-678,-828,-804,-697,-511,-581,-843,-1332,-1240,-617,-597,-737,-617,-206,215,-315,-526,-408,-549,-359,-13,-549,-705,-1251,-1195,-1230,-166,-178,-526,89,-781,-1651,-1614,-1673,-1000,-539,-1390,-120,-213,-925,-609,-424,-1051,-717,-636,-473,-211,-365,-449,-847,-901,-71
5,-536,-706,-648,-222,-29,-219,-248,-553,-529,-696,-683,-849,-1064,-1066,-983,-1017,-407,-367,-571,-130,-759,-1009,-1221,-1201,-846,-513,-797,-240,-152,-1112,-658,-363,-881,-656,-240,-165,17,-115,-24,-344,-344,-452,-396,-430,-453,-230,-142,-170,-293,-630,-656,-668,-736,-546,-710,-618,-577,-462,-365,-132,-336,-197,-420,-415,-466,-359,-343,-267,-224,-322,-208,-564,-280,-110,-368,-350,-17,-37,47,19,61,-45,-17,-97,-123,-106,-134,-133,-100,-137,-166,-215,-276,-255,-273,-205,-233,-203,-196,-127,-93,44,-54,-48,-102,-115,-40,-10,-60,-31,-25,-157,-118,-58,0,42,-28,-52};

#endif

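All parameters of the wxp_mfcc algorithm that do not need to be computed at run time, such as the mel filter bank array and the DCT array, are hard-coded as constants in the C code. If you need a different configuration, these arrays must be regenerated!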
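
The same applies to the template. Mfcc_Template appears to be the Matlab train_data quantized as in train_data_fixed = ceil(train_data*256)': its first values -88, 889, 1208 match the first coefficient of template frames 1 to 3, and its 43rd value, -358, matches the second coefficient of frame 1. A minimal C sketch of that conversion follows; the function names, the saturation step, and the inferred coefficient-major layout are assumptions.

```c
#include <stdint.h>

#define TEMPLATE_FRAMES 42   /* Train_SIZE01 in wxp_config.h */
#define TEMPLATE_COEFFS 12   /* CC_SIZE00 in wxp_config.h */

/* Quantize one MFCC value to Q8 (scale 256) with ceiling rounding,
 * saturating to the int16 range. */
int16_t mfcc_to_q8(double value)
{
    double scaled = value * 256.0;
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    int32_t q = (int32_t)scaled;          /* truncates toward zero */
    if ((double)q < scaled) q++;          /* bump up to get the ceiling */
    return (int16_t)q;
}

/* Index of coefficient coeff (0-based) of frame frame (0-based) in a flat
 * template stored coefficient by coefficient. This layout is inferred from
 * the leading values of Mfcc_Template and is an assumption. */
int template_index(int frame, int coeff)
{
    return coeff * TEMPLATE_FRAMES + frame;
}
```

A new template recorded in Matlab could be converted value by value with mfcc_to_q8 and written out in template_index order, with Train_SIZE01 updated to the new frame count.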

The recognition target of the original code is "hi VeriSilicon".
Voice_data_1 is the correct utterance; the result is 2476.
Voice_data_0 is an incorrect utterance; the result is 11942.
