Zhankun Luo
PUID: 0031195279
Email: [email protected]
Fall-2018-ECE-59500-009
Instructor: Toma Hentea
Dynamic Time Warping in Speech Recognition
Experiment with the MATLAB script IsoDigitRec.m, which matches unknown digit recordings against the template recordings of the digits (zero.wav, one.wav, …) provided with the textbook's software.
Three changes are needed to run the script in the current environment:
- ind=strfind(curDir,'\'); is changed to ind=strfind(curDir,'/'); (Unix-style path separator).
- wavread has been removed from recent MATLAB releases, so calls of the form [x, Fs, bits] = wavread() are replaced with [x,Fs]=audioread().
- protoNames={'zero', ...} is changed to protoNames={'zero.wav',...}, and the remaining file names accordingly.

% IsoDigitRec.m (Example 5.4)
% "Introduction to Pattern Recognition: A MATLAB Approach"
% S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras
% As a first step, the data folder of Chapter 5 is appended to the existing
% MATLAB path.
curDir=pwd;
ind=strfind(curDir,'/');
curDir(ind(end)+1:end)=[];
addpath([curDir 'data'],'-end');
close('all');
clear;
% To build the system, we will use short-term Energy and short-term
% Zero-Crossing Rate (Section 7.5.4, [Theo 09]) as features, so that each
% signal is represented by a sequence of two-dimensional feature vectors.
% Note that this is not an optimal feature set in any sense and it has only
% been adopted on the basis of simplicity. The feature extraction stage is
% accomplished by typing the following code:
protoNames={'zero.wav','one.wav','two.wav','three.wav','four.wav','five.wav','six.wav','seven.wav','eight.wav','nine.wav'};
for i=1:length(protoNames)
    [x,Fs]=audioread(protoNames{i});
    winlength = round(0.02*Fs); % 20 ms moving window length
    winstep = winlength;        % moving window step. No overlap
    [E,T]=stEnergy(x,Fs,winlength,winstep);
    [Zcr,T]=stZeroCrossingRate(x,Fs,winlength,winstep);
    protoFSeq{i}=[E;Zcr];
end
% To find the best match for an unknown pattern, say a pattern stored in file
% "upattern1.wav", type the following code:
[test,Fs]=audioread('upattern1.wav');
winlength = round(0.02*Fs); % use the same values as before
winstep = winlength;
[E,T]=stEnergy(test,Fs,winlength,winstep);
[Zcr,T]=stZeroCrossingRate(test,Fs,winlength,winstep);
Ftest=[E;Zcr];
tolerance=0.1;
LeftEndConstr=round(tolerance/winstep); % left endpoint constraint
RightEndConstr = LeftEndConstr;
for i=1:length(protoNames)
    [MatchingCost(i),BestPath{i},D{i},Pred{i}]=DTWSakoeEndp(protoFSeq{i},Ftest,LeftEndConstr,RightEndConstr,0);
end
[minCost,indexofBest]=min(MatchingCost);
fprintf('The unknown pattern has been identified as a "%s" \n',protoNames{indexofBest});
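The feature functions stEnergy and stZeroCrossingRate used above come with the textbook's software. For readers who do not have it, the following is a minimal sketch of one plausible frame-wise implementation; the helper names myStEnergy and myStZeroCrossingRate, their exact signatures, and the returned frame-center times T are assumptions, not the book's code.

% Minimal sketch of frame-wise short-term features (assumed interfaces,
% not the textbook implementations).
function [E,T]=myStEnergy(x,Fs,winlength,winstep)
x=x(:);                                              % force column vector
numFrames=floor((length(x)-winlength)/winstep)+1;
E=zeros(1,numFrames); T=zeros(1,numFrames);
for k=1:numFrames
    frame=x((k-1)*winstep+1:(k-1)*winstep+winlength);
    E(k)=sum(frame.^2)/winlength;                    % mean energy of the frame
    T(k)=((k-1)*winstep+winlength/2)/Fs;             % frame center time (s)
end
end

function [Zcr,T]=myStZeroCrossingRate(x,Fs,winlength,winstep)
x=x(:);
numFrames=floor((length(x)-winlength)/winstep)+1;
Zcr=zeros(1,numFrames); T=zeros(1,numFrames);
for k=1:numFrames
    frame=x((k-1)*winstep+1:(k-1)*winstep+winlength);
    Zcr(k)=sum(abs(diff(sign(frame)))>0)/(winlength-1); % fraction of sign changes
    T(k)=((k-1)*winstep+winlength/2)/Fs;             % frame center time (s)
end
end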
To classify the remaining test recordings, change [test,Fs]=audioread('upattern1.wav'); to [test,Fs]=audioread('upattern02.wav');, and so on for the other upattern files, then rerun the script. The results obtained for all test patterns are:
Name of Pattern | Identified as |
---|---|
upattern1.wav | zero.wav |
upattern02.wav | zero.wav |
upattern11.wav | zero.wav |
upattern12.wav | one.wav |
upattern13.wav | one.wav |
upattern14.wav | three.wav |
upattern15.wav | zero.wav |
upattern16.wav | four.wav |
upattern17.wav | four.wav |
upattern21.wav | three.wav |
upattern22.wav | two.wav |
upattern23.wav | two.wav |
upattern51.wav | five.wav |
upattern61.wav | six.wav |
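For intuition about the matching cost behind these results, the following is a minimal sketch of plain dynamic time warping between two feature sequences, i.e., a simplified stand-in for DTWSakoeEndp that uses Euclidean local distances and ignores the Sakoe endpoint constraints; the function name simpleDTWCost is hypothetical. The classifier then picks the template with the smallest such cost, which is what the min over MatchingCost does in the script.

function cost=simpleDTWCost(P,T)
% P, T: feature sequences, one column per frame (e.g., [E; Zcr]).
% Returns the accumulated DTW matching cost (smaller = better match).
I=size(P,2); J=size(T,2);
d=zeros(I,J);
for i=1:I
    for j=1:J
        d(i,j)=norm(P(:,i)-T(:,j));       % Euclidean local distance
    end
end
D=inf(I,J);                               % accumulated cost matrix
D(1,1)=d(1,1);
for i=1:I
    for j=1:J
        if i==1 && j==1, continue; end
        prev=inf;
        if i>1,        prev=min(prev,D(i-1,j));   end
        if j>1,        prev=min(prev,D(i,j-1));   end
        if i>1 && j>1, prev=min(prev,D(i-1,j-1)); end
        D(i,j)=d(i,j)+prev;               % standard DTW recursion
    end
end
cost=D(I,J);                              % cost of the best warping path
end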
HMM recognition and training
Run example633.m, example634.m, example635.m, and example636.m. Before doing so, copy BackTracking.m from the Chapter 5 folder into the Chapter 6 function & example folder, because MultSeqTrainDoHMMVITsc.m calls the function BackTracking.m.
% CHAPTER 6: m-files
%
% BWDoHMMsc              - Computes the recognition probability of an HMM, given a sequence
%                          of discrete observations, by means of the scaled version of the
%                          Baum-Welch (any-path) method.
% BWDoHMMst              - Same as BWDoHMMsc, except that no scaling is employed.
% MultSeqTrainCoHMMBWsc  - Baum-Welch training (scaled version) of a Continuous Observation
%                          HMM, given multiple training sequences. Each sequence consists of
%                          l-dimensional feature vectors. It is assumed that the pdf
%                          associated with each state is a multivariate Gaussian mixture.
% MultSeqTrainDoHMMBWsc  - Baum-Welch training (scaled version) of a Discrete Observation
%                          HMM, given multiple training sequences.
% MultSeqTrainDoHMMVITsc - Viterbi training (scaled version) of a Discrete Observation HMM,
%                          given multiple training sequences.
% VitCoHMMsc             - Computes the scaled Viterbi score of an HMM, given a sequence of
%                          l-dimensional vectors of continuous observations, under the
%                          assumption that the pdf of each state is a Gaussian mixture.
% VitCoHMMst             - Same as VitCoHMMsc, except that no scaling is employed.
% VitDoHMMsc             - Computes the scaled Viterbi score of a Discrete Observation HMM,
%                          given a sequence of observations.
% VitDoHMMst             - Same as VitDoHMMsc, except that no scaling is employed.
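To make the Viterbi scoring and the role of backtracking concrete, here is a minimal log-domain sketch for a discrete-observation HMM in the spirit of VitDoHMMst; the function name myVitDoHMM, the argument order, and the conventions A(i,j)=P(state j | state i) and B(o,k)=P(symbol o | state k) are assumptions, not the toolbox interface.

function [logScore,bestPath]=myVitDoHMM(piInit,A,B,O)
% piInit: K-by-1 initial state probabilities
% A     : K-by-K transition matrix, A(i,j)=P(state j | state i)
% B     : M-by-K emission matrix, B(o,k)=P(symbol o | state k)
% O     : 1-by-N sequence of observed symbol indices
K=length(piInit); N=length(O);
logDelta=-inf(K,N);                          % best log-probability so far
psi=zeros(K,N);                              % back-pointers
logDelta(:,1)=log(piInit(:))+log(B(O(1),:)');
for n=2:N
    for k=1:K
        [val,arg]=max(logDelta(:,n-1)+log(A(:,k)));
        logDelta(k,n)=val+log(B(O(n),k));
        psi(k,n)=arg;
    end
end
bestPath=zeros(1,N);
[logScore,bestPath(N)]=max(logDelta(:,N));
for n=N-1:-1:1                               % backtrack the best state sequence
    bestPath(n)=psi(bestPath(n+1),n+1);
end
end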
epoch = 1
epoch = 2
piTrained_1 =
0.7141
0.2859
ATrained_1 =
0.6743 0.3257
0.6746 0.3254
BTrained_1 =
0.7672 0.3544
0.2328 0.6456
% press any key
epoch = 1
epoch = 2
epoch = 3
epoch = 4
epoch = 5
epoch = 6
epoch = 7
epoch = 8
epoch = 9
epoch = 10
epoch = 11
epoch = 12
epoch = 13
piTrained_2 =
1
0
ATrained_2 =
1.0000 0.0000
0 1.0000
BTrained_2 =
0.6333 0
0.3667 1.0000
theEpoch = 1
theEpoch = 2
piTrained_1 =
0.6857
0.3143
ATrained_1 =
0.6278 0.3722
0.6288 0.3712
BTrained_1 =
1 0
0 1
epoch = 1
epoch = 2
piTrained_1 =
0.7141
0.2859
ATrained_1 =
0.6743 0.3257
0.6746 0.3254
BTrained_1 =
0.7672 0.3544
0.2328 0.6456
% press any key
epoch = 1
epoch = 2
epoch = 3
epoch = 4
epoch = 5
epoch = 6
epoch = 7
epoch = 8
epoch = 9
epoch = 10
epoch = 11
epoch = 12
epoch = 13
piTrained_2 =
1
0
ATrained_2 =
1.0000 0.0000
0 1.0000
BTrained_2 =
0.6333 0
0.3667 1.0000
Pr1 = -8.8513
Pr2 = -15.1390
bs1 =
1 1 1 1 1 1 1 2 2 2 2 2 2
bs2 =
1 2 2 2 2 2 2 2 2 2 2 2 2
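Assuming Pr1 and Pr2 are the log-scaled scores of the same observation sequence under the two trained HMMs and bs1, bs2 the corresponding best-state sequences (an assumption based on the variable names), recognition amounts to assigning the sequence to the model with the larger score; here Pr1 > Pr2, so the first model would be selected:

% Hypothetical recognition rule: pick the model with the larger score.
[~,recognizedModel]=max([Pr1,Pr2]);   % gives recognizedModel = 1 for the values above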