GMM-HMM语音识别原理
1. HMM
隐马尔科夫模型(HMM)是一种统计模型,用来描述含有隐含参数的马尔科夫过程。难点是从隐含状态确定出马尔科夫过程的参数,以此作进一步的分析。下图是一个三个状态的隐马尔可夫模型状态转移图,其中x 表示隐含状态,y 表示可观察的输出,a 表示状态转换概率,b 表示输出概率:
a:转移概率
b:输出概率
y:观测状态
x:隐含状态
一个HMM模型是一个五元组(π,A,B,O,S)
其中π:初始概率向量
A: 转移概率
B: 输出概率
O:观测状态
S:隐含状态
围绕着HMM,有三个和它相关的问题:
1. 已知隐含状态数目,已知转移概率,根据观测状态,估计观测状态对应的隐含状态
2. 已知隐含状态数目,已知转移概率,根据观测状态,估计输出这个观测状态的概率
3. 已知隐含状态数目,根据大量实验结果(很多组观测状态),反推转移概率
对于第一个问题,所对应的就是维特比算法
对于第二个问题,所对应的就是前向算法
对于第三个问题,就是前向后向算法
语音识别的过程其实就是上述的第一个问题。根据HMM模型和观测状态(即语音信号数字信号处理后的内容),得到我们要的状态,而状态的组合就是识别出来的文本。
为啥呢?
1) 在语音处理中,一个word有一个或多个音素构成。怎么说呢?这里补充一下语言学的一些知识。在语言学中,word(字)还可以再分解成音节(syllable),音节可以再分成音素(phoneme),音素之后就不能再分了.因此音素是语音中最小的单元,不管语音识别还是语音合成,都是在最小单元上进行操作的,即phoneme。比如我们的“我”,它的拼音是wo3(这个其实就是word,即字),由于中文的word和syllable是相同的,即中文是单音节语言,即中文有且只有一个音节,但英文就不一样,比如hello这个单词,他就是有两个音节,hello=hae|low,即hello有hae和low这两个音节组成.音节下一层是phoneme(音素),语音识别的过程就是把这些 音素找到,然后音素拼接成音节,音节在拼接成word.如下图所示
在识别过程中如果识别出来了音素,向上递归,就能得到word。
因此我们的目标是获取音素phoneme.
2) 在训练时,一个HMM对应一个音素,每个HMM包含n个state(状态),有的是3个状态,有的是5个状态,状态数在训练前确定后,一旦训练完成所有HMM的状态个数都是一致的,比如3个。
GMM是当做发射概率的,即在已知观测状态情况下,哪一种音素会产生这种状态的概率,概率最大的就是我们要的音素。因此,GMM是来计算某一个音素的概率。
GMM的全称是gaussmixture model(高斯混合模型),在训练前,一般会定义由几个高斯来决定音素的概率(高斯数目是超参数)。如下图所示为3高斯:
假设现在我们定义一个HMM由3个状态组成,每个状态中有一个GMM,每个GMM中是由3个gauss。
如上图假设y有状态1,2,3组成,每一个状态下面有一个GMM,高斯个数是3.
由此我们训练的参数有HMM的转移概率矩阵+每一个单高斯的方差,均值和权重(当然还有一个初始概率矩阵).如果我们能得到这些参数,我们是不是就能进行语音识别了?
接下来,就看看GMM-HMM到底是如何做到的?
1:将送进来的语音信号进行分帧(一般是20ms一帧,帧移是10ms),然后提取特征
2:用GMM,判断当前的特征序列属于哪个状态(计算概率)
3:根据前面两个步骤,得出状态序列,其实就得到了音素序列,即得到了声韵母序列。
如下面图所示.
对于GMM-HMM实现语音识别(确切的说是非连续语音识别),到此基本上就结束了,对于连续语音识别而言,还有一个语言模型(主要是通过语料库,n-gram模型)。而前面的GMM-HMM就是声学模型.
代码解析:
下面是关于GMM-HMM声学模型,特征序列提取到训练,并且实现识别的完整代码(操作系统:ubuntu16.04,python2)
该demo总共有三个文件,
1:gParam.py,主要是为了配置一些参数
2:核心文件是my_hmm.py,里面实现的是主要代码。
3:test.py是运行文件.
该demo主要是实现识别阿拉伯数字,1,2,3,4.....你可以自己录制训练数据和测试数据.然后设置好路径,运行下面的程序.
程序是完整的
程序是完整的
程序是完整的
说三遍.
wav数据的格式是:
gParam.py 代码解析:
#! /usr/bin python
# encoding:utf-8
TRAIN_DATA_PATH = './data/train/'
TEST_DATA_PATH = './data/test/'
NSTATE = 4
NPDF = 3
MAX_ITER_CNT = 100
NUM=10
这个就没什么好说的。设置路径参数而已.
核心文件my_hmm.py:
#! /usr/bin python
# encoding:utf_8
import numpy as np
from numpy import *
from sklearn.cluster import KMeans
from scipy import sparse
import scipy.io as sio
from scipy import signal
import wave
import math
import gParam
import copy
def pdf(m,v,x):
'''计算多元高斯密度函数
输入:
m---均值向量 SIZE×1
v---方差向量 SIZE×1
x---输入向量 SIZE×1
输出:
p---输出概率'''
test_v = np.prod(v,axis=0)
test_x = np.dot((x-m)/v,x-m)
p = (2*math.pi*np.prod(v,axis=0))**-0.5*np.exp(-0.5*np.dot((x-m)/v,x-m))
return p
# class of every sample infomation
class sampleInfo:
"""docstring for ClassName"""
def __init__(self):
self.smpl_wav = []
self.smpl_data = []
self.seg = []
def set_smpl_wav(self,wav):
self.smpl_wav.append(wav)
def set_smpl_data(self,data):
self.smpl_data.append(data)
def set_segment(self, seg_list):
self.seg = seg_list
#class of mix info from KMeans
class mixInfo:
"""docstring for mixInfo"""
def __init__(self):
self.Cmean = []
self.Cvar = []
self.Cweight = []
self.CM = []
class hmmInfo:
'''hmm model param'''
def __init__(self):
self.init = [] #初始矩阵
self.trans = [] #转移概率矩阵
self.mix = [] #高斯混合模型参数
self.N = 0 #状态数
# class of gmm_hmm model
class gmm_hmm:
def __init__(self):
self.hmm = [] #单个hmm序列,
self.gmm_hmm_model = [] #把所有的训练好的gmm-hmm写入到这个队列
self.samples = [] # 0-9 所有的音频数据
self.smplInfo = [] #这里面主要是单个数字的音频数据和对应mfcc数据
self.stateInfo = [gParam.NPDF,gParam.NPDF,gParam.NPDF,gParam.NPDF]#每一个HMM对应len(stateInfo)个状态,每个状态指定高斯个数(3)
def loadWav(self,pathTop):
for i in range(gParam.NUM):
tmp_data = []
for j in range(gParam.NUM):
wavPath = pathTop + str(i) + str(j) + '.wav'
f = wave.open(wavPath,'rb')
params = f.getparams()
nchannels,sampwidth,framerate,nframes = params[:4]
str_data = f.readframes(nframes)
#print shape(str_data)
f.close()
wave_data = np.fromstring(str_data,dtype=short)/32767.0
#wave_data.shape = -1,2
#wave_data = wave_data.T
#wave_data = wave_data.reshape(1,wave_data.shape[0]*wave_data.shape[1])
#print shape(wave_data),type(wave_data)
tmp_data.append(wave_data)
self.samples.append(tmp_data)
#循环读数据,然后进行训练
def hmm_start_train(self):
Nsmpls = len(self.samples)
for i in range(Nsmpls):
tmpSmplInfo0 = []
n = len(self.samples[i])
for j in range(n):
tmpSmplInfo1 = sampleInfo()
tmpSmplInfo1.set_smpl_wav(self.samples[i][j])
tmpSmplInfo0.append(tmpSmplInfo1)
#self.smplInfo.append(tmpSmplInfo0)
print '现在训练第%d个HMM模型' %i
hmm0 = self.trainhmm(tmpSmplInfo0,self.stateInfo)
print '第%d个模型已经训练完毕' %i
# self.gmm_hmm_model.append(hmm0)
#训练hmm
def trainhmm(self,sample,state):
K = len(sample)
print '首先进行语音参数计算-MFCC'
for k in range(K):
tmp = self.mfcc(sample[k].smpl_wav)
sample[k].set_smpl_data(tmp) # 设置MFCCdata
hmm = self.inithmm(sample,state)
pout = zeros((gParam.MAX_ITER_CNT,1))
for my_iter in range(gParam.MAX_ITER_CNT):
print '第%d遍训练' %my_iter
hmm = self.baum(hmm,sample)
for k in range(K):
pout[my_iter,0] = pout[my_iter,0] + self.viterbi(hmm,sample[k].smpl_data[0])
if my_iter > 0:
if(abs((pout[my_iter,0] - pout[my_iter-1,0])/pout[my_iter,0]) < 5e-6):
print '收敛'
self.gmm_hmm_model.append(hmm)
return hmm
self.gmm_hmm_model.append(hmm)
#获取MFCC参数
def mfcc(self,k):
M = 24 #滤波器的个数
N = 256 #一帧语音的采样点数
arr_mel_bank = self.melbank(M,N,8000,0,0.5,'m')
arr_mel_bank = arr_mel_bank/np.amax(arr_mel_bank)
#计算DCT系数, 12*24
rDCT = 12
cDCT = 24
dctcoef = []
for i in range(1,rDCT+1):
tmp = [np.cos((2*j+1)*i*math.pi*1.0/(2.0*cDCT)) for j in range(cDCT)]
dctcoef.append(tmp)
#归一化倒谱提升窗口
w = [1+6*np.sin(math.pi*i*1.0/rDCT) for i in range(1,rDCT+1)]
w = w/np.amax(w)
#预加重
AggrK = double(k)
AggrK = signal.lfilter([1,-0.9375],1,AggrK)# ndarray
#AggrK = AggrK.tolist()
#分帧
FrameK = self.enframe(AggrK[0],N,80)
n0,m0 = FrameK.shape
for i in range(n0):
#temp = multiply(FrameK[i,:],np.hamming(N))
#print shape(temp)
FrameK[i,:] = multiply(FrameK[i,:],np.hamming(N))
FrameK = FrameK.T
#计算功率谱
S = (abs(np.fft.fft(FrameK,axis=0)))**2
#将功率谱通过滤波器组
P = np.dot(arr_mel_bank,S[0:129,:])
#取对数后做余弦变换
D = np.dot(dctcoef,log(P))
n0,m0 = D.shape
m = []
for i in range(m0):
m.append(np.multiply(D[:,i],w))
n0,m0 = shape(m)
dtm = zeros((n0,m0))
for i in range(2,n0-2):
dtm[i,:] = -2*m[i-2][:] - m[i-1][:] + m[i+1][:] + 2*m[i+2][:]
dtm = dtm/3.0
# cc = [m,dtm]
cc =np.column_stack((m,dtm))
# cc.extend(list(dtm))
cc = cc[2:n0-2][:]
#print shape(cc)
return cc
#melbank
def melbank(self,p,n,fs,f1,fh,w):
f0 = 700.0/(1.0*fs)
fn2 = floor(n/2.0)
lr = math.log((float)(f0+fh)/(float)(f0+f1))/(float)(p+1)
tmpList = [0,1,p,p+1]
bbl = []
for i in range(len(tmpList)):
bbl.append(n*((f0+f1)*math.exp(tmpList[i]*lr) - f0))
#b1 = n*((f0+f1) * math.exp([x*lr for x in tmpList]) - f0)
#print bbl
b2 = ceil(bbl[1])
b3 = floor(bbl[2])
if(w == 'y'):
pf = np.log((f0+range(b2,b3)/n)/(f0+f1))/lr #note
fp = floor(pf)
r = [ones((1,b2)),fp,fp+1, p*ones((1,fn2-b3))]
c = [range(0,b3),range(b2,fn2)]
v = 2*[0.5,ones((1,b2-1)),1-pf+fp,pf-fp,ones((1,fn2-b3-1)),0.5]
mn = 1
mx = fn2+1
else:
b1 = floor(bbl[0])+1
b4 = min([fn2,ceil(bbl[3])])-1
pf = []
for i in range(int(b1),int(b4+1),1):
pf.append(math.log((f0+(1.0*i)/n)/(f0+f1))/lr)
fp = floor(pf)
pm = pf - fp
k2 = b2 - b1 + 1
k3 = b3 - b1 + 1
k4 = b4 - b1 + 1
r = fp[int(k2-1):int(k4)]
r1 = 1+fp[0:int(k3)]
r = r.tolist()
r1 = r1.tolist()
r.extend(r1)
#r = [fp[int(k2-1):int(k4)],1+fp[0:int(k3)]]
c = range(int(k2),int(k4+1))
c2 = range(1,int(k3+1))
# c = c.tolist()
# c2 = c2.tolist()
c.extend(c2)
#c = [range(int(k2),int(k4+1)),range(0,int(k3))]
v = 1-pm[int(k2-1):int(k4)]
v = v.tolist()
v1 = pm[0:int(k3)]
v1 = v1.tolist()
v.extend(v1)#[1-pm[int(k2-1):int(k4)],pm[0:int(k3)]]
v = [2*x for x in v]
mn = b1 + 1
mx = b4 + 1
if(w == 'n'):
v = 1 - math.cos(v*math.pi/2)
elif (w == 'm'):
tmpV = []
# for i in range(v):
# tmpV.append(1-0.92/1.08*math.cos(v[i]*math))
v = [1 - 0.92/1.08*math.cos(x*math.pi/2) for x in v]
#print type(c),type(mn)
col_list = [x+int(mn)-2 for x in c]
r = [x-1 for x in r]
x = sparse.coo_matrix((v,(r,col_list)),shape=(p,1+int(fn2)))
matX = x.toarray()
#np.savetxt('./data.csv',matX, delimiter=' ')
return matX#x.toarray()
#分帧函数
def enframe(self,x,win,inc):
nx = len(x)
try:
nwin = len(win)
except Exception as err:
# print err
nwin = 1
if (nwin == 1):
wlen = win
else:
wlen = nwin
#print inc,wlen,nx
nf = fix(1.0*(nx-wlen+inc)/inc) #here has a bug that nf maybe less than 0
f = zeros((int(nf),wlen))
indf = [inc*j for j in range(int(nf))]
indf = (mat(indf)).T
inds = mat(range(wlen))
indf_tile = tile(indf,wlen)
inds_tile = tile(inds,(int(nf),1))
mix_tile = indf_tile + inds_tile
for i in range(nf):
for j in range(wlen):
f[i,j] = x[mix_tile[i,j]]
#print x[mix_tile[i,j]]
if nwin>1: #TODOd
w = win.tolist()
#w_tile = tile(w,(int))
return f
# init hmm
def inithmm(self,sample,M):
K = len(sample)
N0 = len(M)
self.N = N0
#初始概率矩阵
hmm = hmmInfo()
hmm.init = zeros((N0,1))
hmm.init[0] = 1
hmm.trans = zeros((N0,N0))
hmm.N = N0
#初始化转移概率矩阵
for i in range(self.N-1):
hmm.trans[i,i] = 0.5
hmm.trans[i,i+1] = 0.5
hmm.trans[self.N-1,self.N-1] = 1
#概率密度函数的初始聚类
#分段
for k in range(K):
T = len(sample[k].smpl_data[0])
#seg0 = []
seg0 = np.floor(arange(0,T,1.0*T/N0))
#seg0 = int(seg0.tolist())
seg0 = np.concatenate((seg0,[T]))
#seg0.append(T)
sample[k].seg = seg0
#对属于每个状态的向量进行K均值聚类,得到连续混合正态分布
mix = []
for i in range(N0):
vector = []
for k in range(K):
seg1 = int(sample[k].seg[i])
seg2 = int(sample[k].seg[i+1])
tmp = []
tmp = sample[k].smpl_data[0][seg1:seg2][:]
if k == 0:
vector = np.array(tmp)
else:
vector = np.concatenate((vector, np.array(tmp)))
#vector.append(tmp)
# tmp_mix = mixInfo()
# print id(tmp_mix)
tmp_mix = self.get_mix(vector,M[i],mix)
# mix.append(tmp_mix)
hmm.mix = mix
return hmm
# get mix data
def get_mix(self,vector,K,mix0):
kmeans = KMeans(n_clusters = K,random_state=0).fit(np.array(vector))
#计算每个聚类的标准差,对角阵,只保存对角线上的元素
mix = mixInfo()
var0 = []
mean0 = []
#ind = []
for j in range(K):
#ind = [i for i in kmeans.labels_ if i==j]
ind = []
ind1 = 0
for i in kmeans.labels_:
if i == j:
ind.append(ind1)
ind1 = ind1 + 1
tmp = [vector[i][:] for i in ind]
var0.append(np.std(tmp,axis=0))
mean0.append(np.mean(tmp,axis=0))
weight0 = zeros((K,1))
for j in range(K):
tmp = 0
ind1 = 0
for i in kmeans.labels_:
if i == j:
tmp = tmp + ind1
ind1 = ind1 + 1
weight0[j] = tmp
weight0=weight0/weight0.sum()
mix.Cvar = multiply(var0,var0)
mix.Cmean = mean0
mix.CM = K
mix.Cweight = weight0
mix0.append(mix)
return mix0
#baum-welch 算法实现函数体
def baum(self,hmm,sample):
mix = copy.deepcopy(hmm.mix)#高斯混合
N = len(mix) #HMM状态数
K = len(sample) #语音样本数
SIZE = shape(sample[0].smpl_data[0])[1] #参数阶数,MFCC维数
print '计算样本参数.....'
c = []
alpha = []
beta = []
ksai = []
gama = []
for k in range(K):
c0,alpha0,beta0,ksai0,gama0 = self.getparam(hmm, sample[k].smpl_data[0])
c.append(c0)
alpha.append(alpha0)
beta.append(beta0)
ksai.append(ksai0)
gama.append(gama0)
# 重新估算概率转移矩阵
print '----- 重新估算概率转移矩阵 -----'
for i in range(N-1):
denom = 0
for k in range(K):
ksai0 = ksai[k]
tmp = ksai0[:,i,:]#ksai0[:][i][:]
denom = denom + sum(tmp)
for j in range(i,i+2):
norm = 0
for k in range(K):
ksai0 = ksai[k]
tmp = ksai0[:,i,j]#[:][i][j]
norm = norm + sum(tmp)
hmm.trans[i,j] = norm/denom
# 重新估算发射概率矩阵,即GMM的参数
print '----- 重新估算输出概率矩阵,即GMM的参数 -----'
for i in range(N):
for j in range(mix[i].CM):
nommean = zeros((1,SIZE))
nomvar = zeros((1,SIZE))
denom = 0
for k in range(K):
gama0 = gama[k]
T = shape(sample[k].smpl_data[0])[0]
for t in range(T):
x = sample[k].smpl_data[0][t][:]
nommean = nommean + gama0[t,i,j]*x
nomvar = nomvar + gama0[t,i,j] * (x - mix[i].Cmean[j][:])**2
denom = denom + gama0[t,i,j]
hmm.mix[i].Cmean[j][:] = nommean/denom
hmm.mix[i].Cvar[j][:] = nomvar/denom
nom = 0
denom = 0
#计算pdf权值
for k in range(K):
gama0 = gama[k]
tmp = gama0[:,i,j]
nom = nom + sum(tmp)
tmp = gama0[:,i,:]
denom = denom + sum(tmp)
hmm.mix[i].Cweight[j] = nom/denom
return hmm
#前向-后向算法
def getparam(self,hmm,O):
'''给定输出序列O,计算前向概率alpha,后向概率beta
标定系数c,及ksai,gama
输入: O:n*d 观测序列
输出: param: 包含各种参数的结构'''
T = shape(O)[0]
init = hmm.init #初始概率
trans = copy.deepcopy(hmm.trans) #转移概率
mix = copy.deepcopy(hmm.mix) #高斯混合
N = hmm.N #状态数
#给定观测序列,计算前向概率alpha
x = O[0][:]
alpha = zeros((T,N))
#----- 计算前向概率alpha -----#
for i in range(N): #t=0
tmp = hmm.init[i] * self.mixture(mix[i],x)
alpha[0,i] = tmp #hmm.init[i]*self.mixture(mix[i],x)
#标定t=0时刻的前向概率
c = zeros((T,1))
c[0] = 1.0/sum(alpha[0][:])
alpha[0][:] = c[0] * alpha[0][:]
for t in range(1,T,1): # t = 1~T
for i in range(N):
temp = 0.0
for j in range(N):
temp = temp + alpha[t-1,j]*trans[j,i]
alpha[t,i] = temp *self.mixture(mix[i],O[t][:])
c[t] = 1.0/sum(alpha[t][:])
alpha[t][:] = c[t]*alpha[t][:]
#----- 计算后向概率 -----#
beta = zeros((T,N))
for i in range(N): #T时刻
beta[T-1,i] = c[T-1]
for t in range(T-2,-1,-1):
x = O[t+1][:]
for i in range(N):
for j in range(N):
beta[t,i] = beta[t,i] + beta[t+1,j]*self.mixture(mix[j],x) * trans[i,j]
beta[t][:] = c[t] * beta[t][:]
# 过渡概率ksai
ksai = zeros((T-1,N,N))
for t in range(0,T-1):
denom = sum(np.multiply(alpha[t][:],beta[t][:]))
for i in range(N-1):
for j in range(i,i+2,1):
norm = alpha[t,i]*trans[i,j]*self.mixture(mix[j],O[t+1][:])*beta[t+1,j]
ksai[t,i,j] = c[t]*norm/denom
# 混合输出概率 gama
gama = zeros((T,N,max(self.stateInfo)))
for t in range(T):
pab = zeros((N,1))
for i in range(N):
pab[i] = alpha[t,i]*beta[t,i]
x = O[t][:]
for i in range(N):
prob = zeros((mix[i].CM,1))
for j in range(mix[i].CM):
m = mix[i].Cmean[j][:]
v = mix[i].Cvar[j][:]
prob[j] = mix[i].Cweight[j] * pdf(m,v,x)
if mix[i].Cweight[j] == 0.0:
print pdf(m,v,x)
tmp = pab[i]/pab.sum()
tmp = tmp[0]
temp_sum = prob.sum()
for j in range(mix[i].CM):
gama[t,i,j] = tmp*prob[j]/temp_sum
return c,alpha,beta,ksai,gama
def mixture(self,mix,x):
'''计算输出概率
输入:mix--混合高斯结构
x--输入向量 SIZE*1
输出: prob--输出概率'''
prob = 0.0
for i in range(mix.CM):
m = mix.Cmean[i][:]
v = mix.Cvar[i][:]
w = mix.Cweight[i]
tmp = pdf(m,v,x)
#print tmp
prob = prob + w * tmp #* pdf(m,v,x)
if prob == 0.0:
prob = 2e-100
return prob
#维特比算法
def viterbi(self,hmm,O):
'''%输入:
hmm -- hmm模型
O -- 输入观察序列, N*D, N为帧数,D为向量维数
输出:
prob -- 输出概率
q -- 状态序列
'''
init = copy.deepcopy(hmm.init)
trans = copy.deepcopy(hmm.trans)#hmm.trans
mix = hmm.mix
N = hmm.N
T = shape(O)[0]
#计算Log(init)
n_init = len(init)
for i in range(n_init):
if init[i] <= 0:
init[i] = -inf
else:
init[i]=log(init[i])
#计算log(trans)
m,n = shape(trans)
for i in range(m):
for j in range(n):
if trans[i,j] <=0:
trans[i,j] = -inf
else:
trans[i,j] = log(trans[i,j])
#初始化
delta = zeros((T,N))
fai = zeros((T,N))
q = zeros((T,1))
#t=0
x = O[0][:]
for i in range(N):
delta[0,i] = init[i] + log(self.mixture(mix[i],x))
#t=2:T
for t in range(1,T):
for j in range(N):
tmp = delta[t-1][:]+trans[:][j].T
tmp = tmp.tolist()
delta[t,j] = max(tmp)
fai[t,j] = tmp.index(max(tmp))
x = O[t][:]
delta[t,j] = delta[t,j] + log(self.mixture(mix[j],x))
tmp = delta[T-1][:]
tmp = tmp.tolist()
prob = max(tmp)
q[T-1]=tmp.index(max(tmp))
for t in range(T-2,-1,-1):
q[t] = fai[t+1,int(q[t+1,0])]
return prob
# ----------- 以下是用于测试的程序 ---------- #
#
def vad(self,k,fs):
'''语音信号端点检测程序
k ---语音信号
fs ---采样率
返回语音信号的起始和终止端点'''
k = double(k)
k = multiply(k,1.0/max(abs(k)))
# 计算短时过零率
FrameLen = 240
FrameInc = 80
FrameTemp1 = self.enframe(k[0:-2], FrameLen, FrameInc)
FrameTemp2 = self.enframe(k[1:], FrameLen, FrameInc)
signs = np.sign(multiply(FrameTemp1, FrameTemp2))
signs = map(lambda x:[[i,0] [i>0] for i in x],signs)
signs = map(lambda x:[[i,1] [i<0] for i in x], signs)
diffs = np.sign(abs(FrameTemp1 - FrameTemp2)-0.01)
diffs = map(lambda x:[[i,0] [i<0] for i in x], diffs)
zcr = sum(multiply(signs, diffs),1)
# 计算短时能量
amp = sum(abs(self.enframe(signal.lfilter([1,-0.9375],1,k),FrameLen, FrameInc)),1)
# print '短时能量%f' %amp
# 设置门限
print '设置门限'
ZcrLow = max([round(mean(zcr)*0.1),3])#过零率低门限
ZcrHigh = max([round(max(zcr)*0.1),5])#过零率高门限
AmpLow = min([min(amp)*10,mean(amp)*0.2,max(amp)*0.1])#能量低门限
AmpHigh = max([min(amp)*10,mean(amp)*0.2,max(amp)*0.1])#能量高门限
# 端点检测
MaxSilence = 8 #最长语音间隙时间
MinAudio = 16 #最短语音时间
Status = 0 #状态0:静音段,1:过渡段,2:语音段,3:结束段
HoldTime = 0 #语音持续时间
SilenceTime = 0 #语音间隙时间
print '开始端点检测'
StartPoint = 0
for n in range(len(zcr)):
if Status ==0 or Status == 1:
if amp[n] > AmpHigh or zcr[n] > ZcrHigh:
StartPoint = n - HoldTime
Status = 2
HoldTime = HoldTime + 1
SilenceTime = 0
elif amp[n] > AmpLow or zcr[n] > ZcrLow:
Status = 1
HoldTime = HoldTime + 1
else:
Status = 0
HoldTime = 0
elif Status == 2:
if amp[n] > AmpLow or zcr[n] > ZcrLow:
HoldTime = HoldTime + 1
else:
SilenceTime = SilenceTime + 1
if SilenceTime < MaxSilence:
HoldTime = HoldTime + 1
elif (HoldTime - SilenceTime) < MinAudio:
Status = 0
HoldTime = 0
SilenceTime = 0
else:
Status = 3
elif Status == 3:
break
if Status == 3:
break
HoldTime = HoldTime - SilenceTime
EndPoint = StartPoint + HoldTime
return StartPoint,EndPoint
def recog(self,pathTop):
N = gParam.NUM
for i in range(N):
wavPath = pathTop + str(i) + '.wav'
f = wave.open(wavPath,'rb')
params = f.getparams()
nchannels,sampwidth,framerate,nframes = params[:4]
str_data = f.readframes(nframes)
#print shape(str_data)
f.close()
wave_data = np.fromstring(str_data,dtype=short)/32767.0
x1,x2 = self.vad(wave_data,framerate)
O = self.mfcc([wave_data])
O = O[x1-3:x2-3][:]
print '第%d个词的观察矢量是:%d' %(i,i)
pout = []
for j in range(N):
pout.append(self.viterbi(self.gmm_hmm_model[j],O))
n = pout.index(max(pout))
print '第%d个词,识别是%d' %(i,n)
接下来就是test.py文件:
#! /usr/bin python
# encoding:utf-8
import numpy as np
from numpy import *
import gParam
from my_hmm import gmm_hmm
my_gmm_hmm = gmm_hmm()
my_gmm_hmm.loadWav(gParam.TRAIN_DATA_PATH)
#print len(my_gmm_hmm.samples[0])
my_gmm_hmm.hmm_start_train()
my_gmm_hmm.recog(gParam.TEST_DATA_PATH)
#my_gmm_hmm.melbank(24,256,8000,0,0.5,'m')
# my_gmm_hmm.mfcc(range(17280))
#my_gmm_hmm.enframe(range(0,17280),256,80)
最后运行的结果如下图所示:
最后:如果您想直接跑程序,您可以通过以下方式获取我的数据和源程序。由于考虑到个人的人工成本,我形式上只收取5块钱的人工费,既是对我的支持,也是对我的鼓励。谢谢大家的理解。把订单后面6位号码发送给我,我把源码和数据给您呈上。谢谢。
1:扫如下支付宝或微信二维码,支付5元
2:把支付单号的后6位,以邮件发送到我的邮箱[email protected]
3:您也可以在下方留言,把订单号写上来,我会核实。
谢谢大家。