A Non-Continuous Speech Recognition Demo Without Frameworks (Python Source)

How GMM-HMM Speech Recognition Works

1.       HMM

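A hidden Markov model (HMM) is a statistical model that describes a Markov process with hidden parameters. The difficulty lies in determining the hidden parameters of the process from the observable outputs so that they can be used for further analysis. The figure below shows the state-transition diagram of a three-state hidden Markov model, where x denotes a hidden state, y an observable output, a a state-transition probability, and b an output probability: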

[Figure 1: state-transition diagram of a three-state HMM]

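a: transition probability

b: output probability

y: observed state

x: hidden state

An HMM is a five-tuple (π, A, B, O, S)

where π: initial probability vector

A:   transition probabilities

B:   output probabilities

O: observed states

S: hidden states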
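The five-tuple can be written down directly as data. The following is a minimal sketch in Python 3 with NumPy (the listing later in this article is Python 2); the field names, the two output symbols, and the numbers in B are made up for illustration, while π and A follow the left-to-right initialization used in `inithmm` further down.

```python
import numpy as np
from collections import namedtuple

# Field names are illustrative; they mirror (pi, A, B, O, S) from the text.
HMM = namedtuple("HMM", ["pi", "A", "B", "O", "S"])

def make_toy_hmm():
    S = ["x1", "x2", "x3"]          # hidden states
    O = ["y1", "y2"]                # observable symbols
    pi = np.array([1.0, 0.0, 0.0])  # always start in x1
    # A[i, j] = P(next state = j | current state = i); left-to-right topology
    A = np.array([[0.5, 0.5, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])
    # B[i, k] = P(observe symbol k | state i); made-up numbers
    B = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.2, 0.8]])
    return HMM(pi, A, B, O, S)

def is_valid(hmm):
    """Check that pi and every row of A and B are probability distributions."""
    return (np.isclose(hmm.pi.sum(), 1.0)
            and np.allclose(hmm.A.sum(axis=1), 1.0)
            and np.allclose(hmm.B.sum(axis=1), 1.0))

toy = make_toy_hmm()
print(is_valid(toy))
```

In the speech demo the output is a continuous feature vector rather than a discrete symbol, so B is replaced by one Gaussian mixture per state.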

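Three problems are associated with an HMM:

1.       Given the number of hidden states and the transition probabilities, and given an observation sequence, estimate the hidden states that correspond to the observations

2.       Given the number of hidden states and the transition probabilities, and given an observation sequence, estimate the probability of producing that observation sequence

3.       Given the number of hidden states and a large number of experimental results (many observation sequences), infer the transition probabilities

The first problem is solved by the Viterbi algorithm

The second problem is solved by the forward algorithm

The third problem is solved by the forward-backward (Baum-Welch) algorithm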
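The first two problems can be sketched on a small discrete HMM. This is a minimal Python 3 / NumPy illustration, not the article's implementation: the matrices are made-up toy values, observations are symbol indices, and no scaling or log arithmetic is applied, so it only suits short sequences. The third problem (Baum-Welch) is not shown here.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Problem 2: P(obs | model), summed over all hidden paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def viterbi(pi, A, B, obs):
    """Problem 1: most likely hidden path and its probability."""
    delta = pi * B[:, obs[0]]
    back = []
    for o in obs[1:]:
        # scores[i, j] = best path ending in state i, then moving i -> j
        scores = delta[:, None] * A
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    # trace the best predecessors backwards from the best final state
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1], delta.max()

# Toy left-to-right model with made-up output probabilities
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
obs = [0, 0, 1, 1]
print(forward(pi, A, B, obs))
print(viterbi(pi, A, B, obs))
```

The Viterbi probability can never exceed the forward probability, because the former keeps only the single best path while the latter sums over all of them.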

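Speech recognition is essentially the first problem above. From the HMM and the observations (the speech signal after digital signal processing) we obtain the states we want, and the combination of those states is the recognized text.

Why?

1)       In speech processing, a word consists of one or more phonemes. A little linguistics helps here. A word can be broken down into syllables, and a syllable can be broken down into phonemes; a phoneme cannot be divided any further. The phoneme is therefore the smallest unit of speech, and both speech recognition and speech synthesis operate on this smallest unit. Take the Chinese character “我”: its pinyin is wo3, which is the word itself. In Chinese a word (character) and a syllable coincide, that is, each character has exactly one syllable. English is different: hello has two syllables, hello = hae|low. Below the syllable is the phoneme. Recognition amounts to finding the phonemes, joining phonemes into syllables, and joining syllables into words, as the figure below shows.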

[Figure 2: the word, syllable, and phoneme hierarchy]

Once the phonemes have been recognized, working back up the hierarchy yields the word.

Our goal is therefore to obtain the phonemes.
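The word, syllable, and phoneme hierarchy can be pictured as two lookup tables. The sketch below is purely illustrative: the syllable split hae|low and the pinyin wo3 come from the text, but the phoneme symbols and both dictionaries are hypothetical placeholders, not a real phone set or lexicon.

```python
# Hypothetical mini-lexicon; the phoneme symbols are illustrative only.
SYLLABLE_TO_PHONEMES = {
    "hae": ["h", "ae"],
    "low": ["l", "ow"],
    "wo3": ["w", "o3"],
}
WORD_TO_SYLLABLES = {
    "hello": ["hae", "low"],
    "我": ["wo3"],
}

def word_to_phonemes(word):
    """Expand a word top-down: word -> syllables -> phonemes."""
    return [p for syl in WORD_TO_SYLLABLES[word]
            for p in SYLLABLE_TO_PHONEMES[syl]]

def phonemes_to_word(phonemes):
    """Go bottom-up: find the word whose phoneme expansion matches exactly."""
    for word in WORD_TO_SYLLABLES:
        if word_to_phonemes(word) == list(phonemes):
            return word
    return None

print(word_to_phonemes("hello"))
print(phonemes_to_word(["w", "o3"]))
```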

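2)       In training, one HMM corresponds to one phoneme, and each HMM contains n states, commonly 3 or 5. The number of states is fixed before training, and after training every HMM has the same number of states, for example 3.

The GMM serves as the emission probability: for each state it gives the probability that the state produces the observed features, and the phoneme with the highest probability is the one we want. In other words, the GMM computes the probability of a phoneme.

GMM stands for Gaussian mixture model. Before training, one normally decides how many Gaussians determine the probability of a phoneme (the number of Gaussians is a hyperparameter). The figure below shows a mixture of 3 Gaussians: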

[Figure 3: a mixture of 3 Gaussians]
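A mixture density is a weighted sum of single Gaussian densities. The following Python 3 / NumPy sketch assumes diagonal covariances (as the article's code does) and uses a made-up one-dimensional mixture with 3 components. Unlike `pdf` in `my_hmm.py`, it uses the full `(2π)^d` normalizer.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (var is the diagonal)."""
    d = len(mean)
    norm = ((2 * np.pi) ** d * np.prod(var)) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))

def gmm_pdf(x, weights, means, variances):
    """Weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Made-up 1-D mixture with 3 components; the weights must sum to 1
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-2.0], [0.0], [3.0]])
variances = np.array([[1.0], [0.5], [2.0]])
print(gmm_pdf(np.array([0.0]), weights, means, variances))
```

Training has to find the weights, means, and variances; the number of components stays fixed.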

Suppose we define an HMM with 3 states, each state with one GMM, and each GMM with 3 Gaussians.

[Figure 4: an HMM with 3 states, each with a 3-Gaussian GMM]

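In the figure above, y consists of states 1, 2, and 3; each state has its own GMM with 3 Gaussians.

The parameters to train are therefore the HMM transition probability matrix plus the variance, mean, and weight of every single Gaussian (and, of course, the initial probability vector). If we can obtain these parameters, can we then do speech recognition?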
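To get a feel for the size of the model, the parameters listed above can simply be counted. This sketch assumes diagonal covariances and counts stored numbers, ignoring the sum-to-one constraints; the dimension 24 is taken from the MFCC routine later in the article (12 cepstral coefficients plus 12 deltas).

```python
def count_gmm_hmm_params(n_states, n_gauss, dim):
    """Count the stored numbers in one GMM-HMM with diagonal covariances."""
    init = n_states                # initial probability vector
    trans = n_states * n_states    # transition matrix
    # each Gaussian: dim means + dim variances + 1 mixture weight
    emission = n_states * n_gauss * (2 * dim + 1)
    return init + trans + emission

print(count_gmm_hmm_params(3, 3, 24))  # 3 + 9 + 9 * 49 = 453
```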

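Next, let us see how GMM-HMM actually does it.

1: Split the incoming speech signal into frames (typically 20 ms per frame with a 10 ms frame shift), then extract features

2: Use the GMM to determine which state the current feature vector belongs to (by computing probabilities)

3: From the previous two steps, obtain the state sequence, which gives the phoneme sequence, that is, the sequence of initials and finals.

As shown in the figure below.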

[Figure 5: the recognition process described above]
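Step 1 can be sketched as follows. This Python 3 / NumPy example only does the framing with the 20 ms / 10 ms values quoted above; the 8000 Hz sample rate is an assumption taken from the `melbank` call in the listing below, the signal is a stand-in, and feature extraction is not shown. Note that the article's own `mfcc` routine uses 256-sample frames with an 80-sample shift instead.

```python
import numpy as np

def frame_signal(x, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = (len(x) - frame_len) // frame_shift + 1
    if n_frames < 1:
        return np.empty((0, frame_len))
    starts = frame_shift * np.arange(n_frames)
    # idx[i, j] = index of sample j inside frame i
    idx = starts[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

fs = 8000                        # assumed sample rate in Hz
frame_len = fs * 20 // 1000      # 20 ms -> 160 samples
frame_shift = fs * 10 // 1000    # 10 ms -> 80 samples
x = np.arange(fs, dtype=float)   # stand-in for one second of audio
frames = frame_signal(x, frame_len, frame_shift)
print(frames.shape)              # (99, 160)
```

Consecutive frames overlap by half a frame here, so every sample except those at the edges contributes to two frames.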

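That essentially completes speech recognition with GMM-HMM (strictly speaking, non-continuous speech recognition). Continuous speech recognition additionally needs a language model (mainly an n-gram model built from a corpus). The GMM-HMM described above is the acoustic model.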
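For isolated words, the final decision can be as simple as scoring the utterance against every word model and keeping the best one, which is what `recog` in the listing below does with Viterbi scores. A minimal sketch of that decision rule, with made-up log-likelihoods and a hypothetical function name:

```python
def recognize(scores):
    """Return the index of the word model with the highest score.

    scores[j] is assumed to be the log-likelihood of the utterance under
    word model j (for example a Viterbi score); larger is better.
    """
    return max(range(len(scores)), key=lambda j: scores[j])

# Made-up log-likelihoods for three word models
print(recognize([-512.3, -498.7, -530.1]))  # 1
```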

 

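Code walkthrough:

Below is the complete code for the GMM-HMM acoustic model, from feature extraction through training to recognition (OS: Ubuntu 16.04, Python 2).

The demo consists of three files:

1: gParam.py, which holds the configuration parameters

2: my_hmm.py, the core file, which contains the main code.

3: test.py, the script you run.

The demo recognizes spoken digits (the code loads the digits 0 to 9). You can record your own training and test data, set the paths, and run the program below.

The program is complete.

The wav data is laid out as follows: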

[Figure 6: layout of the wav data files]
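Judging from `loadWav` and `recog` in the listing below, training files are named by digit and recording index, and test files by digit only. The following Python 3 sketch lists the paths those two methods would open; the helper names are hypothetical and the defaults mirror gParam.py.

```python
def train_wav_paths(path_top="./data/train/", num=10):
    """Paths in the order loadWav opens them: digit i, recording j."""
    return [path_top + str(i) + str(j) + ".wav"
            for i in range(num) for j in range(num)]

def eval_wav_paths(path_top="./data/test/", num=10):
    """Paths in the order recog opens them: one file per digit."""
    return [path_top + str(i) + ".wav" for i in range(num)]

print(train_wav_paths()[:3])
print(eval_wav_paths()[:3])
```

With the default settings this amounts to 100 training recordings (10 per digit) and 10 test recordings.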

gParam.py walkthrough:

#!/usr/bin/env python
# encoding:utf-8

TRAIN_DATA_PATH = './data/train/'
TEST_DATA_PATH = './data/test/'
NSTATE = 4
NPDF = 3
MAX_ITER_CNT = 100
NUM=10

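There is not much to say here; the file only sets paths and parameters. TRAIN_DATA_PATH and TEST_DATA_PATH point to the recordings, NSTATE is the number of states per HMM, NPDF the number of Gaussians per state, MAX_ITER_CNT the maximum number of training passes, and NUM the number of digits (and of training recordings per digit).

The core file, my_hmm.py: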

#!/usr/bin/env python
# encoding:utf_8

import numpy as np
from numpy import *
from sklearn.cluster import KMeans
from scipy import sparse
import scipy.io as sio
from scipy import signal
import wave
import math
import gParam
import copy

def pdf(m,v,x):
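	'''Density of a multivariate Gaussian with diagonal covariance
	Input:
	m---mean vector, SIZE x 1
	v---variance vector, SIZE x 1
	x---input vector, SIZE x 1
	Output:
	p---output probability'''
	# Note: the normalizer uses 2*pi rather than (2*pi)**SIZE; the missing
	# factor is the same constant for every Gaussian of the same dimension.
	p = (2*math.pi*np.prod(v,axis=0))**-0.5*np.exp(-0.5*np.dot((x-m)/v,x-m))
	return p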

# class holding the information of one sample
class sampleInfo:
	"""Raw waveform, MFCC features, and state segmentation of one recording"""
	def __init__(self):		
		self.smpl_wav = []
		self.smpl_data = []
		self.seg = []
	def set_smpl_wav(self,wav):
		self.smpl_wav.append(wav)
	def set_smpl_data(self,data):
		self.smpl_data.append(data)
	def set_segment(self, seg_list):
		self.seg = seg_list

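#class of mix info from KMeans
class mixInfo:
	"""Parameters of one state's Gaussian mixture"""
	def __init__(self):
		self.Cmean = [] # component means
		self.Cvar = [] # component variances (diagonal)
		self.Cweight = [] # component weights
		self.CM = [] # number of components
class hmmInfo:
	'''hmm model param'''
	def __init__(self):
		self.init = [] # initial probability vector
		self.trans = [] # transition probability matrix
		self.mix = [] # Gaussian mixture parameters, one mixInfo per state
		self.N = 0 # number of states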
# class of gmm_hmm model
class gmm_hmm:
	def __init__(self):
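		self.hmm = [] # a single hmm
		self.gmm_hmm_model = [] # every trained gmm-hmm is appended to this list
		self.samples = [] # all audio data for the digits 0-9
		self.smplInfo = [] # audio data and matching mfcc data of a single digit
		self.stateInfo = [gParam.NPDF,gParam.NPDF,gParam.NPDF,gParam.NPDF]# each HMM has len(stateInfo) states; each entry is that state's number of Gaussians (3)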
	def loadWav(self,pathTop):
		for i in range(gParam.NUM):
			tmp_data = []
			for j in range(gParam.NUM):
				wavPath = pathTop + str(i) + str(j) + '.wav'
				f = wave.open(wavPath,'rb')
				params = f.getparams()
				nchannels,sampwidth,framerate,nframes = params[:4]
				str_data = f.readframes(nframes)
				#print shape(str_data)
				f.close()
				wave_data = np.fromstring(str_data,dtype=short)/32767.0
				#wave_data.shape = -1,2
				#wave_data = wave_data.T
				#wave_data = wave_data.reshape(1,wave_data.shape[0]*wave_data.shape[1])
				#print shape(wave_data),type(wave_data)				
				tmp_data.append(wave_data)
			self.samples.append(tmp_data)
	# loop over the data and train one model per digit		
	def hmm_start_train(self):
		Nsmpls = len(self.samples)
		for i in range(Nsmpls):
			tmpSmplInfo0 = []
			n = len(self.samples[i])
			for j in range(n):
				tmpSmplInfo1 = sampleInfo()
				tmpSmplInfo1.set_smpl_wav(self.samples[i][j])
				tmpSmplInfo0.append(tmpSmplInfo1)
			#self.smplInfo.append(tmpSmplInfo0)
			print 'Now training HMM model %d' %i
			hmm0 = self.trainhmm(tmpSmplInfo0,self.stateInfo)
			print 'Model %d has been trained' %i
			# self.gmm_hmm_model.append(hmm0)
	# train one hmm		
	def trainhmm(self,sample,state):
		K = len(sample)
		print 'Computing the speech features first: MFCC'
		for k in range(K):
			tmp = self.mfcc(sample[k].smpl_wav)
			sample[k].set_smpl_data(tmp) # store the MFCC data
		hmm = self.inithmm(sample,state)
		pout = zeros((gParam.MAX_ITER_CNT,1))
		for my_iter in range(gParam.MAX_ITER_CNT):
			print 'Training pass %d' %my_iter
			hmm = self.baum(hmm,sample)
			for k in range(K):
				pout[my_iter,0] = pout[my_iter,0] + self.viterbi(hmm,sample[k].smpl_data[0])
			if my_iter > 0:
				# stop once the relative change of the total Viterbi log score is tiny
				if(abs((pout[my_iter,0] - pout[my_iter-1,0])/pout[my_iter,0]) < 5e-6):
					print 'Converged'
					self.gmm_hmm_model.append(hmm)
					return hmm
		# not converged within MAX_ITER_CNT passes: keep and return the last model
		self.gmm_hmm_model.append(hmm)
		return hmm
	# compute the MFCC features
	def mfcc(self,k):
		M = 24 # number of filters		
		N = 256	# samples per speech frame
		arr_mel_bank = self.melbank(M,N,8000,0,0.5,'m')
		arr_mel_bank = arr_mel_bank/np.amax(arr_mel_bank)
		# compute the DCT coefficients, 12*24
		rDCT = 12
		cDCT = 24
		dctcoef = []
		for i in range(1,rDCT+1):			
			tmp = [np.cos((2*j+1)*i*math.pi*1.0/(2.0*cDCT)) for j in range(cDCT)]
			dctcoef.append(tmp)
		# normalized cepstral liftering window
		w = [1+6*np.sin(math.pi*i*1.0/rDCT) for i in range(1,rDCT+1)]
		w = w/np.amax(w)
		# pre-emphasis
		AggrK = double(k)
		AggrK = signal.lfilter([1,-0.9375],1,AggrK)# ndarray
		# framing
		FrameK = self.enframe(AggrK[0],N,80)
		n0,m0 = FrameK.shape
		for i in range(n0):
			FrameK[i,:] = multiply(FrameK[i,:],np.hamming(N))	
		FrameK = FrameK.T
		# compute the power spectrum
		S = (abs(np.fft.fft(FrameK,axis=0)))**2
		# pass the power spectrum through the filter bank		
		P = np.dot(arr_mel_bank,S[0:129,:])
		# take the log, then apply the cosine transform
		D = np.dot(dctcoef,log(P))
		n0,m0 = D.shape
		m = []
		for i in range(m0):
			m.append(np.multiply(D[:,i],w))
		n0,m0 = shape(m)
		dtm = zeros((n0,m0))
		for i in range(2,n0-2):
			dtm[i,:] = -2*m[i-2][:] - m[i-1][:] + m[i+1][:] + 2*m[i+2][:]
		dtm = dtm/3.0
		# cc = [m,dtm]
		cc =np.column_stack((m,dtm))
		# cc.extend(list(dtm))
		cc = cc[2:n0-2][:]
		#print shape(cc)
		return cc
			
	#melbank
	def melbank(self,p,n,fs,f1,fh,w):
		f0 = 700.0/(1.0*fs)
		fn2 = floor(n/2.0)
		lr = math.log((float)(f0+fh)/(float)(f0+f1))/(float)(p+1)
		tmpList = [0,1,p,p+1]
		bbl = []
		for i in range(len(tmpList)):
			bbl.append(n*((f0+f1)*math.exp(tmpList[i]*lr) - f0))
		#b1 = n*((f0+f1) * math.exp([x*lr for x in tmpList]) - f0)
		#print bbl
		b2 = ceil(bbl[1])
		b3 = floor(bbl[2])
		if(w == 'y'):
			pf = np.log((f0+range(b2,b3)/n)/(f0+f1))/lr #note
			fp = floor(pf)
			r = [ones((1,b2)),fp,fp+1, p*ones((1,fn2-b3))]						
			c = [range(0,b3),range(b2,fn2)]
			v = 2*[0.5,ones((1,b2-1)),1-pf+fp,pf-fp,ones((1,fn2-b3-1)),0.5]				          
			mn = 1
			mx = fn2+1
		else:
			b1 = floor(bbl[0])+1
			b4 = min([fn2,ceil(bbl[3])])-1
			pf = []
			for i in range(int(b1),int(b4+1),1):
				pf.append(math.log((f0+(1.0*i)/n)/(f0+f1))/lr)
			fp = floor(pf)
			pm = pf - fp
			k2 = b2 - b1 + 1
			k3 = b3 - b1 + 1
			k4 = b4 - b1 + 1
			r = fp[int(k2-1):int(k4)]
			r1 = 1+fp[0:int(k3)]
			r = r.tolist()
			r1 = r1.tolist()
			r.extend(r1)
			#r = [fp[int(k2-1):int(k4)],1+fp[0:int(k3)]]
			c = range(int(k2),int(k4+1))
			c2 = range(1,int(k3+1))
			# c = c.tolist()
			# c2 = c2.tolist()
			c.extend(c2)
			#c = [range(int(k2),int(k4+1)),range(0,int(k3))]
			v = 1-pm[int(k2-1):int(k4)]
			v = v.tolist()
			v1 = pm[0:int(k3)]
			v1 = v1.tolist()
			v.extend(v1)#[1-pm[int(k2-1):int(k4)],pm[0:int(k3)]]
			v = [2*x for x in v]
			mn = b1 + 1
			mx = b4 + 1
		if(w == 'n'):
			# v is a list, so apply the window element by element
			v = [1 - math.cos(x*math.pi/2) for x in v]
		elif (w == 'm'):
			v = [1 - 0.92/1.08*math.cos(x*math.pi/2) for x in v]
		#print type(c),type(mn)
		col_list = [x+int(mn)-2 for x in c]
		r = [x-1 for x in r]
		x = sparse.coo_matrix((v,(r,col_list)),shape=(p,1+int(fn2)))
		matX = x.toarray()
		#np.savetxt('./data.csv',matX, delimiter=' ')
		return matX#x.toarray()
	# framing function
	def enframe(self,x,win,inc):
		nx = len(x)
		try:
			nwin = len(win)
		except Exception as err:
			# win is a scalar frame length, not a window array
			nwin = 1	
		if (nwin == 1):
			wlen = win
		else:
			wlen = nwin					
		# number of whole frames; clamp to 0 when the signal is shorter than one frame
		nf = int(fix(1.0*(nx-wlen+inc)/inc))
		if nf < 0:
			nf = 0
		f = zeros((nf,wlen))
		indf = [inc*j for j in range(nf)]
		indf = (mat(indf)).T
		inds = mat(range(wlen))
		indf_tile = tile(indf,wlen)
		inds_tile = tile(inds,(nf,1))
		mix_tile = indf_tile + inds_tile
		for i in range(nf):
			for j in range(wlen):
				f[i,j] = x[mix_tile[i,j]]
		# TODO: applying a window array is not implemented; callers pass a scalar frame length
		return f
	# init hmm
	def inithmm(self,sample,M):
		K = len(sample)
		N0 = len(M)
		self.N = N0
		# initial probability vector
		hmm = hmmInfo()
		hmm.init = zeros((N0,1))
		hmm.init[0] = 1
		hmm.trans = zeros((N0,N0))
		hmm.N = N0
		# initialize the transition probability matrix
		for i in range(self.N-1):
			hmm.trans[i,i] = 0.5
			hmm.trans[i,i+1] = 0.5
		hmm.trans[self.N-1,self.N-1] = 1
		# initial clustering for the probability density functions
		# split each sample into N0 equal segments
		for k in range(K):
			T = len(sample[k].smpl_data[0])
			#seg0 = []
			seg0 = np.floor(arange(0,T,1.0*T/N0))
			#seg0 = int(seg0.tolist())
			seg0 = np.concatenate((seg0,[T]))
			#seg0.append(T)
			sample[k].seg = seg0
		# run K-means on the vectors that belong to each state to get a continuous Gaussian mixture
		mix = []
		for i in range(N0):
			vector = []
			for k in range(K):
				seg1 = int(sample[k].seg[i])
				seg2 = int(sample[k].seg[i+1])
				tmp = []
				tmp = sample[k].smpl_data[0][seg1:seg2][:]
				if k == 0:
					vector = np.array(tmp)
				else:
					vector = np.concatenate((vector, np.array(tmp)))
				#vector.append(tmp)
			# tmp_mix = mixInfo()
			# print id(tmp_mix)
			tmp_mix = self.get_mix(vector,M[i],mix)
			# mix.append(tmp_mix)
		hmm.mix = mix
		return hmm
	# get mix data
	def get_mix(self,vector,K,mix0):
		kmeans = KMeans(n_clusters = K,random_state=0).fit(np.array(vector))
		# compute the standard deviation of each cluster; the covariance is diagonal, so only the diagonal is kept
		mix = mixInfo()
		var0 = []
		mean0 = []
		#ind = []
		for j in range(K):
			#ind = [i for i in kmeans.labels_ if i==j]
			ind = []
			ind1 = 0
			for i in kmeans.labels_:
				if i == j:
					ind.append(ind1)
				ind1 = ind1 + 1
			tmp = [vector[i][:] for i in ind]
			var0.append(np.std(tmp,axis=0))
			mean0.append(np.mean(tmp,axis=0))
		# weight of each cluster = fraction of the vectors assigned to it
		weight0 = zeros((K,1))
		for j in range(K):
			tmp = 0
			for i in kmeans.labels_:
				if i == j:
					tmp = tmp + 1
			weight0[j] = tmp
		weight0=weight0/weight0.sum()
		mix.Cvar = multiply(var0,var0)
		mix.Cmean = mean0
		mix.CM = K
		mix.Cweight = weight0
		mix0.append(mix)
		return mix0
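	# Baum-Welch algorithm
	def baum(self,hmm,sample):
		mix = copy.deepcopy(hmm.mix)# Gaussian mixtures
		N = len(mix)  # number of HMM states
		K = len(sample) # number of speech samples
		SIZE = shape(sample[0].smpl_data[0])[1]	# feature order, i.e. the MFCC dimension
		print 'Computing sample parameters.....'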
		c = []
		alpha = []
		beta = []
		ksai = []
		gama = []
		for k in range(K):
			c0,alpha0,beta0,ksai0,gama0 = self.getparam(hmm, sample[k].smpl_data[0])
			c.append(c0)
			alpha.append(alpha0)
			beta.append(beta0)
			ksai.append(ksai0)
			gama.append(gama0)
		# re-estimate the transition probability matrix
		print '----- Re-estimating the transition probability matrix -----'
		for i in range(N-1):
			denom = 0
			for k in range(K):
				ksai0 = ksai[k]
				tmp = ksai0[:,i,:]#ksai0[:][i][:]
				denom = denom + sum(tmp)
			for j in range(i,i+2):
				norm = 0
				for k in range(K):
					ksai0 = ksai[k]
					tmp = ksai0[:,i,j]#[:][i][j]
					norm = norm + sum(tmp)
				hmm.trans[i,j] = norm/denom
		# re-estimate the emission probabilities, i.e. the GMM parameters
		print '----- Re-estimating the output probabilities, i.e. the GMM parameters -----'
		for i in range(N):
			for j in range(mix[i].CM):
				nommean = zeros((1,SIZE))
				nomvar = zeros((1,SIZE))
				denom = 0
				for k in range(K):
					gama0 = gama[k]
					T = shape(sample[k].smpl_data[0])[0]
					for t in range(T):
						x = sample[k].smpl_data[0][t][:]
						nommean = nommean + gama0[t,i,j]*x
						nomvar = nomvar + gama0[t,i,j] * (x - mix[i].Cmean[j][:])**2
						denom = denom + gama0[t,i,j]
				hmm.mix[i].Cmean[j][:] = nommean/denom
				hmm.mix[i].Cvar[j][:] = nomvar/denom
				nom = 0
				denom = 0
				# compute the pdf weights
				for k in range(K):
					gama0 = gama[k]
					tmp = gama0[:,i,j]
					nom = nom + sum(tmp)
					tmp = gama0[:,i,:]
					denom = denom + sum(tmp)
				hmm.mix[i].Cweight[j] = nom/denom
		return hmm
				
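	# forward-backward algorithm
	def getparam(self,hmm,O):
		'''Given the output sequence O, compute the forward probabilities alpha,
		the backward probabilities beta, the scaling coefficients c, and ksai, gama
		Input: O: n*d observation sequence
		Output: c, alpha, beta, ksai, gama'''
		T = shape(O)[0]
		init = hmm.init # initial probabilities
		trans = copy.deepcopy(hmm.trans) # transition probabilities
		mix = copy.deepcopy(hmm.mix) # Gaussian mixtures
		N = hmm.N # number of states
		# given the observation sequence, compute the forward probabilities alpha
		x = O[0][:]
		alpha = zeros((T,N))
		#----- compute the forward probabilities alpha -----#
		for i in range(N): #t=0
			tmp = hmm.init[i] * self.mixture(mix[i],x)
			alpha[0,i] = tmp
		# scale the forward probabilities at t=0
		c = zeros((T,1))
		c[0] = 1.0/sum(alpha[0][:])
		alpha[0][:] = c[0] * alpha[0][:] 
		for t in range(1,T,1): # t = 1~T-1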
			for i in range(N):
				temp = 0.0
				for j in range(N):
					temp = temp + alpha[t-1,j]*trans[j,i]
				alpha[t,i] = temp *self.mixture(mix[i],O[t][:])
			c[t] = 1.0/sum(alpha[t][:])
			alpha[t][:] = c[t]*alpha[t][:]

		#----- compute the backward probabilities -----#
		beta = zeros((T,N))
		for i in range(N): # last frame
			beta[T-1,i] = c[T-1]
		for t in range(T-2,-1,-1):
			x = O[t+1][:]
			for i in range(N):
				for j in range(N):
					beta[t,i] = beta[t,i] + beta[t+1,j]*self.mixture(mix[j],x) * trans[i,j]
			beta[t][:] = c[t] * beta[t][:]
		# transition probabilities ksai
		ksai = zeros((T-1,N,N))
		for t in range(0,T-1):
			denom = sum(np.multiply(alpha[t][:],beta[t][:]))
			for i in range(N-1):
				for j in range(i,i+2,1):
					norm = alpha[t,i]*trans[i,j]*self.mixture(mix[j],O[t+1][:])*beta[t+1,j]
					ksai[t,i,j] = c[t]*norm/denom
		# mixture output probabilities gama
		gama = zeros((T,N,max(self.stateInfo)))
		for t in range(T):
			pab = zeros((N,1))
			for i in range(N):
				pab[i] = alpha[t,i]*beta[t,i]
			x = O[t][:]
			for i in range(N):
				prob = zeros((mix[i].CM,1))
				for j in range(mix[i].CM):
					m = mix[i].Cmean[j][:]
					v = mix[i].Cvar[j][:]
					prob[j] =  mix[i].Cweight[j] * pdf(m,v,x)
					if mix[i].Cweight[j] == 0.0:
						print pdf(m,v,x)
				tmp = pab[i]/pab.sum()
				tmp = tmp[0]
				temp_sum = prob.sum()
				for j in range(mix[i].CM):
					gama[t,i,j] = tmp*prob[j]/temp_sum
		return c,alpha,beta,ksai,gama				
	def mixture(self,mix,x):
		'''Compute the output probability
		Input: mix--Gaussian mixture structure
		x--input vector, SIZE*1
		Output: prob--output probability'''		

		prob = 0.0
		for i in range(mix.CM):
			m = mix.Cmean[i][:]
			v = mix.Cvar[i][:]
			w = mix.Cweight[i]
			tmp = pdf(m,v,x)
			#print tmp
			prob = prob + w * tmp #* pdf(m,v,x)
		if prob == 0.0:
			prob = 2e-100
		return prob
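	# Viterbi algorithm
	def viterbi(self,hmm,O):
		'''Input:
		hmm -- hmm model
		O   -- input observation sequence, N*D, N frames of D-dimensional vectors
		Output:
		prob -- log probability of the best path
		q    -- state sequence (computed but not returned)
		'''
		init = copy.deepcopy(hmm.init)
		trans = copy.deepcopy(hmm.trans)
		mix = hmm.mix
		N = hmm.N
		T = shape(O)[0]
		# compute log(init)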
		n_init = len(init)
		for i in range(n_init):
			if init[i] <= 0:
				init[i] = -inf
			else:
				init[i]=log(init[i])
		# compute log(trans)
		m,n = shape(trans)
		for i in range(m):
			for j in range(n):
				if trans[i,j] <=0:
					trans[i,j] = -inf
				else:
					trans[i,j] = log(trans[i,j])
		# initialization
		delta = zeros((T,N))
		fai = zeros((T,N))
		q = zeros((T,1))
		#t=0
		x = O[0][:]
		for i in range(N):
			delta[0,i] = init[i] + log(self.mixture(mix[i],x))
		#t=1:T-1
		for t in range(1,T):
			for j in range(N):
				# column j of trans holds the transitions into state j
				tmp = delta[t-1][:]+trans[:,j]
				tmp = tmp.tolist()
				delta[t,j] = max(tmp)
				fai[t,j] = tmp.index(max(tmp))
				x = O[t][:]
				delta[t,j] = delta[t,j] + log(self.mixture(mix[j],x))
		tmp = delta[T-1][:]
		tmp = tmp.tolist()
		prob = max(tmp)
		q[T-1]=tmp.index(max(tmp))
		for t in range(T-2,-1,-1):
			q[t] = fai[t+1,int(q[t+1,0])]
		return prob




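# ----------- the code below is used for testing ---------- #
#
	def vad(self,k,fs):
		'''Endpoint detection for a speech signal
		k 	---speech signal
		fs 	---sample rate
		Returns the start and end points of the speech'''
		k = double(k)
		k = multiply(k,1.0/max(abs(k)))

		# compute the short-time zero-crossing rate
		FrameLen = 240
		FrameInc = 80
		# pair each sample with its successor; both slices must have the same length
		FrameTemp1 = self.enframe(k[0:-1], FrameLen, FrameInc)
		FrameTemp2 = self.enframe(k[1:], FrameLen, FrameInc)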
		signs = np.sign(multiply(FrameTemp1, FrameTemp2))
		signs = map(lambda x:[[i,0] [i>0] for i in x],signs)
		signs = map(lambda x:[[i,1] [i<0] for i in x], signs)
		diffs = np.sign(abs(FrameTemp1 - FrameTemp2)-0.01)
		diffs = map(lambda x:[[i,0] [i<0] for i in x], diffs)
		zcr = sum(multiply(signs, diffs),1)
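		# compute the short-time energy		
		amp = sum(abs(self.enframe(signal.lfilter([1,-0.9375],1,k),FrameLen, FrameInc)),1)
		# set the thresholds
		print 'Setting thresholds'
		ZcrLow = max([round(mean(zcr)*0.1),3])# low zero-crossing-rate threshold
		ZcrHigh = max([round(max(zcr)*0.1),5])# high zero-crossing-rate threshold
		AmpLow = min([min(amp)*10,mean(amp)*0.2,max(amp)*0.1])# low energy threshold
		AmpHigh = max([min(amp)*10,mean(amp)*0.2,max(amp)*0.1])# high energy threshold
		# endpoint detection
		MaxSilence = 8 # longest allowed gap inside speech, in frames
		MinAudio = 16 # shortest speech segment, in frames
		Status = 0 # state 0: silence, 1: transition, 2: speech, 3: finished
		HoldTime = 0 # speech duration
		SilenceTime = 0 # gap duration
		print 'Starting endpoint detection'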
		StartPoint = 0
		for n in range(len(zcr)):
			if Status ==0 or Status == 1:
				if amp[n] > AmpHigh or zcr[n] > ZcrHigh:
					StartPoint = n - HoldTime
					Status = 2
					HoldTime = HoldTime + 1
					SilenceTime = 0
				elif amp[n] > AmpLow or zcr[n] > ZcrLow:
					Status = 1
					HoldTime = HoldTime + 1
				else:
					Status = 0
					HoldTime = 0
			elif Status == 2:
				if amp[n] > AmpLow or zcr[n] > ZcrLow:
					HoldTime = HoldTime + 1
				else:
					SilenceTime = SilenceTime + 1
					if SilenceTime < MaxSilence:
						HoldTime = HoldTime + 1
					elif (HoldTime - SilenceTime) < MinAudio:
						Status = 0
						HoldTime = 0
						SilenceTime = 0
					else:
						Status = 3
			elif Status == 3:
					break
			if Status == 3:
				break
		HoldTime = HoldTime - SilenceTime
		EndPoint = StartPoint + HoldTime
		return StartPoint,EndPoint												

	def recog(self,pathTop):
		N = gParam.NUM
		for i in range(N):						
			wavPath = pathTop + str(i) + '.wav'
			f = wave.open(wavPath,'rb')
			params = f.getparams()
			nchannels,sampwidth,framerate,nframes = params[:4]
			str_data = f.readframes(nframes)
			#print shape(str_data)
			f.close()
			wave_data = np.fromstring(str_data,dtype=short)/32767.0
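			x1,x2 = self.vad(wave_data,framerate)
			O = self.mfcc([wave_data])
			# keep only the frames between the detected endpoints;
			# clamp the start so a negative index cannot wrap around
			start = x1-3
			if start < 0:
				start = 0
			O = O[start:x2-3][:]
			print 'Observation vector of word %d: %d' %(i,i)
			pout = []
			for j in range(N):
				pout.append(self.viterbi(self.gmm_hmm_model[j],O))
			n = pout.index(max(pout))
			print 'Word %d recognized as %d' %(i,n)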

Next is test.py:

#!/usr/bin/env python
# encoding:utf-8

import numpy as np
from numpy import *
import gParam
from my_hmm import gmm_hmm
my_gmm_hmm = gmm_hmm()
my_gmm_hmm.loadWav(gParam.TRAIN_DATA_PATH)
#print len(my_gmm_hmm.samples[0])
my_gmm_hmm.hmm_start_train()
my_gmm_hmm.recog(gParam.TEST_DATA_PATH)
#my_gmm_hmm.melbank(24,256,8000,0,0.5,'m')
# my_gmm_hmm.mfcc(range(17280))
#my_gmm_hmm.enframe(range(0,17280),256,80)

The final output looks like this:

[Figures 7 and 8: screenshots of the program output]

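Finally: if you want to run the program directly, you can get my data and source code as follows. To cover my own time I charge a nominal fee of 5 yuan, which both supports and encourages me. Thank you for understanding. Send me the last 6 digits of the order number and I will send you the source code and data. Thank you.

1: Scan the Alipay or WeChat QR code below and pay 5 yuan

2: Email the last 6 digits of the payment number to my mailbox [email protected]

3: You can also leave a comment below with the order number, and I will verify it.

Thank you all.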

[Figures 9 and 10: Alipay and WeChat payment QR codes]
			

