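In this post we look at the prefix search algorithm used in the decoding (language model) stage. The acoustic model's neural network outputs a probability matrix. Given that matrix, we could simply take the most probable symbol at every time step and read off the resulting string, but that string is not necessarily the most probable sequence. We are using CTC here. Its key property is that a frame-level path may contain repeated symbols and blanks, which are collapsed afterwards (repeats are merged, then blanks are dropped), so many different paths map to the same output. For example, suppose part of the probability matrix is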
0.7 0.3
0.6 0.4
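Each row is one time step; the first column is the blank and the second column is the letter A. Taking the most probable symbol at each step gives the path (--), i.e. two blanks, with probability 0.7 × 0.6 = 0.42. Looking at the global optimum instead, P(l=A) = P(AA) + P(A-) + P(-A) = 0.12 + 0.18 + 0.28 = 0.58, while P(-) = P(--) = 0.42. The output A is therefore more probable, so picking the locally most probable symbol at each step is not appropriate here.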
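The arithmetic above can be checked by brute force. The sketch below is not part of the original code: the helper names `collapse` and `labelling_probs` are illustrative assumptions, and column 0 is assumed to be the blank. It enumerates every frame-level path, collapses it with the CTC rule, and sums the path probabilities per output. The cost grows exponentially with the number of frames, so it is only useful on toy inputs.

```python
from itertools import product
from collections import defaultdict

def collapse(path, blank=0):
    """CTC collapse: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def labelling_probs(prob, blank=0):
    """Sum the probability of every path into the output it collapses to."""
    T = len(prob)
    K = len(prob[0])
    totals = defaultdict(float)
    for path in product(range(K), repeat=T):
        p = 1.0
        for t, s in enumerate(path):
            p *= prob[t][s]
        totals[collapse(path, blank)] += p
    return dict(totals)

# Two frames, two symbols: index 0 = blank, index 1 = A
probs = labelling_probs([[0.7, 0.3], [0.6, 0.4]])
for labelling, p in sorted(probs.items()):
    print(labelling, round(p, 2))
# () 0.42
# (1,) 0.58
```

The empty tuple is the empty output and `(1,)` is the single letter A, matching the 0.42 and 0.58 computed by hand.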
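First, assume the alphabet has 26 letters plus 1 blank, 27 symbols in total. Let prob be the probability matrix and T the number of time steps. gamma_blank_not[p][t] is the probability of having produced the prefix p by time t, with the output at time t not being the blank; gamma_blank[p][t] is the probability of having produced the prefix p by time t, with the output at time t being the blank. Note the initialization rule for the empty prefix (keyed by self.blank in the code): its gamma_blank value at time t is the product of the blank probabilities of all frames up to and including t, and its gamma_blank_not value is 0 at every time step.
The probability that the final output is empty, pr, is the probability of holding the empty prefix at the last time step and ending in a blank. The probability that the output is any non-empty string extending the empty prefix, pr_prefix, is therefore 1 minus that value. In code: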
pr[self.blank] = gamma_blank[self.blank][T-1]
pr_prefix[self.blank] = 1 - pr[self.blank]
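We also set l* and p* to the empty sequence, and let P be the set containing only the empty prefix. Here l* is the best complete output found so far and p* is the prefix currently being extended.
That completes the initialization. The corresponding code is: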
T = self.prob.shape[0]
gamma_blank_not = dict()
gamma_blank = dict()
gamma_blank_not[self.blank] = np.zeros(T)
gamma_blank[self.blank] = []
for t in range(0,T):
    sumProd = 1
    for ind in range(0,t+1):
        sumProd = sumProd * self.prob[ind][0]
    gamma_blank[self.blank].append(sumProd)
pr = dict()
pr_prefix = dict()
pr[self.blank] = gamma_blank[self.blank][T-1]
pr_prefix[self.blank] = 1 - pr[self.blank]
lstar = self.blank
pstar = self.blank
P = {self.blank}
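For the empty prefix, the nested loop in the initialization is a running product over the blank column, which `np.cumprod` computes directly. The following is a minimal sketch on the two-frame example from earlier, assuming column 0 is the blank; the `_empty` variable names are illustrative and do not appear in the original code.

```python
import numpy as np

# Two frames, two symbols: column 0 = blank, column 1 = A
prob = np.array([[0.7, 0.3],
                 [0.6, 0.4]])
T = prob.shape[0]

# gamma_blank for the empty prefix: product of blank probabilities up to each t
gamma_blank_empty = np.cumprod(prob[:, 0])
# The empty prefix can never end in a non-blank symbol
gamma_blank_not_empty = np.zeros(T)

pr_empty = gamma_blank_empty[T - 1]   # probability that the output is empty
pr_prefix_empty = 1 - pr_empty        # probability of any non-empty output
print(gamma_blank_empty, pr_empty, pr_prefix_empty)
```

On this input the running product is 0.7 and then 0.42, so the empty output has probability about 0.42 and all non-empty outputs together have about 0.58, in agreement with the hand calculation.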
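Next we loop over every label and every time step to build the different extensions of the current prefix, add the candidates to the set P, and finally pick the sequence with the highest probability. The full code is as follows: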
#coding=utf-8
'''
search decoding algorithm
@date:2016-4-9
@author:zhang zewang
'''
import numpy as np

class Search(object):
    ''' Search algorithm for automatic speech recognition '''

    def __init__(self,prob,alphabetSize=26,blank='0'):
        # prob: T x (alphabetSize+1) matrix, column 0 is the blank
        self.prob = np.array(prob)
        self.alphabetSize = alphabetSize
        self.blank = blank

    def prefixSearch(self):
        ''' prefix search decoding algorithm,no n-gram used here '''
        T = self.prob.shape[0]
        # initialization for the empty prefix (keyed by self.blank)
        gamma_blank_not = dict()
        gamma_blank = dict()
        gamma_blank_not[self.blank] = np.zeros(T)
        gamma_blank[self.blank] = []
        for t in range(0,T):
            sumProd = 1
            for ind in range(0,t+1):
                sumProd = sumProd * self.prob[ind][0]
            gamma_blank[self.blank].append(sumProd)
        pr = dict()
        pr_prefix = dict()
        pr[self.blank] = gamma_blank[self.blank][T-1]
        pr_prefix[self.blank] = 1 - pr[self.blank]
        lstar = self.blank
        pstar = self.blank
        P = {self.blank}

        # keep extending while some prefix could still beat the best output
        while pr_prefix[pstar] > pr[lstar]:
            probRemaining = pr_prefix[pstar]
            for k in range(1,self.alphabetSize+1):
                # NOTE: labels are appended as decimal strings, so prefixes
                # are only unambiguous when alphabetSize <= 9
                p = pstar+str(k)
                gamma_blank_not[p] = np.zeros(T)
                gamma_blank[p] = np.zeros(T)
                if pstar==self.blank:
                    gamma_blank_not[p][0] = self.prob[0][k]
                else:
                    gamma_blank_not[p][0] = 0.0
                gamma_blank[p][0] = 0.0
                prefixProb = gamma_blank_not[p][0]
                for t in range(1,T):
                    # a repeated label can only follow a blank
                    if pstar[-1]==str(k):
                        tmp = 0
                    else:
                        tmp = gamma_blank_not[pstar][t-1]
                    newLabelProb = gamma_blank[pstar][t-1] + tmp
                    gamma_blank_not[p][t] = self.prob[t][k] * (newLabelProb + gamma_blank_not[p][t-1])
                    gamma_blank[p][t] = self.prob[t][0] * (gamma_blank[p][t-1] + gamma_blank_not[p][t-1])
                    prefixProb = prefixProb + self.prob[t][k] * newLabelProb
                pr[p] = gamma_blank_not[p][T-1] + gamma_blank[p][T-1]
                pr_prefix[p] = prefixProb - pr[p]
                probRemaining = probRemaining - pr[p]
                if pr[p] > pr[lstar]:
                    lstar = p
                if pr_prefix[p] > pr[lstar]:
                    P.add(p)
                if probRemaining <= pr[lstar]:
                    break
            P.remove(pstar)
            # no candidate prefixes left: max() below would fail on an empty set
            if not P:
                break
            tmp = list(P)
            filtered_pr_prefix = dict((k, pr_prefix[k]) for k in tmp if k in pr_prefix)
            pstar = max(filtered_pr_prefix, key=filtered_pr_prefix.get)
        # the result keeps the leading blank marker, e.g. '012'
        return lstar

if __name__ == '__main__':
    s = Search([[0.9,0.05,0.05],[0.2,0.4,0.4],[0.3,0.5,0.2],[0.9,0.05,0.05],[0.2,0.6,0.2]],alphabetSize=2)
    lstar = s.prefixSearch()
    print(lstar)
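To see the recurrence in the inner loop of prefixSearch at work, it can be applied once by hand to the two-frame example from the beginning of the post. The sketch below isolates that single extension step. The function name `extend_prefix` and its signature are assumptions made for illustration, not part of the original class, and column 0 is again assumed to be the blank.

```python
import numpy as np

def extend_prefix(prob, gamma_n_parent, gamma_b_parent, k, parent_last=None):
    """Extend a parent prefix by label k; return (pr, pr_prefix) of the child.

    gamma_n_parent / gamma_b_parent are the parent's non-blank / blank tables.
    parent_last is the parent's last label, or None for the empty prefix.
    """
    T = prob.shape[0]
    gamma_n = np.zeros(T)
    gamma_b = np.zeros(T)
    # Only an extension of the empty prefix can start at the first frame
    if parent_last is None:
        gamma_n[0] = prob[0, k]
    prefix_prob = gamma_n[0]
    for t in range(1, T):
        # Probability that label k newly appears at frame t
        new_label = gamma_b_parent[t - 1]
        if parent_last != k:
            # A repeated label must be separated from the parent by a blank
            new_label += gamma_n_parent[t - 1]
        gamma_n[t] = prob[t, k] * (new_label + gamma_n[t - 1])
        gamma_b[t] = prob[t, 0] * (gamma_b[t - 1] + gamma_n[t - 1])
        prefix_prob += prob[t, k] * new_label
    pr = gamma_n[T - 1] + gamma_b[T - 1]
    return pr, prefix_prob - pr

prob = np.array([[0.7, 0.3],
                 [0.6, 0.4]])
gamma_b_empty = np.cumprod(prob[:, 0])      # tables of the empty prefix
gamma_n_empty = np.zeros(prob.shape[0])

pr_A, pr_prefix_A = extend_prefix(prob, gamma_n_empty, gamma_b_empty, k=1)
print(round(pr_A, 2), round(abs(pr_prefix_A), 2))
```

Here `pr_A` comes out at about 0.58, the same value as P(l=A) in the hand calculation, and `pr_prefix_A` is about 0 because two frames leave no room for a longer output that starts with A.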