Summary of Python Speech Recognition APIs

Contents

  • Speech recognition APIs
    • Mel-frequency cepstral coefficients (MFCC)
    • Sound synthesis
    • Speech recognition

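Speech recognition APIs

Key concepts: sound is, in essence, vibration, and a vibration can be described as displacement as a function of time. A waveform file (.wav) records the displacement at each sampling instant.
A Fourier transform decomposes the time-domain signal into a sum of sinusoids of different frequencies. The characteristic distribution of the spectral lines is what links audio content to text, and it serves as the basis for model training.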
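
The decomposition described above can be tried on a synthetic signal. The following is a minimal sketch, assuming NumPy is available; the 440 Hz and 1000 Hz components, the 8000 Hz sample rate, and all variable names are illustrative choices, not taken from a real recording.

```python
import numpy as np

sample_rate = 8000  # samples per second (illustrative value)
times = np.arange(sample_rate) / sample_rate  # one second of sampling instants

# Time-domain signal: a strong 440 Hz sine plus a weaker 1000 Hz sine
signal = np.sin(2 * np.pi * 440 * times) + 0.5 * np.sin(2 * np.pi * 1000 * times)

# FFT for real input: one complex amplitude per non-negative frequency
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
magnitudes = np.abs(spectrum)

# The two largest spectral lines sit at the two component frequencies
top_two = sorted(freqs[np.argsort(magnitudes)[-2:]].tolist())
print(top_two)
```

The time-domain samples look like an irregular wave, but the magnitude spectrum has just two sharp lines, which is the kind of compact description that feature extraction builds on.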

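Mel-frequency cepstral coefficients (MFCC)

Main idea: extract 13 features from each frame of the signal and collect them into the MFCC matrix.
API: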

import scipy.io.wavfile as wf
import python_speech_features as sf

# Read the sample rate and the signal samples
sample_rate, sigs = wf.read('xxx.wav')
# Build the MFCC matrix
mfcc = sf.mfcc(sigs, sample_rate)
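
The "mel" in MFCC refers to a perceptual frequency scale on which the filters are spaced before the cepstral coefficients are computed. The sketch below illustrates only that scale, not the full MFCC pipeline. It assumes the commonly used conversion m = 2595 * log10(1 + f / 700); the helper names `hz_to_mel` and `mel_to_hz` and the 0 to 8000 Hz range are hypothetical choices for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (assumed 2595 * log10 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Ten points equally spaced on the mel scale between 0 Hz and 8000 Hz
mel_points = np.linspace(hz_to_mel(0), hz_to_mel(8000), 10)
hz_points = mel_to_hz(mel_points)

# In Hz the points are dense at low frequencies and sparse at high ones
print(np.round(hz_points))
```

Equal steps in mel correspond to growing steps in Hz, so low frequencies get finer resolution than high ones.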

Sound synthesis

Example:

import json
import numpy as np
import scipy.io.wavfile as wf
# Load the JSON file that maps note names to frequencies
with open('../data/12.json', 'r') as f:
    freqs = json.loads(f.read())

tones = [
('G5', 1.5),
('A5', 0.5),
('G5', 1.5),
('E5', 0.5),
('D5', 0.5),
('E5', 0.25),
('D5', 0.25),
('C5', 0.5),
('A4', 0.5),
('C5', 0.75)]

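# Set the sample rate
sample_rate = 44100
# Start from an empty array that will hold the synthesized samples
music = np.empty(shape=0)

for tone, duration in tones:
    # np.linspace needs an integer sample count
    times = np.linspace(0, duration, int(duration * sample_rate))
    sound = np.sin(2 * np.pi * freqs[tone] * times)
    music = np.append(music, sound)
# Scale to the int16 range; 2 ** 15 itself would overflow at the peaks
music *= 2 ** 15 - 1
music = music.astype(np.int16)
wf.write('music.wav', sample_rate, music)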
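
The contents of the JSON file read in the example are not shown. One way to build a comparable note-to-frequency table is twelve-tone equal temperament with A4 tuned to 440 Hz. This is a sketch under that assumption: the helper `note_frequency` and the `NOTE_OFFSETS` table are hypothetical, handle natural notes only, and may not match the values in the original file.

```python
# Semitone offset of each natural note within an octave, counted from C
NOTE_OFFSETS = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def note_frequency(name):
    """Frequency in Hz of a natural note such as 'A4' (no sharps or flats)."""
    letter, octave = name[0], int(name[1:])
    # MIDI note number: C4 is 60 and A4 is 69
    midi = 12 * (octave + 1) + NOTE_OFFSETS[letter]
    # Each semitone multiplies the frequency by 2 ** (1 / 12)
    return 440.0 * 2 ** ((midi - 69) / 12)

# Table covering the notes used in the tones list
freqs = {name: note_frequency(name)
         for name in ('A4', 'C5', 'D5', 'E5', 'G5', 'A5')}
print(freqs)
```

A dictionary built this way can stand in for the loaded JSON data, since the synthesis loop only needs `freqs[tone]` to return a frequency in Hz.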

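Speech recognition

Basic steps: read the audio data (sample rate and signal), build the MFCC matrix, train one hidden Markov model (HMM) per label, then recognize a sample by picking the model that scores it highest.
API: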

import numpy as np
import scipy.io.wavfile as wf
import python_speech_features as sf
import hmmlearn.hmm as hl

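# Collect the training samples.
# files_list is assumed to hold one list of .wav paths per label.
train_x, train_y = [], []
for sound_files in files_list:
    # Reset for each label so features from different labels do not mix
    mfccs = np.array([])
    for sound_file in sound_files:
        sample_rate, sigs = wf.read(sound_file)
        mfcc = sf.mfcc(sigs, sample_rate)
        # Stack this file's MFCC frames under the ones collected so far
        if len(mfccs) == 0:
            mfccs = mfcc
        else:
            mfccs = np.append(mfccs, mfcc, axis=0)
    # Add this label's stacked MFCC matrix to the training set
    train_x.append(mfccs)
# train_x ends up with len(files_list) feature matrices, one per label
# train_y must hold the labels in the same order, e.g. apple, banana, pear
# (filling it is not shown here)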

# Build and train one HMM per label
models = {}
for mfccs,label in zip(train_x,train_y):
    model = hl.GaussianHMM(
        n_components = 4, covariance_type = 'diag',
        n_iter = 1000
    )
    models[label] = model.fit(mfccs)

# Build the test set (test_x, test_y) the same way
# Test
pred_y = []
for mfccs in test_x:
    # Score the current MFCC matrix against every model
    best_score, best_label = None, None
    for label, model in models.items():
        score = model.score(mfccs)
        if (best_score is None) or (best_score < score):
            best_score = score
            best_label = label
    pred_y.append(best_label)

print(test_y)
print(pred_y)
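
The two printed lists can also be reduced to a single accuracy figure. A small sketch, where the label lists are made-up stand-ins for the real `test_y` and `pred_y` and do not represent actual results:

```python
# Illustrative stand-ins for the real test_y and pred_y (not actual results)
test_y = ['apple', 'banana', 'pear', 'apple']
pred_y = ['apple', 'banana', 'apple', 'apple']

# Count the positions where the predicted label equals the true label
correct = sum(t == p for t, p in zip(test_y, pred_y))
accuracy = correct / len(test_y)
print(f'accuracy: {accuracy:.2f}')
```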
