Introduction to Speech Recognition, Part 2: Speech Signal Processing and Feature Extraction (Hands-on)

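The starter code for the exercises is available on GitHub: https://github.com/nwpuaslp/ASR_Course.git. It includes the audio files and the code for reading audio, pre-emphasis, framing and windowing, and the fast Fourier transform.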

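In the code, the preemphasis function performs pre-emphasis, the enframe function performs framing and windowing, and get_spectrum computes the fast Fourier transform (FFT).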

The core pre-emphasis code is:

    # y[n] = x[n] - coeff * x[n-1]; the first sample has no predecessor and is kept as-is
    emphasized = np.append(signal[0], signal[1:] - coeff * signal[:-1])
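
As a self-contained sketch, the one-liner can be wrapped in a function. The default coeff = 0.97 is a commonly used value and an assumption here. It is not necessarily the value used in the course repository.

```python
import numpy as np


def preemphasis(signal: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """First-order high-pass filter y[n] = x[n] - coeff * x[n-1].

    The first sample has no predecessor, so it is copied unchanged.
    coeff = 0.97 is an assumed default, not taken from the course code.
    """
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    # expected values: 1.0, 1.5, 2.0, 2.5
    print(preemphasis(x, coeff=0.5))
```

The filter boosts high frequencies relative to low ones. It mostly subtracts a scaled copy of the previous sample, so slowly varying (low-frequency) content largely cancels.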

The core framing and windowing code (using a Hamming window) is:

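    # win: window of length frame_len (a Hamming window here, e.g. np.hamming(frame_len))
    num_samples = signal.size
    # number of whole frames; leftover samples at the end are dropped
    num_frames = int(np.floor((num_samples - frame_len) / frame_shift)) + 1
    frames = np.zeros((num_frames, frame_len))
    for i in range(num_frames):
        # take frame i, starting at sample i * frame_shift, and apply the window
        frames[i, :] = signal[i * frame_shift:i * frame_shift + frame_len] * win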
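
The loop above can also be written without an explicit Python loop by building an index matrix with broadcasting. The sketch below assumes defaults of frame_len = 400 and frame_shift = 160, which are 25 ms frames with a 10 ms shift at 16 kHz. Those values are an illustration, not necessarily the repository's settings. It returns the same frames as the loop version.

```python
from typing import Optional

import numpy as np


def enframe(signal: np.ndarray, frame_len: int = 400, frame_shift: int = 160,
            win: Optional[np.ndarray] = None) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and apply a window.

    Returns shape (num_frames, frame_len). Trailing samples that do not fill
    a whole frame are dropped, as in the loop-based version.
    frame_len / frame_shift defaults are assumptions (25 ms / 10 ms at 16 kHz).
    """
    if win is None:
        # Hamming window: 0.54 - 0.46 * cos(2 * pi * n / (frame_len - 1))
        win = np.hamming(frame_len)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = (signal.size - frame_len) // frame_shift + 1
    # start of each frame (column) + offsets 0..frame_len-1 (row)
    # -> (num_frames, frame_len) matrix of sample indices
    idx = np.arange(num_frames)[:, None] * frame_shift + np.arange(frame_len)[None, :]
    return signal[idx] * win
```

For example, a 1000-sample signal with frame_len = 400 and frame_shift = 160 gives (1000 - 400) // 160 + 1 = 4 frames.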

The fast Fourier transform code is:

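    # complex FFT of every frame (row), zero-padded to fft_len points
    cFFT = np.fft.fft(frames, n=fft_len)
    # for real input only bins 0..fft_len/2 are non-redundant (conjugate symmetry)
    valid_len = fft_len // 2 + 1
    spectrum = np.abs(cFFT[:, 0:valid_len])  # magnitude spectrum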
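
The frames are real-valued, so np.fft.rfft returns exactly the non-negative-frequency half that the slice above keeps. That means get_spectrum can also be sketched as below. The default fft_len = 512 is an assumption: it is the next power of two above a 400-sample frame.

```python
import numpy as np


def get_spectrum(frames: np.ndarray, fft_len: int = 512) -> np.ndarray:
    """Magnitude spectrum of each frame (row), non-negative frequencies only.

    For real input the FFT is conjugate-symmetric, so bins fft_len//2+1 ..
    fft_len-1 mirror the lower half. rfft computes only bins 0..fft_len//2.
    fft_len = 512 is an assumed default.
    """
    return np.abs(np.fft.rfft(frames, n=fft_len, axis=1))
```

Using rfft avoids computing and then discarding the redundant upper half of the spectrum.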

The core code for computing Fbank (log mel filter-bank) features is:

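    # mel(f) = 2595 * log10(1 + f / 700); place num_filter + 2 points evenly on the mel axis
    low_freq_mel = 0
    high_freq_mel = 2595 * np.log10(1 + ((fs / 2) / 700))
    mel_points = np.linspace(low_freq_mel, high_freq_mel, num_filter + 2)

    # inverse mapping back to Hz: f = 700 * (10 ** (mel / 2595) - 1)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)

    feats = np.zeros((num_filter, int(fft_len / 2 + 1)))
    # FFT bin of every edge/centre frequency, rounded down to an integer bin
    # so that each triangle starts at exactly 0 and peaks at exactly 1
    bins = np.floor((hz_points / (fs / 2)) * (fft_len / 2))

    for i in range(1, num_filter + 1):
        low = int(bins[i - 1])
        center = int(bins[i])
        high = int(bins[i + 1])
        # rising edge of filter i
        for j in range(low, center):
            feats[i - 1][j] = (j - bins[i - 1]) / (bins[i] - bins[i - 1])
        # falling edge of filter i
        for j in range(center, high):
            feats[i - 1][j] = (bins[i + 1] - j) / (bins[i + 1] - bins[i])

    # apply the filter bank to every frame: (num_frames, K) x (K, num_filter)
    fbank = np.dot(spectrum, feats.T)
    # replace exact zeros to avoid log10(0)
    fbank = np.where(fbank == 0, np.finfo(float).eps, fbank)
    # spectrum is a magnitude spectrum, so 20 * log10 gives dB
    fbank = 20 * np.log10(fbank)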
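
The mel conversion and the triangular filters can also be sketched in vectorized NumPy. The helper names hz_to_mel, mel_to_hz, mel_filterbank and log_fbank are hypothetical, and so are the defaults (40 filters, fft_len = 512, fs = 16000). This variant keeps the filter edges as fractional bin positions instead of snapping them to integer bins, so its weights can differ slightly from the loop version. Every weight stays between 0 and 1.

```python
import numpy as np


def hz_to_mel(hz):
    """Mel scale used above: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Exact inverse of hz_to_mel: f = 700 * (10 ** (m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(num_filter=40, fft_len=512, fs=16000):
    """Triangular filters with centres evenly spaced on the mel scale.

    Returns a (num_filter, fft_len // 2 + 1) matrix. Defaults are assumptions.
    Edges are fractional bin positions, so every weight lies in [0, 1].
    """
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2), num_filter + 2)
    bins = mel_to_hz(mel_points) / (fs / 2) * (fft_len // 2)  # fractional FFT bins
    k = np.arange(fft_len // 2 + 1)[None, :]                  # bin indices, shape (1, K)
    left, centre, right = bins[:-2, None], bins[1:-1, None], bins[2:, None]
    rising = (k - left) / (centre - left)      # 0 at left edge, 1 at centre
    falling = (right - k) / (right - centre)   # 1 at centre, 0 at right edge
    # the triangle is the lower of the two slopes, clipped to 0 outside [left, right]
    return np.maximum(0.0, np.minimum(rising, falling))


def log_fbank(spectrum, filters):
    """Log mel filter-bank energies in dB for a (num_frames, K) magnitude spectrum."""
    fbank = spectrum @ filters.T
    fbank = np.maximum(fbank, np.finfo(float).eps)  # avoid log10(0)
    return 20.0 * np.log10(fbank)
```

The constant 2595 must be the same in both directions. Otherwise the filter centres are no longer evenly spaced on the mel axis.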

The core code for computing 12-dimensional MFCC features is:

    from scipy.fftpack import dct
    feats = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:(num_mfcc + 1)]  # drop c0, keep c1..c{num_mfcc} (12 here)
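
Here, dct with type=2 and norm='ortho' multiplies each row of log filter-bank energies by an orthonormal DCT-II matrix. The NumPy-only sketch below makes that explicit without depending on SciPy. The helper names dct2_ortho_matrix and mfcc_from_fbank are hypothetical. With this normalisation, c0 equals sqrt(N) times the mean log energy across the N filters, which is why slicing from column 1 discards it.

```python
import numpy as np


def dct2_ortho_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix.

    Entry (k, m) is s_k * cos(pi * k * (2m + 1) / (2n)),
    with s_0 = sqrt(1/n) and s_k = sqrt(2/n) for k >= 1.
    """
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)
    return scale * basis


def mfcc_from_fbank(fbank: np.ndarray, num_mfcc: int = 12) -> np.ndarray:
    """DCT-II along the filter axis, then keep c1..c{num_mfcc} (c0 dropped)."""
    d = dct2_ortho_matrix(fbank.shape[1])
    cepstra = fbank @ d.T  # same role as dct(fbank, type=2, axis=1, norm='ortho')
    return cepstra[:, 1:num_mfcc + 1]
```

A flat log filter-bank (the same energy in every filter) has no cepstral variation, so all 12 coefficients come out as zero.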

For more details, see: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
