feature extraction technique to get a vector of a fixed length from each voice sample. The data folder contains only the features, not the actual mp3 samples (the dataset is too large, about 13GB).
If you wish to download the dataset and extract the feature files (.npy files) on your own, preparation.py is the script responsible for that. Once you unzip the dataset, put preparation.py in its root directory and run it.
Extracting features from the audio files and generating the new .csv files will take some time.
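Each feature file is a plain NumPy array written with np.save. The sketch below shows a save and load round trip for one such vector. The file name, the temporary directory, and the vector values are made up for illustration, and the length of 128 assumes librosa's default number of mel bands.

```python
import os
import tempfile

import numpy as np

# Stand-in for one extracted feature vector (hypothetical values).
features = np.linspace(0.0, 1.0, num=128)

with tempfile.TemporaryDirectory() as tmp_dir:
    # np.save appends ".npy" when the given name lacks that extension.
    target = os.path.join(tmp_dir, "sample-000001")
    np.save(target, features)
    saved_path = target + ".npy"
    file_exists = os.path.isfile(saved_path)
    loaded = np.load(saved_path)

same_values = bool(np.array_equal(features, loaded))
print(file_exists, loaded.shape, same_values)
```

Because np.save adds the extension itself, the script can pass a target name without one.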
import glob
import os
import pandas as pd
import numpy as np
import shutil
import librosa
from tqdm import tqdm
def extract_feature(file_name, **kwargs):
    """
    Extract feature from audio file `file_name`
        Features supported:
            - MFCC (mfcc)
            - Chroma (chroma)
            - MEL Spectrogram Frequency (mel)
            - Contrast (contrast)
            - Tonnetz (tonnetz)
        e.g:
        `features = extract_feature(path, mel=True, mfcc=True)`
    """
    # flags that were not passed come back as None and are skipped below
    mfcc = kwargs.get("mfcc")
    chroma = kwargs.get("chroma")
    mel = kwargs.get("mel")
    contrast = kwargs.get("contrast")
    tonnetz = kwargs.get("tonnetz")
    # audio time series and its sampling rate
    X, sample_rate = librosa.load(file_name)
    if chroma or contrast:
        # magnitude spectrogram shared by the chroma and contrast features
        stft = np.abs(librosa.stft(X))
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma))
    if mel:
        # pass the signal as y=...: newer librosa releases reject it as a positional argument
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel))
    if contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, contrast))
    if tonnetz:
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
        result = np.hstack((result, tonnetz))
    return result
dirname = "data"
if not os.path.isdir(dirname):
    os.mkdir(dirname)
csv_files = glob.glob("*.csv")
for j, csv_file in enumerate(csv_files):
    print("[+] Preprocessing", csv_file)
    df = pd.read_csv(csv_file)
    # only take filename and gender columns
    new_df = df[["filename", "gender"]]
    print("Previously:", len(new_df), "rows")
    # take only male & female genders (i.e. dropping NaNs & 'other' gender)
    new_df = new_df[np.logical_or(new_df['gender'] == 'female', new_df['gender'] == 'male')]
    print("Now:", len(new_df), "rows")
    new_csv_file = os.path.join(dirname, csv_file)
    # save new preprocessed CSV
    new_df.to_csv(new_csv_file, index=False)
    # get the folder name
    folder_name, _ = csv_file.split(".")
    audio_files = glob.glob(f"{folder_name}/{folder_name}/*")
    all_audio_filenames = set(new_df["filename"])
    for i, audio_file in tqdm(list(enumerate(audio_files)), f"Extracting features of {folder_name}"):
        splited = os.path.split(audio_file)
        # audio_filename = os.path.join(os.path.split(splited[0])[-1], splited[-1])
        audio_filename = f"{os.path.split(splited[0])[-1]}/{splited[-1]}"
        # print("audio_filename:", audio_filename)
        if audio_filename in all_audio_filenames:
            # print("Copying", audio_filename, "...")
            src_path = f"{folder_name}/{audio_filename}"
            target_path = f"{dirname}/{audio_filename}"
            # create that folder if it doesn't exist
            if not os.path.isdir(os.path.dirname(target_path)):
                os.mkdir(os.path.dirname(target_path))
            features = extract_feature(src_path, mel=True)
            target_filename = target_path.split(".")[0]
            np.save(target_filename, features)
            # shutil.copyfile(src_path, target_path)
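The code is as follows. Some notes on the features supported by the extraction function:
MFCC: Mel-frequency cepstral coefficients
Chroma: chroma (pitch class) features
MEL Spectrogram Frequency: Mel spectrogram
Contrast: spectral contrast
Tonnetz: tonal network (tonal centroid features)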
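Since every feature is averaged over time, each enabled flag contributes a fixed number of values to the final vector. The sketch below adds them up. The helper feature_vector_length and the table FEATURE_DIMS are hypothetical names introduced here; n_mfcc=40 comes from the code, while the other sizes assume librosa's default parameters are left unchanged.

```python
# Values contributed by each feature after averaging over time.
# n_mfcc=40 is set in the source; the rest are assumed librosa defaults.
FEATURE_DIMS = {
    "mfcc": 40,      # n_mfcc=40
    "chroma": 12,    # 12 pitch classes
    "mel": 128,      # default number of mel bands
    "contrast": 7,   # default 6 bands, plus one
    "tonnetz": 6,    # 6 tonal centroid dimensions
}


def feature_vector_length(**kwargs):
    """Hypothetical helper: length of the vector for the enabled flags."""
    return sum(dim for name, dim in FEATURE_DIMS.items() if kwargs.get(name))


mel_only = feature_vector_length(mel=True)
mel_and_mfcc = feature_vector_length(mel=True, mfcc=True)
all_features = feature_vector_length(
    mfcc=True, chroma=True, mel=True, contrast=True, tonnetz=True
)
print(mel_only, mel_and_mfcc, all_features)  # 128 168 193
```

With mel=True alone, as in the preparation script, each sample would therefore yield a vector of 128 values under these assumptions.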
def extract_feature(file_name, **kwargs):
    """
    Extract feature from audio file `file_name`
        Features supported:
            - MFCC (mfcc)
            - Chroma (chroma)
            - MEL Spectrogram Frequency (mel)
            - Contrast (contrast)
            - Tonnetz (tonnetz)
        e.g:
        `features = extract_feature(path, mel=True, mfcc=True)`
    """
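**kwargs collects a variable number of keyword arguments into a dict.
kwargs.get(key) looks up the given key in that dict and returns None when the key is absent.
librosa is a Python package widely used for analyzing audio and music signals; here it is the feature extraction tool.
load: reads an audio file, for example in wav or mp3 format.
sample_rate: the sampling frequency used when the analog signal is converted to a digital one (librosa.load resamples to 22050 Hz by default).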
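The flag handling described above can be tried without any audio. In this minimal sketch, read_flags is a hypothetical helper that mirrors how the function reads two of its flags.

```python
def read_flags(**kwargs):
    """Hypothetical helper mirroring how the feature flags are read."""
    mfcc = kwargs.get("mfcc")   # None when the caller did not pass mfcc
    mel = kwargs.get("mel")
    return mfcc, mel


flags = read_flags(mel=True)
# None is falsy, so an omitted flag simply skips its `if` branch
mfcc_skipped = not flags[0]
print(flags, mfcc_skipped)  # (None, True) True
```

Using kwargs.get instead of kwargs["mfcc"] is what lets callers omit any flag without raising a KeyError.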
    mfcc = kwargs.get("mfcc")
    chroma = kwargs.get("chroma")
    mel = kwargs.get("mel")
    contrast = kwargs.get("contrast")
    tonnetz = kwargs.get("tonnetz")
    X, sample_rate = librosa.load(file_name)
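librosa.stft: short-time Fourier transform (STFT); returns a complex-valued matrix D(f, t).
np.abs(D(f, t)): the magnitude (modulus) of each complex entry, not its real part; it is the amplitude of frequency bin f at frame t.
np.mean: computes the arithmetic mean of the array elements along the given axis.
np.hstack: stacks the arrays in the given tuple horizontally
(for example: a = [1 2], b = [3 4] -> [1 2 3 4])
TIP: axis=0 works down the rows and gives one result per column; axis=1 works across the columns and gives one result per row. Each librosa feature matrix has shape (n_features, n_frames), so transposing it with .T and averaging over axis=0 leaves one mean value per feature.
X is the audio time series (signal)
sr is the sampling rate
n_mfcc: the number of MFCCs to return
(n_mfcc sets the dimension of each frame's feature vector: it is the number of leading coefficients kept after the discrete cosine transform (DCT), the last step of the MFCC computation)
(chroma) Chroma features are an interesting and powerful representation of music audio in which the whole spectrum is projected onto 12 bins, one for each of the 12 distinct semitones (chroma) of the musical octave; librosa.feature.chroma_stft computes them.
(mel) A spectrogram whose frequency axis is converted to the mel scale,
that is, a filter bank matrix is built to combine the FFT bins into mel-frequency bins.
(The FFT, or fast Fourier transform, is an efficient algorithm for computing the DFT.)
(The discrete Fourier transform (DFT): Fourier analysis is the most basic method of signal analysis, and the Fourier transform is its core. It maps a signal from the time domain to the frequency domain so that its spectral structure and behavior can be studied.)
librosa.feature.spectral_contrast: spectral contrast
librosa.feature.tonnetz: computes tonal centroid features
librosa.effects: time-domain audio processing such as pitch shifting and time stretching; the submodule also provides time-domain wrappers for the decompose submodule.
harmonic: extracts the harmonic component (harmony/overtones) of the signal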
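The NumPy steps in the notes above (magnitude, transpose, mean over frames, horizontal stacking) can be checked on a tiny array. The matrix D below is a made-up stand-in for an STFT result, not real audio data, and the variable names are illustrative only.

```python
import numpy as np

# Toy stand-in for an STFT result D(f, t): 2 frequency bins x 3 frames.
D = np.array([[3 + 4j, 0 + 1j, 1 + 0j],
              [0 + 0j, 6 + 8j, 0 - 2j]])

magnitude = np.abs(D)            # modulus of each entry; |3+4j| = 5, real part is 3
# shape (2, 3) -> transpose to (3, 2) -> mean over the 3 frames -> shape (2,)
per_bin_mean = np.mean(magnitude.T, axis=0)

other = np.array([10.0, 20.0])   # pretend this is a second feature block
# start from an empty array and append blocks, as the function does
result = np.hstack((np.array([]), per_bin_mean, other))
print(magnitude[0, 0], per_bin_mean, result.shape)
```

The same pattern is why the final feature vector has a fixed length: the number of frames disappears in the mean, and only the number of feature rows remains.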
    if chroma or contrast:
        stft = np.abs(librosa.stft(X))
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma))
    if mel:
        # pass the signal as y=...: newer librosa releases reject it as a positional argument
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel))
    if contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, contrast))
    if tonnetz:
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
        result = np.hstack((result, tonnetz))
    return result
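1. os.path.isdir: checks whether the path is an existing directory; if it is not, os.mkdir creates it.
2. glob.glob: takes a string containing a path pattern and returns the path names that match it, as a list.
# get all csv files in the current directory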
dirname = "data"
if not os.path.isdir(dirname):
    os.mkdir(dirname)
csv_files = glob.glob("*.csv")
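The directory check and the glob pattern can be tried in a throwaway directory. This is a sketch only: the file names are invented, and os.makedirs with exist_ok=True is shown as a possible alternative to the isdir check rather than what the script does.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # hypothetical file names, created only for this demonstration
    for name in ("train.csv", "test.csv", "notes.txt"):
        open(os.path.join(root, name), "w").close()

    out_dir = os.path.join(root, "data")
    # exist_ok=True makes the call safe to repeat, so no isdir check is needed
    os.makedirs(out_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    out_dir_exists = os.path.isdir(out_dir)

    # glob returns matches in arbitrary order, so sort for a stable result
    matches = sorted(glob.glob(os.path.join(root, "*.csv")))
    csv_names = [os.path.basename(path) for path in matches]

print(csv_names, out_dir_exists)
```

Only the two .csv files match the pattern; notes.txt and the data directory are ignored.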
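TIP:
enumerate(): wraps an iterable (such as a list, tuple, or string) into a sequence of (index, item) pairs, so the loop receives both the data and its index.
np.logical_or(x1, x2): returns the element-wise logical OR of x1 and x2 as boolean values.
os.path.join: joins two or more path components.
.to_csv(..., index=False): writes the object to a comma-separated values (csv) file without saving the row index.
.split("."): splits the string using "." as the separator.
set(): builds a set, which removes duplicates and makes the later membership test fast.
list(): converts an iterable into a list; here it materializes the enumerate object so that tqdm knows the total length.
tqdm(): shows a progress bar in the terminal so the progress of the loop is visible.
os.path.split: splits a path into its directory part and its file name.
os.path.dirname: drops the file name and returns the directory.
f"...": a formatted string literal (f-string), a string literal prefixed with f in which each expression inside braces is replaced by its value.
.split(".")[0]: the part before the first "." (index [1] would give the part after it).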
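The path and filtering helpers listed above can be checked on made-up values. In the sketch below the paths and the gender array are hypothetical; the comparison with os.path.splitext is an aside on robustness, not something the script itself uses.

```python
import os

import numpy as np

# hypothetical path shaped like the glob results in the script
audio_file = "train/train/sample-000001.mp3"

splited = os.path.split(audio_file)            # (directory, file name)
parent = os.path.split(splited[0])[-1]         # last folder of the directory
audio_filename = f"{parent}/{splited[-1]}"     # "train/sample-000001.mp3"

target_path = f"data/{audio_filename}"
target_dir = os.path.dirname(target_path)      # "data/train"
target_filename = target_path.split(".")[0]    # "data/train/sample-000001"

# split(".")[0] cuts at the FIRST dot, splitext only strips the extension
dotted = "data/train/sample.v2.mp3"
by_split = dotted.split(".")[0]
by_splitext = os.path.splitext(dotted)[0]

# element-wise OR keeps rows whose gender is 'female' or 'male'
genders = np.array(["male", "female", "other", "male"])
mask = np.logical_or(genders == "female", genders == "male")
kept = list(genders[mask])

print(audio_filename, target_dir, target_filename)
print(by_split, by_splitext, kept)
```

For file names that contain a single dot, as in the script's usage, both approaches give the same result.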
for j, csv_file in enumerate(csv_files):
    print("[+] Preprocessing", csv_file)
    df = pd.read_csv(csv_file)
    # only take filename and gender columns
    new_df = df[["filename", "gender"]]
    print("Previously:", len(new_df), "rows")
    # take only male & female genders (i.e. dropping NaNs & 'other' gender)
    new_df = new_df[np.logical_or(new_df['gender'] == 'female', new_df['gender'] == 'male')]
    print("Now:", len(new_df), "rows")
    new_csv_file = os.path.join(dirname, csv_file)
    # save new preprocessed CSV
    new_df.to_csv(new_csv_file, index=False)
    # get the folder name
    folder_name, _ = csv_file.split(".")
    audio_files = glob.glob(f"{folder_name}/{folder_name}/*")
    all_audio_filenames = set(new_df["filename"])
    for i, audio_file in tqdm(list(enumerate(audio_files)), f"Extracting features of {folder_name}"):
        splited = os.path.split(audio_file)
        # audio_filename = os.path.join(os.path.split(splited[0])[-1], splited[-1])
        audio_filename = f"{os.path.split(splited[0])[-1]}/{splited[-1]}"
        # print("audio_filename:", audio_filename)
        if audio_filename in all_audio_filenames:
            # print("Copying", audio_filename, "...")
            src_path = f"{folder_name}/{audio_filename}"
            target_path = f"{dirname}/{audio_filename}"
            # create that folder if it doesn't exist
            if not os.path.isdir(os.path.dirname(target_path)):
                os.mkdir(os.path.dirname(target_path))
            features = extract_feature(src_path, mel=True)
            target_filename = target_path.split(".")[0]
            np.save(target_filename, features)
            # shutil.copyfile(src_path, target_path)