The baseline is a model already trained by the voxceleb_trainer project. On the new dev set it scores 6.7480% with threshold 0.4959, and 6.7541% with threshold -1.0027.
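Each result above pairs an error rate with a decision threshold. This is how an equal error rate (EER) is usually reported: the threshold is the score at which the false-acceptance rate and the false-rejection rate meet.

The sketch below shows that computation without any dependencies. It assumes that higher scores mean "same speaker" and that labels are 1 for target trials and 0 for non-target trials. `compute_eer` is a hypothetical helper. It is not part of voxceleb_trainer or the author's toolkit.

```python
from typing import List, Tuple


def compute_eer(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer_percent, threshold), trying every observed score as a threshold.

    Hypothetical helper: a trial is accepted when score >= threshold,
    and label 1 means the two utterances come from the same speaker.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, best = float("inf"), (0.0, 0.0)
    for t in sorted(set(scores)):
        # False rejections: target trials scored below the threshold.
        fr = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < t)
        # False acceptances: non-target trials scored at or above the threshold.
        fa = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= t)
        frr, far = fr / n_pos, fa / n_neg
        gap = abs(frr - far)
        if gap < best_gap:
            # Report the midpoint of FRR and FAR where they are closest.
            best_gap, best = gap, ((frr + far) / 2 * 100, t)
    return best
```

The sweep is quadratic in the number of trials, which keeps it readable. For trial lists of realistic size, a version that sorts the scores once, or a ROC-curve routine from a library, is the better choice.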
The full evaluation process is available as the notebook exported to HTML, which can be downloaded here: https://github.com/mechanicalsea/voxsrc2020/blob/master/Baseline.html
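Because the voxceleb_trainer code is fairly cumbersome, the author extracted and adapted parts of it into a toolkit that makes data augmentation and model design easier. A typical training and evaluation run looks like this: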
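This post does not show the toolkit's own augmentation routines. As an illustration of one common speech augmentation, the sketch below mixes a noise signal into a clean one at a chosen signal-to-noise ratio (SNR).

`add_noise` and its list-of-floats interface are hypothetical. They were chosen to keep the example free of dependencies and do not reflect the toolkit's API.

```python
import math
from typing import List


def add_noise(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Hypothetical helper: mix `noise` into `clean` at the requested SNR in dB."""
    if not clean:
        return []
    if not noise:
        raise ValueError("noise must be non-empty")
    # Tile the noise so it covers the whole utterance, then trim it.
    reps = math.ceil(len(clean) / len(noise))
    noise = (noise * reps)[: len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return list(clean)
    # Choose gain g so that 10 * log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]
```

In practice the noise clip is usually cropped at a random offset and the SNR is drawn from a range for each utterance. Both steps are left out here so that the result is deterministic.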
# load_train, load_trial, ResNetSE34L, AMSoftmax and SpeakerNet come from the
# author's toolkit; its module path is not given in this post.
if __name__ == "__main__":
    # Paths to the training list, the trial list, and their root directories
    trainlst = "/workspace/rwang/voxceleb/train_list.txt"
    testlst = "/workspace/rwang/VoxSRC2020/data/verif/trials.txt"
    traindir = "/workspace/rwang/voxceleb/voxceleb2/"
    testdir = "/workspace/rwang/voxceleb/"
    maptrain5994 = "/workspace/rwang/competition/voxsrc2020/maptrain5994.txt"
    # Load the training set
    train = load_train(trainlst=trainlst, traindir=traindir,
                       maptrain5994=maptrain5994)
    # Load the test trials
    trial = load_trial(testlst=testlst, testdir=testdir)
    # Speaker-embedding extractor
    net = ResNetSE34L(nOut=512, num_filters=[16, 32, 64, 128])
    # Top-level classifier (AM-Softmax over 5994 training speakers)
    top = AMSoftmax(in_feats=512, n_classes=5994, m=0.2, s=30)
    # Wrap both into an sklearn-style model
    snet = SpeakerNet(net=net, top=top)
    # Train the model
    modelst, step_num, loss, prec1, prec5 = snet.train(train, num_epoch=1)
    # Evaluate the model
    eer, thresh, all_scores, all_labels, all_trials, trials_feat = snet.eval(
        trial, step_num=0, trials_feat=None)
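The classifier in this run is configured with `m=0.2` and `s=30`. In the standard AM-Softmax formulation, the target-class logit is s·(cos θ_y − m), every other logit is s·cos θ_j, and a cross-entropy loss is applied on top.

The single-sample sketch below assumes that both the embedding and the class weight vectors are L2-normalized before the cosine is taken. `amsoftmax_loss` is hypothetical and is not the toolkit's `AMSoftmax` class.

```python
import math
from typing import List


def _normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def amsoftmax_loss(embedding: List[float], weights: List[List[float]],
                   label: int, m: float = 0.2, s: float = 30.0) -> float:
    """Hypothetical single-sample AM-Softmax loss (standard formulation)."""
    e = _normalize(embedding)
    # Cosine between the embedding and each class weight vector.
    cosines = [sum(a * b for a, b in zip(e, _normalize(w))) for w in weights]
    # Subtract the additive margin from the target class only, then scale.
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[label]
```

With `m=0` this reduces to a scaled cosine softmax. A positive margin raises the loss for correctly classified samples, pushing the embeddings of each speaker closer together.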
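`snet.eval` returns a score and a label for each trial. A common way to score a trial is the cosine similarity between the embeddings of its two utterances; how the toolkit scores trials internally is not shown in this post.

The sketch below makes two assumptions:
- each trial line follows the `label enroll_path test_path` layout used by VoxCeleb1's `veri_test.txt`;
- embeddings have already been extracted into a dict keyed by path.

`parse_trials`, `cosine` and `score_trials` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

Trial = Tuple[int, str, str]


def parse_trials(lines: List[str]) -> List[Trial]:
    """Parse lines of the assumed form 'label enroll_path test_path'."""
    trials = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        trials.append((int(parts[0]), parts[1], parts[2]))
    return trials


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def score_trials(trials: List[Trial],
                 embeddings: Dict[str, List[float]]) -> Tuple[List[float], List[int]]:
    """Return parallel lists of cosine scores and labels, one entry per trial."""
    scores, labels = [], []
    for label, enroll, test in trials:
        scores.append(cosine(embeddings[enroll], embeddings[test]))
        labels.append(label)
    return scores, labels
```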
Feel free to follow, and discussion is always welcome.