VoxSRC 2020 Baseline Model and Development Toolkit

VoxSRC 2020

  • Competition page: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2020.html
  • Baseline codebase: https://github.com/clovaai/voxceleb_trainer
  • Development toolkit: https://github.com/a-nagrani/VoxSRC2020

Baseline Evaluation

The baseline is the pretrained model from the voxceleb_trainer project. On the new dev set it gives the following results:

  • Cosine similarity: 6.7480% EER at threshold 0.4959.
  • 2-norm: 6.7541% EER at threshold -1.0027.
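
To make the two scoring rules and the reported numbers concrete, here is a minimal sketch. It assumes the percentages above are equal error rates (EER) and that the 2-norm score is a negated Euclidean distance, which the negative threshold suggests but the post does not state. The function names are hypothetical and are not taken from voxceleb_trainer or base.py.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embeddings (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neg_l2_score(a, b):
    """Negated 2-norm distance, so that higher still means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compute_eer(scores, labels):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    labels: 1 for a same-speaker trial, 0 for a different-speaker trial.
    This brute-force scan is quadratic in the number of trials; it is meant
    for illustration, not for full trial lists.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best = None
    for thr in sorted(set(scores)):
        # A trial is accepted when its score is at least the threshold.
        false_reject = sum(1 for s, l in zip(scores, labels) if l == 1 and s < thr)
        false_accept = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= thr)
        frr = false_reject / n_target
        far = false_accept / n_nontarget
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy trial scores: three same-speaker trials followed by three impostor trials.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
eer, threshold = compute_eer(scores, labels)
print(f"EER = {eer:.4f} at threshold {threshold}")
```

On real trial lists a sorted sweep or an ROC routine from a library would be the practical choice; the scan here only shows what the threshold and the error rate mean.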

The full evaluation procedure is available as a notebook exported to HTML: https://github.com/mechanicalsea/voxsrc2020/blob/master/Baseline.html

Development Toolkit

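Because the voxceleb_trainer code is relatively cumbersome, I extracted and modified parts of it to build a toolkit that makes data augmentation and model design easier: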

  • Link: https://github.com/mechanicalsea/voxsrc2020/blob/master/base.py
  • Example:
if __name__ == "__main__":
    # Paths to the training list, the trial list, and their root directories
    trainlst = "/workspace/rwang/voxceleb/train_list.txt"
    testlst = "/workspace/rwang/VoxSRC2020/data/verif/trials.txt"
    traindir = "/workspace/rwang/voxceleb/voxceleb2/"
    testdir = "/workspace/rwang/voxceleb/"
    maptrain5994 = "/workspace/rwang/competition/voxsrc2020/maptrain5994.txt"
    # Load the training set
    train = load_train(trainlst=trainlst, traindir=traindir,
                       maptrain5994=maptrain5994)
    # Load the test trials
    trial = load_trial(testlst=testlst, testdir=testdir)
    # Define the speaker-embedding extractor
    net = ResNetSE34L(nOut=512, num_filters=[16, 32, 64, 128])
    # Define the top-level classifier
    top = AMSoftmax(in_feats=512, n_classes=5994, m=0.2, s=30)
    # Wrap both in an sklearn-style model
    snet = SpeakerNet(net=net, top=top)
    # Train the model
    modelst, step_num, loss, prec1, prec5 = snet.train(train, num_epoch=1)
    # Evaluate the model
    eer, thresh, all_scores, all_labels, all_trials, trials_feat = snet.eval(
        trial, step_num=0, trials_feat=None)
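
For readers unfamiliar with the m and s arguments passed to AMSoftmax in the example, the following sketch shows the standard additive-margin softmax loss for a single sample. It is not the toolkit's AMSoftmax class. The function name and the toy cosine values are hypothetical, and only the default m=0.2 and s=30 are taken from the example.

```python
import math


def am_softmax_loss(cosines, target, m=0.2, s=30.0):
    """Additive-margin softmax loss for one sample.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true class.
    """
    # Subtract the margin m from the target cosine only, then scale by s.
    logits = [s * (c - m) if i == target else s * c
              for i, c in enumerate(cosines)]
    # Cross-entropy via a numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Toy cosines for a 3-class problem where class 0 is the true speaker.
cosines = [0.7, 0.1, -0.2]
plain = am_softmax_loss(cosines, target=0, m=0.0)
with_margin = am_softmax_loss(cosines, target=0, m=0.2)
print(plain, with_margin)
```

The margin makes the loss larger for the same embedding, so training must push the target cosine further above the others. The scale s sharpens the softmax, since cosines alone are bounded in [-1, 1].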

Feel free to follow along and get in touch.
