Introduction
Welcome to the Interspeech 2020 Far-Field Speaker Verification Challenge (FFSVC 2020).
Speaker verification is a key technology in speech processing and biometric authentication, with broad impact on our daily lives, e.g. security, customer service, mobile devices, and smart speakers. Recently, speech-based human-computer interaction has become increasingly popular in far-field smart home and smart city applications, e.g. mobile devices, smart speakers, smart TVs, and automobiles. Thanks to deep learning methods, the performance of speaker verification on the telephone channel and the close-talking microphone channel has improved dramatically. However, there are still open research questions to be explored for speaker verification in far-field and complex environments, including but not limited to:
• Far-field text-dependent speaker verification for wake up control
• Far-field text-independent speaker verification in complex environments
• Far-field speaker verification with cross-channel enrollment and test
• Far-field speaker verification with a single multi-channel microphone array
• Far-field speaker verification with multiple distributed microphone arrays
• Far-field speaker verification with front-end speech enhancement methods
• Far-field speaker verification with end-to-end modeling using data augmentation
• Far-field speaker verification with front-end and back-end joint modeling
• Far-field speaker verification with transfer learning and domain adaptation
The FFSVC 2020 challenge is designed to boost speaker verification research, with a special focus on far-field distributed microphone arrays under noisy conditions in real scenes. The objectives of this challenge are to: 1) benchmark current speaker verification technology under this challenging condition, 2) promote the development of new ideas and technologies in speaker verification, and 3) provide an open, free, and large-scale speech database to the community that exhibits far-field characteristics in real scenes.
The challenge has three tasks in different scenes.
• Task 1: Far-Field Text-Dependent Speaker Verification from a Single Microphone Array
• Task 2: Far-Field Text-Independent Speaker Verification from a Single Microphone Array
• Task 3: Far-Field Text-Dependent Speaker Verification from Distributed Microphone Arrays
All three tasks follow the cross-channel setup: recordings from the close-talking cellphone are used for enrollment, and recordings from the far-field microphone array are used for test.
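The sketch below illustrates how a single cross-channel trial is typically scored: a speaker embedding is extracted from the cellphone enrollment recording and from the array test recording, and the two are compared with cosine similarity. The extract_embedding function and the file names are hypothetical placeholders for whatever embedding front end a participant uses (e.g. an x-vector network); they are not part of the challenge release.

```python
import numpy as np

def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Cosine similarity between an enrollment and a test embedding."""
    return float(np.dot(enroll_emb, test_emb)
                 / (np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb)))

# Hypothetical usage (extract_embedding stands in for any speaker
# embedding extractor, e.g. an x-vector network):
# enroll_emb = extract_embedding("cellphone_enroll.wav")  # close-talking enrollment
# test_emb   = extract_embedding("array_test_ch0.wav")    # far-field test
# score = cosine_score(enroll_emb, test_emb)              # higher = same speaker
```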
Each registered team may participate in any one, two, or all three tasks.
The Organizing Committee:
Ming Li, Duke Kunshan University (DKU)
Haizhou Li, National University of Singapore (NUS)
Shrikanth Narayanan, University of Southern California (USC)
Rohan Kumar Das, National University of Singapore (NUS)
Wei Rao, National University of Singapore (NUS)
Hui Bu, AISHELL foundation
The FFSVC20 Challenge Dataset Description
The FFSVC20 challenge database is provided by AISHELL, which has released multiple open-source databases, namely AISHELL-1, AISHELL-2, and HI-MIA.
The FFSVC20 challenge database is part of the AISHELL Distributed Microphone Arrays in Smart Home (DMASH) Database. The recording devices include one close-talking microphone (48kHz, 16-bit), one cellphone (48kHz, 16-bit) at a 25cm distance, and multiple circular microphone arrays (16kHz, 16-bit, 4 out of 16 microphones, 5cm radius). The language is Mandarin Chinese. The text content includes 'ni hao, mi ya' as the text-dependent utterance as well as other text-independent content.
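Since the close-talking and cellphone recordings are sampled at 48kHz while the array recordings are at 16kHz, a common first step is to downsample the 48kHz data so all channels share one sample rate before feature extraction. A minimal sketch using soundfile and scipy (the file names are hypothetical):

```python
import soundfile as sf
from scipy.signal import resample_poly

# Read a 48 kHz cellphone recording (hypothetical file name).
audio, sr = sf.read("cellphone_utterance.wav")
assert sr == 48000

# Downsample 48 kHz -> 16 kHz (rational factor 1/3) to match the arrays.
audio_16k = resample_poly(audio, up=1, down=3)
sf.write("cellphone_utterance_16k.wav", audio_16k, 16000)
```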
The data collection setup is shown in Fig. 1. The red arrow points to channel 0 of the microphone arrays.
Figure 1: The setup of the challenge.
Each speaker visits 3 times, with a 7-15 day gap between visits.
The HI-MIA Dataset Description
The HI-MIA database includes two sub-databases: AISHELL-wakeup1 with 254 speakers and AISHELL-2019B-eval with 86 speakers. The file HIMIA_FFSVC2020_overlap.txt contains the list of speakers that overlap between the HI-MIA dataset and our challenge data.
The content of the utterances is “ni hao, mi ya” (“你好, 米雅”) in Mandarin. The HI-MIA database served as the benchmark data for the AISHELL Speaker Verification Challenge 2019. Click here to download the HI-MIA dataset and its description.
Since the original audio file naming format of the HI-MIA database can be confusing, we provide a new version of the label files for HI-MIA (train_rename.scp and dev_rename.scp). Click here to download.
Since the audio files in the HI-MIA test set are not strictly synchronized, different channels of the same device may have slightly different audio lengths. The original synchronized test data can be downloaded here. For the training and development sets of HI-MIA, all channels from each microphone array are synchronized.
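A simple workaround for the length mismatch is to truncate all channels of a device to the shortest channel length before any multi-channel processing. A minimal sketch, assuming one WAV file per channel (the file names are hypothetical):

```python
import soundfile as sf

# Hypothetical per-channel files from one microphone array.
channel_files = ["array1_ch0.wav", "array1_ch1.wav",
                 "array1_ch2.wav", "array1_ch3.wav"]

# Load every channel, then truncate to the shortest length so the
# channels can be stacked or processed jointly.
signals = [sf.read(f)[0] for f in channel_files]
min_len = min(len(s) for s in signals)
aligned = [s[:min_len] for s in signals]
```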
Data Download
The FFSVC20 challenge has concluded. Please email aishell.f[email protected] with the subject “Apply for the FFSVC2020 Challenge data” if you want to download the challenge data. The full challenge data includes the training/development/evaluation data and the dev/eval trial files with keys.
Evaluation Plan
The evaluation plan describes the challenge in detail, including the data description, the task description, and the evaluation rules. Please refer to the evaluation plan for more details.
Baseline Paper
Interspeech Challenge Baseline Papers:
• Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li, “The Interspeech 2020 Far-Field Speaker Verification Challenge” (v1)
• Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li, “The Interspeech 2020 Far-Field Speaker Verification Challenge” (v2)
System Descriptions
• AntVoice, AntVoice: The Neural Speaker Embedding System for FFSVC 2020
• Fly Speech, NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge
• GREAT@SHU, The GREAT System for the Far-Field Speaker Verification Challenge 2020
• HCCL, The HCCL Speaker Verification System for Far-Field Speaker Verification Challenge
• hhu_lb, SYSTEM DESCRIPTION BY TEAM HHU-LB
• IBG_AI, IBG AI Speaker Recognition System for Far-Field Speaker Verification Challenge 2020
• IMU&Elevoc, IMU&Elevoc System for Far-Field Speaker Verification Challenge 2020
• NSYSU+CHT, NSYSU+CHT Speaker Verification System for Far-Field Speaker Verification Challenge 2020
• RoyalFlush, Analysis of RoyalFlush Submission in INTERSPEECH 2020 Far-Field Speaker Verification Challenge
• RRI MRC, System description of team RRI MRC for FFSVC 2020
• STC-Innovations, STC-innovation Far-Field Speaker Verification Challenge 2020 System Description
• Tong_JD, The JD AI Speaker Verification System for the FFSVC 2020 Challenge
• try123, The Huawei System for 2020 Far-Field Speaker Verification Challenge
• voice of R, System description of Voice_of_R
• XD_RTB, SPEAKER VERIFICATION SYSTEM FOR FAR-FIELD SPEAKER VERIFICATION CHALLENGE BY TEAM XD-RTB
• xuucas, FFSVC2020 Challenge Task 2: x-vector based solution
• 友谊第一, The UJS System for the Far-Field Speaker Verification Challenge 2020
Task 1: Far-Field Text-Dependent Speaker Verification from a Single Microphone Array
Training Data
The training data includes 120 speakers, each with 3 visits. In each visit, there are multiple (“ni hao mi ya”) text-dependent utterances as well as multiple text-independent utterances. The recordings from five recording devices are provided for each utterance: one close-talking microphone, one cellphone at a 25cm distance, and three randomly selected microphone arrays (4 channels per array).
Any publicly open and freely accessible database shared on openslr.org before Feb 1st 2020 (including HI-MIA) can be used in this task.
Development Data
The development data includes 35 speakers, each with 3 visits. In each visit, there are multiple (“ni hao mi ya”) text-dependent utterances as well as multiple text-independent utterances. The recordings from five recording devices are provided for each utterance: one close-talking microphone, one cellphone at a 25cm distance, and three randomly selected microphone arrays (4 channels per array).
Evaluation Data
The evaluation data includes 80 speakers, each with 3 visits. In each visit, there are multiple (“ni hao mi ya”) utterances. The recordings from two recording devices are provided for each utterance: one cellphone at a 25cm distance and one randomly selected microphone array (4 channels).
The recordings from the 25cm-distance cellphone are used for enrollment, and the recordings from a single far-field microphone array are used for test. For any true trial, the enrollment and test utterances come from different visits of the same speaker.
There is no speaker overlap among the evaluation data of Task 1, Task 2, and Task 3.
Task 2: Far-Field Text-Independent Speaker Verification from a Single Microphone Array
Training Data
The same as the training data for task 1.
Development Data
The same as the development data for task 1.
Evaluation Data
The evaluation data includes 80 speakers, each with 3 visits. In each visit, there are multiple text-independent utterances. The recordings from two recording devices are provided for each utterance: one cellphone at a 25cm distance and one randomly selected microphone array (4 channels).
The recordings from the 25cm-distance cellphone are used for enrollment, and the recordings from a single far-field microphone array are used for test. For any true trial, the enrollment and test utterances come from different visits of the same speaker.
There is no speaker overlap among the evaluation data of Task 1, Task 2, and Task 3.
Task 3: Far-Field Text-Dependent Speaker Verification from Distributed Microphone Arrays
Training Data
The same as the training data for task 1.
Development Data
The same as the development data for task 1.
Evaluation Data
The evaluation data includes 80 speakers, each with 3 visits. In each visit, there are multiple (“ni hao mi ya”) utterances. For each utterance, the corresponding recordings from one cellphone at a 25cm distance and from 2-4 randomly selected microphone arrays are provided. For each microphone array, the four selected microphones are equally distributed along the circle with a random start channel index, to simulate scenarios with unknown array orientation angles (e.g. channels 0,4,8,12; channels 1,5,9,13; channels 6,10,14,2).
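The channel selection pattern above can be expressed compactly: the four microphones sit four channels apart on the 16-microphone circle, offset by a random start index. A short sketch reproducing the examples in the text:

```python
import random

def select_channels(start: int, num_mics: int = 16, num_selected: int = 4) -> list:
    """Indices of equally spaced microphones on a circular array."""
    step = num_mics // num_selected  # 16 / 4 = 4 channels apart
    return [(start + k * step) % num_mics for k in range(num_selected)]

print(select_channels(0))                     # [0, 4, 8, 12]
print(select_channels(6))                     # [6, 10, 14, 2]
print(select_channels(random.randrange(16)))  # random start index
```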
The recordings from the 25cm-distance cellphone are used for enrollment, and the recordings from 2-4 randomly selected far-field microphone arrays are used for test. For any true trial, the enrollment and test utterances come from different visits of the same speaker.
There is no speaker overlap among the evaluation data of Task 1, Task 2, and Task 3.
Important Dates / Timeline
Feb 1st | Release of the training and development data as well as the evaluation plan
March 5th | Release of the evaluation data and launch of the leaderboard (30% of the trials)
April 15th (extended from March 15th) | Challenge registration deadline
May 1st (extended from March 23rd) | Mid-term deadline for score submission (up to 5 chances)
May 8th (extended from March 30th) | Interspeech 2020 paper submission deadline
August 1st (extended from June 15th) | Final deadline for score submission (another 5 chances)
August 7th (Anywhere on Earth) | System description submission deadline
Interspeech 2020 special session | Official results announcement
FFSVC 2020 Website
http://2020.ffsvc.org