AISHELL-4: Multi-Channel Mandarin Meeting Speech Database


AISHELL-4 is a sizable, real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. It aims to bridge advanced research on multi-speaker processing and practical application scenarios in three respects. First, with real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural conversational speech characteristics such as short pauses, speech overlap, quick speaker turns, and noise. Second, accurate transcriptions and speaker voice activity are provided for each meeting. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition, and speaker diarization, to multi-modality modeling and joint optimization of related tasks. Third, we also release a PyTorch-based training and evaluation framework as a baseline system to promote reproducible research in this field.
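Because speech overlap and per-speaker voice activity are central to the description above, a small sketch may help make the notion of "overlap" concrete. It assumes the speaker activity for one meeting has already been loaded as `(speaker_id, start_sec, end_sec)` tuples; this tuple layout, the `Segment` alias, and the `overlap_ratio` function are illustrative assumptions, not the dataset's actual annotation format or part of the released baseline. It also assumes that segments belonging to the same speaker do not overlap each other.

```python
from typing import List, Tuple

# Hypothetical in-memory layout: (speaker_id, start_sec, end_sec).
# This is NOT the AISHELL-4 annotation file format, only an assumed intermediate form.
Segment = Tuple[str, float, float]


def overlap_ratio(segments: List[Segment]) -> float:
    """Return the fraction of speech time in which two or more speakers are active."""
    events = []
    for _, start, end in segments:
        if end > start:  # ignore empty or inverted segments
            events.append((start, 1))   # a speaker becomes active
            events.append((end, -1))    # a speaker becomes inactive
    # At equal timestamps, -1 sorts before +1, so back-to-back turns are not counted as overlap.
    events.sort()

    active = 0        # number of speakers currently talking
    speech = 0.0      # total time with at least one active speaker
    overlap = 0.0     # total time with at least two active speakers
    prev_time = None
    for time, delta in events:
        if prev_time is not None and active > 0:
            duration = time - prev_time
            speech += duration
            if active >= 2:
                overlap += duration
        active += delta
        prev_time = time
    return overlap / speech if speech > 0 else 0.0


if __name__ == "__main__":
    demo = [("A", 0.0, 10.0), ("B", 8.0, 12.0), ("C", 20.0, 25.0)]
    print(f"overlap ratio: {overlap_ratio(demo):.3f}")
```

A sweep over sorted start and end events is used here rather than a frame-level grid, so the result does not depend on a chosen frame size.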

120 Hours

211 Meeting Sessions

10 Meeting Rooms

60 Speakers

Speech Front-End Processing

Speech Recognition

Speaker Diarization


Open Source

AISHELL-4 is part of the AISHELL-ASR0055 Corpus

Figure: The setup of the recording environment.

20 Meeting Rooms

639 Meeting Sessions

370 Hours/Single Channel

162 Speakers
