Voice Activity Detection (VAD)



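Every audio processing module in WebRTC is worth studying.

Setting aside the AEC (acoustic echo cancellation), which I personally think is the most impressive of them, the VAD alone is excellent.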


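The basic idea is to split the signal into 6 frequency sub-bands. In each sub-band, Gaussian models of noise and speech features decide whether the frame contains speech.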


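Input at every sample rate is downsampled to 8 kHz before detection. Internally, the code already splits 16, 24, 32 and 48 kHz input down to 8 kHz.

To run detection on audio at any other sample rate, you must first downsample it to 8 kHz yourself.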
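The sketch below shows one way to do that downsampling yourself for the simple 2:1 case (16 kHz to 8 kHz). Applying it twice takes 32 kHz to 8 kHz. It uses a crude 3-tap [1/4, 1/2, 1/4] low-pass filter before dropping every second sample. WebRTC's VAD uses a different, all-pass-filter-based downsampler, so treat `downsample_by_2` as a hypothetical helper, not the library's method. Rates that are not a power-of-two multiple of 8 kHz need a proper resampler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: halves the sample rate of 16-bit PCM.
 * Writes in_len / 2 samples to out and returns that count.
 * Each output sample is (in[n-1] + 2*in[n] + in[n+1]) / 4 at n = 2*i,
 * a simple low-pass to reduce aliasing before decimation. */
static size_t downsample_by_2(const int16_t *in, size_t in_len, int16_t *out) {
  size_t out_len = in_len / 2;
  for (size_t i = 0; i < out_len; ++i) {
    size_t n = 2 * i;
    int32_t prev = (n > 0) ? in[n - 1] : in[n]; /* repeat the first sample at the edge */
    int32_t next = in[n + 1];                    /* safe: n + 1 <= in_len - 1 */
    int32_t acc = prev + 2 * (int32_t)in[n] + next;
    out[i] = (int16_t)(acc / 4);                 /* |acc / 4| <= 32767, so it fits */
  }
  return out_len;
}
```

A constant signal passes through unchanged. A tone at the Nyquist frequency of the input (alternating +A/-A samples) is filtered to zero, apart from the first output sample, where the edge is handled by repeating a sample.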


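All decision parameters are tunable.

I added a new parameter set for clearly distinguishable speech. Its local and global thresholds are higher than those of mode 3 (Very aggressive), so a frame must look more strongly like speech before it is flagged:

Custom, as mode 4: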

// Each table has three entries, for 10 ms, 20 ms and 30 ms frames.
// Mode 0, Quality.
static const int16_t kOverHangMax1Q[3] = { 8, 4, 3 };
static const int16_t kOverHangMax2Q[3] = { 14, 7, 5 };
static const int16_t kLocalThresholdQ[3] = { 24, 21, 24 };
static const int16_t kGlobalThresholdQ[3] = { 57, 48, 57 };
// Mode 1, Low bitrate.
static const int16_t kOverHangMax1LBR[3] = { 8, 4, 3 };
static const int16_t kOverHangMax2LBR[3] = { 14, 7, 5 };
static const int16_t kLocalThresholdLBR[3] = { 37, 32, 37 };
static const int16_t kGlobalThresholdLBR[3] = { 100, 80, 100 };
// Mode 2, Aggressive.
static const int16_t kOverHangMax1AGG[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2AGG[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdAGG[3] = { 82, 78, 82 };
static const int16_t kGlobalThresholdAGG[3] = { 285, 260, 285 };
// Mode 3, Very aggressive.
static const int16_t kOverHangMax1VAG[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2VAG[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdVAG[3] = { 94, 94, 94 };
static const int16_t kGlobalThresholdVAG[3] = { 1100, 1050, 1100 };

// Mode 4, Custom (added by the author; stricter than mode 3).
static const int16_t kOverHangMax1Cus[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2Cus[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdCus[3] = { 96, 96, 96 };
static const int16_t kGlobalThresholdCus[3] = { 1300, 1200, 1300 };
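Each of these tables is indexed by frame length: entries 0, 1 and 2 hold the values for 10, 20 and 30 ms frames. The OverHangMax values drive a hangover, which keeps reporting speech for a few frames after the raw decision drops to noise. This is typically done so that the quiet ends of words are not clipped. The following sketch is loosely modeled on WebRTC's vad_core.c. It shows how a mode, including the custom mode 4, might be loaded and how the hangover could be applied. The names `VadParams`, `vad_set_mode`, `frame_index_8k` and `apply_hangover`, and the cap `MAX_SPEECH_FRAMES = 6`, are assumptions for illustration. The return value is simplified to 0/1.

```c
#include <stdint.h>
#include <string.h>

/* kModeTables[mode][table][frame]: table 0 = OverHangMax1, 1 = OverHangMax2,
 * 2 = LocalThreshold, 3 = GlobalThreshold; frame 0/1/2 = 10/20/30 ms.
 * Values copied from the listing above. */
static const int16_t kModeTables[5][4][3] = {
  { {8, 4, 3}, {14, 7, 5}, {24, 21, 24}, {57, 48, 57} },      /* 0 Quality */
  { {8, 4, 3}, {14, 7, 5}, {37, 32, 37}, {100, 80, 100} },    /* 1 Low bitrate */
  { {6, 3, 2}, {9, 5, 3}, {82, 78, 82}, {285, 260, 285} },    /* 2 Aggressive */
  { {6, 3, 2}, {9, 5, 3}, {94, 94, 94}, {1100, 1050, 1100} }, /* 3 Very aggressive */
  { {6, 3, 2}, {9, 5, 3}, {96, 96, 96}, {1300, 1200, 1300} }, /* 4 Custom */
};

/* Hypothetical state struct holding the active tables and hangover state. */
typedef struct {
  int16_t over_hang_max_1[3];
  int16_t over_hang_max_2[3];
  int16_t local_threshold[3];
  int16_t global_threshold[3];
  int16_t over_hang;     /* hangover frames still to report as speech */
  int16_t num_of_speech; /* consecutive raw speech frames, capped */
} VadParams;

#define MAX_SPEECH_FRAMES 6 /* assumed cap before switching to OverHangMax2 */

/* Loads the tables for modes 0..4 and resets the hangover state.
 * Returns 0 on success, -1 for an unknown mode. */
static int vad_set_mode(VadParams *p, int mode) {
  if (mode < 0 || mode > 4) return -1;
  memcpy(p->over_hang_max_1, kModeTables[mode][0], sizeof p->over_hang_max_1);
  memcpy(p->over_hang_max_2, kModeTables[mode][1], sizeof p->over_hang_max_2);
  memcpy(p->local_threshold, kModeTables[mode][2], sizeof p->local_threshold);
  memcpy(p->global_threshold, kModeTables[mode][3], sizeof p->global_threshold);
  p->over_hang = 0;
  p->num_of_speech = 0;
  return 0;
}

/* Maps an 8 kHz frame length (80/160/240 samples) to table index 0/1/2. */
static int frame_index_8k(int samples) {
  switch (samples) {
    case 80: return 0;
    case 160: return 1;
    case 240: return 2;
    default: return -1;
  }
}

/* Smooths the raw per-frame decision. After speech ends, keep reporting
 * speech for OverHangMax1 frames (short bursts) or OverHangMax2 frames
 * (once more than MAX_SPEECH_FRAMES consecutive speech frames were seen). */
static int apply_hangover(VadParams *p, int raw_speech, int frame_idx) {
  if (!raw_speech) {
    p->num_of_speech = 0;
    if (p->over_hang > 0) {
      p->over_hang--;
      return 1; /* still inside the hangover */
    }
    return 0;
  }
  if (++p->num_of_speech > MAX_SPEECH_FRAMES) {
    p->num_of_speech = MAX_SPEECH_FRAMES;
    p->over_hang = p->over_hang_max_2[frame_idx];
  } else {
    p->over_hang = p->over_hang_max_1[frame_idx];
  }
  return 1;
}
```

Take mode 4 with 10 ms frames in this sketch. A short burst of 1 to 6 speech frames is followed by 6 extra frames reported as speech (OverHangMax1). A burst of 7 or more frames is followed by 9 extra frames (OverHangMax2).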

Source code of the VAD module, extracted as a standalone project:

https://github.com/dreamno23/vad


iOS demo: https://github.com/dreamno23/VADTest

