原来参考android GitHub - gkonovalov/android-vad: Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.的kt改写java demo 可费劲了
上个月 https://github.com/snakers4/silero-vad/tree/master/examples/java-example 官方新增了例子 在java判断pcm 的静音简单了
package org.example;
import ai.onnxruntime.OrtException;
import javax.sound.sampled.*;
import java.util.Map;
public class App {
private static final String MODEL_PATH = "src/main/resources/silero_vad.onnx";
private static final int SAMPLE_RATE = 16000;
private static final float START_THRESHOLD = 0.6f;
private static final float END_THRESHOLD = 0.45f;
private static final int MIN_SILENCE_DURATION_MS = 600;
private static final int SPEECH_PAD_MS = 500;
private static final int WINDOW_SIZE_SAMPLES = 2048;
public static void main(String[] args) {
// Initialize the Voice Activity Detector
SlieroVadDetector vadDetector;
try {
vadDetector = new SlieroVadDetector(MODEL_PATH, START_THRESHOLD, END_THRESHOLD, SAMPLE_RATE, MIN_SILENCE_DURATION_MS, SPEECH_PAD_MS);
} catch (OrtException e) {
System.err.println("Error initializing the VAD detector: " + e.getMessage());
return;
}
// Set audio format
AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
// Get the target data line and open it with the specified format
TargetDataLine targetDataLine;
try {
targetDataLine = (TargetDataLine) AudioSystem.getLine(info);
targetDataLine.open(format);
targetDataLine.start();
} catch (LineUnavailableException e) {
System.err.println("Error opening target data line: " + e.getMessage());
return;
}
// Main loop to continuously read data and apply Voice Activity Detection
while (targetDataLine.isOpen()) {
byte[] data = new byte[WINDOW_SIZE_SAMPLES];
int numBytesRead = targetDataLine.read(data, 0, data.length);
if (numBytesRead <= 0) {
System.err.println("Error reading data from target data line.");
continue;
}
// Apply the Voice Activity Detector to the data and get the result
Map
try {
detectResult = vadDetector.apply(data, true);
} catch (Exception e) {
System.err.println("Error applying VAD detector: " + e.getMessage());
continue;
}
if (!detectResult.isEmpty()) {
System.out.println(detectResult);
}
}
// Close the target data line to release audio resources
targetDataLine.close();
}
}
运行加下onnx的 dll git 下载下
System.load("F:\\jar\\onnxruntime-win-x64-1.16.3\\lib\\onnxruntime.dll");
对应基于freeswitch 获取到的pcm数据判断静音就简单了
vadDetector.apply(data, true); 主要方法就是get float值
// Call the model to get the prediction probability of speech float speechProb = 0; try { speechProb = model.call(new float[][]{audioData}, samplingRate)[0]; } catch (OrtException e) { throw new RuntimeException(e); }
有兴趣可以到https://item.taobao.com/item.htm?id=653611115230