This article documents Baidu offline speech recognition (and iFlytek offline speech recognition), applied at the level of the application itself.
First, download the development JAR: bdasr_V3_20180801_d6f298a.jar. This is the combined offline/online SDK.
Next, apply for an API Key and Secret Key on the Baidu Open Platform. These are the credentials required for authentication; only after authentication succeeds can you use the speech features, such as voice wake-up, online recognition, and offline recognition.
Authentication means successfully obtaining an AccessToken (the Access Token is the credential for identity verification and authorization). This authentication step has to be implemented in your own app, as shown later.
Add the obtained AppID, API Key, and Secret Key to the project's configuration files.
Configuration in AndroidManifest.xml:
Declare the permissions (on Android 6.0+ you also need to request dangerous permissions at runtime, covered later).
Declare APP_ID, APP_KEY, and APP_SECRET.
1. Prepare the local grammar file and the wake-up word file in advance.
2. Using the SDK:
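As a sketch of that manifest configuration: the meta-data key names below follow the official Baidu demo, but verify them against your SDK version, and replace the placeholder values with your own credentials.

```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.vrdemo">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

    <application>
        <!-- Placeholder credentials; fill in the values from the Baidu console. -->
        <meta-data android:name="com.baidu.speech.APP_ID" android:value="your_app_id" />
        <meta-data android:name="com.baidu.speech.API_KEY" android:value="your_api_key" />
        <meta-data android:name="com.baidu.speech.SECRET_KEY" android:value="your_secret_key" />
    </application>
</manifest>
```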
EventManager asr = EventManagerFactory.create(context, "asr"); // speech recognizer
EventManager wp = EventManagerFactory.create(context, "wp");   // voice wake-up engine
public class RecogResultManager implements EventListener {
    ...

    /**
     * Callback events.
     */
    @Override
    public void onEvent(String name, String params, byte[] data, int offset, int length) {
        String logMessage = "name:" + name + "; params:" + params;
        Log.d(TAG, logMessage);
        switch (name) {
            case SpeechConstant.CALLBACK_EVENT_ASR_READY:
                // The engine is ready; the user can start speaking.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_BEGIN:
                // Detected that the user has started speaking.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_END:
                // Detected that the user has stopped speaking.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_PARTIAL:
                // Partial recognition result; in long-speech mode the results
                // must be taken from this event.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_FINISH:
                // Recognition finished: final result, or a possible error.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_LONG_SPEECH:
                // Long-speech event.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_VOLUME:
                // Volume level.
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_AUDIO:
                if (data.length != length) {
                    // Possible error: the callback returned inconsistent data.
                    Log.d(TAG, "internal error: asr.audio"
                            + " callback data length is not equal to length param");
                    ...
                }
                break;
            case SpeechConstant.CALLBACK_EVENT_WAKEUP_SUCCESS:
                // Voice wake-up succeeded.
                break;
            case SpeechConstant.CALLBACK_EVENT_WAKEUP_ERROR:
                // Voice wake-up failed.
                break;
            ...
            default:
                break;
        }
    }
}
asr.registerListener(eventListener);
wp.registerListener(eventListener);
/**
 * Load the wake-up word file.
 */
private void loadWakeup() {
    Map<String, Object> params = new HashMap<>();
    params.put(SpeechConstant.WP_WORDS_FILE, "assets://WakeUp.bin");
    // Start listening for the wake-up word.
    mWakeup.start(params);
}
/**
 * Load the offline grammar and start recognition.
 */
private void loadOfflineEngine() {
    Map<String, Object> params = new LinkedHashMap<>();
    // These parameters switch into recognition mode after voice wake-up.
    params.put(SpeechConstant.VAD, SpeechConstant.VAD_DNN);
    params.put(SpeechConstant.DECODER, 2);
    params.put(SpeechConstant.ASR_OFFLINE_ENGINE_GRAMMER_FILE_PATH, "assets://baidu_speech_grammar.bsg");
    // Set the pause window between wake-up and recognition.
    if (backTrackInMs > 0) {
        params.put(SpeechConstant.AUDIO_MILLS, System.currentTimeMillis() - backTrackInMs);
    }
    mRecognizer.cancel();
    // Start recognition.
    mRecognizer.start(params);
}
Note: there are two schemes for going from wake-up to recognition:
Scheme 1: backTrackInMs > 0. The sentence follows the wake-up word immediately, with no pause in between. Backtracking is enabled, so the wake-up word is recognized together with the whole sentence. For a four-character wake-up word, 1500 ms is recommended; the maximum backTrackInMs is 15000, i.e. 15 s.
Scheme 2: backTrackInMs = 0. There is a pause after the wake-up word and backtracking is disabled. After the wake-up callback fires, recognition is started normally.
The official demo uses 1500 by default. My demo uses scheme 2, because in testing its recognition quality was better; see the test results below.
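The AUDIO_MILLS arithmetic above can be isolated in plain Java. The helper below is hypothetical (it is not part of the SDK); it clamps backTrackInMs to the documented 15 s maximum before computing the backtracked start timestamp:

```java
// Hypothetical helper: compute the value passed as SpeechConstant.AUDIO_MILLS,
// clamping the backtrack window to the SDK's documented 15000 ms maximum.
final class BacktrackUtil {
    static final long MAX_BACKTRACK_MS = 15_000L;

    // nowMs is the current wall-clock time (System.currentTimeMillis()).
    static long audioMills(long nowMs, long backTrackInMs) {
        long clamped = Math.min(Math.max(backTrackInMs, 0L), MAX_BACKTRACK_MS);
        return nowMs - clamped;
    }
}
```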
public void check() {
    appendLogMessage("try to check appId=" + appId + ", appKey=" + appKey + ", secretKey=" + secretKey);
    if (appId == null || appId.isEmpty()) {
        errorMessage = "appId is empty";
        fixMessage = "fill in the appId";
    }
    if (appKey == null || appKey.isEmpty()) {
        errorMessage = "appKey is empty";
        fixMessage = "fill in the appKey";
    }
    if (secretKey == null || secretKey.isEmpty()) {
        errorMessage = "secretKey is empty";
        fixMessage = "fill in the secretKey";
    }
    try {
        checkOnline();
    } catch (UnknownHostException e) {
        infoMessage = "no network or network unreachable; check skipped: " + e.getMessage();
    } catch (Exception e) {
        errorMessage = e.getClass().getCanonicalName() + ":" + e.getMessage();
        fixMessage = "re-check whether appId, appKey and appSecret are correct";
    }
}
/**
 * Fetch and validate the token.
 */
public void checkOnline() throws Exception {
    String urlpath = "http://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id="
            + appKey + "&client_secret=" + secretKey;
    URL url = new URL(urlpath);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setConnectTimeout(1000);
    InputStream is = conn.getInputStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
    StringBuilder result = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        result.append(line);
    }
    reader.close();
    String res = result.toString();
    if (!res.contains("audio_voice_assistant_get")) {
        errorMessage = "appId:" + appId + " does not have the audio_voice_assistant_get scope;"
                + " enable the \"speech recognition\" capability in the web console";
        fixMessage = "enable the capability for this appId in the console";
        return;
    }
    appendLogMessage("openapi return " + res);
    JSONObject jsonObject = new JSONObject(res);
    String error = jsonObject.optString("error");
    if (error != null && !error.isEmpty()) {
        errorMessage = "appKey or secretKey is wrong, error:" + error + ", json is " + result;
        fixMessage = "re-check whether the appKey and appSecret match this appId";
        return;
    }
    String token = jsonObject.getString("access_token");
    if (token == null || !token.endsWith("-" + appId)) {
        errorMessage = "appId does not match appKey/appSecret. appId = " + appId + ", token = " + token;
        fixMessage = "re-check whether the appKey and appSecret match this appId";
    }
}
}
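The token sanity check inside checkOnline() can be factored out into a small helper. TokenCheck below is a hypothetical name; the rule it encodes is the one the demo itself relies on, namely that a valid access token from the OAuth endpoint ends with "-" followed by the appId:

```java
// Hypothetical helper mirroring the demo's token sanity check:
// a token issued for this app is expected to end with "-" + appId.
final class TokenCheck {
    static boolean matchesAppId(String token, String appId) {
        return token != null && appId != null && token.endsWith("-" + appId);
    }
}
```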
private void initPermission() {
    String[] permissions = {
            Manifest.permission.RECORD_AUDIO,
            Manifest.permission.ACCESS_NETWORK_STATE,
            Manifest.permission.INTERNET,
            Manifest.permission.READ_PHONE_STATE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE
    };
    ArrayList<String> toApplyList = new ArrayList<>();
    for (String perm : permissions) {
        if (PackageManager.PERMISSION_GRANTED != ContextCompat.checkSelfPermission(this, perm)) {
            // Not granted yet; queue it for the runtime request.
            toApplyList.add(perm);
        }
    }
    if (!toApplyList.isEmpty()) {
        ActivityCompat.requestPermissions(this, toApplyList.toArray(new String[0]), 123);
    }
}
@Override
public void onRequestPermissionsResult(int requestCode,
                                       @NonNull String[] permissions,
                                       @NonNull int[] grantResults) {
    // Callback for Android 6.0+ runtime permission requests; implement as needed.
}
/**
 * Start voice wake-up.
 */
public void start(Map<String, Object> params) {
    String json = new JSONObject(params).toString();
    wp.send(SpeechConstant.WAKEUP_START, json, null, 0, 0);
}

/**
 * Stop voice wake-up.
 */
public void stop() {
    wp.send(SpeechConstant.WAKEUP_STOP, null, null, 0, 0);
}
/**
 * Start recognition.
 */
public void start(Map<String, Object> params) {
    String json = new JSONObject(params).toString();
    asr.send(SpeechConstant.ASR_START, json, null, 0, 0);
}

/**
 * Stop recording early and wait for the recognition result.
 */
public void stop() {
    if (!isInited) {
        throw new RuntimeException("release() was called");
    }
    asr.send(SpeechConstant.ASR_STOP, "{}", null, 0, 0);
}
/**
 * Cancel this recognition; it stops immediately and no result is returned.
 * The difference from stop is that cancel, on top of stopping the recording,
 * aborts the whole recognition pipeline.
 */
public void cancel() {
    if (!isInited) {
        throw new RuntimeException("release() was called");
    }
    asr.send(SpeechConstant.ASR_CANCEL, "{}", null, 0, 0);
}
/**
 * Release voice wake-up resources.
 */
public void release() {
    if (wp == null) {
        return;
    }
    stop();
    wp.unregisterListener(eventListener);
    wp = null;
    isInited = false;
}

/**
 * Release speech recognition resources.
 */
public void release() {
    if (asr == null) {
        return;
    }
    cancel();
    asr.send(SpeechConstant.ASR_KWS_UNLOAD_ENGINE, null, null, 0, 0);
    asr.unregisterListener(eventListener);
    asr = null;
    isInited = false;
}
The original plan was to match recognition results against a database to locate the target widget. Since the demo was written in a hurry, it instead simply manages the activities, using the current activity plus the widget's label as the bridge to find the widget and perform the corresponding action.
public abstract class BaseActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        VrApplication.getInstance().addActivity(this);
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        VrApplication.getInstance().removeActivity(this);
    }
}
public class VrApplication extends Application {
    private static VrApplication sInstance;
    private final List<BaseActivity> mActivities = new ArrayList<>();

    @Override
    public void onCreate() {
        super.onCreate();
        // The system instantiates the Application itself; capture that instance
        // here rather than constructing one manually.
        sInstance = this;
    }

    public static VrApplication getInstance() {
        return sInstance;
    }

    public void addActivity(BaseActivity activity) {
        mActivities.add(activity);
    }

    public void removeActivity(BaseActivity activity) {
        mActivities.remove(activity);
    }

    public List<BaseActivity> getActivities() {
        return mActivities;
    }

    /**
     * Finish all activities when the app is closed.
     */
    public void finishAll() {
        for (BaseActivity activity : mActivities) {
            if (!activity.isFinishing()) {
                activity.finish();
            }
        }
        mActivities.clear();
    }
}
Every UI activity extends BaseActivity. Widgets are stored when the view is initialized, and cleared with clear() when the view is destroyed.
private void initView() {
    mViews = new HashMap<>();
    EditText etCode = (EditText) findViewById(R.id.tv_point_code);
    Button action = (Button) findViewById(R.id.btn_action);
    // Key each widget by its visible label so recognition results can be matched to it.
    mViews.put(etCode.getHint().toString(), etCode);
    mViews.put(action.getText().toString(), action);
}
The callback listener processes the result and dispatches to the matching widget to perform the corresponding action:
public class RecogResultManager implements EventListener, IStatus {
    ...

    /**
     * Callback events.
     */
    @Override
    public void onEvent(String name, String params, byte[] data, int offset, int length) {
        switch (name) {
            ...
            case SpeechConstant.CALLBACK_EVENT_ASR_PARTIAL:
                // Partial recognition result; in long-speech mode the results
                // must be taken from this event.
                handlePartial(params, data, offset, length);
                break;
            case SpeechConstant.CALLBACK_EVENT_ASR_FINISH:
                // Recognition finished: final result, or a possible error.
                handleFinish(params);
                break;
            ...
            case SpeechConstant.CALLBACK_EVENT_WAKEUP_SUCCESS:
                // Voice wake-up succeeded.
                handleWpSuccess(name, params);
                break;
            case SpeechConstant.CALLBACK_EVENT_WAKEUP_ERROR:
                // Voice wake-up failed.
                handleWpError(name, params);
                break;
        }
    }
    ...
}
Handle voice wake-up; after wake-up, switch over to speech recognition:
private void handleWpMsg() {
    Map<String, Object> mParams = new LinkedHashMap<>();
    mParams.put(SpeechConstant.VAD, SpeechConstant.VAD_DNN);
    mParams.put(SpeechConstant.DECODER, 2);
    mParams.put(SpeechConstant.ASR_OFFLINE_ENGINE_GRAMMER_FILE_PATH, BSGPATH);
    if (mBackTrackInMs > 0) {
        mParams.put(SpeechConstant.AUDIO_MILLS, System.currentTimeMillis() - mBackTrackInMs);
    }
    mRecognizer.cancel();
    mRecognizer.start(mParams);
}
Handle the recognition result:
private void analysData(RecogResult recogResult) {
    String results = recogResult.getBestResult();
    // Get the current activity from the cache.
    BaseActivity activityInstance = UiUtil.getActivityInstance(mContext);
    if (activityInstance == null) {
        return;
    }
    Map<String, View> views = activityInstance.getViews();
    for (Map.Entry<String, View> entry : views.entrySet()) {
        if (results.contains(entry.getKey())) {
            action(entry.getValue(), results);
        }
    }
}
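The matching step above can be sketched without any Android dependencies. CommandMatcher below is a hypothetical stand-in: a label-to-target map replaces the label-to-View map, and every label contained in the recognized text counts as a hit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the matching step in analysData:
// scan the label -> target map and collect every label contained
// in the recognized text.
final class CommandMatcher {
    static List<String> match(String recogResult, Map<String, String> views) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : views.entrySet()) {
            if (recogResult.contains(e.getKey())) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }
}
```

Note that plain substring matching fires on any occurrence of the label, so a label that is a substring of another utterance will also match; the demo accepts this trade-off for simplicity.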
Perform the action:
private void action(View value, String results) {
    if (value instanceof Button) {
        value.performClick();
    } else if (value instanceof EditText) {
        ((EditText) value).setText(results);
    }
}
1. Even for offline-only use, the first run requires a network connection for authentication and authorization before speech recognition can proceed.
2. Release the engine promptly when it is not in use; do not keep holding the mic resource, which would interfere with other apps.
3. In noisy environments the recognition rate drops sharply; a noise-suppression algorithm is needed.
4. This demo targets a single application: it recognizes voice commands and performs actions such as switching screens, clicking, and selecting. It has clear limitations: offline recognition only covers words in the grammar file, so when filling a value into an EditText or TextView, the value the user speaks may not be in the grammar. For now this case relies on online recognition, and it needs further work.
Link: https://download.csdn.net/download/weixin_44328479/10910474