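I recently ran into an ANR problem that I had to analyze and track down. ANRs are genuinely painful: sometimes even with traces.txt in hand you cannot see what went wrong, because ANRs usually come together with high CPU and memory usage, which makes them hard to pin down.
I spent some time learning how ANRs are triggered on Android and how the system detects them. These are my notes for future reference; writing things down beats a good memory.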
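ANR (Application Not Responding) means the app failed to respond within a given time.
The root cause is that the UI thread cannot get to its messages for a long time, or takes too long to process one.
ANRs fall into three main categories: Service timeouts, input event dispatch timeouts, and Broadcast timeouts.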
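Service ANR detection is built on timed messages.
If you have studied how a Service is started, you know that AMS only dispatches the work; the class that actually starts the Service is ActiveServices.
ActiveServices has a scheduleServiceTimeoutLocked method, which is called when a service is created.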
// ActiveServices
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
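Once a service has been started, a delayed ActivityManagerService.SERVICE_TIMEOUT_MSG is posted, and it keeps being re-posted for as long as the service is still executing and has not timed out.
Two variables in the code above deserve a note:
mAm is the ActivityManagerService. It is started by SystemServer and lives in the system_server process.
mHandler is ActivityManagerService's Handler (a MainHandler). It does not run on the thread that created AMS, but on a HandlerThread that AMS starts.
proc.execServicesFg tells whether this is a foreground or a background service, and that decides the delay.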
// ActiveServices
// ANR timeout for foreground services: 20 seconds
static final int SERVICE_TIMEOUT = 20*1000;
// ANR timeout for background services: 200 seconds
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
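The arm-then-cancel idea behind these timeouts can be tried off-device. Below is a minimal sketch, not AOSP code: FakeHandler is a hypothetical stand-in for android.os.Handler that runs on a virtual clock, and the process names are made up. It also assumes the pending message is removed when the service finishes in time, a step the listings in this post do not show.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy Handler: delayed messages on a virtual clock (not android.os.Handler). */
class FakeHandler {
    private static final class Msg {
        final long when;
        final String token;
        final Runnable action;

        Msg(long when, String token, Runnable action) {
            this.when = when;
            this.token = token;
            this.action = action;
        }
    }

    private final List<Msg> queue = new ArrayList<>();
    private long now = 0;

    void sendMessageDelayed(String token, long delayMs, Runnable action) {
        queue.add(new Msg(now + delayMs, token, action));
    }

    void removeMessages(String token) {
        queue.removeIf(m -> m.token.equals(token));
    }

    /** Advances the virtual clock and runs every message that has become due. */
    void advanceTo(long time) {
        now = time;
        List<Msg> due = new ArrayList<>();
        for (Msg m : queue) {
            if (m.when <= now) {
                due.add(m);
            }
        }
        queue.removeAll(due);
        for (Msg m : due) {
            m.action.run();
        }
    }
}

class ServiceWatchdogDemo {
    static final long SERVICE_TIMEOUT = 20 * 1000;

    /** Returns the names of the processes that end up reported as ANR. */
    static List<String> run() {
        FakeHandler handler = new FakeHandler();
        List<String> anrs = new ArrayList<>();

        // "fastProc" finishes after 3 s, so its timeout message is cancelled in time.
        handler.sendMessageDelayed("fastProc", SERVICE_TIMEOUT, () -> anrs.add("fastProc"));
        handler.advanceTo(3_000);
        handler.removeMessages("fastProc");

        // "slowProc" never reports completion, so its message fires at t = 23 s.
        handler.sendMessageDelayed("slowProc", SERVICE_TIMEOUT, () -> anrs.add("slowProc"));
        handler.advanceTo(30_000);
        return anrs;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [slowProc]
    }
}
```

Only the process whose message is still queued when the clock passes the deadline gets reported, which is why a slow main thread in the app shows up as an ANR on the system side.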
Now let's see how AMS handles the ActivityManagerService.SERVICE_TIMEOUT_MSG message.
// ActiveServices
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        // The process has no executing services that need checking
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            // Walk the services and check whether each has exceeded the ANR time
            if (sr.executingStart < maxTime) {
                // Found a service that has timed out
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            // Dump the state of the timed-out service into sw
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            // No service has timed out: post another delayed
            // ActivityManagerService.SERVICE_TIMEOUT_MSG
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // An ANR occurred: hand it to AppErrors
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}
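To sum up, Service ANR detection uses AMS's HandlerThread to post and process timed messages. If, while handling such a message, a Service is found to have been executing longer than its limit, that Service is considered to have ANR'd; the relevant information is recorded and handed to AppErrors.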
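The scan inside serviceTimeout() is easy to misread, so here is a stripped-down sketch of just that loop. Record and Result are hypothetical stand-ins I made up (Record replaces ServiceRecord); only the comparison logic follows the listing above.

```java
import java.util.Arrays;
import java.util.List;

class ServiceTimeoutScan {
    static final long SERVICE_TIMEOUT = 20 * 1000;

    /** Hypothetical stand-in for ServiceRecord: a name and the time execution started. */
    static final class Record {
        final String name;
        final long executingStart;

        Record(String name, long executingStart) {
            this.name = name;
            this.executingStart = executingStart;
        }
    }

    /** Either a timed-out record, or the time at which the check should run again. */
    static final class Result {
        final Record timedOut;
        final long nextCheckAt;

        Result(Record timedOut, long nextCheckAt) {
            this.timedOut = timedOut;
            this.nextCheckAt = nextCheckAt;
        }
    }

    static Result scan(List<Record> executing, long now, long timeout) {
        // Anything that started before maxTime has been running longer than the timeout.
        long maxTime = now - timeout;
        long nextTime = 0;
        for (int i = executing.size() - 1; i >= 0; i--) {
            Record r = executing.get(i);
            if (r.executingStart < maxTime) {
                return new Result(r, 0);
            }
            if (r.executingStart > nextTime) {
                nextTime = r.executingStart;
            }
        }
        // Nobody timed out: re-check relative to the latest start time seen.
        return new Result(null, nextTime + timeout);
    }

    public static void main(String[] args) {
        List<Record> recs = Arrays.asList(new Record("a", 1_000), new Record("b", 15_000));
        System.out.println(scan(recs, 20_000, SERVICE_TIMEOUT).nextCheckAt);   // prints 35000
        System.out.println(scan(recs, 25_000, SERVICE_TIMEOUT).timedOut.name); // prints a
    }
}
```

At t = 20 s neither service is past its deadline, so the check is rescheduled; at t = 25 s service "a", which started at 1 s, has been executing for more than 20 s and is reported.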
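Android input events are all dispatched by the InputDispatcher in the native layer, and that includes ANR detection for input events.
See the article "ANR机制以及问题分析" (ANR Mechanism and Problem Analysis).
InputDispatcherThread is a thread; each pass of its loop handles one round of event dispatch.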
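An input event is treated as a message that has to queue up for dispatch. Each Connection maintains two queues:
* outboundQueue: events waiting to be sent to the window. Every new event enters this queue first.
* waitQueue: events that have already been sent to the window.
Once publishKeyEvent completes, the event has been dispatched, and it is moved from outboundQueue to waitQueue.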
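The two queues can be modelled in a few lines. This is a simplified sketch under my own assumptions, not the native implementation: ConnectionModel, Event and isUnresponsive are hypothetical names, and the real InputDispatcher check is more involved than "the oldest in-flight event is older than the timeout". The 5000 ms used in main is just an illustrative value.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ConnectionModel {
    /** Hypothetical event: a name plus the time it was sent to the window. */
    static final class Event {
        final String name;
        long deliveryTime;

        Event(String name) {
            this.name = name;
        }
    }

    final Deque<Event> outboundQueue = new ArrayDeque<>(); // waiting to be sent
    final Deque<Event> waitQueue = new ArrayDeque<>();     // sent, not yet finished

    void enqueue(String name) {
        outboundQueue.addLast(new Event(name));
    }

    /** Models publishing: the head of outboundQueue is sent, then parked in waitQueue. */
    void publishNext(long now) {
        Event e = outboundQueue.pollFirst();
        if (e == null) {
            return;
        }
        e.deliveryTime = now;
        waitQueue.addLast(e);
    }

    /** Models the window reporting that it finished the oldest in-flight event. */
    void finishOldest() {
        waitQueue.pollFirst();
    }

    /** True if the oldest in-flight event has gone unanswered for longer than timeoutMs. */
    boolean isUnresponsive(long now, long timeoutMs) {
        Event head = waitQueue.peekFirst();
        return head != null && now - head.deliveryTime > timeoutMs;
    }

    public static void main(String[] args) {
        ConnectionModel c = new ConnectionModel();
        c.enqueue("ACTION_DOWN");
        c.publishNext(0);
        System.out.println(c.isUnresponsive(4_000, 5_000)); // prints false
        System.out.println(c.isUnresponsive(6_000, 5_000)); // prints true
    }
}
```

If the app's main thread is stuck, it never reports the event as finished, the event stays in waitQueue, and the dispatcher eventually gives up waiting.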
On the Java side, when the native layer detects an input ANR it calls InputManagerService's notifyANR() method.
// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            inputApplicationHandle, inputWindowHandle, reason);
}
Here mWindowManagerCallbacks is the InputMonitor, so its notifyANR() method is the one that gets called.
// InputMonitor
/* Notifies the window manager about an application that is not responding.
 * Returns a new timeout to continue waiting in nanoseconds, or 0 to abort dispatch.
 * Called by the InputManager.
 */
@Override
public long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    ...
    if (appWindowToken != null && appWindowToken.appToken != null) {
        ...
        // The current activity exists
        final boolean abort = controller != null
                && controller.keyDispatchingTimedOut(reason,
                        (windowState != null) ? windowState.mSession.mPid : -1);
        if (!abort) {
            return appWindowToken.mInputDispatchingTimeoutNanos;
        }
    } else if (windowState != null) {
        try {
            long timeout = ActivityManager.getService().inputDispatchingTimedOut(
                    windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                return timeout * 1000000L; // nanoseconds
            }
        } catch (RemoteException ex) {
        }
    }
    return 0;
}
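The code above covers two ANR cases: either the window belongs to an Activity (appWindowToken is not null), which goes through keyDispatchingTimedOut(), or there is only a WindowState, in which case AMS's inputDispatchingTimedOut() is called with the window's pid.
Let's look at keyDispatchingTimedOut first.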
// ActivityRecord
@Override
public boolean keyDispatchingTimedOut(String reason, int windowPid) {
    ActivityRecord anrActivity;
    ProcessRecord anrApp;
    boolean windowFromSameProcessAsActivity;
    synchronized (service) {
        anrActivity = getWaitingHistoryRecordLocked();
        anrApp = app;
        windowFromSameProcessAsActivity =
                app == null || app.pid == windowPid || windowPid == -1;
    }
    if (windowFromSameProcessAsActivity) {
        return service.inputDispatchingTimedOut(anrApp, anrActivity, this, false, reason);
    } else {
        return service.inputDispatchingTimedOut(windowPid, false /* aboveSystem */, reason) < 0;
    }
}
Either way, everything ends up in AMS's inputDispatchingTimedOut():
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
            != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission "
                + android.Manifest.permission.FILTER_EVENTS);
    }
    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }
    if (proc != null) {
        synchronized (this) {
            if (proc.debugging) {
                return false;
            }
            if (proc.instr != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }
    return true;
}
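Once again, the ANR is handed to AppErrors.
To sum up, the ANR detection logic for input events lives in the native InputDispatcher. When it detects an ANR it calls notifyANR() on the Java-side InputManagerService, and in the end AppErrors collects the information, shows the dialog, and so on.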
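A Broadcast ANR happens because onReceive takes too long.
Just as ActiveServices does for services, BroadcastQueue is the class that actually manages broadcasts.
When an app process is created, AMS calls sendPendingBroadcastsLocked(app).
sendPendingBroadcastsLocked() calls processCurBroadcastLocked(),
which sends the broadcast to the app process through app.thread.scheduleReceiver(), completing the delivery.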
Before an ordered broadcast is handed to its next receiver, processNextBroadcast() arms the timeout by calling setBroadcastTimeoutLocked(timeoutTime):
final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}
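This posts a BROADCAST_TIMEOUT_MSG to mHandler; the basic mechanism is much the same as for Service ANR detection.
mHandler's messages are also processed on the HandlerThread created by AMS, the same thread that handles ActivityManagerService.SERVICE_TIMEOUT_MSG for services.
Here is how mHandler handles the BROADCAST_TIMEOUT_MSG message.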
// BroadcastQueue.java
public void handleMessage(Message msg) {
    switch (msg.what) {
        case BROADCAST_INTENT_MSG: {
            if (DEBUG_BROADCAST) Slog.v(
                    TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
            processNextBroadcast(true);
        } break;
        case BROADCAST_TIMEOUT_MSG: {
            synchronized (mService) {
                broadcastTimeoutLocked(true);
            }
        } break;
    }
}
Finally, BroadcastQueue's broadcastTimeoutLocked():
final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (mOrderedBroadcasts.size() == 0) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        ...
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // Not timed out yet: post the delayed message again
        if (timeoutTime > now) {
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    ...
    // Reaching this point means an ANR occurred; start collecting
    // process and broadcast information
    r.receiverTime = now;
    if (!debugging) {
        r.anrCount++;
    }
    ProcessRecord app = null;
    String anrMessage = null;
    Object curReceiver;
    if (r.nextReceiver > 0) {
        curReceiver = r.receivers.get(r.nextReceiver-1);
        r.delivery[r.nextReceiver-1] = BroadcastRecord.DELIVERY_TIMEOUT;
    } else {
        curReceiver = r.curReceiver;
    }
    Slog.w(TAG, "Receiver during timeout of " + r + " : " + curReceiver);
    logBroadcastReceiverDiscardLocked(r);
    if (curReceiver != null && curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }
    // Finish the current receiver and move on to the next one
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();
    if (!debugging && anrMessage != null) {
        // Report the ANR
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
private final class AppNotResponding implements Runnable {
    ...
    @Override
    public void run() {
        mService.mAppErrors.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}
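As with services, the ANR is finally handed to AppErrors in AMS, which collects the ANR information, shows the dialog, and so on.
To sum up, Broadcast ANR detection works basically the same way as Service detection: a Handler posts timed messages, the check runs on AMS's HandlerThread, and the result is again handed to AppErrors.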
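One difference from the Service case is that BroadcastQueue keeps a single pending timeout message and re-evaluates the deadline when it fires. The sketch below models that with plain fields. BroadcastTimeoutModel and its members are hypothetical names, the 10-second period is only an illustrative value, and resetting the pending flag when the message fires is an assumption of this model (the corresponding part of broadcastTimeoutLocked() is elided in the listing above).

```java
class BroadcastTimeoutModel {
    final long timeoutPeriod;
    boolean pendingTimeoutMessage = false;
    long scheduledAt = -1; // when the single pending timeout message will fire
    long receiverTime;     // when the current receiver started

    BroadcastTimeoutModel(long timeoutPeriod) {
        this.timeoutPeriod = timeoutPeriod;
    }

    /** A receiver starts running; arm the timeout unless one is already pending. */
    void startReceiver(long now) {
        receiverTime = now;
        setBroadcastTimeout(receiverTime + timeoutPeriod);
    }

    void setBroadcastTimeout(long timeoutTime) {
        if (!pendingTimeoutMessage) {
            scheduledAt = timeoutTime;
            pendingTimeoutMessage = true;
        }
    }

    /** The timeout message fires; returns true only if the current receiver really timed out. */
    boolean onTimeoutMessage(long now) {
        pendingTimeoutMessage = false; // assumption of this model, see the lead-in
        long timeoutTime = receiverTime + timeoutPeriod;
        if (timeoutTime > now) {
            // The current receiver started later than the one that armed the message.
            setBroadcastTimeout(timeoutTime);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BroadcastTimeoutModel q = new BroadcastTimeoutModel(10_000);
        q.startReceiver(0);     // receiver A: message armed for t = 10 s
        q.startReceiver(4_000); // A finished, receiver B starts: message stays at 10 s
        System.out.println(q.onTimeoutMessage(10_000)); // prints false (B has until 14 s)
        System.out.println(q.onTimeoutMessage(14_000)); // prints true
    }
}
```

The first firing is a false alarm for receiver B, so the message is simply re-armed for B's own deadline; only the second firing counts as a timeout.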
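Now that we have a rough idea of how these kinds of ANR arise, how do we track one down?
Capture the current CPU usage:
adb shell dumpsys cpuinfo
Capture the current memory usage:
adb shell dumpsys meminfo
Capture the current state of activities and services:
adb shell dumpsys activity
adb shell service list
dumpsys gives a rough picture of the overall environment.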
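Search logcat for the keywords am_anr and anr; with luck you will find the process that ANR'd.
(Several times I have seen the ANR dialog on screen with no ANR entry in logcat.)
/data/anr/traces.txt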
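Generally, the process that ANR'd appears at the very top of the traces log, with information on all of its threads, including their stack traces.
ANRs usually happen because our program is not robust enough. Bugly can help locate the problem fairly quickly, but it is no silver bullet.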
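When traces.txt is long, pulling out just the main thread's block saves some scrolling. Here is a small sketch of that. The sample text is hand-written for illustration and not taken from a real device, and the parsing assumes that a thread block starts with a line beginning with "main" in quotes and ends at the next blank line.

```java
import java.util.ArrayList;
import java.util.List;

class TracesMainThread {
    /** Returns the lines of the first "main" thread block found in a traces dump. */
    static List<String> mainThreadBlock(String traces) {
        List<String> block = new ArrayList<>();
        boolean inMain = false;
        for (String line : traces.split("\n")) {
            if (line.startsWith("\"main\"")) {
                inMain = true;
            } else if (inMain && line.trim().isEmpty()) {
                break; // assumed separator: a blank line ends the thread block
            }
            if (inMain) {
                block.add(line);
            }
        }
        return block;
    }

    /** Hand-written, hypothetical sample in the general shape of a traces file. */
    static String sample() {
        return String.join("\n",
                "----- pid 1234 at 2018-01-01 12:00:00 -----",
                "Cmd line: com.example.app",
                "",
                "\"main\" prio=5 tid=1 Sleeping",
                "  at java.lang.Thread.sleep(Native method)",
                "  at com.example.app.MainActivity.onCreate(MainActivity.java:42)",
                "",
                "\"Binder:1234_1\" prio=5 tid=8 Native",
                "  at com.example.app.Worker.run(Worker.java:10)");
    }

    public static void main(String[] args) {
        // Prints the "main" header line followed by its two stack frames.
        mainThreadBlock(sample()).forEach(System.out::println);
    }
}
```

The top frames of the main thread are usually the first thing worth reading: they show what the UI thread was doing when the dump was taken.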
Many thanks to the author of this article, which gave me an overall understanding of ANR:
https://duanqz.github.io/2015-10-12-ANR-Analysis#service