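ANR problems usually originate in app code. The InputDispatcher thread in the system_server process continuously monitors how long the app takes to respond to input. An ANR occurs when a key or touch event gets no response within 5 seconds, when a BroadcastReceiver does not finish within 10 seconds, or when a Service times out. ActivityManagerService prints the direct cause of the ANR to the aplog, and signal 3 (SIGQUIT) is sent to the affected process so that the stack trace of each of its threads is dumped to /data/anr/traces.txt. ANR analysis therefore relies mainly on the aplog and traces.txt. The individual types are described below: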
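When an application's window is active and able to receive input events (key events, touch events, and so on), events reported by the lower layers of the system are dispatched to that application by InputDispatcher. For most windows, "active" means "able to take focus and currently holding focus". Windows with the FLAG_NOT_FOCUSABLE attribute are the exception: such a window never takes focus, so the user cannot send key or button events to it, and those events go to the focusable window beneath it.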
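The application's main thread reads input events through an InputChannel and hands them to the view hierarchy. The view hierarchy is a tree rooted at DecorView, and an event travels down from the root, level by level, to the focused widget (a Button, for example). Developers usually receive and handle events by registering listeners or by writing custom views.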
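InputDispatcher runs on a child thread of the system_server process. Whenever a new input event arrives, InputDispatcher checks whether the previous input event already sent to the application has been fully handled. If handling has timed out, a chain of callbacks ends in WMS's notifyANR function, which reports the ANR.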
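Note that this kind of ANR requires an input event. Without one, no ANR is reported even if the main thread is blocked. From a design point of view, the system assumes the user is not paying attention to the phone and hopes the blockage will clear on its own after a while, so it temporarily "keeps quiet". From an implementation point of view, InputDispatcher has not dispatched any event to the application, so it has nothing to check for a timeout and no ANR to report.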
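The "no input, no ANR" behavior can be made concrete with a toy model. The sketch below is not the real InputDispatcher: the class and method names are hypothetical, and it assumes the 5-second clock starts when a later event begins waiting behind an unfinished one.

```java
/** Toy model of input-dispatch ANR detection; hypothetical, not framework code. */
final class InputTimeoutModel {
    static final long TIMEOUT_MS = 5_000; // default timeout quoted in the text

    private boolean appBusy;              // an event was delivered and is not finished yet
    private long waitingSinceMs = -1;     // when a later event started waiting, -1 if none

    /** A new input event arrives: deliver it, or make it wait behind the unfinished one. */
    void onNewEvent(long nowMs) {
        if (!appBusy) {
            appBusy = true;               // delivered immediately, nothing to time
            return;
        }
        if (waitingSinceMs < 0) {
            waitingSinceMs = nowMs;       // assumption: the timeout clock starts here
        }
    }

    /** The app reports that it finished the event it was handling. */
    void onAppFinished() {
        appBusy = waitingSinceMs >= 0;    // a waiting event, if any, is delivered now
        waitingSinceMs = -1;
    }

    /** True once an event has been waiting for the full timeout. */
    boolean isAnr(long nowMs) {
        return waitingSinceMs >= 0 && nowMs - waitingSinceMs >= TIMEOUT_MS;
    }
}
```

In this model an app can stay blocked for a full minute after the first event without `isAnr` ever returning true. The timeout only becomes possible once a second event arrives and has to wait.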
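When a user input event handling timeout occurs, the message is: Reason: Input dispatching timed out (Waiting because the focused window has not finished processing the input events that were previously delivered to it.) Take care to distinguish it from the window focus timeout, which also falls under the Input dispatching timed out category. The two differ in the text inside the parentheses.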
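The timeout for this type of ANR is defined in ActivityManagerService.java and defaults to 5 seconds. If necessary, the code can be modified to use a timeout longer than 5 seconds on low-memory devices, or to set a suitable value for a limited period of time.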
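The window focus timeout is a subtype of the user input event handling timeout. Both are reported to AMS by InputDispatcher. When an application's window is "active" and able to receive input events, events reported by the lower layers of the system are dispatched to that application by InputDispatcher. If for some reason the window takes too long to become "active" and cannot receive input events, InputDispatcher reports a "window focus timeout".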
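When this type of ANR occurs, the message is: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.) Take care to distinguish it from the user input event handling timeout, which also falls under the Input dispatching timed out category. The two differ in the text inside the parentheses.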
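To understand why a window can time out while acquiring focus, we need a brief look at how the focused application and the focused window change during a window switch. Suppose application A is in the foreground and application B is about to start. During startup, the focused application and focused window change as follows: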
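Start: the focused application is A, the focused window is A (one of A's windows)
=> after A enters its onPause flow: the focused application is A, the focused window is null
=> after zygote has finished creating B's process: the focused application is B, the focused window is null
=> after B's onResume flow completes: the focused application is B, the focused window is B (one of B's windows)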
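The focused window is null in two stages of this process. If the null stage lasts longer than 5 seconds, the application is reported with a window focus timeout ANR. Because there are two such stages, the application that the system reports is not necessarily the one that really caused the ANR. When analyzing a window focus timeout, always check whether the current focused application and focused window match, and first establish which application is truly responsible. Further analysis is only meaningful after that.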
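The point that the reported application may not be the real culprit can be sketched with a small timeline model. Everything here is hypothetical (`FocusTimeline`, `State`, `blamedApp`), and it assumes, as the description above does, that one 5-second clock runs across the whole period in which the focused window is null and that an input event is waiting throughout. The real dispatcher's timer bookkeeping is more involved and may reset when the focused application changes.

```java
import java.util.List;

/** Toy timeline of focus states; illustrative only, not framework code. */
final class FocusTimeline {
    static final long TIMEOUT_MS = 5_000; // timeout quoted in the text

    /** A focus state that holds from atMs until the next state begins. */
    static final class State {
        final long atMs;
        final String app;    // focused application
        final String window; // focused window, or null if there is none

        State(long atMs, String app, String window) {
            this.atMs = atMs;
            this.app = app;
            this.window = window;
        }
    }

    /**
     * Returns the focused application at the moment the window-less period
     * exceeds the timeout, or null if that never happens before endMs.
     */
    static String blamedApp(List<State> states, long endMs) {
        long nullSince = -1; // start of the current window-less period
        for (int i = 0; i < states.size(); i++) {
            State s = states.get(i);
            long until = (i + 1 < states.size()) ? states.get(i + 1).atMs : endMs;
            if (s.window != null) {
                nullSince = -1; // a window has focus, so the clock is cleared
                continue;
            }
            if (nullSince < 0) {
                nullSince = s.atMs;
            }
            if (nullSince + TIMEOUT_MS < until) {
                return s.app; // the deadline falls inside this state
            }
        }
        return null;
    }
}
```

If A spends 4.5 seconds in its pause stage and B then needs 1.5 seconds to show its window, the deadline falls in B's stage and this model blames B, even though A used up most of the time.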
So why would the focused window stay null for more than 5 seconds? It is generally caused by the following:
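When the application's main thread is executing a BroadcastReceiver's onReceive method and does not finish in time, a broadcast timeout ANR is reported. The timeout is 10 seconds for a foreground process and 60 seconds for a background process. Time-consuming work should be handed to the application's Service by sending it an Intent, rather than occupying the main thread in onReceive for a long time. Unlike the two previous types, the system shows no dialog for this kind of ANR and only writes the exception information to the slog.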
When this type of ANR occurs, the message looks like: Reason: Broadcast of Intent { act=android.intent.action.NEW_OUTGOING_CALL flg=0x10000010 cmp=com.qualcomm.location/.GpsNetInitiatedHandler$OutgoingCallReceiver (has extras) }
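On low-memory Android devices, the LowMemoryKiller in the kernel frequently kills background applications to free memory. If an application happens to be killed by LMK just as it starts executing onReceive, it is certain to be reported for an ANR 60 seconds later, when BroadcastQueue checks how the broadcast was handled. The telltale sign of this scenario is that System.log shows a PID of 0 for the ANR application when the ANR is reported.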
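To avoid this problem and improve the time to first failure in Monkey tests, code can be added to BroadcastQueue that checks the PID of a broadcast timeout ANR and does not report the ANR when the PID is 0.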
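The PID check described above might look roughly like the following. This is a standalone sketch, not the actual BroadcastQueue code: `BroadcastAnrPolicy` and its methods are hypothetical names, and the two timeouts are the values quoted in the text.

```java
/** Standalone sketch of a broadcast-timeout decision; hypothetical, not BroadcastQueue. */
final class BroadcastAnrPolicy {
    static final long FOREGROUND_TIMEOUT_MS = 10_000; // foreground value from the text
    static final long BACKGROUND_TIMEOUT_MS = 60_000; // background value from the text

    /** Timeout that applies to the receiver, depending on foreground or background. */
    static long timeoutMs(boolean foreground) {
        return foreground ? FOREGROUND_TIMEOUT_MS : BACKGROUND_TIMEOUT_MS;
    }

    /**
     * Decides whether a broadcast timeout should be reported as an ANR.
     * A pid of 0 is taken to mean the receiver process no longer exists
     * (for example, it was killed by LMK), so the ANR is suppressed.
     */
    static boolean shouldReportAnr(int pid, boolean foreground, long elapsedMs) {
        if (elapsedMs < timeoutMs(foreground)) {
            return false; // the receiver still has time left
        }
        return pid != 0;
    }
}
```

Keeping the decision in one small pure function makes the PID-0 filter easy to unit test without a running system_server.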
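A Service's lifecycle methods, such as onCreate, onStartCommand, and onDestroy, also run on the main thread. If one of them does not return within 20 seconds, an ANR is triggered. Here too the system shows no dialog and only writes to the log.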
The message for this type of ANR looks like: Reason: Executing service com.ysxj.RenHeDao/.Service.PollingService
This ANR occurs when the main thread does not finish a ContentProvider operation within the allowed time. The log shows: Reason: ContentProvider not responding. No ANR dialog is shown.
This type of ANR arises at application startup, when AMS.attachApplicationLocked() is called and all ContentProviders of the starting process are published.
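Since each ANR type has a recognizable Reason line, a first triage of the aplog can be automated. The sketch below is one possible approach: `AnrReasonClassifier` is a hypothetical helper, and it matches only on the message fragments quoted in this article.

```java
/** Hypothetical helper that maps an ANR "Reason:" line to the types described above. */
final class AnrReasonClassifier {
    enum Type {
        INPUT_EVENT_TIMEOUT, WINDOW_FOCUS_TIMEOUT, OTHER_INPUT_TIMEOUT,
        BROADCAST_TIMEOUT, SERVICE_TIMEOUT, PROVIDER_TIMEOUT, UNKNOWN
    }

    private static final String PREFIX = "Reason: ";

    static Type classify(String line) {
        // Accept either a full log line or just the reason text.
        int i = line.indexOf(PREFIX);
        String reason = (i >= 0 ? line.substring(i + PREFIX.length()) : line).trim();

        if (reason.startsWith("Input dispatching timed out")) {
            // The two input subtypes differ only in the text inside the parentheses.
            if (reason.contains("no window has focus")) {
                return Type.WINDOW_FOCUS_TIMEOUT;
            }
            if (reason.contains("has not finished processing")) {
                return Type.INPUT_EVENT_TIMEOUT;
            }
            return Type.OTHER_INPUT_TIMEOUT;
        }
        if (reason.startsWith("Broadcast of Intent")) {
            return Type.BROADCAST_TIMEOUT;
        }
        if (reason.startsWith("Executing service")) {
            return Type.SERVICE_TIMEOUT;
        }
        if (reason.startsWith("ContentProvider not responding")) {
            return Type.PROVIDER_TIMEOUT;
        }
        return Type.UNKNOWN;
    }
}
```

For example, passing the Service message quoted earlier yields `SERVICE_TIMEOUT`, while an Input dispatching timed out line with an unrecognized parenthetical falls back to `OTHER_INPUT_TIMEOUT` instead of being misfiled.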
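In Android 5.1, the ANR messages above originate from the following code (ActivityManagerService.java for the Java methods, and the native InputDispatcher for the C++ functions):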
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
            != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission "
                + android.Manifest.permission.FILTER_EVENTS);
    }

    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }
    ......
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;

    // If there is no currently focused window and no focused application
    // then drop the event.
    if (mFocusedWindowHandle == NULL) {
        if (mFocusedApplicationHandle != NULL) {
            injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                    mFocusedApplicationHandle, NULL, nextWakeupTime,
                    "Waiting because no window has focus but there is a "
                    "focused application that may eventually add a window "
                    "when it finishes starting up.");
            goto Unresponsive;
        }

        ALOGI("Dropping event because there is no focused window or focused application.");
        injectionResult = INPUT_EVENT_INJECTION_FAILED;
        goto Failed;
    }
    ......
String8 InputDispatcher::checkWindowReadyForMoreInputLocked(nsecs_t currentTime,
        const sp<InputWindowHandle>& windowHandle, const EventEntry* eventEntry,
        const char* targetType) {
    // If the window is paused then keep waiting.
    if (windowHandle->getInfo()->paused) {
        return String8::format("Waiting because the %s window is paused.", targetType);
    }

    // If the window's connection is not registered then keep waiting.
    ssize_t connectionIndex = getConnectionIndexLocked(windowHandle->getInputChannel());
    if (connectionIndex < 0) {
        return String8::format("Waiting because the %s window's input channel is not "
                "registered with the input dispatcher.  The window may be in the process "
                "of being removed.", targetType);
    }

    // If the connection is dead then keep waiting.
    sp<Connection> connection = mConnectionsByFd.valueAt(connectionIndex);
    if (connection->status != Connection::STATUS_NORMAL) {
        return String8::format("Waiting because the %s window's input connection is %s."
                "The window may be in the process of being removed.", targetType,
                connection->getStatusLabel());
    }

    // If the connection is backed up then keep waiting.
    if (connection->inputPublisherBlocked) {
        return String8::format("Waiting because the %s window's input channel is full.  "
                "Outbound queue length: %d.  Wait queue length: %d.",
                targetType, connection->outboundQueue.count(), connection->waitQueue.count());
    }

    // Ensure that the dispatch queues aren't too far backed up for this event.
    if (eventEntry->type == EventEntry::TYPE_KEY) {
        // If the event is a key event, then we must wait for all previous events to
        // complete before delivering it because previous events may have the
        // side-effect of transferring focus to a different window and we want to
        // ensure that the following keys are sent to the new window.
        //
        // Suppose the user touches a button in a window then immediately presses "A".
        // If the button causes a pop-up window to appear then we want to ensure that
        // the "A" key is delivered to the new pop-up window.  This is because users
        // often anticipate pending UI changes when typing on a keyboard.
        // To obtain this behavior, we must serialize key events with respect to all
        // prior input events.
        if (!connection->outboundQueue.isEmpty() || !connection->waitQueue.isEmpty()) {
            return String8::format("Waiting to send key event because the %s window has not "
                    "finished processing all of the input events that were previously "
                    "delivered to it.  Outbound queue length: %d.  Wait queue length: %d.",
                    targetType, connection->outboundQueue.count(), connection->waitQueue.count());
        }
    } else {
        // Touch events can always be sent to a window immediately because the user intended
        // to touch whatever was visible at the time.  Even if focus changes or a new
        // window appears moments later, the touch event was meant to be delivered to
        // whatever window happened to be on screen at the time.
        //
        // Generic motion events, such as trackball or joystick events are a little trickier.
        // Like key events, generic motion events are delivered to the focused window.
        // Unlike key events, generic motion events don't tend to transfer focus to other
        // windows and it is not important for them to be serialized.  So we prefer to deliver
        // generic motion events as soon as possible to improve efficiency and reduce lag
        // through batching.
        //
        // The one case where we pause input event delivery is when the wait queue is piling
        // up with lots of events because the application is not responding.
        // This condition ensures that ANRs are detected reliably.
        if (!connection->waitQueue.isEmpty()
                && currentTime >= connection->waitQueue.head->deliveryTime
                        + STREAM_AHEAD_EVENT_TIMEOUT) {
            return String8::format("Waiting to send non-key event because the %s window has not "
                    "finished processing certain input events that were delivered to it over "
                    "%0.1fms ago.  Wait queue length: %d.  Wait queue head age: %0.1fms.",
                    targetType, STREAM_AHEAD_EVENT_TIMEOUT * 0.000001f,
                    connection->waitQueue.count(),
                    (currentTime - connection->waitQueue.head->deliveryTime) * 0.000001f);
        }
    }
    return String8::empty();
}

public void appNotRespondingViaProvider(IBinder connection) {
    enforceCallingPermission(
            android.Manifest.permission.REMOVE_TASKS, "appNotRespondingViaProvider()");

    final ContentProviderConnection conn = (ContentProviderConnection) connection;
    if (conn == null) {
        Slog.w(TAG, "ContentProviderConnection is null");
        return;
    }

    final ProcessRecord host = conn.provider.proc;
    if (host == null) {
        Slog.w(TAG, "Failed to find hosting ProcessRecord");
        return;
    }

    final long token = Binder.clearCallingIdentity();
    try {
        appNotResponding(host, null, null, false, "ContentProvider not responding");
    } finally {
        Binder.restoreCallingIdentity(token);
    }
}
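Overall, the ANR manifestations above can arise in the following situations:
First, ANRs are mainly caused by poor application design, introduced chiefly through the following:
In addition, other processes may use so much CPU that the current application's process cannot obtain CPU time slices. For example, frequent file reads and writes can drive up the CPU usage of I/O processes and cause an ANR in the current application.
Specifically, the main cases are:
Overall, the cases above stem from the system itself being unable to guarantee the time an application needs to run normally.
That largely concludes the introduction to the causes and types of ANR. Next, we look at how to analyze ANR problems.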
Android ANR problem analysis, part 1