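Android ANR: Types and Causes

Conditions under which an Android ANR occurs

In Android, an ANR (Application Not Responding) fundamentally means that the main thread cannot respond. The main reasons the main thread becomes unresponsive are roughly as follows: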

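  • The main thread performs network requests, database access, or other I/O. These are slow operations that block the main thread, and an ANR is raised if the wait exceeds the timeout;
  • The CPU is starved, so the main thread gets no chance to run;
  • Other processes or threads occupy the CPU and do not release it, so the main thread cannot run;
  • Deadlock: the lock the main thread is waiting for is held by another thread and never released.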
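The common thread in the causes above is that the main thread stops draining its work queue. A minimal plain-Java sketch of how that can be observed, not Android code: a single-threaded executor stands in for the main thread, and a watchdog posts a no-op task and waits for it to run. The names MainThreadWatchdog and isResponsive are hypothetical, and the timeouts are shortened for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Plain-Java stand-in for an ANR check: is the "main" thread still draining its queue? */
final class MainThreadWatchdog {
    private MainThreadWatchdog() {}

    /**
     * Posts a no-op "ping" to the executor (standing in for the main thread's
     * message queue) and waits up to timeoutMs for it to run.
     */
    static boolean isResponsive(ExecutorService mainThread, long timeoutMs) {
        CountDownLatch pong = new CountDownLatch(1);
        mainThread.execute(pong::countDown);
        try {
            return pong.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Sleeps without propagating the checked exception; simulates blocking I/O. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            System.out.println("idle responsive: " + isResponsive(mainThread, 500));
            mainThread.execute(() -> sleepQuietly(2_000)); // blocking call on the "main" thread
            System.out.println("blocked responsive: " + isResponsive(mainThread, 500));
        } finally {
            mainThread.shutdownNow();
        }
    }
}
```

The ping only runs once everything queued before it has finished, so a blocked queue shows up as a missed deadline, which is the same shape as the timeouts described in this article.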

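ANR problems usually originate in app code. The InputDispatcher thread in the system_server process keeps track of how long the app takes to respond. An ANR is raised if a key or touch event goes unanswered for 5 seconds, if a BroadcastReceiver does not finish within 10 seconds, or if a Service times out. ActivityManagerService prints the direct cause of the ANR to the application log (aplog) and also asks the kernel to deliver signal 3 (SIGQUIT) to the affected process, which makes the process dump the stack of each of its threads to /data/anr/traces.txt. ANR analysis therefore relies mainly on aplog and traces.txt. The individual types break down as follows: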

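1. Input event dispatching timeout

When an application's window is active and able to receive input events (key events, touch events, and so on), events reported by the lower layers of the system are dispatched to that application by InputDispatcher. For most windows, "active" can be read as "focusable and currently focused". The exception is a window with the FLAG_NOT_FOCUSABLE flag: such a window never takes focus, so the user cannot send key or button events to it, and those events go to the focusable window beneath it instead.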

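The application's main thread reads input events through an InputChannel and hands them to the view hierarchy. The hierarchy is a tree with DecorView at its root, and an event travels from the root down, level by level, toward the focused widget (a Button, for example). Developers typically register listeners to receive and handle events, or create custom views that handle them.

InputDispatcher runs on a thread inside the system_server process. Whenever a new input event arrives, InputDispatcher checks whether the previous input event already sent to the application has been fully handled. If that event has timed out, a chain of callbacks ends in WMS's notifyANR function, which reports the ANR.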

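Note that this kind of ANR requires an input event. Without one, no ANR is reported even if the main thread is blocked. From a design point of view, the system assumes the user is not paying attention to the phone and hopes the blockage will clear by itself after a while, so it temporarily keeps quiet. From an implementation point of view, InputDispatcher has not dispatched any event to the application, so it has nothing to check for a timeout and nothing to report.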
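The rule described above (no new input, no check) can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the real InputDispatcher: the class InputTimeoutModel and its methods are hypothetical, time is passed in explicitly, and the timeout is measured from the delivery of the unfinished event, which is cruder than the actual bookkeeping.

```java
/** Toy model of "check the previous event when a new one arrives". */
final class InputTimeoutModel {
    static final long TIMEOUT_MS = 5_000; // default input timeout quoted in the text

    // Delivery time of the event the app has not finished yet, or -1 if none.
    private long pendingSinceMs = -1;

    /**
     * Called when a new input event arrives at time nowMs.
     * Returns true if an ANR would be reported at this point.
     */
    boolean onNewEvent(long nowMs) {
        if (pendingSinceMs >= 0) {
            // An earlier event is still outstanding: has the app exceeded the timeout?
            return nowMs - pendingSinceMs >= TIMEOUT_MS;
        }
        pendingSinceMs = nowMs; // deliver this event and start waiting for the app
        return false;
    }

    /** Called when the app reports that it finished handling the pending event. */
    void onEventFinished() {
        pendingSinceMs = -1;
    }

    public static void main(String[] args) {
        InputTimeoutModel model = new InputTimeoutModel();
        System.out.println(model.onNewEvent(0));     // first event is delivered
        // The app now blocks. With no further input, onNewEvent is never called,
        // so nothing is checked and nothing is reported.
        System.out.println(model.onNewEvent(6_000)); // a later event triggers the check
    }
}
```

The model makes the point of the paragraph explicit: the only place a timeout can be noticed is inside onNewEvent, so a blocked app that receives no further input is never flagged.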

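The message for this kind of ANR is: Reason: Input dispatching timed out (Waiting because the focused window has not finished processing the input events that were previously delivered to it.) Take care to distinguish it from the window focus timeout, which also falls under Input dispatching timed out: the text in parentheses differs between the two.

The timeout for this kind of ANR is defined in ActivityManagerService.java and defaults to 5 seconds. If necessary, the code can be modified to raise the timeout above 5 seconds on low-memory devices, or to set the value to something reasonable for a limited period.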

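2. Window focus timeout (get focus timeout)

A window focus timeout is a subtype of the input event timeout, and both are reported to AMS by InputDispatcher. When an application's window is "active" and able to receive input events, events reported by the lower layers are dispatched to it by InputDispatcher. If for some reason the window takes too long to become "active" and cannot receive input events, InputDispatcher reports a window focus timeout.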

The message for this kind of ANR is: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.) Take care to distinguish it from the input event timeout, which also falls under Input dispatching timed out: the text in parentheses differs between the two.

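To understand why a window can time out while acquiring focus, we need a brief look at how the focused application and the focused window change during a window switch. Suppose application A is in the foreground and application B is about to start. During startup, the focused application and focused window change as follows:

Start: focused application is A, focused window is A (one of its windows)
  ==> A enters its onPause flow: focused application is A, focused window is null
  ==> zygote has finished creating B's process: focused application is B, focused window is null
  ==> B's onResume flow completes: focused application is B, focused window is B (one of its windows)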

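There are two phases in this sequence where the focused window is null. If the focused window stays null for more than 5 seconds, a window focus timeout ANR is reported. Because the focused application changes from A to B between those two phases, the application named in the report is not necessarily the one that really caused the ANR. When analyzing a window focus timeout, always check whether the current focused application and focused window match, and establish which application is really at fault before going any further.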
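The four phases can be written down as data to see which application ends up named in the report. This is a toy model, not framework code: FocusTimeline, Phase, and blamedApp are hypothetical names, and treating each phase independently (rather than accumulating null-window time across phases) is an assumption made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the A -> B focus hand-over described above. */
final class FocusTimeline {
    static final long FOCUS_TIMEOUT_MS = 5_000; // the 5-second limit quoted in the text

    /** One phase of the switch: who holds focus and how long the phase lasts. */
    static final class Phase {
        final String focusedApp;
        final String focusedWindow; // null while no window has focus
        final long durationMs;

        Phase(String focusedApp, String focusedWindow, long durationMs) {
            this.focusedApp = focusedApp;
            this.focusedWindow = focusedWindow;
            this.durationMs = durationMs;
        }
    }

    /**
     * Returns the focused application of the first phase that spends more than
     * FOCUS_TIMEOUT_MS without a focused window, or null if no phase does.
     */
    static String blamedApp(List<Phase> phases) {
        for (Phase p : phases) {
            if (p.focusedWindow == null && p.durationMs > FOCUS_TIMEOUT_MS) {
                return p.focusedApp;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Phase> slowPause = Arrays.asList(
                new Phase("A", "A/window", 1_000),
                new Phase("A", null, 6_000),      // A's onPause drags on
                new Phase("B", null, 300),
                new Phase("B", "B/window", 1_000));
        System.out.println("blamed: " + blamedApp(slowPause));
    }
}
```

In this run the user was launching B, yet the phase that exceeded the limit still had A as the focused application, which is why the focused application and focused window need to be compared before digging into either app.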

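So why would the focused window stay null for more than 5 seconds? Usually for one of the following reasons:

  • Slow application startup. The app's onCreate/onStart/onResume methods run slowly, deadlock, or loop forever, so onResume does not finish in time and the ANR fires.
  • Slow onPause. Within a single application, the next onResume does not run until the previous onPause has finished. Different applications do not affect each other in this way.
  • Slow system overall. System-level performance problems such as high CPU usage, a long average run queue, memory fragmentation, a high page-fault rate, slow GC, a frozen user space, or processes stuck in uninterruptible sleep slow everything down and make ANRs frequent.
  • WMS malfunction. Because of a bug in stock Android 4.4, focus sometimes still has not switched 8 seconds after the application's onResume has finished, which causes an ANR.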

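3. Broadcast timeout

When the application's main thread does not finish a BroadcastReceiver's onReceive method in time, a broadcast timeout ANR is reported. The timeout is 10 seconds for foreground broadcasts and 60 seconds for background broadcasts. Long-running work should be handed to the application's Service by sending it an Intent, rather than keeping the main thread busy inside onReceive. Unlike the previous two types, the system shows no dialog for this ANR and only writes the error to the system log (slog).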
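The hand-off pattern can be illustrated without the Android SDK. In this plain-Java analogy a background executor plays the role that a Service would play on a device, and the onReceive method here is a hypothetical stand-in that only shares its name with the real callback. The 500 ms of "work" is an arbitrary placeholder.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Analogy for "return from onReceive quickly, do the slow part elsewhere". */
final class ReceiverHandOff {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Schedules the slow work on the worker thread and returns immediately. */
    CompletableFuture<String> onReceive(String action) {
        return CompletableFuture.supplyAsync(() -> slowWork(action), worker);
    }

    /** Placeholder for slow I/O; runs on the worker thread, never on the caller. */
    private static String slowWork(String action) {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "handled " + action;
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        ReceiverHandOff receiver = new ReceiverHandOff();
        CompletableFuture<String> result = receiver.onReceive("EXAMPLE_ACTION");
        System.out.println("onReceive returned, work done yet? " + result.isDone());
        System.out.println(result.join()); // wait for the background work to finish
        receiver.shutdown();
    }
}
```

The caller's thread is free again as soon as onReceive returns, so the time budget that matters for the ANR check covers only the scheduling, not the work itself.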

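The message for this kind of ANR looks like: Reason: Broadcast of Intent  { act=android.intent.action.NEW_OUTGOING_CALL  flg=0x10000010 cmp=com.qualcomm.location/.GpsNetInitiatedHandler$OutgoingCallReceiver (has extras) }

On low-memory Android devices, the kernel's LowMemoryKiller (LMK) frequently kills background applications to free memory. If an application is killed by LMK just as it starts executing onReceive, then when BroadcastQueue checks on the broadcast 60 seconds later, that application is certain to be reported as an ANR. The telltale sign of this scenario is that the system log shows a PID of 0 for the ANR application.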

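To avoid this problem and lengthen the time to first failure in Monkey testing, code can be added to BroadcastQueue that checks the PID of a broadcast timeout ANR and skips the report when it is 0.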
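The guard itself is a one-line predicate. The sketch below only shows its shape: BroadcastAnrFilter and shouldReportAnr are hypothetical names, and where exactly such a check would be wired into BroadcastQueue is not specified in the text.

```java
/** Hypothetical helper expressing the "skip the ANR when PID is 0" rule. */
final class BroadcastAnrFilter {
    private BroadcastAnrFilter() {}

    /**
     * A PID of 0 is taken to mean the receiver process was already killed
     * (for example by LMK), so there is no live process left to blame.
     */
    static boolean shouldReportAnr(int receiverPid) {
        return receiverPid != 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldReportAnr(0));
        System.out.println(shouldReportAnr(4321));
    }
}
```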

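4. Service timeout

A Service's lifecycle methods, such as onCreate, onStart, and onDestroy, also run on the main thread. If one of them does not return within 20 seconds, an ANR is triggered. As with broadcasts, the system shows no dialog for this ANR and only writes a log entry.

The message for this kind of ANR looks like: Reason: Executing service com.ysxj.RenHeDao/.Service.PollingService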

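5. ContentProvider timeout

The main thread did not finish a ContentProvider-related operation within the allowed time. The log reads, for example: Reason: ContentProvider not responding. No ANR dialog is shown.

This kind of ANR arises during application startup, when AMS.attachApplicationLocked() is called and all ContentProviders of the starting process are being published.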
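Since each of the five types has a distinctive Reason string, a log line can be sorted into a type by prefix matching. The sketch below uses only the strings quoted in this article; the AnrType enum and fromReason method are hypothetical, and real logs may contain variants not covered here.

```java
/** Classifies an ANR "Reason:" line using the message prefixes quoted in the article. */
enum AnrType {
    INPUT_DISPATCH, WINDOW_FOCUS, BROADCAST, SERVICE, CONTENT_PROVIDER, UNKNOWN;

    private static final String REASON_PREFIX = "Reason: ";

    static AnrType fromReason(String line) {
        String reason = line.trim();
        if (reason.startsWith(REASON_PREFIX)) {
            reason = reason.substring(REASON_PREFIX.length());
        }
        if (reason.startsWith("Input dispatching timed out")) {
            // Both input subtypes share this prefix; the text in parentheses tells them apart.
            return reason.contains("no window has focus") ? WINDOW_FOCUS : INPUT_DISPATCH;
        }
        if (reason.startsWith("Broadcast of Intent")) {
            return BROADCAST;
        }
        if (reason.startsWith("Executing service")) {
            return SERVICE;
        }
        if (reason.startsWith("ContentProvider not responding")) {
            return CONTENT_PROVIDER;
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(fromReason(
                "Reason: Executing service com.ysxj.RenHeDao/.Service.PollingService"));
    }
}
```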

In Android 5.1 the ANR messages above originate from the following code (ActivityManagerService.java for the Java methods, InputDispatcher.cpp for the native ones):

    public boolean inputDispatchingTimedOut(final ProcessRecord proc,
            final ActivityRecord activity, final ActivityRecord parent,
            final boolean aboveSystem, String reason) {
        if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
                != PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission "
                    + android.Manifest.permission.FILTER_EVENTS);
        }
        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }
        ......
   
    int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
            const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
        int32_t injectionResult;
        String8 reason;

        // If there is no currently focused window and no focused application
        // then drop the event.
        if (mFocusedWindowHandle == NULL) {
            if (mFocusedApplicationHandle != NULL) {
                injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                        mFocusedApplicationHandle, NULL, nextWakeupTime,
                        "Waiting because no window has focus but there is a "
                        "focused application that may eventually add a window "
                        "when it finishes starting up.");
                goto Unresponsive;
            }

            ALOGI("Dropping event because there is no focused window or focused application.");
            injectionResult = INPUT_EVENT_INJECTION_FAILED;
            goto Failed;
        }
        ......
   
    String8 InputDispatcher::checkWindowReadyForMoreInputLocked(nsecs_t currentTime,
            const sp<InputWindowHandle>& windowHandle, const EventEntry* eventEntry,
            const char* targetType) {
        // If the window is paused then keep waiting.
        if (windowHandle->getInfo()->paused) {
            return String8::format("Waiting because the %s window is paused.", targetType);
        }

        // If the window's connection is not registered then keep waiting.
        ssize_t connectionIndex = getConnectionIndexLocked(windowHandle->getInputChannel());
        if (connectionIndex < 0) {
            return String8::format("Waiting because the %s window's input channel is not "
                    "registered with the input dispatcher. The window may be in the process "
                    "of being removed.", targetType);
        }

        // If the connection is dead then keep waiting.
        sp<Connection> connection = mConnectionsByFd.valueAt(connectionIndex);
        if (connection->status != Connection::STATUS_NORMAL) {
            return String8::format("Waiting because the %s window's input connection is %s."
                    "The window may be in the process of being removed.", targetType,
                    connection->getStatusLabel());
        }

        // If the connection is backed up then keep waiting.
        if (connection->inputPublisherBlocked) {
            return String8::format("Waiting because the %s window's input channel is full. "
                    "Outbound queue length: %d. Wait queue length: %d.",
                    targetType, connection->outboundQueue.count(), connection->waitQueue.count());
        }

        // Ensure that the dispatch queues aren't too far backed up for this event.
        if (eventEntry->type == EventEntry::TYPE_KEY) {
            // If the event is a key event, then we must wait for all previous events to
            // complete before delivering it because previous events may have the
            // side-effect of transferring focus to a different window and we want to
            // ensure that the following keys are sent to the new window.
            //
            // Suppose the user touches a button in a window then immediately presses "A".
            // If the button causes a pop-up window to appear then we want to ensure that
            // the "A" key is delivered to the new pop-up window. This is because users
            // often anticipate pending UI changes when typing on a keyboard.
            // To obtain this behavior, we must serialize key events with respect to all
            // prior input events.
            if (!connection->outboundQueue.isEmpty() || !connection->waitQueue.isEmpty()) {
                return String8::format("Waiting to send key event because the %s window has not "
                        "finished processing all of the input events that were previously "
                        "delivered to it. Outbound queue length: %d. Wait queue length: %d.",
                        targetType, connection->outboundQueue.count(), connection->waitQueue.count());
            }
        } else {
            // Touch events can always be sent to a window immediately because the user intended
            // to touch whatever was visible at the time. Even if focus changes or a new
            // window appears moments later, the touch event was meant to be delivered to
            // whatever window happened to be on screen at the time.
            //
            // Generic motion events, such as trackball or joystick events are a little trickier.
            // Like key events, generic motion events are delivered to the focused window.
            // Unlike key events, generic motion events don't tend to transfer focus to other
            // windows and it is not important for them to be serialized. So we prefer to deliver
            // generic motion events as soon as possible to improve efficiency and reduce lag
            // through batching.
            //
            // The one case where we pause input event delivery is when the wait queue is piling
            // up with lots of events because the application is not responding.
            // This condition ensures that ANRs are detected reliably.
            if (!connection->waitQueue.isEmpty()
                    && currentTime >= connection->waitQueue.head->deliveryTime
                            + STREAM_AHEAD_EVENT_TIMEOUT) {
                return String8::format("Waiting to send non-key event because the %s window has not "
                        "finished processing certain input events that were delivered to it over "
                        "%0.1fms ago. Wait queue length: %d. Wait queue head age: %0.1fms.",
                        targetType, STREAM_AHEAD_EVENT_TIMEOUT * 0.000001f,
                        connection->waitQueue.count(),
                        (currentTime - connection->waitQueue.head->deliveryTime) * 0.000001f);
            }
        }
        return String8::empty();
    }
   
    public void appNotRespondingViaProvider(IBinder connection) {
        enforceCallingPermission(
                android.Manifest.permission.REMOVE_TASKS, "appNotRespondingViaProvider()");

        final ContentProviderConnection conn = (ContentProviderConnection) connection;
        if (conn == null) {
            Slog.w(TAG, "ContentProviderConnection is null");
            return;
        }

        final ProcessRecord host = conn.provider.proc;
        if (host == null) {
            Slog.w(TAG, "Failed to find hosting ProcessRecord");
            return;
        }

        final long token = Binder.clearCallingIdentity();
        try {
            appNotResponding(host, null, null, false, "ContentProvider not responding");
        } finally {
            Binder.restoreCallingIdentity(token);
        }
    }

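Behind these visible forms, ANRs generally come down to the following situations.

First, ANRs are mainly caused by poor application design, typically introduced in the following ways:

  • Calling a thread's join(), sleep(), or wait() method, another thread holding a lock, or another thread terminating or crashing, so that the main thread waits until it times out;
  • The number of service binders reaching its limit, a WatchDog ANR occurring in system server, or a busy service failing to respond in time;
  • Doing very time-consuming work on the main thread: slow network access, large reads and writes, database operations, hardware operations (the camera, for example), or expensive computation such as bitmap manipulation.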
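The lock-related cause in the list above is easy to reproduce on a desktop JVM. The sketch below deliberately deadlocks two daemon threads and then asks the JVM to find them through ThreadMXBean. Note the assumptions: java.lang.management is a desktop/server JVM API and is not part of the Android SDK, so this illustrates the failure mode rather than an on-device tool, and the class and thread names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Creates a guaranteed two-thread deadlock and detects it via the JVM. */
final class DeadlockDemo {
    private DeadlockDemo() {}

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startThread("demo-main", lockA, lockB, bothHoldFirstLock);
        startThread("demo-worker", lockB, lockA, bothHoldFirstLock);
    }

    private static void startThread(String name, Object first, Object second,
            CountDownLatch bothHoldFirstLock) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock); // wait until the other thread holds its lock
                synchronized (second) {
                    // Never reached: the other thread owns this monitor and waits for ours.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though these threads never finish
        t.start();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Polls for monitor deadlocks; returns how many threads are involved, or 0 on timeout. */
    static int countDeadlockedThreads(long timeoutMs) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println("deadlocked threads: " + countDeadlockedThreads(5_000));
    }
}
```

If one of the two threads were an application's main thread, it would sit in this state indefinitely, and the next input event or broadcast would run into the timeouts described earlier.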

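In addition, another process may use so much CPU that the current application process cannot get a time slice. For example, frequent file reads and writes can drive the CPU usage of I/O processes so high that the current application hits an ANR.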

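More specifically, the main cases are:

  • A faulty driver for a peripheral the application uses makes things unstable, and the problem eventually surfaces as an ANR at the application layer.
  • The kernel freezes user space, so no program can run.
  • Low I/O throughput makes the application wait on I/O for a long time.
  • A real-time process in the HAL layer holds the CPU for a long time, making the scheduling queue too long.
  • A bug in stock AMS prevents system focus from switching correctly.

Overall, these cases are system-side: the system fails to guarantee the application the run time it needs.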

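Points to watch in order to avoid ANRs

  • Avoid complex, time-consuming work on the main thread, especially file reads and database operations;
  • Avoid updating the UI too frequently in real time;
  • When a BroadcastReceiver needs to do complex work, start a Service from onReceive() to handle it;
  • Avoid starting an Activity from a BroadcastReceiver (called IntentReceiver in early Android documentation), because it creates a new screen and steals focus from whatever the user is currently running. If your application needs to show the user something in response to a broadcast Intent, use the Notification Manager instead.
  • During design and coding, avoid synchronization problems, deadlocks, and improper error handling.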

That covers the basic causes and types of ANR. Next we look at how to analyze an ANR problem.

Android ANR analysis, part 1
