记一次分析解决ANR过程

ANR即Application Not Responding,一直觉得只要在主线程中注意不要执行耗时操作就不会发生ANR,然而事实告诉我我还是too young too simple,sometimes naive 啊。以前也没遇到过ANR,没分析过ANR,问过同事,同事也说没弄过,没办法只能自己找资料了。

ANR问题的分类

InputDispatchingTimedOut
应用程序主线程在5秒内没有完成用户的input事件(比如按键事件、屏幕触摸事件)

Service Timeout
应用程序没有执行完成service的bind/create/start/destroy/unbind操作 前台服务20秒超时,后台服务200秒超时

Broadcast Timeout
应用程序在规定时间内没有执行完成onReceive操作 前台广播10秒超时,后台广播60秒超时(BROADCAST_FG_TIMEOUT /BROADCAST_FG_TIMEOUT)

Content Provider Timeout
应用程序在20秒内没有执行完成ContentProvider相关操作

经过我查看log信息发现
Reason: Input dispatching timed out (Waiting because the touched window has not finished processing the input events that were previously delivered to it.)
Load: 0.9 / 0.57 / 0.68
CPU usage from 2505ms to -3307ms ago:
94% 20357/com.richeninfo.cmoa: 94% user+ 0.3% kernel / faults: 1 minor
22% 810/system_server: 17% user + 5.1% kernel / faults: 1061 minor
0.5% 146/debuggerd: 0.2% user + 0.3% kernel / faults: 2717 minor
3.4% 977/com.android.systemui: 3.2% user + 0.1% kernel / faults: 11 minor
1.8% 1310/com.android.phone: 1.5% user + 0.3% kernel
1.3% 9586/adbd: 0% user + 1.3% kernel / faults: 629 minor
1% 1345/com.miui.whetstone: 1% user + 0% kernel
0.1% 1329/com.xiaomi.finddevice: 0% user + 0% kernel
0.8% 11382/kworker/0:0: 0% user + 0.8% kernel
0.8% 22403/com.mi.liveassistant: 0.1% user + 0.6% kernel
0.5% 925/com.xiaomi.xmsf: 0.3% user + 0.1% kernel
0.1% 3/ksoftirqd/0: 0% user + 0.1% kernel
0% 1322/com.xiaomi.mitunes: 0% user + 0% kernel
0% 15074/com.android.defcontainer: 0% user + 0% kernel / faults: 14 minor
0% 20675/android.process.media: 0% user + 0% kernel / faults: 14 minor
0% 15/migration/2: 0% user + 0% kernel
0.1% 37/cfinteractive: 0% user + 0.1% kernel
0% 107/dhd_watchdog_th: 0% user + 0% kernel
0% 1293/com.xiaomi.metoknlp: 0% user + 0% kernel / faults: 2 minor
0% 5809/kworker/u10:2: 0% user + 0% kernel
0% 13679/kworker/3:3: 0% user + 0% kernel
0% 13901/com.sohu.inputmethod.sogou: 0% user + 0% kernel
0.1% 20282/com.xiaomi.gamecenter: 0.1% user + 0% kernel
0.1% 26880/com.tencent.mobileqq:MSF: 0.1% user + 0% kernel / faults: 6 minor
1.2% TOTAL: 0% user + 0% kernel + 1.2% iowait + 0% softirq
CPU usage from 2756ms to 3285ms later with 99% awake:
100% 20357/com.richeninfo.cmoa: 100% user + 0% kernel
100% 20357/richeninfo.cmoa: 100% user + 0% kernel
5.6% 810/system_server: 1.8% user + 3.7% kernel
3.7% 826/ActivityManager: 1.8% user + 1.8% kernel
1.8% 969/CMEM Purge: 1.8% user + 0% kernel
0.7% 15/migration/2: 0% user + 0.7% kernel
1.1% 11382/kworker/0:0: 0% user + 1.1% kernel
2.2% TOTAL: 0% user + 0% kernel + 2.2% iowait

经过我查资料得知:

从LOG可以看出ANR的类型,CPU的使用情况,如果CPU使用量接近100%,说明当前设备很忙,有可能是CPU饥饿导致了ANR
如果CPU使用量很少,说明主线程被BLOCK了
如果IOwait很高,说明ANR有可能是主线程在进行I/O操作造成的

所以看我的这里打印出来log的信息几个关键点是
1.Reason: Input dispatching timed out
2. CPU usage from 2505ms to -3307ms ago:
94% 20357/com.richeninfo.cmoa: 94% user+ 0.3% kernel / faults: 1 minor

3.2.2% iowait

所以我这里导致ANR的原因应该是CPU不足。

仅仅查看log的信息还不足以帮我们定位到ANR的原因,所以需要去看data/anr/trace文件或者data/system/dropbox/下的log日志(真机需要root,模拟器用DDMS如果看不了的话按照这篇文章http://www.cnblogs.com/Stay-Hungry-Stay-Foolish/p/6940219.html去做就可以。)
我的trace文件信息如下:
----- pid 21029 at 2017-08-30 12:41:54 -----
Cmd line: com.richeninfo.cmoa

JNI: CheckJNI is off; workarounds are off; pins=0; globals=329

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)

"main" prio=5 tid=1 SUSPENDED
| group="main" sCount=1 dsCount=0 obj=0x41d6bcd8 self=0x41d5a408
| sysTid=21029 nice=0 sched=0/0 cgrp=apps handle=1074229588
| state=S schedstat=( 7063110348 1097534177 6665 ) utm=669 stm=37 core=0
at android.support.v4.view.ViewPager.populate(ViewPager.java:~1162)
at android.support.v4.view.ViewPager.setCurrentItemInternal(ViewPager.java:666)
at android.support.v4.view.ViewPager.setCurrentItemInternal(ViewPager.java:628)
at android.support.v4.view.ViewPager.setCurrentItem(ViewPager.java:620)
at com.richeninfo.cmoa.widget.AutoScrollViewPager.onTouchEvent(AutoScrollViewPager.java:219)
at android.view.View.dispatchTouchEvent(View.java:7719)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2210)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1945)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:2216)
at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:1959)
at com.android.internal.policy.impl.PhoneWindow$DecorView.superDispatchTouchEvent(PhoneWindow.java:2144)
at com.android.internal.policy.impl.PhoneWindow.superDispatchTouchEvent(PhoneWindow.java:1589)
at android.app.Activity.dispatchTouchEvent(Activity.java:2582)
at com.android.internal.policy.impl.PhoneWindow$DecorView.dispatchTouchEvent(PhoneWindow.java:2092)
at android.view.View.dispatchPointerEvent(View.java:7899)
at android.view.ViewRootImpl$ViewPostImeInputStage.processPointerEvent(ViewRootImpl.java:4009)
at android.view.ViewRootImpl$ViewPostImeInputStage.onProcess(ViewRootImpl.java:3886)
at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:3447)
at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:3497)
at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:3466)
at android.view.ViewRootImpl$AsyncInputStage.forward(ViewRootImpl.java:3573)
at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:3474)
at android.view.ViewRootImpl$AsyncInputStage.apply(ViewRootImpl.java:3630)
at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:3447)
at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:3497)
at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:3466)
at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:3474)
at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:3447)
at android.view.ViewRootImpl.deliverInputEvent(ViewRootImpl.java:5594)
at android.view.ViewRootImpl.doProcessInputEvents(ViewRootImpl.java:5574)
at android.view.ViewRootImpl.enqueueInputEvent(ViewRootImpl.java:5545)
at android.view.ViewRootImpl$WindowInputEventReceiver.onInputEvent(ViewRootImpl.java:5677)
at android.view.InputEventReceiver.dispatchInputEvent(InputEventReceiver.java:185)
at android.view.InputEventReceiver.nativeConsumeBatchedInputEvents(Native Method)
at android.view.InputEventReceiver.consumeBatchedInputEvents(InputEventReceiver.java:176)
at android.view.ViewRootImpl.doConsumeBatchedInput(ViewRootImpl.java:5650)
at android.view.ViewRootImpl$ConsumeBatchedInputRunnable.run(ViewRootImpl.java:5696)
at android.view.Choreographer$CallbackRecord.run(Choreographer.java:761)
at android.view.Choreographer.doCallbacks(Choreographer.java:574)
at android.view.Choreographer.doFrame(Choreographer.java:542)
at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:747)
at android.os.Handler.handleCallback(Handler.java:733)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:136)
at android.app.ActivityThread.main(ActivityThread.java:5135)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:798)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:614)
at dalvik.system.NativeStart.main(Native Method)

从这些信息中首先看到线程的状态为"main" prio=5 tid=1 SUSPENDED
而经过一位朋友提示线程状态为SUSPENDED 只有在debug的时候会这样,可是我没在debug啊,所以网上查到下面资料图:

记一次分析解决ANR过程_第1张图片
thread.png

可以看到资料说这种状态通常是由于GC或者debug,所以我的情况应该是就GC了,这也验证了前面说的CPU不足的原因。
再看到trace里另外一个关键点:
com.richeninfo.cmoa.widget.AutoScrollViewPager.onTouchEvent(AutoScrollViewPager.java:219)
这里指定在我的项目中其他同事引入的第三方控件添加的onTouchEvent监听里因为某些操作频繁引起GC导致CPU不足,前面log信息打印出的:
Reason: Input dispatching timed out(Waiting because the touched window has not finished processing the input events that were previously delivered to it.)
说明这CPU不足导致无法相应下一个input events导致ANR。
那就去看看onTouchEvent里都执行了什么鬼操作。

 announcePager.setOnTouchListener(new View.OnTouchListener() {

                @Override
                public boolean onTouch(View v, MotionEvent event) {

                    switch (event.getAction()) {
                        case MotionEvent.ACTION_DOWN:
                            announcePager.stopAutoScroll();

                            break;
                        case MotionEvent.ACTION_MOVE:
                        announcePager.startAutoScroll();

                            break;
                        case MotionEvent.ACTION_UP:
                            announcePager.startAutoScroll();

                            break;

                        default:
                            break;
                    }

                    return false;
                }

            });

可以看到相应了三个action,那么就去看看startAutoScroll()stopAutoScroll()都写了什么。

public void startAutoScroll(int delayTimeInMills) {
        isAutoScroll = true;
        sendScrollMessage(delayTimeInMills);
    }

    /**
     * stop auto scroll
     */
    public void stopAutoScroll() {
        isAutoScroll = false;
        handler.removeMessages(SCROLL_WHAT);
    }

    /**
     * set the factor by which the duration of sliding animation will change
     */
    public void setScrollDurationFactor(double scrollFactor) {
        scroller.setScrollDurationFactor(scrollFactor);
    }

    private void sendScrollMessage(long delayTimeInMills) {
        /** remove messages before, keeps one message is running at most **/
        handler.removeMessages(SCROLL_WHAT);
        handler.sendEmptyMessageDelayed(SCROLL_WHAT, delayTimeInMills);
    }

看到是handler在发送消息并且每次发送之前都要把前面的消息移除。
结合我操作APP发生ANR的时机,判断问题应该是出现在action_move响应太频繁,导致频繁startAutoScroll();然后方法内部里频繁handler.removeMessages(),这样被remove的消息由于垃圾回收机制频繁引起GC,所以就导致了CPU不足,这样似乎可以验证前面的说法。

问题找到了,那就要解决,这个解决也简单,直接把action_move里的
startAutoScroll()注释掉就OK了,其实这里也不需要在action_move里执行startAutoScroll(),因为action_up里已经执行了startAutoScroll()。

这样ANR就分析解决完毕了。有了这次经验,以后相信自己能比较好应对ANR。。。认真分析trace信息和log日志.

你可能感兴趣的:(记一次分析解决ANR过程)