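ANR Monitoring

My internship company recently asked me to research ANR monitoring for Android apps. To keep my own notes in order, I am recording the whole learning process in this blog post.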

Contents

    • Preliminary research
    • ANR-WatchDog
      • How it works
      • testapp
      • Project source
        • `ANRError.java`
        • `ANRWatchDog.java`
        • `ANRWatchDog.run()`
        • How ANR-WatchDog captures stack information

Preliminary research

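First I read the Jianshu post 《Android ANR监测诊断以及解决办法》 (Android ANR monitoring, diagnosis and fixes) [1]. Judging by the title it matches our needs well, so here are the points I took from it:

  1. Android Vitals: warns you when ANRs occur, but only for apps published on Google Play.
  2. The most common causes of ANR:
    • Performing I/O on the main thread
    • Running a long computation on the main thread
    • The main thread making a synchronous Binder call to another process that takes a long time to return
    • A non-main thread holding a lock, so that the main thread times out waiting for it
    • The main thread deadlocking with another thread, either in the current process or through a Binder call
  3. StrictMode [2]: a mode for strict checking while debugging; not a good fit for our use case.
  4. TraceView [3]: an Android development-time analysis tool that works on data collected from the app with tools such as DDMS; according to the post it is better suited to cases where the source code is not available.
  5. Pulling the traces file [4]: every time an ANR occurs, the Android system writes a traces.txt file under /data/anr/, recording VM information and thread stack traces of the offending process. (This looks fairly reliable and is worth considering later.)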

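That post only gave a brief overview. To dig deeper before studying specific approaches, I also read the Sohu article 《Android ANR监测方案解析》 (An analysis of Android ANR monitoring solutions) [5]. The main takeaways:

  • ANRs in a Service or Broadcast only write trace information and do not show the user an ANR dialog; most ANRs that users actually notice are caused by InputEvent.
  • Android apps are message-driven, and in a sense Android as a whole is a message-driven system: UI, events and lifecycle are all tied to the message handling mechanism. ANR detection is no different; most solutions build on Android's message mechanism.
  • The popular ANR detection solutions are the open-source BlockCanary, ANR-WatchDog and SafeLooper, plus FileObserver, which relies on a native Android system interface. The article compares these four by scenario.
  • The rest of the article [5] analyzes the pros and cons of these projects. This post will also go through the principles of these ANR detection solutions one by one.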

Later I also found a Google issue entry, "Possibility to detect ANR dialogs from application", which discusses whether ANRs can be tracked from inside the app. Let's look at it as well:

  1. We have great solutions like BugSense or Crittercism for crashes, but for ANRs there is no way to be notified by the system when it detects ANR dialog.
  2. (The rest of the thread is basically people recommending ANR-WatchDog.) So in the end we still need to look at how ANR-WatchDog is implemented.

ANR-WatchDog

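Open-source project page: the ANR-WatchDog project

How it works

The watchdog itself is just a simple thread that loops over the following steps:

  1. Schedule a runnable on the main thread, to be executed as soon as the main thread is free.
  2. Wait 5 seconds (the default; configurable).
  3. If the runnable has run, go back to step 1.
  4. If the runnable has not run, the UI thread has been blocked for at least 5 seconds, so raise an error that carries the stack traces of the threads.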
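
The loop above can be tried outside Android. The following is only a minimal plain-Java sketch under stated assumptions: a single-thread executor stands in for the Android main thread, and the names WatchdogSketch, looksBlocked and sleepQuietly are made up for illustration; none of them come from the ANR-WatchDog project.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java sketch of one watchdog round; the executor plays the role of the main thread. */
public class WatchdogSketch {

    /**
     * Posts a ticker to the monitored thread, waits timeoutMs, then checks
     * whether the ticker ran. Returns true if the thread looks blocked.
     */
    public static boolean looksBlocked(ExecutorService monitored, long timeoutMs)
            throws InterruptedException {
        AtomicInteger tick = new AtomicInteger();
        int lastTick = tick.get();
        monitored.execute(tick::incrementAndGet); // step 1: schedule the runnable
        Thread.sleep(timeoutMs);                  // step 2: wait for the timeout
        return tick.get() == lastTick;            // steps 3 and 4: did it run?
    }

    /** Sleeps without propagating InterruptedException (hypothetical helper). */
    public static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService fakeMain = Executors.newSingleThreadExecutor();
        try {
            System.out.println("idle thread blocked? " + looksBlocked(fakeMain, 200));
            fakeMain.execute(() -> sleepQuietly(1500)); // simulate a frozen "main" thread
            System.out.println("busy thread blocked? " + looksBlocked(fakeMain, 200));
        } finally {
            fakeMain.shutdownNow();
        }
    }
}
```

The real watchdog repeats this round in a while loop and keeps one shared tick counter; the sketch only shows a single round to keep the timing logic visible.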

testapp

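The project source ships with a testapp module. After adjusting some basic configuration and running it, you can see that ANR-WatchDog preempts the ANR dialog: instead of the dialog, an error is thrown and the app crashes. The log shows: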

2019-01-07 16:19:48.839 1612-1645/anrwatchdog.github.com.testapp E/AndroidRuntime: FATAL EXCEPTION: |ANR-WatchDog|
    Process: anrwatchdog.github.com.testapp, PID: 1612
    com.github.anrwatchdog.ANRError: Application Not Responding
    Caused by: com.github.anrwatchdog.ANRError$$$_Thread: main (state = TIMED_WAITING)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:386)
        at java.lang.Thread.sleep(Thread.java:327)
        at com.github.anrtestapp.MainActivity.SleepAMinute(MainActivity.java:18)
        at com.github.anrtestapp.MainActivity.access$100(MainActivity.java:12)
        at com.github.anrtestapp.MainActivity$2.onClick(MainActivity.java:63)
        at android.view.View.performClick(View.java:6291)
        at android.view.View$PerformClick.run(View.java:24931)
        at android.os.Handler.handleCallback(Handler.java:808)
        at android.os.Handler.dispatchMessage(Handler.java:101)
        at android.os.Looper.loop(Looper.java:166)
        at android.app.ActivityThread.main(ActivityThread.java:7529)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:245)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:921)
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: FinalizerDaemon (state = WAITING)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:422)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:188)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:209)
        at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:235)
        at java.lang.Daemons$Daemon.run(Daemons.java:103)
        at java.lang.Thread.run(Thread.java:784)
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: FinalizerWatchdogDaemon (state = TIMED_WAITING)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:386)
        at java.lang.Thread.sleep(Thread.java:327)
        at java.lang.Daemons$FinalizerWatchdogDaemon.sleepFor(Daemons.java:345)
        at java.lang.Daemons$FinalizerWatchdogDaemon.waitForFinalization(Daemons.java:371)
        at java.lang.Daemons$FinalizerWatchdogDaemon.runInternal(Daemons.java:284)
        at java.lang.Daemons$Daemon.run(Daemons.java:103) 
        at java.lang.Thread.run(Thread.java:784) 
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: ReferenceQueueDaemon (state = WAITING)
        at java.lang.Object.wait(Native Method)
        at java.lang.Daemons$ReferenceQueueDaemon.runInternal(Daemons.java:178)
        at java.lang.Daemons$Daemon.run(Daemons.java:103) 
        at java.lang.Thread.run(Thread.java:784) 
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: Thread-7 (state = RUNNABLE)
        at libcore.io.Linux.accept(Native Method)
        at libcore.io.BlockGuardOs.accept(BlockGuardOs.java:64)
        at android.system.Os.accept(Os.java:43)
        at android.net.LocalSocketImpl.accept(LocalSocketImpl.java:344)
        at android.net.LocalServerSocket.accept(LocalServerSocket.java:90)
        at com.android.tools.ir.server.Server$SocketServerThread.run(Server.java:165)
        at java.lang.Thread.run(Thread.java:784) 
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: queued-work-looper (state = RUNNABLE)
        at android.os.MessageQueue.nativePollOnce(Native Method)
        at android.os.MessageQueue.next(MessageQueue.java:379)
        at android.os.Looper.loop(Looper.java:144)
        at android.os.HandlerThread.run(HandlerThread.java:65)
     Caused by: com.github.anrwatchdog.ANRError$$$_Thread: |ANR-WatchDog| (state = RUNNABLE)
        at dalvik.system.VMStack.getThreadStackTrace(Native Method)
        at java.lang.Thread.getStackTrace(Thread.java:1556)
        at java.lang.Thread.getAllStackTraces(Thread.java:1606)
        at com.github.anrwatchdog.ANRError.New(ANRError.java:72)
        at com.github.anrwatchdog.ANRWatchDog.run(ANRWatchDog.java:209)

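This makes the principle much clearer: the watchdog detects the blocked main thread and throws an error before the system ANR dialog appears. The error, named "ANRError", can be handled with whatever crash-handling mechanism you already use, in which case the app does not crash.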
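
To make "handle the error instead of crashing" concrete, here is a small plain-Java sketch. It assumes nothing about Android: FakeAnrError is a hypothetical stand-in for ANRError, and the thread name "fake-watchdog" is made up. It only shows that an Error thrown on a background thread can be intercepted with the standard Thread.setUncaughtExceptionHandler.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: intercept an Error thrown on a watchdog-like thread instead of letting it crash. */
public class HandleErrorSketch {

    /** Hypothetical stand-in for ANRError; the real class lives in com.github.anrwatchdog. */
    public static class FakeAnrError extends Error {
        private static final long serialVersionUID = 1L;

        public FakeAnrError() {
            super("Application Not Responding");
        }
    }

    /** Starts a thread that throws FakeAnrError and returns what the handler received. */
    public static Throwable runAndCapture() throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread watchdog = new Thread(() -> {
            throw new FakeAnrError();
        }, "fake-watchdog");
        // The handler runs on the dying thread, before join() returns.
        watchdog.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
        watchdog.start();
        watchdog.join();
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable captured = runAndCapture();
        System.out.println("captured: " + captured);
    }
}
```

With the library itself, the ANRWatchDog source further down also exposes setANRListener, which replaces the default listener that throws the error, so a custom listener can report the ANRError rather than rethrow it.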

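Next, let's look at the project source in detail:

Project source

The source is quite simple: essentially two Java files, ANRError.java and ANRWatchDog.java.

ANRError.java

Let's start with the simpler one, ANRError.java.
The whole file is only about 100 lines, so I'll just paste it in full: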

package com.github.anrwatchdog;

import android.os.Looper;

import java.io.Serializable;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Error thrown by {@link com.github.anrwatchdog.ANRWatchDog} when an ANR is detected.
 * Contains the stack trace of the frozen UI thread.
 *
 * It is important to notice that, in an ANRError, all the "Caused by" are not really the cause
 * of the exception. Each "Caused by" is the stack trace of a running thread. Note that the main
 * thread always comes first.
 */
@SuppressWarnings({"Convert2Diamond", "UnusedDeclaration"})
public class ANRError extends Error {

    private static class $ implements Serializable {
        private final String _name;
        private final StackTraceElement[] _stackTrace;

        private class _Thread extends Throwable {
            private _Thread(_Thread other) {
                super(_name, other);
            }

            @Override
            public Throwable fillInStackTrace() {
                setStackTrace(_stackTrace);
                return this;
            }
        }

        private $(String name, StackTraceElement[] stackTrace) {
            _name = name;
            _stackTrace = stackTrace;
        }
    }

    private static final long serialVersionUID = 1L;

    private ANRError($._Thread st) {
        super("Application Not Responding", st);
    }

    @Override
    public Throwable fillInStackTrace() {
        setStackTrace(new StackTraceElement[] {});
        return this;
    }

    static ANRError New(String prefix, boolean logThreadsWithoutStackTrace) {
        final Thread mainThread = Looper.getMainLooper().getThread();

        final Map<Thread, StackTraceElement[]> stackTraces = new TreeMap<Thread, StackTraceElement[]>(new Comparator<Thread>() {
            @Override
            public int compare(Thread lhs, Thread rhs) {
                if (lhs == rhs)
                    return 0;
                if (lhs == mainThread)
                    return 1;
                if (rhs == mainThread)
                    return -1;
                return rhs.getName().compareTo(lhs.getName());
            }
        });

        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet())
            if (
                    entry.getKey() == mainThread
                ||  (
                        entry.getKey().getName().startsWith(prefix)
                    &&  (
                            logThreadsWithoutStackTrace
                        ||  entry.getValue().length > 0
                        )
                    )
                )
                stackTraces.put(entry.getKey(), entry.getValue());

        // Sometimes main is not returned in getAllStackTraces() - ensure that we list it
        if (!stackTraces.containsKey(mainThread)) {
            stackTraces.put(mainThread, mainThread.getStackTrace());
        }

        $._Thread tst = null;
        for (Map.Entry<Thread, StackTraceElement[]> entry : stackTraces.entrySet())
            tst = new $(getThreadTitle(entry.getKey()), entry.getValue()).new _Thread(tst);

        return new ANRError(tst);
    }

    static ANRError NewMainOnly() {
        final Thread mainThread = Looper.getMainLooper().getThread();
        final StackTraceElement[] mainStackTrace = mainThread.getStackTrace();

        return new ANRError(new $(getThreadTitle(mainThread), mainStackTrace).new _Thread(null));
    }

    private static String getThreadTitle(Thread thread) {
        return thread.getName() + " (state = " + thread.getState() + ")";
    }
}
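
The one trick in this file is that every "Caused by" entry in the crash log is really a snapshot of one thread, linked through the Throwable cause chain. The plain-Java sketch below reproduces that idea without the Android Looper, the name prefix filter, or the comparator that lets the main thread end up first; ThreadChainSketch, ThreadSnapshot and snapshotAll are illustrative names, not part of the project.

```java
import java.util.Map;

/** Sketch: chain one Throwable per live thread so printStackTrace() lists them as "Caused by". */
public class ThreadChainSketch {

    /** A Throwable whose own stack trace is replaced by the frames of some thread. */
    public static class ThreadSnapshot extends Throwable {
        private static final long serialVersionUID = 1L;

        public ThreadSnapshot(String title, StackTraceElement[] frames, ThreadSnapshot next) {
            super(title, next);    // "next" becomes the following "Caused by" entry
            setStackTrace(frames); // overwrite the frames captured at construction time
        }
    }

    /** Builds a chain with one ThreadSnapshot per thread known to the JVM. */
    public static ThreadSnapshot snapshotAll() {
        ThreadSnapshot chain = null;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            String title = thread.getName() + " (state = " + thread.getState() + ")";
            chain = new ThreadSnapshot(title, entry.getValue(), chain);
        }
        return chain;
    }

    public static void main(String[] args) {
        // The output has the same shape as the testapp log: one "Caused by" per thread.
        new Error("Application Not Responding", snapshotAll()).printStackTrace();
    }
}
```

ANRError reaches the same result by overriding fillInStackTrace() instead of calling setStackTrace() in the constructor; the sketch uses the constructor only because it is shorter.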

Reading through the file, there is nothing that needs special attention, so let's move straight on to ANRWatchDog.java.

ANRWatchDog.java

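Credit to the author: the class shares its name with the project, so you can tell at a glance that it is the core class. The code is not long either, about 220 lines in total, so I'll paste it in full as well: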

package com.github.anrwatchdog;

/*
 * The MIT License (MIT)
 *
 * Copyright (c) 2016 Salomon BRYS
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy of
 * this software and associated documentation files (the "Software"), to deal in
 * the Software without restriction, including without limitation the rights to
 * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 * the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

import android.os.Debug;
import android.os.Handler;
import android.os.Looper;
import android.util.Log;

/**
 * A watchdog timer thread that detects when the UI thread has frozen.
 */
@SuppressWarnings("UnusedDeclaration")
public class ANRWatchDog extends Thread {

    public interface ANRListener {
        public void onAppNotResponding(ANRError error);
    }

    public interface InterruptionListener {
        public void onInterrupted(InterruptedException exception);
    }

    private static final int DEFAULT_ANR_TIMEOUT = 5000;

    private static final ANRListener DEFAULT_ANR_LISTENER = new ANRListener() {
        @Override public void onAppNotResponding(ANRError error) {
            throw error;
        }
    };

    private static final InterruptionListener DEFAULT_INTERRUPTION_LISTENER = new InterruptionListener() {
        @Override public void onInterrupted(InterruptedException exception) {
            Log.w("ANRWatchdog", "Interrupted: " + exception.getMessage());
        }
    };

    private ANRListener _anrListener = DEFAULT_ANR_LISTENER;
    private InterruptionListener _interruptionListener = DEFAULT_INTERRUPTION_LISTENER;

    private final Handler _uiHandler = new Handler(Looper.getMainLooper());
    private final int _timeoutInterval;

    private String _namePrefix = "";
    private boolean _logThreadsWithoutStackTrace = false;
    private boolean _ignoreDebugger = false;

    private volatile int _tick = 0;

    private final Runnable _ticker = new Runnable() {
        @Override public void run() {
            _tick = (_tick + 1) % Integer.MAX_VALUE;
        }
    };

    /**
     * Constructs a watchdog that checks the ui thread every {@value #DEFAULT_ANR_TIMEOUT} milliseconds
     */
    public ANRWatchDog() {
        this(DEFAULT_ANR_TIMEOUT);
    }

    /**
     * Constructs a watchdog that checks the ui thread every given interval
     *
     * @param timeoutInterval The interval, in milliseconds, between to checks of the UI thread.
     *                        It is therefore the maximum time the UI may freeze before being reported as ANR.
     */
    public ANRWatchDog(int timeoutInterval) {
        super();
        _timeoutInterval = timeoutInterval;
    }

    /**
     * Sets an interface for when an ANR is detected.
     * If not set, the default behavior is to throw an error and crash the application.
     *
     * @param listener The new listener or null
     * @return itself for chaining.
     */
    public ANRWatchDog setANRListener(ANRListener listener) {
        if (listener == null) {
            _anrListener = DEFAULT_ANR_LISTENER;
        }
        else {
            _anrListener = listener;
        }
        return this;
    }

    /**
     * Sets an interface for when the watchdog thread is interrupted.
     * If not set, the default behavior is to just log the interruption message.
     *
     * @param listener The new listener or null.
     * @return itself for chaining.
     */
    public ANRWatchDog setInterruptionListener(InterruptionListener listener) {
        if (listener == null) {
            _interruptionListener = DEFAULT_INTERRUPTION_LISTENER;
        }
        else {
            _interruptionListener = listener;
        }
        return this;
    }

    /**
     * Set the prefix that a thread's name must have for the thread to be reported.
     * Note that the main thread is always reported.
     * Default "".
     *
     * @param prefix The thread name's prefix for a thread to be reported.
     * @return itself for chaining.
     */
    public ANRWatchDog setReportThreadNamePrefix(String prefix) {
        if (prefix == null)
            prefix = "";
        _namePrefix = prefix;
        return this;
    }

    /**
     * Set that only the main thread will be reported.
     *
     * @return itself for chaining.
     */
    public ANRWatchDog setReportMainThreadOnly() {
        _namePrefix = null;
        return this;
    }

    /**
     * Set that all running threads will be reported,
     * even those from which no stack trace could be extracted.
     * Default false.
     *
     * @param logThreadsWithoutStackTrace Whether or not all running threads should be reported
     * @return itself for chaining.
     */
    public ANRWatchDog setLogThreadsWithoutStackTrace(boolean logThreadsWithoutStackTrace) {
        _logThreadsWithoutStackTrace = logThreadsWithoutStackTrace;
        return this;
    }

    /**
     * Set whether to ignore the debugger when detecting ANRs.
     * When ignoring the debugger, ANRWatchdog will detect ANRs even if the debugger is connected.
     * By default, it does not, to avoid interpreting debugging pauses as ANRs.
     * Default false.
     *
     * @param ignoreDebugger Whether to ignore the debugger.
     * @return itself for chaining.
     */
    public ANRWatchDog setIgnoreDebugger(boolean ignoreDebugger) {
        _ignoreDebugger = ignoreDebugger;
        return this;
    }

    @Override
    public void run() {
        setName("|ANR-WatchDog|");

        int lastTick;
        int lastIgnored = -1;
        while (!isInterrupted()) {
            lastTick = _tick;
            _uiHandler.post(_ticker);
            try {
                Thread.sleep(_timeoutInterval);
            }
            catch (InterruptedException e) {
                _interruptionListener.onInterrupted(e);
                return ;
            }

            // If the main thread has not handled _ticker, it is blocked. ANR.
            if (_tick == lastTick) {
                if (!_ignoreDebugger && Debug.isDebuggerConnected()) {
                    if (_tick != lastIgnored)
                        Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
                    lastIgnored = _tick;
                    continue ;
                }

                ANRError error;
                if (_namePrefix != null)
                    error = ANRError.New(_namePrefix, _logThreadsWithoutStackTrace);
                else
                    error = ANRError.NewMainOnly();
                _anrListener.onAppNotResponding(error);
                return;
            }
        }
    }

}

ANRWatchDog.run()

Let's start from the last method, public void run():


    @Override
    public void run() {
        setName("|ANR-WatchDog|");

        int lastTick;
        int lastIgnored = -1;
        while (!isInterrupted()) {
            lastTick = _tick;
            _uiHandler.post(_ticker);
            try {
                Thread.sleep(_timeoutInterval);
            }
            catch (InterruptedException e) {
                _interruptionListener.onInterrupted(e);
                return ;
            }

            // If the main thread has not handled _ticker, it is blocked. ANR.
            if (_tick == lastTick) {
                if (!_ignoreDebugger && Debug.isDebuggerConnected()) {
                    if (_tick != lastIgnored)
                        Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
                    lastIgnored = _tick;
                    continue ;
                }

                ANRError error;
                if (_namePrefix != null)
                    error = ANRError.New(_namePrefix, _logThreadsWithoutStackTrace);
                else
                    error = ANRError.NewMainOnly();
                _anrListener.onAppNotResponding(error);
                return;
            }
        }
    }

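First, a note (assuming the reader is comfortable with multithreading): run() is the heart of any Thread subclass, as you probably know. What I want to point out is that start() on this thread (ANRWatchDog) is called in Application.onCreate().

  1. It declares two int local variables: lastTick and lastIgnored.

// TODO: explain the logic of run() in detail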

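How ANR-WatchDog captures stack information

  1. First, one class to be clear about: StackTraceElement [6]. The official documentation gives a rough idea; I also found a blog post [7] and the Yiibai tutorial [8].
    Starting with the blog post [7], it explains StackTraceElement as follows:
/*
 * StackTrace in brief
 * 1 A StackTrace stores method call information in the form of a stack.
 * 2 How do you get this call information?
 *   Call Thread.currentThread().getStackTrace()
 *   to get the StackTrace of the current thread.
 *   The method returns an array of StackTraceElement.
 * 3 That StackTraceElement array is the content of the StackTrace.
 * 4 Iterating over the array shows the call flow between methods.
 *   For example, if methodA calls methodB in a thread, methodA is pushed first and methodB after it.
 * 5 The element at index 2 of the array holds the file name, class name and
 *   method name of the current method; the line number is also available.
 * 6 The element at index 3 of the array holds the information of the current
 *   method's caller and the line number of the call.
 */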
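
A short plain-Java sketch of the points above follows. The names StackTraceSketch, methodA, methodB and indexOfMethod are made up for illustration. One caveat worth labelling: the fixed indexes 2 and 3 quoted above match Android, where the first two frames are VMStack.getThreadStackTrace and Thread.getStackTrace (as in the |ANR-WatchDog| entry of the testapp log); on a desktop JVM the offset is usually different, so the sketch searches for the frame by method name rather than relying on an index.

```java
/** Sketch: read method, class, file and line information from StackTraceElement. */
public class StackTraceSketch {

    /** Returns the index of the first frame for the given method name, or -1 if absent. */
    public static int indexOfMethod(StackTraceElement[] frames, String methodName) {
        for (int i = 0; i < frames.length; i++) {
            if (frames[i].getMethodName().equals(methodName)) {
                return i;
            }
        }
        return -1;
    }

    /** Captures the stack of the current thread from inside methodB. */
    public static StackTraceElement[] methodB() {
        return Thread.currentThread().getStackTrace();
    }

    /** Calls methodB, so methodA appears directly below methodB in the array. */
    public static StackTraceElement[] methodA() {
        return methodB();
    }

    public static void main(String[] args) {
        StackTraceElement[] frames = methodA();
        int current = indexOfMethod(frames, "methodB");
        System.out.println("frame of methodB is at index " + current);
        for (StackTraceElement frame : frames) {
            System.out.println(frame.getClassName() + "." + frame.getMethodName()
                    + " (" + frame.getFileName() + ":" + frame.getLineNumber() + ")");
        }
    }
}
```

The element right after the methodB frame is its caller, methodA, which is the "index 3" relationship described in point 6.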

According to that document


  1. 《Android ANR监测诊断以及解决办法》 [Jianshu] https://www.jianshu.com/p/8ae173c9fb08

  2. 《StrictMode:安卓中的严格模式》 [Jianshu] https://www.jianshu.com/p/271474cd1d91

  3. 《Android 编程下的 TraceView 简介及其案例实战》 [cnblogs] https://www.cnblogs.com/sunzn/p/3192231.html

  4. 《Android trace文件抓取原理》 [Jianshu] https://www.jianshu.com/p/f406d535a8bc

  5. 《Android ANR监测方案解析》 [Sohu] https://www.sohu.com/a/220647552_741445

  6. Oracle official Java documentation, StackTraceElement: https://docs.oracle.com/javase/9/docs/api/java/lang/StackTraceElement.html

  7. 《StackTrace简述以及StackTraceElement使用实例》 [cnblogs] https://www.cnblogs.com/xiaozz/p/6448622.html

  8. 《java.lang.StackTraceElement类》 [Yiibai] https://www.yiibai.com/java/lang/java_lang_stacktraceelement.html
