Android Source Code Analysis: Intercepting Global Crashes and Monitoring ANRs with Handler and Looper

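Many people wonder why we should read source code at all when it seems to have no use in day-to-day work. It is a fair question, so let's start from practical uses and then analyze how those uses work under the hood. That is where the value of reading source code shows.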

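Use Handler and Looper to intercept global crashes on the main thread and keep the app from exiting.
Use Handler and Looper to implement ANR monitoring.
Use Handler to implement a single-threaded thread pool.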

Implementation

import android.app.Application
import android.os.Handler
import android.os.Looper
import android.util.Log

class MyApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        // Time each main-thread message using the log lines printed by Looper.loop().
        var startWorkTimeMillis = 0L
        Looper.getMainLooper().setMessageLogging {
            if (it.startsWith(">>>>> Dispatching to Handler")) {
                startWorkTimeMillis = System.currentTimeMillis()
            } else if (it.startsWith("<<<<< Finished to Handler")) {
                val duration = System.currentTimeMillis() - startWorkTimeMillis
                if (duration > 100) {
                    Log.e("MainThreadSlow", "$duration ms, $it")
                }
            }
        }
        // Run a second Looper.loop() on the main thread, wrapped in try-catch.
        val handler = Handler(Looper.getMainLooper())
        handler.post {
            while (true) {
                try {
                    Looper.loop()
                } catch (e: Throwable) {
                    // TODO Main thread crashed; report the crash yourself
                    if (e.message?.startsWith("Unable to start activity") == true) {
                        // The Activity failed to start: kill the process to avoid a black screen.
                        android.os.Process.killProcess(android.os.Process.myPid())
                        break
                    }
                    e.printStackTrace()
                }
            }
        }
        Thread.setDefaultUncaughtExceptionHandler { _, e ->
            e.printStackTrace()
            // TODO Background thread crashed; report the crash yourself
        }
    }
}

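With the code above we can intercept crashes on the UI thread and monitor slow main-thread work. It cannot intercept every exception, though: if a crash happens in an Activity's onCreate and the Activity fails to be created, the screen goes black.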

For capturing the stack trace of an ANR, see "Android: Implementing ANR Monitoring with Handler and Looper and Capturing Stack Traces".

Source Code Analysis

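With that small amount of code we can intercept crashes and monitor ANRs, but we may not know why it works. And even once we know an ANR occurred, we still need to find out where it came from and how to fix it. The questions analyzed here are: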

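How to intercept global crashes and keep the app from exiting.
How to implement ANR monitoring.
How to use Handler as a single-threaded thread pool.
Why Activity lifecycle callbacks are dispatched through a Handler.
How Handler implements delayed execution.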

Source files involved

/java/android/os/Handler.java
/java/android/os/MessageQueue.java
/java/android/os/Looper.java
/java/android/app/ActivityThread.java

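Let's start from app startup. The app's entry point is in ActivityThread: its main method creates the Looper for the main thread of the current process and, as its last step, calls Looper.loop(), which schedules all main-thread work. Once loop() returns, the app is exiting. So to keep the app from exiting, we must keep Looper.loop() running.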

package android.app;
public final class ActivityThread extends ClientTransactionHandler {
    ...
    public static void main(String[] args) {
        ...
        Looper.prepareMainLooper();
        ...
        Looper.loop();
        throw new RuntimeException("Main thread loop unexpectedly exited");
    }
}

Looper.loop()

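Now let's look more closely at Looper.loop(). It contains a loop that only exits when queue.next() returns null. A natural question at this point: if there is no main-thread work, does Looper.loop() return? It does not, because queue.next() is a blocking call. As long as there is no message and nobody has asked the queue to quit, it keeps blocking until a new main-thread message is added.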

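When the queue has a message, loop() prints "Dispatching to ...", calls msg.target.dispatchMessage(msg) to run it, and prints "Finished to ..." once it is done. These two log lines are what we use to analyze ANRs. The system shows its ANR dialog when the main thread is blocked for too long (about 5 seconds for an unhandled input event), but we should hold our own app to a stricter standard: pick a target duration and report every message that exceeds it, so that we know what to optimize.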

public final class Looper {
    final MessageQueue mQueue;
    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        final MessageQueue queue = me.mQueue;
        for (;;) {
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }
            // This must be in a local variable, in case a UI event sets the logger
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what);
            }
            try {
                msg.target.dispatchMessage(msg);
            } finally {}
            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }
            msg.recycleUnchecked();
        }
    }
    public void quit() {
        mQueue.quit(false);
    }
}
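The timing idea behind those two log lines can be tried outside Android. The following is a minimal plain-Java sketch, not Android code: SlowDispatchDetector, its injectable clock, and the sample log strings are hypothetical names introduced for illustration. Only the ">>>>> Dispatching to " and "<<<<< Finished to " prefixes come from the loop() source above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Plain-Java model of a Printer that times the gap between the two log lines. */
class SlowDispatchDetector {
    private final LongSupplier clock;   // injectable so the logic can be tested
    private final long thresholdMillis;
    private final List<String> reports = new ArrayList<>();
    private long startMillis = -1;      // -1 means "no dispatch in progress"

    SlowDispatchDetector(LongSupplier clock, long thresholdMillis) {
        this.clock = clock;
        this.thresholdMillis = thresholdMillis;
    }

    /** Same shape as Printer.println(String): called before and after each message. */
    void println(String line) {
        if (line.startsWith(">>>>> Dispatching to ")) {
            startMillis = clock.getAsLong();
        } else if (line.startsWith("<<<<< Finished to ") && startMillis >= 0) {
            long duration = clock.getAsLong() - startMillis;
            startMillis = -1;
            if (duration > thresholdMillis) {
                reports.add(duration + " ms: " + line);
            }
        }
    }

    List<String> reports() {
        return reports;
    }
}

class SlowDispatchDemo {
    public static void main(String[] args) {
        long[] now = {0}; // fake clock, advanced by hand
        SlowDispatchDetector detector = new SlowDispatchDetector(() -> now[0], 100);
        detector.println(">>>>> Dispatching to Handler (demo) {1} null: 0");
        now[0] = 250;
        detector.println("<<<<< Finished to Handler (demo) {1} null");
        System.out.println(detector.reports());
    }
}
```

On a device, the same println logic would be passed to Looper.getMainLooper().setMessageLogging(...), with a real clock in place of the fake one.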

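If a message throws on the main thread, the exception escapes this loop, which means the app crashes. To keep the app alive we need a try-catch. We can post a task to the main thread that starts another Looper.loop() to process the main-thread messages, and wrap that Looper.loop() call in try-catch inside a while loop. An exception then only ends the inner loop() call, which is immediately re-entered, so the app does not exit.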
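The re-entry trick can be modeled in plain Java. This is a simplified sketch under stated assumptions: loop() here is a hypothetical stand-in that drains a queue of Runnables and returns when it is empty, whereas the real Looper.loop() blocks instead of returning. GuardedLoopDemo and guardedLoop are illustrative names, not Android APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class GuardedLoopDemo {
    /** Stand-in for Looper.loop(): runs tasks in order; an exception escapes the loop. */
    static void loop(Queue<Runnable> queue) {
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
        }
    }

    /** Re-enters loop() after every failure, so the tasks queued behind it still run. */
    static List<Throwable> guardedLoop(Queue<Runnable> queue) {
        List<Throwable> caught = new ArrayList<>();
        while (true) {
            try {
                loop(queue);
                return caught; // queue drained; the real loop would block here instead
            } catch (Throwable t) {
                caught.add(t); // a real app would report the crash at this point
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Queue<Runnable> queue = new ArrayDeque<>();
        queue.add(() -> log.add("task 1"));
        queue.add(() -> { throw new IllegalStateException("boom"); });
        queue.add(() -> log.add("task 3"));
        List<Throwable> caught = guardedLoop(queue);
        System.out.println(log + ", caught " + caught.size());
    }
}
```

The failing task is lost, but the task queued after it still runs, which is the behavior the try-catch around Looper.loop() relies on.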

Implementing a Single-Threaded Thread Pool with Handler

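Building on Looper.loop() as shown above, we can use Handler as a single-threaded thread pool. Just like the main thread, this pool supports immediate execution with post(), delayed execution with postDelayed(), and execution at a given time with postAtTime().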

// Incorrect usage
var handler: Handler? = null
Thread({
    handler = Handler()
}).start()

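Running the code above on a background thread fails with: Can't create handler inside thread Thread[Thread-2,5,main] that has not called Looper.prepare().
The reason is that a Handler depends on a Looper to do its work, so the thread must have a Looper before a Handler can function on it. The correct usage is: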

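// Correct usage
var handler: Handler? = null
Thread({
    Looper.prepare()      // create a Looper for this thread
    handler = Handler()   // binds to the Looper prepared above
    Looper.loop()         // start processing messages; blocks until the Looper quits
}).start()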

Test:

button.setOnClickListener {
    handler?.post {
        println(Thread.currentThread())
    }
    handler?.post {
        println(Thread.currentThread())
    }
}

Output:

System.out: Thread[Thread-2,5,main]
System.out: Thread[Thread-2,5,main]
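For comparison, the same "every task runs on one worker thread" behavior can be observed with the JDK's own single-threaded scheduler. This sketch swaps in java.util.concurrent.ScheduledExecutorService as an analogue. It is not how Handler is implemented, and SingleThreadDemo and runTwoTasks are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SingleThreadDemo {
    /** Runs two tasks on one worker thread and returns the thread names they observed. */
    static List<String> runTwoTasks() {
        List<String> threadNames = new CopyOnWriteArrayList<>();
        ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor();
        // Roughly comparable to handler.post { ... }
        worker.execute(() -> { threadNames.add(Thread.currentThread().getName()); });
        // Roughly comparable to handler.postDelayed({ ... }, 50)
        worker.schedule(() -> { threadNames.add(Thread.currentThread().getName()); },
                50, TimeUnit.MILLISECONDS);
        worker.shutdown(); // by default, already-scheduled delayed tasks still run
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return threadNames;
    }

    public static void main(String[] args) {
        // Both entries name the same worker thread.
        System.out.println(runTwoTasks());
    }
}
```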

HandlerThread

HandlerThread is Android's wrapper around Thread that adds Handler support. Its implementation does exactly what the previous example does.

val handlerThread = HandlerThread("test")
handlerThread.start()
handler = Handler(handlerThread.looper)

MessageQueue Source Code Analysis

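Handler offers several ways to run work: immediately with post(), after a delay with postDelayed(), and at a given time with postAtTime(). Let's see from the source how these are implemented.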

public final class MessageQueue {
    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        final long ptr = mPtr;
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }

            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message.  Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                if (msg != null && msg.target == null) {
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }
                if (msg != null) {
                    if (now < msg.when) {
                        // Next message is not ready.  Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }

                // If first time idle, then get the number of idlers to run.
                // Idle handles only run if the queue is empty or if the first message
                // in the queue (possibly a barrier) is due to be handled in the future.
                if (pendingIdleHandlerCount < 0
                        && (mMessages == null || now < mMessages.when)) {
                    pendingIdleHandlerCount = mIdleHandlers.size();
                }
                if (pendingIdleHandlerCount <= 0) {
                    // No idle handlers to run.  Loop and wait some more.
                    mBlocked = true;
                    continue;
                }

                if (mPendingIdleHandlers == null) {
                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
                }
                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
            }

            // Run the idle handlers.
            // We only ever reach this code block during the first iteration.
            for (int i = 0; i < pendingIdleHandlerCount; i++) {
                final IdleHandler idler = mPendingIdleHandlers[i];
                mPendingIdleHandlers[i] = null; // release the reference to the handler

                boolean keep = false;
                try {
                    keep = idler.queueIdle();
                } catch (Throwable t) {
                    Log.wtf(TAG, "IdleHandler threw exception", t);
                }

                if (!keep) {
                    synchronized (this) {
                        mIdleHandlers.remove(idler);
                    }
                }
            }

            // Reset the idle handler count to 0 so we do not run them again.
            pendingIdleHandlerCount = 0;

            // While calling an idle handler, a new message could have been delivered
            // so go back and look again for a pending message without waiting.
            nextPollTimeoutMillis = 0;
        }
    }
}

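MessageQueue.next() is a blocking method that only returns when the queue is quitting or a message is ready. The blocking is done by the native function nativePollOnce(). If the queue holds no message, nativePollOnce does not return and the thread stays in a waiting state in the native layer. It wakes up only when quit() is called, or when enqueueMessage(Message msg, long when) adds a new message and calls the native nativeWake() function.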
android_os_MessageQueue.cpp

nativePollOnce(long ptr, int timeoutMillis)

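nativePollOnce is a native function with two parameters. The first identifies the current message queue (mPtr, a pointer to the native queue object). The second is how long to wait: -1 means there is no message, so wait until woken up; 0 means return immediately and check the queue again; a value greater than 0 means wait for at most that many milliseconds and then return.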

nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);

This line sets the wait time to the gap between now and the message's due time, which is how postDelayed and postAtTime are implemented.
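The three timeout cases can be written out as one small function. This is a plain-Java sketch of the decision made inside next(), not the Android code itself: PollTimeoutDemo, nextPollTimeoutMillis, and whenFor are hypothetical names, and a return value of 0 here stands for "the head message is due, dispatch it now", where the real method returns the message.

```java
class PollTimeoutDemo {
    /** postDelayed stores an absolute due time: current uptime plus the delay. */
    static long whenFor(long nowUptimeMillis, long delayMillis) {
        return nowUptimeMillis + delayMillis;
    }

    /**
     * Models the timeout choice in MessageQueue.next():
     * -1 = queue is empty, wait until woken up;
     *  0 = head message is due, no waiting needed;
     * >0 = head message is in the future, wait that many milliseconds.
     */
    static int nextPollTimeoutMillis(boolean hasMessage, long now, long when) {
        if (!hasMessage) {
            return -1;
        }
        if (now < when) {
            return (int) Math.min(when - now, Integer.MAX_VALUE);
        }
        return 0;
    }

    public static void main(String[] args) {
        long now = 1_000;
        long when = whenFor(now, 3_000); // like postDelayed(task, 3000)
        System.out.println(nextPollTimeoutMillis(true, now, when));
    }
}
```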

enqueueMessage()

A question may come up here: since this is a queue and queues are first in, first out, what does the following code print?

handler?.postDelayed({ println("Task 1") }, 5000)
handler?.post { println("Task 2") }
handler?.postDelayed({ println("Task 3") }, 3000)
// Output
Task 2
Task 3
Task 1

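The reason is that enqueueMessage(Message msg, long when) already inserts each message in order of its due time when it is added, so the queue is sorted by when rather than by insertion order.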

    boolean enqueueMessage(Message msg, long when) {
        if (msg.target == null) {
            throw new IllegalArgumentException("Message must have a target.");
        }
        if (msg.isInUse()) {
            throw new IllegalStateException(msg + " This message is already in use.");
        }

        synchronized (this) {
            if (mQuitting) {
                IllegalStateException e = new IllegalStateException(
                        msg.target + " sending message to a Handler on a dead thread");
                Log.w(TAG, e.getMessage(), e);
                msg.recycle();
                return false;
            }

            msg.markInUse();
            msg.when = when;
            Message p = mMessages;
            boolean needWake;
            if (p == null || when == 0 || when < p.when) {
                // New head, wake up the event queue if blocked.
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // Inserted within the middle of the queue.  Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                Message prev;
                for (;;) {
                    prev = p;
                    p = p.next;
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }
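The ordering rule in enqueueMessage() can be reproduced with a small linked list. This is a minimal plain-Java sketch: SortedQueueDemo, Msg, enqueue, and drainOrder are hypothetical names, and barriers, asynchronous messages, and wake-ups are left out. Only the insertion comparison is taken from the source above.

```java
import java.util.ArrayList;
import java.util.List;

class SortedQueueDemo {
    /** Minimal stand-in for Message: a label, a due time, and a link to the next node. */
    static final class Msg {
        final String name;
        final long when;
        Msg next;

        Msg(String name, long when) {
            this.name = name;
            this.when = when;
        }
    }

    private Msg head;

    /** Same rule as enqueueMessage(): insert before the first message that is due later. */
    void enqueue(String name, long when) {
        Msg msg = new Msg(name, when);
        Msg p = head;
        if (p == null || when == 0 || when < p.when) {
            msg.next = p; // new head
            head = msg;
            return;
        }
        Msg prev;
        for (;;) {
            prev = p;
            p = p.next;
            if (p == null || when < p.when) {
                break;
            }
        }
        msg.next = p; // invariant: p == prev.next
        prev.next = msg;
    }

    /** Returns the labels in the order the queue would hand them out. */
    List<String> drainOrder() {
        List<String> order = new ArrayList<>();
        for (Msg m = head; m != null; m = m.next) {
            order.add(m.name);
        }
        return order;
    }

    public static void main(String[] args) {
        long now = 1_000; // pretend uptime in milliseconds
        SortedQueueDemo queue = new SortedQueueDemo();
        queue.enqueue("Task 1", now + 5_000); // postDelayed(..., 5000)
        queue.enqueue("Task 2", now);         // post(...)
        queue.enqueue("Task 3", now + 3_000); // postDelayed(..., 3000)
        System.out.println(queue.drainOrder()); // prints [Task 2, Task 3, Task 1]
    }
}
```

Because the comparison is a strict when < p.when, messages with the same due time keep their insertion order.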

Intercepting Main Thread Crashes

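Intercepting main-thread crashes has its drawbacks. Because the crash is swallowed, the user simply sees a tap that does nothing. If the crash happens in Activity.onCreate, the result is a black screen. So when an Activity fails to start, we must kill the process and let the app restart to avoid that problem.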

public class MyApplication extends Application {
    
    @Override
    protected void attachBaseContext(Context base) {
        super.attachBaseContext(base);

        new Handler(getMainLooper()).post(() -> {
            while (true) {
                try {
                    Looper.loop();
                } catch (Throwable e) {
                    e.printStackTrace();
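                    // TODO Report the error manually to a crash reporting platform such as Bugly, so the cause can be tracked down promptly.
                    if (e.getMessage() != null && e.getMessage().startsWith("Unable to start activity")) {
                        // If starting an Activity crashed, kill the process so the app restarts.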
                        Process.killProcess(Process.myPid());
                        break;
                    }
                }
            }
        });
    }
}

Summary

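After the analysis above, I think understanding Handler, Looper, and MessageQueue is well worth the effort. It helps us handle crashes and ANRs better and use Handler more effectively.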