Android异常崩溃处理流程-源码分析

1、前言

在文章开篇,我们抛出两个问题:

  • 当我们的应用发生crash或是anr的时候,系统框架做了什么?
  • 我们是否可以接收系统监控到的应用崩溃,并进行记录和上传呢?

而要解释这个问题,就不得不从进程的启动开始讲起,因为监控也是在这个时候开始的..

2、源码分析

2.1、启动app进程

有深入了解过android底层的同学应该会都知道,Android进程都是由zygote孵化而来。而启动zygote和其他Java程序的应用程序, 代码位于frameworks/base/cmds/app_process/app_main.cpp

关键代码:

int main(int argc, char* const argv[])
{
    ....
    if (strcmp(arg, "--zygote") == 0) {
        zygote = true;
        niceName = ZYGOTE_NICE_NAME;
    } else if (strcmp(arg, "--start-system-server") == 0) {
        startSystemServer = true;
    } else if (strcmp(arg, "--application") == 0) {
        application = true;
    } else if (strncmp(arg, "--nice-name=", 12) == 0) {
        niceName.setTo(arg + 12);
    }
    
    ....
    if (zygote) {
        runtime.start("com.android.internal.os.ZygoteInit", args, zygote);
    } else if (className) {
        runtime.start("com.android.internal.os.RuntimeInit", args, zygote);
    } else {
        fprintf(stderr, "Error: no class name or --zygote supplied.\n");
        app_usage();
        LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied.");
    }
}

可以看到,app_process 里面定义了三种应用程序类型:

  1. Zygote: com.android.internal.os.ZygoteInit
  2. System Server, 不单独启动,而是由Zygote启动
  3. 其他指定类名的Java 程序,比如说常用的 am. /system/bin/am ,其实是一个shell程序,它的真正实现是:
    exec app_process @"

根据传入参数的不同可以有两种启动方式,一个是 "com.android.internal.os.RuntimeInit", 另一个是 ”com.android.internal.os.ZygoteInit", 对应RuntimeInit 和 ZygoteInit 两个类, 图中用绿色和粉红色分别表示。这两个类的主要区别在于Java端,可以明显看出,ZygoteInit 相比 RuntimeInit 多做了很多事情,比如说 “preload", "gc" 等等。但是在Native端,他们都做了相同的事, startVM() 和 startReg().而我们的目的是在这两类的java端实现中.

2.2、 ZygoteInit

当VM准备就绪,就可以运行Java代码了,系统也将在此第一次进入Java世界,还记得app_main.cpp里面调到的 Runtime.start()的参数吗, 那就是我们要运行的Java类。Android支持两个类做为起点,一个是‘com.android.internal.os.ZygoteInit', 另一个是'com.android.internal.os.RuntimeInit'。

此外Runtime_Init 类里还定义了一个ZygoteInit() 静态方法。它在Zygote 创建一个新的应用进程的时候被创建,它和RuntimeInit 类的main() 函数做了以下相同的事情:

  • redirectLogStreams(): 将System.out 和 System.err 输出重定向到Android 的Log系统(定义在 android.util.Log).

  • commonInit(): 初始化了一下系统属性,其中最重要的一点就是设置了一个未捕捉异常的handler,当代码有任何未知异常,就会执行它

RuntimeInit:

 protected static final void commonInit() {
        if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");

        /*
         * set handlers; these apply to all threads in the VM. Apps can replace
         * the default handler, but not the pre handler.
         */
        LoggingHandler loggingHandler = new LoggingHandler();
        Thread.setUncaughtExceptionPreHandler(loggingHandler);
        Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));

        /*
       ...
    }

调试过Android代码的同学经常看到的"*** FATAL EXCEPTION IN SYSTEM PROCESS" 打印就出自LoggingHandler这里:

 private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
        public volatile boolean mTriggered = false;

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            ...
            // Don't re-enter if KillApplicationHandler has already run
            if (mCrashing) return;

            // mApplicationObject is null for non-zygote java programs (e.g. "am")
            // There are also apps running with the system UID. We don't want the
            // first clause in either of these two cases, only for system_server.
            if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
                Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
            } else {
                StringBuilder message = new StringBuilder();
                // The "FATAL EXCEPTION" string is still used on Android even though
                // apps can set a custom UncaughtExceptionHandler that renders uncaught
                // exceptions non-fatal.
                message.append("FATAL EXCEPTION: ").append(t.getName()).append("\n");
                final String processName = ActivityThread.currentProcessName();
                if (processName != null) {
                    message.append("Process: ").append(processName).append(", ");
                }
                message.append("PID: ").append(Process.myPid());
                Clog_e(TAG, message.toString(), e);
            }
        }
    }

KillApplicationHandler:

 private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
        private final LoggingHandler mLoggingHandler;

        public KillApplicationHandler(LoggingHandler loggingHandler) {
            this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            try {
                ...

                // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
                if (mCrashing) return;
                mCrashing = true;

                ...
                //关键代码
                // Bring up crash dialog, wait for it to be dismissed
                ActivityManager.getService().handleApplicationCrash(
                        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
            } catch (Throwable t2) {
               ...
            } finally {
                // Try everything to make sure this process goes away.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        }

ActivityManager.getService().handleApplicationCrash-->ActivityManagerService.handleApplicationCrash,此处转到ams进行处理。

从这里的调用函数我们也可以了看出,handleApplicationCrash,是处理应用crash的,但其实还有native_crash、ANR、wtf,下面我们就来说说各种类型的崩溃处理流程.

2.3、Application crash

ActivityManagerService:

  public void handleApplicationCrash(IBinder app,
            ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
        ProcessRecord r = findAppProcess(app, "Crash");
        final String processName = app == null ? "system_server"
                : (r == null ? "unknown" : r.processName);

        handleApplicationCrashInner("crash", r, processName, crashInfo);
    }

从这里可以看出,若传入app为null时,processName就设置为system_server

 void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
            ApplicationErrorReport.CrashInfo crashInfo) {
        ...
        //关键
        addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);
        if (r == null || r.pid == MY_PID) {
            if ( "crash".equals(eventType) ) {
                dumpErrorInfo(processName, MY_PID, 2, 2);
                SystemClock.sleep(3000);
            }
        } else {
          
        }

        mAppErrors.crashApplication(r, crashInfo);
    }

上面会调用addErrorToDropBox将应用crash,进行封装输出,这里我们先不说addErrorToDropBox的具体逻辑,先说说其他类型的异常崩溃,你会发现,最终都是调用addErrorToDropBox方法。

2.4、native_crash

native_crash,顾名思义,就是native层发生的crash。其实他是通过一个NativeCrashListener线程去监控的。

final class NativeCrashListener extends Thread {
    ...

    /*
     * Daemon thread that accept()s incoming domain socket connections from debuggerd
     * and processes the crash dump that is passed through.
     */
    NativeCrashListener(ActivityManagerService am) {
        mAm = am;
    }

    @Override
    public void run() {
        final byte[] ackSignal = new byte[1];

       ...

        // The file system entity for this socket is created with 0777 perms, owned
        // by system:system. selinux restricts things so that only crash_dump can
        // access it.
        {
            File socketFile = new File(DEBUGGERD_SOCKET_PATH);
            if (socketFile.exists()) {
                socketFile.delete();
            }
        }

        try {
            FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
            final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                    DEBUGGERD_SOCKET_PATH);
            Os.bind(serverFd, sockAddr);
            Os.listen(serverFd, 1);
            Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);
            
            //1.一直循环地读peerFd文件,若发生存在,则进入consumeNativeCrashData
            while (true) {
                FileDescriptor peerFd = null;
                try {
                    if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                    peerFd = Os.accept(serverFd, null /* peerAddress */);
                    if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                    if (peerFd != null) {
                        // the reporting thread may take responsibility for
                        // acking the debugger; make sure we play along.
                        //2.进入native crash数据处理流程
                        consumeNativeCrashData(peerFd);
                    }
                } catch (Exception e) {
                    Slog.w(TAG, "Error handling connection", e);
                } finally {
                    ...
                }
            }
        } catch (Exception e) {
            Slog.e(TAG, "Unable to init native debug socket!", e);
        }
    }

  

    // Read a crash report from the connection
    void consumeNativeCrashData(FileDescriptor fd) {
        try {
                ...
                //3.启动NativeCrashReporter作为上报错误的新线程
                final String reportString = new String(os.toByteArray(), "UTF-8");
                (new NativeCrashReporter(pr, signal, reportString)).start();
             
        } catch (Exception e) {
            ...
        }
    }

}

上报native_crash的线程-->NativeCrashReporter:

 class NativeCrashReporter extends Thread {
        ProcessRecord mApp;
        int mSignal;
        String mCrashReport;

        NativeCrashReporter(ProcessRecord app, int signal, String report) {
            super("NativeCrashReport");
            mApp = app;
            mSignal = signal;
            mCrashReport = report;
        }

        @Override
        public void run() {
            try {
                //1.包装崩溃信息
                CrashInfo ci = new CrashInfo();
                ci.exceptionClassName = "Native crash";
                ci.exceptionMessage = Os.strsignal(mSignal);
                ci.throwFileName = "unknown";
                ci.throwClassName = "unknown";
                ci.throwMethodName = "unknown";
                ci.stackTrace = mCrashReport;

                if (DEBUG) Slog.v(TAG, "Calling handleApplicationCrash()");
                //2.转到ams中处理,跟普通crash一致,只是类型不一样
                mAm.handleApplicationCrashInner("native_crash", mApp, mApp.processName, ci);
                if (DEBUG) Slog.v(TAG, "<-- handleApplicationCrash() returned");
            } catch (Exception e) {
                Slog.e(TAG, "Unable to report native crash", e);
            }
        }
    }

native crash跟到这里就结束了,后面的流程就是跟application crash一样,都会走到addErrorToDropBox中,这个我们后面再分析。

2.5、ANR

相信做android的同学都知道anr这个让人郁闷的东西,而发生的场景,也有好几种,比如Activity、Service、Broadcast。因为篇幅有限,我们这里就不讨论每种anr发生后的原因和具体的流程了,直接跳到已经触发ANR的位置。

AppErrors.appNotResponding:

final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ArrayList firstPids = new ArrayList(5);
        SparseArray lastPids = new SparseArray(20);

        if (mService.mController != null) {
            try {
                //1.判断是否继续后面的流程,还是直接kill掉当前进程 
                // 0 == continue, -1 = kill process immediately
                int res = mService.mController.appEarlyNotResponding(
                        app.processName, app.pid, annotation);
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }

        //2.记录发生anr的时间
        long anrTime = SystemClock.uptimeMillis();
        //3.更新cpu使用情况
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
        }

        //可以在设置中设置发生anr后,是弹框显示还是后台处理,默认是后台
        // Unless configured otherwise, swallow ANRs in background processes & kill the process.
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

        boolean isSilentANR;

        synchronized (mService) {
            ...

            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            app.notResponding = true;

            //3.将anr写入event log中
            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                    app.processName, app.info.flags, annotation);

            // Dump thread traces as quickly as we can, starting with "interesting" processes.
            firstPids.add(app.pid);

            // Don't dump other PIDs if it's a background ANR
            isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
            if (!isSilentANR) {
                int parentPid = app.pid;
                if (parent != null && parent.app != null && parent.app.pid > 0) {
                    parentPid = parent.app.pid;
                }
                if (parentPid != app.pid) firstPids.add(parentPid);

                if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

                for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                    ProcessRecord r = mService.mLruProcesses.get(i);
                    if (r != null && r.thread != null) {
                        int pid = r.pid;
                        if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                            if (r.persistent) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                            } else {
                                lastPids.put(pid, Boolean.TRUE);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                            }
                        }
                    }
                }
            }
            
        }

        // 4.将主要的anr信息写到main.log中
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }
        
        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
        ArrayList nativePids = null;

        // don't dump native PIDs for background ANRs unless it is the process of interest
        String[] nativeProc = null;
        if (isSilentANR) {
            for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
                if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
                    nativeProc = new String[] { app.processName };
                    break;
                }
            }
            int[] pid = nativeProc == null ? null : Process.getPidsForCommands(nativeProc);
            if(pid != null){
                nativePids = new ArrayList(pid.length);
                for (int i : pid) {
                    nativePids.add(i);
                }
            }
        } else {
            nativePids = Watchdog.getInstance().getInterestingNativePids();
        }

        //5.dump出stacktraces文件
        // For background ANRs, don't pass the ProcessCpuTracker to
        // avoid spending 1/2 second collecting stats to rank lastPids.
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            //6.再次更新cpu使用情况
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                //7.打印anr时cpu使用状态
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        info.append(processCpuTracker.printCurrentState(anrTime));

        //8.当traces文件不存在时,只能打印线程日志了
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }

        ...
        //9.关键,回到了我们熟悉的addErrorToDropBox,进行错误信息包装跟上传了
        mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
                cpuInfo, tracesFile, null);

        if (mService.mController != null) {
            try {
                //10.根据appNotResponding返回结果,看是否继续等待,还是结束当前进程
                // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
                int res = mService.mController.appNotResponding(
                        app.processName, app.pid, info.toString());
                if (res != 0) {
                    if (res < 0 && app.pid != MY_PID) {
                        app.kill("anr", true);
                    } else {
                        synchronized (mService) {
                            mService.mServices.scheduleServiceTimeoutLocked(app);
                        }
                    }
                    return;
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }

       ...
    }

我们来看一下traces文件是怎么dump出来的:

    public static File dumpStackTraces(boolean clearTraces, ArrayList firstPids,
            ProcessCpuTracker processCpuTracker, SparseArray lastPids,
            ArrayList nativePids) {
        ArrayList extraPids = null;

        //1.测量CPU的使用情况,以便在请求时对顶级用户进行实际的采样。
        if (processCpuTracker != null) {
            processCpuTracker.init();
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
            }

            processCpuTracker.update();

            // 2.爬取顶级应用到的cpu使用情况
            final int N = processCpuTracker.countWorkingStats();
            extraPids = new ArrayList<>();
            for (int i = 0; i < N && extraPids.size() < 5; i++) {
                ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
                if (lastPids.indexOfKey(stats.pid) >= 0) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + stats.pid);

                    extraPids.add(stats.pid);
                } else if (DEBUG_ANR) {
                    Slog.d(TAG, "Skipping next CPU consuming process, not a java proc: "
                            + stats.pid);
                }
            }
        }
        
        //3.读取trace文件的保存目录
        File tracesFile;
        final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");
        if (tracesDirProp.isEmpty()) {
            ...
            String globalTracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
            ...
        } else {
           ...
        }

        //4.传入指定目录,进入实际dump逻辑
        dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids,
                useTombstonedForJavaTraces);
        return tracesFile;
    }

dumpStackTraces:

  private static void dumpStackTraces(String tracesFile, ArrayList firstPids,
            ArrayList nativePids, ArrayList extraPids,
            boolean useTombstonedForJavaTraces) {

        ...
        final DumpStackFileObserver observer;
        if (useTombstonedForJavaTraces) {
            observer = null;
        } else {
            // Use a FileObserver to detect when traces finish writing.
            // The order of traces is considered important to maintain for legibility.
            observer = new DumpStackFileObserver(tracesFile);
        }

        //我们必须在20秒内完成所有堆栈转储。
        long remainingTime = 20 * 1000;
        try {
            if (observer != null) {
                observer.startWatching();
            }

            // 首先收集所有最重要的pid堆栈。
            if (firstPids != null) {
                int num = firstPids.size();
                for (int i = 0; i < num; i++) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for pid "
                            + firstPids.get(i));
                    final long timeTaken;
                    if (useTombstonedForJavaTraces) {
                        timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile, remainingTime);
                    } else {
                        timeTaken = observer.dumpWithTimeout(firstPids.get(i), remainingTime);
                    }

                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                            "); deadline exceeded.");
                        return;
                    }

                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                    }
                }
            }

            //接下来收集native pid的堆栈
            if (nativePids != null) {
                for (int pid : nativePids) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for native pid " + pid);
                    final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);

                    final long start = SystemClock.elapsedRealtime();
                    Debug.dumpNativeBacktraceToFileTimeout(
                            pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                    final long timeTaken = SystemClock.elapsedRealtime() - start;

                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
                            "); deadline exceeded.");
                        return;
                    }

                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
                    }
                }
            }

            // 最后,从CPU跟踪器转储所有额外PID的堆栈。
            if (extraPids != null) {
                for (int pid : extraPids) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + pid);

                    final long timeTaken;
                    if (useTombstonedForJavaTraces) {
                        timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
                    } else {
                        timeTaken = observer.dumpWithTimeout(pid, remainingTime);
                    }

                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
                                "); deadline exceeded.");
                        return;
                    }

                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
                    }
                }
            }
        } finally {
            if (observer != null) {
                observer.stopWatching();
            }
        }
    }

看完之后,应该可以很清楚地的明白。ANR的流程就是打印一些 ANR reason、cpu stats、线程日志,然后分别写入main.log、event.log,然后调用到addErrorToDropBox中,最后kill该进程。

2.6、wtf

同样在RuntimeInit中,存在一个wtf的方法,该方法主要由Log.wtf调用,主要用报告一个永远不可能发生的情况。毕竟wtf=what a terrible failure,都开始爆粗了,这个问题可不得了。

 public static void wtf(String tag, Throwable t, boolean system) {
        try {
            if (ActivityManager.getService().handleApplicationWtf(
                    mApplicationObject, tag, system,
                    new ApplicationErrorReport.ParcelableCrashInfo(t))) {
                // The Activity Manager has already written us off -- now exit.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                Slog.e(TAG, "Error reporting WTF", t2);
                Slog.e(TAG, "Original WTF:", t);
            }
        }
    }

此处又转到AMS中处理:

 public boolean handleApplicationWtf(final IBinder app, final String tag, boolean system,
            final ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
        ...

        if (system) {
            // If this is coming from the system, we could very well have low-level
            // system locks held, so we want to do this all asynchronously.  And we
            // never want this to become fatal, so there is that too.
            mHandler.post(new Runnable() {
                @Override public void run() {
                    handleApplicationWtfInner(callingUid, callingPid, app, tag, crashInfo);
                }
            });
            return false;
        }
        //无论system是true或是false,都会调用handleApplicationWtfInner
        final ProcessRecord r = handleApplicationWtfInner(callingUid, callingPid, app, tag,
                crashInfo);

        final boolean isFatal = Build.IS_ENG || Settings.Global
                .getInt(mContext.getContentResolver(), Settings.Global.WTF_IS_FATAL, 0) != 0;
        final boolean isSystem = (r == null) || r.persistent;

        if (isFatal && !isSystem) {
            //将再进入普通应用crash的流程
            mAppErrors.crashApplication(r, crashInfo);
            return true;
        } else {
            return false;
        }
    }

handleApplicationWtfInner:

   ProcessRecord handleApplicationWtfInner(int callingUid, int callingPid, IBinder app, String tag,
            final ApplicationErrorReport.CrashInfo crashInfo) {
        final ProcessRecord r = findAppProcess(app, "WTF");
        final String processName = app == null ? "system_server"
                : (r == null ? "unknown" : r.processName);

        EventLog.writeEvent(EventLogTags.AM_WTF, UserHandle.getUserId(callingUid), callingPid,
                processName, r == null ? -1 : r.info.flags, tag, crashInfo.exceptionMessage);

        StatsLog.write(StatsLog.WTF_OCCURRED, callingUid, tag, processName,
                callingPid);

        //关键,最终走到了我们的addErrorToDropBox
        addErrorToDropBox("wtf", r, processName, null, null, tag, null, null, crashInfo);

        return r;
    }

2.7、殊途同归的addErrorToDropBox

为什么说addErrorToDropBox是殊途同归呢,因为无论是crash、native_crash、ANR或是wtf,最终都是来到这里,交由它去处理。那下面我们就来揭开它的神秘面纱吧。

    public void addErrorToDropBox(String eventType,
            ProcessRecord process, String processName, ActivityRecord activity,
            ActivityRecord parent, String subject,
            final String report, final File dataFile,
            final ApplicationErrorReport.CrashInfo crashInfo) {
        // NOTE -- this must never acquire the ActivityManagerService lock,
        // otherwise the watchdog may be prevented from resetting the system.

        // Bail early if not published yet
        if (ServiceManager.getService(Context.DROPBOX_SERVICE) == null) return;
        final DropBoxManager dbox = mContext.getSystemService(DropBoxManager.class);

        //只有这几种类型的错误,才会进行上传
        final boolean shouldReport = ("anr".equals(eventType)
                || "crash".equals(eventType)
                || "native_crash".equals(eventType)
                || "watchdog".equals(eventType));

        // Exit early if the dropbox isn't configured to accept this report type.
        final String dropboxTag = processClass(process) + "_" + eventType;
        //1.如果DropBoxManager没有初始化,或不是要上传的类型,则直接返回
        if (dbox == null || !dbox.isTagEnabled(dropboxTag)&& !shouldReport) 
            return;

        ...

        final StringBuilder sb = new StringBuilder(1024);
        //2.添加一些头部log信息
        appendDropBoxProcessHeaders(process, processName, sb);
        //3.添加崩溃进程和界面的信息
        try {
            if (process != null) {
                //添加是否前台前程log
                sb.append("Foreground: ")
                        .append(process.isInterestingToUserLocked() ? "Yes" : "No")
                        .append("\n");
            }
            //触发该崩溃的界面,可以为null
            if (activity != null) {
                sb.append("Activity: ").append(activity.shortComponentName).append("\n");
            }
            if (parent != null && parent.app != null && parent.app.pid != process.pid) {
                sb.append("Parent-Process: ").append(parent.app.processName).append("\n");
            }
            if (parent != null && parent != activity) {
                sb.append("Parent-Activity: ").append(parent.shortComponentName).append("\n");
            }
            //定入简要信息
            if (subject != null) {
                sb.append("Subject: ").append(subject).append("\n");
            }
            sb.append("Build: ").append(Build.FINGERPRINT).append("\n");
            //是否连接了调试
            if (Debug.isDebuggerConnected()) {
                sb.append("Debugger: Connected\n");
            }
        } catch (NullPointerException e) {
           e.printStackTrace();
        } finally {
            sb.append("\n");
        }


        final String fProcessName = processName;
        final String fEventType = eventType;
        final String packageName = getErrorReportPackageName(process, crashInfo, eventType);
        Slog.i(TAG,"addErrorToDropbox, real report package is "+packageName);

        // Do the rest in a worker thread to avoid blocking the caller on I/O
        // (After this point, we shouldn't access AMS internal data structures.)
        Thread worker = new Thread("Error dump: " + dropboxTag) {
            @Override
            public void run() {
                //4.添加进程的状态到dropbox中
                BufferedReader bufferedReader = null;
                String line;
                try {
                    bufferedReader = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
                    for (int i = 0; i < 5; i++) {
                        if ((line = bufferedReader.readLine()) != null && line.contains("State")) {
                            sb.append(line + "\n");
                            break;
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (bufferedReader != null) {
                        try {
                            bufferedReader.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
               
                if (report != null) {
                    sb.append(report);
                }

                String setting = Settings.Global.ERROR_LOGCAT_PREFIX + dropboxTag;
                int lines = Settings.Global.getInt(mContext.getContentResolver(), setting, 0);
                int maxDataFileSize = DROPBOX_MAX_SIZE - sb.length()
                        - lines * RESERVED_BYTES_PER_LOGCAT_LINE;

                //5.将dataFile文件定入dropbox中,一般只有anr时,会将traces文件通过该参数传递进来者,其他类型都不传.
                if (dataFile != null && maxDataFileSize > 0) {
                    try {
                        sb.append(FileUtils.readTextFile(dataFile, maxDataFileSize,
                                    "\n\n[[TRUNCATED]]"));
                    } catch (IOException e) {
                        Slog.e(TAG, "Error reading " + dataFile, e);
                    }
                }
                
                //6.如果是crash类型,会传入crashInfo,此时将其写入dropbox中
                if (crashInfo != null && crashInfo.stackTrace != null) {
                    sb.append(crashInfo.stackTrace);
                }

                if (lines > 0) {
                    sb.append("\n");

                    // 7.合并几个logcat流,取最新部分log
                    InputStreamReader input = null;
                    try {
                        java.lang.Process logcat = new ProcessBuilder(
                                "/system/bin/timeout", "-k", "15s", "10s",
                                "/system/bin/logcat", "-v", "threadtime", "-b", "events", "-b", "system",
                                "-b", "main", "-b", "crash", "-t", String.valueOf(lines))
                                        .redirectErrorStream(true).start();

                        try { logcat.getOutputStream().close(); } catch (IOException e) {}
                        try { logcat.getErrorStream().close(); } catch (IOException e) {}
                        input = new InputStreamReader(logcat.getInputStream());

                        int num;
                        char[] buf = new char[8192];
                        while ((num = input.read(buf)) > 0) sb.append(buf, 0, num);
                    } catch (IOException e) {
                        Slog.e(TAG, "Error running logcat", e);
                    } finally {
                        if (input != null) try { input.close(); } catch (IOException e) {}
                    }
                }

                ...

                
                if (shouldReport) {
                    synchronized (mErrorListenerLock) {
                        try {
                            if (mIApplicationErrorListener == null) {
                                return;
                            }
                            //8.关键,在这里可以添加一个application error的接口,用来实现应用层接收崩溃信息
                            mIApplicationErrorListener.onError(fEventType,
                                    packageName, fProcessName, subject, dropboxTag
                                            + "-" + uuid, crashInfo);
                        } catch (DeadObjectException e) {
                            Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
                            mIApplicationErrorListener = null;
                        } catch (Exception e) {
                            Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
                        }
                    }
                }
            }
        };

        ...
    }

调用appendDropBoxProcessHeaders添加头部log信息:

 private void appendDropBoxProcessHeaders(ProcessRecord process, String processName,
            StringBuilder sb) {
        // Watchdog thread ends up invoking this function (with
        // a null ProcessRecord) to add the stack file to dropbox.
        // Do not acquire a lock on this (am) in such cases, as it
        // could cause a potential deadlock, if and when watchdog
        // is invoked due to unavailability of lock on am and it
        // would prevent watchdog from killing system_server.
        if (process == null) {
            sb.append("Process: ").append(processName).append("\n");
            return;
        }
        // Note: ProcessRecord 'process' is guarded by the service
        // instance.  (notably process.pkgList, which could otherwise change
        // concurrently during execution of this method)
        synchronized (this) {
            sb.append("Process: ").append(processName).append("\n");
            sb.append("PID: ").append(process.pid).append("\n");
            int flags = process.info.flags;
            IPackageManager pm = AppGlobals.getPackageManager();
            //添加该进程的flag
            sb.append("Flags: 0x").append(Integer.toHexString(flags)).append("\n");
            for (int ip=0; ip

2.8、接收来自框架层的异常监控信息

这里,就可以回答我们开篇讲的第二个问题,如何去接收框架层监控到的应用或系统崩溃事件?

可以在框架提供了一个IApplicationErrorListener的接口,可通过设置该接口,接收系统框架捕获到的应用到崩溃信息。
IApplicationErrorListener:

public interface IApplicationErrorListener extends android.os.IInterface {
    /** Local-side IPC implementation stub class. */
    public static abstract class Stub extends android.os.Binder implements android.app.IApplicationErrorListener {
        private static final java.lang.String DESCRIPTOR = "android.app.IApplicationErrorListener";

        /** Construct the stub at attach it to the interface. */
        public Stub() {
            this.attachInterface(this, DESCRIPTOR);
        }

        /**
         * Cast an IBinder object into an android.app.IApplicationErrorListener
         * interface, generating a proxy if needed.
         */
        public static android.app.IApplicationErrorListener asInterface(android.os.IBinder obj) {
            if ((obj == null)) {
                return null;
            }
            android.os.IInterface iin = obj.queryLocalInterface(DESCRIPTOR);
            if (((iin != null) && (iin instanceof android.app.IApplicationErrorListener))) {
                return ((android.app.IApplicationErrorListener) iin);
            }
            return new android.app.IApplicationErrorListener.Stub.Proxy(obj);
        }

        @Override
        public android.os.IBinder asBinder() {
            return this;
        }

        @Override
        public boolean onTransact(int code, android.os.Parcel data, android.os.Parcel reply, int flags)
                throws android.os.RemoteException {
            switch (code) {
                case INTERFACE_TRANSACTION: {
                    reply.writeString(DESCRIPTOR);
                    return true;
                }
                case TRANSACTION_onError: {
                    data.enforceInterface(DESCRIPTOR);
                    String errorType = data.readString();
                    String packageName = data.readString();
                    String processName = data.readString();
                    String subject = data.readString();
                    String dump = data.readString();
                    ApplicationErrorReport.CrashInfo crashInfo = new ApplicationErrorReport.CrashInfo(data);
                    this.onError(errorType, packageName, processName, subject, dump, crashInfo);
                    reply.writeNoException();
                    return true;
                }
            }
            return super.onTransact(code, data, reply, flags);
        }

        private static class Proxy implements android.app.IApplicationErrorListener {
            private android.os.IBinder mRemote;

            Proxy(android.os.IBinder remote) {
                mRemote = remote;
            }

            @Override
            public android.os.IBinder asBinder() {
                return mRemote;
            }

            public java.lang.String getInterfaceDescriptor() {
                return DESCRIPTOR;
            }

            @Override
            public void onError(String errorType, String packageName, String processName, String subject,
                    String dump, CrashInfo crashInfo) throws android.os.RemoteException {
                android.os.Parcel _data = android.os.Parcel.obtain();
                android.os.Parcel _reply = android.os.Parcel.obtain();
                try {
                    _data.writeInterfaceToken(DESCRIPTOR);
                    _data.writeString(errorType);
                    _data.writeString(packageName);
                    _data.writeString(processName);
                    _data.writeString(subject);
                    _data.writeString(dump);
                    if (null == crashInfo) {
                        crashInfo = new CrashInfo();
                    }
                    crashInfo.writeToParcel(_data, 0);
                    mRemote.transact(Stub.TRANSACTION_onError, _data, _reply, 0);
                    _reply.readException();
                } finally {
                    _reply.recycle();
                    _data.recycle();
                }
            }
        }

        static final int TRANSACTION_onError = (android.os.IBinder.FIRST_CALL_TRANSACTION + 0);
    }

    public void onError(String errorType, String packageName, String processName, String subject,
            String dump, ApplicationErrorReport.CrashInfo crashInfo) throws android.os.RemoteException;
}

该接口的关键方法就是:

  public void onError(String errorType, String packageName, String processName, String subject,
            String dump, ApplicationErrorReport.CrashInfo crashInfo) throws android.os.RemoteException;

最终的错误信息,都会通过该方法进行传递。那只要我们的应用拥有系统权限,就可以往系统添加该回调了。

你可能感兴趣的:(Android异常崩溃处理流程-源码分析)