ANR系列之一:ANR显示和日志生成原理讲解

前言:   

ANR系列介绍

ANR系列一共打算分为6篇文章,六篇文章清单如下:

1.ANR系列之一:ANR基本概念,以及弹框和日志记录讲解

2.ANR系列之二:Input类型原理讲解

3.ANR系列之三:BroadCast类型原理讲解

4.ANR系列之四:ContentProvider类型原理讲解

5.ANR系列之五:Service类型原理讲解

6.ANR系列之六:ANR问题如何排查和解决

本篇主要内容

本篇是该系列的第一篇,主要讲解以下几点内容:

1.ANR的一些基本概念;

2.ANR弹框是如何显示出来的;

3.ANR的logcat日志以及ANR文件是如何产生的;

4.发生ANR时,会有哪些日志,以及如何解读;

5.一些扩展性的知识点。

一.ANR基本概念

ANR,指的是安卓中一种长时间应用无响应的一种场景。发生ANR时,会弹出下面这样的提示框,用户点击按钮可以强行杀死应用。

ANR系列之一:ANR显示和日志生成原理讲解_第1张图片

简单来说,ANR就是应用的主线程长时间处于阻塞状态,导致其任务无法正常执行完成的一种场景。所以,主线程阻塞就一定会造成ANR吗?当然不会,因为还有另外一个必要条件:待执行的任务。

二.ANR流程概览

本章主要讲解ANR发生后的完整流程,当然更具体的内容,比如ANR弹框是怎么显示的,以及日志是如何记录的,我们放到了后面章节来讲。

ANR流程概览

首先,我们通过一张流程图来了解整个ANR的流程:

无论是哪种类型的ANR,哪怕是native层的ANR,最终也会通知到AnrHelper类的appNotResponding方法。所以,我们从这个方法开始了解整个ANR的流程。我们用来区分ANR的四种不同类型,其实也就是appNotResponding这个方法中的annotation不同而已,而ANR本身是不去分类型的。

1.appNotResponding方法中,主要是生成AnrRecord对象,加入到mAnrRecords集合中。然后调用startAnrConsumerIfNeeded方法。

2.startAnrConsumerIfNeeded方法主要是通过cas进行判断,避免两个ANR线程同时执行。如果没有冲突的话,则开启AnrConsumerThread线程,对mAnrRecords中的对象进行消费。

3.AnrConsumerThread的run方法中,就是从mAnrRecords中取出对象,这些对象第一条中添加的。通过AnrRecord自身的appNotResponding方法进行消费。

4.appNotResponding方法就是整个ANR流程的核心执行逻辑了。总结一下,其实主要分为两大块:

第一,生成ANR的相关log以及日志并打印或保存。这个我们第四章来讲

第二,弹出应用无响应的框。这个我们第三章来讲。

三.ANR的logcat日志以及ANR文件是如何产生的

两种ANR信息介绍

logcat的日志和ANR文件的生成的核心逻辑仍然在appNotResponding方法中。

发生ANR的时候,会有3种形式的记录,分别记录不同的信息。

1.logcat中记录的日志。这里主要起到的是打印的作用,打印一些ANR的信息,以及会把一些CPU状态打印出来

2.data/anr/目录下生成对应的ANR日志文件。这里面主要记录的是dump进程的信息。

3.dropbox中记录的信息。

3.1 收集进程信息

ANR发生的原因,有时候并不是APP自身导致的,和其他进程也有关系。比如其他进程耗尽CPU资源,导致发生ANR的APP的任务无法正常执行。所以,发生ANR后,收集原因时,并不能单纯的只收集发生ANR的那个APP进程。

appNotResponding方法中,有两个集合,分别是firstPids和lastPids,分别收集

ArrayList firstPids = new ArrayList<>(5);
SparseArray lastPids = new SparseArray<>(20);

firstPids中会加入以下类型的进程ID:

1.发生ANR的进程PID;

2.发生ANR的进程的父进程PID;

3.系统进程的PID;

4.最近使用到并且还存活的进程PID;

5.存在被处理过的Activity的进程PID,这个说实在的,我没太理解,不过不影响主流程,就先忽略。

3.2 logcat中打印ANR信息

发生ANR的时候,会有两种形式的记录:

接下来,appNotResponding方法中会收集一些必要的ANR信息,打印显示在logcat中,主要代码如下:

// Log the ANR to the main log.
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(mApp.processName);//打印发生ANR的进程名
if (activityShortComponentName != null) {
    info.append(" (").append(activityShortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(pid).append("\n");//打印发生ANR的进程PID
if (annotation != null) {
    info.append("Reason: ").append(annotation).append("\n");//打印ANR类型和原因
}
if (parentShortComponentName != null
&& parentShortComponentName.equals(activityShortComponentName)) {
    info.append("Parent: ").append(parentShortComponentName).append("\n");
}
if (errorId != null) {
    info.append("ErrorId: ").append(errorId.toString()).append("\n");
}
info.append("Frozen: ").append(mApp.mOptRecord.isFrozen()).append("\n");

...

Slog.e(TAG, info.toString());

具体记录的内容和如何分析,我们下面再讲,这里只要知道,logcat中的信息是这里记录的就好。这些必要的信息都存储在info中,最终会通过logcat打印出来。

3.3 记录ANR文件流程

dumpStackTraces入参讲解

这个主要是通过appNotResponding中的ActivityManagerService.dumpStackTraces方法来实现的,我们先看下这个方法的参数:

static File dumpStackTraces(ArrayList firstPids,
            ProcessCpuTracker processCpuTracker, SparseArray lastPids,
            ArrayList nativePids, StringWriter logExceptionCreatingFile,
            long[] firstPidOffsets, String subject) {
 tracesFile = createAnrDumpFile(tracesDir);

1.firstPids:第一等级要收集的进程PID,firstPids集合3.1有介绍。

2.processCpuTracker:CPU分析记录对象,这个方法中会用到

3.lastPids:第二等级要收集的进程PID集合。

4.nativePids:要收集的native进程的PID集合。

5.tracesFileException:传递进去一个输出流对象,供方法内写入,最终这些信息会写入到dropbox中

6.offsets:用来记录采集的第一个非系统进程的日志文件的偏移量。两个参数,分别记录写入之前和写入之后的文件大小。

7.annotation:ANR类型和原因

三个要dump进程的集合

dumpStackTraces方法中,首先初始化processCpuTracker,然后休眠200毫秒,这200毫秒主要是让CPU去采集和分析CPU状态。然后遍历lastPids,获取其中CPU占用率最高的5个进程加入到extraPids集合中,等于说对lastPids集合进行了一遍处理。所以,我们现在有三个PID集合了:firstPids,nativePids和extraPids,这三个集合就是我们要dump进程的。

生成trace文件

生成trace文件,并且把ARN原因写入

File tracesFile;
        try {
            tracesFile = createAnrDumpFile(tracesDir);
        } catch (IOException e) {
            Slog.w(TAG, "Exception creating ANR dump file:", e);
            if (logExceptionCreatingFile != null) {
                logExceptionCreatingFile.append("----- Exception creating ANR dump file -----\n");
                e.printStackTrace(new PrintWriter(logExceptionCreatingFile));
            }
            return null;
        }

        if (subject != null) {
            try (FileOutputStream fos = new FileOutputStream(tracesFile, true)) {
                String header = "Subject: " + subject + "\n";
                fos.write(header.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                Slog.w(TAG, "Exception writing subject to ANR dump file:", e);
            }
        }

采集进程堆栈信息

最后通过重写的dumpStackTraces方法,分别回去采集firstPids,nativePids,extraPids集合中对应进程的堆栈状态。

这里系统会设置上限20秒的时间来允许对集合中的进程PID进程捕获,如果超过20秒,则认为捕获失败。

我们先看遍历firstPids集合的操作,简化的代码如下:

           for (int i = 0; i < num; i++) {
                final int pid = firstPids.get(i);
                ...

                Slog.i(TAG, "Collecting stacks for pid " + pid);
                final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile,
                                                                remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + pid
                            + "); deadline exceeded.");
                    return firstPidStart >= 0 ? new Pair<>(firstPidStart, firstPidEnd) : null;
                }

                if (firstPid) {
                    firstPidEnd = tf.length();
                }
                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                }
            }

dumpJavaTracesTombstoned中,会调用Debug类,最终会通知native层去获取对应PID的进程堆栈。现在常用的使用信号量监听ANR也是在此时发送的。

遍历nativePids集合的操作其实和第一步操作类似,只不过最终调用的就是dumpNativeBacktraceToFileTimeout方法了。

而最后遍历extraPids集合,操作和第一步是一样的了。

具体日志和代码的对应,我们第五章再讲。

四.ANR弹框是如何显示出来的

这一章其实是很早之前写的,比较简单,就直接复用之前的内容,不再改了。对于新手排查系统问题,是一个很好的参考。

4.1找到弹框对应类

想知道弹框是如何显示的,最简单的方式就是先造一个程序无响应的场景,然后打开uiautomatorviewer工具查看布局,这样我们就可以找到对应的XMl然后在源码中搜索,就能找到我们的目标类了。

首先我们看到的是下面这样一个场景,先基于android-29的源码全局文字:没有响应,关闭应用,等待,没有搜到,正常,这些文字应该是配置在xml中的。换个思路搜id,关闭应用对应的是aerr_close,全局搜果然搜到了,在AppNotRespondingDialog类中。对应的类名正好也符合:APP无响应。所以说我们的入口找到了。

另外我们看一下AppNotRespondingDialog的包名:com.android.server.am,说明是跑在Serve端的。

ANR系列之一:ANR显示和日志生成原理讲解_第2张图片

4.2.查找AppNotRespondingDialog引用

发现有是AppErrors的handleShowAnrUi方法调用的。我们先看整个调用流程:

1.InputManagerService的notifyANR方法

 // Native callback.
    private long notifyANR(IBinder token, String reason) {
        return mWindowManagerCallbacks.notifyANR(
                token, reason);
    }

好吧,调用触发点是naive。
2.InputManagerCallback的notifyANR方法

 @Override
    public long notifyANR(IBinder token, String reason) {
        AppWindowToken appWindowToken = null;
        WindowState windowState = null;
        boolean aboveSystem = false;
        synchronized (mService.mGlobalLock) {
           ...
            //存储日志的
            mService.saveANRStateLocked(appWindowToken, windowState, reason);
        }

        // All the calls below need to happen without the WM lock held since they call into AM.
        mService.mAtmInternal.saveANRState(reason);

        if (appWindowToken != null && appWindowToken.appToken != null) {
            ...
        } else if (windowState != null) {
            //通知弹出ANR的框
            long timeout = mService.mAmInternal.inputDispatchingTimedOut(
                    windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                // The activity manager declined to abort dispatching.
                // Wait a bit longer and timeout again later.
                return timeout * 1000000L; // nanoseconds
            }
        }
        return 0; // abort dispatching
    }

3.ActivityManagerService.LocalService的inputDispatchingTimedOut方法

4.ActivityManagerService的inputDispatchingTimedOut方法

5.ProcessRecord的appNotResponding方法

 if (mService.mUiHandler != null) {
                // Bring up the infamous App Not Responding dialog
                Message msg = Message.obtain();
                msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
                msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

                mService.mUiHandler.sendMessage(msg);
            }

6.ProcessRecord的appNotResponding方法

case SHOW_NOT_RESPONDING_UI_MSG: {
                mAppErrors.handleShowAnrUi(msg);
                ensureBootCompleted();
            } break;

7.ProcessRecord的appNotResponding方法

void handleShowAnrUi(Message msg) {
        Dialog dialogToShow = null;
        List packageList = null;
        synchronized (mService) {
            AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
            final ProcessRecord proc = data.proc;
            if (proc == null) {
                Slog.e(TAG, "handleShowAnrUi: proc is null");
                return;
            }
            ...
            if (mService.mAtmInternal.canShowErrorDialogs() || showBackground) {
                dialogToShow = new AppNotRespondingDialog(mService, mContext, data);
                proc.anrDialog = dialogToShow;
            } else {
                MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                        AppNotRespondingDialog.CANT_SHOW);
                // Just kill the app if there is no dialog to be shown.
                mService.killAppAtUsersRequest(proc, null);
            }
        }
        // If we've created a crash dialog, show it without the lock held
        if (dialogToShow != null) {
            dialogToShow.show();//显示程序无响应的框
        }
        ...
    }

总结一下,流程图如下:

ANR系列之一:ANR显示和日志生成原理讲解_第3张图片

五.发生ANR时,会有哪些日志,以及如何解读

上面有讲到,ANR发生时,会有三种信息来记录, 分别是logcat以及ANR的日志。其实还有一些logcat中的信息,可以辅助我们判断ANR流程何时开始,合适采集,合适结束等等,所以,我们一一来讲解。

5.1  logcat中日志信息

 首先,我们先梳理一下整个logcat中ANR日志的完整流程:

2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Window ... is not responding. Waited 5007ms for MotionEvent    //InputDispatcher::onAnrLocked中,识别到输入事件响应超时
2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Canceling events ... it is unresponsive                        //InputDispatcher::cancelEventsForAnrLocked中,InputDispatcher进行ANR流程分法
2022-10-08 15:03:32.023 24194-24330/? I/WindowManager: ANR in Window{}                                                  //AnrController.java的notifyWindowUnresponsive中,window中超时,准备通知AMS
2022-10-08 15:03:32.146 24194-31707/? I/ActivityManager: dumpStackTraces pids={7705=true, 7740=true, ...}               //ActivityManagerService.java的dumpStackTraces中,准备收集的进程集合
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Dumping to /data/anr/anr_2022-10-08-15-03-32-471               //ActivityManagerService.java的dumpStackTraces中,准备通知APP收集ANR日志
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Collecting stacks for pid 31655                                //ActivityManagerService.java的dumpStackTraces中,待收集的进程PID
2022-10-08 15:03:32.472 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 31655               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.472 31655-31660/com.xt.client I/com.xt.client: Thread[6,tid=31660,...]: reacting to signal 3        //APP进程收到信号,准备开始写入
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧
2022-10-08 15:03:32.568 24194-31707/? I/system_server: libdebuggerd_client: done dumping process 31655                  //系统进程收到通知,进行保存
2022-10-08 15:03:32.589 24194-31707/? I/ActivityManager: Collecting stacks for pid 24194                                //ActivityManagerService.java dumpStackTraces中,准备开始收集下一个进程的PID,这个进程为系统进程system_server
2022-10-08 15:03:32.601 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 24194               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.603 24194-24202/? I/system_server: Thread[2,...]: reacting to signal 3                              //system_server进程收到信号,准备开始写入
2022-10-08 15:03:32.858 24194-24202/? I/system_server: Wrote stack traces to tombstoned                                 //system_server进程写入完成,通知系统侧
...继续收集其它java进程的信息
2022-10-08 15:03:34.972 24194-31707/? I/ActivityManager: Collecting stacks for native pid 660                           //准备收集native进程660的信息
2022-10-08 15:03:34.973 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 660                 //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:34.974 660-660/? I/libc: Requested dump for pid 660 (binder:660_2)                                     //debuggerd_handler.cpp的log_signal_summary中,native中准备通知待收集的进程
...继续收集其它native进程的信息
2022-10-08 15:03:36.273 24194-31707/? I/ActivityManager: Done dumping                                                   //全部收集完成
2022-10-08 15:03:36.274 24194-31707/? E/ActivityManager: ANR in com.xt.client (com.xt.client/.MainActivity)             //打印发生ANR时收集到的日志
...

这里我们以输入类型为例,

1.InputDispatcher中识别到发生ANR了,通过回调java方法,通知到InputManagerService。如InputDispatcher开头的两行日志所记录。

2.InputManagerService中,根据WindowState找到对应的归属,先交给ActivityRecord处理,最终转交到AMS中。如WindowManager开头的一行日志所记录。

3.AMS中委托给AnrHelper.java进行处理,最终会通知到ProcessErrorStateRecord.java的appNotResponding方法,该方法中,会进行三个操作:

首先,收集需要dump的进程列表,包含java进程,native进程,其它进程,采集规则是发生ANR的进程+最近活动的进程。

其次,进行采集操作。这个会交给ActivityManagerService.java的dumpStackTraces方法执行。

最后,显示无响应的弹框。

4.采集日志会进入到ActivityManagerService.java的dumpStackTraces方法中,此时先打印ActivityManager开头的两行日志,分别代表着开始采集和生成日志文件。下面这两个日志分别记录将要采集的进程ID以及日志文件路径:

2022-10-08 15:03:32.146 24194-31707/? I/ActivityManager: dumpStackTraces pids={7705=true, 7740=true, ...}               //ActivityManagerService.java的dumpStackTraces中,准备收集的进程集合
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Dumping to /data/anr/anr_2022-10-08-15-03-32-471               //ActivityManagerService.java的dumpStackTraces中,准备通知APP收集ANR日志

5.因为需要dump多个进程的,所以开始采集以及采集完成的日志会有很多,如下面所示,分别代表开始采集,发送信号通知,收到信号通知,采集完成以及系统收到采集完成的通知。这种日志会经历很多轮。

2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Collecting stacks for pid 31655                                //ActivityManagerService.java的dumpStackTraces中,待收集的进程PID
2022-10-08 15:03:32.472 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 31655               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.472 31655-31660/com.xt.client I/com.xt.client: Thread[6,tid=31660,...]: reacting to signal 3        //APP进程收到信号,准备开始写入
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧
2022-10-08 15:03:32.568 24194-31707/? I/system_server: libdebuggerd_client: done dumping process 31655                  //系统进程收到通知,进行保存

6.采集native的信息

2022-10-08 15:03:34.972 24194-31707/? I/ActivityManager: Collecting stacks for native pid 660                           //准备收集native进程660的信息
2022-10-08 15:03:34.973 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 660                 //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:34.974 660-660/? I/libc: Requested dump for pid 660 (binder:660_2)                                     //debuggerd_handler.cpp的log_signal_summary中,native中准备通知待收集的进程

7.所有进程dump完成。

2022-10-08 15:03:36.273 24194-31707/? I/ActivityManager: Done dumping                                                   //全部收集完成

5.2  logcat中信息记录

上面的流程采集完成后,会一次性打印采集到的ANR的信息,如下面所示:

我们这一小节就重点介绍下,这份logcat中的ANR信息,如何解读。

2022-10-08 15:03:36.274 24194-31707/? E/ActivityManager: ANR in com.xt.client (com.xt.client/.MainActivity)
    PID: 31655    //进程ID
    Reason: Input dispatching timed out (dff7ad com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5007ms for MotionEvent)    //原因
    Parent: com.xt.client/.MainActivity    //触发点
    ErrorId: bcef99f2-c232-4bee-bdd8-233b4ff598f0
    Frozen: false
    Load: 0.54 / 0.15 / 0.12        //ANR发生之前的1分钟,5分钟,15分钟CPU占用率。0.54代表0.54%
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=18377466
    full avg10=0.00 avg60=0.00 avg300=0.00 total=10458714
    ----- End output from /proc/pressure/memory -----
    //ANR发生之前的129秒内CPU占用率
    CPU usage from 129271ms to 0ms ago (2022-10-08 15:01:22.846 to 2022-10-08 15:03:32.116):
      //user代表用户态占用,kernel代表内核态占用
      1.7% 24194/system_server: 1.2% user + 0.5% kernel / faults: 3611 minor
      1.5% 689/surfaceflinger: 1.1% user + 0.4% kernel / faults: 441 minor
      1.4% 24426/com.android.systemui: 1% user + 0.3% kernel / faults: 2682 minor
      1% 28207/kworker/u16:6: 0% user + 1% kernel / faults: 18 minor
      0.8% 31505/kworker/u16:3: 0% user + 0.8% kernel
      0.7% 27789/kworker/u16:1: 0% user + 0.7% kernel / faults: 2 minor
      0.6% 29978/com.tencent.mm: 0.1% user + 0.4% kernel / faults: 66 minor
      0.6% 1122/[email protected]: 0.3% user + 0.2% kernel / faults: 1190 minor
      0.3% 24523/com.breel.wallpapers20a: 0.2% user + 0% kernel / faults: 1291 minor
      0.4% 24828/com.google.android.apps.nexuslauncher: 0.3% user + 0% kernel / faults: 1984 minor
      0.1% 691/[email protected]: 0% user + 0% kernel / faults: 133 minor
      0.3% 18902/com.google.android.gms.persistent: 0.2% user + 0.1% kernel / faults: 2221 minor
      0.3% 30388/com.tencent.mm:tools: 0.2% user + 0% kernel / faults: 93 minor
      ...

具体解释如下:

行数 内容 解释
1
ANR in com.test.sceneengineservice
发生ANR应用的包名
2
PID: 2468
发生ANR的进程ID,请注意, 此日志是system_server进程打印的
3
Reason: executing service com.test.sceneengineservice/.SceneEngineService 
发生ANR的的原因
4
Load: 0.0 / 0.0 / 0.0 
发生ANR之前CPU的负载,分别代表发生前的1分钟/5分钟/15分钟
5
CPU usage from 14781ms to 0ms ago (2022-09-21 07:32:34.092 to 2022-09-21 07:32:48.873):
这期间的CPU占用。从发生之前的14秒到发生时
6
E ActivityManager:   5.6% 1000/system_server: 1.8% user + 3.7% kernel / faults: 3555 minor
这段时间内具体的CPU占用率。user代表应用侧的占用,kernel代表内核侧的占用。

5.3  ANR文件中记录的信息

文件内容信息如下:

Subject: Input dispatching timed out (4dff4f9 com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5001ms for MotionEvent)

--- CriticalEventLog ---
capacity: 20
events <
  timestamp_ms: 1665198570560
  anr <
    pid: 20001
    process: "com.xt.client"
    process_class: 1
    subject: "Input dispatching timed out (b9701e com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5007ms for MotionEvent)"
    uid: 10244
  >
>
events <
  timestamp_ms: 1665198580245
  java_crash <
    exception_class: "java.lang.SecurityException"
    pid: 20001
    process: "com.xt.client"
    process_class: 1
    uid: 10244
  >
>
timestamp_ms: 1665198615793
window_ms: 300000


----- pid 20223 at 2022-10-08 11:10:16.144897760+0800 -----
Cmd line: com.xt.client
Build fingerprint: 'google/sunfish/sunfish:13/TP1A.220905.004/8927612:user/release-keys'
ABI: 'arm64'
Build type: optimized
/** 类加载信息-开始 class_linker.cc中写入,记录类加载信息 */
Zygote loaded classes=21523 post zygote classes=573
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes12.dex:/data/data/com.xt.client/code_cache/.overlay/base.apk/classes14.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes11.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes3.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes8.dex:/data/data/com.xt.client/code_cache/.overlay/base.apk/classes9.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes5.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes6.dex], parent #1
Done dumping class loaders
Classes initialized: 0 in 0
/** 类加载信息-结束 */

/** 引用表-开始 intern_table.cc中写入,记录强弱引用数量 */
Intern table: 30098 strong; 1210 weak
/** 引用表-结束 */

/** JNI加载信息-开始 java_vm_ext.cc中写入,记录加载的so */
JNI: CheckJNI is on; globals=400 (plus 68 weak)
Libraries: libandroid.so libaudioeffect_jni.so libcompiler_rt.so libframework-connectivity-jni.so libframework-connectivity-tiramisu-jni.so libicu_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so librs_jni.so librtp_jni.so libsoundpool.so libstats_jni.so libwebviewchromium_loader.so (16)
/** JNI加载信息-结束  */

/** java虚拟机堆状态-开始 heap.cc中写入,记录当前的内存状态 */
Heap: 74% free, 4180KB/16MB; 183984 objects
Dumping cumulative Gc timings
Average major GC reclaim bytes ratio inf over 0 GC cycles
Average major GC copied live bytes ratio 0.732525 over 4 major GCs
Cumulative bytes moved 29923016
Cumulative objects moved 547043
Peak regions allocated 60 (15MB) / 1024 (256MB)
Total madvise time 7.031ms
Average minor GC reclaim bytes ratio nan over 0 GC cycles
Average minor GC copied live bytes ratio nan over 0 minor GCs
Cumulative bytes moved 0
Cumulative objects moved 0
Peak regions allocated 60 (15MB) / 1024 (256MB)
Total number of allocations 183984
Total bytes allocated 4180KB
Total bytes freed 0B
Free memory 11MB
Free memory until GC 11MB
Free memory until OOME 251MB
Total memory 16MB
Max memory 256MB
Zygote space size 7760KB
Total mutator paused time: 0
Total time waiting for GC to complete: 2.448us
Total GC count: 0
Total GC time: 0
Total blocking GC count: 0
Total blocking GC time: 0
Total pre-OOME GC count: 0
Native bytes total: 14386550 registered: 2142838
Total native bytes at last GC: 4357592
/** java虚拟机堆状态-结束 */

/** OTA文件记录-开始 oat_file_manager.cc中写入,记录当前使用到的OTA文件 */
/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/oat/arm64/base.vdex: verify
/system/framework/oat/arm64/android.hidl.manager-V1.0-java.odex: verify
/system/framework/oat/arm64/android.test.base.odex: verify
/system/framework/oat/arm64/android.hidl.base-V1.0-java.odex: verify
/** OTA文件记录-结束 */

/** JIT信息记录-开始 jit.cc中写入,记录JIT的信息 */
Current JIT code cache size (used / resident): 40KB / 60KB
Current JIT data cache size (used / resident): 28KB / 60KB
Zygote JIT code cache size (at point of fork): 22KB / 32KB
Zygote JIT data cache size (at point of fork): 16KB / 32KB
Current JIT mini-debug-info size: 19KB
Current JIT capacity: 128KB
Current number of JIT JNI stub entries: 0
Current number of JIT code cache entries: 46
Total number of JIT baseline compilations: 34
Total number of JIT optimized compilations: 4
Total number of JIT compilations for on stack replacement: 1
Total number of JIT code cache collections: 1
Memory used for stack maps: Avg: 377B Max: 11KB Min: 32B
Memory used for compiled code: Avg: 1156B Max: 26KB Min: 164B
Memory used for profiling info: Avg: 187B Max: 4416B Min: 24B
Start Dumping Averages for 56 iterations for JIT timings
Compiling optimized:	Sum: 49.228ms Avg: 879.071us
Compiling baseline:	Sum: 27.108ms Avg: 484.071us
TrimMaps:	Sum: 2.699ms Avg: 48.196us
Code cache collection:	Sum: 380us Avg: 6.785us
Compiling OSR:	Sum: 240us Avg: 4.285us
Done Dumping Averages
Memory used for compilation: Avg: 154KB Max: 4050KB Min: 16KB
ProfileSaver total_bytes_written=0
ProfileSaver total_number_of_writes=0
ProfileSaver total_number_of_code_cache_queries=0
ProfileSaver total_number_of_skipped_writes=0
ProfileSaver total_number_of_failed_writes=0
ProfileSaver total_ms_of_sleep=5000
ProfileSaver total_ms_of_work=0
ProfileSaver total_number_of_hot_spikes=0
ProfileSaver total_number_of_wake_ups=0
/** JIT信息记录-结束 */

/** ART信息指标-开始 metrics_common.cc中写入,记录ART虚拟机的一些关键指标 */
*** ART internal metrics ***
  Metadata:
    timestamp_since_start_ms: 19952
  Metrics:
    ClassLoadingTotalTime: count = 17358
    ClassVerificationTotalTime: count = 15303
    ClassVerificationCount: count = 28
    WorldStopTimeDuringGCAvg: count = 0
    YoungGcCount: count = 0
    FullGcCount: count = 0
    TotalBytesAllocated: count = 2349168
    TotalGcCollectionTime: count = 0
    YoungGcThroughputAvg: count = 0
    FullGcThroughputAvg: count = 0
    YoungGcTracingThroughputAvg: count = 0
    FullGcTracingThroughputAvg: count = 0
    JitMethodCompileTotalTime: count = 28492
    JitMethodCompileCount: count = 39
    YoungGcCollectionTime: range = 0...60000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcCollectionTime: range = 0...60000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    YoungGcThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    YoungGcTracingThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcTracingThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
*** Done dumping ART internal metrics ***
/** ART信息指标-结束 */

/** 线程信息-开始 thread_list.cc中写入,记录当前进程的线程信息 */
suspend all histogram:	Sum: 112us 99% C.I. 0.083us-54us Avg: 4.666us Max: 54us
DALVIK THREADS (19):
"Signal Catcher" daemon prio=10 tid=6 Runnable
  | group="system" sCount=0 ucsCount=0 flags=0 obj=0x12c801a0 self=0xb400007193e16a60
  | sysTid=20229 nice=-20 cgrp=system sched=0/0 handle=0x7004dfbcb0
  | state=R schedstat=( 17930731 0 5 ) utm=0 stm=0 core=7 HZ=100
  | stack=0x7004d04000-0x7004d06000 stackSize=991KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 00000000005374cc  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+128) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #01 pc 00000000006f0d94  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream >&, bool, BacktraceMap*, bool) const+236) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #02 pc 00000000006fe620  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+208) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #03 pc 00000000003641d4  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+440) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #04 pc 00000000006fcdc0  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream >&, bool)+280) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #05 pc 00000000006fc7b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream >&)+292) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #06 pc 00000000006d57e4  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream >&)+184) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #07 pc 00000000006e1928  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+468) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #08 pc 0000000000573c40  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+264) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #09 pc 00000000000b62b8  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  native: #10 pc 0000000000052fb8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  (no managed stack frames)

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72424e88 self=0xb400007193e0a7b0
  | sysTid=20223 nice=-10 cgrp=system sched=0/0 handle=0x72dd5244f8
  | state=S schedstat=( 331627619 74690560 345 ) utm=30 stm=2 core=7 HZ=100
  | stack=0x7fcfb32000-0x7fcfb34000 stackSize=8188KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0e92c80b> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:450)
  - locked <0x0e92c80b> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:355)
  at com.xt.client.MainActivity.doActionWithoutClass(MainActivity.java:201)
  at com.xt.client.MainActivity.access$000(MainActivity.java:85)
  at com.xt.client.MainActivity$2.onItemClick(MainActivity.java:177)
  at com.xt.client.inter.RecyclerItemClickListener.onInterceptTouchEvent(RecyclerItemClickListener.java:34)
  at androidx.recyclerview.widget.RecyclerView.findInterceptingOnItemTouchListener(RecyclerView.java:3286)
  at androidx.recyclerview.widget.RecyclerView.dispatchToOnItemTouchListeners(RecyclerView.java:3257)
  at androidx.recyclerview.widget.RecyclerView.onTouchEvent(RecyclerView.java:3421)
  at android.view.View.dispatchTouchEvent(View.java:15004)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3115)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2788)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at com.android.internal.policy.DecorView.superDispatchTouchEvent(DecorView.java:498)
  at com.android.internal.policy.PhoneWindow.superDispatchTouchEvent(PhoneWindow.java:1899)
  at android.app.Activity.dispatchTouchEvent(Activity.java:4262)
  at com.android.internal.policy.DecorView.dispatchTouchEvent(DecorView.java:456)
  at android.view.View.dispatchPointerEvent(View.java:15263)
  at android.view.ViewRootImpl$ViewPostImeInputStage.processPointerEvent(ViewRootImpl.java:6548)
  at android.view.ViewRootImpl$ViewPostImeInputStage.onProcess(ViewRootImpl.java:6348)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5804)
  at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:5861)
  at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:5827)
  at android.view.ViewRootImpl$AsyncInputStage.forward(ViewRootImpl.java:5992)
  at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:5835)
  at android.view.ViewRootImpl$AsyncInputStage.apply(ViewRootImpl.java:6049)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5808)
  at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:5861)
  at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:5827)
  at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:5835)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5808)
  at android.view.ViewRootImpl.deliverInputEvent(ViewRootImpl.java:8857)
  at android.view.ViewRootImpl.doProcessInputEvents(ViewRootImpl.java:8808)
  at android.view.ViewRootImpl.enqueueInputEvent(ViewRootImpl.java:8777)
  at android.view.ViewRootImpl$WindowInputEventReceiver.onInputEvent(ViewRootImpl.java:8980)
  at android.view.InputEventReceiver.dispatchInputEvent(InputEventReceiver.java:267)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:335)
  at android.os.Looper.loopOnce(Looper.java:161)
  at android.os.Looper.loop(Looper.java:288)
  at android.app.ActivityThread.main(ActivityThread.java:7898)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)

  ...
/** 线程信息-结束/

----- end 20223 -----

...

具体解释如下:

所在行 内容 解释
1 ”main“ 线程名
1 prio=5 线程优先级
1 tid=1 线程标识符
1 Native 线程状态,native是native中的线程状态,对应java中runnable
2 group="main" 线程所属的线程组
2 sCount=1 程被正常挂起的次数
2 dsCount=0 线程因调试而挂起次数
2 flags=1
2 obj=0x75c4db98 线程的Java对象的地址
2 self=0x7b08e14c00 线程本身的地址
3 sysTid=2468 Linux下的内核线程id
3 nice=0 线程的调度优先级
3 cgrp=default 调度属组
3 sched=0/0 线程的调度策略和优先级
3 handle=0x7b8e933548 线程的处理函数地址
4 state=S 调度状态
4 schedstat=( 562066147 3061697256 542 )

执行时间记录:

三个值分别表示线程在cpu上执行的时间、线程的等待时间和线程执行的时间片长度

4 utm=31 用户态CPU占用次数,由于频率是100,则占用31次就等于31*10ms=310ms
4 stm=24 内核态CPU占用次数
4 core=2 最后执行这个线程的CPU核序号
4 HZ=100 运行频率,一秒100次
5 stack=0x7fe841d000-0x7fe841f000 虚拟机栈地址
5 stackSize=8MB 虚拟机栈大小
6 at android.os.BinderProxy.transactNative(Native method) at android.os.BinderProxy.transact(Binder.java:1127) 此时采集到的APP对应线程的对应状态

5.4 DropBox中的ANR信息

dropBox中的日志文件更像是一个综合体,包括了logcat日志中的部分以及ANR文件中的部分。

ProcessErrorStateRecord的appNotResponding方法中,会通过以下的方法,把report信息和traceFile中的信息进行记录。report信息基本上就是上面logcat中的部分。

 mService.addErrorToDropBox("anr", mApp, mApp.processName, activityShortComponentName,
                parentShortComponentName, parentPr, null, report.toString(), tracesFile,
                null, new Float(loadingProgress), incrementalMetrics, errorId);

文件位置是:/data/system/dropbox/[email protected]

是一个压缩文件,解析后就是我们想要的内容了,大体的格式如下。

Process: com.beantechs.apm
PID: 10589
Flags: 0x20e8bf46
Package: com.xxx.apm v1 (1.0)
Foreground: Yes
Activity: com.xxx.apm/.TestActivity
Subject: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 4.  Wait queue head age: 5571.8ms.)
Build: qti/gwm/gwm:9/PQ1A.190105.004/bean08231515:userdebug/test-keys

CPU usage from 141069ms to 0ms ago (2022-12-07 15:01:10.951 to 2022-12-07 15:03:32.020):
  20% 6354/adbd: 1.7% user + 18% kernel / faults: 4493599 minor
  5.8% 322/logd: 1.7% user + 4% kernel / faults: 26 minor
  4.3% 514/[email protected]: 0.6% user + 3.7% kernel
  3.9% 385/[email protected]: 1.7% user + 2.1% kernel / faults: 281 minor
  3.4% 497/[email protected]: 0.9% user + 2.4% kernel
  3.2% 1020/system_server: 1.3% user + 1.9% kernel / faults: 21547 minor 1 major
  3.2% 543/audioserver: 1.2% user + 1.9% kernel / faults: 4052 minor
  3.2% 26582/kworker/u16:1: 0% user + 3.2% kernel
  2.2% 4415/com.beantechs.datapipeline: 1.5% user + 0.7% kernel / faults: 41778 minor
  2.1% 4442/com.lx.usbaudiohid: 1% user + 1% kernel / faults: 2 minor
  2% 3376/com.beantechs.intelligentvehiclecontrol: 1.3% user + 0.7% kernel / faults: 5915 minor
  1.8% 9937/kworker/u17:1: 0% user + 1.8% kernel
 ...
10% TOTAL: 3.2% user + 6.4% kernel + 0% iowait + 0.7% irq + 0.1% softirq

----- pid 10589 at 2022-12-07 15:03:32 -----
Cmd line: com.xxxx.apm

六.扩展问题:

问题1:ANR发生后会立马弹出ANR框吗?

答:不会,2.3中讲到,触发了ANR并且采集完相关线程日志后,会延时5S在弹出ANR的框。

问题2:主线程卡顿一定会导致ANR吗?

答:不会。首先,卡顿一定要达到某个阈值才会触发。其次,一定要满足某个场景才可以。比如ActivityA跳转ActivityB之后,延时1S后主线程休眠20S,就不会ANR,因为输入事件已经得到响应,并不满足任何触发ANR的场景。

问题3:为什么ANR的日志traces.txt中,有时候会打印nativePollOnce的堆栈?

答:理论上,系统侧在发现达到超市事件后,会立马出发ANR检查。如果检查发生真的超时,会走ANR流程,捕获APP的堆栈状态并保存。但是实际执行中,创建文件,dump指定进程堆栈信息都需要时间,并且ANR流程的开始还会sleep200毫秒。

我们看3.1中日志,从15:03:32.023开始识别到ANR,到15:03:32.568结束dump完成APP侧信息。

2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Window ... is not responding. Waited 5007ms for MotionEvent    //InputDispatcher::onAnrLocked中,识别到输入事件响应超时
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧

整个流程大约要500多毫秒,其中包括:

1.InputDispatcher中识别到ANR,传递到AMS准备开始采集要100多毫秒

2.识别到ANR的时候,系统进程会休眠200毫秒采集CPU信息。

3.系统进程采集一些必要信息,会耗费一些时间,大约100毫秒左右。

4.客户端收到信号去采集ANR的时候,也需要时间,大约要100毫秒左右。

所以,漂移的是有上限时间的,正常漂移上限一般为500ms左右,即使极端的场景,应该也不会很长。

七.声明:

参考文章:

​​​​​​android input anr分析 - 知乎

你可能感兴趣的:(安卓-源码分析,小程序,html5)