SWT Problem Analysis Summary

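Table of contents

  • 1 Overview
  • 2 The SWT mechanism
    • 2.1 Design block diagram
    • 2.2 Watchdog initialization
    • 2.3 How the Watchdog runs
  • 3 Causes of SWT reboots
  • 4 Log analysis
    • 4.1 Search for the keyword watchdog
    • 4.2 Search for the keyword held by
    • 4.3 Check the Binder server side
      • 4.3.1 Key information
      • 4.3.2 How to identify the Binder peer
    • 4.4 Reboot caused by a long-running native method
    • 4.5 Reboot caused by SurfaceFlinger being stuck
    • 4.6 Zygote stuck while forking a process
    • 4.7 Dump taking too long
  • 5 Case studies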

1 Overview

Common root causes of an abnormal phone reboot:

1. JE (Java exception) in the system process
   Search for the keyword "FATAL EXCEPTION IN SYSTEM PROCESS:" in main_log

2. NE (native exception) in the system process
   Search for the keyword "Fatal signal" in main_log

3. SWT in the system process
   Search for the keyword "WATCHDOG KILLING SYSTEM PROCESS" in main_log

4. KE (kernel exception)
   Search for the keyword "Kernel panic" in kernel_log
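As a rough illustration of the keyword triage above, the following Java sketch maps a single log line to one of the four reboot causes. The class and enum names are made up for this example; only the keyword strings come from the list. A keyword hit is just a starting point: "Fatal signal", for instance, is also logged for native crashes in ordinary app processes, so the pid still has to be checked against system_server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: classifies a log line by the keywords listed above. */
final class RebootCauseClassifier {
    enum Cause { JE, NE, SWT, KE, UNKNOWN }

    // Keyword -> cause. This sketch does not distinguish main_log from kernel_log.
    private static final Map<String, Cause> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("FATAL EXCEPTION IN SYSTEM PROCESS:", Cause.JE);
        KEYWORDS.put("Fatal signal", Cause.NE);
        KEYWORDS.put("WATCHDOG KILLING SYSTEM PROCESS", Cause.SWT);
        KEYWORDS.put("Kernel panic", Cause.KE);
    }

    /** Returns the first cause whose keyword occurs in the line, or UNKNOWN. */
    static Cause classify(String logLine) {
        for (Map.Entry<String, Cause> e : KEYWORDS.entrySet()) {
            if (logLine.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return Cause.UNKNOWN;
    }
}
```

Feeding every line of main_log and kernel_log through classify() and keeping the first non-UNKNOWN hit gives a quick first guess at which kind of reboot to investigate.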

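Android SWT stands for Android Software Watchdog Timeout.

The System Server process is one of Android's core processes; it hosts the core services that apps depend on. If some of its core services or important threads get stuck, the corresponding functionality breaks.

The phone may hang, stop responding to input, fail to launch apps, and so on. Without a mechanism to reset these services, the user experience suffers badly, especially since most current phones have a sealed-in battery, so pulling the battery to force a reboot is hardly an option.

So the system needs a chance to reset itself automatically when core services or core threads are stuck. For this, Google introduced the System Server watchdog, which monitors whether core services and core threads are stuck.

The framework-layer Watchdog monitors important system services and threads for deadlock, for example AMS, WMS, PMS, MS and IMS. It is initialized and started while System Server starts its services (see 2.2); objects to monitor are added through addMonitor, and the Watchdog then loops with a 30 s interval. If a deadlock is detected, it dumps traces and checks for one more round; if the deadlock persists, it dumps traces again and kills system_server, which causes zygote to restart.

The watchdog is a thread of system_server; it monitors some core services of system_server, such as:
ActivityManagerService, WindowManagerService, PowerManagerService

SWT mainly monitors the important threads and services of SystemServer. If one of them is judged to have been blocked for 60 s, the system is restarted to bring it back to a normal state.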

2 The SWT mechanism

2.1 Design block diagram

[Figure 1: SWT design block diagram]
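Originally the design simply ran the monitor() method of each registered monitor object on the main looper. Google later reworked this around HandlerChecker, which checks whether the injected monitor objects execute quickly. The registered monitor objects are now run on the foreground thread, while for the other threads the check is whether they reach idle within the allowed time rather than staying stuck on a single message.
Note that the SystemServer Watchdog is started late in SystemServer init. If SystemServer gets stuck during init, the watchdog has no effect at all.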

2.2 Watchdog initialization

Android's Watchdog is a singleton thread. System Server initializes and starts it during startup:

 private void startOtherServices() {
            ...
            traceBeginAndSlog("InitWatchdog");
            final Watchdog watchdog = Watchdog.getInstance();
            watchdog.init(context, mActivityManagerService);
            traceEnd();
            ...
            traceBeginAndSlog("StartWatchdog");
            Watchdog.getInstance().start();
            traceEnd();
            ...
    }

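During initialization the Watchdog builds a number of HandlerCheckers, which fall roughly into two kinds:

Monitor Checker: checks for possible deadlocks in Monitor objects. Core system services such as AMS, PKMS and WMS are all Monitor objects.

Searching the code for the keyword addMonitor turns up the following: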

LINUX/android/frameworks/base/services/core/java/com/android/server/input/
InputManagerService.java	349 Watchdog.getInstance().addMonitor(this); in start()
LINUX/android/frameworks/base/services/core/java/com/android/server/wm/
WindowManagerService.java	1161 Watchdog.getInstance().addMonitor(this); in WindowManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java	2988 Watchdog.getInstance().addMonitor(this); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/media/
MediaRouterService.java	95 Watchdog.getInstance().addMonitor(this); in MediaRouterService()
......

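Each service's monitor() function only checks whether the lock in question can be acquired. The services call
Watchdog.getInstance().addMonitor(this) to add themselves (they implement Watchdog.Monitor) to the
Watchdog.mMonitorChecker.mMonitors list, and the monitor() method of every entry in that list is called repeatedly.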

 WindowManagerService.java  
    // Called by the heartbeat to ensure locks are not held indefinitely (for deadlock detection).
    @Override
    public void monitor() {
        synchronized (mWindowMap) { }
    }

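Looking at the monitor() implementations in the individual services, the function is very simple: it just acquires the corresponding lock. If a thread is deadlocked, or blocked for some other reason while holding that lock, the lock cannot be acquired and monitor() blocks. The Watchdog uses exactly this to decide whether there is a deadlock.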
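The lock-probing idea can be tried out in plain Java without any Android classes. The sketch below is only an illustration: FakeService, returnsWithin and returnsWithinWhileLockHeld are hypothetical names, and a helper thread with a CountDownLatch stands in for the Watchdog's foreground thread. It shows that an empty synchronized block returns at once when the lock is free and does not return while another thread holds the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorProbe {
    /** Same shape as Watchdog.Monitor: one method that should return quickly. */
    interface Monitor { void monitor(); }

    /** Hypothetical service whose monitor() only acquires and releases its lock. */
    static final class FakeService implements Monitor {
        final Object lock = new Object();
        @Override public void monitor() { synchronized (lock) { } }
    }

    /** Runs monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean returnsWithin(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread probe = new Thread(() -> { m.monitor(); done.countDown(); }, "probe");
        probe.setDaemon(true);
        probe.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Holds the service lock on another thread while probing, then releases it. */
    static boolean returnsWithinWhileLockHeld(FakeService s, long timeoutMs) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (s.lock) {
                held.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();                  // make sure the lock is really taken
            return returnsWithin(s, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();           // let the holder (and the probe) finish
        }
    }
}
```

With a free lock, returnsWithin(service, 5000) reports true; with the lock held elsewhere, returnsWithinWhileLockHeld(service, 200) reports false, which is the situation the Watchdog eventually treats as a blocked monitor.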

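Looper Checker: checks whether a thread's message queue has been busy for a long time. The Watchdog's own message queue and the global Ui, Io and Display message queues are all checked. The message queues of some other important threads, such as those of AMS and PKMS, are also added to the Looper Checker when the corresponding objects are initialized.
Searching the code for the keyword addThread turns up the following: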

LINUX/android/frameworks/base/services/core/java/com/android/server/power/
PowerManagerService.java	775 Watchdog.getInstance().addThread(mHandler); in onStart()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java	2989 Watchdog.getInstance().addThread(mHandler); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/pm/
PackageManagerService.java	2512 Watchdog.getInstance().addThread(mHandler, WATCHDOG_TIMEOUT); in PackageManagerService()

addThread() stores the Handlers of several service threads, such as those of PowerManagerService, PackageManagerService and ActivityManagerService,
in the Watchdog.mHandlerCheckers list. The mMonitorChecker mentioned above is stored in Watchdog.mHandlerCheckers as well, as are the Handlers of the foreground thread, ui thread, i/o thread, display thread and main thread:

  /frameworks/base/services/core/java/com/android/server/Watchdog.java        
        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

The Watchdog keeps checking whether the Loopers of these threads are idle. If one never becomes idle, it is considered blocked.
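The looper check can be pictured as posting a marker message to the thread's queue and seeing whether the thread ever gets to it. The following plain-Java sketch assumes a single-thread ExecutorService as a stand-in for a Looper thread; LooperProbe, becomesIdleWithin and demo are hypothetical names, and the real HandlerChecker differs in detail.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LooperProbe {
    /** Posts a marker task to the queue; true if the thread reached it within timeoutMs. */
    static boolean becomesIdleWithin(ExecutorService looper, long timeoutMs) {
        CountDownLatch reached = new CountDownLatch(1);
        looper.execute(reached::countDown);
        try {
            return reached.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Probes once while the queue is free and once while a "message" is stuck. */
    static boolean[] demo() {
        ExecutorService looper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fake-looper");
            t.setDaemon(true);
            return t;
        });
        boolean whenFree = becomesIdleWithin(looper, 5000);
        CountDownLatch unblock = new CountDownLatch(1);
        looper.execute(() -> {             // a message that never finishes on its own
            try { unblock.await(); } catch (InterruptedException ignored) { }
        });
        boolean whenStuck = becomesIdleWithin(looper, 200);
        unblock.countDown();               // clean up so the thread can drain its queue
        looper.shutdown();
        return new boolean[] { whenFree, whenStuck };
    }
}
```

demo() returns { true, false }: the marker is processed promptly on a free queue, and it is never reached within the timeout while an earlier message is still running.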

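2.3 How the Watchdog runs

After the initialization above, everything the watchdog needs to monitor is in place. Next, how does it actually do the monitoring?

The watchdog is itself a thread, so to see how it monitors each object we go straight to its run() method: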

 @Override
    public void run() {
        boolean waitedHalf = false; // set once the first 30 s timeout has elapsed
        boolean mSFHang = false; // whether surfaceflinger is hung
        mProcessStats.init(); // mtk: get cpu info when SWT occur.
        while (true) {
            final List<HandlerChecker> blockedCheckers;
            final String subject;
            final String name;
            mSFHang = false;
            if (exceptionHWT != null && waitedHalf == false ) {
                // hang_detect related: resets the hang_detect counter; this is the normal phase.
                // For details see the MOL > Quick Start > Hang Detect quick problem analysis topic.
                exceptionHWT.WDTMatterJava(300);
            }
            final boolean allowRestart; // whether to restart when an SWT occurs
            int debuggerWasConnected = 0;

            if (DEBUG)
                Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());

            synchronized (this) {

                if (DEBUG)
                    Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());

                long timeout = CHECK_INTERVAL;
                long SFHangTime;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
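                // 1. Schedule all HandlerCheckers
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }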

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                // 2. Start the periodic check
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                // 3. Check the completion state of the HandlerCheckers
                //MTK enhance
                SFHangTime = GetSFStatus();
                if (DEBUG) Slog.w(TAG, "**Get SF Time **" + SFHangTime);
                if (SFHangTime > TIME_SF_WAIT * 2) {
                    Slog.v(TAG, "**SF hang Time **" + SFHangTime);
                    mSFHang = true;
                    blockedCheckers = getBlockedCheckersLocked();
                    subject = "";

                } //@@
                else {
                    boolean fdLimitTriggered = false;
                    if (mOpenFdMonitor != null) {
                        fdLimitTriggered = mOpenFdMonitor.monitor();
                    }

                    if (!fdLimitTriggered) {
                        final int waitState = evaluateCheckerCompletionLocked();
                        // Evaluate the wait state from mCompleted and mStartTime
                        if (waitState == COMPLETED) { // check finished and all is well; keep checking
                            // The monitors have returned; reset
                            waitedHalf = false;
                            continue;
                        } else if (waitState == WAITING) { // less than 30 s so far; keep checking
                            // still waiting but within their configured intervals;
                            // back off and recheck
                            continue;
                        } else if (waitState == WAITED_HALF) { // between 30 s and 60 s; dump some info and keep checking
                            if (!waitedHalf) {
                                // We've waited half the deadlock-detection interval.  Pull a stack
                                // trace and wait another half.
                                if (exceptionHWT != null) {
                                    exceptionHWT.WDTMatterJava(360); // hang_detect related: entering the dump phase
                                }
                                ArrayList<Integer> pids = new ArrayList<>();
                                pids.add(Process.myPid());
                                ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                    getInterestingNativePids());
                                mProcessStats.update();
                                waitedHalf = true;
                            }
                            continue;
                        }
                        // 4. Collect the HandlerCheckers that timed out
                        // something is overdue!
                        blockedCheckers = getBlockedCheckersLocked();
                        // More than 60 s have passed, so something is wrong; collect the overdue HandlerCheckers
                        subject = describeCheckersLocked(blockedCheckers);
                    } else {
                        blockedCheckers = Collections.emptyList();
                        subject = "Open FD high water mark reached";
                    }
                }
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            // 5. Save some important logs and decide, based on the settings, whether to restart the system
            Slog.e(TAG, "**SWT happen **" + subject);
            if (exceptionHWT != null) {
                exceptionHWT.switchFtrace(2);
            }
            name = (mSFHang && subject.isEmpty()) ? "surfaceflinger  hang." : "";
            EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);
            // Write the blocked-thread information to the event log

            if (exceptionHWT != null) {
                exceptionHWT.WDTMatterJava(420); // hang_detect related: an SWT has occurred
            }
            mProcessStats.update();
            final String cpuInfo = mProcessStats.printCurrentState(SystemClock.uptimeMillis());
            Slog.d(TAG, mProcessStats.printCurrentLoad());

            ArrayList<Integer> pids = new ArrayList<>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, getInterestingNativePids());

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            /// M: WDT debug enhancement
            /// need to wait the AEE dumps all info, then kill system server @{
            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Slog.v(TAG, "** save all info before killnig system server **");
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        Slog.v(TAG, "** start addErrorToDropBox **");
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name.isEmpty() ? subject : name, cpuInfo, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if ((mSFHang == false) && (controller != null)) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    Slog.i(TAG, "Binder.setDumpDisabled");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                    Slog.i(TAG, "Activity controller requested to reboot");
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                /// @}

                Slog.w(TAG, "*** GOODBYE!");
                exceptionHWT.WDTMatterJava(330); // 330 means watchdog exit successfully.
                // MTK enhance
                if (mSFHang)
                {
                    Slog.w(TAG, "SF hang!");
                    if (GetSFReboot() > 3)
                    {
                        Slog.w(TAG, "SF hang reboot time larger than 3 time, reboot device!");
                        rebootSystem("Maybe SF driver hang,reboot device.");
                    }
                    else
                    {
                        SetSFReboot();
                    }
                }
                //@
                if (mSFHang) {
                    Slog.v(TAG, "killing surfaceflinger for surfaceflinger hang");
                    String[] sf = new String[] {"/system/bin/surfaceflinger"};
                    int[] pid_sf =  Process.getPidsForCommands(sf);
                    if (pid_sf[0] > 0) {
                        Process.killProcess(pid_sf[0]);
                    }
                    Slog.v(TAG, "killing surfaceflinger end");
                } else {
                    Process.killProcess(Process.myPid());
                }

                System.exit(10);
            }

            waitedHalf = false;
        }
    }

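The main logic of the code above is as follows:

1. Once running, the Watchdog loops forever, calling scheduleCheckLocked() on each HandlerChecker in turn.

2. After scheduling the HandlerCheckers it waits before checking for timeouts; the wait between checks is set by the CHECK_INTERVAL constant and is 30 seconds.

3. Each check calls evaluateCheckerCompletionLocked() to evaluate the completion state of the HandlerCheckers:
a. COMPLETED: the check has finished
b. WAITING and WAITED_HALF: still waiting, but not yet timed out
c. OVERDUE: timed out. The default timeout is 1 minute, but a monitored object can set its own through a parameter; for example, the PKMS HandlerChecker uses a 10-minute timeout

4. If the timeout has expired and a HandlerChecker is still unfinished (OVERDUE), getBlockedCheckersLocked() collects the blocked HandlerCheckers and a description of them is generated.

5. Logs are saved, including runtime stack traces; these logs are the main evidence for solving Watchdog problems. If system_server is to be killed, signal 9 is sent to the current process (system_server).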
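The four states in step 3 can be sketched as a pure function of elapsed time. This is a simplified model, not the framework source: the class name, method names and numeric constant values are assumptions, and the thresholds (half the timeout, then the full timeout) follow the 30 s / 60 s behaviour described above.

```java
final class CheckerStateSketch {
    // Ordered from best to worst so that the overall state is simply the maximum.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given when its check was scheduled and its timeout. */
    static int stateOf(boolean completed, long startMs, long nowMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        long latency = nowMs - startMs;
        if (latency < waitMaxMs / 2) {
            return WAITING;        // e.g. under 30 s with the default 60 s timeout
        }
        if (latency < waitMaxMs) {
            return WAITED_HALF;    // e.g. between 30 s and 60 s
        }
        return OVERDUE;            // timeout reached
    }

    /** The overall state is the worst state across all checkers. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

With a 60 s timeout, a checker that has been pending for 10 s is WAITING, for 30 s is WAITED_HALF, and for 60 s is OVERDUE. A checker with a 10-minute timeout, as mentioned for PKMS, would still be WAITING at those same points.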

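3 Causes of SWT reboots

Thread deadlock
Binder server side stuck
Native method running too long
SurfaceFlinger stuck
Dump taking too long
Zygote stuck while forking a process
Binder used up

4 Log analysis

4.1 Search for the keyword watchdog

Search for the keyword watchdog for a first analysis of the root cause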

03-29 16:59:14.818   748  4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg) 
03-29 16:59:14.818   748  4784 W Watchdog: foreground thread stack trace: 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.input.InputManagerService.nativeMonitor(Native Method) 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404) 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:739) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:95) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Looper.loop(Looper.java:135) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:61) 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:46) 
03-29 16:59:14.819   748  4784 W Watchdog: *** GOODBYE!

The Watchdog detected that InputManagerService is blocked:

03-29 16:59:14.818 748 4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg)
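When many logs have to be screened, the subject of this line can be pulled apart mechanically. The sketch below is an assumption-laden helper, not part of the framework: the class name and regular expression are made up for this example and were written against the two subject shapes that appear in this article ("Blocked in monitor <class> on <checker> (<thread>)" and "Blocked in handler on <checker> (<thread>)"). If a subject lists several blocked checkers, only the first one is returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the "Blocked in ..." subject of a Watchdog log line. */
final class WatchdogSubject {
    final String monitorClass;   // null when the subject is "Blocked in handler"
    final String checkerName;    // e.g. "foreground thread"
    final String threadName;     // e.g. "android.fg"

    private WatchdogSubject(String monitorClass, String checkerName, String threadName) {
        this.monitorClass = monitorClass;
        this.checkerName = checkerName;
        this.threadName = threadName;
    }

    private static final Pattern SUBJECT = Pattern.compile(
            "Blocked in (?:monitor (\\S+)|handler) on (.+?) \\(([^)]+)\\)");

    /** Returns the first blocked checker described in the line, or null if none is found. */
    static WatchdogSubject parse(String logLine) {
        Matcher m = SUBJECT.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new WatchdogSubject(m.group(1), m.group(2), m.group(3));
    }
}
```

For the line above, parse() yields the monitor class com.android.server.input.InputManagerService and the thread name android.fg, which is the thread to look up in the trace file.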

The block is on the android.fg thread; look at the corresponding trace:

"android.fg" prio=5 tid=17 Native 
  | group="main" sCount=1 dsCount=0 obj=0x12e7a120 self=0xb7b20160
  | sysTid=771 nice=0 cgrp=default sched=0/0 handle=0xb7b20538
  | state=S schedstat=( 1927397505 30260078920 12131 ) utm=138 stm=54 core=0 HZ=100 
  | stack=0xa51df000-0xa51e1000 stackSize=1036KB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/771/stack)
  native: #00 pc 0000f9b0  /system/lib/libc.so (syscall+28)
  native: #01 pc 0001318d  /system/lib/libc.so (__pthread_cond_timedwait_relative(pthread_cond_t*, pthread_mutex_t*, timespec const*)+56)
  native: #02 pc 00025315  /system/lib/libinputflinger.so (android::InputReader::monitor()+28)
  native: #03 pc 0000f56d  /system/lib/libandroid_servers.so (???)
  native: #04 pc 0011441d  /system/framework/arm/services.odex (Java_com_android_server_input_InputManagerService_nativeMonitor__J+88)
  at com.android.server.input.InputManagerService.nativeMonitor(Native method) 
  at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404) 
  at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179)
  at android.os.Handler.handleCallback(Handler.java:739)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:135)
  at android.os.HandlerThread.run(HandlerThread.java:61)
  at com.android.server.ServiceThread.run(ServiceThread.java:46)

The Watchdog monitors the IMS InputReader and InputDispatcher threads. The trace above suggests that InputReader may be deadlocked, so look at the InputReader thread next:

"InputReader" prio=10 tid=34 Native 
  | group="main" sCount=1 dsCount=0 obj=0x13100220 self=0xb7c0dc80
  | sysTid=4310 nice=-8 cgrp=default sched=0/0 handle=0xb7b098a0
  | state=S schedstat=( 62300529241 31784888301 33265 ) utm=5130 stm=1100 core=3 HZ=100 
  | stack=0xa3b7b000-0xa3b7d000 stackSize=1012KB
  | held mutexes=
   kernel: (couldn't read /proc/self/task/4310/stack)
native: #00 pc 0000f9b0  /system/lib/libc.so (syscall+28)
native: #01 pc 000a8c4b  /system/lib/libart.so (art::ConditionVariable::Wait(art::Thread*)+82) 
native: #02 pc 001abc1b  /system/lib/libart.so (art::JNI::IsInstanceOf(_JNIEnv*, _jobject*, _jclass*)+1022) 
native: #03 pc 0001fe2f  /system/lib/libjavacore.so (???)
native: #04 pc 0001fe71  /system/lib/libjavacore.so (???)
native: #05 pc 00283a93  /system/framework/arm/boot.oat (Java_libcore_io_Posix_writeBytes__Ljava_io_FileDescriptor_2Ljava_lang_Object_2II+142) 
  at libcore.io.Posix.writeBytes(Native method)
  at libcore.io.Posix.write(Posix.java:258)
  at libcore.io.BlockGuardOs.write(BlockGuardOs.java:313)
  at libcore.io.IoBridge.write(IoBridge.java:497)
  at java.io.FileOutputStream.write(FileOutputStream.java:186)
  at java.io.OutputStreamWriter.flushBytes(OutputStreamWriter.java:167)
  - locked <@addr=0x14079080> (a java.io.FileOutputStream)
  at java.io.OutputStreamWriter.close(OutputStreamWriter.java:140)
  - locked <@addr=0x14079080> (a java.io.FileOutputStream)
  at com.android.internal.policy.impl.PhoneWindowManager.saveCoverStatus(PhoneWindowManager.java:6266) 
  at com.android.internal.policy.impl.PhoneWindowManager.interceptKeyBeforeQueueing(PhoneWindowManager.java:5823)
  at com.android.server.wm.InputMonitor.interceptKeyBeforeQueueing(InputMonitor.java:370)
  at com.android.server.input.InputManagerService.interceptKeyBeforeQueueing(InputManagerService.java:1491)

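The write in saveCoverStatus is waiting, so one line of investigation is the function call itself.

In addition, check from a global view whether InputReader is behaving normally:
Power key pressed to turn on the screen: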

03-29 16:53:36.248   748  4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=26

Touch-screen operations:

03-29 16:53:52.894   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=396.44937, y=1192.06873
03-29 16:53:53.007   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=396.44937, y=1192.06873
03-29 16:53:53.954   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=646.10266, y=853.33331
03-29 16:53:54.112   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=532.26074, y=902.29510
03-29 16:53:54.457   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=370.48544, y=1029.19592
03-29 16:53:54.595   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=518.28015, y=920.28101

A key with keyCode=0 (KEYCODE_UNKNOWN = 0) was delivered:

03-29 16:57:35.743   748  4310 D InputReader: Perf-Track KeyboardInputMapper Down keyCode=0, scanCode=251, DownTime=0
03-29 16:57:35.743   748  4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=0

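So the other line of investigation is why keyCode=0 (KEYCODE_UNKNOWN = 0) was generated. In short: check the function call for anomalies on one hand, and the key mapping on the other.
One lesson to take from this: avoid heavy, time-consuming operations in important services wherever possible.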

4.2 Search for the keyword held by

Search for the keyword held by to determine whether a thread is blocked and whether there is a deadlock.
For example:

Cmd line: system_server
"ActivityManager" prio=5 tid=14 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12d46f90 self=0xb732ba18
| sysTid=840 nice=-2 cgrp=apps sched=0/0 handle=0xb732c040
| state=S schedstat=( 3934680041 3104214846 9460 ) utm=166 stm=227 core=2 HZ=100
| stack=0xa6106000-0xa6108000 stackSize=1036KB
| held mutexes=
at com.android.server.net.NetworkStatsService.setUidForeground(NetworkStatsService.java:641)
- waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35
at com.android.server.net.NetworkPolicyManagerService.updateRulesForUidLocked(NetworkPolicyManagerService.java:2081)

waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35, so look at thread 35:

"NetworkStats" prio=5 tid=35 TimedWaiting
| group="main" sCount=1 dsCount=0 obj=0x1393f9e0 self=0xb758ce48
| sysTid=2021 nice=0 cgrp=apps sched=0/0 handle=0xb76fab58
| state=S schedstat=( 402053186 255954674 601 ) utm=27 stm=13 core=0 HZ=100
| stack=0xa32db000-0xa32dd000 stackSize=1036KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x16b70c1b> (a java.lang.Object)
at java.lang.Thread.parkFor(Thread.java:1220)
- locked <0x16b70c1b> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:197)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2055)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:388)
at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.java:614)
at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:406)
at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:360)
at com.android.server.NetworkManagementService.getNetworkStatsTethering(NetworkManagementService.java:1731)
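Following "held by thread N" by hand gets tedious in a large trace file, so the lookup can be sketched in code. The helper below is hypothetical (class name, method names and regular expressions are assumptions written against the trace format shown above): it records, for each thread header, which thread holds the lock it is waiting for, and then checks whether following those links leads back to the starting thread, which would be a true lock cycle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeldByGraph {
    // Thread header, e.g.: "ActivityManager" prio=5 tid=14 Blocked
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\btid=(\\d+)\\b");
    // Wait line, e.g.: - waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35
    private static final Pattern WAIT =
            Pattern.compile("waiting to lock <[^>]+>.*held by thread (\\d+)");

    /** Maps each blocked thread's tid to the tid holding the lock it wants. */
    static Map<Integer, Integer> waitsFor(List<String> traceLines) {
        Map<Integer, Integer> edges = new LinkedHashMap<>();
        int current = -1;
        for (String raw : traceLines) {
            String line = raw.trim();
            Matcher h = HEADER.matcher(line);
            if (h.find()) {
                current = Integer.parseInt(h.group(2));
                continue;
            }
            Matcher w = WAIT.matcher(line);
            if (current >= 0 && w.find()) {
                edges.put(current, Integer.parseInt(w.group(1)));
            }
        }
        return edges;
    }

    /** Follows holder links from tid; true if the chain loops back to tid. */
    static boolean inCycle(Map<Integer, Integer> edges, int tid) {
        int steps = 0;
        Integer next = edges.get(tid);
        while (next != null && steps++ <= edges.size()) {
            if (next == tid) {
                return true;
            }
            next = edges.get(next);
        }
        return false;
    }
}
```

For the example above the result is a single edge, 14 -> 35, and no cycle: thread 35 is not waiting for another lock but sits in a timed wait inside NativeDaemonConnector while holding the lock, so the question becomes why that call takes so long.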

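4.3 Check the Binder server side

Check whether the Binder server side is stuck

4.3.1 Key information

If a thread's state is Native and its call stack contains IPCThreadState::waitForResponse --> IPCThreadState::talkWithDriver, that is a first indication that it is stuck waiting on the peer. The next step is to find the peer.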
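This rule of thumb is easy to express as a check over the lines of one thread's trace. The helper below is a hypothetical sketch (the names are made up for this example) that assumes the first line of the list is the thread header, such as "main" prio=5 tid=1 Native, and that the remaining lines are its stack frames.

```java
import java.util.List;

final class BinderBlockCheck {
    /**
     * True if the thread is in Native state and its stack shows both
     * IPCThreadState::talkWithDriver and IPCThreadState::waitForResponse.
     */
    static boolean looksBlockedOnBinder(List<String> threadTrace) {
        if (threadTrace.isEmpty()) {
            return false;
        }
        // The thread state is the last token of the header line.
        boolean nativeState = threadTrace.get(0).trim().endsWith(" Native");
        boolean talkWithDriver = false;
        boolean waitForResponse = false;
        for (String line : threadTrace) {
            if (line.contains("IPCThreadState::talkWithDriver")) {
                talkWithDriver = true;
            }
            if (line.contains("IPCThreadState::waitForResponse")) {
                waitForResponse = true;
            }
        }
        return nativeState && talkWithDriver && waitForResponse;
    }
}
```

A match only says that the thread is waiting for a reply to an outgoing Binder call; which process and thread it is waiting for still has to be determined separately.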

Find the event and the time it occurred:

01-01 03:05:34.411 639 1064 I watchdog: Blocked in handler on main thread (main)
block module: main timeout: 60s

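Look at the call stack of the corresponding thread in that time window; it gives a first indication that the thread is stuck on the peer: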

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
  | sysTid=941 nice=0 cgrp=default sched=0/0 handle=0xabfe74a4
  | state=S schedstat=( 6662448086 9845197785 9495 ) utm=548 stm=118 core=1 HZ=100
  | stack=0xbe090000-0xbe092000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/941/stack)
  native: #00 pc 00049304  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001deef  /system/lib/libc.so (ioctl+38)
  native: #02 pc 0004242f  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
  native: #03 pc 00042de9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d2e5  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcdad  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 0074ec65  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.content.om.IOverlayManager$Stub$Proxy.getOverlayInfo(IOverlayManager.java:254)
  at com.android.systemui.statusbar.phone.StatusBar.isUsingDarkTheme(unavailable:-1)
  at com.android.systemui.statusbar.phone.StatusBar.updateTheme(unavailable:-1)
  at com.android.systemui.statusbar.phone.StatusBar.onColorsChanged(unavailable:-1)
  at com.android.internal.colorextraction.ColorExtractor.triggerColorsChanged(ColorExtractor.java:186)
  at com.android.systemui.colorextraction.SysuiColorExtractor.setWallpaperVisible(unavailable:-1)
  at com.android.systemui.colorextraction.SysuiColorExtractor$1.lambda$-com_android_systemui_colorextraction_SysuiColorExtractor$1_3105(unavailable:-1)
  at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.$m$0(unavailable:-1)
  at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.run(unavailable:-1)
  at android.os.Handler.handleCallback(Handler.java:790)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:164)
  at android.app.ActivityThread.main(ActivityThread.java:6523)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:857)

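The getOverlayInfo call from systemui is stuck because of the Binder peer.
Check SYS_BINDER_INFO to determine the tid and sysTid of the Binder peer. It cannot be confirmed here, but judging from getOverlayInfo it is probably sysTid=4447: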

"Binder:639_17" prio=5 tid=115 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13e4a4a0 self=0xa0310e00
  | sysTid=4447 nice=0 cgrp=default sched=0/0 handle=0x85a75970
  | state=S schedstat=( 1407904321 1543930926 3137 ) utm=101 stm=39 core=2 HZ=100
  | stack=0x8597b000-0x8597d000 stackSize=1006KB
  | held mutexes=
  at com.android.server.om.OverlayManagerService$1.getOverlayInfo(OverlayManagerService.java:511)
  - waiting to lock <0x0d53be75> (a java.lang.Object) held by thread 1
  at android.content.om.IOverlayManager$Stub.onTransact(IOverlayManager.java:79)
  at android.os.Binder.execTransact(Binder.java:697)

The lock 0x0d53be75 is held by thread 1 (sysTid=639), and it is not released because that thread's own Binder call is stuck:

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
  | sysTid=639 nice=-2 cgrp=default sched=0/0 handle=0xabfe74a4
  | state=S schedstat=( 38966905233 5299817531 20464 ) utm=3323 stm=573 core=0 HZ=100
  | stack=0xbe090000-0xbe092000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/639/stack)
  native: #00 pc 00049304  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001deef  /system/lib/libc.so (ioctl+38)
  native: #02 pc 0004242f  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
  native: #03 pc 00042de9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d2e5  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcdad  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 0074ec65  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.os.IInstalld$Stub$Proxy.idmap(IInstalld.java:936)
  at com.android.server.pm.Installer.idmap(Installer.java:327)
  at com.android.server.om.IdmapManager.createIdmap(IdmapManager.java:62)
  at com.android.server.om.OverlayManagerServiceImpl.updateState(OverlayManagerServiceImpl.java:493)
  at com.android.server.om.OverlayManagerServiceImpl.updateAllOverlaysForTarget(OverlayManagerServiceImpl.java:237)
  at com.android.server.om.OverlayManagerServiceImpl.onTargetPackageChanged(OverlayManagerServiceImpl.java:187)
  at com.android.server.om.OverlayManagerService$PackageReceiver.onPackageChanged(OverlayManagerService.java:392)
  - locked <0x0d53be75> (a java.lang.Object)
  at com.android.server.om.OverlayManagerService$PackageReceiver.onReceive(OverlayManagerService.java:350)
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_53034(LoadedApk.java:1323)

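A stuck binder call caused the problem; in this situation the behavior of the binder peer has to be checked.
If the binder peer has no idle binder threads left, analyze further along these two lines:
1) Why do the binder threads run for so long?
2) Are too many repeated binder requests tying up the binder threads, and are those requests reasonable?

4.3.2 How to identify the Binder peer

Precondition: the call stack shows the binder-block signature and the thread state is Native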

IPCThreadState::waitForResponse-->IPCThreadState::talkWithDriver
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4

Procedure:
Step 1: get the time point, 17:54:29.433

06-22 17:54:29.433   815  1035 I watchdog: Blocked in handler on main thread (main)
block module: main                 timeout: 60s
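
Step 1 can be automated when many logs have to be triaged. The following is a minimal sketch, assuming the threadtime-style logcat layout shown in the line above; `WatchdogLine` and its fields are hypothetical names introduced for illustration, not part of Android.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of Android): parses one "I watchdog" event line. */
final class WatchdogLine {
    // Assumed prefix layout: "MM-dd HH:mm:ss.SSS  pid  tid I watchdog: message"
    private static final Pattern PREFIX = Pattern.compile(
            "^(\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s+(\\d+)\\s+(\\d+)\\s+I\\s+watchdog\\s*:\\s*(.*)$");
    // Every blocked entry ends with the thread name in parentheses, e.g. "(main)".
    private static final Pattern BLOCKED = Pattern.compile(
            "Blocked in (?:handler|monitor)\\b[^()]*\\(([^()]+)\\)");

    final String timestamp;
    final int pid;
    final List<String> blockedThreads;

    private WatchdogLine(String timestamp, int pid, List<String> blockedThreads) {
        this.timestamp = timestamp;
        this.pid = pid;
        this.blockedThreads = blockedThreads;
    }

    /** Returns null when the line is not a watchdog event line. */
    static WatchdogLine parse(String line) {
        Matcher m = PREFIX.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> threads = new ArrayList<>();
        Matcher b = BLOCKED.matcher(m.group(4));
        while (b.find()) {
            threads.add(b.group(1));
        }
        return new WatchdogLine(m.group(1), Integer.parseInt(m.group(2)), threads);
    }

    public static void main(String[] args) {
        WatchdogLine w = parse(
                "06-22 17:54:29.433   815  1035 I watchdog: Blocked in handler on main thread (main)");
        System.out.println(w.timestamp + " pid=" + w.pid + " blocked=" + w.blockedThreads);
    }
}
```

The timestamp and pid tell you which trace dumps to open, and the thread names in parentheses tell you which thread blocks to read first.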

Step 2: find the call stack of that thread in the dumps taken around that time, and note its sysTid and the exact interface it is blocked in

----- pid 815 at 2018-06-22 17:53:54 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
  | state=S schedstat=( 18775190570 13246793364 27928 ) utm=1310 stm=566 core=0 HZ=100
  | stack=0xbe1be000-0xbe1c0000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/815/stack)
  native: #00 pc 00048c8c  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001dd65  /system/lib/libc.so (ioctl+32)
  native: #02 pc 00042535  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
  native: #03 pc 00042ee9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d309  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcf89  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 002a90a5  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
  at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
  at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
  at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
  at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
  at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:164)
  at com.android.server.SystemServer.run(SystemServer.java:442)
  at com.android.server.SystemServer.main(SystemServer.java:283)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)

----- pid 815 at 2018-06-22 17:54:29 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
  | state=S schedstat=( 18775421647 13246793364 27930 ) utm=1310 stm=566 core=0 HZ=100
  | stack=0xbe1be000-0xbe1c0000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/815/stack)
  native: #00 pc 00048c8c  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001dd65  /system/lib/libc.so (ioctl+32)
  native: #02 pc 00042535  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
  native: #03 pc 00042ee9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d309  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcf89  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 002a90a5  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
  at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
  at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
  at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
  at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
  at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:164)
  at com.android.server.SystemServer.run(SystemServer.java:442)
  at com.android.server.SystemServer.main(SystemServer.java:283)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)

Both dumps, taken 35 seconds apart, show the same stack: the thread is blocked in com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634). The blocked interface is invokeOemRilRequestRaw, and sysTid = 815.
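
Reading the sysTid and the blocked interface out of a thread block is mechanical, so it can be scripted. Below is a minimal sketch, assuming the ART trace layout shown in the dumps above; `BlockedThread` is a hypothetical name, and the rule "the frame directly below `BinderProxy.transact` is the proxy method" is a heuristic based on these stacks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of Android): summarizes one thread block of a trace dump. */
final class BlockedThread {
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\btid=\\d+\\s+(\\w+)$");
    private static final Pattern SYS_TID = Pattern.compile("\\bsysTid=(\\d+)");
    private static final String TRANSACT = "at android.os.BinderProxy.transact(";

    final String name;
    final String state;
    final int sysTid;
    final String binderCaller; // proxy method that issued the binder call, or null

    private BlockedThread(String name, String state, int sysTid, String binderCaller) {
        this.name = name;
        this.state = state;
        this.sysTid = sysTid;
        this.binderCaller = binderCaller;
    }

    static BlockedThread parse(List<String> block) {
        String name = null;
        String state = null;
        String caller = null;
        int sysTid = -1;
        boolean previousWasTransact = false;
        for (String raw : block) {
            String line = raw.trim();
            Matcher h = HEADER.matcher(line);
            if (name == null && h.matches()) {
                name = h.group(1);
                state = h.group(2);
                continue;
            }
            Matcher s = SYS_TID.matcher(line);
            if (sysTid < 0 && s.find()) {
                sysTid = Integer.parseInt(s.group(1));
                continue;
            }
            if (!line.startsWith("at ")) {
                continue; // skip native frames and "- locked" lines
            }
            if (previousWasTransact && caller == null) {
                int paren = line.indexOf('(');
                caller = line.substring(3, paren < 0 ? line.length() : paren);
            }
            previousWasTransact = line.startsWith(TRANSACT);
        }
        return new BlockedThread(name, state, sysTid, caller);
    }

    public static void main(String[] args) {
        BlockedThread t = parse(Arrays.asList(
                "\"main\" prio=5 tid=1 Native",
                "  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4",
                "  at android.os.BinderProxy.transactNative(Native method)",
                "  at android.os.BinderProxy.transact(Binder.java:764)",
                "  at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)"));
        System.out.println(t.name + " " + t.state + " sysTid=" + t.sysTid + " -> " + t.binderCaller);
    }
}
```

Running the same parser over two consecutive dumps and comparing `binderCaller` is a quick way to confirm that the thread did not move between them.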

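Step 3: check SYS_BINDER_INFO to find the peer that thread 815 is communicating with.
As an illustration with a different thread:
if sysTid = 2938, SYS_BINDER_INFO contains an entry like the following: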

proc 838
thread 2938:
    outgoing transaction 139654941: ffffffc0877d87ab from 838:2938 to 9111:9123 code 1 flags 10 priv -6...

Then look up 9111 in SYS_PROCESSES_AND_THREADS, which shows that it is the monkey process:

root 9111  9100  1150100  21780  0  20  0  0  0  fg  fffff  2487234d  S  64  com.android.monkey
root 9123  9111  1150100  21780  2  14 -6  0  0  fg  00000  5456456c  R  64  bonder_1
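
The lookup in SYS_BINDER_INFO amounts to reading the `from pid:tid to pid:tid` pair of the transaction line. A minimal sketch follows, assuming the line layout of the excerpt above; `BinderTxn` is a hypothetical name introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of Android): one transaction line from binder info. */
final class BinderTxn {
    private static final Pattern LINE = Pattern.compile(
            "(outgoing|incoming) transaction (\\d+): \\S+ from (\\d+):(\\d+) to (\\d+):(\\d+)");

    final boolean outgoing;
    final long id;
    final int fromPid;
    final int fromTid;
    final int toPid;
    final int toTid;

    private BinderTxn(Matcher m) {
        outgoing = m.group(1).equals("outgoing");
        id = Long.parseLong(m.group(2));
        fromPid = Integer.parseInt(m.group(3));
        fromTid = Integer.parseInt(m.group(4));
        toPid = Integer.parseInt(m.group(5));
        toTid = Integer.parseInt(m.group(6));
    }

    /** Returns null when the line is not a transaction line. */
    static BinderTxn parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new BinderTxn(m) : null;
    }

    public static void main(String[] args) {
        BinderTxn t = parse("    outgoing transaction 139654941: ffffffc0877d87ab "
                + "from 838:2938 to 9111:9123 code 1 flags 10 priv -6...");
        System.out.println(t.fromPid + ":" + t.fromTid + " -> " + t.toPid + ":" + t.toTid);
    }
}
```

The `toPid` value is what you then look up in the process list, and `toTid` identifies the binder thread whose backtrace to read.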

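If SYS_BINDER_INFO has no matching entry, look in the kernel log instead, or work out the peer and the specific interface from the names in the blocked call stack together with the source code.

In the example above, binder info contains no peer for 815, but the kernel log contains binder release messages: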

[11645.672135] (3)[23724:kworker/u8:6]binder: release 815:815 transaction 3729140 out, still active
<6>[11647.223911]  (1)[23672:kworker/u8:1]binder: release 1207:1229 transaction 3729140 in, still active
<6>[11647.223954]  (1)[23672:kworker/u8:1]binder: send failed reply for transaction 3729140, target dead

Looking up transaction 3729140 in binder_info confirms that the peer is 1207:1229:

incoming transaction 3729140: cc857480 from 0:0 to 1207:1229 code 79 flags 10 pri 0:118 r1 node 23923 size 120:0 data e6280068
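
The kernel-log fallback works by matching transaction ids: the caller's `out` record and the peer's `in` record share the same id. A minimal sketch follows, assuming the `binder: release ...` message format of the excerpt above (the prefix before `binder:` varies by platform and is ignored); `BinderReleaseLog` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of Android): pairs binder release records by transaction id. */
final class BinderReleaseLog {
    private static final Pattern RELEASE = Pattern.compile(
            "binder: release (\\d+):(\\d+) transaction (\\d+) (in|out), still active");

    /** Returns "pid:tid" of the peer that held the caller's transaction, or null if none is logged. */
    static String peerOf(List<String> kernelLog, int pid, int tid) {
        String caller = pid + ":" + tid;
        // First pass: transaction ids that the caller had outstanding.
        Set<String> outgoingIds = new HashSet<>();
        for (String line : kernelLog) {
            Matcher m = RELEASE.matcher(line);
            if (m.find() && m.group(4).equals("out")
                    && caller.equals(m.group(1) + ":" + m.group(2))) {
                outgoingIds.add(m.group(3));
            }
        }
        // Second pass: the thread that had one of those ids incoming.
        for (String line : kernelLog) {
            Matcher m = RELEASE.matcher(line);
            if (m.find() && m.group(4).equals("in") && outgoingIds.contains(m.group(3))) {
                return m.group(1) + ":" + m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "[11645.672135] (3)[23724:kworker/u8:6]binder: release 815:815 transaction 3729140 out, still active",
                "<6>[11647.223911]  (1)[23672:kworker/u8:1]binder: release 1207:1229 transaction 3729140 in, still active");
        System.out.println(peerOf(log, 815, 815));
    }
}
```

Two passes are used so that the result does not depend on the order in which the `out` and `in` records appear in the log.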

Check the backtrace of 1207:1229:

----- pid 1207 at 2018-06-22 17:54:30 -----
Cmd line: com.android.phone
"Binder:1207_2" prio=5 tid=9 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13c01a90 self=0x9fc53c00
  | sysTid=1229 nice=-2 cgrp=default sched=0/0 handle=0x91ca7970
  | state=S schedstat=( 10524016079 5984437221 36564 ) utm=647 stm=404 core=0 HZ=100
  | stack=0x91bad000-0x91baf000 stackSize=1006KB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
  at com.android.phone.PhoneInterfaceManager.PhoneInterfaceManager.java(PhoneInterfaceManager.java:1031)
  - locked <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
  at com.android.phone.PhoneInterfaceManager.sendRequest(PhoneInterfaceManager.java:1010)
  at com.android.phone.PhoneInterfaceManager.invokeOemRilRequestRaw(PhoneInterfaceManager.java:3114)
  at com.android.internal.telephony.ITelephony$Stub.onTransact(ITelephony.java:1448)
  at android.os.Binder.execTransact(Binder.java:697)

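The call therefore appears to be stuck on the phone process side, in invokeOemRilRequestRaw. The binder thread is waiting in sendRequest on a MainThreadRequest object, which suggests the next thing to check is why the phone process has not completed that request.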

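4.4 Reboot caused by a long-running native method

The thread state is Native; check whether the stack contains PowerManagerService.nativeSetAutoSuspend.
Use the call stack to find which native method is taking too long, and whether it is waiting for a response from hardware or the hardware itself is faulty.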

4.5 Reboot caused by SurfaceFlinger hanging

Search for the keyword I watchdog and check whether a surfaceflinger hang is reported. By default, a hang of 40s triggers a reboot.

4.6 Zygote stuck while forking a process

The thread state is Native; check whether the stack contains Process.zygoteSendArgsAndGetResult.
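
Sections 4.3, 4.4 and 4.6 each start from a thread in the Native state and a marker frame in its stack, so a first-pass triage can be a plain substring check. The sketch below uses only the marker strings named in this article; `NativeBlockHint` and its `Kind` values are hypothetical names, and a match is only a hint about where to look, not a diagnosis.

```java
/** Illustrative helper (not part of Android): maps a Native-state stack to the section to consult. */
final class NativeBlockHint {
    enum Kind { AUTO_SUSPEND, ZYGOTE_FORK, BINDER_CALL, UNKNOWN }

    static Kind classify(String stack) {
        if (stack.contains("PowerManagerService.nativeSetAutoSuspend")) {
            return Kind.AUTO_SUSPEND;  // section 4.4: long-running native method
        }
        if (stack.contains("Process.zygoteSendArgsAndGetResult")) {
            return Kind.ZYGOTE_FORK;   // section 4.6: zygote stuck while forking
        }
        if (stack.contains("BinderProxy.transactNative")) {
            return Kind.BINDER_CALL;   // section 4.3: check the binder server side
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("at android.os.BinderProxy.transactNative(Native method)"));
    }
}
```

The more specific markers are tested first because a stack can contain several of them.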

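4.7 Dump takes too long

A dump that runs longer than 60s may cause the phone to reboot.
Search for the keyword dumpStackTraces or dumpStackTraces process.

Typical triggers:
An ANR usually occurred shortly before.
A fatal exception (JE, NE, KE, etc.) occurred shortly before.
An automation script called "dumpsys" to dump system information.

End users generally do not run into these cases. The cause is usually excessive system load or too much information being dumped.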

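5 Case study

Symptom:
SWT and system reboot while installing an app.
1. With low probability, while Google apps were updating in the background after logging in to an account;
2. While installing apps through Wandoujia with the phone connected to a PC;
in both cases, tapping the screen to unlock was followed by a hang and a reboot.
Search for "watchdog":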

07-15 15:13:52.132 984 1174 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), 
 Blocked in handler on main thread (main), 
 Blocked in handler on i/o thread (android.io),
 Blocked in handler on display thread (android.display), 
 Blocked in handler on ActivityManager (ActivityManager), 
 Blocked in handler on PowerManagerService (PowerManagerService)
block module: android.fg timeout: 60s

Check the call stacks of the corresponding threads at that time:

----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"android.fg" prio=5 tid=16 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12ca59c0 self=0x9de63800
| sysTid=1004 nice=0 cgrp=default sched=0/0 handle=0x92238920
| state=S schedstat=( 817332233 838592314 1765 ) utm=68 stm=13 core=2 HZ=100
| stack=0x92136000-0x92138000 stackSize=1038KB
| held mutexes=
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:22667)
- waiting to lock <0x04cc3347> (a com.android.server.am.ActivityManagerService) held by thread 11
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:207)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

// The lock that android.fg is waiting for is held by thread 11.

----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"ActivityManager" prio=5 tid=11 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c532e0 self=0x9de61f00
| sysTid=999 nice=-2 cgrp=default sched=0/0 handle=0x92751920
| state=S schedstat=( 12393939024 9340072944 12993 ) utm=679 stm=560 core=2 HZ=100
| stack=0x9264f000-0x92651000 stackSize=1038KB
| held mutexes=
at com.android.server.pm.PackageManagerService.prepareUserData(PackageManagerService.java:20141)
- waiting to lock <0x0d65c6e0> (a java.lang.Object) held by thread 23
at com.android.server.pm.UserManagerService.onBeforeUnlockUser(UserManagerService.java:2883)
at com.android.server.am.UserController.finishUserUnlocking(UserController.java:292)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.unlockUserCleared(UserController.java:973)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.maybeUnlockUser(UserController.java:938)
at com.android.server.am.UserController.finishUserBoot(UserController.java:267)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.finishUserBoot(UserController.java:222)
at com.android.server.am.UserController.finishUserSwitch(UserController.java:177)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:1739)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:3902)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:3941)
at android.os.Handler.dispatchMessage(Handler.java:110)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

// This is the call stack of thread 11. It is also waiting for a lock, and that lock is held by thread 23.

Check the state of thread 23:

----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"PackageManager" prio=5 tid=23 Native
| group="main" sCount=1 dsCount=0 obj=0x12e4b380 self=0x9de65b00
| sysTid=1015 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x9183b920
| state=S schedstat=( 3538952919 2746466081 4616 ) utm=294 stm=59 core=1 HZ=100
| stack=0x91739000-0x9173b000 stackSize=1038KB
| held mutexes=
at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(PackageDexOptimizer.java:257)
at com.android.server.pm.PackageDexOptimizer.performDexOpt(PackageDexOptimizer.java:101)
- locked <0x0d65c6e0> (a java.lang.Object)
at com.android.server.pm.PackageManagerService.installPackageLI(PackageManagerService.java:15729)
at com.android.server.pm.PackageManagerService.installPackageTracedLI(PackageManagerService.java:15394)
at com.android.server.pm.PackageManagerService.-wrap25(PackageManagerService.java:-1)
at com.android.server.pm.PackageManagerService$9.run(PackageManagerService.java:12798)
- locked <0x0d65c6e0> (a java.lang.Object)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

The stacks above give the following dependency chain:

984(system_server):1004(android.fg)-->984(system_server):999(ActivityManager)-->984(system_server):1015(PackageManager)
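
Following `held by thread N` from one thread block to the next is how the chain above was built, and it can be scripted for larger dumps. Below is a minimal sketch, assuming the ART trace layout shown in this case; `LockChain` is a hypothetical name introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of Android): follows "held by thread N" links in a trace dump. */
final class LockChain {
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\btid=(\\d+)\\s+\\w+$");
    private static final Pattern HELD_BY =
            Pattern.compile("waiting to lock <.*held by thread (\\d+)");

    /**
     * Returns thread names from startTid to the last lock holder.
     * If a name appears twice, the chain loops back on itself (a deadlock).
     */
    static List<String> follow(List<String> dump, int startTid) {
        Map<Integer, String> names = new HashMap<>();
        Map<Integer, Integer> waitsFor = new HashMap<>();
        int current = -1;
        for (String raw : dump) {
            String line = raw.trim();
            Matcher h = HEADER.matcher(line);
            if (h.matches()) {
                current = Integer.parseInt(h.group(2));
                names.put(current, h.group(1));
                continue;
            }
            Matcher w = HELD_BY.matcher(line);
            if (current >= 0 && w.find()) {
                waitsFor.putIfAbsent(current, Integer.parseInt(w.group(1)));
            }
        }
        List<String> chain = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Integer tid = startTid;
        while (tid != null && names.containsKey(tid)) {
            chain.add(names.get(tid));
            if (!seen.add(tid)) {
                break; // thread already visited: lock cycle
            }
            tid = waitsFor.get(tid);
        }
        return chain;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
                "\"android.fg\" prio=5 tid=16 Blocked",
                "- waiting to lock <0x04cc3347> (a com.android.server.am.ActivityManagerService) held by thread 11",
                "\"ActivityManager\" prio=5 tid=11 Blocked",
                "- waiting to lock <0x0d65c6e0> (a java.lang.Object) held by thread 23",
                "\"PackageManager\" prio=5 tid=23 Native");
        System.out.println(follow(dump, 16));
    }
}
```

The last thread in the returned list is the one to investigate: it holds a lock but is not itself waiting for one, so whatever it is doing (here, a socket read in PackageManager) is the real cause of the stall.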

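So the behavior of PackageManager is what needs examining. Its backtrace shows it blocked in a socket read. Why does that read take so long? Further down the same stack, before the read, are the package install and dex2oat optimization steps.

That work is therefore the most likely reason the call stack below ran long enough to cause the SWT: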

at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(Packag

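As analyzed above, the SWT is a timeout while waiting for a lock. The likely cause is that dex2oat and related steps take a long time when certain apps are installed. The dex2oat log around the time of the SWT: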

07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:46.360 12505 12505 I dex2oat : Large app, accepted running with swap.
07-15 15:12:50.775 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.biz.pubaccount.AccountDetail.activity.EqqAccountDetailActivity.l() would have incorrectly overridden the package-private method in com.tencent.biz.pubaccount.AccountDetailActivity
07-15 15:12:51.150 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.mobileqq.activity.fling.ContentWrapView.ensureTransformationInfo() would have incorrectly overridden the package-private method in android.view.View
07-15 15:13:19.736 12505 12505 W dex2oat : Verification of java.util.List com.tencent.mobileqq.activity.contact.addcontact.AddContactsView.b() took 151.200ms
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)
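
To check that this dex2oat run really overlaps the watchdog window, compare the log timestamps directly. Below is a minimal sketch using `java.time`; `LogElapsed` is a hypothetical helper. Logcat timestamps carry no year, so one is supplied by the caller (2017 here, taken from the trace dump header).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper: elapsed time between two logcat timestamps ("MM-dd HH:mm:ss.SSS"). */
final class LogElapsed {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

    static long elapsedMillis(int year, String start, String end) {
        LocalDateTime a = LocalDateTime.parse(year + "-" + start, FORMAT);
        LocalDateTime b = LocalDateTime.parse(year + "-" + end, FORMAT);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // First and last dex2oat lines of pid 12505.
        long dex2oatSpan = elapsedMillis(2017, "07-15 15:12:36.600", "07-15 15:13:54.697");
        // dex2oat start to the watchdog report at 15:13:52.132.
        long untilWatchdog = elapsedMillis(2017, "07-15 15:12:36.600", "07-15 15:13:52.132");
        System.out.println("dex2oat log span: " + dex2oatSpan + " ms");
        System.out.println("dex2oat start to watchdog report: " + untilWatchdog + " ms");
    }
}
```

With the timestamps from this case, the dex2oat log spans 78097 ms, which agrees with the reported "dex2oat took 78.103s", and the watchdog report comes 75532 ms after dex2oat started. dex2oat was therefore still running, with PackageManager holding its lock, for the whole 60s watchdog window.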

Individual APKs that fail to install, install slowly, or launch slowly because of dex2oat can be handled on a case-by-case basis.

The apps involved above are WeChat and QQ. APKs that we have previously handled for this kind of problem include QQ, WeChat and GMS.
