Android WatchDog: How It Works

Overview

Understanding how WatchDog works makes it easier to understand how the system services run.

Analysis

1.Watchdog extends Thread

Watchdog is a thread.

2.Started in SystemServer.java

private void startOtherServices() {
    ······
    traceBeginAndSlog("InitWatchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    traceEnd();
    ······
    traceBeginAndSlog("StartWatchdog");
    Watchdog.getInstance().start();
    traceEnd();
}
Because Watchdog is a thread, calling start() is all it takes to get it running.

3.The WatchDog constructor

private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mOpenFdMonitor = OpenFdMonitor.create();

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        // mtk enhance
        exceptionHWT = new ExceptionLog();
    }
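1.Two objects deserve attention: mMonitorChecker and mHandlerCheckers.

2.Where the entries in the mHandlerCheckers list come from:
1)Added in the constructor: FgThread, the main thread, UiThread, IoThread and DisplayThread
2)Added from outside: Watchdog.getInstance().addThread(handler);

3.Where the entries in mMonitorChecker's monitor list (mMonitors) come from:
Added from outside: Watchdog.getInstance().addMonitor(monitor);
Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());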

4.The WatchDog run method

public void run() {
        boolean waitedHalf = false;
        boolean mSFHang = false;
        while (true) {
            ······
            synchronized (this) {
                ······
                for (int i=0; i
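
The body of the loop is not shown in full here. As a rough illustration of the idea behind the waitedHalf flag, the sketch below classifies each checker by how long its check has been pending and lets the worst state decide what happens next. It is a simplified sketch, not the AOSP code: the class name, the method names and every state name other than WAITED_HALF are assumptions made for this example.

```java
import java.util.List;

// Illustrative sketch only; class and method names are assumptions, not AOSP code.
final class CheckerStateSketch {
    static final int COMPLETED = 0;   // the check finished
    static final int WAITING = 1;     // pending for less than half the timeout
    static final int WAITED_HALF = 2; // pending for at least half the timeout
    static final int OVERDUE = 3;     // pending for the full timeout or longer

    /** Classifies one checker from its completion flag and how long it has been pending. */
    static int stateOf(boolean completed, long startMs, long nowMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        long latency = nowMs - startMs;
        if (latency < waitMaxMs / 2) {
            return WAITING;
        }
        if (latency < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** The watchdog acts on the worst state reported by any checker. */
    static int worstOf(List<Integer> states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

With an assumed timeout of 60000 ms, a check that has been pending for 30000 ms is in the WAITED_HALF state, which is the point at which a first set of stack traces would be collected before the process is given the rest of the timeout.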

5.HandlerChecker's scheduleCheckLocked

public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
        }

        if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
        }
        
        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
}
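
The posting step in scheduleCheckLocked can be tried outside Android. The sketch below is a plain-Java analogy, not the Watchdog implementation: a single-threaded ExecutorService stands in for the Handler thread, and QueueProbeSketch and isResponsive are hypothetical names. One difference to keep in mind is that an executor appends the probe to the tail of its queue, whereas scheduleCheckLocked posts at the front.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: a single-threaded ExecutorService stands in for a Handler thread.
final class QueueProbeSketch {
    /** Enqueues a no-op probe and reports whether the worker ran it within timeoutMs. */
    static boolean isResponsive(ExecutorService worker, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        worker.execute(done::countDown);
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If the worker is busy with a task that never returns, the probe is never dequeued and isResponsive reports false once the timeout elapses, which mirrors a checker whose mCompleted flag stays false.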

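1.When mMonitors.size() == 0,
the goal is only to find out whether the thread behind an entry of mHandlerCheckers is stuck. The shortcut is mHandler.getLooper().getQueue().isPolling(): if the looper is idle and polling for messages, the thread cannot be blocked, so the check is marked complete without posting anything. Otherwise the checker posts itself to the queue and waits for run() to set mCompleted.

2.mMonitorChecker always has at least one monitor (the constructor adds BinderThreadMonitor), so it never takes the shortcut. What matters here is mHandler.postAtFrontOfQueue(this), which makes the handler thread execute the checker's run():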
public void run() {
       final int size = mMonitors.size();
       for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
       }

       synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
       }
}
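The technique here is to call the monitor() method of every registered monitor.
1)The loop walks mMonitors, and the only checker with a non-empty mMonitors is mMonitorChecker. System services join that list through addMonitor, for example: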
ActivityManagerService.java
    Watchdog.getInstance().addMonitor(this); 

InputManagerService.java
    Watchdog.getInstance().addMonitor(this); 

PowerManagerService.java
    Watchdog.getInstance().addMonitor(this); 

WindowManagerService.java
    Watchdog.getInstance().addMonitor(this); 
The monitor() method that gets called is very simple. In ActivityManagerService, for example:
public void monitor() {
     synchronized (this) { }
}
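This only checks whether the service's lock can be acquired. If another thread has been holding that lock for too long, monitor() blocks and the checker never completes.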
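
Why an empty synchronized block is enough can be shown with a small plain-Java sketch. FakeService and monitorReturns are hypothetical names made up for this example. Because Java monitors are reentrant, the probe has to run on a different thread from the one holding the lock, which is also why Watchdog runs monitors on its own checker thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; FakeService is a stand-in, not a real system service.
final class LockProbeSketch {
    /** A service whose monitor() only acquires and releases its own lock. */
    static final class FakeService {
        void monitor() {
            synchronized (this) { }
        }
    }

    /** Runs service.monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean monitorReturns(FakeService service, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            service.monitor();
            done.countDown();
        }, "probe");
        probe.setDaemon(true);
        probe.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

While some other thread is inside synchronized (service), monitorReturns times out; as soon as the lock is free, the same call returns almost immediately.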

2)Special case: what does BinderThreadMonitor check?
It is a nested class of Watchdog:
private static final class BinderThreadMonitor implements Watchdog.Monitor {
        @Override
        public void monitor() {
            Binder.blockUntilThreadAvailable();
        }
}

android.os.Binder.java
public static final native void blockUntilThreadAvailable();

android_util_Binder.cpp
static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
{
    return IPCThreadState::self()->blockUntilThreadAvailable();
}

IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}
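This only checks that the number of Binder threads currently executing in the process stays below mMaxThreads. Once the count reaches the maximum (31 for system_server), the caller has to wait.
Why 31:
ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15
but SystemServer.java overrides the default: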
// maximum number of binder threads used for system_server
// will be higher than the system default
private static final int sMaxBinderThreads = 31;
private void run() {
    ······
    BinderInternal.setMaxThreads(sMaxBinderThreads);
    ······
}
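
The wait-until-a-thread-is-free logic can be mirrored in Java with a lock and a condition. This is a sketch under stated assumptions, not the Binder implementation: ThreadPoolGateSketch and its methods are hypothetical, and a timeout is added so the behaviour can be observed without hanging, whereas the native blockUntilThreadAvailable waits indefinitely.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; models the counter and condition, not the Binder driver.
final class ThreadPoolGateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition threadFreed = lock.newCondition();
    private final int maxThreads;
    private int executingThreads;

    ThreadPoolGateSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** Marks one more pool thread as busy. */
    void beginCall() {
        lock.lock();
        try {
            executingThreads++;
        } finally {
            lock.unlock();
        }
    }

    /** Marks one pool thread as free again and wakes any waiter. */
    void endCall() {
        lock.lock();
        try {
            executingThreads--;
            threadFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits until executingThreads < maxThreads; false if timeoutMs elapses first. */
    boolean awaitThreadAvailable(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (executingThreads >= maxThreads) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = threadFreed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

With a gate created for 31 threads, awaitThreadAvailable keeps returning true until 31 calls are in flight at once, which is the situation in which BinderThreadMonitor.monitor() would stall the watchdog's checker.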

6.What does WatchDog do after a timeout?

public void run() {
    ······
    Process.killProcess(Process.myPid());
    System.exit(10);
    ······
}
It kills the process it runs in (system_server) and exits.

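7.Questions

1).What logs does WatchDog write?

(1)process stack traces

The output path is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr/.
ActivityManagerService.dumpStackTraces(true, pids, null, null, getInterestingNativePids());
Note: process stack traces are also dumped at the halfway point of the timeout, i.e. in the WAITED_HALF state.

(2)slog

System log, written through android.util.Slog (a hidden class, marked @hide)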

Slog.e(TAG, "**SWT happen **" + subject); 

Slog.v(TAG, "** save all info before killnig system server **"); 

Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject); 

Slog.w(TAG, "*** GOODBYE!");

(3)event log

EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);

(4)kernel stack traces

The output path is controlled by dalvik.vm.stack-trace-file and is normally /data/anr/.
if (RECORD_KERNEL_THREADS) {
   dumpKernelStackTraces();
}
private File dumpKernelStackTraces() {
        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
        if (tracesPath == null || tracesPath.length() == 0) {
            return null;
        }

        native_dumpKernelStacks(tracesPath);
        return new File(tracesPath);
}

(5)dropbox

Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
     public void run() {
            Slog.v(TAG, "** start addErrorToDropBox **");
            mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name.isEmpty() ? subject : name, null, stack, null);
            }
};
dropboxThread.start();
Note:
dropbox entries are normally stored under /data/system/dropbox, because of this constructor:
DropBoxManagerService.java
public DropBoxManagerService(final Context context) {
        this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}

2).Why monitor UiThread, IoThread, DisplayThread and FgThread?

First, all four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

/**
 * Shared singleton thread for showing UI.  This is a foreground thread, and in
 * additional should not have operations that can take more than a few ms scheduled
 * on it to avoid UI jank.
 */
public final class UiThread extends ServiceThread {
    private static final long SLOW_DISPATCH_THRESHOLD_MS = 100;
    private static UiThread sInstance;
    private static Handler sHandler;

    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    @Override
    public void run() {
        // Make sure UiThread is in the fg stune boost group
        Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
        super.run();
    }

    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new UiThread();
            sInstance.start();
            final Looper looper = sInstance.getLooper();
            looper.setTraceTag(Trace.TRACE_TAG_ACTIVITY_MANAGER);
            looper.setSlowDispatchThresholdMs(SLOW_DISPATCH_THRESHOLD_MS);
            sHandler = new Handler(sInstance.getLooper());
        }
    }

    public static UiThread get() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    public static Handler getHandler() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sHandler;
        }
    }
}
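1.get() returns the singleton instance.
2.getHandler() returns the Handler bound to that thread.
3.Note that ensureThreadLocked() calls start() right after creating the instance, so the thread is already running by the time the object is handed out.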
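
The create-and-start-on-first-use pattern can be reduced to a few lines of plain Java. LazyWorkerThread is a hypothetical class written for this example; it has no Looper or Handler and only sleeps where a real ServiceThread would process messages.

```java
// Illustrative sketch only; LazyWorkerThread is a hypothetical class, not part of Android.
final class LazyWorkerThread extends Thread {
    private static LazyWorkerThread sInstance;

    private LazyWorkerThread() {
        super("sketch.worker");
        setDaemon(true); // keep the sketch from blocking JVM exit
    }

    @Override
    public void run() {
        // Stand-in for a message loop: stay alive until interrupted.
        try {
            while (true) {
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            // Interrupted: let the thread end.
        }
    }

    /** Creates and starts the thread on first use, then always returns the same instance. */
    static synchronized LazyWorkerThread get() {
        if (sInstance == null) {
            sInstance = new LazyWorkerThread();
            sInstance.start();
        }
        return sInstance;
    }
}
```

Every caller of get() receives the same, already running thread, so no caller ever has to remember to start it.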

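Second, all four classes extend ServiceThread, which in turn extends HandlerThread. The Handler on each thread is what matters here, because system services such as ActivityManagerService, WMS and PMS all post work to these threads.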

final class UiHandler extends Handler {
        public UiHandler() {
            super(com.android.server.UiThread.get().getLooper(), null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
            case SHOW_ERROR_UI_MSG: {
                mAppErrors.handleShowAppErrorUi(msg);
                ensureBootCompleted();
            } break;
            ······
        }
}
1.UiHandler takes its Looper directly from UiThread. A thread has one Looper and one MessageQueue, but any number of Handlers can be attached to it.
2.The handleMessage code shows that UI updates do not have to happen on the main thread.

Finally, what are the differences between UiThread, IoThread, DisplayThread and FgThread?

a.Different thread names:
android.ui, android.io, android.display and android.fg respectively

b.Different thread priorities:
UiThread-->Process.THREAD_PRIORITY_FOREGROUND
IoThread, FgThread-->android.os.Process.THREAD_PRIORITY_DEFAULT
DisplayThread-->Process.THREAD_PRIORITY_DISPLAY + 1

c.Slightly different users:
UiThread --> ActivityManagerService
DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService

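8.Summary

1.The core objects of Watchdog are mHandlerCheckers and mMonitorChecker.

mHandlerCheckers: detects whether a thread's message queue is blocked.

mMonitorChecker: detects whether a core system service has held its lock for too long.

2.The checkers in mHandlerCheckers use mHandler.getLooper().getQueue().isPolling() as a fast path and otherwise post themselves to the queue to detect a timeout. mMonitorChecker detects a timeout by trying to enter synchronized (this) in each monitor. BinderThreadMonitor is the special case: it detects a timeout by checking whether the number of executing Binder threads has reached the process maximum.

3.After a timeout the system writes a series of logs, which can be analyzed together to find the cause.

4.After a timeout Watchdog kills its own process, so the system_server pid changes.

5.Extension: could the same approach be used to detect similar problems in our own apps?

9.References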

https://blog.csdn.net/xiaosayidao/article/details/75453195
