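In Android, the core Java system services all run inside the SystemServer process. How can the system check in real time that every one of these services is still running normally? That is the job of Android's software Watchdog. The Watchdog has two responsibilities:
1). Receive reboot requests from inside the system and restart the system.
2). Guard the SystemServer process against deadlock.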
[Figure: Android Watchdog class diagram]
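Watchdog extends Thread, so the monitoring loop runs on its own thread. The Watchdog thread does not have a message queue of its own, however; it shares the message queue of the SystemServer main thread. Watchdog has a member variable mMonitors, a dynamic array of Monitor objects that holds every object the Watchdog watches. Monitor is declared as an interface, and any service that wants to be watched must implement it. The HeartbeatHandler class is the core of the Watchdog and is responsible for checking each monitored object.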
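Starting the Watchdog
The Watchdog is initialized and started inside the SystemServer process. When SystemServer starts, it registers and starts the Android services, and it initializes and starts the Watchdog as part of the same sequence.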
- Slog.i(TAG, "Init Watchdog");
- Watchdog.getInstance().init(context, battery, power, alarm,ActivityManagerService.self());
- Watchdog.getInstance().start();
Watchdog creates its instance with the singleton pattern:
- public static Watchdog getInstance() {
-     if (sWatchdog == null) {
-         sWatchdog = new Watchdog();
-     }
-     return sWatchdog;
- }
The Watchdog constructor
- private Watchdog() {
-     super("watchdog");
-     mHandler = new HeartbeatHandler();
- }
Constructing the Watchdog creates a HeartbeatHandler, which handles the MONITOR messages sent by the Watchdog thread. Next, the Watchdog's init method is called to initialize the object:
- public void init(Context context, BatteryService battery,
-         PowerManagerService power, AlarmManagerService alarm,
-         ActivityManagerService activity) {
-     mResolver = context.getContentResolver();
-     mBattery = battery;
-     mPower = power;
-     mAlarm = alarm;
-     mActivity = activity;
-
-     // Receiver for the reboot PendingIntent fired through the alarm service.
-     context.registerReceiver(new RebootReceiver(), new IntentFilter(REBOOT_ACTION));
-     mRebootIntent = PendingIntent.getBroadcast(context, 0, new Intent(REBOOT_ACTION), 0);
-
-     // Receiver for reboot requests; senders must hold the REBOOT permission.
-     context.registerReceiver(new RebootRequestReceiver(),
-             new IntentFilter(Intent.ACTION_REBOOT),
-             android.Manifest.permission.REBOOT, null);
-     mBootTime = System.currentTimeMillis();
- }
RebootReceiver receives the PendingIntent fired by AlarmManagerService and reboots the system.
RebootRequestReceiver receives the reboot Intent sent from inside the system and reboots the system.
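Adding monitored objects
The objects to be watched must be added to the Watchdog before it is started. In Android 4.1, seven services implement the Watchdog.Monitor interface and can therefore be monitored by the Watchdog: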
ActivityManagerService
InputManagerService
MountService
NativeDaemonConnector
NetworkManagementService
PowerManagerService
WindowManagerService
Watchdog provides the addMonitor method for adding a monitored object:
- public void addMonitor(Monitor monitor) {
-     synchronized (this) {
-         if (isAlive()) {
-             throw new RuntimeException("Monitors can't be added while the Watchdog is running");
-         }
-         mMonitors.add(monitor);
-     }
- }
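Adding a monitor simply appends the object to mMonitors, the Watchdog's dynamic array of monitors. Because of the isAlive() check, this has to happen before the Watchdog thread is started.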
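The registration rule can be tried outside Android with a small plain-Java model. This is only a sketch: SketchMonitor and SketchWatchdog are hypothetical stand-ins invented for illustration, not the Android classes, and the thread body just sleeps instead of monitoring anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Watchdog.Monitor; not the Android interface. */
interface SketchMonitor {
    void monitor();
}

/** Hypothetical watchdog thread that accepts monitors only before it is started. */
class SketchWatchdog extends Thread {
    private final List<SketchMonitor> monitors = new ArrayList<>();

    SketchWatchdog() {
        super("sketch-watchdog");
        setDaemon(true); // do not keep the JVM alive in this demo
    }

    /** Same shape as addMonitor above: reject additions once the thread is alive. */
    public void addMonitor(SketchMonitor monitor) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Monitors can't be added while the Watchdog is running");
            }
            monitors.add(monitor);
        }
    }

    public synchronized int monitorCount() {
        return monitors.size();
    }

    @Override
    public void run() {
        // Placeholder body: stay alive until interrupted.
        try {
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException e) {
            // Interrupted: let the thread end.
        }
    }
}
```

A monitor added before start() is stored; calling addMonitor() after start() throws the RuntimeException, which is why the services register themselves before the Watchdog thread is started.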
The Watchdog monitoring loop
Calling Watchdog.getInstance().start() starts the Watchdog thread, which runs the following:
- public void run() {
-     boolean waitedHalf = false;
-     while (true) {
-         // Start a new round: clear the completion flag.
-         mCompleted = false;
-         // Ask the HeartbeatHandler to check every monitor.
-         if (mHandler.sendEmptyMessage(MONITOR)) {
-             if (WATCHDOG_DEBUG) Slog.v(TAG,"**** -1-Watchdog MSG SENT! ****");
-         }
-         synchronized (this) {
-             long timeout = TIME_TO_WAIT;
-             // Wait one full TIME_TO_WAIT interval, recomputing the remaining
-             // time after every wakeup so an early wakeup does not shorten it.
-             long start = SystemClock.uptimeMillis();
-             while (timeout > 0 && !mForceKillSystem) {
-                 try {
-                     wait(timeout);
-                 } catch (InterruptedException e) {
-                     Log.wtf(TAG, e);
-                 }
-                 timeout = TIME_TO_WAIT - (SystemClock.uptimeMillis() - start);
-             }
-             // The handler finished in time: start a fresh round.
-             if (mCompleted && !mForceKillSystem) {
-                 waitedHalf = false;
-                 continue;
-             }
-             // First missed interval: dump stack traces, then wait one more interval.
-             if (!waitedHalf) {
-                 ArrayList<Integer> pids = new ArrayList<Integer>();
-                 pids.add(Process.myPid());
-                 ActivityManagerService.dumpStackTraces(true, pids, null, null,NATIVE_STACKS_OF_INTEREST);
-                 SystemClock.sleep(3000);
-                 if (RECORD_KERNEL_THREADS) {
-                     dumpKernelStackTraces();
-                     SystemClock.sleep(2000);
-                 }
-                 waitedHalf = true;
-                 continue;
-             }
-         }
-         // Second missed interval in a row: a monitored service is presumed hung.
-         final String name = (mCurrentMonitor != null) ?mCurrentMonitor.getClass().getName() : "null";
-         Slog.w(TAG, "*** WATCHDOG IS GOING TO KILL SYSTEM PROCESS: " + name);
-         EventLog.writeEvent(EventLogTags.WATCHDOG, name);
-         ArrayList<Integer> pids = new ArrayList<Integer>();
-         pids.add(Process.myPid());
-         if (mPhonePid > 0) pids.add(mPhonePid);
-         // Dump the stacks of the collected processes once more.
-         final File stack = ActivityManagerService.dumpStackTraces(
-                 !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
-         SystemClock.sleep(3000);
-         if (RECORD_KERNEL_THREADS) {
-             dumpKernelStackTraces();
-             SystemClock.sleep(2000);
-         }
-         // Write the report to DropBox on a separate thread and wait at most 2 s,
-         // so a stuck ActivityManagerService cannot block the Watchdog as well.
-         Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
-             public void run() {
-                 mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null,
-                         name, null, stack, null);
-             }
-         };
-         dropboxThread.start();
-         try {
-             dropboxThread.join(2000);
-         } catch (InterruptedException ignored) {}
-         // Kill the process only when no debugger is attached.
-         if (!Debug.isDebuggerConnected()) {
-             Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + name);
-             Process.killProcess(Process.myPid());
-             System.exit(10);
-         } else {
-             Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
-         }
-         waitedHalf = false;
-     }
- }
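The two-interval logic of the loop above can be modelled in plain Java without any Android classes. The following is a minimal sketch under stated assumptions: HalfIntervalWatchdog, detectHang, and the healthyRounds parameter are hypothetical names invented here, an ExecutorService stands in for the main-thread handler, and the model returns true where the real code would dump stacks and kill the process.

```java
import java.util.concurrent.ExecutorService;

/** Hypothetical plain-Java model of the two-interval check; not the Android class. */
class HalfIntervalWatchdog {
    private final ExecutorService handlerThread; // stands in for the main-thread handler
    private final Runnable checkAllMonitors;     // stands in for the MONITOR handling
    private final long timeToWaitMs;             // plays the role of TIME_TO_WAIT
    private boolean completed;                   // guarded by this

    HalfIntervalWatchdog(ExecutorService handlerThread, Runnable checkAllMonitors,
                         long timeToWaitMs) {
        this.handlerThread = handlerThread;
        this.checkAllMonitors = checkAllMonitors;
        this.timeToWaitMs = timeToWaitMs;
    }

    /** Returns true after two consecutive missed intervals, false after healthyRounds clean rounds. */
    boolean detectHang(int healthyRounds) {
        boolean waitedHalf = false;
        boolean interrupted = false;
        int clean = 0;
        try {
            while (clean < healthyRounds) {
                synchronized (this) { completed = false; }
                handlerThread.execute(() -> {
                    checkAllMonitors.run(); // blocks here if a monitored lock is stuck
                    synchronized (HalfIntervalWatchdog.this) { completed = true; }
                });
                synchronized (this) {
                    long start = System.nanoTime();
                    long timeout = timeToWaitMs;
                    // Always wait the full interval, recomputing what is left after each wakeup.
                    while (timeout > 0) {
                        try {
                            wait(timeout);
                        } catch (InterruptedException e) {
                            interrupted = true; // keep waiting, as the original loop does
                        }
                        timeout = timeToWaitMs - (System.nanoTime() - start) / 1_000_000L;
                    }
                    if (completed) { waitedHalf = false; clean++; continue; }
                    if (!waitedHalf) { waitedHalf = true; continue; } // first miss: wait again
                }
                return true; // second miss in a row: treat as a hang
            }
            return false;
        } finally {
            if (interrupted) Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    }
}
```

With a single-thread executor and a check that returns immediately, detectHang reports no hang; with a check that blocks indefinitely, the second posted task queues up behind the first one and the method returns true after two intervals.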
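Watchdog's run method is fairly simple. On each round it sets mCompleted to false and asks the heartbeat handler to check the monitors. The handler calls monitor() on every registered service, and once all of those calls have returned it sets mCompleted to true. If run() still finds mCompleted false after two consecutive waits, the system is considered hung. The Watchdog thread itself only sends the MONITOR message periodically, which serves as "feeding the dog"; the actual checking of the service objects is done on the SystemServer main thread: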
- public void handleMessage(Message msg) {
-     switch (msg.what) {
-         case MONITOR: {
-             if (WATCHDOG_DEBUG) Slog.v(TAG, " **** 0-CHECK IF FORCE A REBOOT ! **** ");
-
-             // Use the requested reboot interval if one was set, else the value from settings.
-             int rebootInterval = mReqRebootInterval >= 0
-                     ? mReqRebootInterval : Settings.Secure.getInt(
-                     mResolver, Settings.Secure.REBOOT_INTERVAL,
-                     REBOOT_DEFAULT_INTERVAL);
-             if (mRebootInterval != rebootInterval) {
-                 mRebootInterval = rebootInterval;
-                 // The interval changed, so re-evaluate the reboot schedule.
-                 checkReboot(false);
-             }
-
-             if (WATCHDOG_DEBUG) Slog.v(TAG, " **** 1-CHECK ALL MONITORS BEGIN ! **** ");
-             // Call each monitor in turn; a call blocks if that service's lock is stuck.
-             final int size = mMonitors.size();
-             for (int i = 0 ; i < size ; i++) {
-                 mCurrentMonitor = mMonitors.get(i);
-                 mCurrentMonitor.monitor();
-             }
-             if (WATCHDOG_DEBUG) Slog.v(TAG, " **** 2-CHECK ALL MONITORS FINISHED ! **** ");
-             // Every monitor returned: report completion to the Watchdog thread.
-             synchronized (Watchdog.this) {
-                 mCompleted = true;
-                 mCurrentMonitor = null;
-             }
-             if (WATCHDOG_DEBUG) Slog.v(TAG, " **** 3-SYNC Watchdog.THIS FINISHED ! ****");
-             if (WATCHDOG_DEBUG) Slog.v(TAG, " ");
-         } break;
-     }
- }
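The handler records which monitor it is currently calling, which is how the Watchdog thread can name the suspect in its log message. A minimal plain-Java sketch of that bookkeeping follows. MonitorSweep and its methods are hypothetical names invented here, monitors are keyed by a string name rather than by class name, and every access to the shared fields is synchronized, which differs from the original, where mCurrentMonitor is written inside the loop without holding the lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the MONITOR pass; not the Android HeartbeatHandler. */
class MonitorSweep {
    private final Map<String, Runnable> monitors = new LinkedHashMap<>();
    private String current;    // guarded by this; null when no call is in flight
    private boolean completed; // guarded by this

    void add(String name, Runnable monitor) {
        monitors.put(name, monitor);
    }

    /** Calls every monitor in registration order, then marks the pass complete. */
    void sweep() {
        for (Map.Entry<String, Runnable> entry : monitors.entrySet()) {
            synchronized (this) { current = entry.getKey(); }
            entry.getValue().run(); // blocks here if this monitor's lock is stuck
        }
        synchronized (this) {
            completed = true;
            current = null;
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    /** Name of the monitor being called right now, or "null" as in the log line above. */
    synchronized String blamedMonitor() {
        return current != null ? current : "null";
    }
}
```

While a monitor is blocked, blamedMonitor() returns its name and isCompleted() stays false; once the sweep finishes, the name is cleared and the flag is set.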
Every object registered with the Watchdog must implement the Watchdog.Monitor interface and its monitor method. What do the monitored services actually do in monitor()? ActivityManagerService implements it as follows:
- public void monitor() {
-     synchronized (this) { }
- }
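The other methods of ActivityManagerService all synchronize on the ActivityManagerService object itself, and monitor() simply requests that same lock. If the service is healthy, that is, no thread deadlock has occurred, the lock is acquired almost immediately and monitor() returns. If a deadlock has occurred and some other method never releases the lock, monitor() cannot acquire it and never returns. The HeartbeatHandler then fails to set mCompleted in time, which tells the Watchdog thread that a monitored object is misbehaving, and the Watchdog thread kills the SystemServer process.
The result is a chain of supervision. SystemServer watches the important services, and if one of them hangs, SystemServer dies. Zygote notices that SystemServer has died, so it dies too and takes the whole Java world down with it. init notices that Zygote has died and restarts it, after which SystemServer and the services are brought back to life.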
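The lock probe can be observed in isolation with a small plain-Java model. This is a sketch only: LockProbeService, holdLock, and probe are hypothetical names invented for illustration, and the probe helper uses a timeout in place of the Watchdog's wait loop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical service whose methods synchronize on the service object itself. */
class LockProbeService {
    /** Simulates a service method that keeps the service lock until released. */
    public void holdLock(CountDownLatch acquired, CountDownLatch release) {
        synchronized (this) {
            acquired.countDown(); // signal that the lock is now held
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Same shape as the monitor() shown above: acquire the lock and release it at once. */
    public void monitor() {
        synchronized (this) { }
    }

    /** Hypothetical helper: true if monitor() returned within timeoutMs. */
    static boolean probe(LockProbeService service, long timeoutMs) {
        CountDownLatch returned = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            service.monitor();
            returned.countDown();
        }, "lock-probe");
        t.setDaemon(true); // a blocked probe must not keep the JVM alive
        t.start();
        try {
            return returned.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

When no other thread holds the lock, probe() returns true almost immediately. While another thread sits inside holdLock(), probe() times out and returns false, which corresponds to the heartbeat handler never reaching the point where it sets mCompleted.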