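[Based on Android P]
First, look at MTK's explanation of how the watchdog works:
This is only an overview before we start; the detailed code walkthrough follows.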
private void startOtherServices() {
final Context context = mSystemContext;
...
try{
...
traceBeginAndSlog("InitWatchdog");
// [2] Instantiate
final Watchdog watchdog = Watchdog.getInstance();
// [3] Initialize
watchdog.init(context, mActivityManagerService);
traceEnd();
...
traceBeginAndSlog("StartWatchdog");
// [4] Start
Watchdog.getInstance().start();
traceEnd();
...
}catch (RuntimeException e) {
Slog.e("System", "******************************************");
Slog.e("System", "************ Failure starting core service", e);
}
...
}
Watchdog extends Thread, is instantiated as a singleton, and is initialized by calling its own init method.
public static Watchdog getInstance() {
if (sWatchdog == null) {
sWatchdog = new Watchdog();
}
return sWatchdog;
}
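The singleton pattern used by getInstance() can be sketched in plain Java as follows. This is a minimal illustration, not the AOSP class: DemoWatchdog is a hypothetical name, and the synchronized keyword is an addition of this sketch (the getInstance() shown above is unsynchronized and relies on being first called during system_server startup).

```java
// Minimal sketch of a lazily created singleton thread.
// DemoWatchdog is a hypothetical name used only for illustration.
final class DemoWatchdog extends Thread {
    private static DemoWatchdog sInstance;

    // Private constructor: the only way to obtain an instance is getInstance().
    private DemoWatchdog() {
        super("watchdog");
    }

    // synchronized is an assumption of this sketch; it makes the lazy
    // creation safe even if two threads call getInstance() concurrently.
    static synchronized DemoWatchdog getInstance() {
        if (sInstance == null) {
            sInstance = new DemoWatchdog();
        }
        return sInstance;
    }

    @Override
    public void run() {
        // The monitoring loop would live here.
    }
}
```

Every caller receives the same object, which is why SystemServer can call Watchdog.getInstance() twice (once for init, once for start) and operate on one thread.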
Instantiating the watchdog:
private Watchdog() {
super("watchdog");
// Initialize a handler checker for each common thread we want to check.
// Note that we currently do not check the background thread,
// because it may hold longer-running operations
// with no guarantee about their timeliness.
// Monitor the android.fg thread
mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
"foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// Monitor the main thread
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
"main thread", DEFAULT_TIMEOUT));
// Monitor the android.ui thread
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
"ui thread", DEFAULT_TIMEOUT));
// Monitor the android.io thread
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
"i/o thread", DEFAULT_TIMEOUT));
// Monitor the android.display thread
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
"display thread", DEFAULT_TIMEOUT));
// Initialize the binder thread monitor
addMonitor(new BinderThreadMonitor());
// Create the fd monitor; the process's open descriptors are listed under /proc/self/fd/
mOpenFdMonitor = OpenFdMonitor.create();
// See the notes on DEFAULT_TIMEOUT.
assert DB ||
DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
public void init(Context context, ActivityManagerService activity) {
mResolver = context.getContentResolver();
mActivity = activity;
context.registerReceiver(new RebootRequestReceiver(),
new IntentFilter(Intent.ACTION_REBOOT),
android.Manifest.permission.REBOOT, null);
}
init() registers a receiver for the reboot broadcast, which is the so-called soft reboot.
final class RebootRequestReceiver extends BroadcastReceiver {
@Override
public void onReceive(Context c, Intent intent) {
if (intent.getIntExtra("nowait", 0) != 0) {
rebootSystem("Received ACTION_REBOOT broadcast");
return;
}
Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
}
}
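The onReceive method of RebootRequestReceiver calls rebootSystem (which invokes the reboot operation of the PowerManagerService) to restart the phone.
Because Watchdog is itself a Thread, calling its start method causes its own run method to execute on the new thread.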
Watchdog.run():
static final boolean DB = false;
static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000;
static final long CHECK_INTERVAL = DEFAULT_TIMEOUT / 2;//30s
@Override
public void run() {
boolean waitedHalf = false;
while (true) {
final List<HandlerChecker> blockedCheckers;
final String subject;
final boolean allowRestart;
int debuggerWasConnected = 0;
synchronized (this) {
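long timeout = CHECK_INTERVAL; // 30s
// Every 30s, schedule a check on every HandlerChecker (and therefore on its monitors)
for (int i = 0; i < mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();
}
if (debuggerWasConnected > 0) {
debuggerWasConnected--;
}
// Make sure the full 30s elapses before the code below runs
// (wait(timeout) may return early, for example when interrupted)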
long start = SystemClock.uptimeMillis();
while (timeout > 0) {
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
try {
wait(timeout);
} catch (InterruptedException e) {
Log.wtf(TAG, e);
}
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
}
boolean fdLimitTriggered = false;
if (mOpenFdMonitor != null) {
fdLimitTriggered = mOpenFdMonitor.monitor();
}
// Evaluate the completion state of the checkers and act accordingly
if (!fdLimitTriggered) {
// [6]
final int waitState = evaluateCheckerCompletionLocked();
if (waitState == COMPLETED) {
// Everything completed; reset and start the next round
waitedHalf = false;
continue;
} else if (waitState == WAITING) {
// Still waiting, but less than half of the timeout has elapsed
continue;
} else if (waitState == WAITED_HALF) {
if (!waitedHalf) {
// Blocked for 30s: dump the stacks of system_server and some native processes once
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
ActivityManagerService.dumpStackTraces(true, pids, null, null,
getInterestingNativePids());
waitedHalf = true;
// waitedHalf ensures the stacks are not dumped again if the state is unchanged
// on the next pass; the final dump is left to the code below.
}
continue;
}
// Otherwise the state is OVERDUE, i.e. blocked for more than 60 seconds
blockedCheckers = getBlockedCheckersLocked(); // [7]
subject = describeCheckersLocked(blockedCheckers);
} else {
blockedCheckers = Collections.emptyList();
subject = "Open FD high water mark reached";
}
allowRestart = mAllowRestart;
}
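// Reaching this point means a monitored thread in system_server has been stuck
// for more than 60s (or the open-fd limit was hit);
// dump the stacks, then kill system_server so that it restarts
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Dump the stacks of the process that is about to be killed [8]
final File stack = ActivityManagerService.dumpStackTraces(
!waitedHalf, pids, null, null, getInterestingNativePids());
// Leave a little extra time so the dump can be written out completely
SystemClock.sleep(2000);
// Ask the kernel to dump all blocked threads and the backtraces on all CPUs to the kernel log [9]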
doSysRq('w');
doSysRq('l');
// Try to add the error to the dropbox
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
subject, null, stack, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
IActivityController controller;
synchronized (this) {
controller = mController;
}
if (controller != null) {
Slog.i(TAG, "Reporting stuck state to activity controller");
try {
Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
// 1 = keep waiting, -1 = kill system
int res = controller.systemNotResponding(subject);
if (res >= 0) {
Slog.i(TAG, "Activity controller requested to coninue to wait");
waitedHalf = false;
continue;
}
} catch (RemoteException e) {
}
}
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
// Kill system_server
Process.killProcess(Process.myPid());
System.exit(10);
}
waitedHalf = false;
}
}
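The inner while (timeout > 0) loop in run() above exists because Object.wait(timeout) may return before the timeout expires (notification, interrupt, or spurious wake-up). A minimal sketch of the same technique in plain Java, assuming a hypothetical IntervalWaiter class and using System.nanoTime() in place of SystemClock.uptimeMillis():

```java
// Sketch: wait for a full interval even if wait() returns early.
// IntervalWaiter and wakeEarly() are hypothetical names for illustration.
final class IntervalWaiter {
    private final Object lock = new Object();

    /** Blocks for at least intervalMillis and returns the elapsed milliseconds. */
    long waitFullInterval(long intervalMillis) {
        final long start = System.nanoTime();
        synchronized (lock) {
            long remaining = intervalMillis;
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    // As in the watchdog loop: note the interrupt and keep waiting.
                }
                // Recompute how much of the interval is still left.
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                remaining = intervalMillis - elapsed;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Wakes the waiter early; the loop above simply goes back to waiting. */
    void wakeEarly() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }
}
```

Recomputing the remaining time from a fixed start timestamp, rather than restarting the full wait, keeps the polling period close to the intended interval no matter how often the thread is woken.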
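Watchdog.run() is the core of watchdog monitoring.
It takes a different action depending on waitState:
COMPLETED: every checker has finished; reset waitedHalf and start the next round.
WAITING: a checker is still pending but has waited less than 30s; keep waiting.
WAITED_HALF: a checker has been blocked for at least 30s; dump the stacks once and keep waiting.
OVERDUE: a checker has been blocked for 60s or more; dump the stacks and kill system_server.
The pieces it relies on are analyzed in detail below: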
public final class HandlerChecker implements Runnable {
private final Handler mHandler;
private final String mName;
private final long mWaitMax;
private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
private boolean mCompleted;
private Monitor mCurrentMonitor;
private long mStartTime;
HandlerChecker(Handler handler, String name, long waitMaxMillis) {
mHandler = handler;
mName = name;
mWaitMax = waitMaxMillis;
mCompleted = true;
}
public void addMonitor(Monitor monitor) {
mMonitors.add(monitor);
}
public void scheduleCheckLocked() {
if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
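// If this checker has no monitors (true for every checker except the android.fg one)
// and its looper is polling (idle), mark the check as completed right away.
mCompleted = true;
return;
}
if (!mCompleted) {
// The previous check has not finished yet, so just return.
return;
}
mCompleted = false;
mCurrentMonitor = null;
mStartTime = SystemClock.uptimeMillis(); // Record the start time for this checker
mHandler.postAtFrontOfQueue(this); // Post this Runnable at the very front of the message queue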
}
......
}
mHandler.postAtFrontOfQueue(this): the argument is a Runnable, so through the Handler message mechanism the run method of HandlerChecker is eventually called back on the handler's thread.
[-> Watchdog.java]
@Override
public void run() {
final int size = mMonitors.size();
for (int i = 0 ; i < size ; i++) {
synchronized (Watchdog.this) {
mCurrentMonitor = mMonitors.get(i);
}
// Call back the monitor() method of the service that implements Watchdog.Monitor
mCurrentMonitor.monitor();
}
synchronized (Watchdog.this) {
mCompleted = true;
mCurrentMonitor = null;
}
}
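The run method iterates over all registered Monitor instances; each service implements the interface's monitor() method, and once all of them have returned, mCompleted is set to true. If the message currently being processed on the handler's thread takes so long that run() never gets a chance to execute, or a monitor() call itself blocks, mCompleted stays false and the watchdog is eventually triggered.
Taking AMS as an example of a service whose Watchdog.Monitor monitor() method is called back: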
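The scheduling idea can be reproduced outside Android. The sketch below is an illustration under stated assumptions: a single-thread ExecutorService stands in for a Handler bound to one Looper thread, QueueChecker and QueueCheckerDemo are hypothetical names, and the check is appended at the tail of the queue (ExecutorService has no equivalent of postAtFrontOfQueue).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for HandlerChecker: it posts itself to a worker thread
// and only becomes "completed" when that thread actually runs it.
final class QueueChecker implements Runnable {
    private final ExecutorService worker;
    private boolean completed = true;

    QueueChecker(ExecutorService worker) {
        this.worker = worker;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return; // the previous check is still pending
        }
        completed = false;
        worker.execute(this); // appended at the tail, unlike postAtFrontOfQueue
    }

    @Override
    public void run() {
        synchronized (this) {
            completed = true;
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }
}

final class QueueCheckerDemo {
    /** Returns {completed while the worker is blocked, completed after release}. */
    static boolean[] demo() throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Simulate a long-running message that occupies the worker thread.
        worker.execute(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // fall through and let the worker continue
            }
        });
        started.await();

        QueueChecker checker = new QueueChecker(worker);
        checker.scheduleCheck();
        Thread.sleep(100);
        boolean whileBlocked = checker.isCompleted();

        release.countDown(); // unblock the worker so the check can run
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        return new boolean[] { whileBlocked, checker.isCompleted() };
    }
}
```

While the worker is stuck on the earlier task the checker stays incomplete, which is exactly the condition the watchdog measures against its timeout.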
public class ActivityManagerService extends IActivityManager.Stub
implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {
...
public ActivityManagerService(Context systemContext) {
...
Watchdog.getInstance().addMonitor(this);
Watchdog.getInstance().addThread(mHandler);
...
}
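// Try to acquire the AMS lock; if another thread holds it for too long (for example
// in a deadlock), this call blocks and the watchdog detects the stall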
public void monitor() {
synchronized (this) { }
}
...
}
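Why an empty synchronized block is enough as a health check can be shown with a small plain-Java sketch. LockedService and MonitorProbe are hypothetical names; the probe thread plays the role of the android.fg thread that runs the monitors, and the timeout plays the role of the watchdog's deadline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical service with the same monitor() shape as the AMS example above.
final class LockedService {
    void monitor() {
        // Returns only once this object's lock can be acquired.
        synchronized (this) { }
    }
}

final class MonitorProbe {
    /** Runs service.monitor() on a helper thread; true if it returned in time. */
    static boolean monitorReturnsWithin(LockedService service, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            service.monitor();
            done.countDown();
        }, "probe");
        probe.setDaemon(true); // do not keep the JVM alive if the lock is never released
        probe.start();
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

If some thread is holding the service's lock when the probe runs, monitor() cannot return and the probe times out; with the lock free it returns almost immediately.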
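private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i = 0; i < mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        // Keep the worst (largest) state seen so far
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}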
evaluateCheckerCompletionLocked() returns the largest wait-state value among the checkers in the mHandlerCheckers list.
getCompletionStateLocked():
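The body of getCompletionStateLocked() is not reproduced here. As a minimal sketch of the state computation it performs, assuming the four states are ordered integers (larger means worse) and that the thresholds are half and all of the checker's maximum wait time, the logic could look like this. CheckerState and its method names are hypothetical, and the numeric constant values are an assumption of this sketch.

```java
// Sketch of the wait-state computation; not the AOSP source.
final class CheckerState {
    // Assumed ordering: a larger value is a worse state, so Math.max
    // over all checkers yields the state of the most-delayed one.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker given how long its current check has been pending. */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Worst state across all checkers, as evaluateCheckerCompletionLocked() computes. */
    static int worstState(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }
}
```

With DEFAULT_TIMEOUT at 60s, a check pending for 10s is WAITING, one pending for 30s is WAITED_HALF, and one pending for 60s is OVERDUE.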
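private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    for (int i = 0; i < mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        // Collect only the checkers that have exceeded their maximum wait time
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}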
This article focuses on the watchdog's monitoring flow, so the stack dumping done here is not analyzed in depth; the same goes for doSysRq().
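For orientation only: on Linux, SysRq commands can be issued by writing a single character to /proc/sysrq-trigger, where 'w' dumps tasks in the blocked (uninterruptible) state and 'l' shows backtraces of all active CPUs. A minimal sketch of such a write, assuming a hypothetical SysRq helper that takes the target path as a parameter so it can be tried against an ordinary file (writing to the real trigger requires root):

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper; not the Watchdog implementation.
final class SysRq {
    /** Writes one command character to triggerPath; returns false on I/O failure. */
    static boolean trigger(String triggerPath, char command) {
        try (FileWriter writer = new FileWriter(triggerPath)) {
            writer.write(command);
            return true;
        } catch (IOException e) {
            // For example, permission denied when not running as root.
            return false;
        }
    }
}
```

A call such as SysRq.trigger("/proc/sysrq-trigger", 'w') would then ask the kernel to log its blocked tasks, provided the process has permission to write there.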
...