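A watchdog is just what the name says. At the company where I interned, the watchdog monitored a process and restarted it whenever it died.
The Watchdog in Android works on a similar principle: it posts a message to each monitored thread and measures how long the response takes. On a timeout it kills system_server, which makes zygote exit as well; init then restarts zygote. What gets restarted is therefore only the Android framework, not the kernel, so the restart is relatively fast.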
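The post-and-wait idea can be sketched in plain Java. This is only an illustration, not the AOSP implementation: the class PingSketch, the method respondsWithin and the use of an ExecutorService as a stand-in for a monitored thread are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from Android.
final class PingSketch {
    /** Posts a trivial task to the worker and reports whether it ran in time. */
    static boolean respondsWithin(ExecutorService worker, long timeoutMillis) {
        CountDownLatch pong = new CountDownLatch(1);
        worker.execute(pong::countDown);   // the "message" sent to the thread
        try {
            return pong.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        System.out.println("idle worker answered: " + respondsWithin(worker, 1000));

        // Occupy the single worker thread so that the next ping cannot run.
        CountDownLatch release = new CountDownLatch(1);
        worker.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("blocked worker answered: " + respondsWithin(worker, 200));

        release.countDown();   // unblock so the executor can shut down
        worker.shutdown();
    }
}
```

A real watchdog would react to the timeout by collecting diagnostics and restarting the process rather than printing a line.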
A borrowed diagram:
Now for the code:
1. Startup, in SystemServer:
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog.getInstance().start();
2. getInstance() is a singleton accessor; the first call invokes the private Watchdog constructor:
private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check.  Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
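The constructor adds HandlerCheckers for the foreground thread, main thread, UI thread, I/O thread and display thread to the mHandlerCheckers list. The foreground-thread checker is also stored as mMonitorChecker, and monitors registered through addMonitor() (starting with BinderThreadMonitor) run on it. Finally, the constructor creates an OpenFdMonitor for open file descriptors.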
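3. What the watchdog monitors
Watchdog offers two kinds of checks. One uses the monitor() callback to detect a deadlock or stall in a service's critical section; the other posts a message to a service's handler thread to detect whether that thread is blocked. ActivityManagerService, for example, is checked through monitor(), while the system_server threads it runs on are checked by message. The corresponding registration methods are:
addMonitor()
addThread()
Monitor checking is opt-in: a service implements the Watchdog.Monitor interface itself and registers with addMonitor().
When the watchdog fires, the restart log carries one of two messages, "Blocked in handler on ..." or "Blocked in monitor ...", each corresponding to a different kind of blockage.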
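The monitor-style check can be pictured with a small plain-Java sketch. Everything here is hypothetical: the nested Monitor interface only imitates the shape of Watchdog.Monitor, and FakeService, doWork and runMonitors are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of monitor registration; not the AOSP classes.
final class MonitorSketch {
    /** Stand-in for Watchdog.Monitor: a single callback. */
    interface Monitor {
        void monitor();
    }

    /** A made-up service that guards its state with one lock. */
    static final class FakeService implements Monitor {
        private final Object lock = new Object();
        private int state;

        void doWork() {
            synchronized (lock) {
                state++;
            }
        }

        @Override
        public void monitor() {
            // Returns only once the lock is free. If another thread holds the
            // lock forever, this call never returns and the check times out.
            synchronized (lock) {
                // nothing to do: acquiring the lock is the whole test
            }
        }
    }

    private final List<Monitor> monitors = new ArrayList<>();

    void addMonitor(Monitor m) {
        monitors.add(m);
    }

    /** Calls every registered monitor in turn and returns how many ran. */
    int runMonitors() {
        int ran = 0;
        for (Monitor m : monitors) {
            m.monitor();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        MonitorSketch watchdog = new MonitorSketch();
        FakeService service = new FakeService();
        watchdog.addMonitor(service);
        service.doWork();
        System.out.println("monitors that returned: " + watchdog.runMonitors());
    }
}
```

The empty synchronized block is the point of the design: the service does no extra bookkeeping, it only proves that its lock can still be taken.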
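4. How the watchdog runs
Watchdog is a Thread, so start() ends up executing run() on the new thread. run() is fairly long.
It begins with an infinite loop that calls scheduleCheckLocked() on every HandlerChecker to start a round of checks.
Inside that function:
1. If the checker has no monitors and the target thread's looper is polling (idle, waiting for messages), it sets mCompleted to true and returns, since the thread cannot be blocked.
2. If mCompleted is false, a check is already in flight, so it returns.
3. Otherwise it sets mCompleted to false, records the start time and calls postAtFrontOfQueue(this) to schedule the check.
After postAtFrontOfQueue(), a thread that is not blocked picks the message up almost immediately and executes the checker's run(), which in turn calls monitor() on each registered service. A monitor() implementation simply tries to take the service's lock; if it gets the lock, the service is not deadlocked.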
@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
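To see how the in-flight flag and the posted run() fit together, here is a simplified plain-Java model. It is a sketch under assumptions, not the AOSP code: CheckerSketch is an invented name, a java.util.concurrent.Executor stands in for the Android Handler, and the looper-polling shortcut is left out because plain Java has no direct equivalent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical model of a handler checker; names are invented for the example.
final class CheckerSketch implements Runnable {
    private final Executor target;     // stand-in for the monitored thread's Handler
    private boolean completed = true;  // true when no check is in flight
    private long startNanos;           // when the current check was posted

    CheckerSketch(Executor target) {
        this.target = target;
    }

    /** Starts a check unless one is already in flight. */
    synchronized void scheduleCheck() {
        if (!completed) {
            return;                    // previous check has not come back yet
        }
        completed = false;
        startNanos = System.nanoTime();
        target.execute(this);          // plays the role of postAtFrontOfQueue(this)
    }

    /** Runs on the monitored thread; reaching this point proves it is alive. */
    @Override
    public void run() {
        synchronized (this) {
            completed = true;
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    /** Milliseconds since the current check was posted. */
    synchronized long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Runnable> queue = new ArrayList<>();   // a "thread" that never drains
        CheckerSketch checker = new CheckerSketch(queue::add);
        checker.scheduleCheck();
        checker.scheduleCheck();                    // ignored: still in flight
        System.out.println("queued=" + queue.size()
                + " completed=" + checker.isCompleted());
        queue.get(0).run();                         // the thread finally responds
        System.out.println("completed=" + checker.isCompleted());
    }
}
```

In this model a stuck thread simply never runs the queued task, so completed stays false while the elapsed time keeps growing.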
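4. Reporting a blockage
In every round, evaluateCheckerCompletionLocked() works out how long the checkers have been waiting and returns one of four states:
COMPLETED: no blockage.
WAITING: 0 to 30 seconds have elapsed; keep waiting.
WAITED_HALF: 30 to 60 seconds have elapsed, more than half the timeout; start dumping the ActivityManagerService stack traces.
OVERDUE: 60 seconds reached; something is blocked.
On OVERDUE, Watchdog collects the blocked services and threads, then writes logs and a dropbox entry.
Finally, it kills the process: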
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
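The four states and their thresholds can be written down as a small pure function. This is a sketch of my reading of the logic, not a copy of the source: the class CompletionStateSketch, the numeric values of the constants, the 60-second TIMEOUT_MS and the helper worstOf (which assumes the overall state is the worst state of any checker) are all assumptions for illustration.

```java
// Hypothetical sketch of the state thresholds; names and values are assumed.
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static final long TIMEOUT_MS = 60_000;   // assumed 60 second timeout

    /** Maps one checker's progress to a state. */
    static int stateOf(boolean completed, long latencyMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < TIMEOUT_MS / 2) {
            return WAITING;          // under 30 s: keep waiting
        }
        if (latencyMs < TIMEOUT_MS) {
            return WAITED_HALF;      // 30 s to 60 s: time to dump stacks
        }
        return OVERDUE;              // 60 s or more: treat as blocked
    }

    /** Overall state, assuming it is the worst state among all checkers. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }

    public static void main(String[] args) {
        int overall = worstOf(
                stateOf(true, 0),
                stateOf(false, 12_000),
                stateOf(false, 45_000));
        System.out.println("overall state = " + overall);
    }
}
```

Ordering the constants from best to worst is what lets a single Math.max pick the state that decides the watchdog's next step.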
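5. Rebooting on a broadcast
Next, init() calls registerReceiver() to register a BroadcastReceiver for the system reboot broadcast. When that broadcast arrives, RebootRequestReceiver.onReceive() runs and calls rebootSystem() to reboot the device. This lets other modules (CTS, for example) reboot the system by sending a broadcast. So Watchdog has one more important job: receiving that broadcast and rebooting the system.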