Android framework: Watchdog

A watchdog is exactly what the name suggests. At the company where I interned, the watchdog monitored processes and restarted any process that died.

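Android's Watchdog works on a similar principle: it posts a message to each monitored thread and measures how long the reply takes. If a reply times out, Watchdog kills the system_server process; zygote exits when system_server dies, and init then restarts zygote. What restarts is therefore the Android framework, not the kernel, so recovery is fairly quick.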

A borrowed diagram: [Figure 1: Android framework: watchdog]

Now into the code:


1. Startup in SystemServer:

final Watchdog watchdog = Watchdog.getInstance();

watchdog.init(context, mActivityManagerService);

Watchdog.getInstance().start();


2. getInstance() follows the singleton pattern; the first call invokes the Watchdog constructor:

    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mOpenFdMonitor = OpenFdMonitor.create();

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
    }

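The constructor adds HandlerCheckers for the foreground thread, main thread, UI thread, I/O thread, and display thread to the mHandlerCheckers list. The foreground-thread checker, mMonitorChecker, also holds the monitors: addMonitor() registers the BinderThreadMonitor on it. Finally, OpenFdMonitor.create() sets up the fd monitor.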
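To make the singleton-plus-registration shape concrete, here is a minimal sketch in plain Java. It is not the AOSP code: the class name WatchdogSingletonSketch and the list of names are illustrative, and each real HandlerChecker is reduced to the name of the thread it watches.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the real Watchdog; all names here are illustrative.
final class WatchdogSingletonSketch {
    private static WatchdogSingletonSketch sInstance;

    // One entry per monitored thread, in registration order.
    private final List<String> checkerNames = new ArrayList<>();

    // Private constructor: instances can only be obtained through getInstance().
    private WatchdogSingletonSketch() {
        checkerNames.add("foreground thread"); // also hosts the monitors
        checkerNames.add("main thread");
        checkerNames.add("ui thread");
        checkerNames.add("i/o thread");
        checkerNames.add("display thread");
    }

    // Lazily creates the single instance on first use (synchronized in this sketch).
    static synchronized WatchdogSingletonSketch getInstance() {
        if (sInstance == null) {
            sInstance = new WatchdogSingletonSketch();
        }
        return sInstance;
    }

    List<String> checkerNames() {
        return checkerNames;
    }
}
```

Every call to getInstance() returns the same object, so the checkers are registered exactly once no matter how many callers ask for the Watchdog.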


3. What Watchdog monitors

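Watchdog offers two kinds of monitoring. One uses the monitor() callback to detect a deadlock or stall in a service's critical section; the other posts a message to detect whether a service's thread is blocked. For example, AMS is checked through monitor(), while the system_server threads it runs on are checked by posting messages.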

addMonitor()

addThread()

Monitor-based checking is opt-in: a service implements the Watchdog.Monitor interface itself and registers through addMonitor().
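A rough sketch of what such a monitor amounts to, in plain Java. The Monitor interface below only mirrors the shape of Watchdog.Monitor, and FakeService and respondsWithin() are hypothetical names made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorSketch {
    // Mirrors the shape of Watchdog.Monitor: a single no-argument callback.
    interface Monitor {
        void monitor();
    }

    // Hypothetical service that guards its state with one lock.
    static final class FakeService implements Monitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            synchronized (lock) {
                // Nothing to do: acquiring the lock is the whole check.
            }
        }
    }

    // Runs monitor() on a helper thread and reports whether it returned in time.
    static boolean respondsWithin(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            m.monitor();
            done.countDown();
        });
        t.setDaemon(true);
        t.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If some other thread holds the service lock for too long, monitor() cannot return and the caller sees a timeout, which is the signal Watchdog relies on.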

When Watchdog fires, the log shows one of two messages, "Blocked in handler on ......" or "Blocked in monitor ......", corresponding to the two kinds of blockage.


4. How Watchdog works

Watchdog is a Thread, so start() ends up running run(). The run() function is fairly long.

run() first enters an infinite loop; on each pass it calls scheduleCheckLocked() on every HandlerChecker to start a round of checks.

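Inside this function:

1. If the checker has no monitors and the target thread's looper is polling (idle, waiting for messages), it sets mCompleted to true and returns; the thread cannot be blocked.

2. If mCompleted is false, a check is already in flight, so it returns.

3. Otherwise it sets mCompleted to false, records the start time, and calls postAtFrontOfQueue(this) to start a check.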
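Steps 2 and 3 above can be sketched in plain Java. This is an approximation under stated assumptions: a single-thread ExecutorService stands in for the monitored thread's message queue, execute() stands in for postAtFrontOfQueue(), and step 1 (the looper polling shortcut) is left out because it has no equivalent here. CheckerSketch and demo() are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in for HandlerChecker; this is NOT the Android Handler API.
final class CheckerSketch implements Runnable {
    private final ExecutorService target;
    private boolean completed = true; // true means no check is in flight

    CheckerSketch(ExecutorService target) {
        this.target = target;
    }

    // Skip if a check is still pending, otherwise post a new one.
    synchronized void scheduleCheck() {
        if (!completed) {
            return;
        }
        completed = false;
        target.execute(this); // the real code calls postAtFrontOfQueue(this)
    }

    // Runs on the monitored thread; getting here shows the thread is responsive.
    @Override
    public synchronized void run() {
        completed = true;
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    // Returns {completed while the thread is stuck, completed after release}.
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch gate = new CountDownLatch(1);
        boolean[] result = new boolean[2];
        try {
            worker.execute(() -> {
                try {
                    gate.await(); // simulate a blocked thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            CheckerSketch checker = new CheckerSketch(worker);
            checker.scheduleCheck();
            result[0] = checker.isCompleted();
            gate.countDown(); // unblock the thread
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
            result[1] = checker.isCompleted();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.shutdownNow();
        }
        return result;
    }
}
```

While the worker thread is stuck, the posted check never runs and the flag stays false; once the thread is released, the check runs and the flag flips back to true.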

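After postAtFrontOfQueue(), an unblocked thread picks the message up quickly and calls the checker's run() function, which in turn calls monitor() on each registered service. A typical monitor() simply acquires the service's lock; if it can, the service is not blocked.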

        @Override
        public void run() {
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }

4. Failure-reporting logic

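On each pass, evaluateCheckerCompletionLocked() works out the overall state from how long each checker has been waiting:

COMPLETED: no blockage.

WAITING: 0 to 30 seconds have elapsed; keep waiting.

WAITED_HALF: 30 to 59 seconds have elapsed, more than half the timeout; start dumping AMS stack traces.

At 60 seconds (OVERDUE), a blockage has occurred.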
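The state calculation can be sketched as two small functions. This is a simplified model, not the AOSP source: the class name, the stateOf() and evaluate() helpers, and the numeric constant values are assumptions for illustration, and elapsed time is passed in as a parameter instead of being read from the system clock.

```java
final class CompletionStateSketch {
    // Ordered so that a larger value means a worse state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Classifies one checker from its completion flag and its waiting time.
    static int stateOf(boolean completed, long elapsedMs, long timeoutMs) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMs < timeoutMs / 2) {
            return WAITING;
        }
        if (elapsedMs < timeoutMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // The overall state is the worst state among all checkers.
    static int evaluate(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

With a 60-second timeout, a checker that has not completed after 10 seconds is WAITING, after 30 seconds WAITED_HALF, and after 60 seconds OVERDUE; a single overdue checker makes the overall result OVERDUE.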

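Once a checker is overdue, Watchdog collects the blocked services and threads and writes logs and a dropbox entry.

Finally, it kills the process: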

                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");
                Process.killProcess(Process.myPid());
                System.exit(10);

5. Rebooting on a broadcast

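init() then calls registerReceiver() to register a BroadcastReceiver for system reboot requests. When the reboot broadcast arrives, RebootRequestReceiver.onReceive() runs and calls rebootSystem() to reboot the system. This lets other modules (such as CTS) reboot the system by sending a broadcast. So Watchdog has another important job: receiving that broadcast and rebooting the system.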


Another borrowed diagram: [Figure 2: Android framework: watchdog]
