Android Deadlock Case Study - [01]

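Problem description

Steps: Open Settings, tap Display, tap Daydream (interactive screensaver), select Kaleidoscope in the screensaver list, then tap Start now.

Actual result: The phone froze once.

Time: Around 14:25.

Reproduction rate: >1%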


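After obtaining and unpacking the log: SWT,868,-1361051648,99,/data/core/,0,system_server_watchdog,system_server,Fri Jan 1 16:53:44 CST 2016,1

=> An SWT (software watchdog) exception. First impression: the system_server process was blocked, which tripped the software watchdog. The key to solving this is to find the threads that are waiting on each other in a cycle.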
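
To make the watchdog idea concrete, here is a minimal sketch in plain Java. It is not the Android Watchdog implementation. The class MiniWatchdog and its methods are hypothetical names invented for illustration, and a single-threaded executor stands in for a handler thread. The watchdog posts a no-op task to the monitored thread and treats a missing answer within the timeout as "blocked".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical, simplified software watchdog; not the Android implementation. */
final class MiniWatchdog {

    /**
     * Posts a no-op task to the monitored single-threaded executor and reports
     * whether it ran within the timeout. A stuck worker never reaches the task.
     */
    static boolean isResponsive(ExecutorService monitored, long timeoutMs) {
        Future<?> ping = monitored.submit(() -> { });
        try {
            ping.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            ping.cancel(false);
            return false; // a real watchdog would dump thread stacks at this point
        } catch (ExecutionException e) {
            return true;  // the task ran, so the thread is alive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {idle worker responsive, blocked worker responsive}. */
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fake-handler-thread");
            t.setDaemon(true); // lets the JVM exit even though the thread stays stuck
            return t;
        });
        boolean idle = isResponsive(worker, 2000);

        CountDownLatch never = new CountDownLatch(1);
        worker.submit(() -> { never.await(); return null; }); // blocks the worker forever
        boolean blocked = isResponsive(worker, 300);
        return new boolean[] { idle, blocked };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("idle worker responsive: " + r[0]);
        System.out.println("blocked worker responsive: " + r[1]);
    }
}
```

The Subject line in the log below reads like the result of this kind of check: every "Blocked in handler" entry is a thread that did not answer in time.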

The analysis follows:

Current Executing Process:
system_server

Backtrace:
Process: system_server
Subject: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on ActivityManager (ActivityManager), Blocked in handler on PowerManagerService (PowerManagerService)

==>
The core process system_server ends up blocked on a thread belonging to PowerManagerService. Trace the PowerManagerService entries in the log:

  pc 0001a34b  /system/lib/libc.so (epoll_pwait+26)
  native: #02 pc 0001a359  /system/lib/libc.so (epoll_wait+6)
  native: #03 pc 00012b6b  /system/lib/libutils.so (_ZN7android6Looper9pollInnerEi+102)
  native: #04 pc 00012deb  /system/lib/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+130)
  native: #05 pc 00084039  /system/lib/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+22)
  native: #06 pc 00000585  /system/framework/arm/boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:328)
  at android.os.Looper.loop(Looper.java:164)
  at android.os.HandlerThread.run(HandlerThread.java:61)

"PowerManagerService" prio=5 tid=20 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12e98660 self=0xab3f5300
  | sysTid=904 nice=-4 cgrp=default sched=0/0 handle=0x9fb26930
  | state=S schedstat=( 1793024626 1555176267 7045 ) utm=97 stm=82 core=1 HZ=100
  | stack=0x9fa24000-0x9fa26000 stackSize=1038KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.bindService(ActivityManagerService.java:17121)
  - waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13
  at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1322)
  at android.app.ContextImpl.bindService(ContextImpl.java:1291)
  at com.android.server.power.PowerManagerService$CABLSettingObserver.initCABLService(PowerManagerService.java:3373)
  at com.android.server.power.PowerManagerService$CABLSettingObserver.onChange(PowerManagerService.java:3381)
  - locked <0x0ec53695> (a java.lang.Object)
  at android.database.ContentObserver.onChange(ContentObserver.java:145)
  at android.database.ContentObserver$NotificationRunnable.run(ContentObserver.java:216)
  at android.os.Handler.handleCallback(Handler.java:815)
  at android.os.Handler.dispatchMessage(Handler.java:104)
  at android.os.Looper.loop(Looper.java:207)
  at android.os.HandlerThread.run(HandlerThread.java:61)
  at com.android.server.ServiceThread.run(ServiceThread.java:46)

Note this line:
- waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13
The thread is blocked by the thread with tid=13. Search the log for the tid=13 thread:

"android.ui" prio=5 tid=13 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12cb8d60 self=0xab3f3000
  | sysTid=897 nice=-2 cgrp=default sched=0/0 handle=0xa02fd930
  | state=S schedstat=( 1265405032 780240593 4053 ) utm=80 stm=46 core=0 HZ=100
  | stack=0xa01fb000-0xa01fd000 stackSize=1038KB
  | held mutexes=
  at com.android.server.power.PowerManagerService.acquireWakeLockInternal(PowerManagerService.java:1001)
  - waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20
  at com.android.server.power.PowerManagerService.-wrap10(PowerManagerService.java:-1)
  at com.android.server.power.PowerManagerService$BinderService.acquireWakeLock(PowerManagerService.java:3673)
  at android.os.PowerManager$WakeLock.acquireLocked(PowerManager.java:1212)
  at android.os.PowerManager$WakeLock.acquire(PowerManager.java:1180)
  - locked <0x050021e7> (a android.os.Binder)
  at com.android.server.am.ActivityStackSupervisor.goingToSleepLocked(ActivityStackSupervisor.java:3473)
  at com.android.server.am.ActivityManagerService.updateSleepIfNeededLocked(ActivityManagerService.java:11210)
  at com.android.server.am.ActivityManagerService$LocalService.acquireSleepToken(ActivityManagerService.java:22043)
  - locked <0x0bb97820> (a com.android.server.am.ActivityManagerService)
  at com.android.server.policy.PhoneWindowManager.updateDreamingSleepToken(PhoneWindowManager.java:7163)
  at com.android.server.policy.PhoneWindowManager.-wrap19(PhoneWindowManager.java:-1)
  at com.android.server.policy.PhoneWindowManager$PolicyHandler.handleMessage(PhoneWindowManager.java:747)

Note this line:
- waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20
This thread is in turn blocked by the thread with tid=20.

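Putting the two dumps together: inside system_server, thread tid=20 (PowerManagerService) holds <0x0ec53695> and waits for <0x0bb97820>, while thread tid=13 (android.ui) holds <0x0bb97820> and waits for <0x0ec53695>. The two threads are waiting on each other in a cycle ==> deadlock!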
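
The same cycle can be found mechanically from the "waiting to lock ... held by thread N" lines. The following is a minimal sketch, assuming the dump text looks like the excerpts above. WaitForGraph is a hypothetical helper written for this post, not an existing Android tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds a lock cycle in thread dump text shaped like the excerpts above. */
final class WaitForGraph {
    // Thread header, e.g.: "android.ui" prio=5 tid=13 Blocked
    private static final Pattern THREAD =
            Pattern.compile("^\"[^\"]*\".*\\btid=(\\d+)\\b.*$");
    // Blocked frame, e.g.: - waiting to lock <0x0ec53695> (...) held by thread 20
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <0x[0-9a-f]+> .*held by thread (\\d+)");

    /** Maps each blocked thread id to the id of the thread holding the lock it wants. */
    static Map<Integer, Integer> parse(List<String> dumpLines) {
        Map<Integer, Integer> waitsFor = new LinkedHashMap<>();
        int current = -1; // tid of the thread whose stack is being read
        for (String line : dumpLines) {
            Matcher t = THREAD.matcher(line.trim());
            if (t.matches()) {
                current = Integer.parseInt(t.group(1));
                continue;
            }
            Matcher w = WAITING.matcher(line);
            if (current >= 0 && w.find()) {
                waitsFor.put(current, Integer.parseInt(w.group(1)));
            }
        }
        return waitsFor;
    }

    /** Follows waits-for edges from each thread; returns the first cycle found, or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> waitsFor) {
        for (Integer start : waitsFor.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer cur = start;
            while (cur != null && !path.contains(cur)) {
                path.add(cur);
                cur = waitsFor.get(cur);
            }
            if (cur != null) {
                return path.subList(path.indexOf(cur), path.size());
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        List<String> dump = List.of(
                "\"PowerManagerService\" prio=5 tid=20 Blocked",
                "  - waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13",
                "\"android.ui\" prio=5 tid=13 Blocked",
                "  - waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20");
        System.out.println(findCycle(parse(dump))); // prints [20, 13]
    }
}
```

On a full dump with dozens of blocked threads this saves following the "held by thread" references by hand.
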
Trace into the source to see what the tid=13 thread is doing:

  at com.android.server.power.PowerManagerService.acquireWakeLockInternal(PowerManagerService.java:1001)

private void acquireWakeLockInternal(IBinder lock, int flags, String tag, String packageName,
        WorkSource ws, String historyTag, int uid, int pid) {
    synchronized (mLock) {
        if (DEBUG_SPEW) {
            Slog.d(TAG, "acquireWakeLockInternal: lock=" + Objects.hashCode(lock)
                    + ", flags=0x" + Integer.toHexString(flags)
                    + ", tag=\"" + tag + "\", ws=" + ws + ", uid=" + uid + ", pid=" + pid);
        }

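It is waiting for the mLock lock. This is stock Android code, and stock code is generally careful about how it acquires and releases locks, so it is unlikely to be the cause of the deadlock.

...

Continuing the analysis:
The lock is currently held by the tid=20 thread. Find the source and see what that code does: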

  at com.android.server.power.PowerManagerService$CABLSettingObserver.initCABLService(PowerManagerService.java:3373)
  at com.android.server.power.PowerManagerService$CABLSettingObserver.onChange(PowerManagerService.java:3381)

private void initCABLService() {
    if (null == mCABLService) {
        mCABLServiceConn = new CABLServiceConnection();
        Intent i = new Intent(ICABLService.class.getName()).setPackage("com.android.cabl");
        // Per the tid=20 stack, this ends up in ActivityManagerService.bindService(),
        // which needs the ActivityManagerService lock.
        mContext.bindService(i, mCABLServiceConn, Context.BIND_AUTO_CREATE);
        resolver = mContext.getContentResolver();
    }
}

public void onChange(boolean selfChange, Uri uri) {
    // mLock is held for the whole callback, including the bindService() call.
    synchronized (mLock) {
        Slog.e(TAG, "mCABLService =" + mCABLService);
        initCABLService();
        if (null != mCABLService) {
            boolean isON = (1 == Settings.System.getInt(resolver, Settings.System.CABL_CONTROL, 0));
            try {
                Slog.e(TAG, "isON = " + isON);
                if (isON) {
                    mCABLService.control(CABL_CON_TYPE_ENABLE);
                } else {
                    mCABLService.control(CABL_CON_TYPE_DISABLE);
                }
            } catch (RemoteException e) {
            }
        }
    }
}

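mLock is the lock used by the stock PowerManagerService code, and the code above takes the same lock. However, this code turned out to be newly added by someone else for the CABL feature, so the problem most likely lies here!

We then brought in the owner of the CABL feature code to analyze the issue together, and confirmed that the guess was right.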
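
The pattern confirmed here is a lock-order inversion: one thread takes mLock and then wants the ActivityManagerService lock, the other takes them in the opposite order. Below is a minimal plain-Java sketch of that pattern, not the system_server code. The two Object fields only stand in for the real locks, and LockOrderInversionDemo is a hypothetical name. It uses the standard ThreadMXBean.findMonitorDeadlockedThreads() call to let the JVM report the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo of the lock-order inversion seen in the two stack traces. */
final class LockOrderInversionDemo {

    /** Counts down, then waits until both threads hold their first lock. */
    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // the threads stay deadlocked; daemon status lets the JVM exit
        return t;
    }

    /** Returns how many threads the JVM reports as monitor-deadlocked (0 if none within about 5 s). */
    static int run() {
        final Object amsLock = new Object(); // stands in for the ActivityManagerService lock
        final Object mLock = new Object();   // stands in for PowerManagerService.mLock
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread observer = daemon("fake-PowerManagerService", () -> {
            synchronized (mLock) {            // like onChange()
                awaitBoth(bothHoldFirst);
                synchronized (amsLock) { }    // like bindService()
            }
        });
        Thread ui = daemon("fake-android.ui", () -> {
            synchronized (amsLock) {          // like acquireSleepToken()
                awaitBoth(bothHoldFirst);
                synchronized (mLock) { }      // like acquireWakeLockInternal()
            }
        });
        observer.start();
        ui.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + run());
    }
}
```

The latch only makes the interleaving deterministic for the demo. On the device the same interleaving depends on timing, which fits the low reproduction rate.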

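Solution

Switch to a separate, dedicated lock so the new code no longer contends with the system lock. With its own lock, onChange does not hold mLock while bindService waits for the ActivityManagerService lock, which breaks the cycle. Changes to critical system service code need to be made with great care.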

public void onChange(boolean selfChange, Uri uri) {
    synchronized (mNewLock) {
        Slog.e(TAG, "mCABLService =" + mCABLService);
        initCABLService();
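
As a rough sanity check of the idea, here is a small plain-Java sketch of the two code paths after the change. It is not the actual patch. DedicatedLockSketch and its method bodies are hypothetical stand-ins that only mirror which locks each path takes, using the names from the stack traces.

```java
/** Hypothetical model of the two code paths once onChange() uses its own lock. */
final class DedicatedLockSketch {
    private final Object amsLock = new Object();  // stands in for the ActivityManagerService lock
    private final Object mLock = new Object();    // stands in for PowerManagerService.mLock
    private final Object mNewLock = new Object(); // dedicated lock for the settings observer

    /** Mirrors the patched onChange(): only mNewLock is held while the AMS lock is requested. */
    void onChange() {
        synchronized (mNewLock) {
            synchronized (amsLock) { /* bindService() would run here */ }
        }
    }

    /** Mirrors the android.ui path: AMS lock first, then mLock. */
    void acquireSleepToken() {
        synchronized (amsLock) {
            synchronized (mLock) { /* acquireWakeLockInternal() would run here */ }
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish within the timeout. */
    static boolean bothPathsFinish(int rounds, long timeoutMs) {
        DedicatedLockSketch s = new DedicatedLockSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < rounds; i++) s.onChange(); });
        Thread b = new Thread(() -> { for (int i = 0; i < rounds; i++) s.acquireSleepToken(); });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMs);
            b.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both paths finished: " + bothPathsFinish(100_000, 5_000));
    }
}
```

A run like this can only fail to reproduce a deadlock; it does not prove that none exists. The stronger argument is structural: no path in the sketch holds mLock while requesting the ActivityManagerService lock, so the cycle from the logs cannot form. The real code would still need a check that nothing acquires mNewLock while holding the ActivityManagerService lock.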
