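Problem description:
Steps: open Settings, tap Display, tap Interactive screensaver, select Kaleidoscope in the interactive screensaver list, then tap Start now
Actual result: the phone froze once
Time: around 14:25
Reproduction rate: >1%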
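After pulling the log and unpacking it: SWT,868,-1361051648,99,/data/core/,0,system_server_watchdog,system_server,Fri Jan 1 16:53:44 CST 2016,1
=> An SWT (software watchdog) exception. First impression: the system_server process was blocked, which tripped the software watchdog. The key to solving this is finding the evidence of a circular wait.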
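To make the "blocked thread trips the watchdog" idea concrete, here is a simplified sketch in plain Java. It is not Android's actual Watchdog code: MiniWatchdog and isResponsive are hypothetical names, and a single-thread executor stands in for a handler thread such as android.ui. The monitor posts an empty probe task and treats a missed deadline as "blocked".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; the real system_server watchdog is more involved.
class MiniWatchdog {
    /** Returns true if the monitored thread ran an empty probe task within timeoutMs. */
    static boolean isResponsive(ExecutorService monitored, long timeoutMs) {
        Future<?> probe = monitored.submit(() -> { });
        try {
            probe.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            probe.cancel(false);   // drop the probe; the thread is stuck on earlier work
            return false;
        } catch (ExecutionException e) {
            return true;           // the probe ran, so the thread is alive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // A single-thread executor stands in for one handler thread.
        ExecutorService handlerThread = Executors.newSingleThreadExecutor();
        System.out.println("idle thread responsive: " + isResponsive(handlerThread, 500));

        // Block the thread the way a lock wait would, then probe again.
        CountDownLatch release = new CountDownLatch(1);
        handlerThread.submit(() -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        System.out.println("blocked thread responsive: " + isResponsive(handlerThread, 500));

        release.countDown();
        handlerThread.shutdown();
    }
}
```

In the SWT report below, every thread listed in the Subject line corresponds to a probe of this kind that never completed.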
Now for the analysis:
Current Executing Process:
system_server

Backtrace:
Process: system_server
Subject: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on ActivityManager (ActivityManager), Blocked in handler on PowerManagerService (PowerManagerService)
==>
The core process system_server ends up blocked on a PowerManagerService thread, so trace the PowerManagerService-related log:
  pc 0001a34b /system/lib/libc.so (epoll_pwait+26)
  native: #02 pc 0001a359 /system/lib/libc.so (epoll_wait+6)
  native: #03 pc 00012b6b /system/lib/libutils.so (_ZN7android6Looper9pollInnerEi+102)
  native: #04 pc 00012deb /system/lib/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+130)
  native: #05 pc 00084039 /system/lib/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+22)
  native: #06 pc 00000585 /system/framework/arm/boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:328)
  at android.os.Looper.loop(Looper.java:164)
  at android.os.HandlerThread.run(HandlerThread.java:61)

"PowerManagerService" prio=5 tid=20 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12e98660 self=0xab3f5300
  | sysTid=904 nice=-4 cgrp=default sched=0/0 handle=0x9fb26930
  | state=S schedstat=( 1793024626 1555176267 7045 ) utm=97 stm=82 core=1 HZ=100
  | stack=0x9fa24000-0x9fa26000 stackSize=1038KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.bindService(ActivityManagerService.java:17121)
  - waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13
  at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1322)
  at android.app.ContextImpl.bindService(ContextImpl.java:1291)
  at com.android.server.power.PowerManagerService$CABLSettingObserver.initCABLService(PowerManagerService.java:3373)
  at com.android.server.power.PowerManagerService$CABLSettingObserver.onChange(PowerManagerService.java:3381)
  - locked <0x0ec53695> (a java.lang.Object)
  at android.database.ContentObserver.onChange(ContentObserver.java:145)
  at android.database.ContentObserver$NotificationRunnable.run(ContentObserver.java:216)
  at android.os.Handler.handleCallback(Handler.java:815)
  at android.os.Handler.dispatchMessage(Handler.java:104)
  at android.os.Looper.loop(Looper.java:207)
  at android.os.HandlerThread.run(HandlerThread.java:61)
  at com.android.server.ServiceThread.run(ServiceThread.java:46)
Note this line:
- waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13
It is blocked by the thread with tid=13, so search the log for the tid=13 thread:
"android.ui" prio=5 tid=13 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12cb8d60 self=0xab3f3000
  | sysTid=897 nice=-2 cgrp=default sched=0/0 handle=0xa02fd930
  | state=S schedstat=( 1265405032 780240593 4053 ) utm=80 stm=46 core=0 HZ=100
  | stack=0xa01fb000-0xa01fd000 stackSize=1038KB
  | held mutexes=
  at com.android.server.power.PowerManagerService.acquireWakeLockInternal(PowerManagerService.java:1001)
  - waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20
  at com.android.server.power.PowerManagerService.-wrap10(PowerManagerService.java:-1)
  at com.android.server.power.PowerManagerService$BinderService.acquireWakeLock(PowerManagerService.java:3673)
  at android.os.PowerManager$WakeLock.acquireLocked(PowerManager.java:1212)
  at android.os.PowerManager$WakeLock.acquire(PowerManager.java:1180)
  - locked <0x050021e7> (a android.os.Binder)
  at com.android.server.am.ActivityStackSupervisor.goingToSleepLocked(ActivityStackSupervisor.java:3473)
  at com.android.server.am.ActivityManagerService.updateSleepIfNeededLocked(ActivityManagerService.java:11210)
  at com.android.server.am.ActivityManagerService$LocalService.acquireSleepToken(ActivityManagerService.java:22043)
  - locked <0x0bb97820> (a com.android.server.am.ActivityManagerService)
  at com.android.server.policy.PhoneWindowManager.updateDreamingSleepToken(PhoneWindowManager.java:7163)
  at com.android.server.policy.PhoneWindowManager.-wrap19(PhoneWindowManager.java:-1)
  at com.android.server.policy.PhoneWindowManager$PolicyHandler.handleMessage(PhoneWindowManager.java:747)
Note this line:
- waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20
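It is blocked by the thread with tid=20.
Putting the two traces together: inside system_server, thread tid=13 (android.ui) holds the ActivityManagerService lock <0x0bb97820> and is waiting for <0x0ec53695>, while thread tid=20 (PowerManagerService) holds <0x0ec53695> and is waiting for <0x0bb97820>. The two threads are waiting on each other ==> deadlock!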
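The manual search above (follow each "held by thread N" until you come back to where you started) can also be sketched as a small program. This is only an illustration under assumptions: WaitForGraph, parse and findCycle are hypothetical names, and the regular expressions cover only the two line shapes that appear in the traces in this post, not every thread dump format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a "thread waits for thread" map from dump lines.
class WaitForGraph {
    // Matches a thread header such as: "android.ui" prio=5 tid=13 Blocked
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\btid=(\\d+)\\b");
    // Matches: - waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <(0x[0-9a-f]+)>.*held by thread (\\d+)");

    /** Maps each blocked tid to the tid that holds the lock it is waiting for. */
    static Map<Integer, Integer> parse(List<String> lines) {
        Map<Integer, Integer> waitsFor = new LinkedHashMap<>();
        int current = -1;
        for (String raw : lines) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                current = Integer.parseInt(header.group(2));
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (current != -1 && waiting.find()) {
                waitsFor.put(current, Integer.parseInt(waiting.group(2)));
            }
        }
        return waitsFor;
    }

    /** Follows the waits-for edges and returns the first cycle found, or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> waitsFor) {
        for (Integer start : waitsFor.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer tid = start;
            while (tid != null && !path.contains(tid)) {
                path.add(tid);
                tid = waitsFor.get(tid);
            }
            if (tid != null) {
                return new ArrayList<>(path.subList(path.indexOf(tid), path.size()));
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Lines copied from the two traces in this post.
        List<String> dump = Arrays.asList(
                "\"PowerManagerService\" prio=5 tid=20 Blocked",
                "  - waiting to lock <0x0bb97820> (a com.android.server.am.ActivityManagerService) held by thread 13",
                "\"android.ui\" prio=5 tid=13 Blocked",
                "  - waiting to lock <0x0ec53695> (a java.lang.Object) held by thread 20");
        System.out.println("circular wait between tids: " + findCycle(parse(dump)));
    }
}
```

A non-empty result means the listed threads form a circular wait, which is exactly the condition the analysis is looking for.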
Go to the source and see what the tid=13 thread is doing:
at com.android.server.power.PowerManagerService.acquireWakeLockInternal(PowerManagerService.java:1001)
    private void acquireWakeLockInternal(IBinder lock, int flags, String tag, String packageName,
            WorkSource ws, String historyTag, int uid, int pid) {
        synchronized (mLock) {
            if (DEBUG_SPEW) {
                Slog.d(TAG, "acquireWakeLockInternal: lock=" + Objects.hashCode(lock)
                        + ", flags=0x" + Integer.toHexString(flags)
                        + ", tag=\"" + tag + "\", ws=" + ws + ", uid=" + uid + ", pid=" + pid);
            }
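It is waiting for the mLock lock. This is stock Android code, and stock code is generally careful about acquiring and releasing locks, so by itself it should not cause a deadlock.
...
Continuing the analysis:
This lock is currently held by the tid=20 thread. Find the source and see what that code does: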
at com.android.server.power.PowerManagerService$CABLSettingObserver.initCABLService(PowerManagerService.java:3373)
at com.android.server.power.PowerManagerService$CABLSettingObserver.onChange(PowerManagerService.java:3381)
    private void initCABLService(){
        if(null == mCABLService){
            mCABLServiceConn = new CABLServiceConnection();
            Intent i = new Intent(ICABLService.class.getName()).setPackage("com.android.cabl");
            mContext.bindService(i, mCABLServiceConn, Context.BIND_AUTO_CREATE);
            resolver = mContext.getContentResolver();
        }
    }

    public void onChange(boolean selfChange, Uri uri) {
        synchronized (mLock) {
            Slog.e(TAG, "mCABLService =" + mCABLService);
            initCABLService();
            if(null != mCABLService){
                boolean isON = (1 == Settings.System.getInt(resolver, Settings.System.CABL_CONTROL, 0));
                try{
                    Slog.e(TAG, "isON = " + isON);
                    if(isON){
                        mCABLService.control(CABL_CON_TYPE_ENABLE);
                    }else{
                        mCABLService.control(CABL_CON_TYPE_DISABLE);
                    }
                }catch(RemoteException e){
                }
            }
        }
    }
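mLock is the lock used by the stock PowerManagerService code, and the code above takes the same lock: onChange() holds mLock while initCABLService() calls bindService(), which in turn needs the ActivityManagerService lock. This code was newly added by someone else for the CABL feature, so the problem is very likely here!
I then brought in the owner of the CABL feature code to analyze the problem together, and we confirmed that the guess was right.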
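Solution:
Switch to a dedicated lock so that the new code does not contend with the system lock. Changes to key system service code need to be made very carefully.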
    public void onChange(boolean selfChange, Uri uri) {
        synchronized (mNewLock) {
            Slog.e(TAG, "mCABLService =" + mCABLService);
            initCABLService();
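To see why swapping the lock breaks the cycle, here is a small stand-alone sketch in plain Java. It makes several assumptions: SeparateLockDemo is a hypothetical class, three plain Object monitors stand in for the ActivityManagerService lock, mLock and mNewLock, and two daemon threads imitate the android.ui and PowerManagerService threads from the traces. It uses the JVM's ThreadMXBean.findMonitorDeadlockedThreads() to check for a deadlock; none of this is the actual system_server code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the two threads in this post.
class SeparateLockDemo {
    /** Starts a daemon thread that takes `first`, waits for its peer, then takes `second`. */
    static Thread spawn(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try { bothHoldFirst.await(); } catch (InterruptedException e) { return; }
                synchronized (second) {
                    // The real work would happen here.
                }
            }
        }, name);
        t.setDaemon(true);   // a deadlocked run must not keep the JVM alive
        t.start();
        return t;
    }

    /** Returns how many of the two threads end up deadlocked, or -1 if undecided. */
    static int run(boolean observerUsesOwnLock) {
        Object amsLock = new Object();    // stand-in for the ActivityManagerService monitor
        Object mLock = new Object();      // stand-in for PowerManagerService.mLock
        Object mNewLock = new Object();   // the dedicated lock from the fix
        Object observerLock = observerUsesOwnLock ? mNewLock : mLock;
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // android.ui: holds the AMS lock, then wants mLock (acquireWakeLockInternal).
        Thread ui = spawn("ui", amsLock, mLock, bothHoldFirst);
        // Observer: holds its lock in onChange(), then wants the AMS lock (bindService).
        Thread observer = spawn("observer", observerLock, amsLock, bothHoldFirst);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                int ours = 0;
                for (long id : ids) {
                    if (id == ui.getId() || id == observer.getId()) ours++;
                }
                if (ours > 0) return ours;
            }
            if (!ui.isAlive() && !observer.isAlive()) return 0;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("shared mLock, deadlocked threads: " + run(false));
        System.out.println("dedicated mNewLock, deadlocked threads: " + run(true));
    }
}
```

With the shared lock both stand-in threads should be reported as deadlocked, and with the dedicated lock both should finish. Note that in this model the observer still holds a lock while it calls into the other service; whether that matters for the real code depends on who else takes mNewLock, which this sketch does not cover.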