Low-probability Android soft watchdog timeout (SWT) caused by a dexopt code change

Symptom:
During normal use the phone occasionally freezes, then crashes and reboots after roughly 30 s. Platform: 6580-O1.

Analysis:
Exception Class: SWT
Exception Type: system_server_watchdog
Current Executing Process:
system_server
Trigger time:[2018-01-01 00:16:30.483495] pid:639
Backtrace:
Process: system_server
Subject: Blocked in handler on main thread ( main)

From the above we can infer that a deadlock (circular wait) has occurred.
Search the trace for sysTid=639:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x71caf810 self=0xa85dd000
| sysTid=639 nice=-2 cgrp=default sched=0/0 handle=0xac5194a4
| state=S schedstat=( 32045830300 4443774379 18427 ) utm=2769 stm=435 core=0 HZ=100
| stack=0xbe7bb000-0xbe7bd000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/639/stack)
native: #00 pc 00049304 /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001deef /system/lib/libc.so (ioctl+38)
native: #02 pc 0004242f /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
native: #03 pc 00042de9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d2e5 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcdad /system/lib/libandroid_runtime.so (???)
native: #06 pc 0074f305 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at android.os.IInstalld$Stub$Proxy.idmap(IInstalld.java:936)
at com.android.server.pm.Installer.idmap(Installer.java:327)
at com.android.server.om.IdmapManager.createIdmap(IdmapManager.java:62)
at com.android.server.om.OverlayManagerServiceImpl.updateState(OverlayManagerServiceImpl.java:493)
at com.android.server.om.OverlayManagerServiceImpl.updateAllOverlaysForTarget(OverlayManagerServiceImpl.java:237)
at com.android.server.om.OverlayManagerServiceImpl.onTargetPackageChanged(OverlayManagerServiceImpl.java:187)
at com.android.server.om.OverlayManagerService$PackageReceiver.onPackageChanged(OverlayManagerService.java:392)
- locked <0x0fac5e13> (a java.lang.Object)
at com.android.server.om.OverlayManagerService$PackageReceiver.onReceive(OverlayManagerService.java:350)
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_53034(LoadedApk.java:1323)

So thread sysTid=639 holds lock <0x0fac5e13> and, at the same time, is waiting for the remote end of a binder call to respond. Look at the binder transaction info:
proc 921
context binder
thread 921: l 10 need_return 0 tr 0
outgoing transaction 381856: d60c0900 from 921:921 to 639:2504 code 3 flags 10 pri 0:120 r1
This shows that thread 921 of process 921 has an outgoing binder transaction to thread 2504 of process 639.
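
When there are many such lines, pulling the sender and target out by hand gets tedious. Below is a minimal sketch that parses a line of the shape quoted above. The class name `BinderTxn` and the regular expression are my own assumptions, derived only from this one sample line, so treat it as a starting point rather than a complete parser for the binder debug output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BinderTxn {
    // Shape assumed from the single line quoted above:
    // "outgoing transaction <id>: <ptr> from <pid>:<tid> to <pid>:<tid> code <n> ..."
    private static final Pattern LINE = Pattern.compile(
            "(\\w+) transaction (\\d+): \\S+ from (\\d+):(\\d+) to (\\d+):(\\d+)");

    final String direction;
    final long id;
    final int fromPid, fromTid, toPid, toTid;

    private BinderTxn(Matcher m) {
        direction = m.group(1);
        id = Long.parseLong(m.group(2));
        fromPid = Integer.parseInt(m.group(3));
        fromTid = Integer.parseInt(m.group(4));
        toPid = Integer.parseInt(m.group(5));
        toTid = Integer.parseInt(m.group(6));
    }

    /** Returns the parsed transaction, or null if the line has a different shape. */
    static BinderTxn parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new BinderTxn(m) : null;
    }

    public static void main(String[] args) {
        BinderTxn t = parse("outgoing transaction 381856: d60c0900 from 921:921 "
                + "to 639:2504 code 3 flags 10 pri 0:120 r1");
        System.out.println(t.fromPid + ":" + t.fromTid + " -> " + t.toPid + ":" + t.toTid
                + " (transaction " + t.id + ")");
    }
}
```

The transaction id (381856 here) is worth keeping, because the same id shows up again in the kernel log later in the analysis.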

Look at the call stack of sysTid=2504:
"Binder:639_12" prio=5 tid=106 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x135c11d8 self=0x81f49a00
| sysTid=2504 nice=0 cgrp=default sched=0/0 handle=0x777c2970
| state=S schedstat=( 2680127942 4417945095 5465 ) utm=191 stm=77 core=0 HZ=100
| stack=0x776c8000-0x776ca000 stackSize=1006KB
| held mutexes=
at com.android.server.om.OverlayManagerService$1.getOverlayInfo(OverlayManagerService.java:511)
- waiting to lock <0x0fac5e13> (a java.lang.Object) held by thread 1
at android.content.om.IOverlayManager$Stub.onTransact(IOverlayManager.java:79)
at android.os.Binder.execTransact(Binder.java:697)
sysTid=2504 is waiting for lock <0x0fac5e13>, which is already held by sysTid=639.
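
The "waiting to lock ... held by thread N" lines can also be collected mechanically from a full trace file. The following is a rough sketch, assuming the thread header and monitor lines look like the ones quoted above; `MonitorEdges` and `blockedOn` are hypothetical names, not part of any Android tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MonitorEdges {
    // Thread header, e.g.: "Binder:639_12" prio=5 tid=106 Blocked
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\btid=(\\d+)\\b");
    // Monitor line, e.g.: - waiting to lock <0x0fac5e13> (a java.lang.Object) held by thread 1
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <\\s*(0x\\p{XDigit}+)>.*held by thread (\\d+)");

    /** Returns one "waiter -> holder via lock" entry per blocked thread found. */
    static List<String> blockedOn(List<String> traceLines) {
        List<String> edges = new ArrayList<>();
        String current = null; // thread whose stack is currently being read
        for (String line : traceLines) {
            String s = line.trim();
            Matcher h = HEADER.matcher(s);
            if (h.find()) {
                current = "tid=" + h.group(2) + " (" + h.group(1) + ")";
                continue;
            }
            Matcher w = WAITING.matcher(s);
            if (current != null && w.find()) {
                edges.add(current + " -> tid=" + w.group(2) + " via " + w.group(1));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        trace.add("\"Binder:639_12\" prio=5 tid=106 Blocked");
        trace.add("at com.android.server.om.OverlayManagerService$1.getOverlayInfo(OverlayManagerService.java:511)");
        trace.add("- waiting to lock <0x0fac5e13> (a java.lang.Object) held by thread 1");
        for (String edge : blockedOn(trace)) {
            System.out.println(edge);
        }
    }
}
```

Note that the tid in these lines is the runtime's own thread id (tid=1 is "main"), not the kernel sysTid, so the result still has to be mapped back to sysTid=639 through the thread header.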

Now look at the call stack of sysTid=921:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x71caf810 self=0xa85dd000
| sysTid=921 nice=0 cgrp=default sched=0/0 handle=0xac5194a4
| state=S schedstat=( 10855603299 11998377296 23241 ) utm=760 stm=325 core=1 HZ=100
| stack=0xbe7bb000-0xbe7bd000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/921/stack)
native: #00 pc 00049304 /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001deef /system/lib/libc.so (ioctl+38)
native: #02 pc 0004242f /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
native: #03 pc 00042de9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d2e5 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcdad /system/lib/libandroid_runtime.so (???)
native: #06 pc 0074f305 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at android.content.om.IOverlayManager$Stub$Proxy.getOverlayInfo(IOverlayManager.java:254)
at com.android.systemui.statusbar.phone.StatusBar.isUsingDarkTheme(unavailable:-1)
at com.android.systemui.statusbar.phone.StatusBar.updateTheme(unavailable:-1)
at com.android.systemui.statusbar.phone.StatusBar.onColorsChanged(unavailable:-1)
at com.android.internal.colorextraction.ColorExtractor.triggerColorsChanged(ColorExtractor.java:186)
at com.android.systemui.colorextraction.SysuiColorExtractor.setWallpaperVisible(unavailable:-1)
at com.android.systemui.colorextraction.SysuiColorExtractor$1.lambda$-com_android_systemui_colorextraction_SysuiColorExtractor$1_3105(unavailable:-1)
at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.$m$0(unavailable:-1)
at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.run(unavailable:-1)

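These backtraces already give some clues, and the flow can be roughly pieced together:
Thread sysTid=921 (systemui) sent a binder request to sysTid=2504 (a binder thread in system_server), but 2504 is waiting for lock <0x0fac5e13>, never gets it, and sleeps (state S), so 921 never receives a reply.
The lock <0x0fac5e13> is held by sysTid=639. Searching the kernel log for entries related to 639: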
Line 1959: [ 539.580770] (0)[5548:kworker/u8:7]binder: release 639: 656 transaction 366283 out, still active
Line 1960: [ 539.580952] (0)[5548:kworker/u8:7]binder: release 639: 2504 transaction 381856 in, still active
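These lines are printed only when binder hits an abnormal condition (here, process 639 is released while transactions are still active). Given the context, thread 656 is highly suspicious.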
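
To list every thread of a process that still had an active transaction at release time, the kernel log can be filtered with a small parser. This is only a sketch based on the two lines above; `BinderRelease` is a hypothetical helper and the pattern is an assumption about the message format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BinderRelease {
    // e.g.: binder: release 639: 656 transaction 366283 out, still active
    private static final Pattern LINE = Pattern.compile(
            "binder: release (\\d+):\\s*(\\d+) transaction (\\d+) (in|out), still active");

    final int pid;
    final int tid;
    final long txn;
    final boolean outgoing; // true when the log says "out" for this thread

    private BinderRelease(int pid, int tid, long txn, boolean outgoing) {
        this.pid = pid;
        this.tid = tid;
        this.txn = txn;
        this.outgoing = outgoing;
    }

    /** Parses every matching line and silently skips the rest. */
    static List<BinderRelease> parseAll(List<String> kernelLogLines) {
        List<BinderRelease> found = new ArrayList<>();
        for (String line : kernelLogLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                found.add(new BinderRelease(
                        Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)),
                        Long.parseLong(m.group(3)),
                        "out".equals(m.group(4))));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.add("[ 539.580770] (0)[5548:kworker/u8:7]binder: release 639: 656 transaction 366283 out, still active");
        log.add("[ 539.580952] (0)[5548:kworker/u8:7]binder: release 639: 2504 transaction 381856 in, still active");
        for (BinderRelease r : parseAll(log)) {
            System.out.println(r.pid + ":" + r.tid + " txn=" + r.txn
                    + (r.outgoing ? " (out)" : " (in)"));
        }
    }
}
```

Transaction 381856 is the same one seen earlier from 921:921 to 639:2504, which is why it is reported as "in" for thread 2504, while 656 is on the sending side of a different transaction.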

Look at the call stack of sysTid=656:
"ActivityManager: dexopt" prio=5 tid=14 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13180a28 self=0x9ef95a00
| sysTid=656 nice=10 cgrp=default sched=0/0 handle=0x919b4970
| state=S schedstat=( 59861611 44158081 88 ) utm=3 stm=2 core=0 HZ=100
| stack=0x918b2000-0x918b4000 stackSize=1038KB
| held mutexes=
kernel: (couldn't read /proc/self/task/656/stack)
native: #00 pc 00049304 /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001deef /system/lib/libc.so (ioctl+38)
native: #02 pc 0004242f /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
native: #03 pc 00042de9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d2e5 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcdad /system/lib/libandroid_runtime.so (???)
native: #06 pc 0074f305 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at android.os.IInstalld$Stub$Proxy.dexopt(IInstalld.java:814)
at com.android.server.pm.Installer.dexopt(Installer.java:287)
at com.android.server.pm.PackageDexOptimizer.dexOptPath(PackageDexOptimizer.java:263)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(PackageDexOptimizer.java:214)
at com.android.server.pm.PackageDexOptimizer.performDexOpt(PackageDexOptimizer.java:130)
- locked <0x0f25597c> (a java.lang.Object)
at com.android.server.pm.PackageManagerService.performDexOptInternalWithDependenciesLI(PackageManagerService.java:10185)
at com.android.server.pm.PackageManagerService.performDexOptInternal(PackageManagerService.java:10137)
- locked <0x0f25597c> (a java.lang.Object)
at com.android.server.pm.PackageManagerService.performDexOptTraced(PackageManagerService.java:10115)
at com.android.server.pm.PackageManagerService.performDexOptWithStatus(PackageManagerService.java:10107)
at com.android.server.pm.BackgroundDexOptService.optimizePackages(BackgroundDexOptService.java:345)
at com.android.server.pm.BackgroundDexOptService.idleOptimization(BackgroundDexOptService.java:265)
at com.android.server.pm.BackgroundDexOptService.runIdleOptimizationsForFirstStart(BackgroundDexOptService.java:424)
at com.android.server.pm.PackageManagerService.runBackgroundDexoptJobByPackageNameInternal(PackageManagerService.java:10234)
at com.android.server.pm.PackageManagerService$PackageManagerInternalImpl.runBackgroundDexoptJobByPackageName(PackageManagerService.java:25460)
at com.android.server.am.ActivityManagerService$DexoptHandler.handleMessage(ActivityManagerService.java:1858)
at android.os.Handler.dispatchMessage(Handler.java:106)

656 turns out to be waiting on a binder call as well. Check the kernel log for tasks blocked in D state to see whether anything is abnormal:
Line 3: [ 537.143997] (1)[1016:watchdog]fuse_log D c0a41cf4 0 72 2 0x00000000
Line 9: [ 537.144128] (1)[1016:watchdog]hps_main D c0a41cf4 0 94 2 0x00000000
Line 19: [ 537.144333] (1)[1016:watchdog]ddp_irq_log_kth D c0a41cf4 0 129 2 0x00000000
Line 25: [ 537.144443] (1)[1016:watchdog]display_esd_che D c0a41cf4 0 130 2 0x00000000
Line 37: [ 537.144686] (1)[1016:watchdog]decouple_trigge D c0a41cf4 0 133 2 0x00000000
Line 45: [ 537.144845] (1)[1016:watchdog]disp_idlemgr D c0a41cf4 0 137 2 0x00000000
Line 57: [ 537.145104] (1)[1016:watchdog]hang_detect D c0a41cf4 0 145 2 0x00000000
Line 69: [ 537.145352] (1)[1016:watchdog]bat_thread_kthr D c0a41cf4 0 174 2 0x00000000
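Going through their call stacks one by one shows nothing obviously abnormal, so the hang is probably not in the kernel. Unfortunately, the thread that 656 is waiting on could not be identified.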
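
A quick way to narrow down such a list is to drop kernel threads and keep only blocked user-space tasks. The sketch below assumes the columns after the task name are state, PC, free stack, pid, parent pid and flags, and that a parent pid of 2 (kthreadd) marks a kernel thread; `BlockedTasks` is a hypothetical name and the column layout is my reading of the lines above, not something the log states explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BlockedTasks {
    // Assumed column order after the task name: state, PC, free stack, pid, ppid, flags.
    private static final Pattern TASK = Pattern.compile(
            "\\](\\S+)\\s+([A-Z])\\s+\\p{XDigit}+\\s+\\d+\\s+(\\d+)\\s+(\\d+)\\s+0x\\p{XDigit}+\\s*$");

    /** Returns "name(pid)" for every D-state task whose parent is not pid 2 (kthreadd). */
    static List<String> userSpaceDState(List<String> kernelLogLines) {
        List<String> result = new ArrayList<>();
        for (String line : kernelLogLines) {
            Matcher m = TASK.matcher(line);
            if (!m.find()) {
                continue; // not a task-dump line
            }
            boolean dState = "D".equals(m.group(2));
            boolean kernelThread = "2".equals(m.group(4));
            if (dState && !kernelThread) {
                result.add(m.group(1) + "(" + m.group(3) + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.add("[ 537.143997] (1)[1016:watchdog]fuse_log D c0a41cf4 0 72 2 0x00000000");
        log.add("[ 537.144128] (1)[1016:watchdog]hps_main D c0a41cf4 0 94 2 0x00000000");
        log.add("[ 537.145104] (1)[1016:watchdog]hang_detect D c0a41cf4 0 145 2 0x00000000");
        System.out.println("blocked user-space tasks: " + userSpaceDState(log));
    }
}
```

Under that assumption, every task listed above has parent pid 2, so the filter returns an empty list for them, which fits the conclusion that nothing in user space is stuck in the kernel.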

Focus on the call stack of thread 656: the context shows that it is performing a dexopt operation.
The call stack of 639, in turn, contains:
at com.android.server.om.OverlayManagerService$PackageReceiver.onReceive(OverlayManagerService.java:350)
323 public void onReceive(@NonNull final Context context, @NonNull final Intent intent) {
...
349 case ACTION_PACKAGE_CHANGED:
350 onPackageChanged(packageName, userIds);
351 break;
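A package-changed event is being handled here, and that path also needs a dexopt operation, which is consistent with the suspicion above:
thread 921 waits on => thread 2504
thread 2504 waits on => thread 639
thread 639 waits on => some binder thread...?
thread 656 waits on => some binder thread...?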
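
The wait relations above form a small wait-for graph, and following it from the thread the user sees hanging shows where the chain breaks. A minimal sketch, with the unknown binder peer written as "?" and `WaitChain` as a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WaitChain {
    /**
     * Follows "X waits on Y" edges from start until a thread with no known
     * edge is reached, or until a thread repeats (which would be a cycle).
     */
    static List<String> follow(Map<String, String> waitsOn, String start) {
        List<String> chain = new ArrayList<>();
        String current = start;
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = waitsOn.get(current);
        }
        if (current != null) {
            chain.add(current + " (cycle)"); // came back to a thread already visited
        }
        return chain;
    }

    public static void main(String[] args) {
        Map<String, String> waitsOn = new LinkedHashMap<>();
        waitsOn.put("921", "2504");
        waitsOn.put("2504", "639");
        waitsOn.put("639", "?"); // binder peer not identified in the logs
        waitsOn.put("656", "?"); // binder peer not identified in the logs
        System.out.println(follow(waitsOn, "921")); // [921, 2504, 639, ?]
    }
}
```

With the data available the chain ends at "?" instead of closing into a cycle, so the logs alone do not show a complete deadlock loop.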

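The wait chain cannot be traced any further, possibly because the logs are incomplete, so the only option is to work with the call stacks at hand.
The call stack of thread 656 gives a clue: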
at com.android.server.pm.PackageManagerService.runBackgroundDexoptJobByPackageNameInternal(PackageManagerService.java:10234)
at com.android.server.pm.PackageManagerService$PackageManagerInternalImpl.runBackgroundDexoptJobByPackageName(PackageManagerService.java:25460)
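Check the change history of this file:
[Figure 1: screenshot of the code change history]
It turns out that this function is not part of the original platform baseline code but was newly added, so the next step is to focus on analyzing this piece of code.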


