背景:
bug# 813706
日志路径:Z:\stability_doc\case\binderfull_getcontentprovider
systemui,settings等一直anr导致系统无法正常运行。
通过查看trace确定systemui anr的直接原因是由于SS的binder线程被占满且都在getContentProviderImpl时wait,无法及时响应
----- pid 1863 at 2018-08-06 19:57:10 -----
Cmd line: com.android.systemui
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x72c33e38 self=0x78a64c2a00
| sysTid=1863 nice=0 cgrp=default sched=0/0 handle=0x792b5f49a8
| state=S schedstat=( 2587940558080 1667956760666 8922333 ) utm=201251 stm=57543 core=7 HZ=100
| stack=0x7fd02c9000-0x7fd02cb000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/1863/stack)
native: #00 pc 000000000007a2c4 /system/lib64/libc.so (__ioctl+4)
native: #01 pc 0000000000027a34 /system/lib64/libc.so (ioctl+132)
native: #02 pc 0000000000058458 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
native: #03 pc 000000000005912c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 0000000000058f68 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+216)
native: #05 pc 000000000004f3ec /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
native: #06 pc 0000000000127890 /system/lib64/libandroid_runtime.so (???)
native: #07 pc 000000000093b1e4 /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:769)
at android.app.IActivityManager$Stub$Proxy.getCurrentUser(IActivityManager.java:7528)
at android.app.ActivityManager.getCurrentUser(ActivityManager.java:4081)
at com.android.systemui.statusbar.policy.Clock.getSmallTime(Clock.java:216)
at com.android.systemui.statusbar.policy.Clock.updateClock(Clock.java:173)
at com.android.systemui.statusbar.policy.Clock.onTimeTick(Clock.java:369)
----- pid 1391 at 2018-08-06 19:57:10 -----
Cmd line: system_server
"Binder:1391_1" prio=5 tid=7 TimedWaiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x136803e8 self=0x78a65a6800
| sysTid=1401 nice=0 cgrp=default sched=0/0 handle=0x7889d814f0
| state=S schedstat=( 629918502103 239344250787 1866819 ) utm=45619 stm=17372 core=6 HZ=100
| stack=0x7889c87000-0x7889c89000 stackSize=1005KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x02fd645a> (a com.android.server.am.ContentProviderRecord)
at java.lang.Object.wait(Object.java:422)
at com.android.server.am.ActivityManagerService.getContentProviderImpl(ActivityManagerService.java:12588)
- locked <0x02fd645a> (a com.android.server.am.ContentProviderRecord)
at com.android.server.am.ActivityManagerService.getContentProvider(ActivityManagerService.java:12663)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:439)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3132)
at android.os.Binder.execTransact(Binder.java:702)
本文重点记录getContentProviderImpl wait被占满的调查过程:
08-06 19:57:10.041 1391 1405 I am_anr : [0,1863,com.android.systemui,952647181, }]
最早发生systemui anr的时间是08-06 19:57:10,这56分的时候出现com.android.providers.calendar进程启动失败的日志。
// 启动com.android.providers.calendar进程且fork进程成功
08-06 19:56:23.777 8600 8600 I Zygote : ForkAndSpecializeCommon: fork is finished, pid=0
08-06 19:56:23.781 1391 1401 I am_proc_start: [0,8600,10007,com.android.providers.calendar,content provider,com.android.providers.calendar/.CalendarProvider2]
// com.android.providers.calendar 8600进程启动超时并kill改进程。
08-06 19:56:33.788 1391 1405 I am_process_start_timeout: [0,8600,10007,com.android.providers.calendar]
08-06 19:56:33.789 1391 1405 I am_kill : [0,8600,com.android.providers.calendar,-10000,start timeout]
//下面code说明ams必须收到calendar provider的attachApplicationLocked回调,才会remove
PROC_START_TIMEOUT_MSG,启动成功。
//从上面fork的日志来看calendar provider进程已经启动成功了,但是ams仍然打印timeout日志,说明ams并没有接收到8600的回调。继续调查原因。
public void handleMessage(Message msg) {
//...
case PROC_START_TIMEOUT_MSG: {
ProcessRecord app = (ProcessRecord)msg.obj;
synchronized (ActivityManagerService.this) {
processStartTimedOutLocked(app);
}
} break;
//...
}
private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
mHandler.removeMessages(PROC_START_TIMEOUT_MSG, app);
}
// getprovider的binder线程 发现launchapp为null返回,13379 2072 这些均为binder线程,熟了下一共32条日志刚好是ss的最大binder线程数。 这意味这什么呢?请接下来看代码解析
Line 125379: 08-06 19:56:33.789 1391 13379 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125381: 08-06 19:56:33.789 1391 2072 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125385: 08-06 19:56:33.790 1391 2756 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125387: 08-06 19:56:33.790 1391 2804 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125388: 08-06 19:56:33.790 1391 2045 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125390: 08-06 19:56:33.790 1391 2757 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125393: 08-06 19:56:33.790 1391 2075 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125395: 08-06 19:56:33.790 1391 15409 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125396: 08-06 19:56:33.791 1391 2044 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125397: 08-06 19:56:33.791 1391 2803 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125398: 08-06 19:56:33.791 1391 13449 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125399: 08-06 19:56:33.791 1391 2129 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125401: 08-06 19:56:33.792 1391 2799 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125408: 08-06 19:56:33.792 1391 2805 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125410: 08-06 19:56:33.792 1391 2831 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125411: 08-06 19:56:33.792 1391 18489 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125412: 08-06 19:56:33.793 1391 2834 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125415: 08-06 19:56:33.793 1391 2824 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125418: 08-06 19:56:33.793 1391 5968 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125419: 08-06 19:56:33.793 1391 1809 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125420: 08-06 19:56:33.793 1391 2718 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125421: 08-06 19:56:33.793 1391 8866 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125422: 08-06 19:56:33.794 1391 2729 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125424: 08-06 19:56:33.794 1391 15408 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125425: 08-06 19:56:33.794 1391 1873 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125427: 08-06 19:56:33.794 1391 1401 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125429: 08-06 19:56:33.794 1391 2719 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125430: 08-06 19:56:33.794 1391 1402 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125431: 08-06 19:56:33.794 1391 2835 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125432: 08-06 19:56:33.795 1391 2832 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125433: 08-06 19:56:33.795 1391 6326 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
Line 125434: 08-06 19:56:33.795 1391 2800 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
private ContentProviderHolder getContentProviderImpl(IApplicationThread caller,
String name, IBinder token, boolean stable, int userId) {
proc = startProcessLocked(cpi.processName,
cpr.appInfo, false, 0, "content provider",
new ComponentName(cpi.applicationInfo.packageName,
cpi.name), false, false, false);
while (cpr.provider == null) {
if (cpr.launchingApp == null) {
Slog.w(TAG, "Unable to launch app "
+ cpi.applicationInfo.packageName + "/"
+ cpi.applicationInfo.uid + " for provider "
+ name + ": launching app became null");
return null;
}
cpr.wait(waitTimeout);
}
}
打印上面日志的地方在getContentProviderImpl方法里,cpr.launchingApp为null的时候
下面代码执行是又某应用进程(此处是calendar)发起,设想执行流程应该是这样的
1 第一个访问calendar provider的应用向ams发起getContentProvider的请求,ams发现proc 为空则去启动com.android.providers.calendar进程,且cpr wait ,该binder线程被hang住。
2 接着ams等待com.android.providers.calendar启动完成的attachApplicationLocked回调
3 然后ams 又收到31个calendar的binder请求均被 wait住了
4 calendar provider 的process启动完成回调发现所有binder被hang住,ss没有多余的binder线程可以分配.等待10秒后超时
5 08-06 19:56:33 ,10秒超时后回调cpr.notifyall被调用, 上述代码里的while循环被继续执行,发现cpr.launchingApp则返回。所有binder线程被释放,也就是上面的" Unable to launch app "的日志打印
6 接着又从一开始循环执行到第5步,至此发生了systemui等应用anr。
Line 15194: 08-06 16:58:42.550 1391 1405 I am_process_start_timeout: [0,14150,10007,com.android.providers.calendar]
Line 87516: 08-06 19:56:33.788 1391 1405 I am_process_start_timeout: [0,8600,10007,com.android.providers.calendar]
Line 87559: 08-06 19:56:43.810 1391 1405 I am_process_start_timeout: [0,8621,10007,com.android.providers.calendar]
Line 87597: 08-06 19:56:53.854 1391 1405 I am_process_start_timeout: [0,8636,10007,com.android.providers.calendar]
Line 87642: 08-06 19:57:03.877 1391 1405 I am_process_start_timeout: [0,8659,10007,com.android.providers.calendar]
Line 87684: 08-06 19:57:13.894 1391 1405 I am_process_start_timeout: [0,8676,10007,com.android.providers.calendar]
Line 87728: 08-06 19:57:26.282 1391 1405 I am_process_start_timeout: [0,8720,10007,com.android.providers.calendar]
Line 87788: 08-06 19:57:38.406 1391 1405 I am_process_start_timeout: [0,8761,10007,com.android.providers.calendar]