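Problem description
Install an APK on the L release.
Move it to the SD card.
Upgrade the system to the M release.
Open the app (this step is required).
Uninstall it.
The device then reboots.
The system process was killed.
An app I wrote myself reproduces the problem as well.
The move is done from the App info screen.
On Android 5, once an app has been moved to the SD card, its APK is mounted under /mnt/asec.
These steps are tedious to reproduce, and as SD cards fall out of use the problem is rarely hit.
Still, for the sake of thoroughness, it is worth analyzing.
Checking the log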
05-01 09:48:27.582086 1163 1192 I MountService: unmountSecureContainer,id=com.UCMobile-2, force=true
05-01 09:48:27.672052 1163 1192 D VoldConnector: SND -> {42 asec unmount com.UCMobile-2 force}
05-01 09:48:27.672459 438 449 D FrameworkListener: dispatchCommand data = (42 asec unmount com.UCMobile-2 force)
05-01 09:48:27.672553 438 449 D VoldCmdListener: asec unmount com.UCMobile-2 force
05-01 09:48:27.673646 438 449 W Vold : com.UCMobile-2 unmount attempt 1 failed (Device or resource busy)
05-01 09:48:28.349615 438 449 E ProcessKiller: Process system_server(1163) has open file /mnt/asec/com.UCMobile-2/base.apk
05-01 09:48:29.870884 438 449 W Vold : com.UCMobile-2 unmount attempt 2 failed (Device or resource busy)
05-01 09:48:30.681231 438 449 E ProcessKiller: Process system_server(1163) has open file /mnt/asec/com.UCMobile-2/base.apk
05-01 09:48:32.243998 438 449 W Vold : com.UCMobile-2 unmount attempt 3 failed (Device or resource busy)
05-01 09:48:33.041623 438 449 E ProcessKiller: Process system_server(1163) has open file /mnt/asec/com.UCMobile-2/base.apk
05-01 09:48:33.045726 438 449 W ProcessKiller: Sending Terminated to process 1163
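From the log above and the related code, we can reconstruct roughly how the reboot happens. In this run, vold has pid 438 and system_server has pid 1163. When com.UCMobile, installed on the SD card, is uninstalled, system_server sends the command {42 asec unmount com.UCMobile-2 force} to vold over a socket. On receiving it, vold tries to unmount /mnt/asec/com.UCMobile-2 by calling the system umount function, but the call fails with "Device or resource busy" because system_server still holds a file under that mount point. The first two failures are only logged. After the third consecutive failure, vold walks /proc to find every process that holds resources under /mnt/asec/com.UCMobile-2 and sends those processes SIGTERM (15), escalating to SIGKILL (9) on later attempts. Unfortunately system_server has /mnt/asec/com.UCMobile-2/base.apk open, so it is on the kill list; it is killed without mercy and the system reboots. It is a sad story.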
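Searching for the strings in this log leads to
android/system/vold/Process.cpp, which is where the ProcessKiller messages are printed:
void Process::killProcessesWithOpenFiles(const char *path, int signal)
Let's see how system_server ends up being killed.
The unmount command ends up in
/system/vold/VolumeManager.cpp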
#define UNMOUNT_RETRIES 5
#define UNMOUNT_SLEEP_BETWEEN_RETRY_MS (1000 * 1000)
// The id passed in here is com.UCMobile-2
int VolumeManager::unmountAsec(const char *id, bool force) {
    char asecFileName[255];
    char mountPoint[255];
    if (!isLegalAsecId(id)) {
        SLOGE("unmountAsec: Invalid asec id \"%s\"", id);
        errno = EINVAL;
        return -1;
    }
    if (findAsec(id, asecFileName, sizeof(asecFileName))) {
        SLOGE("Couldn't find ASEC %s", id);
        return -1;
    }
    // Defined elsewhere: const char *VolumeManager::ASECDIR = "/mnt/asec";
    int written = snprintf(mountPoint, sizeof(mountPoint), "%s/%s", VolumeManager::ASECDIR, id);
    if ((written < 0) || (size_t(written) >= sizeof(mountPoint))) {
        SLOGE("ASEC unmount failed for %s: couldn't construct mountpoint", id);
        return -1;
    }
    char idHash[33];
    if (!asecHash(id, idHash, sizeof(idHash))) {
        SLOGE("Hash of '%s' failed (%s)", id, strerror(errno));
        return -1;
    }
    // Perform the unmount
    return unmountLoopImage(id, idHash, asecFileName, mountPoint, force);
}
// Here id is com.UCMobile-2 and mountPoint is /mnt/asec/com.UCMobile-2
int VolumeManager::unmountLoopImage(const char *id, const char *idHash,
        const char *fileName, const char *mountPoint, bool force) {
    if (!isMountpointMounted(mountPoint)) {
        SLOGE("Unmount request for %s when not mounted", id);
        errno = ENOENT;
        return -1;
    }
    int i, rc;
    // Retry loop; leave the loop as soon as umount succeeds
    for (i = 1; i <= UNMOUNT_RETRIES; i++) {
        // Call umount; on failure the error code is stored in errno
        // On success umount returns 0
        rc = umount(mountPoint);
        // rc == 0 means umount succeeded, so leave the loop
        if (!rc) {
            break;
        }
        if (rc && (errno == EINVAL || errno == ENOENT)) {
            SLOGI("Container %s unmounted OK", id);
            rc = 0;
            break;
        }
        // umount failed; log the reason
        SLOGW("%s unmount attempt %d failed (%s)",
                id, i, strerror(errno));
        int signal = 0; // default is to just complain
        if (force) {
            // i is 4 or 5: if SIGTERM was not enough to kill the target, escalate to SIGKILL
            if (i > (UNMOUNT_RETRIES - 2))
                signal = SIGKILL;
            // i is 3
            else if (i > (UNMOUNT_RETRIES - 3))
                signal = SIGTERM;
        }
        // Kill the processes that hold the mount point
        Process::killProcessesWithOpenFiles(mountPoint, signal);
        // Sleep for 1 second
        usleep(UNMOUNT_SLEEP_BETWEEN_RETRY_MS);
    }
    // Part of the code omitted
    … …
}
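The retry loop above can be summarized as a small table of attempt number against signal. The sketch below is an illustration in Java, not vold code: the class and method names (UnmountEscalation, signalForAttempt) are made up for this example, and only the two threshold comparisons are taken from the excerpt.

```java
// Illustrative sketch only: mirrors the threshold logic of the
// unmountLoopImage excerpt. Class and method names are hypothetical.
final class UnmountEscalation {
    // Same value as UNMOUNT_RETRIES in the VolumeManager.cpp excerpt.
    static final int UNMOUNT_RETRIES = 5;

    // Returns the name of the signal chosen after a failed attempt (1-based).
    static String signalForAttempt(int attempt, boolean force) {
        if (!force) {
            return "none";       // signal stays 0: only complain in the log
        }
        if (attempt > UNMOUNT_RETRIES - 2) {
            return "SIGKILL";    // attempts 4 and 5
        }
        if (attempt > UNMOUNT_RETRIES - 3) {
            return "SIGTERM";    // attempt 3
        }
        return "none";           // attempts 1 and 2
    }

    public static void main(String[] args) {
        for (int i = 1; i <= UNMOUNT_RETRIES; i++) {
            System.out.println("attempt " + i + " -> " + signalForAttempt(i, true));
        }
    }
}
```

This matches the log: attempts 1 and 2 only print the ProcessKiller line, and the "Sending Terminated" message appears right after attempt 3.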
system/vold/Process.cpp
// Kill the processes that have files open
// In this example path is /mnt/asec/com.UCMobile-2
void Process::killProcessesWithOpenFiles(const char *path, int signal) {
    DIR* dir;
    struct dirent* de;
    // Open the /proc directory; return if that fails
    if (!(dir = opendir("/proc"))) {
        SLOGE("opendir failed (%s)", strerror(errno));
        return;
    }
    // Iterate over the directory entries
    while ((de = readdir(dir))) {
        // Derive the pid from the entry name; returns -1 if the name is not a numeric pid
        int pid = getPid(de->d_name);
        char name[PATH_MAX];
        if (pid == -1)
            continue;
        // Look up the process name for this pid and write it into name
        getProcessName(pid, name, sizeof(name));
        char openfile[PATH_MAX];
        // Check /proc/
        if (checkFileDescriptorSymLinks(pid, path, openfile, sizeof(openfile))) {
            SLOGE("Process %s (%d) has open file %s", name, pid, openfile);
        } else if (checkFileMaps(pid, path, openfile, sizeof(openfile))) {
            // Check the file /proc/
            SLOGE("Process %s (%d) has open filemap for %s", name, pid, openfile);
        } else if (checkSymLink(pid, path, "cwd")) {
            // Check /proc/
            SLOGE("Process %s (%d) has cwd within %s", name, pid, path);
        } else if (checkSymLink(pid, path, "root")) {
            // Check /proc/
            SLOGE("Process %s (%d) has chroot within %s", name, pid, path);
        } else if (checkSymLink(pid, path, "exe")) {
            // Check /proc/
            SLOGE("Process %s (%d) has executable path within %s", name, pid, path);
        // Nothing matched; move on to the next entry
        } else {
            continue;
        }
        if (signal != 0) {
            SLOGW("Sending %s to process %d", strsignal(signal), pid);
            // Send the signal to the process
            kill(pid, signal);
        }
    }
    closedir(dir);
}
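killProcessesWithOpenFiles kills every process that holds /mnt/asec/com.UCMobile-2. Its intent is to shut down whatever is holding the resource, but here it kills system_server, and the system reboots.
One workaround is to suppress the signal to system_server, so that vold cannot kill it: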
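To get a feel for the first of these checks, here is a rough Java sketch that lists which open file descriptors of a process point below a mount point. It relies on the Linux convention that /proc/<pid>/fd contains one symlink per open descriptor. The class and method names (OpenFileScan, isWithin, openFilesUnder) are hypothetical, the mount point is assumed to have no trailing slash, and the matching rule is a simplification rather than a copy of vold's helper.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not vold APIs.
final class OpenFileScan {
    // True when candidate is the mount point itself or lies below it.
    // Assumes mountPoint has no trailing slash.
    static boolean isWithin(String candidate, String mountPoint) {
        if (!candidate.startsWith(mountPoint)) {
            return false;
        }
        return candidate.length() == mountPoint.length()
                || candidate.charAt(mountPoint.length()) == '/';
    }

    // Returns the targets of /proc/<pid>/fd symlinks that lie under mountPoint.
    static List<String> openFilesUnder(String pid, String mountPoint) {
        List<String> hits = new ArrayList<>();
        Path fdDir = Paths.get("/proc", pid, "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    if (isWithin(target, mountPoint)) {
                        hits.add(target);
                    }
                } catch (IOException | UnsupportedOperationException e) {
                    // The descriptor vanished or cannot be read; skip it.
                }
            }
        } catch (IOException e) {
            // No /proc on this platform, or no permission for this pid.
        }
        return hits;
    }

    public static void main(String[] args) {
        // "self" resolves to the calling process on Linux.
        System.out.println(openFilesUnder("self", "/tmp"));
    }
}
```

The boundary check in isWithin matters for ASEC ids: without it, a file under /mnt/asec/com.UCMobile-20 would be counted as belonging to /mnt/asec/com.UCMobile-2.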
void Process::killProcessesWithOpenFiles(const char *path, int signal) {
    DIR* dir;
    struct dirent* de;
    if (!(dir = opendir("/proc"))) {
        SLOGE("opendir failed (%s)", strerror(errno));
        return;
    }
    while ((de = readdir(dir))) {
        int pid = getPid(de->d_name);
        char name[PATH_MAX];
        if (pid == -1)
            continue;
        getProcessName(pid, name, sizeof(name));
        char openfile[PATH_MAX];
        // openfile receives the name of the matching file
        if (checkFileDescriptorSymLinks(pid, path, openfile, sizeof(openfile))) {
            SLOGE("Process %s (%d) has open file %s, path=[%s]", name, pid, openfile, path);
        } else if (checkFileMaps(pid, path, openfile, sizeof(openfile))) {
            SLOGE("Process %s (%d) has open filemap for %s", name, pid, openfile);
        } else if (checkSymLink(pid, path, "cwd")) {
            SLOGE("Process %s (%d) has cwd within %s", name, pid, path);
        } else if (checkSymLink(pid, path, "root")) {
            SLOGE("Process %s (%d) has chroot within %s", name, pid, path);
        } else if (checkSymLink(pid, path, "exe")) {
            SLOGE("Process %s (%d) has executable path within %s", name, pid, path);
        } else {
            continue;
        }
        if (signal != 0) {
            SLOGW("Sending %s to process %d", strsignal(signal), pid);
            //test
            if (strcmp(name, "system_server") == 0)
            {
                SLOGW("do not kill system_server, skip");
                continue;
            }
            kill(pid, signal);
        }
    }
    closedir(dir);
}
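When I analyzed this problem I was not familiar with how an app is opened. To find out quickly where base.apk gets loaded, I decided to delete base.apk and see what happens.
I installed UC Browser on a phone for testing, rooted the phone, and deleted the corresponding base.apk, /data/app/com.UCMobile-1/base.apk. Tapping the UC icon on the home screen then shows an error saying the app cannot be opened.
The relevant log: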
01-02 01:13:58.043: I/ActivityManager(1062): START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.UCMobile/.main.UCMobile (has extras)} from uid 10009 from pid 1705 on display 0
01-02 01:13:58.044: W/asset(1062): Asset path /data/app/com.UCMobile-1/base.apk is neither a directory nor file (type=1).
01-02 01:13:58.045: V/WindowManager(1062): Looking for focus: 10 = Window{c3f667e u0 StatusBar}, flags=-2122055608, canReceive=false
01-02 01:13:58.045: V/WindowManager(1062): findFocusedWindow: Found new focus @ 1 = Window{9f6bd81 u0 com.android.launcher3/com.android.launcher3.Launcher}
01-02 01:13:58.047: I/BufferQueueProducer(396): [com.android.launcher3/com.android.launcher3.Launcher](this:0x7f8baca000,id:81,api:1,p:1705,c:396) queueBuffer: fps=0.10 dur=10490.40 max=10490.40 min=10490.40
01-02 01:13:58.050: W/asset(1062): Asset path /data/app/com.UCMobile-1/base.apk is neither a directory nor file (type=1).
01-02 01:13:58.052: D/PowerManagerService(1062): acquireWakeLockInternal: lock=170577702, flags=0x1, tag="*launch*", ws=WorkSource{10072}, uid=1000, pid=1062
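The keywords in this log lead to the addAssetPath method in frameworks/base/libs/androidfw/AssetManager.cpp. From there we can examine AssetManager with a clear target.
AssetManager is Android's resource manager; it loads and manages the data resources inside an APK file. In app development, AssetManager can be used to access resources in the assets directory: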
// Get the AssetManager
AssetManager am = getApplicationContext().getAssets();
// List the files in the assets directory
String[] filePathList = am.list("");
// Open a file in the assets directory, for example test.txt
InputStream inStream = am.open("test.txt");
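The log above shows AssetManager trying to read an APK file such as /data/app/com.UCMobile-1/base.apk to load resources. base.apk is a copy of the original APK package; if you pull it from the phone, it can be installed and used. This is sometimes a handy way to obtain an app's installation package.
The Java layer has an AssetManager.java that corresponds to AssetManager.cpp. The Java AssetManager exposes the public interface, while most of the work is done by the C++ AssetManager.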
frameworks/base/libs/androidfw/AssetManager.cpp
frameworks/base/core/java/android/content/res/AssetManager.java
frameworks/base/core/jni/android_util_AssetManager.cpp
frameworks/base/libs/androidfw/ZipFileRO.cpp
In the unmountSecureContainer method of MountService.java, a GC is performed before the unmount command is sent to vold:
public int unmountSecureContainer(String id, boolean force) {
    … …
    /*
     * Force a GC to make sure AssetManagers in other threads of the
     * system_server are cleaned up. We have to do this since AssetManager
     * instances are kept as a WeakReference and it's possible we have files
     * open on the external storage.
     */
    Runtime.getRuntime().gc();
    int rc = StorageResultCode.OperationSucceeded;
    try {
        final Command cmd = new Command("asec", "unmount", id);
        if (force) {
            cmd.appendArg("force");
        }
        // Send the command to vold
        mConnector.execute(cmd);
    } catch (NativeDaemonConnectorException e) {
        int code = e.getCode();
        if (code == VoldResponseCode.OpFailedStorageBusy) {
            rc = StorageResultCode.OperationFailedStorageBusy;
        } else {
            rc = StorageResultCode.OperationFailedInternalError;
        }
    }
    … …
    return rc;
}
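Clearly, Google's developers realized that a GC is needed here to release AssetManager instances.
It is easy to find where an AssetManager is created: wherever new AssetManager appears. But how does the Java side get to finalize(), and from there to the destructor of the C++ AssetManager?
I kept trying different angles of analysis.
In the earlier analysis we saw that the C++ AssetManager code has no leak: as long as the AssetManager object is released properly, the open base.apk is released too. So could a leak of the Java AssetManager be keeping the C++ AssetManager alive? And where are Java AssetManager objects created and released?
Staring at the vast code base, one cannot help sighing: it is somewhere on this mountain, but the clouds are too deep to tell where.
Let's start from how the AssetManager object is created.
Tapping an app loads its base.apk, so we add a stack dump to the addAssetPath method of the Java AssetManager class.
The resulting call stack is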
android.content.res.AssetManager.addAssetPath(AssetManager.java:653)
android.app.ResourcesManager.getTopLevelResources(ResourcesManager.java:221)
android.app.ActivityThread.getTopLevelResources(ActivityThread.java:1854)
android.app.LoadedApk.getResources(LoadedApk.java:558)
android.app.ContextImpl.
android.app.ContextImpl.createPackageContextAsUser(ContextImpl.java:1733)
android.app.ContextImpl.createPackageContextAsUser(ContextImpl.java:1718)
com.android.server.AttributeCache.get(AttributeCache.java:114)
com.android.server.am.ActivityRecord.
com.android.server.am.ActivityStackSupervisor.startActivityMayWait(ActivityStackSupervisor.java:1153)
com.android.server.am.ActivityManagerService.startActivityAsUser(ActivityManagerService.java:4271)
com.android.server.am.ActivityManagerService.startActivity(ActivityManagerService.java:4258)
android.app.ActivityManagerNative.onTransact(ActivityManagerNative.java:168)
com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2703)
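Following this stack trace, we trace the creation of the AssetManager object back to its origin.
The first time the app is opened, ActivityManagerService's startActivityAsUser method is called,
which later reaches com.android.server.am.ActivityRecord.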
462 ActivityRecord(ActivityManagerService _service, ProcessRecord _caller,
463         int _launchedFromUid, String _launchedFromPackage, Intent _intent, String _resolvedType,
464         ActivityInfo aInfo, Configuration _configuration,
465         ActivityRecord _resultTo, String _resultWho, int _reqCode,
466         boolean _componentSpecified, boolean _rootVoiceInteraction,
467         ActivityStackSupervisor supervisor,
468         ActivityContainer container, Bundle options) {
… …
564     AttributeCache.Entry ent = AttributeCache.instance().get(packageName,
565             realTheme, com.android.internal.R.styleable.Window, userId);
The call continues to
com.android.server.AttributeCache.get(AttributeCache.java:114)
In AttributeCache.java:
98  public Entry get(String packageName, int resId, int[] styleable, int userId) {
… …
112     Context context;
113     try {
114         context = mContext.createPackageContextAsUser(packageName, 0,
115                 new UserHandle(userId));
116         if (context == null) {
117             return null;
118         }
119     } catch (PackageManager.NameNotFoundException e) {
120         return null;
121     }
122     pkg = new Package(context);
123     mPackages.put(packageName, pkg);
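Here mPackages is a map, defined as
private final WeakHashMap
new WeakHashMap
On Android 7 the definition was changed to
private final ArrayMap
At line 114 the variable context is assigned the return value of
mContext.createPackageContextAsUser(packageName, 0, new UserHandle(userId));
Line 122 then holds a reference to context. Here is the definition of AttributeCache's inner class Package and its constructor: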
public final static class Package {
    public final Context context;
    private final SparseArray
    = new SparseArray
    public Package(Context c) {
        context = c;
    }
}
Line 114 calls
android.app.ContextImpl.createPackageContextAsUser(ContextImpl.java:1718)
which in turn calls
android.app.ContextImpl.createPackageContextAsUser(ContextImpl.java:1733)
In ContextImpl.java:
1733 ContextImpl c = new ContextImpl(this, mMainThread, pi, mActivityToken,
1734         user, restricted, mDisplay, null, Display.INVALID_DISPLAY, themePackageName);
Line 1733 in turn calls
android.app.ContextImpl.
In ContextImpl.java:
1884 Resources resources = packageInfo.getResources(mainThread);
… …
1906 mResources = resources;
The mResources field of the newly created ContextImpl object holds a reference to resources.
Line 1884 goes on to call
android.app.LoadedApk.getResources(LoadedApk.java:558)
android.app.ActivityThread.getTopLevelResources(ActivityThread.java:1854)
android.app.ResourcesManager.getTopLevelResources(ResourcesManager.java:221)
In ResourcesManager.java:
181 Resources getTopLevelResources(String resDir, String[] splitResDirs,
182         String[] overlayDirs, String[] libDirs, int displayId, String packageName,
183         Configuration overrideConfiguration, CompatibilityInfo compatInfo, Context context) {
… …
211     AssetManager assets = new AssetManager();
… …
218     // already.
219     if (resDir != null) {
220
221         if (assets.addAssetPath(resDir) == 0) {
222             return null;
223         }
224     }
… …
301     r = new Resources(assets, dm, config, compatInfo);
… …
// The Resources resources variable at line 1884 of ContextImpl.java refers to this return value r
319     return r;
At last we have found where the AssetManager is created. Creating it also creates the C++ AssetManager object, and the addAssetPath call then loads base.apk.
Resources.java contains
public Resources(AssetManager assets, DisplayMetrics metrics, Configuration config, CompatibilityInfo compatInfo) {
    mAssets = assets;
    mMetrics.setToDefaults();
    if (compatInfo != null) {
        mCompatibilityInfo = compatInfo;
    }
    updateConfiguration(config, metrics);
    assets.recreateStringBlocks();//Modified for ThemeManager
}
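The mAssets field in Resources holds a reference to the newly created AssetManager object.
The creation of the AssetManager can be traced to this point, and the chain of references to the AssetManager object leads back to mPackages in the AttributeCache class.
A bold idea
AttributeCache is a singleton, and PMS and AMS both live in the system_server process. Let's try calling AttributeCache's removePackage(String packageName) method to release the AssetManager object.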
public final class AttributeCache {
    private static AttributeCache sInstance = null;
    private final Context mContext;
    private final WeakHashMap
    new WeakHashMap
    // init is called from startBootstrapServices() in SystemServer.java to perform the initialization
    public static void init(Context context) {
        if (sInstance == null) {
            // Create the singleton
            sInstance = new AttributeCache(context);
        }
    }
    // Get the singleton
    public static AttributeCache instance() {
        return sInstance;
    }
    public AttributeCache(Context context) {
        mContext = context;
    }
    public void removePackage(String packageName) {
        synchronized (this) {
            mPackages.remove(packageName);
        }
    }
    …
}
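The reference chain traced above (mPackages, Package, Context, Resources, AssetManager) and the effect of removePackage can be modeled with a few toy classes. This is only a sketch under assumptions: every class here is a hypothetical stand-in rather than the framework class, and a plain HashMap is used on the assumption that the cache entry stays strongly reachable, which is what the observed behavior suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: Assets, Res, Ctx, Pkg and Cache are hypothetical
// stand-ins for AssetManager, Resources, ContextImpl,
// AttributeCache.Package and AttributeCache.
final class ReferenceChainSketch {
    static final class Assets {
        final String apkPath;                 // the APK this object keeps open
        Assets(String apkPath) { this.apkPath = apkPath; }
    }
    static final class Res {
        final Assets assets;                  // like Resources.mAssets
        Res(Assets assets) { this.assets = assets; }
    }
    static final class Ctx {
        final Res resources;                  // like ContextImpl.mResources
        Ctx(Res resources) { this.resources = resources; }
    }
    static final class Pkg {
        final Ctx context;                    // like Package.context
        Pkg(Ctx context) { this.context = context; }
    }

    static final class Cache {
        // Assumption: entries stay strongly reachable until removed.
        private final Map<String, Pkg> packages = new HashMap<>();

        synchronized Pkg get(String packageName, String apkPath) {
            Pkg pkg = packages.get(packageName);
            if (pkg == null) {
                pkg = new Pkg(new Ctx(new Res(new Assets(apkPath))));
                packages.put(packageName, pkg);
            }
            return pkg;
        }

        synchronized void removePackage(String packageName) {
            packages.remove(packageName);
        }

        // Follows the chain from the cache down to the Assets object, if any.
        synchronized Assets reachableAssets(String packageName) {
            Pkg pkg = packages.get(packageName);
            return pkg == null ? null : pkg.context.resources.assets;
        }
    }

    public static void main(String[] args) {
        Cache cache = new Cache();
        cache.get("com.UCMobile", "/mnt/asec/com.UCMobile-2/base.apk");
        System.out.println(cache.reachableAssets("com.UCMobile") != null);
        cache.removePackage("com.UCMobile");
        System.out.println(cache.reachableAssets("com.UCMobile") != null);
    }
}
```

As long as the cache entry exists, a garbage collection cannot reclaim the Assets object, so forcing a GC before the unmount does not help; once the entry is removed and nothing else refers to the chain, the object becomes collectable.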
PackageManagerService.java
The release is done in doPostDeleteLI of the AsecInstallArgs class:
boolean doPostDeleteLI(boolean delete) {
    if (DEBUG_SD_INSTALL) Slog.i(TAG, "doPostDeleteLI() del=" + delete);
    final List
    // Extract the package name before the '-' separator from the cid, e.g. com.UCMobile from com.UCMobile-2
    int pos = cid.lastIndexOf("-");
    if (pos > 0)
    {
        String packageName = cid.substring(0, pos); // cid is com.UCMobile-2
        AttributeCache ac = AttributeCache.instance();
        if (ac != null) { // packageName is com.UCMobile
            ac.removePackage(packageName);
        }
    }
    if (mounted) {
        // Unmount first
        if (PackageHelper.unMountSdDir(cid)) {
            mounted = false;
        }
    }
    return !mounted;
}
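In addition, AttributeCache has to be imported:
import com.android.server.AttributeCache;
After adding this and retesting, the reboot was gone: a single release call solved the problem. Looking back over the analysis, it was truly a case of "just when the mountains and rivers seem to block every path, willows and blossoms reveal another village."
AsecInstallArgs already has a ready-made method that extracts the package name (com.UCMobile) from the cid (com.UCMobile-2):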
String getPackageName() {
    return getAsecPackageName(cid);
}
static String getAsecPackageName(String packageCid) {
    int idx = packageCid.lastIndexOf("-");
    if (idx == -1) {
        return packageCid;
    }
    return packageCid.substring(0, idx);
}
We can use this method directly to make the change simpler and cleaner:
AttributeCache ac = AttributeCache.instance();
if (ac != null) {
    ac.removePackage(getPackageName());
}
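The cid-to-package-name rule used above is easy to check in isolation. The following standalone sketch repeats the same lastIndexOf logic outside the framework; the class and method names (AsecCid, packageNameFromCid) are made up for this example.

```java
// Standalone sketch of the cid-to-package-name rule; names are hypothetical.
final class AsecCid {
    // Strips the trailing "-<suffix>" from a container id such as com.UCMobile-2.
    static String packageNameFromCid(String cid) {
        int idx = cid.lastIndexOf('-');
        if (idx == -1) {
            return cid;              // no separator: the cid is returned unchanged
        }
        return cid.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(packageNameFromCid("com.UCMobile-2"));
    }
}
```

Because the rule splits on the last '-', only the final suffix is removed even if the remaining part itself contains a '-'.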