前段时间碰到一个apk多个process依次开跑,跑到最后一个process的时候,第一个process给kill掉了,虽然第一个process中含有broadcast receive,被kill掉的原因是由于触发到了lowmemorykiller,这样一来apk最后的结果就异常了~ 尝试再三 规避掉了这个问题,记录一下~
撰写不易,转载需注明出处:http://blog.csdn.net/jscese/article/details/47317765本文来自 【jscese】的博客!
andorid用户层的application process ,在各种activity生命周期切换时,会触发AMS中的回收机制,比如启动新的apk,一直back 退出一个apk,在5.1上的代码来看,除了android AMS中默认的回收机制外,还会去维护一个oom adj 变量,作为linux层 lowmemorykiller的参考依据。
入口为trimApplications() 可以发现很多地方有调用,stop unregisterreceiver之类的操作时都会去触发回收:
final void trimApplications() {
synchronized (this) {
int i;
// First remove any unused application processes whose package
// has been removed.
for (i=mRemovedProcesses.size()-1; i>=0; i--) {
final ProcessRecord app = mRemovedProcesses.get(i);
if (app.activities.size() == 0
&& app.curReceiver == null && app.services.size() == 0) {
Slog.i(
TAG, "Exiting empty application process "
+ app.processName + " ("
+ (app.thread != null ? app.thread.asBinder() : null)
+ ")\n");
if (app.pid > 0 && app.pid != MY_PID) {
app.kill("empty", false);
} else {
try {
app.thread.scheduleExit();
} catch (Exception e) {
// Ignore exceptions.
}
}
cleanUpApplicationRecordLocked(app, false, true, -1);
mRemovedProcesses.remove(i);
if (app.persistent) {
addAppLocked(app.info, false, null /* ABI override */);
}
}
}
// Now update the oom adj for all processes.
updateOomAdjLocked();
}
}
mRemovedProcesses 列表中主要包含了 crash 的进程、5 秒内没有响应并被用户选在强制关闭的进程、以及应用开发这调用 killBackgroundProcess 想要杀死的进程。调用 Process.killProcess 将所有此类进程全部杀死。
updateOomAdjLocked 计算更新所有process的 oomadj
继续看:
final void updateOomAdjLocked() {
...
// First update the OOM adjustment for each of the
// application processes based on their current state.
int curCachedAdj = ProcessList.CACHED_APP_MIN_ADJ;
int nextCachedAdj = curCachedAdj+1;
int curEmptyAdj = ProcessList.CACHED_APP_MIN_ADJ;
int nextEmptyAdj = curEmptyAdj+2;
for (int i=N-1; i>=0; i--) {
ProcessRecord app = mLruProcesses.get(i);
if (!app.killedByAm && app.thread != null) {
app.procStateChanged = false;
computeOomAdjLocked(app, ProcessList.UNKNOWN_ADJ, TOP_APP, true, now);
...
applyOomAdjLocked(app, TOP_APP, true, now);
...
}
...
}
...
}
computeOomAdjLocked 计算当前process的adj,详细规则就不贴了,大体会根据 top activity .receiver.service process state 一大堆相关的因素去得出一个adj值。
adj 值 (-17 ~15) 越小优先级越高,从注释能看出来,策略在kernel中的lowmemorykiller驱动中实现
这里看下定义 在\frameworks\base\services\core\java\com\android\server\am\ProcessList.java :
// This is a process only hosting activities that are not visible,
// so it can be killed without any disruption.
static final int CACHED_APP_MAX_ADJ = 15;
static final int CACHED_APP_MIN_ADJ = 9;
// The B list of SERVICE_ADJ -- these are the old and decrepit
// services that aren't as shiny and interesting as the ones in the A list.
static final int SERVICE_B_ADJ = 8;
// This is the process of the previous application that the user was in.
// This process is kept above other things, because it is very common to
// switch back to the previous app. This is important both for recent
// task switch (toggling between the two top recent apps) as well as normal
// UI flow such as clicking on a URI in the e-mail app to view in the browser,
// and then pressing back to return to e-mail.
static final int PREVIOUS_APP_ADJ = 7;
// This is a process holding the home application -- we want to try
// avoiding killing it, even if it would normally be in the background,
// because the user interacts with it so much.
static final int HOME_APP_ADJ = 6;
// This is a process holding an application service -- killing it will not
// have much of an impact as far as the user is concerned.
static final int SERVICE_ADJ = 5;
// This is a process with a heavy-weight application. It is in the
// background, but we want to try to avoid killing it. Value set in
// system/rootdir/init.rc on startup.
static final int HEAVY_WEIGHT_APP_ADJ = 4;
// This is a process currently hosting a backup operation. Killing it
// is not entirely fatal but is generally a bad idea.
static final int BACKUP_APP_ADJ = 3;
// This is a process only hosting components that are perceptible to the
// user, and we really want to avoid killing them, but they are not
// immediately visible. An example is background music playback.
static final int PERCEPTIBLE_APP_ADJ = 2;
// This is a process only hosting activities that are visible to the
// user, so we'd prefer they don't disappear.
static final int VISIBLE_APP_ADJ = 1;
// This is the process running the current foreground app. We'd really
// rather not kill it!
static final int FOREGROUND_APP_ADJ = 0;
// This is a process that the system or a persistent process has bound to,
// and indicated it is important.
static final int PERSISTENT_SERVICE_ADJ = -11;
// This is a system persistent process, such as telephony. Definitely
// don't want to kill it, but doing so is not completely fatal.
static final int PERSISTENT_PROC_ADJ = -12;
// The system process runs at the default adjustment.
static final int SYSTEM_ADJ = -16;
// Special code for native processes that are not being managed by the system (so
// don't have an oom adj assigned by the system).
static final int NATIVE_ADJ = -17;
applyOomAdjLocked 将上面计算好的adj值经过一定的修整,设置到对应的process。
只关注跟lowmemorykiller 相关的调用接口, 省略了AMS中自己根据PROCESS_STATE的 kill策略, 大体有如下:
必须是非 persistent 进程,即非系统进程;
必须是空进程,即进程中没有任何 activity 存在。如果杀死存在 Activity 的进程,有可能关闭用户正在使用的程序,或者使应用程序恢复的时延变大,从而影响用户体验;
必须无 broadcast receiver。运行 broadcast receiver 一般都在等待一个事件的发生,用户并不希望此类程序被系统强制关闭;
进程中 service 的数量必须为 0。存在 service 的进程很有可能在为一个或者多个程序提供某种服务,如 GPS 定位服务。杀死此类进程将使其他进程无法正常服务。
看下 applyOomAdjLocked中的核心:
private final boolean applyOomAdjLocked(ProcessRecord app,
ProcessRecord TOP_APP, boolean doingAll, long now) {
...
if (app.curAdj != app.setAdj) {
ProcessList.setOomAdj(app.pid, app.info.uid, app.curAdj);
if (DEBUG_SWITCH || DEBUG_OOM_ADJ) Slog.v(
TAG, "Set " + app.pid + " " + app.processName +
" adj " + app.curAdj + ": " + app.adjType);
app.setAdj = app.curAdj;
}
...
}
设置oomadj ,继续看 ProcessList.java
/** * Set the out-of-memory badness adjustment for a process. * * @param pid The process identifier to set. * @param uid The uid of the app * @param amt Adjustment value -- lmkd allows -16 to +15. * * {@hide} */
public static final void setOomAdj(int pid, int uid, int amt) {
if (amt == UNKNOWN_ADJ)
return;
long start = SystemClock.elapsedRealtime();
ByteBuffer buf = ByteBuffer.allocate(4 * 4);
buf.putInt(LMK_PROCPRIO);
buf.putInt(pid);
buf.putInt(uid);
buf.putInt(amt);
writeLmkd(buf);
long now = SystemClock.elapsedRealtime();
if ((now-start) > 250) {
Slog.w("ActivityManager", "SLOW OOM ADJ: " + (now-start) + "ms for pid " + pid
+ " = " + amt);
}
}
writeLmkd 写process pid uid 以及adj 的buf, 以一个定义command打头
private static void writeLmkd(ByteBuffer buf) {
for (int i = 0; i < 3; i++) {
if (sLmkdSocket == null) {
if (openLmkdSocket() == false) {
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
}
continue;
}
}
try {
sLmkdOutputStream.write(buf.array(), 0, buf.position());
return;
} catch (IOException ex) {
Slog.w(ActivityManagerService.TAG,
"Error writing to lowmemorykiller socket");
try {
sLmkdSocket.close();
} catch (IOException ex2) {
}
sLmkdSocket = null;
}
}
}
}
可以看到try了3次 ,去打开对应的socket 然后写数据,openLmkdSocket 实现如下,android的 LocalSocket 机制,通过lmkd 这个socket通信
sLmkdSocket = new LocalSocket(LocalSocket.SOCKET_SEQPACKET);
sLmkdSocket.connect(
new LocalSocketAddress("lmkd",
LocalSocketAddress.Namespace.RESERVED));
sLmkdOutputStream = sLmkdSocket.getOutputStream();
这是作为client去请求connect ,而service端的处理在 \system\core\lmkd\lmkd.c , 可以看下这个service的启动:
service lmkd /system/bin/lmkd
class core
critical
socket lmkd seqpacket 0660 system system
标准的 C/S模式,AMS中发起的通信command 开头种类,以及buf格式 如下:
// LMK_TARGET <minfree> <minkillprio> ... (up to 6 pairs)
// LMK_PROCPRIO <pid> <prio>
// LMK_PROCREMOVE <pid>
static final byte LMK_TARGET = 0;
static final byte LMK_PROCPRIO = 1;
static final byte LMK_PROCREMOVE = 2;
设置单一process 用的是 LMK_PROCPRIO
设置整个LMK adj minfree策略的是 LMK_TARGET
LMK_PROCREMOVE kill process用
这个service 的代码比较短 1K不到,机制也比较简单:
int main(int argc __unused, char **argv __unused) {
struct sched_param param = {
.sched_priority = 1,
};
mlockall(MCL_FUTURE);
sched_setscheduler(0, SCHED_FIFO, ¶m);
if (!init())
mainloop();
ALOGI("exiting");
return 0;
}
init 那些socket相关,注册event handle func ,最后跳到mainloop去 循环poll等待
ctrl_lfd = android_get_control_socket("lmkd");
if (ctrl_lfd < 0) {
ALOGE("get lmkd control socket failed");
return -1;
}
细节不做关注,直接看收到刚刚 AMS那边发过来的command buf 的处理:
static void ctrl_command_handler(void) {
int ibuf[CTRL_PACKET_MAX / sizeof(int)];
int len;
int cmd = -1;
int nargs;
int targets;
len = ctrl_data_read((char *)ibuf, CTRL_PACKET_MAX);
if (len <= 0)
return;
nargs = len / sizeof(int) - 1;
if (nargs < 0)
goto wronglen;
cmd = ntohl(ibuf[0]);
switch(cmd) {
case LMK_TARGET:
targets = nargs / 2;
if (nargs & 0x1 || targets > (int)ARRAY_SIZE(lowmem_adj))
goto wronglen;
cmd_target(targets, &ibuf[1]);
break;
case LMK_PROCPRIO:
if (nargs != 3)
goto wronglen;
cmd_procprio(ntohl(ibuf[1]), ntohl(ibuf[2]), ntohl(ibuf[3]));
break;
case LMK_PROCREMOVE:
if (nargs != 1)
goto wronglen;
cmd_procremove(ntohl(ibuf[1]));
break;
default:
ALOGE("Received unknown command code %d", cmd);
return;
}
return;
wronglen:
ALOGE("Wrong control socket read length cmd=%d len=%d", cmd, len);
}
先看上面对应的LMK_PROCPRIO 设置某个process 的adj ,核心如下:
snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", pid);
snprintf(val, sizeof(val), "%d", lowmem_oom_adj_to_oom_score_adj(oomadj));
writefilestring(path, val);
直接写process对应的文件节点,传下来的adj值 做了一个转换再写进 oom_score_adj:
static int lowmem_oom_adj_to_oom_score_adj(int oom_adj)
{
if (oom_adj == OOM_ADJUST_MAX) //15
return OOM_SCORE_ADJ_MAX; //1000
else
return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;//-17
}
kernel驱动中process manager 会有对应的处理机制 ,转换后个oom_score_adj保存在进程结构体 kernel\include\linux\sched.h
task_struct->signal_struct->short oom_score_adj; /* OOM kill score adjustment */
kernel中的支持:
autoconf.h中
#define CONFIG_ANDROID_LOW_MEMORY_KILLER 1
可自行到kernel中 make menuconfig 查看
驱动目录:\kernel\drivers\staging\android\lowmemorykiller.c
static int __init lowmem_init(void)
{
register_shrinker(&lowmem_shrinker);
return 0;
}
#ifdef CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES
__module_param_call(MODULE_PARAM_PREFIX, adj,
&lowmem_adj_array_ops,
.arr = &__param_arr_adj,
S_IRUGO | S_IWUSR, -1);
__MODULE_PARM_TYPE(adj, "array of short");
#else
module_param_array_named(adj, lowmem_adj, short, &lowmem_adj_size,
S_IRUGO | S_IWUSR);
#endif
module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size,
S_IRUGO | S_IWUSR);
注册了一个shrinker ,这个机制之前没接触过,大体的意义就是向系统注册了这个shrinker 回调函数之后,当系统空闲内存页面不足时会调用。
CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES 支持动态改 策略阀门 值。
另外注册了文件ops 以及节点 adj .minfree 分别与lowmem_adj . lowmen_minfree 数组成对应关系
策略数组以及阀值,两个数组之间也是一一对应关系,当内存小于64M时 就去准备 kill adj >=12 的process,取最低优先级 先kill, 比如此时有3个进程分别为 12 12 14 ,那么首先kill掉的是 14 ,kill之后还是少于64M 那么两个12 adj之间,先杀占用高的,这个函数实现在下面的 lowmem_shrink 中。
static short lowmem_adj[6] = {
0,
1,
6,
12,
};
static int lowmem_adj_size = 4;
static int lowmem_minfree[6] = {
3 * 512, /* 6MB */
2 * 1024, /* 8MB */
4 * 1024, /* 16MB */
16 * 1024, /* 64MB */
};
先看下处理函数 lowmem_shrinker:
static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
{
struct task_struct *tsk;
struct task_struct *selected = NULL;
int rem = 0;
int tasksize;
int i;
short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
int minfree = 0;
int selected_tasksize = 0;
short selected_oom_score_adj;
int array_size = ARRAY_SIZE(lowmem_adj);
int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
int other_file = global_page_state(NR_FILE_PAGES) -
global_page_state(NR_SHMEM);
if (lowmem_adj_size < array_size)
array_size = lowmem_adj_size;
if (lowmem_minfree_size < array_size)
array_size = lowmem_minfree_size;
for (i = 0; i < array_size; i++) {
//依次遍历策略阀值数组,从小到大,根据当前memory free情况,取触发adj值
minfree = lowmem_minfree[i];
if (other_free < minfree && other_file < minfree) {
min_score_adj = lowmem_adj[i];
break;
}
}
//这里得到的min_score_adj 就是此时内存状态下 将会kill掉的最小score_adj
...
for_each_process(tsk) {
...
tasksize = get_mm_rss(p->mm);
...
if (selected) {
if (oom_score_adj < selected_oom_score_adj)
continue;
if (oom_score_adj == selected_oom_score_adj &&
tasksize <= selected_tasksize)
continue;
}//可以看到 遍历一圈process 只为找到一个 oom_score_adj tasksize 最大的process
selected = p;
selected_tasksize = tasksize;
selected_oom_score_adj = oom_score_adj;
}
if (selected) {
lowmem_print(1, "Killing '%s' (%d), adj %hd,\n" \
" to free %ldkB on behalf of '%s' (%d) because\n" \
" cache %ldkB is below limit %ldkB for oom_score_adj %hd\n" \
" Free memory is %ldkB above reserved\n",
selected->comm, selected->pid,
selected_oom_score_adj,
selected_tasksize * (long)(PAGE_SIZE / 1024),
current->comm, current->pid,
other_file * (long)(PAGE_SIZE / 1024),
minfree * (long)(PAGE_SIZE / 1024),
min_score_adj,
other_free * (long)(PAGE_SIZE / 1024));
trace_lowmem_kill(selected, other_file, minfree, min_score_adj, other_free);
lowmem_deathpending_timeout = jiffies + HZ;
send_sig(SIGKILL, selected, 0); //发送kill signal 去kill selected的process
set_tsk_thread_flag(selected, TIF_MEMDIE);
rem -= selected_tasksize;
}
}
以上就是正常的依次 触发lowmemorykill 回收策略流程,驱动比较灵活,还提供了策略阀门值动态修改的机制,通过file ops 让application层去写入修改。
上面有贴出文件节点代码 分别对应为 adj minfree
实际路径为:
同样application 那边设置下来的话 也是需要通过 lowmem_oom_adj_to_oom_score_adj 去转换之后 赋值到 策略数组 lowmem_adj 中,需要注意的是 driver中定义的数组size 为 6 。
application层的接口还是放在上面说到过的 ProcessList.java 中,贴出相关的定义和函数吧,具体流程还是得看代码:
// These are the various interesting memory levels that we will give to
// the OOM killer. Note that the OOM killer only supports 6 slots, so we
// can't give it a different value for every possible kind of process.
private final int[] mOomAdj = new int[] {
FOREGROUND_APP_ADJ, VISIBLE_APP_ADJ, PERCEPTIBLE_APP_ADJ,
BACKUP_APP_ADJ, CACHED_APP_MIN_ADJ, CACHED_APP_MAX_ADJ
};
// These are the low-end OOM level limits. This is appropriate for an
// HVGA or smaller phone with less than 512MB. Values are in KB.
private final int[] mOomMinFreeLow = new int[] {
12288, 18432, 24576, /*8192, 12288, 16384,*/
36864, 43008+20000, 49152+20000 /*36864, 43008, 49152*/
};
// These are the high-end OOM level limits. This is appropriate for a
// 1280x800 or larger screen with around 1GB RAM. Values are in KB.
private final int[] mOomMinFreeHigh = new int[] {
73728, 92160, 110592, /*32768, 61440, 73728,*/
129024, 147456+20000, 184320+20000 /*129024, 147456, 184320*/
};
// The actual OOM killer memory levels we are using.
private final int[] mOomMinFree = new int[mOomAdj.length];
上面为定义的 策略adj 以及minfree 相关的数组资源
private void updateOomLevels(int displayWidth, int displayHeight, boolean write) {
...
if (write) {
ByteBuffer buf = ByteBuffer.allocate(4 * (2*mOomAdj.length + 1));
buf.putInt(LMK_TARGET);
for (int i=0; i<mOomAdj.length; i++) {
buf.putInt((mOomMinFree[i]*1024)/PAGE_SIZE);
buf.putInt(mOomAdj[i]);
}
writeLmkd(buf);
SystemProperties.set("sys.sysctl.extra_free_kbytes", Integer.toString(reserve));
}// 同上面提到的setoomadj 一样,最终的设置由lmkd service 去实现
}
lmkd service 中对应的command LMK_TARGET 函数为:
static void cmd_target(int ntargets, int *params) {
...
for (i = 0; i < lowmem_targets_size; i++) {
char val[40];
if (i) {
strlcat(minfreestr, ",", sizeof(minfreestr));
strlcat(killpriostr, ",", sizeof(killpriostr));
}
snprintf(val, sizeof(val), "%d", lowmem_minfree[i]);
strlcat(minfreestr, val, sizeof(minfreestr));
snprintf(val, sizeof(val), "%d", lowmem_adj[i]);
strlcat(killpriostr, val, sizeof(killpriostr));
}
writefilestring(INKERNEL_MINFREE_PATH, minfreestr);
writefilestring(INKERNEL_ADJ_PATH, killpriostr);
// 获取了上面processlist 里面传下来的两个 策略数组 然后write到文件节点中
//#define INKERNEL_MINFREE_PATH "/sys/module/lowmemorykiller/parameters/minfree"
//#define INKERNEL_ADJ_PATH "/sys/module/lowmemorykiller/parameters/adj"
//驱动中对应的 文件节点
}
大体脉络理清,运行机制也清晰了,具体的实现细节就需要细读code了
而我碰到的问题,就是memory 不足 触发了lowmemorykiller,但是运行的apk 多进程时 ,最初的进程已经进入后台,compute adj 优先级较低,此时新开进程申请memory,触发回收,memory free 达到lowmemorykill的阀值,直接就杀掉了正在运行apk的后台进程, 不管有没有receiver ,service 而apk如有依赖性 就会触发异常了。
从另一层面上来说,multi process 的apk 对lowmemory device的兼容性存在缺陷,设计并不合理。
作为平台需求,只能从系统的方面来规避了,强提特定的process 的adj 以达到正常运行,但这只是规避~不符合规范
从最近的一些工作来看,碰到了好几次这种存在兼容性问题的apk ,上一篇 32bit/64bit 平台apk动态库的问题 也是这种现象,这还都是 google play 上的apk ,apk的门槛越来越低,都是一味的去追求屌绚酷 ,对深层次的运行机制不去关注,稳定以及运行兼容性自然无法达标,当然这也可以算是google在平台框架兼容性上存在漏洞,贵圈太乱~ ~ 这也是android 无法追上apple的一个原因之一吧~ 个人吐槽~请笑看