Android Memory Management: the lowmemorykiller Mechanism

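  A while ago I ran into an APK that started several processes one after another. By the time it reached the last process, the first process had been killed, even though that first process contained a broadcast receiver. It was killed because lowmemorykiller had been triggered, so the APK's final result came out wrong. After several attempts I managed to work around the problem, and I'm writing it down here.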

Writing this took real effort; please credit the source when reposting: http://blog.csdn.net/jscese/article/details/47317765 This article is from jscese's blog!

Concepts

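For Android's user-space application processes, switching between activity lifecycle states (for example, launching a new APK, or pressing back repeatedly to exit one) triggers the reclaim mechanism in AMS. Looking at the 5.1 code, besides AMS's own default reclaim mechanism, AMS also maintains an oom adj value for each process, which the Linux-level lowmemorykiller uses as its reference.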

AMS reclaim mechanism

The entry point is trimApplications(), which is called from many places. Operations such as stop and unregisterReceiver all trigger a reclaim pass:

    final void trimApplications() {
        synchronized (this) {
            int i;

            // First remove any unused application processes whose package
            // has been removed.
            for (i=mRemovedProcesses.size()-1; i>=0; i--) {
                final ProcessRecord app = mRemovedProcesses.get(i);
                if (app.activities.size() == 0
                        && app.curReceiver == null && app.services.size() == 0) {
                    Slog.i(
                        TAG, "Exiting empty application process "
                        + app.processName + " ("
                        + (app.thread != null ? app.thread.asBinder() : null)
                        + ")\n");
                    if (app.pid > 0 && app.pid != MY_PID) {
                        app.kill("empty", false);
                    } else {
                        try {
                            app.thread.scheduleExit();
                        } catch (Exception e) {
                            // Ignore exceptions.
                        }
                    }
                    cleanUpApplicationRecordLocked(app, false, true, -1);
                    mRemovedProcesses.remove(i);

                    if (app.persistent) {
                        addAppLocked(app.info, false, null /* ABI override */);
                    }
                }
            }

            // Now update the oom adj for all processes.

            updateOomAdjLocked();

        }
    }

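The mRemovedProcesses list mainly holds three kinds of processes: ones that crashed, ones that did not respond within 5 seconds and that the user chose to force-close, and ones that an app developer asked to kill by calling killBackgroundProcesses. As the code above shows, any of these that host no activities, receivers or services are killed (app.kill()) and removed from the list. Persistent ones are then restarted with addAppLocked().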

updateOomAdjLocked() then recomputes and updates the oom adj of every process.

Continuing:

final void updateOomAdjLocked() {
    ...
    // First update the OOM adjustment for each of the
    // application processes based on their current state.
    int curCachedAdj = ProcessList.CACHED_APP_MIN_ADJ;
    int nextCachedAdj = curCachedAdj+1;
    int curEmptyAdj = ProcessList.CACHED_APP_MIN_ADJ;
    int nextEmptyAdj = curEmptyAdj+2;
    for (int i=N-1; i>=0; i--) {
        ProcessRecord app = mLruProcesses.get(i);
        if (!app.killedByAm && app.thread != null) {
            app.procStateChanged = false;
            computeOomAdjLocked(app, ProcessList.UNKNOWN_ADJ, TOP_APP, true, now);
            ...
            applyOomAdjLocked(app, TOP_APP, true, now);
            ...
        }
        ...
    }
    ...
}

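computeOomAdjLocked() computes the current process's adj. I won't paste the detailed rules. Broadly, it derives an adj value from a whole set of factors, such as the top activity, receivers, services and the process state.

adj values range from -17 to 15. The smaller the value, the higher the priority, meaning the process is less likely to be killed. As the comments indicate, the kill policy itself is implemented in the kernel's lowmemorykiller driver.
The definitions are in \frameworks\base\services\core\java\com\android\server\am\ProcessList.java: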

    // This is a process only hosting activities that are not visible,
    // so it can be killed without any disruption.
    static final int CACHED_APP_MAX_ADJ = 15;
    static final int CACHED_APP_MIN_ADJ = 9;

    // The B list of SERVICE_ADJ -- these are the old and decrepit
    // services that aren't as shiny and interesting as the ones in the A list.
    static final int SERVICE_B_ADJ = 8;

    // This is the process of the previous application that the user was in.
    // This process is kept above other things, because it is very common to
    // switch back to the previous app.  This is important both for recent
    // task switch (toggling between the two top recent apps) as well as normal
    // UI flow such as clicking on a URI in the e-mail app to view in the browser,
    // and then pressing back to return to e-mail.
    static final int PREVIOUS_APP_ADJ = 7;

    // This is a process holding the home application -- we want to try
    // avoiding killing it, even if it would normally be in the background,
    // because the user interacts with it so much.
    static final int HOME_APP_ADJ = 6;

    // This is a process holding an application service -- killing it will not
    // have much of an impact as far as the user is concerned.
    static final int SERVICE_ADJ = 5;

    // This is a process with a heavy-weight application.  It is in the
    // background, but we want to try to avoid killing it.  Value set in
    // system/rootdir/init.rc on startup.
    static final int HEAVY_WEIGHT_APP_ADJ = 4;

    // This is a process currently hosting a backup operation.  Killing it
    // is not entirely fatal but is generally a bad idea.
    static final int BACKUP_APP_ADJ = 3;

    // This is a process only hosting components that are perceptible to the
    // user, and we really want to avoid killing them, but they are not
    // immediately visible. An example is background music playback.
    static final int PERCEPTIBLE_APP_ADJ = 2;

    // This is a process only hosting activities that are visible to the
    // user, so we'd prefer they don't disappear.
    static final int VISIBLE_APP_ADJ = 1;

    // This is the process running the current foreground app.  We'd really
    // rather not kill it!
    static final int FOREGROUND_APP_ADJ = 0;

    // This is a process that the system or a persistent process has bound to,
    // and indicated it is important.
    static final int PERSISTENT_SERVICE_ADJ = -11;

    // This is a system persistent process, such as telephony.  Definitely
    // don't want to kill it, but doing so is not completely fatal.
    static final int PERSISTENT_PROC_ADJ = -12;

    // The system process runs at the default adjustment.
    static final int SYSTEM_ADJ = -16;

    // Special code for native processes that are not being managed by the system (so
    // don't have an oom adj assigned by the system).
    static final int NATIVE_ADJ = -17;

applyOomAdjLocked() takes the adj computed above, applies some adjustments, and sets the result on the corresponding process.

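Here I only follow the calls related to lowmemorykiller and skip AMS's own kill policy based on PROCESS_STATE. Roughly, that policy only kills a process that meets all of the following conditions:
It is not persistent, i.e. it is not a system process.
It is empty, i.e. it hosts no activity. Killing a process that still has activities could close a program the user is working in, or make the application slower to restore, which hurts the user experience.
It has no broadcast receiver. A running broadcast receiver is usually waiting for some event, and the user does not expect such a program to be force-closed by the system.
Its service count is 0. A process with services may well be serving one or more other programs, such as GPS location. Killing it would stop those services from working.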

The core of applyOomAdjLocked():

    private final boolean applyOomAdjLocked(ProcessRecord app,
            ProcessRecord TOP_APP, boolean doingAll, long now) {
...

        if (app.curAdj != app.setAdj) {
            ProcessList.setOomAdj(app.pid, app.info.uid, app.curAdj);
            if (DEBUG_SWITCH || DEBUG_OOM_ADJ) Slog.v(
                TAG, "Set " + app.pid + " " + app.processName +
                " adj " + app.curAdj + ": " + app.adjType);
            app.setAdj = app.curAdj;
        }
...

}

It sets the oom adj. Continuing in ProcessList.java:

    /**
     * Set the out-of-memory badness adjustment for a process.
     *
     * @param pid The process identifier to set.
     * @param uid The uid of the app
     * @param amt Adjustment value -- lmkd allows -16 to +15.
     *
     * {@hide}
     */
    public static final void setOomAdj(int pid, int uid, int amt) {
        if (amt == UNKNOWN_ADJ)
            return;

        long start = SystemClock.elapsedRealtime();
        ByteBuffer buf = ByteBuffer.allocate(4 * 4);
        buf.putInt(LMK_PROCPRIO);
        buf.putInt(pid);
        buf.putInt(uid);
        buf.putInt(amt);
        writeLmkd(buf);
        long now = SystemClock.elapsedRealtime();
        if ((now-start) > 250) {
            Slog.w("ActivityManager", "SLOW OOM ADJ: " + (now-start) + "ms for pid " + pid
                    + " = " + amt);
        }
    }
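The packet that setOomAdj() sends is just four 32-bit ints. The C sketch below builds the same 16 bytes. It assumes big-endian byte order, which is java.nio.ByteBuffer's default and is why lmkd decodes each field with ntohl(). put_int_be and build_procprio are illustrative names. Only the LMK_PROCPRIO value comes from ProcessList.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl */

#define LMK_PROCPRIO 1
#define PROCPRIO_PACKET_LEN (4 * 4)

/* Write one 32-bit int at buf+off in big-endian (network) order,
 * the way ByteBuffer.putInt() does by default; returns the new offset. */
static size_t put_int_be(uint8_t *buf, size_t off, int32_t v)
{
    uint32_t be = htonl((uint32_t)v);
    memcpy(buf + off, &be, sizeof(be));
    return off + sizeof(be);
}

/* Build the 16-byte LMK_PROCPRIO packet: cmd, pid, uid, adj. */
static size_t build_procprio(uint8_t out[PROCPRIO_PACKET_LEN],
                             int32_t pid, int32_t uid, int32_t adj)
{
    size_t off = 0;
    off = put_int_be(out, off, LMK_PROCPRIO);
    off = put_int_be(out, off, pid);
    off = put_int_be(out, off, uid);
    off = put_int_be(out, off, adj);
    assert(off == PROCPRIO_PACKET_LEN);
    return off;
}
```

For example, pid 1234 (0x04D2) ends up as the bytes 00 00 04 D2 in positions 4 to 7, and a negative adj such as -12 is sent as FF FF FF F4.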

writeLmkd() writes the buffer holding the process's pid, uid and adj, prefixed with a command code:

    private static void writeLmkd(ByteBuffer buf) {

        for (int i = 0; i < 3; i++) {
            if (sLmkdSocket == null) {
                if (openLmkdSocket() == false) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException ie) {
                    }
                    continue;
                }
            }

            try {
                sLmkdOutputStream.write(buf.array(), 0, buf.position());
                return;
            } catch (IOException ex) {
                Slog.w(ActivityManagerService.TAG,
                       "Error writing to lowmemorykiller socket");

                try {
                    sLmkdSocket.close();
                } catch (IOException ex2) {
                }

                sLmkdSocket = null;
            }
        }
    }
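The retry policy of writeLmkd() can be sketched in C using a hypothetical transport struct. Its hooks stand in for openLmkdSocket(), the output stream's write(), the socket's close() and Thread.sleep(1000). The Java method returns void; this sketch returns whether the write succeeded so the behaviour can be checked. The fake transport at the end exists only for trying the loop out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport; the hooks stand in for openLmkdSocket(),
 * sLmkdOutputStream.write(), sLmkdSocket.close() and Thread.sleep(). */
struct lmkd_transport {
    bool connected;
    bool (*do_open)(struct lmkd_transport *t);
    bool (*do_write)(struct lmkd_transport *t, const void *buf, size_t len);
    void (*do_close)(struct lmkd_transport *t);
    void (*sleep_ms)(int ms);
};

/* Same shape as writeLmkd(): at most 3 attempts. A failed connect sleeps
 * 1 s and moves to the next attempt; a failed write closes the socket so
 * the next attempt reconnects. After 3 failures the packet is dropped. */
static bool write_lmkd(struct lmkd_transport *t, const void *buf, size_t len)
{
    assert(t != NULL);
    for (int i = 0; i < 3; i++) {
        if (!t->connected && !t->do_open(t)) {
            t->sleep_ms(1000);
            continue;
        }
        if (t->do_write(t, buf, len))
            return true;
        t->do_close(t);
        t->connected = false;
    }
    return false;
}

/* Fake transport: connecting fails fake_open_failures times, after which
 * every non-empty write succeeds. Counters record what happened. */
static int fake_open_failures, fake_open_calls, fake_sleeps;

static bool fake_open(struct lmkd_transport *t)
{
    fake_open_calls++;
    if (fake_open_failures > 0) {
        fake_open_failures--;
        return false;
    }
    t->connected = true;
    return true;
}

static bool fake_write(struct lmkd_transport *t, const void *buf, size_t len)
{
    (void)t;
    (void)buf;
    return len > 0;
}

static void fake_close(struct lmkd_transport *t) { (void)t; }
static void fake_sleep(int ms) { (void)ms; fake_sleeps++; }
```

With two failed connects, the third attempt succeeds after two 1-second sleeps. With three failed connects, the data is silently lost, just as in the Java code, which only logs write errors.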

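As you can see, it makes up to 3 attempts. If the socket is not open, it tries to open it, sleeping 1 second on failure. It then writes the data. On a write error it closes the socket so that the next attempt reconnects. openLmkdSocket() is implemented as follows. It uses Android's LocalSocket mechanism to communicate over the socket named lmkd:

    sLmkdSocket = new LocalSocket(LocalSocket.SOCKET_SEQPACKET);
    sLmkdSocket.connect(
        new LocalSocketAddress("lmkd",
                LocalSocketAddress.Namespace.RESERVED));
    sLmkdOutputStream = sLmkdSocket.getOutputStream();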

This is the client side requesting the connection. The server side lives in \system\core\lmkd\lmkd.c. Here is how that service is started (init.rc service definition):

service lmkd /system/bin/lmkd
    class core
    critical
    socket lmkd seqpacket 0660 system system

This is a standard client/server model. The command codes that AMS puts at the start of each packet, and the buffer formats, are:

    // LMK_TARGET <minfree> <minkillprio> ... (up to 6 pairs)
    // LMK_PROCPRIO <pid> <prio> 
    // LMK_PROCREMOVE <pid>
    static final byte LMK_TARGET = 0;
    static final byte LMK_PROCPRIO = 1;
    static final byte LMK_PROCREMOVE = 2;

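LMK_PROCPRIO sets the adj of a single process. As setOomAdj() above shows, the packet actually carries pid, uid and adj.
LMK_TARGET sets the whole LMK policy, i.e. the adj/minfree pairs.
LMK_PROCREMOVE is sent for a process that is being removed and carries only its pid.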

lmkd service

The service's code is short, under 1K lines, and the mechanism is fairly simple:

int main(int argc __unused, char **argv __unused) {
    struct sched_param param = {
            .sched_priority = 1,
    };

    mlockall(MCL_FUTURE);
    sched_setscheduler(0, SCHED_FIFO, &param);
    if (!init())
        mainloop();

    ALOGI("exiting");
    return 0;
}

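main() locks all of lmkd's memory with mlockall(MCL_FUTURE) and switches it to the SCHED_FIFO real-time scheduling class. This keeps the daemon from being paged out or starved at exactly the moment memory is tight. init() sets up the socket and registers the event handler functions. mainloop() then loops, waiting for events. The socket setup in init() looks like this:

    ctrl_lfd = android_get_control_socket("lmkd");
    if (ctrl_lfd < 0) {
        ALOGE("get lmkd control socket failed");
        return -1;
    }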

Skipping the details, let's go straight to how it handles the command buffer sent from AMS:

static void ctrl_command_handler(void) {
    int ibuf[CTRL_PACKET_MAX / sizeof(int)];
    int len;
    int cmd = -1;
    int nargs;
    int targets;

    len = ctrl_data_read((char *)ibuf, CTRL_PACKET_MAX);
    if (len <= 0)
        return;

    nargs = len / sizeof(int) - 1;
    if (nargs < 0)
        goto wronglen;

    cmd = ntohl(ibuf[0]);

    switch(cmd) {
    case LMK_TARGET:
        targets = nargs / 2;
        if (nargs & 0x1 || targets > (int)ARRAY_SIZE(lowmem_adj))
            goto wronglen;
        cmd_target(targets, &ibuf[1]);
        break;
    case LMK_PROCPRIO:
        if (nargs != 3)
            goto wronglen;
        cmd_procprio(ntohl(ibuf[1]), ntohl(ibuf[2]), ntohl(ibuf[3]));
        break;
    case LMK_PROCREMOVE:
        if (nargs != 1)
            goto wronglen;
        cmd_procremove(ntohl(ibuf[1]));
        break;
    default:
        ALOGE("Received unknown command code %d", cmd);
        return;
    }

    return;

wronglen:
    ALOGE("Wrong control socket read length cmd=%d len=%d", cmd, len);
}

First, LMK_PROCPRIO, which sets the adj of one process. The core of it is:

    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", pid);
    snprintf(val, sizeof(val), "%d", lowmem_oom_adj_to_oom_score_adj(oomadj));
    writefilestring(path, val);

It writes directly to the process's own file node. The adj value passed down is first converted and then written into oom_score_adj:

static int lowmem_oom_adj_to_oom_score_adj(int oom_adj)
{
    if (oom_adj == OOM_ADJUST_MAX) //15
        return OOM_SCORE_ADJ_MAX; //1000
    else
        return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;//-17
}
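Here is a quick check of the conversion arithmetic as a C sketch. It reuses the formula above and formats the path/value pair the way cmd_procprio() does, but does not write anything. The helper names are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* Same arithmetic as lowmem_oom_adj_to_oom_score_adj(): scale the old
 * -17..15 oom_adj range to oom_score_adj, mapping 15 to exactly 1000.
 * C integer division truncates toward zero. */
static int oom_adj_to_score_adj(int oom_adj)
{
    if (oom_adj == OOM_ADJUST_MAX)
        return OOM_SCORE_ADJ_MAX;
    return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}

/* Build the path and value strings as cmd_procprio() does before it
 * calls writefilestring(); nothing is written here. */
static void format_procprio(int pid, int oom_adj,
                            char *path, size_t path_len,
                            char *val, size_t val_len)
{
    assert(path_len > 0 && val_len > 0);
    snprintf(path, path_len, "/proc/%d/oom_score_adj", pid);
    snprintf(val, val_len, "%d", oom_adj_to_score_adj(oom_adj));
}
```

So the cached range 9..15 maps to 529..1000. The special case for 15 exists because the formula alone would give 15 * 1000 / 17 = 882 after integer truncation, not 1000.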

The kernel's process management takes it from there. The converted oom_score_adj is stored in the process structure defined in kernel\include\linux\sched.h:
task_struct->signal (struct signal_struct) -> short oom_score_adj; /* OOM kill score adjustment */

lowmemorykiller driver

Kernel support, in autoconf.h:

#define CONFIG_ANDROID_LOW_MEMORY_KILLER 1

You can check this yourself with make menuconfig in the kernel tree.

Driver source: \kernel\drivers\staging\android\lowmemorykiller.c

static int __init lowmem_init(void)
{
    register_shrinker(&lowmem_shrinker);
    return 0;
}

#ifdef CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES
__module_param_call(MODULE_PARAM_PREFIX, adj,
            &lowmem_adj_array_ops,
            .arr = &__param_arr_adj,
            S_IRUGO | S_IWUSR, -1);
__MODULE_PARM_TYPE(adj, "array of short");
#else
module_param_array_named(adj, lowmem_adj, short, &lowmem_adj_size,
             S_IRUGO | S_IWUSR);
#endif
module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size,
             S_IRUGO | S_IWUSR);

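It registers a shrinker. I hadn't come across this mechanism before. Roughly, once a shrinker callback is registered, the kernel calls it when free memory pages run short.
With CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES, the adj parameter gets its own set ops (lowmem_adj_array_ops). These let the policy threshold values be changed dynamically, with values written in the old oom_adj scale converted automatically.
It also registers the parameter file nodes adj and minfree, which map to the lowmem_adj and lowmem_minfree arrays respectively.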

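The adj array and the threshold array correspond one-to-one. The minfree values are counts of pages, so the MB comments assume 4KB pages. With these defaults, once both free memory and file cache drop below 64MB, processes with adj >= 12 become candidates, and the lowest-priority one is killed first. For example, with three processes at 12, 12 and 14, the 14 is killed first. If memory is still below 64MB after that, the one of the two 12s that uses more memory is killed next. This is implemented in lowmem_shrink below.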

static short lowmem_adj[6] = {
    0,
    1,
    6,
    12,
};
static int lowmem_adj_size = 4;
static int lowmem_minfree[6] = {
    3 * 512,    /* 6MB */
    2 * 1024,   /* 8MB */
    4 * 1024,   /* 16MB */
    16 * 1024,  /* 64MB */
};
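The threshold lookup at the start of lowmem_shrink() can be tried on its own. This C sketch copies the default tables above. The minfree entries are page counts, so the MB figures assume 4KB pages. min_score_adj_for is a hypothetical name.

```c
#include <assert.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Default tables from the driver; minfree entries are pages. */
static const short lowmem_adj[] = { 0, 1, 6, 12 };
static const int lowmem_minfree[] = { 3 * 512, 2 * 1024, 4 * 1024, 16 * 1024 };
#define NTARGETS (sizeof(lowmem_adj) / sizeof(lowmem_adj[0]))

_Static_assert(sizeof(lowmem_adj) / sizeof(lowmem_adj[0]) ==
               sizeof(lowmem_minfree) / sizeof(lowmem_minfree[0]),
               "adj and minfree tables must pair up");

/* Same loop as lowmem_shrink(): the first (smallest) threshold that both
 * free pages and file-cache pages fall below decides the lowest
 * oom_score_adj that may be killed. OOM_SCORE_ADJ_MAX + 1 means no kill. */
static short min_score_adj_for(int other_free, int other_file)
{
    for (unsigned i = 0; i < NTARGETS; i++) {
        int minfree = lowmem_minfree[i];
        if (other_free < minfree && other_file < minfree)
            return lowmem_adj[i];
    }
    return OOM_SCORE_ADJ_MAX + 1;
}
```

For example, 10000 free pages (about 39MB) with 10000 file pages selects 12, which is the "below 64MB, kill adj >= 12" case. If free pages are above every threshold, nothing is selected even when the file cache is small, because both conditions must hold.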

Now the handler, lowmem_shrink():

static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
{
    struct task_struct *tsk;
    struct task_struct *selected = NULL;
    int rem = 0;
    int tasksize;
    int i;
    short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
    int minfree = 0;
    int selected_tasksize = 0;
    short selected_oom_score_adj;
    int array_size = ARRAY_SIZE(lowmem_adj);
    int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
    int other_file = global_page_state(NR_FILE_PAGES) -
                        global_page_state(NR_SHMEM);

    if (lowmem_adj_size < array_size)
        array_size = lowmem_adj_size;
    if (lowmem_minfree_size < array_size)
        array_size = lowmem_minfree_size;
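    for (i = 0; i < array_size; i++) {
        // Walk the threshold array from smallest to largest and take the adj
        // of the first threshold that current free memory falls below.
        minfree = lowmem_minfree[i];
        if (other_free < minfree && other_file < minfree) {
            min_score_adj = lowmem_adj[i];
            break;
        }
    }
    // min_score_adj is now the lowest oom_score_adj that will be killed
    // under the current memory state.
    ...
    for_each_process(tsk) {
        ...
        // p is derived from tsk in the elided code, where tasks whose
        // oom_score_adj is below min_score_adj are skipped.
        tasksize = get_mm_rss(p->mm);
        ...
        if (selected) {
            if (oom_score_adj < selected_oom_score_adj)
                continue;
            if (oom_score_adj == selected_oom_score_adj &&
                tasksize <= selected_tasksize)
                continue;
        }
        // One pass over all processes picks the one with the highest
        // oom_score_adj, breaking ties by the largest tasksize.
        selected = p;
        selected_tasksize = tasksize;
        selected_oom_score_adj = oom_score_adj;
    }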
    if (selected) {
        lowmem_print(1, "Killing '%s' (%d), adj %hd,\n" \
                " to free %ldkB on behalf of '%s' (%d) because\n" \
                " cache %ldkB is below limit %ldkB for oom_score_adj %hd\n" \
                " Free memory is %ldkB above reserved\n",
                 selected->comm, selected->pid,
                 selected_oom_score_adj,
                 selected_tasksize * (long)(PAGE_SIZE / 1024),
                 current->comm, current->pid,
                 other_file * (long)(PAGE_SIZE / 1024),
                 minfree * (long)(PAGE_SIZE / 1024),
                 min_score_adj,
                 other_free * (long)(PAGE_SIZE / 1024));

        trace_lowmem_kill(selected,  other_file, minfree, min_score_adj, other_free);

        lowmem_deathpending_timeout = jiffies + HZ;
        send_sig(SIGKILL, selected, 0);  // send SIGKILL to the selected process
        set_tsk_thread_flag(selected, TIF_MEMDIE);
        rem -= selected_tasksize;
    }
    ...
    return rem;
}
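The victim selection loop can also be exercised outside the kernel. This C sketch uses a hypothetical task array in place of for_each_process() and replays the 12/12/14 example from earlier: the 14 goes first, then the larger of the two 12s. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task snapshot; rss is in pages, like get_mm_rss(). */
struct task {
    const char *comm;
    short oom_score_adj;
    long rss;
    int alive;
};

/* Mirrors the selection in lowmem_shrink(): skip dead tasks, tasks below
 * min_score_adj and tasks with no RSS, then keep the highest
 * oom_score_adj, breaking ties in favour of the larger RSS. */
static struct task *select_victim(struct task *tasks, size_t n,
                                  short min_score_adj)
{
    assert(tasks != NULL || n == 0);
    struct task *selected = NULL;
    for (size_t i = 0; i < n; i++) {
        struct task *p = &tasks[i];
        if (!p->alive || p->oom_score_adj < min_score_adj || p->rss <= 0)
            continue;
        if (selected) {
            if (p->oom_score_adj < selected->oom_score_adj)
                continue;
            if (p->oom_score_adj == selected->oom_score_adj &&
                p->rss <= selected->rss)
                continue;
        }
        selected = p;
    }
    return selected;
}
```

Marking each chosen task as dead and calling again models consecutive shrink passes while memory stays below the threshold.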

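That is the normal flow by which the lowmemorykiller reclaim policy is triggered, step by step. The driver is quite flexible: it also lets the policy thresholds be changed at runtime, with the application layer writing new values through the file ops.
The file-node code was shown above; the nodes are adj and minfree.
Their actual paths are /sys/module/lowmemorykiller/parameters/adj and /sys/module/lowmemorykiller/parameters/minfree.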

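Likewise, values set from the application side are converted with lowmem_oom_adj_to_oom_score_adj before being stored in the policy array lowmem_adj. Note that the arrays defined in the driver have size 6.

The application-side interface is again in ProcessList.java, mentioned earlier. Here are the relevant definitions and function. For the full flow you still need to read the code: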

    // These are the various interesting memory levels that we will give to
    // the OOM killer. Note that the OOM killer only supports 6 slots, so we
    // can't give it a different value for every possible kind of process.
    private final int[] mOomAdj = new int[] {
            FOREGROUND_APP_ADJ, VISIBLE_APP_ADJ, PERCEPTIBLE_APP_ADJ,
            BACKUP_APP_ADJ, CACHED_APP_MIN_ADJ, CACHED_APP_MAX_ADJ
    };
    // These are the low-end OOM level limits. This is appropriate for an
    // HVGA or smaller phone with less than 512MB. Values are in KB.
    private final int[] mOomMinFreeLow = new int[] {
            12288, 18432, 24576, /*8192, 12288, 16384,*/
            36864, 43008+20000, 49152+20000 /*36864, 43008, 49152*/
    };
    // These are the high-end OOM level limits. This is appropriate for a
    // 1280x800 or larger screen with around 1GB RAM. Values are in KB.
    private final int[] mOomMinFreeHigh = new int[] {
            73728, 92160, 110592, /*32768, 61440, 73728,*/
            129024, 147456+20000, 184320+20000 /*129024, 147456, 184320*/
    };

    // The actual OOM killer memory levels we are using.
    private final int[] mOomMinFree = new int[mOomAdj.length];

These are the arrays that define the policy adj values and the corresponding minfree thresholds (in KB).

    private void updateOomLevels(int displayWidth, int displayHeight, boolean write) {
        ...
        if (write) {
            ByteBuffer buf = ByteBuffer.allocate(4 * (2*mOomAdj.length + 1));
            buf.putInt(LMK_TARGET);
            for (int i=0; i<mOomAdj.length; i++) {
                buf.putInt((mOomMinFree[i]*1024)/PAGE_SIZE);
                buf.putInt(mOomAdj[i]);
            }

            writeLmkd(buf);
            SystemProperties.set("sys.sysctl.extra_free_kbytes", Integer.toString(reserve));
        }
        // As with setOomAdj above, the actual setting is done by the lmkd service.
    }

In the lmkd service, the function handling the LMK_TARGET command is:

static void cmd_target(int ntargets, int *params) {
...
        for (i = 0; i < lowmem_targets_size; i++) {
            char val[40];

            if (i) {
                strlcat(minfreestr, ",", sizeof(minfreestr));
                strlcat(killpriostr, ",", sizeof(killpriostr));
            }

            snprintf(val, sizeof(val), "%d", lowmem_minfree[i]);
            strlcat(minfreestr, val, sizeof(minfreestr));
            snprintf(val, sizeof(val), "%d", lowmem_adj[i]);
            strlcat(killpriostr, val, sizeof(killpriostr));
        }

        writefilestring(INKERNEL_MINFREE_PATH, minfreestr);
        writefilestring(INKERNEL_ADJ_PATH, killpriostr);
        // Take the two policy arrays passed down from ProcessList and write them to the file nodes
        //#define INKERNEL_MINFREE_PATH "/sys/module/lowmemorykiller/parameters/minfree"
        //#define INKERNEL_ADJ_PATH "/sys/module/lowmemorykiller/parameters/adj"
        // These are the corresponding file nodes in the driver
}
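To put the LMK_TARGET path together: updateOomLevels() converts each minfree entry from KB to pages, and cmd_target() joins the arrays into the comma-separated strings written to the sysfs nodes. This C sketch assumes 4KB pages. For illustration only, it uses the mOomMinFreeHigh values shown earlier as if they were mOomMinFree; the real value is derived in the elided part of updateOomLevels(). build_target_strings is a hypothetical name, and snprintf replaces strlcat, which not every libc provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS 6
#define PAGE_SIZE_BYTES 4096   /* assumption: 4 KB pages */

/* Build the strings cmd_target() writes to .../parameters/minfree and
 * .../parameters/adj. minfree_kb is in KB and is converted to pages
 * exactly as updateOomLevels() does: (kb * 1024) / PAGE_SIZE. */
static void build_target_strings(const int minfree_kb[NSLOTS],
                                 const int adj[NSLOTS],
                                 char *minfreestr, size_t mlen,
                                 char *killpriostr, size_t klen)
{
    size_t mo = 0, ko = 0;
    for (int i = 0; i < NSLOTS; i++) {
        int pages = (minfree_kb[i] * 1024) / PAGE_SIZE_BYTES;
        int n = snprintf(minfreestr + mo, mlen - mo, "%s%d", i ? "," : "", pages);
        assert(n >= 0 && (size_t)n < mlen - mo);   /* no truncation */
        mo += (size_t)n;
        n = snprintf(killpriostr + ko, klen - ko, "%s%d", i ? "," : "", adj[i]);
        assert(n >= 0 && (size_t)n < klen - ko);
        ko += (size_t)n;
    }
}
```

The adj string is still on the old oom_adj scale. As described above, the driver converts it with lowmem_oom_adj_to_oom_score_adj() when it is written.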

That covers the overall flow and how the mechanism runs. For the implementation details you'll need to read the code closely.

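The problem I hit was that memory ran short and lowmemorykiller was triggered. The APK ran multiple processes, and by then its first process had already gone into the background, so its computed adj gave it low priority. When the new process then requested memory, reclaim kicked in and free memory reached a lowmemorykiller threshold. The APK's still-running background process was killed outright, regardless of whether it had a receiver or a service. If the APK depends on that process, it then fails.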

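Seen from another angle, a multi-process APK like this has a compatibility flaw on low-memory devices, and its design isn't sound.
As a platform requirement, the only option was to work around it on the system side by forcibly raising the priority (lowering the adj) of that specific process so it could run normally. But this is only a workaround and doesn't follow the rules.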

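In recent work I've run into several APKs with compatibility problems like this. The 32bit/64bit APK native library issue from my previous post was the same kind of thing, and these were all APKs on Google Play. The bar for APKs keeps getting lower. Everyone chases flashy and cool and ignores the underlying runtime mechanisms, so stability and runtime compatibility naturally fall short. Of course, you could also count this as a gap in Google's platform framework compatibility. What a mess~ ~ Maybe it's one of the reasons Android can't catch up with Apple~ Just my personal rant, take it with a smile.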
