SystemServer Startup and Restart Flow

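Preface

These notes record how SystemServer starts and how it is restarted after a crash.

Flow

SystemServer is forked from the Zygote process. The fork is triggered in the main method of ZygoteInit.java: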

            if (startSystemServer) {
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }

Next, let's look at the forkSystemServer method:

        /* Hardcoded command line to start the system server */
        String args[] = { // 1
                "--setuid=1000",
                "--setgid=1000",
                "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
                        + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
                "--capabilities=" + capabilities + "," + capabilities,
                "--nice-name=system_server",
                "--runtime-args",
                "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
                "com.android.server.SystemServer",
        };
        ZygoteArguments parsedArgs = null;

        int pid;

        try {
            ...
            /* Request to fork the system server process */
            pid = Zygote.forkSystemServer( // 2
                    parsedArgs.mUid, parsedArgs.mGid,
                    parsedArgs.mGids,
                    parsedArgs.mRuntimeFlags,
                    null,
                    parsedArgs.mPermittedCapabilities,
                    parsedArgs.mEffectiveCapabilities);
        } catch (IllegalArgumentException ex) {
            throw new RuntimeException(ex);
        }

        /* For child process */
        if (pid == 0) { 
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket(); // 3
            return handleSystemServerProcess(parsedArgs); // 4
        }

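At marker 1, the arguments set the uid, gid, and supplementary groups of the system_server process (the IDs are defined in Process.java) and its process name, "system_server". Marker 2 then calls the seven-parameter Zygote.forkSystemServer to fork a process. fork returns once in each process: the child starts as a copy of the parent and continues from the same point, so the same line yields two different values of pid. If pid is the process ID of the newly forked child, the code is running in the Zygote process; if pid == 0, it is running in the system_server process. The child inherits everything Zygote had, including its server socket, but system_server has no use for that socket, so it closes it at marker 3.

Next we look at the logic behind markers 2 and 4, starting with the source for marker 2: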
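
The two return values of fork can be seen in isolation with a minimal POSIX sketch. This is not AOSP code: the helper name ForkChildAndCollectExitCode and the idea of passing an exit code are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of AOSP): forks once and returns the child's
// exit code as observed by the parent, or -1 on any failure.
int ForkChildAndCollectExitCode(int child_exit_code) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;  // fork failed, no child was created
  }
  if (pid == 0) {
    // Child branch: fork() returned 0 in this process. A real child would
    // first close inherited resources it does not need (Zygote's server
    // socket in the article), then do its own work.
    _exit(child_exit_code);
  }
  // Parent branch: fork() returned the child's pid in this process.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling ForkChildAndCollectExitCode(7) in the parent yields 7: the child took the pid == 0 branch and exited, and the parent, holding the child's pid, collected its status.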

    public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
        ...
        int pid = nativeForkSystemServer(
                uid, gid, gids, runtimeFlags, rlimits,
                permittedCapabilities, effectiveCapabilities);
        ...
    }

forkSystemServer in turn calls nativeForkSystemServer, which, as the name suggests, is a native method:

    private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);

Now for the corresponding JNI function. nativeForkSystemServer is declared in Zygote.java, which lives under
frameworks/base/core/java/com/android/internal/os/, and the matching native code is under frameworks/base/core/jni/. The JNI file for Zygote.java is named after the package plus the class, com_android_internal_os_Zygote.cpp, and by the same convention the JNI function for nativeForkSystemServer is named package + class + method:

static jint com_android_internal_os_Zygote_nativeForkSystemServer(
        JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits, jlong permitted_capabilities,
        jlong effective_capabilities) {
    ...
    pid_t pid = ForkCommon(env, true, fds_to_close, fds_to_ignore);
    ...
}
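
The naming convention described above is mechanical enough to write down. The helper below is a hypothetical illustration (AospJniFunctionName does not exist in AOSP); it only shows how the package, class, and method names are joined, assuming dots are simply replaced by underscores.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: builds the conventional name of the static JNI
// function for a Java native method. Dots in the fully qualified class name
// become underscores, then the method name is appended.
std::string AospJniFunctionName(std::string qualified_class,
                                const std::string& method) {
  std::replace(qualified_class.begin(), qualified_class.end(), '.', '_');
  return qualified_class + "_" + method;
}
```

For com.android.internal.os.Zygote and nativeForkSystemServer this gives com_android_internal_os_Zygote_nativeForkSystemServer, the function name to search for in com_android_internal_os_Zygote.cpp.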

This JNI function in turn calls ForkCommon, which is implemented as follows:

// Utility routine to fork a process from the zygote.
static pid_t ForkCommon(JNIEnv* env, bool is_system_server,
                        const std::vector<int>& fds_to_close,
                        const std::vector<int>& fds_to_ignore) {
  SetSignalHandlers();
  ...
  pid_t pid = fork();
  ...
}

ForkCommon calls two important functions. One is fork, which creates a new child process; the process forked here is system_server. The other is SetSignalHandlers:

static void SetSignalHandlers() {
  struct sigaction sig_chld = {};
  sig_chld.sa_handler = SigChldHandler;

  if (sigaction(SIGCHLD, &sig_chld, nullptr) < 0) {
    ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
  }

  struct sigaction sig_hup = {};
  sig_hup.sa_handler = SIG_IGN;
  if (sigaction(SIGHUP, &sig_hup, nullptr) < 0) {
    ALOGW("Error setting SIGHUP handler: %s", strerror(errno));
  }
}

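SetSignalHandlers registers SigChldHandler as the handler for SIGCHLD. SIGCHLD is a Linux signal sent to the parent when a child process terminates or stops. By default the signal is ignored, so a parent that wants to learn about these state changes in its children must catch it; the handler typically calls one of the wait functions to obtain the child's pid and termination status. Its implementation: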

// This signal handler is for zygote mode, since the zygote must reap its children
static void SigChldHandler(int /*signal_number*/) {
  ...
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
    ...
    // If the just-crashed process is the system_server, bring down zygote
    // so that it is restarted by init and system server will be restarted
    // from there.
    if (pid == gSystemServerPid) {
      async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG,
                            "Exit zygote because system server (pid %d) has terminated", pid);
      kill(getpid(), SIGKILL);
    }

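As the description of SIGCHLD suggests, SigChldHandler calls waitpid in a loop, which keeps running until no exited child is left to reap, to collect each child's pid and termination status. If the pid belongs to system_server, Zygote obtains its own pid with getpid and kills itself. The point is that the two live and die together: when Zygote dies, its parent, init, notices and restarts Zygote, and the new Zygote starts system_server again.

That covers forkSystemServer at marker 2. Next is handleSystemServerProcess at marker 4, implemented as follows: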
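
The reaping pattern can be tried outside Android with a small POSIX sketch. Everything named here (g_critical_pid, g_critical_child_died, ReapChildren, DetectsCriticalChildExit) is hypothetical, and where the real handler kills Zygote, this sketch only sets a flag so that it stays safe to run.

```cpp
#include <cassert>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical globals for this sketch (AOSP tracks gSystemServerPid).
static volatile sig_atomic_t g_critical_pid = -1;
static volatile sig_atomic_t g_critical_child_died = 0;

// SIGCHLD handler: reaps every child that has already exited, never blocks.
static void ReapChildren(int /*signal_number*/) {
  int saved_errno = errno;
  int status = 0;
  pid_t pid;
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
    if (pid == g_critical_pid) {
      // The real zygote calls kill(getpid(), SIGKILL) here; the sketch
      // only records that the critical child is gone.
      g_critical_child_died = 1;
    }
  }
  errno = saved_errno;
}

// Forks a child that exits at once and reports whether the handler saw it.
bool DetectsCriticalChildExit() {
  g_critical_child_died = 0;
  struct sigaction action = {};
  struct sigaction old_action = {};
  action.sa_handler = ReapChildren;
  sigemptyset(&action.sa_mask);
  if (sigaction(SIGCHLD, &action, &old_action) < 0) return false;

  // Block SIGCHLD until the pid is recorded, so the handler cannot run
  // before it knows which child is the critical one.
  sigset_t block_mask, old_mask;
  sigemptyset(&block_mask);
  sigaddset(&block_mask, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block_mask, &old_mask);

  pid_t pid = fork();
  if (pid == 0) _exit(1);  // child: simulate an immediate crash
  if (pid > 0) {
    g_critical_pid = pid;
    sigset_t wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    // Atomically unblock SIGCHLD and sleep until a signal is handled.
    while (!g_critical_child_died) sigsuspend(&wait_mask);
  }

  sigprocmask(SIG_SETMASK, &old_mask, nullptr);
  sigaction(SIGCHLD, &old_action, nullptr);
  return pid > 0 && g_critical_child_died == 1;
}
```

Blocking SIGCHLD around the fork is a choice made for this sketch: it closes the window in which the child could exit before its pid has been stored.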

    /**
     * Finish remaining work for the newly forked system server process.
     */
    private static Runnable handleSystemServerProcess(ZygoteArguments parsedArgs) {
        ...
            /*
             * Pass the remaining arguments to SystemServer.
             */
            return ZygoteInit.zygoteInit(parsedArgs.mTargetSdkVersion,
                    parsedArgs.mRemainingArgs, cl);
    }

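handleSystemServerProcess in turn calls ZygoteInit.zygoteInit. As the comments say, handleSystemServerProcess finishes the remaining work after the fork, and the call to zygoteInit passes the remaining arguments (mRemainingArgs of ZygoteArguments) on to SystemServer. The implementation of zygoteInit: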

    public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
        if (RuntimeInit.DEBUG) {
            Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
        }

        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
        RuntimeInit.redirectLogStreams();

        RuntimeInit.commonInit();
        ZygoteInit.nativeZygoteInit(); // 5
        return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader); // 6
    }

zygoteInit takes three parameters: targetSdkVersion, the remaining arguments, and a ClassLoader (see the previous method for where it comes from). It ends by calling markers 5 and 6. Marker 5 is a native method:

    private static final native void nativeZygoteInit();

It is implemented in AndroidRuntime.cpp:

static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{
    gCurRuntime->onZygoteInit();
}

This calls onZygoteInit through gCurRuntime, a pointer to the AndroidRuntime instance:

    virtual void onZygoteInit()
    {
        sp<ProcessState> proc = ProcessState::self();
        ALOGV("App process: starting thread pool.\n");
        proc->startThreadPool();
    }

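This method is defined in app_main.cpp. proc is a ProcessState object, and startThreadPool starts the thread pool used mainly for Binder inter-process communication. We will not go into detail here.

The main focus is the logic at marker 6: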

    protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
       ...
        // Remaining arguments are passed to the start class's static main
        return findStaticMain(args.startClass, args.startArgs, classLoader);
    }

applicationInit in turn calls findStaticMain. As the comment says, the remaining arguments are handed to the static main method of the start class, which here is SystemServer.

    protected static Runnable findStaticMain(String className, String[] argv,
            ClassLoader classLoader) {
        Class<?> cl;

        try {
            cl = Class.forName(className, true, classLoader);
        } catch (ClassNotFoundException ex) {
            throw new RuntimeException(
                    "Missing class when invoking static main " + className,
                    ex);
        }

        Method m;
        try {
            m = cl.getMethod("main", new Class[] { String[].class }); // 7
        } catch (NoSuchMethodException ex) {
            throw new RuntimeException(
                    "Missing static main on " + className, ex);
        } catch (SecurityException ex) {
            throw new RuntimeException(
                    "Problem getting static main on " + className, ex);
        }

        int modifiers = m.getModifiers();
        if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) { // 8
            throw new RuntimeException(
                    "Main method is not public and static on " + className);
        }

        /*
         * This throw gets caught in ZygoteInit.main(), which responds
         * by invoking the exception's run() method. This arrangement
         * clears up all the stack frames that were required in setting
         * up the process.
         */
        return new MethodAndArgsCaller(m, argv); // 9
    }

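Marker 7 uses reflection to get the main method of the SystemServer class, marker 8 checks that main is public and static, and marker 9 returns a MethodAndArgsCaller, a Runnable that stores the method and its arguments and exposes a run method: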

    static class MethodAndArgsCaller implements Runnable {
        /** method to call */
        private final Method mMethod;

        /** argument array */
        private final String[] mArgs;

        public MethodAndArgsCaller(Method method, String[] args) {
            mMethod = method;
            mArgs = args;
        }

        public void run() {
            try {
                mMethod.invoke(null, new Object[] { mArgs }); // 10
            } catch (IllegalAccessException ex) {
                throw new RuntimeException(ex);
            } catch (InvocationTargetException ex) {
                Throwable cause = ex.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                } else if (cause instanceof Error) {
                    throw (Error) cause;
                }
                throw new RuntimeException(ex);
            }
        }
    }

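This object is received in the main method of ZygoteInit.java, which calls its run method; marker 10 is therefore where SystemServer's main method is actually invoked. Why return an object instead of calling main directly? As the comment says, to clear the stack: by the time main runs, the frames used to set up the process have been unwound, so they do not appear beneath main, even though a great deal of work was done before it was called. Here is the main method of ZygoteInit.java again: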

            if (startSystemServer) {
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }
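
The effect of returning a callable instead of calling in place can be shown with a toy C++ sketch. It does not use reflection or any Android API; g_depth, Frame, and all function names are hypothetical stand-ins for the setup frames between ZygoteInit.main and SystemServer.main.

```cpp
#include <cassert>
#include <functional>

// Number of "setup" frames currently on the stack (sketch-only counter).
static int g_depth = 0;

// RAII marker: one instance per setup function that is still active.
struct Frame {
  Frame() { ++g_depth; }
  ~Frame() { --g_depth; }
};

// Stand-in for the entry point; reports how many setup frames are below it.
int EntryPoint() { return g_depth; }

// Variant A: the innermost setup step calls the entry point directly.
int InnerCallInPlace() { Frame frame; return EntryPoint(); }
int CallInPlace() { Frame frame; return InnerCallInPlace(); }

// Variant B: each setup step returns a callable and unwinds first.
std::function<int()> InnerReturnCaller() { Frame frame; return EntryPoint; }
std::function<int()> OuterReturnCaller() { Frame frame; return InnerReturnCaller(); }

// The top-level caller runs the entry point after all setup has returned.
int ReturnThenCall() {
  std::function<int()> caller = OuterReturnCaller();
  return caller();
}
```

CallInPlace() reports two setup frames still on the stack when the entry point runs, while ReturnThenCall() reports none, which is the property the Runnable returned by findStaticMain gives ZygoteInit.main.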

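At this point SystemServer has started. The whole flow does five things: it forks the system_server process, closes the Zygote server socket inherited by system_server, starts the Binder thread pool, calls the main method of the SystemServer class, and arranges for system_server to be restarted if it dies.

Afterword

If you enjoyed this article, please give it a like!
If you would like more articles about the framework, please follow!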
