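SystemServer Startup and Restart Flow

Preface

This post records how SystemServer starts up and how it is restarted after a crash.

Flow

SystemServer is forked from the Zygote process. The fork is triggered in the main method of ZygoteInit.java: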

            if (startSystemServer) {
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }

Next, let's look at the forkSystemServer method:

        /* Hardcoded command line to start the system server */
        String args[] = { // 1
                "--setuid=1000",
                "--setgid=1000",
                "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
                        + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
                "--capabilities=" + capabilities + "," + capabilities,
                "--nice-name=system_server",
                "--runtime-args",
                "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
                "com.android.server.SystemServer",
        };
        ZygoteArguments parsedArgs = null;

        int pid;

        try {
            ...
            /* Request to fork the system server process */
            pid = Zygote.forkSystemServer( // 2
                    parsedArgs.mUid, parsedArgs.mGid,
                    parsedArgs.mGids,
                    parsedArgs.mRuntimeFlags,
                    null,
                    parsedArgs.mPermittedCapabilities,
                    parsedArgs.mEffectiveCapabilities);
        } catch (IllegalArgumentException ex) {
            throw new RuntimeException(ex);
        }

        /* For child process */
        if (pid == 0) { 
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket(); // 3
            return handleSystemServerProcess(parsedArgs); // 4
        }

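At marker 1, the arguments set the uid, gid and supplementary groups of the system_server process (the IDs are defined in Process.java), as well as the process name "system_server". At marker 2, the seven-parameter Zygote.forkSystemServer is called to fork the process.

fork returns once in each of the two processes. In the parent (Zygote) it returns the pid of the newly created child; in the child it returns 0. So a non-zero pid means the code is running in the Zygote process, and pid == 0 means it is running in the system_server process.

The child inherits a copy of everything Zygote had, including the server socket Zygote listens on. system_server has no use for that socket, so it closes it at marker 3.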
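To make the role of the hardcoded argument array more concrete, here is a minimal sketch of how such a list could be split into process settings and the arguments left over for the start class. SimpleZygoteArgs and all of its fields are hypothetical names made up for this illustration; the real parsing is done by ZygoteArguments and handles many more options.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for ZygoteArguments; not the AOSP class.
class SimpleZygoteArgs {
    int uid;
    int gid;
    int[] gids = new int[0];
    String niceName;
    String[] remainingArgs = new String[0];

    static SimpleZygoteArgs parse(String[] args) {
        SimpleZygoteArgs out = new SimpleZygoteArgs();
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("--")) {
                break; // first non-option argument: the start class
            }
            if (arg.startsWith("--setuid=")) {
                out.uid = Integer.parseInt(valueAfterEquals(arg));
            } else if (arg.startsWith("--setgid=")) {
                out.gid = Integer.parseInt(valueAfterEquals(arg));
            } else if (arg.startsWith("--setgroups=")) {
                out.gids = Arrays.stream(valueAfterEquals(arg).split(","))
                        .mapToInt(Integer::parseInt)
                        .toArray();
            } else if (arg.startsWith("--nice-name=")) {
                out.niceName = valueAfterEquals(arg);
            }
            // Every other option is simply skipped in this sketch.
        }
        // Everything from the start class onwards is kept for later.
        out.remainingArgs = Arrays.copyOfRange(args, i, args.length);
        return out;
    }

    // Returns the text after the first '=' of an option such as "--setuid=1000".
    private static String valueAfterEquals(String option) {
        return option.substring(option.indexOf('=') + 1);
    }

    public static void main(String[] unused) {
        SimpleZygoteArgs parsed = parse(new String[] {
                "--setuid=1000",
                "--setgid=1000",
                "--setgroups=1001,1002,1003",
                "--nice-name=system_server",
                "--runtime-args",
                "com.android.server.SystemServer",
        });
        System.out.println(parsed.uid + " " + parsed.gid + " " + parsed.niceName);
        System.out.println(Arrays.toString(parsed.remainingArgs));
    }
}
```

The part worth noticing is the split at the first non-option argument: the class name at the end of the list is not a setting for the fork, it is carried along so that it can later be used to find the class whose main method should run.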

Next we look at the logic behind markers 2 and 4, starting with the source of marker 2:

    public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
        ...
        int pid = nativeForkSystemServer(
                uid, gid, gids, runtimeFlags, rlimits,
                permittedCapabilities, effectiveCapabilities);
        ...
    }

forkSystemServer in turn calls nativeForkSystemServer, which, as the name suggests, is a native method:

    private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
            int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);

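Let's look at the corresponding JNI function. nativeForkSystemServer is declared in Zygote.java, whose path is frameworks/base/core/java/com/android/internal/os/, so the native side lives in frameworks/base/core/jni/. The JNI source file for Zygote.java is named after the package and class name, i.e. com_android_internal_os_Zygote.cpp. By the same convention, the functions in that file are named package + class + method, so the one for nativeForkSystemServer is com_android_internal_os_Zygote_nativeForkSystemServer. The excerpt below shows its sibling from the same file, nativeForkAndSpecialize, which takes the same route into ForkCommon: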

static jint com_android_internal_os_Zygote_nativeForkAndSpecialize(
        JNIEnv* env, jclass, jint uid, jint gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits,
        jint mount_external, jstring se_info, jstring nice_name,
        jintArray managed_fds_to_close, jintArray managed_fds_to_ignore, jboolean is_child_zygote,
        jstring instruction_set, jstring app_data_dir) {
    ...
    pid_t pid = ForkCommon(env, false, fds_to_close, fds_to_ignore);
    ...
}
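The naming convention described above can be written out as a small helper. This is only an illustration; JniNaming and its methods are hypothetical and not part of AOSP.

```java
// Illustrative helper (hypothetical, not part of AOSP): derives the file and
// function names used by the convention from a fully qualified class name.
class JniNaming {
    // "com.android.internal.os.Zygote" -> "com_android_internal_os_Zygote.cpp"
    static String sourceFile(String className) {
        return className.replace('.', '_') + ".cpp";
    }

    // Package + class + method name, joined with underscores.
    static String functionName(String className, String methodName) {
        return className.replace('.', '_') + "_" + methodName;
    }

    public static void main(String[] args) {
        String cls = "com.android.internal.os.Zygote";
        // prints "com_android_internal_os_Zygote.cpp"
        System.out.println(sourceFile(cls));
        // prints "com_android_internal_os_Zygote_nativeForkSystemServer"
        System.out.println(functionName(cls, "nativeForkSystemServer"));
    }
}
```

Note that this is a source-level convention in the framework code, where native functions are registered explicitly with the VM. It is not the same as the default JNI lookup scheme, which expects a Java_ prefix on the function name.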

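nativeForkAndSpecialize calls ForkCommon, and nativeForkSystemServer does the same, passing true for is_system_server. ForkCommon is implemented as follows: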

// Utility routine to fork a process from the zygote.
static pid_t ForkCommon(JNIEnv* env, bool is_system_server,
                        const std::vector<int>& fds_to_close,
                        const std::vector<int>& fds_to_ignore) {
  SetSignalHandlers();
  ...
  pid_t pid = fork();
  ...
}

ForkCommon calls two important functions. One is fork, which creates a new child process (here, the SystemServer process). The other is SetSignalHandlers:

static void SetSignalHandlers() {
  struct sigaction sig_chld = {};
  sig_chld.sa_handler = SigChldHandler;

  if (sigaction(SIGCHLD, &sig_chld, nullptr) < 0) {
    ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
  }

  struct sigaction sig_hup = {};
  sig_hup.sa_handler = SIG_IGN;
  if (sigaction(SIGHUP, &sig_hup, nullptr) < 0) {
    ALOGW("Error setting SIGHUP handler: %s", strerror(errno));
  }
}

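SetSignalHandlers registers SigChldHandler as the handler for the SIGCHLD signal. (SIGCHLD is a Linux signal: when a process terminates or stops, SIGCHLD is sent to its parent. By default the signal is ignored, so a parent that wants to be told about such state changes in its children has to catch it. The handler typically calls one of the wait functions to obtain the child's pid and termination status.) Here is the implementation of SigChldHandler: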

// This signal handler is for zygote mode, since the zygote must reap its children
static void SigChldHandler(int /*signal_number*/) {
  ...
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
    ...
    // If the just-crashed process is the system_server, bring down zygote
    // so that it is restarted by init and system server will be restarted
    // from there.
    if (pid == gSystemServerPid) {
      async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG,
                            "Exit zygote because system server (pid %d) has terminated", pid);
      kill(getpid(), SIGKILL);
    }
  }
  ...
}

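As the description of SIGCHLD suggests, SigChldHandler uses a loop around waitpid to collect the pid and termination status of each exited child. Because of WNOHANG the call does not block, so the loop ends as soon as there are no more children to reap. If the pid of the terminated process is that of SystemServer, Zygote obtains its own pid with getpid and kills itself. The point is that the two live and die together: when Zygote dies, its parent, the init process, notices and restarts Zygote, and the new Zygote then brings up a new SystemServer.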
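The chain of events described above can be traced with a toy model. Plain Java objects stand in for processes here, and ordinary method calls stand in for fork, SIGCHLD and kill; every class and method name is made up for this sketch, and none of it is how init or Zygote are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: objects play the role of processes, method calls play the
// role of fork, SIGCHLD delivery and kill. All names are hypothetical.
class RestartModel {
    static final List<String> log = new ArrayList<>();

    static class Init {
        Zygote zygote;

        void startZygote() {
            log.add("init: start zygote");
            zygote = new Zygote(this);
            zygote.forkSystemServer();
        }

        // Stand-in for init noticing that its child zygote has exited.
        void onZygoteExit() {
            log.add("init: zygote exited, restarting it");
            startZygote();
        }
    }

    static class Zygote {
        final Init parent;
        boolean systemServerAlive;

        Zygote(Init parent) {
            this.parent = parent;
        }

        void forkSystemServer() {
            log.add("zygote: fork system_server");
            systemServerAlive = true;
        }

        // Stand-in for SigChldHandler seeing pid == gSystemServerPid.
        void onSystemServerExit() {
            systemServerAlive = false;
            log.add("zygote: system_server terminated, killing self");
            parent.onZygoteExit();
        }
    }

    public static void main(String[] args) {
        Init init = new Init();
        init.startZygote();
        init.zygote.onSystemServerExit(); // simulate a system_server crash
        log.forEach(System.out::println);
    }
}
```

Zygote never restarts system_server itself in this model. It only takes itself down, and the restart falls out of init starting Zygote from scratch, which mirrors the design choice in the handler above.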

Having gone through forkSystemServer at marker 2, let's look at handleSystemServerProcess at marker 4:

    /**
     * Finish remaining work for the newly forked system server process.
     */
    private static Runnable handleSystemServerProcess(ZygoteArguments parsedArgs) {
        ...
            /*
             * Pass the remaining arguments to SystemServer.
             */
            return ZygoteInit.zygoteInit(parsedArgs.mTargetSdkVersion,
                    parsedArgs.mRemainingArgs, cl);
    }

handleSystemServerProcess calls ZygoteInit.zygoteInit. As the comments say, handleSystemServerProcess finishes the remaining work after the fork, and zygoteInit receives mRemainingArgs from the ZygoteArguments object so that they can be passed on to SystemServer. Here is zygoteInit:

    public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
        if (RuntimeInit.DEBUG) {
            Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
        }

        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
        RuntimeInit.redirectLogStreams();

        RuntimeInit.commonInit();
        ZygoteInit.nativeZygoteInit(); // 5
        return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader); // 6
    }

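zygoteInit takes three parameters: targetSdkVersion, the remaining arguments, and a ClassLoader (if you are curious where it comes from, look back at the previous method). It ends by calling markers 5 and 6. Marker 5 is a native method: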

    private static final native void nativeZygoteInit();

It is implemented in AndroidRuntime.cpp:

static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{
    gCurRuntime->onZygoteInit();
}

This calls the onZygoteInit method of the current AndroidRuntime object:

    virtual void onZygoteInit()
    {
        sp<ProcessState> proc = ProcessState::self();
        ALOGV("App process: starting thread pool.\n");
        proc->startThreadPool();
    }

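This method is defined in app_main.cpp, in AppRuntime, a subclass of AndroidRuntime. proc is a ProcessState object, and startThreadPool starts the thread pool that is used for Binder inter-process communication. We won't go into the details here.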

The part we care about most is the logic at marker 6:

    protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
       ...
        // Remaining arguments are passed to the start class's static main
        return findStaticMain(args.startClass, args.startArgs, classLoader);
    }

applicationInit in turn calls findStaticMain, which, as the comment says, passes the remaining arguments on to the static main method of the start class, here SystemServer.

    protected static Runnable findStaticMain(String className, String[] argv,
            ClassLoader classLoader) {
        Class<?> cl;

        try {
            cl = Class.forName(className, true, classLoader);
        } catch (ClassNotFoundException ex) {
            throw new RuntimeException(
                    "Missing class when invoking static main " + className,
                    ex);
        }

        Method m;
        try {
            m = cl.getMethod("main", new Class[] { String[].class }); // 7
        } catch (NoSuchMethodException ex) {
            throw new RuntimeException(
                    "Missing static main on " + className, ex);
        } catch (SecurityException ex) {
            throw new RuntimeException(
                    "Problem getting static main on " + className, ex);
        }

        int modifiers = m.getModifiers();
        if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) { // 8
            throw new RuntimeException(
                    "Main method is not public and static on " + className);
        }

        /*
         * This throw gets caught in ZygoteInit.main(), which responds
         * by invoking the exception's run() method. This arrangement
         * clears up all the stack frames that were required in setting
         * up the process.
         */
        return new MethodAndArgsCaller(m, argv); // 9
    }

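Marker 7 uses reflection to obtain the main method of the SystemServer class, marker 8 checks that main is public and static, and marker 9 returns a MethodAndArgsCaller. This is a Runnable that stores the method and its arguments and invokes them in its run method: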

    static class MethodAndArgsCaller implements Runnable {
        /** method to call */
        private final Method mMethod;

        /** argument array */
        private final String[] mArgs;

        public MethodAndArgsCaller(Method method, String[] args) {
            mMethod = method;
            mArgs = args;
        }

        public void run() {
            try {
                mMethod.invoke(null, new Object[] { mArgs }); // 10
            } catch (IllegalAccessException ex) {
                throw new RuntimeException(ex);
            } catch (InvocationTargetException ex) {
                Throwable cause = ex.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                } else if (cause instanceof Error) {
                    throw (Error) cause;
                }
                throw new RuntimeException(ex);
            }
        }
    }

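This object is received in the main method of ZygoteInit.java, which then calls its run method. Executing marker 10 is what actually invokes the main method of the SystemServer class. So why return an object instead of calling main directly? As the comment says, it is to clear the stack: by the time run is called, the frames used while setting up the process (forkSystemServer, handleSystemServerProcess, zygoteInit and so on) have all returned, so none of them show up in the stack below main, even though a lot of work was done before main was reached. Here is the main method of ZygoteInit.java again: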

            if (startSystemServer) {
                Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }
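The effect of returning a Runnable instead of calling main on the spot can be observed with a small experiment. This is a sketch under stated assumptions: StackDemo, target and setup are hypothetical names, the recursion in setup merely imitates nested setup calls, and the stack depth is measured with Thread.getStackTrace.

```java
import java.lang.reflect.Method;

class StackDemo {
    static int depthSeenByTarget;

    // Stand-in for SystemServer.main: records how deep the stack is when it runs.
    public static void target(String[] args) {
        depthSeenByTarget = Thread.currentThread().getStackTrace().length;
    }

    // Imitates nested setup work. At the innermost level it either invokes the
    // target right away or hands back a Runnable for the outermost caller to run.
    static Runnable setup(int levels, boolean invokeDirectly) {
        if (levels > 0) {
            return setup(levels - 1, invokeDirectly);
        }
        try {
            Method m = StackDemo.class.getMethod("target", String[].class);
            if (invokeDirectly) {
                m.invoke(null, new Object[] { new String[0] });
                return null;
            }
            return () -> {
                try {
                    m.invoke(null, new Object[] { new String[0] });
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            };
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Depth seen by the target when it is called from inside the setup frames.
    static int depthDirect(int levels) {
        setup(levels, true);
        return depthSeenByTarget;
    }

    // Depth seen by the target when the setup frames have already returned.
    static int depthDeferred(int levels) {
        Runnable r = setup(levels, false);
        r.run();
        return depthSeenByTarget;
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + depthDirect(50));
        System.out.println("deferred: " + depthDeferred(50));
    }
}
```

The deferred call sees a much shallower stack, because every setup frame has returned before run is invoked. The exact numbers depend on the JVM, so none are given here.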

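That completes the startup of SystemServer. The whole flow does five main things: it forks the SystemServer process, closes the Zygote socket inherited by SystemServer, starts the Binder thread pool, calls the main method of the SystemServer class, and sets up the handling that restarts SystemServer after it dies.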

Afterword

If you liked this post, a like is welcome!
If you would like to see more framework articles, feel free to follow!
