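Android System Boot: Study Notes

1. Starting the init process

init is the first process in Android user space and carries several key responsibilities, such as creating the Zygote (the process incubator) and starting the property service. Its implementation is spread across multiple source files.

1.1 Introducing the init process

A brief look at the steps that lead up to init:

Power-on and system startup

When the power button is pressed, the boot code on the chip starts executing from a predefined address. It loads the bootloader into RAM and runs it.

The bootloader

A small program that runs before the Android operating system; its main job is to bring up the OS and hand control to it.

Linux kernel startup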

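At this point the kernel sets up caches, protected memory, and the scheduler, and loads drivers. Once the kernel has finished its setup, it looks for the init executable in the file system and starts the init process; init itself later parses init.rc.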

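init process startup

init initializes and starts the property service and also starts the Zygote process.

1.2 The entry function of init

The entry point of init is main. Selected parts of the code follow.

main (part 1)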

int main(int argc, char** argv) {
    if (!strcmp(basename(argv[0]), "ueventd")) {
        return ueventd_main(argc, argv);
    }

    if (!strcmp(basename(argv[0]), "watchdogd")) {
        return watchdogd_main(argc, argv);
    }

    if (argc > 1 && !strcmp(argv[1], "subcontext")) {
        InitKernelLogging(argv);
        const BuiltinFunctionMap function_map;
        return SubcontextMain(argc, argv, &function_map);
    }

    if (REBOOT_BOOTLOADER_ON_PANIC) {
        InstallRebootSignalHandlers();
    }

    bool is_first_stage = (getenv("INIT_SECOND_STAGE") == nullptr);

    if (is_first_stage) {
        boot_clock::time_point start_time = boot_clock::now();

        // Clear the umask.
        umask(0);

        clearenv();
        setenv("PATH", _PATH_DEFPATH, 1);
        // Get the basic filesystem setup we need put together in the initramdisk
        // on / and then we'll let the rc file figure out the rest.
        mount("tmpfs", "/dev", "tmpfs", MS_NOSUID, "mode=0755");
        mkdir("/dev/pts", 0755);
        mkdir("/dev/socket", 0755);
        mount("devpts", "/dev/pts", "devpts", 0, NULL);
        #define MAKE_STR(x) __STRING(x)
        mount("proc", "/proc", "proc", 0, "hidepid=2,gid=" MAKE_STR(AID_READPROC));
        // Don't expose the raw commandline to unprivileged processes.
        chmod("/proc/cmdline", 0440);
        gid_t groups[] = { AID_READPROC };
        setgroups(arraysize(groups), groups);
        mount("sysfs", "/sys", "sysfs", 0, NULL);
        mount("selinuxfs", "/sys/fs/selinux", "selinuxfs", 0, NULL);

        mknod("/dev/kmsg", S_IFCHR | 0600, makedev(1, 11));

        if constexpr (WORLD_WRITABLE_KMSG) {
            mknod("/dev/kmsg_debug", S_IFCHR | 0622, makedev(1, 11));
        }

        mknod("/dev/random", S_IFCHR | 0666, makedev(1, 8));
        mknod("/dev/urandom", S_IFCHR | 0666, makedev(1, 9));

        // Mount staging areas for devices managed by vold
        // See storage config details at http://source.android.com/devices/storage/
        mount("tmpfs", "/mnt", "tmpfs", MS_NOEXEC | MS_NOSUID | MS_NODEV,
              "mode=0755,uid=0,gid=1000");
        // /mnt/vendor is used to mount vendor-specific partitions that can not be
        // part of the vendor partition, e.g. because they are mounted read-write.
        mkdir("/mnt/vendor", 0755);

        // Now that tmpfs is mounted on /dev and we have /dev/kmsg, we can actually
        // talk to the outside world...
        InitKernelLogging(argv);
···

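Each of the if blocks at the top hands control to another function (ueventd, watchdogd, or the subcontext). When is_first_stage is true, init creates the required directories and mounts five file systems: tmpfs, devpts, proc, sysfs, and selinuxfs. These exist only at run time and disappear when the system shuts down.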
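The checks at the top of main let a single binary act as several programs, selected by the name it was invoked under. The following is a minimal sketch of that dispatch, not the actual init code; BaseName, Role, and SelectRole are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the final path component, like basename(3).
std::string BaseName(const std::string& path) {
    const auto pos = path.find_last_of('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

enum class Role { kUeventd, kWatchdogd, kSubcontext, kInit };

// Follows the order of the checks in main: the program name is inspected
// first, then the first argument.
Role SelectRole(int argc, const char* const* argv) {
    const std::string name = BaseName(argv[0]);
    if (name == "ueventd") return Role::kUeventd;
    if (name == "watchdogd") return Role::kWatchdogd;
    if (argc > 1 && std::string(argv[1]) == "subcontext") return Role::kSubcontext;
    return Role::kInit;
}
```

Invoked as /sbin/ueventd, the sketch selects the ueventd role; invoked as /init with no arguments, it falls through to the normal init path.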

main (part 2)

···
property_init();
···
epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd == -1) {
        PLOG(FATAL) << "epoll_create1 failed";
    }

    sigchld_handler_init();

    if (!IsRebootCapable()) {
        // If init does not have the CAP_SYS_BOOT capability, it is running in a container.
        // In that case, receiving SIGTERM will cause the system to shut down.
        InstallSigtermHandler();
    }

    property_load_boot_defaults();
    export_oem_lock_status();
    start_property_service();
    set_usb_controller();

    const BuiltinFunctionMap function_map;
    Action::set_function_map(&function_map);

    subcontexts = InitializeSubcontexts();

    ActionManager& am = ActionManager::GetInstance();
    ServiceList& sm = ServiceList::GetInstance();

    LoadBootScripts(am, sm);
···

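At the top, property_init initializes the property area.

Further down, start_property_service starts the property service.

In between, sigchld_handler_init installs the handler for child-process signals, mainly to keep init's children from becoming zombie processes. The kernel sends SIGCHLD to the parent when a child stops or terminates, and this handler is what receives that signal.

Finally, LoadBootScripts parses init.rc; it is analyzed after the walk through main.

Aside: zombie processes

Processes are created with fork. After a parent forks a child and the child terminates, if the parent has not yet collected the child's exit status, the child has exited but the kernel still keeps its entry in the process table. Such a child is called a zombie process. The process table is finite; if it is exhausted, no new processes can be created.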
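The reaping described above can be sketched as follows, assuming a POSIX system. SpawnAndReap is a hypothetical helper; it blocks in waitpid for simplicity, whereas init collects its children in response to SIGCHLD.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that exits immediately with `code`, then reaps it.
// Returns the child's exit code, or -1 on failure.
int SpawnAndReap(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;       // fork failed
    if (pid == 0) _exit(code);    // child: terminate right away

    // Parent: until waitpid() returns, the terminated child remains in the
    // process table as a zombie. Collecting its status removes the entry.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that never calls waitpid (or an equivalent) leaves one process-table entry behind for every child that exits.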

main (part 3)

···
while (true) {
        // By default, sleep until something happens.
        int epoll_timeout_ms = -1;

        if (do_shutdown && !shutting_down) {
            do_shutdown = false;
            if (HandlePowerctlMessage(shutdown_command)) {
                shutting_down = true;
            }
        }

        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            am.ExecuteOneCommand();
        }
        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            if (!shutting_down) {
                auto next_process_restart_time = RestartProcesses();

                // If there's a process that needs restarting, wake up in time for that.
                if (next_process_restart_time) {
                    epoll_timeout_ms = std::chrono::ceil<std::chrono::milliseconds>(
                                           *next_process_restart_time - boot_clock::now())
                                           .count();
                    if (epoll_timeout_ms < 0) epoll_timeout_ms = 0;
                }
            }

            // If there's more work to do, wake up again immediately.
            if (am.HasMoreCommands()) epoll_timeout_ms = 0;
        }

        epoll_event ev;
        int nr = TEMP_FAILURE_RETRY(epoll_wait(epoll_fd, &ev, 1, epoll_timeout_ms));
        if (nr == -1) {
            PLOG(ERROR) << "epoll_wait failed";
        } else if (nr == 1) {
            ((void (*)()) ev.data.ptr)();
        }
    }

    return 0;
}

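RestartProcesses, in the middle of the loop, is what restarts Zygote. If Zygote, a child of init, terminates, the SIGCHLD handler installed by sigchld_handler_init eventually locates the Zygote service through a chain of calls and clears its process information; the service is then restarted according to its script. Other children of init are handled in the same way.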
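The loop above stores a handler function in ev.data.ptr and calls it when epoll_wait reports an event. Below is a minimal Linux-only sketch of that pattern, with a pipe standing in for init's real event sources. RunOneIteration and OnPipeReadable are hypothetical names, and converting between a function pointer and void* relies on behavior that POSIX platforms provide.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

static int g_handled = 0;   // counts handler invocations
static int g_read_fd = -1;  // read end of the demo pipe

// Hypothetical handler: consumes one byte from the pipe.
static void OnPipeReadable() {
    char c;
    if (read(g_read_fd, &c, 1) == 1) ++g_handled;
}

// Registers a handler for a pipe, makes the pipe readable, then runs one
// iteration of an init-style loop. Returns the number of handler calls,
// or -1 if setup fails (descriptors are not cleaned up on that path).
int RunOneIteration() {
    g_handled = 0;
    int fds[2];
    if (pipe(fds) == -1) return -1;
    g_read_fd = fds[0];

    int epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd == -1) return -1;

    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.ptr = reinterpret_cast<void*>(&OnPipeReadable);  // store the handler
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fds[0], &ev) == -1) return -1;

    if (write(fds[1], "x", 1) != 1) return -1;  // simulate an incoming event

    epoll_event out;
    int nr = epoll_wait(epoll_fd, &out, 1, 1000 /* timeout in ms */);
    if (nr == 1) {
        reinterpret_cast<void (*)()>(out.data.ptr)();  // dispatch to the handler
    }

    close(epoll_fd);
    close(fds[0]);
    close(fds[1]);
    return g_handled;
}
```

Keeping the handler in the event itself means the loop needs no lookup table from descriptor to callback.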

1.3 Parsing init.rc

init.rc is a very important configuration file written in the Android Init Language. It contains five kinds of statements: Action, Command, Service, Option, and Import. An excerpt:

init.rc

···
on init
    sysclktz 0

    # Mix device-specific information into the entropy pool
    copy /proc/cmdline /dev/urandom
    copy /default.prop /dev/urandom
···
on boot
    # basic network init
    ifup lo
    hostname localhost
    domainname localdomain
···

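This is only an excerpt. on init and on boot are Action statements, which have the following format:

on <trigger> [&& <trigger>]* // the trigger(s)
    <command>
    <command> // commands run once the trigger fires

To see how Zygote is created, we mainly care about Service statements, which have this format:

service <name> <pathname> [<argument>]* // <service name> <executable path> <arguments>

Zygote's startup script is defined in init.zygoteXX.rc. Taking a 64-bit processor as the example, we look at init.zygote64.rc.

init.zygote64.rc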

service zygote /system/bin/app_process64 -Xzygote /system/bin --zygote --start-system-server
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    writepid /dev/cpuset/foreground/tasks

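Roughly: the service statement tells init to create a process named zygote whose executable is /system/bin/app_process64; the remaining tokens on that line are arguments passed to app_process64. class main means that Zygote's classname is main.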
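To make the first line of the script concrete, here is a minimal sketch that splits a service line into name, executable path, and arguments. ServiceDecl and ParseServiceLine are hypothetical names; the real parser in init tokenizes more carefully and also handles the option lines that follow.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of a parsed service line.
struct ServiceDecl {
    std::string name;
    std::string path;
    std::vector<std::string> args;  // arguments after the executable path
};

// Splits "service <name> <pathname> [<argument>]*" on whitespace.
// Returns false if the line is not a service line or lacks a name or path.
bool ParseServiceLine(const std::string& line, ServiceDecl* out) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);

    if (tokens.size() < 3 || tokens[0] != "service") return false;
    out->name = tokens[1];
    out->path = tokens[2];
    out->args.assign(tokens.begin() + 3, tokens.end());
    return true;
}
```

Applied to the zygote line above, this yields the name zygote, the path /system/bin/app_process64, and four arguments.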

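1.4 Parsing Service statements

Both Action and Service statements in init.rc have a dedicated parser class: ActionParser for Action statements and ServiceParser for Service statements. Since the focus here is Zygote, only ServiceParser is covered. It is implemented in service.cpp, and parsing a Service relies on two functions: ParseSection, which parses the service line of the rc file and builds the skeleton of the Service, and ParseLineSection, which parses the option lines under it. The code:

service.cpp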

Result<Success> ServiceParser::ParseSection(std::vector<std::string>&& args,
                                            const std::string& filename, int line) {
    if (args.size() < 3) {
        return Error() << "services must have a name and a program";
    }

    const std::string& name = args[1];
    if (!IsValidName(name)) {
        return Error() << "invalid service name '" << name << "'";
    }

    Subcontext* restart_action_subcontext = nullptr;
    if (subcontexts_) {
        for (auto& subcontext : *subcontexts_) {
            if (StartsWith(filename, subcontext.path_prefix())) {
                restart_action_subcontext = &subcontext;
                break;
            }
        }
    }

    std::vector<std::string> str_args(args.begin() + 2, args.end());
    service_ = std::make_unique<Service>(name, restart_action_subcontext, str_args);
    return Success();
}

Result<Success> ServiceParser::ParseLineSection(std::vector<std::string>&& args, int line) {
    return service_ ? service_->ParseLine(std::move(args)) : Success();
}

Result<Success> ServiceParser::EndSection() {
    if (service_) {
        Service* old_service = service_list_->FindService(service_->name());
        if (old_service) {
            if (!service_->is_override()) {
                return Error() << "ignored duplicate definition of service '" << service_->name()
                               << "'";
            }

            service_list_->RemoveService(*old_service);
            old_service = nullptr;
        }

        service_list_->AddService(std::move(service_));
    }

    return Success();
}

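The first if checks that the Service has both a name and an executable.

The second if checks that the Service name is valid.

In the middle, std::make_unique constructs a Service object.

Once the section has been parsed, EndSection is called, which in turn calls AddService.

AddService in service.cpp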

void ServiceList::AddService(std::unique_ptr<Service> service) {
    services_.emplace_back(std::move(service));
}

This function appends the Service object to the service list (the services_ container).
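The duplicate handling in EndSection combined with AddService can be sketched as below. Svc and SvcList are hypothetical stand-ins for init's Service and ServiceList, and the bool return value replaces the Result type used by the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for init's Service class.
struct Svc {
    std::string name;
    bool is_override;
};

class SvcList {
  public:
    // Returns false for a duplicate definition that is not marked override.
    bool EndSection(std::unique_ptr<Svc> svc) {
        auto it = std::find_if(
            services_.begin(), services_.end(),
            [&](const std::unique_ptr<Svc>& s) { return s->name == svc->name; });
        if (it != services_.end()) {
            if (!svc->is_override) return false;  // ignore the duplicate
            services_.erase(it);                  // drop the old definition
        }
        services_.emplace_back(std::move(svc));   // same call as AddService
        return true;
    }

    std::size_t size() const { return services_.size(); }

  private:
    std::vector<std::unique_ptr<Svc>> services_;
};
```

A second definition of zygote is rejected unless it is marked as an override, in which case it replaces the first and the list still holds one entry.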

1.5 How init starts Zygote

Next, how init starts a Service, focusing on the Zygote service. init.rc contains the following configuration:

init.rc

···
on nonencrypted
    class_start main
    class_start late_start
···

class_start is a Command whose handler function is do_class_start. It starts the Services whose classname is main; as shown earlier, Zygote's classname is main, so this is what starts Zygote. do_class_start is defined in builtins.cpp:

do_class_start in builtins.cpp

static Result<Success> do_class_start(const BuiltinArguments& args) {
    // Starting a class does not start services which are explicitly disabled.
    // They must  be started individually.
    for (const auto& service : ServiceList::GetInstance()) {
        if (service->classnames().count(args[1])) {
            if (auto result = service->StartIfNotDisabled(); !result) {
                LOG(ERROR) << "Could not start service '" << service->name()
                           << "' as part of class '" << args[1] << "': " << result.error();
            }
        }
    }
    return Success();
}

The for loop walks the ServiceList, finds the Services whose classname is main (Zygote among them), and calls StartIfNotDisabled from service.cpp:

StartIfNotDisabled in service.cpp

Result<Success> Service::StartIfNotDisabled() {
    if (!(flags_ & SVC_DISABLED)) {
        return Start();
    } else {
        flags_ |= SVC_DISABLED_START;
    }
    return Success();
}

In the first if, a Service that does not set the disabled option in its rc file is started by calling Start. init.zygote64.rc does not set disabled for Zygote, so we look at Start next.

Start in service.cpp

Result<Success> Service::Start() {
    bool disabled = (flags_ & (SVC_DISABLED | SVC_RESET));
    // Starting a service removes it from the disabled or reset state and
    // immediately takes it out of the restarting state if it was in there.
    flags_ &= (~(SVC_DISABLED|SVC_RESTARTING|SVC_RESET|SVC_RESTART|SVC_DISABLED_START));

    // Running processes require no additional work --- if they're in the
    // process of exiting, we've ensured that they will immediately restart
    // on exit, unless they are ONESHOT. For ONESHOT service, if it's in
    // stopping status, we just set SVC_RESTART flag so it will get restarted
    // in Reap().
    if (flags_ & SVC_RUNNING) {
        if ((flags_ & SVC_ONESHOT) && disabled) {
            flags_ |= SVC_RESTART;
        }
        // It is not an error to try to start a service that is already running.
        return Success();
    }

    bool needs_console = (flags_ & SVC_CONSOLE);
    if (needs_console) {
        if (console_.empty()) {
            console_ = default_console;
        }

        // Make sure that open call succeeds to ensure a console driver is
        // properly registered for the device node
        int console_fd = open(console_.c_str(), O_RDWR | O_CLOEXEC);
        if (console_fd < 0) {
            flags_ |= SVC_DISABLED;
            return ErrnoError() << "Couldn't open console '" << console_ << "'";
        }
        close(console_fd);
    }

    struct stat sb;
    if (stat(args_[0].c_str(), &sb) == -1) {
        flags_ |= SVC_DISABLED;
        return ErrnoError() << "Cannot find '" << args_[0] << "'";
    }
···
    pid_t pid = -1;
    if (namespace_flags_) {
        pid = clone(nullptr, nullptr, namespace_flags_ | SIGCHLD, nullptr);
    } else {
        pid = fork();
    }

    if (pid == 0) {
        umask(077);
···
        if (!ExpandArgsAndExecv(args_)) {
            PLOG(ERROR) << "cannot execve('" << args_[0] << "')";
        }

        _exit(127);
    }
···
   return Success();
}

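The first if (flags_ & SVC_RUNNING) checks whether the Service is already running; if so, it is not started again.

If the Service has not been started yet, fork() (or clone() when namespace flags are set) creates a child process and returns its pid. A return value of 0 means the code is now running in the child.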
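The fork-then-exec pattern in Service::Start can be sketched as follows, assuming a POSIX system. ForkExecAndWait is a hypothetical helper. Unlike init, which returns to its event loop and collects the child later, this sketch waits synchronously so that the result can be observed.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Forks, executes args[0] with args in the child, and waits for it.
// Returns the child's exit code, or -1 on failure.
int ForkExecAndWait(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // Build the null-terminated argv array before forking.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execv(argv[0], argv.data());  // returns only if the exec failed
        _exit(127);                   // same exit code as in Service::Start
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh -c "exit 3" (assuming /bin/sh exists on the machine) returns 3, while a nonexistent path returns 127 because the exec fails in the child.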

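In the last if, the child calls ExpandArgsAndExecv, which replaces the child's image with the Service's executable and thereby enters that program's main function; the error branch is reached only when the exec fails. For Zygote, the executable is /system/bin/app_process64, as shown earlier. Its source file is app_main.cpp, so execution enters main in app_main.cpp, which is Zygote's main function:

app_main.cpp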

int main(int argc, char* const argv[])
{
···
if (zygote){ 
        runtime.start("com.android.internal.os.ZygoteInit", args, zygote); 
    } else if (className) {
        runtime.start("com.android.internal.os.RuntimeInit", args, zygote); 
    } else { 
        fprintf(stderr, "Error: no class name or --zygote supplied.\n"); 
        app_usage(); 
        LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied."); 
    } 
}

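Here runtime.start launches Zygote; at this point Zygote has officially started.

1.6 The property service

The Windows registry stores information about user software as key-value pairs, so that even after a reboot the system can initialize itself from those records. Android's property service is a similar mechanism.

When init starts, it starts the property service and allocates memory for it to store properties, which are then read directly when needed. As mentioned earlier, main in init.cpp contains the following property-related code:

main in init.cpp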

property_init();
start_property_service();

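These two calls initialize and start the property service. We look at initialization and startup first.

Initializing and starting the property service

property_init is implemented in property_service.cpp:

property_init in property_service.cpp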

void property_init() {
    mkdir("/dev/__properties__", S_IRWXU | S_IXGRP | S_IXOTH);
    CreateSerializedPropertyInfo();
    if (__system_property_area_init()) {
        LOG(FATAL) << "Failed to initialize property area";
    }
    if (!property_info_area.LoadDefaultPath()) {
        LOG(FATAL) << "Failed to load serialized property info file";
    }
}

__system_property_area_init initializes the property memory area. Next comes start_property_service:

start_property_service in property_service.cpp

void start_property_service() {
    selinux_callback cb;
    cb.func_audit = SelinuxAuditCallback;
    selinux_set_callback(SELINUX_CB_AUDIT, cb);

    property_set("ro.property_service.version", "2");

    property_set_fd = CreateSocket(PROP_SERVICE_NAME, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
                                   false, 0666, 0, 0, nullptr);
    if (property_set_fd == -1) {
        PLOG(FATAL) << "start_property_service socket creation failed";
    }

    listen(property_set_fd, 8);

    register_epoll_handler(property_set_fd, handle_property_set_fd);
}

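CreateSocket creates a non-blocking socket. listen is then called on property_set_fd, which turns the socket into a server: this is the property service.

The second argument to listen, 8, is the backlog: up to 8 pending connections from clients that want to set a property can wait in the queue at once.

The last line registers property_set_fd with epoll; when data arrives, init calls handle_property_set_fd to process it.

epoll is the Linux kernel's scalable alternative to select and poll, designed for handling large numbers of file descriptors. It keeps the watched descriptors in a red-black tree, so lookups are fast and idle descriptors cost little CPU time. With only a few descriptors, epoll performs about the same as select.

How the service handles client requests

As shown above, when the property service receives a client request it calls handle_property_set_fd:

handle_property_set_fd in property_service.cpp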

static void handle_property_set_fd() {
···
    switch (cmd) {
    case PROP_MSG_SETPROP: {
        char prop_name[PROP_NAME_MAX];
        char prop_value[PROP_VALUE_MAX];

        if (!socket.RecvChars(prop_name, PROP_NAME_MAX, &timeout_ms) ||
            !socket.RecvChars(prop_value, PROP_VALUE_MAX, &timeout_ms)) {
          PLOG(ERROR) << "sys_prop(PROP_MSG_SETPROP): error while reading name/value from the socket";
          return;
        }

        prop_name[PROP_NAME_MAX-1] = 0;
        prop_value[PROP_VALUE_MAX-1] = 0;

        const auto& cr = socket.cred();
        std::string error;
        uint32_t result =
            HandlePropertySet(prop_name, prop_value, socket.source_context(), cr, &error);
        if (result != PROP_SUCCESS) {
            LOG(ERROR) << "Unable to set property '" << prop_name << "' to '" << prop_value
                       << "' from uid:" << cr.uid << " gid:" << cr.gid << " pid:" << cr.pid << ": "
                       << error;
        }

        break;
      }
···

Just above the last if, the work is delegated to HandlePropertySet:

HandlePropertySet in property_service.cpp

uint32_t HandlePropertySet(const std::string& name, const std::string& value,
                           const std::string& source_context, const ucred& cr, std::string* error) {
···
    if (StartsWith(name, "ctl.")) {
        if (!CheckControlPropertyPerms(name, value, source_context, cr)) {
            *error = StringPrintf("Invalid permissions to perform '%s' on '%s'", name.c_str() + 4,
                                  value.c_str());
            return PROP_ERROR_HANDLE_CONTROL_MESSAGE;
        }

        HandleControlMessage(name.c_str() + 4, value, cr.pid);
        return PROP_SUCCESS;
    }

    const char* target_context = nullptr;
    const char* type = nullptr;
    property_info_area->GetPropertyInfo(name.c_str(), &target_context, &type);

    if (!CheckMacPerms(name, target_context, source_context.c_str(), cr)) {
        *error = "SELinux permission check failed";
        return PROP_ERROR_PERMISSION_DENIED;
    }

    if (type == nullptr || !CheckType(type, value)) {
        *error = StringPrintf("Property type check failed, value doesn't match expected type '%s'",
                              (type ?: "(null)"));
        return PROP_ERROR_INVALID_VALUE;
    }
···
    if (name == "selinux.restorecon_recursive") {
        return PropertySetAsync(name, value, RestoreconRecursiveAsync, error);
    }

    return PropertySet(name, value, error);
}

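System properties are divided into control properties and ordinary properties. Control properties are used to run commands, for example starting the boot animation. The first if checks whether the name starts with ctl., which marks a control property; the nested second if checks the caller's permission, and HandleControlMessage then carries out the control request.

For an ordinary property, the third if (CheckMacPerms) performs the SELinux permission check and the if after it validates the value type. If both pass, PropertySet is finally called to modify the property. Here we look at how an ordinary property is modified.

PropertySet in property_service.cpp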

static uint32_t PropertySet(const std::string& name, const std::string& value, std::string* error) {
    size_t valuelen = value.size();

    if (!IsLegalPropertyName(name)) {
        *error = "Illegal property name";
        return PROP_ERROR_INVALID_NAME;
    }
···
    prop_info* pi = (prop_info*) __system_property_find(name.c_str());
    if (pi != nullptr) {
        // ro.* properties are actually "write-once".
        if (StartsWith(name, "ro.")) {
            *error = "Read-only property was already set";
            return PROP_ERROR_READ_ONLY_PROPERTY;
        }

        __system_property_update(pi, value.c_str(), valuelen);
    } else {
        int rc = __system_property_add(name.c_str(), name.size(), value.c_str(), valuelen);
        if (rc < 0) {
            *error = "__system_property_add failed";
            return PROP_ERROR_SET_FAILED;
        }
    }

    // Don't write properties to disk until after we have read all default
    // properties to prevent them from being overwritten by default values.
    if (persistent_properties_loaded && StartsWith(name, "persist.")) {
        WritePersistentProperty(name, value);
    }
    property_changed(name, value);
    return PROP_SUCCESS;
}

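The first if checks that the property name is legal. __system_property_find then looks the property up in the property area.

The second if (pi != nullptr) is true when the property already exists. A name starting with ro. is read-only (write-once), so the function returns an error without modifying it.

Otherwise __system_property_update updates the value of the existing property; the else branch adds the property when it does not exist yet.

The last if handles properties whose names start with persist., which are additionally written to persistent storage.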
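The update rules above can be sketched with an in-memory map standing in for the shared property area. PropertyStore and SetResult are hypothetical names, and name validation and persistence are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class SetResult { kSuccess, kReadOnly };

// Hypothetical in-memory stand-in for the shared property area.
class PropertyStore {
  public:
    SetResult Set(const std::string& name, const std::string& value) {
        auto it = props_.find(name);
        if (it != props_.end()) {
            // ro.* properties are write-once: reject a second write.
            if (name.compare(0, 3, "ro.") == 0) return SetResult::kReadOnly;
            it->second = value;           // update an existing property
        } else {
            props_.emplace(name, value);  // add a new property
        }
        // A persist.* property would additionally be written to disk here.
        return SetResult::kSuccess;
    }

    // Returns the stored value, or an empty string if the name is unknown.
    std::string Get(const std::string& name) const {
        auto it = props_.find(name);
        return it == props_.end() ? "" : it->second;
    }

  private:
    std::map<std::string, std::string> props_;
};
```

The first write to an ro. property succeeds and any later write is rejected, which is why such properties are described as write-once.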

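1.7 Summary of init startup

In short, init does three things:

Creates and mounts the file systems and directories needed for boot
Initializes and starts the property service
Parses the init.rc configuration file and starts the Zygote process

2. The Zygote startup process

Before looking at how Zygote starts, it helps to know what Zygote is.

2.1 Zygote overview

Zygote is also called the incubator. The DVM or ART runtime, application processes, and SystemServer (which runs the key system services) are all created by it. It creates these processes with fork. Because Zygote creates the DVM or ART at startup, the application processes and the SystemServer process created by fork each get their own copy of the DVM or ART instance.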

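As shown above, Zygote is created while init is starting. Its original name is app_process, as defined in Android.mk. After the process starts, its name is changed from app_process to zygote through the Linux prctl system call.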

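2.2 The Zygote startup script

init.rc uses an Import statement to pull in the Zygote startup script, which is also written in the Android Init Language:

import /init.${ro.zygote}.rc

So init.rc does not import a fixed file; which file it imports depends on the value of ro.zygote. To cover 32-bit and 64-bit programs, ro.zygote takes one of four values (zygote32, zygote32_64, zygote64, zygote64_32), which select the following four scripts: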

init.zygote32.rc
init.zygote32_64.rc
init.zygote64.rc
init.zygote64_32.rc
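How the import path is resolved from a property can be sketched as below. ExpandProps is a hypothetical function; unknown property names expand to an empty string in this sketch, which is a simplification and not a statement about how init treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Replaces each ${name} in `in` with its value from `props`.
std::string ExpandProps(const std::string& in,
                        const std::map<std::string, std::string>& props) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t start = in.find("${", pos);
        if (start == std::string::npos) {
            out.append(in, pos, std::string::npos);  // no more references
            break;
        }
        const std::size_t end = in.find('}', start + 2);
        if (end == std::string::npos) {
            out.append(in, pos, std::string::npos);  // unterminated: copy as-is
            break;
        }
        out.append(in, pos, start - pos);            // text before the reference
        const auto it = props.find(in.substr(start + 2, end - start - 2));
        if (it != props.end()) out += it->second;
        pos = end + 1;
    }
    return out;
}
```

With ro.zygote set to zygote64_32, the path /init.${ro.zygote}.rc expands to /init.zygote64_32.rc.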

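To make these scripts easier to read, recall the Service statement format shown earlier:

service <name> <pathname> [<argument>]* // <service name> <executable path> <arguments>

Let us look at these startup scripts in turn.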

init.zygote32.rc

32-bit only. Its contents:

service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    writepid /dev/cpuset/foreground/tasks

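The process is named zygote, the executable is app_process, and the classname is main. The onrestart options mean that when zygote restarts, services such as audioserver, cameraserver, and media are restarted as well.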

init.zygote32_64.rc

service zygote /system/bin/app_process32 -Xzygote /system/bin --zygote --start-system-server --socket-name=zygote
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    writepid /dev/cpuset/foreground/tasks

service zygote_secondary /system/bin/app_process64 -Xzygote /system/bin --zygote --socket-name=zygote_secondary
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote_secondary stream 660 root system
    onrestart restart zygote
    writepid /dev/cpuset/foreground/tasks

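There are two Service statements, so two Zygote processes are started. One is named zygote, runs app_process32, and acts as the primary. The other is named zygote_secondary, runs app_process64, and acts as the secondary.

The two 64-bit scripts are analogous to the two above and are not repeated here.

2.3 The Zygote startup flow

As shown above, init starts Zygote by executing app_process, whose main function in app_main.cpp calls runtime.start. The sequence diagram is below; we begin the analysis with main in app_main.cpp.

[Figure: sequence diagram of the Zygote startup process (image.png)]

main in app_main.cpp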

int main(int argc, char* const argv[])
{
···
    while (i < argc) {
        const char* arg = argv[i++];
        if (strcmp(arg, "--zygote") == 0) {
            zygote = true;
            niceName = ZYGOTE_NICE_NAME;
        } else if (strcmp(arg, "--start-system-server") == 0) {
            startSystemServer = true;
        } else if (strcmp(arg, "--application") == 0) {
            application = true;
        } else if (strncmp(arg, "--nice-name=", 12) == 0) {
            niceName.setTo(arg + 12);
        } else if (strncmp(arg, "--", 2) != 0) {
            className.setTo(arg);
            break;
        } else {
            --i;
            break;
        }
    }
···
    if (!niceName.isEmpty()) {
        runtime.setArgv0(niceName.string(), true /* setProcName */);
    }

    if (zygote) {
        runtime.start("com.android.internal.os.ZygoteInit", args, zygote);
    } else if (className) {
        runtime.start("com.android.internal.os.RuntimeInit", args, zygote);
    } else {
        fprintf(stderr, "Error: no class name or --zygote supplied.\n");
        app_usage();
        LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied.");
    }
}

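Zygote creates its child processes by forking itself, so both Zygote and its children can end up in main of app_main.cpp. To tell which process it is running in, main checks in the first if whether the arguments include --zygote; if so, it is running in the Zygote process and sets zygote to true. The following branches check the other arguments in the same way and set the corresponding variables.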
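The argument scan can be sketched in isolation as below. AppProcessArgs and ParseArgs are hypothetical names, the start index is a parameter invented for the sketch, and "zygote" is only an assumed value for ZYGOTE_NICE_NAME.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct AppProcessArgs {
    bool zygote = false;
    bool start_system_server = false;
    bool application = false;
    std::string nice_name;
    std::string class_name;
};

// Follows the shape of the loop in app_main.cpp: recognized "--" flags are
// consumed, the first non-option token becomes the class name, and an
// unrecognized "--" option stops the scan.
AppProcessArgs ParseArgs(int argc, const char* const* argv, int start) {
    AppProcessArgs out;
    int i = start;
    while (i < argc) {
        const char* arg = argv[i++];
        if (std::strcmp(arg, "--zygote") == 0) {
            out.zygote = true;
            out.nice_name = "zygote";  // assumed value of ZYGOTE_NICE_NAME
        } else if (std::strcmp(arg, "--start-system-server") == 0) {
            out.start_system_server = true;
        } else if (std::strcmp(arg, "--application") == 0) {
            out.application = true;
        } else if (std::strncmp(arg, "--nice-name=", 12) == 0) {
            out.nice_name = arg + 12;
        } else if (std::strncmp(arg, "--", 2) != 0) {
            out.class_name = arg;      // first non-option token
            break;
        } else {
            break;                     // unknown option: stop scanning
        }
    }
    return out;
}
```

With the arguments from init.zygote64.rc, scanning from the token after /system/bin sets both zygote and start_system_server and leaves class_name empty.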

The final if chain was analyzed earlier: when zygote is true, runtime.start is called. It is defined in AndroidRuntime.cpp:

start in AndroidRuntime.cpp

void AndroidRuntime::start(const char* className, const Vector<String8>& options, bool zygote)
{
···
    /* start the virtual machine */
    JniInvocation jni_invocation;
    jni_invocation.Init(NULL);
    JNIEnv* env;
    if (startVm(&mJavaVM, &env, zygote, primary_zygote) != 0) {
        return;
    }
    onVmCreated(env);

    /*
     * Register android functions.
     */
    if (startReg(env) < 0) {
        ALOGE("Unable to register all android natives\n");
        return;
    }

    /*
     * We want to call main() with a String array with arguments in it.
     * At present we have two arguments, the class name and an option string.
     * Create an array to hold them.
     */
    jclass stringClass;
    jobjectArray strArray;
    jstring classNameStr;

    stringClass = env->FindClass("java/lang/String");
    assert(stringClass != NULL);
    strArray = env->NewObjectArray(options.size() + 1, stringClass, NULL);
    assert(strArray != NULL);
    classNameStr = env->NewStringUTF(className);
    assert(classNameStr != NULL);
    env->SetObjectArrayElement(strArray, 0, classNameStr);

    for (size_t i = 0; i < options.size(); ++i) {
        jstring optionsStr = env->NewStringUTF(options.itemAt(i).string());
        assert(optionsStr != NULL);
        env->SetObjectArrayElement(strArray, i + 1, optionsStr);
    }

    /*
     * Start VM.  This thread becomes the main thread of the VM, and will
     * not return until the VM exits.
     */
    char* slashClassName = toSlashClassName(className != NULL ? className : "");
    jclass startClass = env->FindClass(slashClassName);
    if (startClass == NULL) {
        ALOGE("JavaVM unable to locate class '%s'\n", slashClassName);
        /* keep going */
    } else {
        jmethodID startMeth = env->GetStaticMethodID(startClass, "main",
            "([Ljava/lang/String;)V");
        if (startMeth == NULL) {
            ALOGE("JavaVM unable to find main() in '%s'\n", className);
            /* keep going */
        } else {
            env->CallStaticVoidMethod(startClass, startMeth, strArray);

#if 0
            if (env->ExceptionCheck())
                threadExitUncaughtException(env);
#endif
        }
    }
···
}

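At the top, startVm creates the Java virtual machine; in the second if, startReg registers the JNI methods with it.

Above the for loop, classNameStr is built from className, whose value is com.android.internal.os.ZygoteInit. Below the loop, toSlashClassName converts className into slashClassName by replacing each . with /.

slashClassName is then passed to FindClass to locate ZygoteInit, the first else block looks up its main method, and the second else block finally calls main through JNI.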
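The name conversion is small enough to sketch directly. ToSlashName is a made-up name for this illustration and is not the AOSP function, which works on C strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Converts a dotted Java class name into the slash-separated form that
// JNI's FindClass expects.
std::string ToSlashName(std::string class_name) {
    std::replace(class_name.begin(), class_name.end(), '.', '/');
    return class_name;
}
```

For com.android.internal.os.ZygoteInit this produces com/android/internal/os/ZygoteInit.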

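JNI is needed because ZygoteInit's main method is written in Java while the current code is native; calling it through JNI moves execution from the native layer into the Java framework layer. In other words, Zygote is what opens up the Java framework layer. The main method:

main in ZygoteInit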

public static void main(String argv[]) {
        ZygoteServer zygoteServer = new ZygoteServer();
···
            zygoteServer.registerServerSocketFromEnv(socketName);
            // In some configurations, we avoid preloading resources and classes eagerly.
            // In such cases, we will preload things prior to our first fork.
            if (!enableLazyPreload) {
                bootTimingsTraceLog.traceBegin("ZygotePreload");
                EventLog.writeEvent(LOG_BOOT_PROGRESS_PRELOAD_START,
                    SystemClock.uptimeMillis());
                preload(bootTimingsTraceLog);
                EventLog.writeEvent(LOG_BOOT_PROGRESS_PRELOAD_END,
                    SystemClock.uptimeMillis());
                bootTimingsTraceLog.traceEnd(); // ZygotePreload
            } else {
                Zygote.resetNicePriority();
            }
···
            if (startSystemServer) {
                Runnable r = forkSystemServer(abiList, socketName, zygoteServer);

                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }

            Log.i(TAG, "Accepting command socket connections");

            // The select loop returns early in the child process after a fork and
            // loops forever in the zygote.
            caller = zygoteServer.runSelectLoop(abiList);
        } catch (Throwable ex) {
            Log.e(TAG, "System zygote died with exception", ex);
            throw ex;
        } finally {
            zygoteServer.closeServerSocket();
        }

        // We're in the child process and have exited the select loop. Proceed to execute the
        // command.
        if (caller != null) {
            caller.run();
        }
    }

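At the top, zygoteServer.registerServerSocketFromEnv creates the server-side socket.

In the first if, preload preloads classes and resources.

In the second if, forkSystemServer creates the SystemServer process, which in turn starts the system services.

Just above the catch, runSelectLoop waits for AMS requests to create new application processes.

So ZygoteInit's main does four things:

Creates a server-side socket
Preloads classes and resources
Starts the SystemServer process
Waits for AMS requests to create new application processes

Here we look at steps 1, 3, and 4.

Step 1: creating the server-side socket

What registerServerSocketFromEnv does:

registerServerSocketFromEnv in ZygoteServer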

void registerServerSocketFromEnv(String socketName) {
        if (mServerSocket == null) {
            int fileDesc;
            final String fullSocketName = ANDROID_SOCKET_PREFIX + socketName;
            try {
                String env = System.getenv(fullSocketName);
                fileDesc = Integer.parseInt(env);
            } catch (RuntimeException ex) {
                throw new RuntimeException(fullSocketName + " unset or invalid", ex);
            }

            try {
                FileDescriptor fd = new FileDescriptor();
                fd.setInt$(fileDesc);
                mServerSocket = new LocalServerSocket(fd);
                mCloseSocketFd = true;
            } catch (IOException ex) {
                throw new RuntimeException(
                        "Error binding to local socket '" + fileDesc + "'", ex);
            }
        }
    }

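It first builds the socket name. ANDROID_SOCKET_PREFIX is ANDROID_SOCKET_ and socketName is zygote, so fullSocketName is ANDROID_SOCKET_zygote.

In the first try, the value of the environment variable named by fullSocketName is read and parsed with parseInt into a file descriptor number.

In the second try, a FileDescriptor object is created and set to that number. A LocalServerSocket, the server-side socket, is then created from it.

Once Zygote has started SystemServer, it waits on this server socket for AMS to ask it to create new application processes.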
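The environment-variable lookup can be sketched in C++ as below, assuming a POSIX environment. SocketFdFromEnv is a hypothetical function, and returning -1 is this sketch's own error convention; the Java code throws a RuntimeException instead.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the descriptor number published for `socket_name`, or -1 when the
// variable is unset or is not a valid non-negative integer.
int SocketFdFromEnv(const std::string& socket_name) {
    const std::string key = "ANDROID_SOCKET_" + socket_name;
    const char* value = std::getenv(key.c_str());
    if (value == nullptr || *value == '\0') return -1;

    char* end = nullptr;
    const long fd = std::strtol(value, &end, 10);
    if (*end != '\0' || fd < 0) return -1;  // trailing garbage or negative
    return static_cast<int>(fd);
}
```

The socket itself is created elsewhere before the process runs; the environment variable only carries the number of the already-open descriptor, so the lookup creates nothing.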

Step 3: starting the SystemServer process

What forkSystemServer does:

forkSystemServer in ZygoteInit

private static Runnable forkSystemServer(String abiList, String socketName,
            ZygoteServer zygoteServer) {
···        
        /* Hardcoded command line to start the system server */
        String[] args = {
                "--setuid=1000",
                "--setgid=1000",
                "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
                        + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010,3011",
                "--capabilities=" + capabilities + "," + capabilities,
                "--nice-name=system_server",
                "--runtime-args",
                "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
                "com.android.server.SystemServer",
        };
        ZygoteArguments parsedArgs;

        int pid;

        try {
            ZygoteCommandBuffer commandBuffer = new ZygoteCommandBuffer(args);
            try {
                parsedArgs = ZygoteArguments.getInstance(commandBuffer);
            } catch (EOFException e) {
                throw new AssertionError("Unexpected argument error for forking system server", e);
            }
···
            /* Request to fork the system server process */
            pid = Zygote.forkSystemServer(
                    parsedArgs.mUid, parsedArgs.mGid,
                    parsedArgs.mGids,
                    parsedArgs.mRuntimeFlags,
                    null,
                    parsedArgs.mPermittedCapabilities,
                    parsedArgs.mEffectiveCapabilities);
        } catch (IllegalArgumentException ex) {
            throw new RuntimeException(ex);
        }

        /* For child process */
        if (pid == 0) {
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket();
            return handleSystemServerProcess(parsedArgs);
        }

        return null;
    }

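The args array at the top holds the arguments used to start SystemServer.

In the try block, args is wrapped into parsedArgs, whose fields are then passed to Zygote.forkSystemServer. Internally this ends up calling fork to create a child of the current process: the SystemServer process.

A returned pid of 0 means the code is running in the newly created child, where handleSystemServerProcess takes over to set up the SystemServer process.

Step 4: waiting for AMS requests to create new application processes

After SystemServer has been started, runSelectLoop runs:

runSelectLoop in ZygoteServer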

Runnable runSelectLoop(String abiList) {
        ArrayList<FileDescriptor> fds = new ArrayList<FileDescriptor>();
        ArrayList<ZygoteConnection> peers = new ArrayList<ZygoteConnection>();

        fds.add(mServerSocket.getFileDescriptor());
        peers.add(null);

        while (true) {
            StructPollfd[] pollFds = new StructPollfd[fds.size()];
            for (int i = 0; i < pollFds.length; ++i) {
                pollFds[i] = new StructPollfd();
                pollFds[i].fd = fds.get(i);
                pollFds[i].events = (short) POLLIN;
            }
            try {
                Os.poll(pollFds, -1);
            } catch (ErrnoException ex) {
                throw new RuntimeException("poll failed", ex);
            }
            for (int i = pollFds.length - 1; i >= 0; --i) {
                if ((pollFds[i].revents & POLLIN) == 0) {
                    continue;
                }

                if (i == 0) {
                    ZygoteConnection newPeer = acceptCommandPeer(abiList);
                    peers.add(newPeer);
                    fds.add(newPeer.getFileDesciptor());
                } else {
                    try {
                        ZygoteConnection connection = peers.get(i);
                        final Runnable command = connection.processOneCommand(this);

                        if (mIsForkChild) {
                            // We're in the child. We should always have a command to run at this
                            // stage if processOneCommand hasn't called "exec".
                            if (command == null) {
                                throw new IllegalStateException("command == null");
                            }

                            return command;
                        } else {
                            // We're in the server - we should never have any commands to run.
                            if (command != null) {
                                throw new IllegalStateException("command != null");
                            }

                            // We don't know whether the remote side of the socket was closed or
                            // not until we attempt to read from it from processOneCommand. This shows up as
                            // a regular POLLIN event in our regular processing loop.
                            if (connection.isClosedByPeer()) {
                                connection.closeSocket();
                                peers.remove(i);
                                fds.remove(i);
                            }
                        }
                    } catch (Exception e) {
                        if (!mIsForkChild) {
                            // We're in the server so any exception here is one that has taken place
                            // pre-fork while processing commands or reading / writing from the
                            // control socket. Make a loud noise about any such exceptions so that
                            // we know exactly what failed and why.

                            Slog.e(TAG, "Exception executing zygote command: ", e);

                            // Make sure the socket is closed so that the other end knows immediately
                            // that something has gone wrong and doesn't time out waiting for a
                            // response.
                            ZygoteConnection conn = peers.remove(i);
                            conn.closeSocket();

                            fds.remove(i);
                        } else {
                            // We're in the child so any exception caught here has happened post
                            // fork and before we execute ActivityThread.main (or any other main()
                            // method). Log the details of the exception and bring down the process.
                            Log.e(TAG, "Caught post-fork exception in child process.", e);
                            throw e;
                        }
                    } finally {
                        // Reset the child flag, in the event that the child process is a child-
                        // zygote. The flag will not be consulted this loop pass after the Runnable
                        // is returned.
                        mIsForkChild = false;
                    }
                }
            }
        }
    }

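At the top, two lists are created and mServerSocket is added right away. This is the server-side Socket created in step 1 above. The method then enters a while loop and waits indefinitely for requests from AMS.

In the first for loop, the descriptors stored in fds are copied into the pollFds array.

In the second for loop, the pollFds array is traversed again, from the last index down. If i is 0, the server Socket has an incoming connection, meaning AMS has connected to the Zygote process.

In that i == 0 branch, acceptCommandPeer returns a ZygoteConnection, which is added to the peers connection list. Its fd is then added to fds so that requests sent by AMS can be received.

If i is not 0, an existing connection has data to read and processOneCommand handles the request. The code that follows covers the two outcomes, running in the forked child process or still running in the server, and the catch block handles exceptions separately for each case.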
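The index convention of this loop (slot 0 is the listening socket, every other slot is an accepted connection) can be sketched without real sockets. This is a simulation under stated assumptions: the Peer class, the boolean readiness flags, and handleReady are hypothetical stand-ins for ZygoteConnection, the revents field, and one pass of the loop body.

```java
import java.util.ArrayList;
import java.util.List;

final class SelectLoopSketch {
    /** Hypothetical stand-in for a client connection; not the real ZygoteConnection. */
    static final class Peer {
        final String name;
        boolean closedByPeer;
        Peer(String name) { this.name = name; }
    }

    // Index 0 is reserved for the listening socket, so peers.get(0) is null,
    // matching the peers.add(null) call in runSelectLoop.
    final List<Peer> peers = new ArrayList<>();
    final List<String> log = new ArrayList<>();
    private int accepted = 0;

    SelectLoopSketch() {
        peers.add(null);
    }

    /**
     * Handles one round of readiness flags. Iterating from the end means that
     * removing index i never shifts an entry that has not been visited yet.
     */
    void handleReady(boolean[] ready) {
        for (int i = ready.length - 1; i >= 0; --i) {
            if (!ready[i]) {
                continue;
            }
            if (i == 0) {
                // The listening socket is readable: accept a new connection.
                accepted++;
                peers.add(new Peer("peer-" + accepted));
                log.add("accepted peer-" + accepted);
            } else {
                // An existing connection is readable: process one request.
                Peer peer = peers.get(i);
                log.add("command from " + peer.name);
                if (peer.closedByPeer) {
                    peers.remove(i);
                }
            }
        }
    }
}
```

The flags array must not be longer than the peers list at the time of the call, just as pollFds is sized from fds at the start of each loop pass.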

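2.4 Zygote process startup summary

Zygote process startup does the following

Creates AppRuntime and calls its start method to start the Zygote process
Creates the Java virtual machine and registers JNI methods for it
Calls ZygoteInit's main function through JNI to enter Zygote's Java framework layer
Creates the server-side Socket through registerZygoteSocket and waits in runSelectLoop for AMS requests to create new application processes
Starts the SystemServer process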

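3. SystemServer processing

The SystemServer process is mainly responsible for creating system services. The familiar AMS, WMS, and PMS are all created by it.

3.1 How Zygote handles the SystemServer process

The previous section covered how Zygote starts the SystemServer process. Next we look at how Zygote handles that process, starting with the sequence diagram.

[Figure: sequence diagram of Zygote handling the SystemServer process]

As mentioned above, the SystemServer process is started in ZygoteInit's forkSystemServer method.

ZygoteInit's forkSystemServer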

private static Runnable forkSystemServer(String abiList, String socketName,
            ZygoteServer zygoteServer) {
···
    if (pid == 0) {
            if (hasSecondZygote(abiList)) {
                waitForSecondaryZygote(socketName);
            }

            zygoteServer.closeServerSocket();
            return handleSystemServerProcess(parsedArgs);
        }

        return null;
    }

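The SystemServer process is forked from Zygote and gets a copy of Zygote's address space, so it also inherits the Socket that Zygote created. SystemServer has no use for that Socket, so closeServerSocket is called to close it, and then handleSystemServerProcess is called.

ZygoteInit's handleSystemServerProcess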

private static Runnable handleSystemServerProcess(ZygoteConnection.Arguments parsedArgs) {
···
        if (parsedArgs.invokeWith != null) {
···
        } else {
            ClassLoader cl = null;
            if (systemServerClasspath != null) {
                cl = createPathClassLoader(systemServerClasspath, parsedArgs.targetSdkVersion);

                Thread.currentThread().setContextClassLoader(cl);
            }

            /*
             * Pass the remaining arguments to SystemServer.
             */
            return ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs, cl);
        }

        /* should never reach here */
    }

The else block calls createPathClassLoader to create a PathClassLoader and finally returns the result of zygoteInit:

ZygoteInit's zygoteInit

public static final Runnable zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader) {
        if (RuntimeInit.DEBUG) {
            Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
        }

        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
        RuntimeInit.redirectLogStreams();

        RuntimeInit.commonInit();
        ZygoteInit.nativeZygoteInit();
        return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
    }

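The statement just before the return calls nativeZygoteInit, which starts the Binder thread pool so that the SystemServer process can use Binder to communicate with other processes.

The method called in the return statement leads into SystemServer's main method.

Both parts are covered in turn below.

Starting the Binder thread pool

nativeZygoteInit is a native method, so we first look at its JNI file, AndroidRuntime.cpp:

AndroidRuntime's ZygoteInit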

int register_com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env)
{
    const JNINativeMethod methods[] = {
        { "nativeZygoteInit", "()V",
            (void*) com_android_internal_os_ZygoteInit_nativeZygoteInit },
    };
    return jniRegisterNativeMethods(env, "com/android/internal/os/ZygoteInit",
        methods, NELEM(methods));
}

The JNI methods array shows that nativeZygoteInit maps to the com_android_internal_os_ZygoteInit_nativeZygoteInit function in the JNI file AndroidRuntime.cpp:

AndroidRuntime's nativeZygoteInit

static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{
    gCurRuntime->onZygoteInit();
}

Here gCurRuntime is a pointer of type AndroidRuntime that actually points to AppRuntime, a subclass of AndroidRuntime defined in app_main.cpp. Let's look at AppRuntime's onZygoteInit method:

app_main's onZygoteInit

virtual void onZygoteInit()
    {
        sp<ProcessState> proc = ProcessState::self();
        ALOGV("App process: starting thread pool.\n");
        proc->startThreadPool();
    }

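startThreadPool starts a Binder thread pool, so the SystemServer process can use Binder to communicate with other processes.

So the main job of ZygoteInit's nativeZygoteInit function is to start the Binder thread pool.

Entering SystemServer's main method

Back in ZygoteInit's zygoteInit function, the last statement returns the result of RuntimeInit's applicationInit function:

RuntimeInit's applicationInit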

protected static Runnable applicationInit(int targetSdkVersion, long[] disabledCompatChanges,
            String[] argv, ClassLoader classLoader) {
        // If the application calls System.exit(), terminate the process
        // immediately without running any shutdown hooks.  It is not possible to
        // shutdown an Android application gracefully.  Among other things, the
        // Android runtime shutdown hooks close the Binder driver, which can cause
        // leftover running threads to crash before the process actually exits.
        nativeSetExitWithoutCleanup(true);

        VMRuntime.getRuntime().setTargetSdkVersion(targetSdkVersion);
        VMRuntime.getRuntime().setDisabledCompatChanges(disabledCompatChanges);

        final Arguments args = new Arguments(argv);

        // The end of of the RuntimeInit event (see #zygoteInit).
        Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);

        // Remaining arguments are passed to the start class's static main
        return findStaticMain(args.startClass, args.startArgs, classLoader);
    }

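The comment says, in short, that when the application calls System.exit() the process is terminated immediately without running shutdown hooks, because a graceful shutdown can make leftover threads crash. The method finally returns the result of findStaticMain, which we look at next.

RuntimeInit's findStaticMain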

protected static Runnable findStaticMain(String className, String[] argv,
            ClassLoader classLoader) {
        Class<?> cl;

        try {
            cl = Class.forName(className, true, classLoader);
        } catch (ClassNotFoundException ex) {
            throw new RuntimeException(
                    "Missing class when invoking static main " + className,
                    ex);
        }

        Method m;
        try {
            m = cl.getMethod("main", new Class[] { String[].class });
        } catch (NoSuchMethodException ex) {
            throw new RuntimeException(
                    "Missing static main on " + className, ex);
        } catch (SecurityException ex) {
            throw new RuntimeException(
                    "Problem getting static main on " + className, ex);
        }

        int modifiers = m.getModifiers();
        if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException(
                    "Main method is not public and static on " + className);
        }

        /*
         * This throw gets caught in ZygoteInit.main(), which responds
         * by invoking the exception's run() method. This arrangement
         * clears up all the stack frames that were required in setting
         * up the process.
         */
        return new MethodAndArgsCaller(m, argv);
    }

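In the first try statement, the className passed to Class.forName is com.android.server.SystemServer, so the returned cl is the SystemServer class.

In the second try statement, cl.getMethod finds the main method of SystemServer and assigns it to m.

Finally, the return statement constructs a MethodAndArgsCaller from m and argv.

RuntimeInit's MethodAndArgsCaller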

static class MethodAndArgsCaller implements Runnable {
        /** method to call */
        private final Method mMethod;

        /** argument array */
        private final String[] mArgs;

        public MethodAndArgsCaller(Method method, String[] args) {
            mMethod = method;
            mArgs = args;
        }

        public void run() {
            try {
                mMethod.invoke(null, new Object[] { mArgs });
            } catch (IllegalAccessException ex) {
                throw new RuntimeException(ex);
            } catch (InvocationTargetException ex) {
                Throwable cause = ex.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                } else if (cause instanceof Error) {
                    throw (Error) cause;
                }
                throw new RuntimeException(ex);
            }
        }
    }

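This is a static nested class. The mMethod set in the constructor is SystemServer's main method. When run calls invoke, SystemServer's main method is invoked through reflection, and the SystemServer process enters SystemServer's main method.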
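The lookup-then-defer pattern used by findStaticMain and MethodAndArgsCaller can be tried outside Android with standard Java reflection. The following is a minimal sketch, not the AOSP code: StaticMainSketch and its nested Target class are hypothetical, with Target standing in for com.android.server.SystemServer.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class StaticMainSketch {
    /** Hypothetical target class standing in for com.android.server.SystemServer. */
    public static final class Target {
        static String[] lastArgs;

        public static void main(String[] args) {
            lastArgs = args;
        }
    }

    /** Looks up a public static main(String[]) and wraps the call in a Runnable. */
    static Runnable findStaticMain(String className, String[] argv, ClassLoader loader) {
        final Method m;
        try {
            Class<?> cl = Class.forName(className, true, loader);
            m = cl.getMethod("main", String[].class);
        } catch (ClassNotFoundException | NoSuchMethodException ex) {
            throw new RuntimeException("Cannot resolve static main on " + className, ex);
        }
        int modifiers = m.getModifiers();
        if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException("Main method is not public and static on " + className);
        }
        return () -> {
            try {
                // A static method takes null as the receiver. The String[] is
                // wrapped in an Object[] so it is passed as a single argument.
                m.invoke(null, new Object[] { argv });
            } catch (IllegalAccessException | InvocationTargetException ex) {
                throw new RuntimeException(ex);
            }
        };
    }
}
```

Nothing runs until the caller invokes run() on the returned Runnable, which is what lets the setup frames be unwound before main starts.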

3.2 Inside the SystemServer process

Now let's look at SystemServer's main method

SystemServer's main

public static void main(String[] args) {
        new SystemServer().run();
    }

Follow the run method

SystemServer's run

private void run() {
        TimingsTraceAndSlog t = new TimingsTraceAndSlog();
        try {
···
            Looper.prepareMainLooper();
            Looper.getMainLooper().setSlowLogThresholdMs(
                    SLOW_DISPATCH_THRESHOLD_MS, SLOW_DELIVERY_THRESHOLD_MS);

            SystemServiceRegistry.sEnableServiceNotFoundWtf = true;

            // Initialize native services.
            System.loadLibrary("android_servers");

            // Allow heap / perf profiling.
            initZygoteChildHeapProfiling();
···
            // Create the system service manager.
            mSystemServiceManager = new SystemServiceManager(mSystemContext);
            mSystemServiceManager.setStartInfo(mRuntimeRestart,
                    mRuntimeStartElapsedTime, mRuntimeStartUptime);
            LocalServices.addService(SystemServiceManager.class, mSystemServiceManager);
            // Prepare the thread pool for init tasks that can be parallelized
            SystemServerInitThreadPool.start();
            // Attach JVMTI agent if this is a debuggable build and the system property is set.
···
        } finally {
            t.traceEnd();  // InitBeforeStartServices
        }

        // Setup the default WTF handler
        RuntimeInit.setDefaultApplicationWtfHandler(SystemServer::handleEarlySystemWtf);

        // Start services.
        try {
            t.traceBegin("StartServices");
            startBootstrapServices(t);
            startCoreServices(t);
            startOtherServices(t);
        } catch (Throwable ex) {
            Slog.e("System", "******************************************");
            Slog.e("System", "************ Failure starting system services", ex);
            throw ex;
        } finally {
            t.traceEnd(); // StartServices
        }
···
        // Loop forever.
        Looper.loop();
        throw new RuntimeException("Main thread loop unexpectedly exited");
    }

At the top, prepareMainLooper creates the Looper, and then loadLibrary loads the native library libandroid_servers.so.
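The role of the Looper, a thread that blocks on a queue and runs messages one by one, can be sketched with a BlockingQueue. MiniLooper is a hypothetical class, not android.os.Looper. It also has a quit method so the loop can end, whereas the run method above treats a return from Looper.loop() as an error.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class MiniLooper {
    // Sentinel message that tells loop() to return.
    private static final Runnable QUIT = new Runnable() {
        @Override
        public void run() { }
    };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    /** Enqueues a message; safe to call from any thread. */
    void post(Runnable message) {
        queue.add(message);
    }

    void quit() {
        queue.add(QUIT);
    }

    /** Blocks on the queue and runs messages in order until quit() is seen. */
    void loop() {
        while (true) {
            Runnable message;
            try {
                message = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (message == QUIT) {
                return;
            }
            message.run();
        }
    }
}
```

Messages posted before loop() is called are simply processed in order once the loop starts.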

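The SystemServiceManager created next creates and starts system services and manages their lifecycle.

In the second try statement, startBootstrapServices uses SystemServiceManager to start ActivityManagerService, PackageManagerService, PowerManagerService, and other services.

startCoreServices starts DropBoxManagerService, BatteryService, UsageStatsService, and WebViewUpdateService.

startOtherServices starts CameraService, AlarmManagerService, VrManagerService, and other services.

All of these services extend SystemService. As the method names suggest, system services are divided into three types: bootstrap services, core services, and other services.

The startup logic is similar for all of them. Taking PowerManagerService as an example:

PowerManagerService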

mPowerManagerService = mSystemServiceManager.startService(PowerManagerService.class);

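SystemServiceManager's startService method starts PowerManagerService. The call above passes a Class. That overload creates the service instance through reflection and then hands it to the overload below, which takes the instance: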

public void startService(@NonNull final SystemService service) {
        // Register it.
        mServices.add(service);
        // Start it.
        long time = SystemClock.elapsedRealtime();
        try {
            service.onStart();
        } catch (RuntimeException ex) {
            throw new RuntimeException("Failed to start service " + service.getClass().getName()
                    + ": onStart threw an exception", ex);
        }
        warnIfTooLong(SystemClock.elapsedRealtime() - time, service, "onStart");
    }

At the top, add puts PowerManagerService into mServices, which completes registration.

In the try statement, onStart is called to start PowerManagerService.
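The register-then-start sequence, together with a class-based overload that builds the instance reflectively, can be sketched as follows. ServiceManagerSketch, Service, and PowerService are hypothetical simplifications, and the sketch assumes a no-argument constructor rather than whatever constructor the real services use.

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceManagerSketch {
    /** Simplified stand-in for the SystemService base class. */
    abstract static class Service {
        abstract void onStart();
    }

    /** Hypothetical example service. */
    static final class PowerService extends Service {
        boolean started;

        PowerService() { }

        @Override
        void onStart() {
            started = true;
        }
    }

    final List<Service> services = new ArrayList<>();

    /** Creates the service reflectively, then delegates to the instance overload. */
    <T extends Service> T startService(Class<T> serviceClass) {
        final T service;
        try {
            // Assumption: a no-arg constructor is available.
            service = serviceClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new RuntimeException("Failed to create service " + serviceClass.getName(), ex);
        }
        startService(service);
        return service;
    }

    /** Registers the service first, then starts it. */
    void startService(Service service) {
        services.add(service);
        try {
            service.onStart();
        } catch (RuntimeException ex) {
            throw new RuntimeException("Failed to start service "
                    + service.getClass().getName() + ": onStart threw an exception", ex);
        }
    }
}
```

Because every service shares the same base class, the manager can keep them all in one list and drive their lifecycle uniformly.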

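3.3 SystemServer process summary

Once created, the SystemServer process mainly does three things

Starts the Binder thread pool so it can communicate with other processes
Creates SystemServiceManager, which creates, starts, and manages the lifecycle of system services
Starts the various system services

4. Launcher startup

This is the final step: the Launcher startup flow.

4.1 Launcher overview

The last step of system startup is to start an application that displays the applications installed on the system. That application is called Launcher. During startup, Launcher asks PackageManagerService for information about the installed applications and turns it into a list of shortcut icons on the screen, so the user can tap an icon to start the corresponding application.

Put simply, Launcher is the Android desktop. It has two main roles

As the Android launcher, it starts applications
As the Android desktop, it displays and manages application shortcut icons and other desktop components

4.2 Launcher startup walkthrough

While SystemServer starts up it starts PackageManagerService, which installs the applications on the system once it is running. AMS, which was started earlier, then starts Launcher. The sequence diagram is shown below.

[Figure: sequence diagram of the Launcher startup process]

The entry point for starting Launcher is AMS's systemReady method, which is called in SystemServer's startOtherServices method.

SystemServer's startOtherServices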

private void startOtherServices(@NonNull TimingsTraceAndSlog t) {
···
            mActivityManagerService.systemReady(() -> {
            Slog.i(TAG, "Making services ready");
            t.traceBegin("StartActivityManagerReadyPhase");
            mSystemServiceManager.startBootPhase(t, SystemService.PHASE_ACTIVITY_MANAGER_READY);
···

Next, let's see what AMS's systemReady method does

ActivityManagerService's systemReady

public void systemReady(final Runnable goingCallback, TimingsTraceLog traceLog) {
···
            mStackSupervisor.resumeFocusedStackTopActivityLocked();
            mUserController.sendUserSwitchBroadcasts(-1, currentUserId);
···
}

It calls ActivityStackSupervisor's resumeFocusedStackTopActivityLocked method

ActivityStackSupervisor's resumeFocusedStackTopActivityLocked

boolean resumeFocusedStackTopActivityLocked(
            ActivityStack targetStack, ActivityRecord target, ActivityOptions targetOptions) {
···
        if (r == null || !r.isState(RESUMED)) {
            mFocusedStack.resumeTopActivityUncheckedLocked(null, null);
        }
···
        return false;
    }

This calls ActivityStack's resumeTopActivityUncheckedLocked method. ActivityStack describes an Activity stack. The method is shown below:

ActivityStack's resumeTopActivityUncheckedLocked

boolean resumeTopActivityUncheckedLocked(ActivityRecord prev, ActivityOptions options) {
        if (mStackSupervisor.inResumeTopActivity) {
            // Don't even start recursing.
            return false;
        }

        boolean result = false;
        try {
            // Protect against recursion.
            mStackSupervisor.inResumeTopActivity = true;
            result = resumeTopActivityInnerLocked(prev, options);

            // When resuming the top activity, it may be necessary to pause the top activity (for
            // example, returning to the lock screen. We suppress the normal pause logic in
            // {@link #resumeTopActivityUncheckedLocked}, since the top activity is resumed at the
            // end. We call the {@link ActivityStackSupervisor#checkReadyForSleepLocked} again here
            // to ensure any necessary pause logic occurs. In the case where the Activity will be
            // shown regardless of the lock screen, the call to
            // {@link ActivityStackSupervisor#checkReadyForSleepLocked} is skipped.
            final ActivityRecord next = topRunningActivityLocked(true /* focusableOnly */);
            if (next == null || !next.canTurnScreenOn()) {
                checkReadyForSleep();
            }
        } finally {
            mStackSupervisor.inResumeTopActivity = false;
        }

        return result;
    }

In the try statement, the return value of resumeTopActivityInnerLocked is assigned to result

ActivityStack's resumeTopActivityInnerLocked

private boolean resumeTopActivityInnerLocked(ActivityRecord prev, ActivityOptions options) {
···
         return isOnHomeDisplay() &&
                mStackSupervisor.resumeHomeStackTask(prev, reason);
    }
···

This calls ActivityStackSupervisor's resumeHomeStackTask method

ActivityStackSupervisor's resumeHomeStackTask

boolean resumeHomeStackTask(ActivityRecord prev, String reason) {
···
        if (r != null && !r.finishing) {
            moveFocusableActivityStackToFrontLocked(r, myReason);
            return resumeFocusedStackTopActivityLocked(mHomeStack, prev, null);
        }
        return mService.startHomeActivityLocked(mCurrentUser, myReason);
    }

This calls AMS's startHomeActivityLocked method:

ActivityManagerService's startHomeActivityLocked

boolean startHomeActivityLocked(int userId, String reason) {
        if (mFactoryTest == FactoryTest.FACTORY_TEST_LOW_LEVEL
                && mTopAction == null) {
            // We are running in factory test mode, but unable to find
            // the factory test app, so just sit around displaying the
            // error message and don't try to start anything.
            return false;
        }
        Intent intent = getHomeIntent();
        ActivityInfo aInfo = resolveActivityInfo(intent, STOCK_PM_FLAGS, userId);
        if (aInfo != null) {
            intent.setComponent(new ComponentName(aInfo.applicationInfo.packageName, aInfo.name));
            // Don't do this if the home app is currently being
            // instrumented.
            aInfo = new ActivityInfo(aInfo);
            aInfo.applicationInfo = getAppInfoForUser(aInfo.applicationInfo, userId);
            ProcessRecord app = getProcessRecordLocked(aInfo.processName,
                    aInfo.applicationInfo.uid, true);
            if (app == null || app.instr == null) {
                intent.setFlags(intent.getFlags() | FLAG_ACTIVITY_NEW_TASK);
                final int resolvedUserId = UserHandle.getUserId(aInfo.applicationInfo.uid);
                // For ANR debugging to verify if the user activity is the one that actually
                // launched.
                final String myReason = reason + ":" + userId + ":" + resolvedUserId;
                mActivityStartController.startHomeActivity(intent, aInfo, myReason);
            }
        } else {
            Slog.wtf(TAG, "No home screen found for " + intent, new Throwable());
        }

        return true;
    }

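In the first if, mFactoryTest is the system run mode. There are three run modes: non-factory mode, low-level factory mode, and high-level factory mode. mTopAction is the Action of the first Activity component to be started and defaults to Intent.ACTION_MAIN. The condition therefore means: when mFactoryTest is low-level factory mode and mTopAction is null, return false.

The Intent below is obtained by calling getHomeIntent:

ActivityManagerService's getHomeIntent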

Intent getHomeIntent() {
        Intent intent = new Intent(mTopAction, mTopData != null ? Uri.parse(mTopData) : null);
        intent.setComponent(mTopComponent);
        intent.addFlags(Intent.FLAG_DEBUG_TRIAGED_MISSING);
        if (mFactoryTest != FactoryTest.FACTORY_TEST_LOW_LEVEL) {
            intent.addCategory(Intent.CATEGORY_HOME);
        }
        return intent;
    }

This creates the Intent from mTopAction and mTopData. mTopAction is Intent.ACTION_MAIN, and if the system is not in low-level factory mode, the Intent's Category is set to Intent.CATEGORY_HOME. The Intent is then returned.
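The rule that getHomeIntent applies can be modeled in plain Java without the Android SDK. This is only a sketch: SimpleIntent, RunMode, and targetsLauncher are hypothetical stand-ins, while the two string constants are the values behind Intent.ACTION_MAIN and Intent.CATEGORY_HOME.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class HomeIntentSketch {
    static final String ACTION_MAIN = "android.intent.action.MAIN";
    static final String CATEGORY_HOME = "android.intent.category.HOME";

    /** Stand-in for the three run modes described above. */
    enum RunMode { OFF, LOW_LEVEL, HIGH_LEVEL }

    /** Minimal stand-in for android.content.Intent: an action plus categories. */
    static final class SimpleIntent {
        final String action;
        final Set<String> categories = new LinkedHashSet<>();

        SimpleIntent(String action) {
            this.action = action;
        }
    }

    /** Adds the HOME category unless the system is in low-level factory mode. */
    static SimpleIntent getHomeIntent(String topAction, RunMode mode) {
        SimpleIntent intent = new SimpleIntent(topAction);
        if (mode != RunMode.LOW_LEVEL) {
            intent.categories.add(CATEGORY_HOME);
        }
        return intent;
    }

    /** True when the intent is the MAIN + HOME combination a launcher declares. */
    static boolean targetsLauncher(SimpleIntent intent) {
        return ACTION_MAIN.equals(intent.action)
                && intent.categories.contains(CATEGORY_HOME);
    }
}
```

In this model, only an intent built outside low-level factory mode carries the HOME category and can therefore resolve to a launcher.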

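Back in ActivityManagerService's startHomeActivityLocked, assume the system is not in low-level factory mode. The third if checks whether the application matching Action Intent.ACTION_MAIN and Category Intent.CATEGORY_HOME has already been started. If not, startHomeActivity is called to start it. That application is Launcher, which can be confirmed in Launcher's AndroidManifest file.

Launcher's AndroidManifest

The manifest declares the android.intent.category.HOME category, which makes this the home Activity.

Going back once more to ActivityManagerService's startHomeActivityLocked: if Launcher has not been started, startHomeActivity is called to start it:

ActivityStartController's startHomeActivity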

void startHomeActivity(Intent intent, ActivityInfo aInfo, String reason) {
        mSupervisor.moveHomeStackTaskToTop(reason);
        mLastHomeActivityStartResult = obtainStarter(intent, "startHomeActivity: " + reason)
                .setOutActivity(tmpOutRecord)
                .setCallingUid(0)
                .setActivityInfo(aInfo)
                .execute();
···
        }
    }

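This moves Launcher into HomeStack, a variable defined in ActivityStackSupervisor that stores Launcher. obtainStarter is then called to start Launcher. The rest of the process is similar to a normal Activity start and ends in Launcher's onCreate method, at which point Launcher has finished starting.

4.3 How Launcher displays application icons

Launcher does a lot of work after it starts. As the desktop it displays application icons, which are the entry points to applications and are relevant to app development, so it is worth continuing.

Start with Launcher's onCreate method

Launcher's onCreate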

protected void onCreate(Bundle savedInstanceState) {
···
        LauncherAppState app = LauncherAppState.getInstance(this);
        mOldConfig = new Configuration(getResources().getConfiguration());
        mModel = app.setLauncher(this);
        initDeviceProfile(app.getInvariantDeviceProfile());
···
        if (!mModel.startLoader(currentScreen)) {
            if (!internalStateHandled) {
                // If we are not binding synchronously, show a fade in animation when
                // the first page bind completes.
                mDragLayer.getAlphaProperty(ALPHA_INDEX_LAUNCHER_LOAD).setValue(0);
            }
        }
···
}

At the top, it gets the LauncherAppState instance and calls setLauncher, assigning the result to mModel. The setLauncher method is shown below:

LauncherAppState's setLauncher

LauncherModel setLauncher(Launcher launcher) {
        getLocalProvider(mContext).setLauncherProviderChangeListener(launcher);
        mModel.initialize(launcher);
        return mModel;
    }

mModel then calls its initialize method:

LauncherModel's initialize

public void initialize(Callbacks callbacks) {
        synchronized (mLock) {
            Preconditions.assertUIThread();
            mCallbacks = new WeakReference<>(callbacks);
        }
    }

Here the Callbacks argument, which is the Launcher passed in, is wrapped in a weak reference.
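Holding the callback through a WeakReference means the model does not keep the Activity alive, and every use has to check whether the referent is still there. A minimal sketch, with WeakCallbackSketch, its Callbacks interface, and notifyBound as hypothetical names:

```java
import java.lang.ref.WeakReference;

final class WeakCallbackSketch {
    /** Hypothetical callback interface standing in for LauncherModel.Callbacks. */
    interface Callbacks {
        void onBound(String what);
    }

    private final Object lock = new Object();
    private WeakReference<Callbacks> callbacks;

    /** Stores the callback weakly, so this object alone does not keep it alive. */
    void initialize(Callbacks cb) {
        synchronized (lock) {
            callbacks = new WeakReference<>(cb);
        }
    }

    /** Delivers the event only if a callback was set and is still reachable. */
    boolean notifyBound(String what) {
        Callbacks cb;
        synchronized (lock) {
            cb = (callbacks != null) ? callbacks.get() : null;
        }
        if (cb == null) {
            return false;
        }
        cb.onBound(what);
        return true;
    }
}
```

The double check in notifyBound has the same shape as the mCallbacks != null && mCallbacks.get() != null test that startLoader performs.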

Back in Launcher's onCreate method, the if statement near the end calls startLoader:

LauncherModel's startLoader

public boolean startLoader(int synchronousBindPage) {
        // Enable queue before starting loader. It will get disabled in Launcher#finishBindingItems
        InstallShortcutReceiver.enableInstallQueue(InstallShortcutReceiver.FLAG_LOADER_RUNNING);
        synchronized (mLock) {
            // Don't bother to start the thread if we know it's not going to do anything
            if (mCallbacks != null && mCallbacks.get() != null) {
                final Callbacks oldCallbacks = mCallbacks.get();
                // Clear any pending bind-runnables from the synchronized load process.
                mUiExecutor.execute(oldCallbacks::clearPendingBinds);

                // If there is already one running, tell it to stop.
                stopLoader();
                LoaderResults loaderResults = new LoaderResults(mApp, sBgDataModel,
                        mBgAllAppsList, synchronousBindPage, mCallbacks);
                if (mModelLoaded && !mIsLoaderTaskRunning) {
                    // Divide the set of loaded items into those that we are binding synchronously,
                    // and everything else that is to be bound normally (asynchronously).
                    loaderResults.bindWorkspace();
                    // For now, continue posting the binding of AllApps as there are other
                    // issues that arise from that.
                    loaderResults.bindAllApps();
                    loaderResults.bindDeepShortcuts();
                    loaderResults.bindWidgets();
                    return true;
                } else {
                    startLoaderForResults(loaderResults);
                }
            }
        }
        return false;
    }

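At the top, the install shortcut queue is enabled. The rest runs inside a synchronized block, where execute is called to clear any pending bind Runnables left over from the synchronous load process.

The second if decides, based on whether data has already been loaded, between binding the UI directly and loading the data first. If data has been loaded, four important methods are called, bindWorkspace, bindAllApps, bindDeepShortcuts, and bindWidgets, which bind the Workspace (the main screen shown once the desktop starts), AllApps (all installed apps), DeepShortcuts (the popup shown when an icon is long-pressed), and Widgets (small components such as weather and temperature). Let's see what startLoaderForResults does in the final else branch, which handles the case where no data has been loaded.

LauncherModel's startLoaderForResults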

public void startLoaderForResults(LoaderResults results) {
        synchronized (mLock) {
            stopLoader();
            mLoaderTask = new LoaderTask(mApp, mBgAllAppsList, sBgDataModel, results);
            runOnWorkerThread(mLoaderTask);
        }
    }

LoaderTask is a Runnable. Let's look at its run method

LoaderTask's run

public void run() {
···
        TraceHelper.beginSection(TAG);
        try (LauncherModel.LoaderTransaction transaction = mApp.getModel().beginLoader(this)) {
            TraceHelper.partitionSection(TAG, "step 1.1: loading workspace");
            loadWorkspace();

            verifyNotStopped();
            TraceHelper.partitionSection(TAG, "step 1.2: bind workspace workspace");
            mResults.bindWorkspace();

            // Notify the installer packages of packages with active installs on the first screen.
            TraceHelper.partitionSection(TAG, "step 1.3: send first screen broadcast");
            sendFirstScreenActiveInstallsBroadcast();

            // Take a break
            TraceHelper.partitionSection(TAG, "step 1 completed, wait for idle");
            waitForIdle();
            verifyNotStopped();

            // second step
            TraceHelper.partitionSection(TAG, "step 2.1: loading all apps");
            loadAllApps();

            TraceHelper.partitionSection(TAG, "step 2.2: Binding all apps");
            verifyNotStopped();
            mResults.bindAllApps();

            verifyNotStopped();
            TraceHelper.partitionSection(TAG, "step 2.3: Update icon cache");
            updateIconCache();

            // Take a break
            TraceHelper.partitionSection(TAG, "step 2 completed, wait for idle");
            waitForIdle();
            verifyNotStopped();

            // third step
            TraceHelper.partitionSection(TAG, "step 3.1: loading deep shortcuts");
            loadDeepShortcuts();

            verifyNotStopped();
            TraceHelper.partitionSection(TAG, "step 3.2: bind deep shortcuts");
            mResults.bindDeepShortcuts();

            // Take a break
            TraceHelper.partitionSection(TAG, "step 3 completed, wait for idle");
            waitForIdle();
            verifyNotStopped();

            // fourth step
            TraceHelper.partitionSection(TAG, "step 4.1: loading widgets");
            mBgDataModel.widgetsModel.update(mApp, null);

            verifyNotStopped();
            TraceHelper.partitionSection(TAG, "step 4.2: Binding widgets");
            mResults.bindWidgets();

            transaction.commit();
        } catch (CancellationException e) {
            // Loader stopped, ignore
            TraceHelper.partitionSection(TAG, "Cancelled");
        }
        TraceHelper.endSection(TAG);
    }

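Each partitionSection call labels its step, so the steps are not repeated here. Note that waitForIdle and verifyNotStopped appear between almost every pair of steps.

waitForIdle waits for the UI thread to finish the previous step, so the data for the next step is not prepared until the previous step's UI has been shown.

verifyNotStopped checks whether the load has been cancelled, which means the load can be cancelled partway through.

Finally, commit is called on the transaction, and endSection ends the trace section.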
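The combination of a cancellation check between steps and a transaction that only counts if commit is reached can be sketched as follows. LoaderTaskSketch, its Transaction class, and the stopAfterSteps knob are hypothetical, and the sketch assumes that closing the transaction clears a running flag while only commit marks the model as loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;

final class LoaderTaskSketch implements Runnable {
    /** Commit-or-discard marker, loosely modeled on LoaderTransaction. */
    final class Transaction implements AutoCloseable {
        void commit() {
            modelLoaded = true;
        }

        @Override
        public void close() {
            // Runs on both the normal and the cancelled path.
            running = false;
        }
    }

    final List<String> completed = new ArrayList<>();
    private final int stopAfterSteps;  // hypothetical knob that simulates a cancel
    boolean modelLoaded;
    boolean running;
    private boolean stopped;

    LoaderTaskSketch(int stopAfterSteps) {
        this.stopAfterSteps = stopAfterSteps;
    }

    private void step(String name) {
        completed.add(name);
        if (completed.size() == stopAfterSteps) {
            stopped = true;
        }
    }

    private void verifyNotStopped() {
        if (stopped) {
            throw new CancellationException("Loader stopped");
        }
    }

    @Override
    public void run() {
        running = true;
        try (Transaction transaction = new Transaction()) {
            step("load workspace");
            verifyNotStopped();
            step("bind workspace");
            verifyNotStopped();
            step("load all apps");
            verifyNotStopped();
            step("bind all apps");
            transaction.commit();
        } catch (CancellationException e) {
            // Loader stopped: nothing was committed.
        }
    }
}
```

Passing 0 as stopAfterSteps lets all four steps run and commit, while passing 2 cancels after the second step and leaves modelLoaded false.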

Since the topic is how app icons are displayed, let's go into LoaderTask's loadAllApps method

LoaderTask's loadAllApps

private void loadAllApps() {
        final List<UserHandle> profiles = mUserManager.getUserProfiles();

        // Clear the list of apps
        mBgAllAppsList.clear();
        for (UserHandle user : profiles) {
            // Query for the set of apps
            final List<LauncherActivityInfo> apps = mLauncherApps.getActivityList(null, user);
            // Fail if we don't have any apps
            // TODO: Fix this. Only fail for the current user.
            if (apps == null || apps.isEmpty()) {
                return;
            }
            boolean quietMode = mUserManager.isQuietModeEnabled(user);
            // Create the ApplicationInfos
            for (int i = 0; i < apps.size(); i++) {
                LauncherActivityInfo app = apps.get(i);
                // This builds the icon bitmaps.
                mBgAllAppsList.add(new AppInfo(app, user, quietMode), app);
            }
        }

        if (FeatureFlags.LAUNCHER3_PROMISE_APPS_IN_ALL_APPS) {
            // get all active sessions and add them to the all apps list
            for (PackageInstaller.SessionInfo info :
                    mPackageInstaller.getAllVerifiedSessions()) {
                mBgAllAppsList.addPromiseApp(mApp.getContext(),
                        PackageInstallerCompat.PackageInstallInfo.fromInstallingState(info));
            }
        }

        mBgAllAppsList.added = new ArrayList<>();
    }

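The first for loop queries the list of apps for each user profile and returns early if a profile has no apps. It then adds an AppInfo for each app to the list, building the icon bitmaps along the way.

The last if gets all active install sessions and adds them to the list.

Back in LauncherModel's startLoader, let's see what bindAllApps does

LoaderResults's bindAllApps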

public void bindAllApps() {
        // shallow copy
        @SuppressWarnings("unchecked")
        final ArrayList<AppInfo> list = (ArrayList<AppInfo>) mBgAllAppsList.data.clone();

        Runnable r = new Runnable() {
            public void run() {
                Callbacks callbacks = mCallbacks.get();
                if (callbacks != null) {
                    callbacks.bindAllApplications(list);
                }
            }
        };
        mUiExecutor.execute(r);
    }

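In run, if callbacks is not null, bindAllApplications is called to bind the list. The list argument is the ArrayList created at the top. Its element type is AppInfo, matching the list filled in LoaderTask's loadAllApps. As noted earlier, callbacks refers to Launcher, so we look at Launcher's bindAllApplications method.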
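The snapshot-then-post pattern in bindAllApps can be shown with a plain Executor. In this sketch BindAllAppsSketch and its Callbacks interface are hypothetical, strings stand in for AppInfo objects, and the UI executor is whatever Executor the caller supplies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

final class BindAllAppsSketch {
    /** Hypothetical callback interface; strings stand in for AppInfo objects. */
    interface Callbacks {
        void bindAllApplications(List<String> apps);
    }

    /** List that a background loader keeps mutating. */
    final ArrayList<String> backgroundList = new ArrayList<>();

    /** Snapshots the background list, then hands the snapshot to the UI executor. */
    void bindAllApps(Executor uiExecutor, Callbacks callbacks) {
        // Shallow copy: later changes to backgroundList do not affect the snapshot.
        final List<String> snapshot = new ArrayList<>(backgroundList);
        uiExecutor.execute(() -> {
            if (callbacks != null) {
                callbacks.bindAllApplications(snapshot);
            }
        });
    }
}
```

Taking the copy before posting means the UI side sees a stable list even if the background list changes before the Runnable runs.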

Launcher's bindAllApplications

public void bindAllApplications(ArrayList<AppInfo> apps) {
        mAppsView.getAppsStore().setApps(apps);

        if (mLauncherCallbacks != null) {
            mLauncherCallbacks.bindAllApplications(apps);
        }
    }

mAppsView is an AllAppsContainerView object. It calls getAppsStore, so let's see what that method does

AllAppsContainerView's getAppsStore

public AllAppsStore getAppsStore() {
        return mAllAppsStore;
    }

It returns mAllAppsStore, an instance of the AllAppsStore class, so the setApps method called above lives in that class. We follow it there.

setApps in AllAppsStore

public void setApps(List<AppInfo> apps) {
        mComponentToAppMap.clear();
        addOrUpdateApps(apps);
    }

Next, step into the addOrUpdateApps method.

addOrUpdateApps in AllAppsStore

public void addOrUpdateApps(List<AppInfo> apps) {
        for (AppInfo app : apps) {
            mComponentToAppMap.put(app.toComponentKey(), app);
        }
        notifyUpdate();
    }

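In the end, addOrUpdateApps puts each AppInfo from the list into a Map keyed by its component key, and then calls notifyUpdate.

Back in the AllAppsContainerView class, there is also an adapter holder (the inner class AdapterHolder), which has a setup method. Let's look at what it contains.

setup in AllAppsContainerView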
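The store pattern used by setApps and addOrUpdateApps can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the AllAppsStore source: the App record, the String key (standing in for app.toComponentKey()), the UpdateListener type, and the getApp and size helpers are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AppStoreSketch {
    // Hypothetical, simplified app record; componentKey stands in for app.toComponentKey().
    static final class App {
        final String componentKey;
        final String title;

        App(String componentKey, String title) {
            this.componentKey = componentKey;
            this.title = title;
        }
    }

    // Hypothetical listener type used by this sketch.
    interface UpdateListener {
        void onAppsUpdated();
    }

    private final Map<String, App> componentToAppMap = new HashMap<>();
    private final List<UpdateListener> listeners = new ArrayList<>();

    void addUpdateListener(UpdateListener listener) {
        listeners.add(listener);
    }

    // Replace the whole contents: clear first, then reuse the add-or-update path.
    void setApps(List<App> apps) {
        componentToAppMap.clear();
        addOrUpdateApps(apps);
    }

    // put() inserts a new key or overwrites the entry for an existing key.
    void addOrUpdateApps(List<App> apps) {
        for (App app : apps) {
            componentToAppMap.put(app.componentKey, app);
        }
        notifyUpdate();
    }

    App getApp(String componentKey) {
        return componentToAppMap.get(componentKey);
    }

    int size() {
        return componentToAppMap.size();
    }

    private void notifyUpdate() {
        for (UpdateListener listener : listeners) {
            listener.onAppsUpdated();
        }
    }

    public static void main(String[] args) {
        AppStoreSketch store = new AppStoreSketch();
        store.addUpdateListener(() -> System.out.println("apps updated, size=" + store.size()));
        store.setApps(List.of(new App("pkg.a/.Main", "A"), new App("pkg.b/.Main", "B")));
        store.addOrUpdateApps(List.of(new App("pkg.a/.Main", "A (renamed)")));
        System.out.println(store.getApp("pkg.a/.Main").title);
    }
}
```

Keying the map by component means a repeated add for the same component overwrites the old entry instead of creating a duplicate, which is why one method can serve both "add" and "update".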

void setup(@NonNull View rv, @Nullable ItemInfoMatcher matcher) {
            appsList.updateItemFilter(matcher);
            recyclerView = (AllAppsRecyclerView) rv;
            recyclerView.setEdgeEffectFactory(createEdgeEffectFactory());
            recyclerView.setApps(appsList, mUsingTabs);
            recyclerView.setLayoutManager(layoutManager);
            recyclerView.setAdapter(adapter);
            recyclerView.setHasFixedSize(true);
            // No animations will occur when changes occur to the items in this RecyclerView.
            recyclerView.setItemAnimator(null);
            FocusedItemDecorator focusedItemDecorator = new FocusedItemDecorator(recyclerView);
            recyclerView.addItemDecoration(focusedItemDecorator);
            adapter.setIconFocusListener(focusedItemDecorator.getFocusListener());
            applyVerticalFadingEdgeEnabled(verticalFadingEdge);
            applyPadding();
        }

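As this shows, the all-apps screen is itself a RecyclerView that displays the app list. Once setApps is called, the list of application shortcut icons appears on screen.

This concludes the Launcher startup flow.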

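5. Android System Boot Flow

Combining the four sections above, the flow can be summarized as follows.

Power-on and system startup

When the power button is pressed, the boot ROM code starts executing from a predefined address. It loads the BootLoader into RAM and then runs it.

BootLoader

A small program that runs before the Android operating system; its main job is to bring up the OS and start it running.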

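Linux kernel startup

At this stage the kernel sets up caches, protected memory, and scheduling lists, and loads drivers. Once the kernel has finished its system setup, it looks for the init executable in the system files and starts the init process, which in turn parses init.rc.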

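init process startup

init mainly initializes and starts the property service, and it also starts the Zygote process.

Zygote process startup

Zygote creates the Java virtual machine and registers JNI methods for it, creates the server-side Socket, and starts the SystemServer process.

SystemServer process startup

SystemServer starts the Binder thread pool and SystemServiceManager, and then starts the various system services.

Launcher startup

AMS, which is started by the SystemServer process, launches Launcher. Once started, Launcher displays the shortcut icons of the installed apps on the home screen.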

Flow chart (image not included)

Corrections are welcome.
