App Launch Process - How dyld Loads Dynamic Libraries

Introduction

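On macOS and iOS, launching an executable depends on the XNU kernel and on dyld, the dynamic linker/loader. To see how long dyld takes, set the environment variable DYLD_PRINT_STATISTICS (for example to 1) in the Xcode scheme during development; dyld then prints its timings when the app runs under the debugger. Sample output: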

Total pre-main time: 847.45 milliseconds (100.0%)
         dylib loading time:  82.91 milliseconds (9.7%)
        rebase/binding time: 600.04 milliseconds (70.8%)
            ObjC setup time:  68.04 milliseconds (8.0%)
           initializer time:  96.24 milliseconds (11.3%)
           slowest intializers :
             libSystem.B.dylib :  10.75 milliseconds (1.2%)
    libMainThreadChecker.dylib :  17.94 milliseconds (2.1%)
           MeridianLBSTestDemo : 109.76 milliseconds (12.9%)

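The log shows what happens before main() runs and how the time is distributed across those phases. dyld is responsible for all of them: it loads the app's executable and the dynamic libraries it depends on into memory and then runs them (a dynamic library is not an executable and cannot run on its own). When the user taps the app, the kernel performs the necessary setup and reads the path of dyld from the app's Mach-O file, where it is recorded in the LC_LOAD_DYLINKER load command. For example: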

          cmd LC_LOAD_DYLINKER
      cmdsize 28
         name /usr/lib/dyld (offset 12)
Load command 8
     cmd LC_UUID
 cmdsize 24
    uuid DF0F9B2D-A4D7-37D0-BC6B-DB0297766CE8
Load command 9
      cmd LC_VERSION_MIN_IPHONEOS

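dyld itself is found at /usr/lib/dyld. Once the kernel has parsed and loaded it, dyld starts executing in user space inside the new process (not covered in depth here). dyld is open source; this article refers to the official 551 release. The sections below analyze and summarize the path from dyld loading library files to the app's main() being called, to give a fuller picture of how an app launches.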
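To make the LC_LOAD_DYLINKER lookup concrete, here is a minimal sketch of walking load commands to find the dynamic linker path. It is not the kernel's code: the structs are simplified copies of the ones in `<mach-o/loader.h>` so that it compiles on any platform, `findDylinkerPath` and `makeFakeImage` are hypothetical helper names, and the "image" is a synthetic buffer in host byte order rather than a real Mach-O file.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr uint32_t kMagic64      = 0xfeedfacf;  // MH_MAGIC_64
constexpr uint32_t kLoadDylinker = 0xe;         // LC_LOAD_DYLINKER

// Same field layout as mach_header_64 and load_command.
struct Header64 { uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved; };
struct LoadCommand { uint32_t cmd, cmdsize; };

// Walk the load commands after the header and return the dynamic linker path,
// or an empty string if the image is malformed or has no LC_LOAD_DYLINKER.
std::string findDylinkerPath(const std::vector<uint8_t>& image) {
    Header64 mh;
    if (image.size() < sizeof(mh)) return "";
    std::memcpy(&mh, image.data(), sizeof(mh));
    if (mh.magic != kMagic64) return "";

    std::size_t offset = sizeof(mh);
    for (uint32_t i = 0; i < mh.ncmds; ++i) {
        LoadCommand lc;
        if (offset + sizeof(lc) > image.size()) return "";
        std::memcpy(&lc, image.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || offset + lc.cmdsize > image.size()) return "";
        if (lc.cmd == kLoadDylinker) {
            // dylinker_command: cmd, cmdsize, then a 32-bit offset to the name.
            if (lc.cmdsize < 12) return "";
            uint32_t nameOffset = 0;
            std::memcpy(&nameOffset, image.data() + offset + 8, sizeof(nameOffset));
            if (nameOffset < 12 || nameOffset >= lc.cmdsize) return "";
            const uint8_t* begin = image.data() + offset + nameOffset;
            const uint8_t* end   = image.data() + offset + lc.cmdsize;
            return std::string(begin, std::find(begin, end, uint8_t{0}));
        }
        offset += lc.cmdsize;  // every command records its own size
    }
    return "";
}

// Hypothetical helper: a fake image with a header and one LC_LOAD_DYLINKER.
std::vector<uint8_t> makeFakeImage(const std::string& path) {
    const uint32_t cmdsize =
        static_cast<uint32_t>((12 + path.size() + 1 + 7) & ~std::size_t(7));
    Header64 mh{kMagic64, 0, 0, 2 /* MH_EXECUTE */, 1, cmdsize, 0, 0};
    std::vector<uint8_t> image(sizeof(mh) + cmdsize, 0);
    std::memcpy(image.data(), &mh, sizeof(mh));
    const uint32_t words[3] = {kLoadDylinker, cmdsize, 12};
    std::memcpy(image.data() + sizeof(mh), words, sizeof(words));
    std::memcpy(image.data() + sizeof(mh) + 12, path.data(), path.size());
    return image;
}
```

Calling `findDylinkerPath(makeFakeImage("/usr/lib/dyld"))` recovers the path stored in the command. The same cmd/cmdsize stepping is how any load command is located.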

__dyld_start

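Before any dynamic library is loaded, the kernel loads dyld and transfers control to __dyld_start, which is written in assembly. That routine calls dyldbootstrap::start(), which in turn calls _main(); dyld's library-loading code begins in _main(). The excerpt below comes from dyldStartup.s (using x86_64 as the reference architecture) and shows __dyld_start and the call to dyldbootstrap::start.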


#if __x86_64__
#if !TARGET_IPHONE_SIMULATOR
    .data
    .align 3
__dyld_start_static: 
    .quad   __dyld_start
#endif


#if !TARGET_IPHONE_SIMULATOR
    .text
    .align 2,0x90
    .globl __dyld_start
__dyld_start:
    popq    %rdi        # param1 = mh of app
    pushq   $0      # push a zero for debugger end of frames marker
    movq    %rsp,%rbp   # pointer to base of kernel frame
    andq    $-16,%rsp       # force SSE alignment
    subq    $16,%rsp    # room for local variables
    
    # call dyldbootstrap::start(app_mh, argc, argv, slide, dyld_mh, &startGlue)
    movl    8(%rbp),%esi    # param2 = argc into %esi
    leaq    16(%rbp),%rdx   # param3 = &argv[0] into %rdx
    movq    __dyld_start_static(%rip), %r8
    leaq    __dyld_start(%rip), %rcx
    subq     %r8, %rcx  # param4 = slide into %rcx
    leaq    ___dso_handle(%rip),%r8 # param5 = dyldsMachHeader
    leaq    -8(%rbp),%r9
    call    __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
    movq    -8(%rbp),%rdi
    cmpq    $0,%rdi
    jne Lnew

The core of start is the call to dyld::_main, which is where the app's entry point is determined. _main is analyzed in detail below.


//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)
{
    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    if ( slide != 0 ) {
        rebaseDyld(dyldsMachHeader, slide);
    }

    // allow dyld to use mach messaging
    mach_init();
  
    // other setup
    // ...
    // now that we are done bootstrapping dyld, call dyld's main
    // ASLR slide of the main executable
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

dyld::_main

The walkthrough of _main below drills down only into the parts of interest; the rest is not analyzed.


// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)
{
    dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_DYLD, 0, 0);

    // Grab the cdHash of the main executable from the environment
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
        mainExecutableCDHash = mainExecutableCDHashBuffer;

    // Trace dyld's load
    notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
    // Trace the main executable's load
    notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif

    uintptr_t result = 0;
    sMainExecutableMachHeader = mainExecutableMH;
    sMainExecutableSlide = mainExecutableSlide;
#if __MAC_OS_X_VERSION_MIN_REQUIRED
    // if this is host dyld, check to see if iOS simulator is being run
    const char* rootPath = _simple_getenv(envp, "DYLD_ROOT_PATH");
    if ( rootPath != NULL ) {

        // look to see if simulator has its own dyld
        char simDyldPath[PATH_MAX]; 
        strlcpy(simDyldPath, rootPath, PATH_MAX);
        strlcat(simDyldPath, "/usr/lib/dyld_sim", PATH_MAX);
        int fd = my_open(simDyldPath, O_RDONLY, 0);
        if ( fd != -1 ) {
            const char* errMessage = useSimulatorDyld(fd, mainExecutableMH, simDyldPath, argc, argv, envp, apple, startGlue, &result);
            if ( errMessage != NULL )
                halt(errMessage);
            return result;
        }
    }
#endif

    CRSetCrashLogMessage("dyld: launch started");

    setContext(mainExecutableMH, argc, argv, envp, apple);

    // Pickup the pointer to the exec path.
    sExecPath = _simple_getenv(apple, "executable_path");

    //  Remove interim apple[0] transition code from dyld
    if (!sExecPath) sExecPath = apple[0];
    
    if ( sExecPath[0] != '/' ) {
        // have relative path, use cwd to make absolute
        char cwdbuff[MAXPATHLEN];
        if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
            // maybe use static buffer to avoid calling malloc so early...
            char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
            strcpy(s, cwdbuff);
            strcat(s, "/");
            strcat(s, sExecPath);
            sExecPath = s;
        }
    }

    // Remember short name of process for later logging
    sExecShortName = ::strrchr(sExecPath, '/');
    if ( sExecShortName != NULL )
        ++sExecShortName;
    else
        sExecShortName = sExecPath;

    configureProcessRestrictions(mainExecutableMH);

#if __MAC_OS_X_VERSION_MIN_REQUIRED
    if ( gLinkContext.processIsRestricted ) {
        pruneEnvironmentVariables(envp, &apple);
        // set again because envp and apple may have changed or moved
        setContext(mainExecutableMH, argc, argv, envp, apple);
    }
    else
#endif
    {
        checkEnvironmentVariables(envp);
        defaultUninitializedFallbackPaths(envp);
    }
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    if ( sEnv.DYLD_PRINT_ENV ) 
        printEnvironmentVariables(envp);
    getHostInfo(mainExecutableMH, mainExecutableSlide);
  // ...todo

Step 1: Set up the runtime environment and configure environment variables

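_main begins by reporting the UUIDs of dyld and of the main app to the kernel. It then saves the mainExecutableMH and mainExecutableSlide arguments in sMainExecutableMachHeader and sMainExecutableSlide. The first is a pointer to a macho_header structure and the second is a uintptr_t; they hold the app's Mach-O header and its ASLR slide. With the header in hand, the loader can walk the whole Mach-O file from the start. Next comes setContext(), which fills in a global link context with callbacks, parameters, and flags. The context structure instance and the callbacks are all dyld's own implementations. Excerpt: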


    // setContext ...
    gLinkContext.loadLibrary            = &libraryLocator;
    gLinkContext.terminationRecorder    = &terminationRecorder;
    gLinkContext.flatExportFinder       = &flatFindExportedSymbol;
    gLinkContext.coalescedExportFinder  = &findCoalescedExportedSymbol;
    gLinkContext.getCoalescedImages     = &getCoalescedImages;
  //...

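Next, configureProcessRestrictions is called. Depending on whether the current process is restricted, it adjusts the link context and other environment settings again. The kinds of restriction and the reasons for handling each are not covered here; interested readers can look them up. _main consults many environment variables that start with 'DYLD_'. The variables configured under Edit Scheme > Arguments in Xcode are stored in an instance of the EnvironmentVariables struct.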
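As an illustration of how 'DYLD_' variables end up in a struct, the following sketch scans an envp array and records a few of them. `EnvFlags` and `collectDyldEnv` are hypothetical names, far smaller than dyld's real EnvironmentVariables handling, and the sketch assumes that the mere presence of a print flag enables it.

```cpp
#include <cstring>
#include <string>

// Hypothetical, much smaller stand-in for dyld's EnvironmentVariables struct.
struct EnvFlags {
    bool printStatistics = false;  // DYLD_PRINT_STATISTICS
    bool printOpts       = false;  // DYLD_PRINT_OPTS
    bool printEnv        = false;  // DYLD_PRINT_ENV
    std::string insertLibraries;   // DYLD_INSERT_LIBRARIES (colon-separated)
};

// Scan a NULL-terminated envp array and record the DYLD_ variables we know.
EnvFlags collectDyldEnv(const char* const envp[]) {
    EnvFlags flags;
    for (const char* const* p = envp; *p != nullptr; ++p) {
        const char* entry = *p;
        if (std::strncmp(entry, "DYLD_", 5) != 0) continue;  // not a dyld variable
        const char* eq = std::strchr(entry, '=');
        if (eq == nullptr) continue;                         // malformed entry
        const std::string key(entry, static_cast<std::size_t>(eq - entry));
        const std::string value(eq + 1);
        if (key == "DYLD_PRINT_STATISTICS")      flags.printStatistics = true;
        else if (key == "DYLD_PRINT_OPTS")       flags.printOpts = true;
        else if (key == "DYLD_PRINT_ENV")        flags.printEnv = true;
        else if (key == "DYLD_INSERT_LIBRARIES") flags.insertLibraries = value;
    }
    return flags;
}
```

For a restricted process, the real dyld prunes these variables before this kind of scan would matter, which is why configureProcessRestrictions runs first.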

Step 2: Load the shared cache

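Version 551 corresponds to dyld3, and the difference from older dyld versions is visible in _main. The old _main initialized the main app after step 1 and then loaded the shared cache. dyld3 changed that order. Loading the cache can be split into mapSharedCache and checkVersionedPaths: dyld3 runs mapSharedCache first, then loads the main app, and runs checkVersionedPaths last. The reason for this change needs further investigation. (Apple announced dyld3 in 2017; see the WWDC session video.)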

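About the shared cache: to speed up program launch, dyld uses a shared cache. dyld maps the cache into memory when the process starts. After that, whenever a Mach-O image is loaded, dyld first checks whether that image and the dynamic libraries it needs are in the shared cache. If they are, dyld maps their addresses in the shared region directly into the process's address space. mapSharedCache looks like this: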


static void mapSharedCache()
{
    dyld3::SharedCacheOptions opts;
    opts.cacheDirOverride   = sSharedCacheOverrideDir;
    opts.forcePrivate       = (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
#if __x86_64__ && !TARGET_IPHONE_SIMULATOR
    opts.useHaswell         = sHaswell;
#else
    opts.useHaswell         = false;
#endif
    opts.verbose            = gLinkContext.verboseMapping;
    loadDyldCache(opts, &sSharedCacheLoadInfo);

    // update global state
    if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
        dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
        dyld::gProcessInfo->sharedCacheSlide                = sSharedCacheLoadInfo.slide;
        dyld::gProcessInfo->sharedCacheBaseAddress          = (unsigned long)sSharedCacheLoadInfo.loadAddress;
        sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
        dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0,0}, {{ 0, 0 }}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
    }
}

The core of this function is loadDyldCache. A brief look:


bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results) {
    // ... (omitted)
#if TARGET_IPHONE_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    if ( options.forcePrivate ) {
        // mmap cache into this process only
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        if ( reuseExistingCache(options, results) )
            return (results->errorMessage != nullptr);
        // slow path: this is first process to load cache
        return mapCacheSystemWide(options, results);
    }
#endif
}

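loadDyldCache makes these key decisions:

Whether it is running in the simulator, which is handled separately.
If the cache is already mapped into the shared region, that mapping is reused in this process's address space; see reuseExistingCache for the details. If not, dyld loads the cache and maps it.
On iOS, the cache files live under /System/Library/Caches/com.apple.dyld, grouped by CPU architecture.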
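The branching above can be summarized in a few lines. This is only a sketch of the decision, with the actual mapping work left out: `CacheOptions`, `CacheMapping`, and `chooseCacheMapping` are hypothetical names, and the compile-time TARGET_IPHONE_SIMULATOR check is modeled as a runtime flag so the logic can be exercised anywhere.

```cpp
// Which of the three strategies in loadDyldCache would be taken.
enum class CacheMapping { Private, ReuseExisting, SystemWide };

struct CacheOptions {
    bool isSimulator;    // stands in for TARGET_IPHONE_SIMULATOR
    bool forcePrivate;   // mirrors options.forcePrivate
    bool alreadyMapped;  // has some earlier process mapped the cache already?
};

CacheMapping chooseCacheMapping(const CacheOptions& o) {
    // The simulator and forcePrivate both map the cache into this process only.
    if (o.isSimulator || o.forcePrivate) return CacheMapping::Private;
    // Fast path: the shared region already holds the cache.
    if (o.alreadyMapped) return CacheMapping::ReuseExisting;
    // Slow path: this is the first process to load the cache.
    return CacheMapping::SystemWide;
}
```

In practice almost every launch after boot takes the ReuseExisting branch, which is what makes the shared cache cheap.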

Step 3: Instantiate the main app

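The main executable has already been mapped into the process (XNU did this while parsing the Mach-O). dyld now creates an ImageLoader for it and adds that loader to the master list (sAllImages). The ImageLoader is created and added only if the Mach-O's CPU architecture matches the device. The main work happens in ImageLoaderMachO::instantiateMainExecutable, which takes the app's Mach-O header, its ASLR slide, its file path, and the link context described earlier, and instantiates the ImageLoader. A closer look at instantiateMainExecutable follows.

instantiateMainExecutable does the following:

1. sniffLoadCommands parses the load commands related to the code signature, Mach-O encryption, the number of dynamic libraries, and the number of segments, and extracts the command data.
2. Based on the result of step 1, it calls either the Compressed or the Classic variant of instantiateMainExecutable.

That variant mainly validates addresses, parses the remaining load commands, and sets up the dynamic linking and symbol table information.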
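A rough sketch of the sniffing step: scan the load command types, count segments and dependent libraries, and mark the image as "compressed" when an LC_DYLD_INFO or LC_DYLD_INFO_ONLY command is present. The constants are the values from `<mach-o/loader.h>`; `SniffResult` and `sniffSketch` are hypothetical names, and the real sniffLoadCommands performs many more validity checks than this.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kReqDyld       = 0x80000000;       // LC_REQ_DYLD
constexpr uint32_t kSegment       = 0x1;              // LC_SEGMENT
constexpr uint32_t kSegment64     = 0x19;             // LC_SEGMENT_64
constexpr uint32_t kLoadDylib     = 0xc;              // LC_LOAD_DYLIB
constexpr uint32_t kLoadWeakDylib = 0x18 | kReqDyld;  // LC_LOAD_WEAK_DYLIB
constexpr uint32_t kDyldInfo      = 0x22;             // LC_DYLD_INFO
constexpr uint32_t kDyldInfoOnly  = 0x22 | kReqDyld;  // LC_DYLD_INFO_ONLY

struct SniffResult {
    bool compressed = false;  // true when compressed dyld info is present
    unsigned segCount = 0;    // number of segment commands
    unsigned libCount = 0;    // number of dependent dylib commands
};

// Classify an image from the list of its load command types.
SniffResult sniffSketch(const std::vector<uint32_t>& commandTypes) {
    SniffResult r;
    for (uint32_t cmd : commandTypes) {
        switch (cmd) {
        case kDyldInfo:
        case kDyldInfoOnly:
            r.compressed = true;
            break;
        case kSegment:
        case kSegment64:
            ++r.segCount;
            break;
        case kLoadDylib:
        case kLoadWeakDylib:
            ++r.libCount;
            break;
        default:
            break;  // other commands are ignored in this sketch
        }
    }
    return r;
}
```

The `compressed` flag is what selects ImageLoaderMachOCompressed over ImageLoaderMachOClassic, and the two counts let the loader size its per-image arrays before parsing further.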


// Does the CPU architecture match?
if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
    // Mach-O header, ASLR slide, executable path, link context
    ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
    // allocate memory for the main executable's image and update the image list
    addImage(image);
    return (ImageLoaderMachO*)image;
}

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context) {
    //dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
    //  sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    if ( compressed ) 
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}

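Finally, _main reaches the checkVersionedPaths call mentioned above, which settles the versions of the dynamic libraries to load. At this point the libraries inserted through DYLD_INSERT_LIBRARIES are not yet included.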

Step 4: Load inserted dynamic libraries

If DYLD_INSERT_LIBRARIES is not empty, loadInsertedDylib is called to load the listed libraries.


static void loadInsertedDylib(const char* path)
{
    try {
        LoadContext context;
        // configure loading context
        // load the image
        image = load(path, context, cacheIndex);
    }
    catch (const char* msg) { /**/ }
}

/**
 Expands and searches paths, removes duplicates, and looks in the cache. Path expansion and search run in several phases.
*/

ImageLoader* load(const char* path, const LoadContext& context, unsigned& cacheIndex)
{
    // try all path permutations and check against existing loaded images
    ImageLoader* image = loadPhase0(path, orgPath, context, cacheIndex, NULL);

    // try all path permutations and try open() until first success
    std::vector<const char*> exceptions;
    image = loadPhase0(path, orgPath, context, cacheIndex, &exceptions);
#if !TARGET_IPHONE_SIMULATOR
    //  support symlinks on disk to a path in dyld shared cache
    if ( image == NULL)
        image = loadPhase2cache(path, orgPath, context, cacheIndex, &exceptions);
#endif
}
// create image by mapping in a mach-o file
ImageLoaderMachOCompressed* ImageLoaderMachOCompressed::instantiateFromFile(const char* path, int fd, const uint8_t* fileData, size_t lenFileData,
                                                            uint64_t offsetInFat, uint64_t lenInFat, const struct stat& info, 
                                                            unsigned int segCount, unsigned int libCount, 
                                                            const struct linkedit_data_command* codeSigCmd, 
                                                            const struct encryption_info_command* encryptCmd, 
                                                            const LinkContext& context)
{
    ImageLoaderMachOCompressed* image = ImageLoaderMachOCompressed::instantiateStart((macho_header*)fileData, path, segCount, libCount);

    try {
        // record info about file  
        image->setFileInfo(info.st_dev, info.st_ino, info.st_mtime);

        // if this image is code signed, let kernel validate signature before mapping any pages from image
        image->loadCodeSignature(codeSigCmd, fd, offsetInFat, context);
        
        // Validate that first data we read with pread actually matches with code signature
        image->validateFirstPages(codeSigCmd, fd, fileData, lenFileData, offsetInFat, context);

        // mmap segments
        image->mapSegments(fd, offsetInFat, lenInFat, info.st_size, context);

        // if framework is FairPlay encrypted, register with kernel
        image->registerEncryption(encryptCmd, context);
        
        // probe to see if code signed correctly
        image->crashIfInvalidCodeSignature();

        // finish construction
        image->instantiateFinish(context);
        
        // if path happens to be same as in LC_DYLIB_ID load command use that, otherwise malloc a copy of the path

#if __MAC_OS_X_VERSION_MIN_REQUIRED
        //  app crashes when libSystem cannot be found
        else if ( (installName != NULL) && (strcmp(path, "/usr/lib/libgcc_s.1.dylib") == 0) && (strcmp(installName, "/usr/lib/libSystem.B.dylib") == 0) )
            image->setPathUnowned("/usr/lib/libSystem.B.dylib");
#endif
        else if ( (path[0] != '/') || (strstr(path, "../") != NULL) ) {
            // rdar://problem/10733082 Fix up @rpath based paths during introspection
            // rdar://problem/5135363 turn relative paths into absolute paths so gdb, Symbolication can later find them
        }
    }
    catch (...) {
        // ImageLoader::setMapped() can throw an exception to block loading of image
        //  Leaked fSegmentsArray and image segments during failed dlopen_preflight
    }
    
    return image;
}

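load is called not only by loadInsertedDylib but also by dlopen and other APIs that load dynamic libraries at run time. Its core is loadPhase0 together with loadPhase1 through loadPhase6. The phases search in the following order, each driven by an environment variable:

DYLD_ROOT_PATH -> LD_LIBRARY_PATH -> DYLD_FRAMEWORK_PATH -> original path -> DYLD_FALLBACK_LIBRARY_PATH. loadPhase6 calls ImageLoaderMachO::instantiateFromFile to create the ImageLoader; checkAndAddImage then validates it and adds it to the master list (sAllImages).
If loadPhase0 returns NULL, loadPhase2cache searches the cache. On a hit the image is instantiated through ImageLoaderMachO::instantiateFromCache; otherwise an exception is thrown.
instantiateFromFile and instantiateFromCache are the two ImageLoaderMachO methods at the heart of parsing a Mach-O file and mapping it into memory, and both branch into Compressed and Classic variants. Taking the Compressed instantiateFromFile as the example, these steps deserve attention:
1. loadCodeSignature hands the library's code signature to the kernel for validation.
2. validateFirstPages checks that the first page read into memory (4 KB) matches the code signature. Sandbox and signature checks are enforced here, so this is the place to study if you need to load dynamic libraries at run time in production.
3. registerEncryption registers the encryption info from the LC_ENCRYPTION_INFO load command with the kernel. It calls the kernel function mremap_encrypted with the address and length of the encrypted data; judging from the kernel source, decryption appears to happen when cryptid is 1.
4. When phase 6 is reached, xmmap is called to mmap the library from disk into user-space memory.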

From the analysis above: the main executable's ImageLoader is first in the global image list, followed by the ImageLoaders of the inserted libraries, one loader per library.
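The phased search can be pictured as building an ordered list of candidate paths and trying each in turn. The sketch below is a simplification under stated assumptions: `SearchEnv`, `leafName`, and `candidatePaths` are hypothetical names, only plain library directories are modeled (no DYLD_ROOT_PATH prefixing, frameworks, @rpath, or image suffixes), and nothing is actually opened.

```cpp
#include <string>
#include <vector>

// Hypothetical holder for the already-split search directories.
struct SearchEnv {
    std::vector<std::string> overrideDirs;  // e.g. from LD_LIBRARY_PATH
    std::vector<std::string> fallbackDirs;  // e.g. from DYLD_FALLBACK_LIBRARY_PATH
};

// Last path component, e.g. "libfoo.dylib" for "/opt/lib/libfoo.dylib".
std::string leafName(const std::string& path) {
    const std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Candidate locations in the order the phases would try them:
// override directories first, then the original path, then fallbacks.
std::vector<std::string> candidatePaths(const std::string& path, const SearchEnv& env) {
    std::vector<std::string> out;
    const std::string leaf = leafName(path);
    for (const std::string& dir : env.overrideDirs)
        out.push_back(dir + "/" + leaf);
    out.push_back(path);
    for (const std::string& dir : env.fallbackDirs)
        out.push_back(dir + "/" + leaf);
    return out;
}
```

The real loader runs this kind of walk twice: once checking only images that are already loaded, then again actually opening files, which is why loadPhase0 appears twice in load.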

Step 5: Link the main executable -- Rebase & Bind

All dynamic libraries are linked and their symbols are fixed up and bound. The main worker here is ImageLoader's link method.


void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{
    uint64_t t0 = mach_absolute_time();
    this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
    context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);

    // we only do the loading step for preflights
    if ( preflightOnly )
        return;
        
    uint64_t t1 = mach_absolute_time();
    context.clearAllDepths();
    this->recursiveUpdateDepth(context.imageCount());

    uint64_t t2 = mach_absolute_time();
    this->recursiveRebase(context);
    context.notifyBatch(dyld_image_state_rebased, false);
    
    uint64_t t3 = mach_absolute_time();
    this->recursiveBind(context, forceLazysBound, neverUnload);

    uint64_t t4 = mach_absolute_time();
    if ( !context.linkingMainExecutable )
        this->weakBind(context);
    uint64_t t5 = mach_absolute_time(); 

    context.notifyBatch(dyld_image_state_bound, false);
    uint64_t t6 = mach_absolute_time(); 

    std::vector<DOFInfo> dofs;
    this->recursiveGetDOFSections(context, dofs);
    context.registerDOFs(dofs);
    uint64_t t7 = mach_absolute_time(); 

    // interpose any dynamically loaded images
    if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
        this->recursiveApplyInterposing(context);
    }
    
    // clear error strings
    (*context.setErrorStrings)(0, NULL, NULL, NULL);

    fgTotalLoadLibrariesTime += t1 - t0;
    fgTotalRebaseTime += t3 - t2;
    fgTotalBindTime += t4 - t3;
    fgTotalWeakBindTime += t5 - t4;
    fgTotalDOF += t7 - t6;
    
    // done with initial dylib loads
    fgNextPIEDylibAddress = 0;
}

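link performs these operations:

1. recursiveLoadLibraries recursively loads all dependent libraries and then sends a dyld_image_state_dependents_mapped notification. (If a library has to be read from disk, the I/O cost is significant.)
2. recursiveRebase recursively rebases the image and its dependencies. Because of ASLR (mentioned above), internal addresses must be adjusted by the random slide.
3. recursiveBind recursively binds non-lazy symbols. Lazy symbols are bound at run time, on first call.
4. weakBind binds weak symbols; an uninitialized global variable, for example, is a weak symbol. Readers interested in weak versus strong symbols can look them up.
5. recursiveGetDOFSections and registerDOFs recursively collect and register the program's DOF sections, which dtrace uses for dynamic tracing.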
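Rebasing is simple arithmetic: the slide is the difference between where the image actually landed and where it was linked to load, and every internal pointer is adjusted by that amount. The sketch below models a data segment as a vector of pointer-sized slots; `computeSlide` and `rebase` are hypothetical names, and real dyld reads the slot list from compressed rebase opcodes rather than from an index list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Slide = actual load address minus the address the image was linked for.
std::intptr_t computeSlide(std::uintptr_t actualLoadAddress,
                           std::uintptr_t preferredLoadAddress) {
    return static_cast<std::intptr_t>(actualLoadAddress - preferredLoadAddress);
}

// Add the slide to every slot the rebase info says holds an internal pointer.
// Slots that are not listed (plain data) are left untouched.
void rebase(std::vector<std::uintptr_t>& dataSegment,
            const std::vector<std::size_t>& slotsToRebase,
            std::intptr_t slide) {
    for (std::size_t index : slotsToRebase)
        dataSegment[index] += static_cast<std::uintptr_t>(slide);
}
```

For example, an image linked at 0x1000 but loaded at 0x5000 has a slide of 0x4000, so a stored pointer 0x1010 becomes 0x5010. Every rebased page is written to, which is one reason rebase and bind can dominate pre-main time, as in the log at the top of the article.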

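In step 1, recursively loading the libraries that were fixed when the app was built goes through the link context set up in setContext: its loadLibrary callback is invoked, and dependent libraries are loaded first. loadLibrary was assigned when the link context was set up and points to libraryLocator, which in turn uses the load function described above.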

Step 3 performs symbol binding, which is examined in detail here.


void ImageLoader::recursiveBind(const LinkContext& context, bool forceLazysBound, bool neverUnload) {
    // Normally just non-lazy pointers are bound immediately.
    // The exceptions are:
    //   1) DYLD_BIND_AT_LAUNCH will cause lazy pointers to be bound immediately
    //   2) some API's (e.g. RTLD_NOW) can cause lazy pointers to be bound immediately
    if ( fState < dyld_image_state_bound ) {
        // break cycles
        fState = dyld_image_state_bound;
        try {
            // bind lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL )
                    dependentImage->recursiveBind(context, forceLazysBound, neverUnload);
            }
            // bind this image
            this->doBind(context, forceLazysBound); 
            // mark if lazys are also bound
            if ( forceLazysBound || this->usablePrebinding(context) )
                fAllLazyPointersBound = true;
            // mark as never-unload if requested
            if ( neverUnload )
                this->setNeverUnload();
                
            context.notifySingle(dyld_image_state_bound, this, NULL);
        }
        catch (const char* msg) {
        //...//
        }
    }
}

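recursiveBind binds symbols recursively. Normally only non-lazy pointers are bound at this point; lazy pointers are bound immediately only in special cases such as DYLD_BIND_AT_LAUNCH. The core of the method is doBind in the ImageLoaderMachO subclasses. It reads bind_off and bind_size from the image's dynamic linking info to find the offset and size of the data to bind, then binds the entries one by one using bindAt. After resolve has looked up the symbol, bindLocation performs the final write. There are three kinds of bind:

BIND_TYPE_POINTER: the location holds a pointer. The computed value is assigned directly.

BIND_TYPE_TEXT_ABSOLUTE32: a 32-bit value. The low 32 bits of the computed value are assigned.

BIND_TYPE_TEXT_PCREL32: a PC-relative value. The value to store is the new value minus the address being fixed up.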
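The three bind types differ only in what gets written into the slot. The following sketch shows that difference on a byte buffer. `bindLocationSketch` is a hypothetical name, pointers are fixed at 64 bits, the slot's run-time address is passed separately so the example can run on any buffer, and the PC-relative case assumes the x86 convention of measuring from the end of the 4-byte field (hence the `+ 4`).

```cpp
#include <cstdint>
#include <cstring>

enum BindType : uint8_t {     // values as in <mach-o/loader.h>
    kBindPointer        = 1,  // BIND_TYPE_POINTER
    kBindTextAbsolute32 = 2,  // BIND_TYPE_TEXT_ABSOLUTE32
    kBindTextPcrel32    = 3   // BIND_TYPE_TEXT_PCREL32
};

// Write the resolved `value` into the slot at `location`.
// `locationAddress` is the address the slot has in the running process.
// Returns false for an unknown bind type.
bool bindLocationSketch(uint8_t* location, uint64_t locationAddress,
                        uint64_t value, BindType type) {
    switch (type) {
    case kBindPointer: {
        std::memcpy(location, &value, sizeof(value));  // full pointer
        return true;
    }
    case kBindTextAbsolute32: {
        const uint32_t low = static_cast<uint32_t>(value);  // low 32 bits only
        std::memcpy(location, &low, sizeof(low));
        return true;
    }
    case kBindTextPcrel32: {
        // Displacement from the end of the 4-byte field to the target.
        const uint32_t rel = static_cast<uint32_t>(value - (locationAddress + 4));
        std::memcpy(location, &rel, sizeof(rel));
        return true;
    }
    }
    return false;
}
```

On 64-bit Apple platforms virtually every bind is a pointer bind; the two text variants date from 32-bit code that patched instructions directly.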

Readers interested in how lazy binding is implemented can step through it in the Xcode debugger.

Step 6: Link inserted dynamic libraries

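The libraries linked here are the inserted libraries loaded in step 4. Starting from the second ImageLoader in sAllImages, each one is taken in turn and passed to link and then to ImageLoaderMachO::registerInterposing, which links the library and registers its interposing data (symbol information, the __interpose section of the __DATA segment, and so on). Readers interested in how inserted libraries are loaded can study this method further. How link works was described above.

After that comes interposing-related handling for accelerator tables and a weak bind on the main executable. Accelerator tables cannot be combined with interposing, so when both are present dyld disables the tables and reloads all images. The code below and its callers are the place to look: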


        //  dyld should support interposition even without DYLD_INSERT_LIBRARIES
        for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
            image->registerInterposing();
        }
    #if SUPPORT_ACCELERATE_TABLES
        if ( (sAllCacheImagesProxy != NULL) && ImageLoader::haveInterposingTuples() ) {
            // Accelerator tables cannot be used with implicit interposing, so relaunch with accelerator tables disabled
            ImageLoader::clearInterposingTuples();
            // unmap all loaded dylibs (but not main executable)
            }
            // note: we don't need to worry about inserted images because if DYLD_INSERT_LIBRARIES was set we would not be using the accelerator table
            goto reloadAllImages;
        }
    #endif
        // apply interposing to initial set of images
        for(int i=0; i < sImageRoots.size(); ++i) {
            sImageRoots[i]->applyInterposing(gLinkContext);
        }
        gLinkContext.linkingMainExecutable = false;

Step 7: Initialize the main executable -- Objective-C and C++ global initialization

initializeMainExecutable initializes the main executable and triggers the app's Objective-C initialization. Before it, there is also a weak bind on the main executable's image.


void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    
    // run initializers for main executable and everything it brings up 
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    
    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL ) 
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS ) {
        // output
    }
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS ) {
        // output
    }
}

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.images[0] = this;
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.images[i]->recursiveInitialization(context, thisThread, images.images[i]->getPath(), timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}

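initializeMainExecutable shows the following:

If DYLD_PRINT_STATISTICS or DYLD_PRINT_STATISTICS_DETAILS is set, dyld prints the key timings of the app launch once initialization finishes (not every detail is included).
runInitializers is first called on the inserted libraries, so they finish initializing before the main executable's image; then it is called on the main image.
After each step, observers are notified through notifySingle or notifyBatch. The notification types are listed in this enum: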

enum dyld_image_states
{
    dyld_image_state_mapped                 = 10,       // No batch notification for this
    dyld_image_state_dependents_mapped      = 20,       // Only batch notification for this
    dyld_image_state_rebased                = 30, 
    dyld_image_state_bound                  = 40,
    dyld_image_state_dependents_initialized = 45,       // Only single notification for this
    dyld_image_state_initialized            = 50,
    dyld_image_state_terminated             = 60        // Only single notification for this
};
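The notification mechanism is an ordinary observer registry keyed by image state. The sketch below is a minimal model of it, not dyld's implementation: `Notifier`, `StateHandler`, and the shortened enum names are hypothetical, and only single (per-image) notifications are modeled.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Same numeric values as dyld_image_states, with shortened names.
enum ImageState {
    kMapped = 10, kDependentsMapped = 20, kRebased = 30, kBound = 40,
    kDependentsInitialized = 45, kInitialized = 50, kTerminated = 60
};

using StateHandler = std::function<void(ImageState, const std::string& imagePath)>;

// Hypothetical registry: handlers subscribe to one state and are called
// in registration order whenever an image reaches that state.
class Notifier {
public:
    void registerHandler(ImageState state, StateHandler handler) {
        handlers_[state].push_back(std::move(handler));
    }
    void notifySingle(ImageState state, const std::string& imagePath) const {
        const auto it = handlers_.find(state);
        if (it == handlers_.end()) return;  // nobody listens for this state
        for (const StateHandler& handler : it->second)
            handler(state, imagePath);
    }
private:
    std::map<ImageState, std::vector<StateHandler>> handlers_;
};
```

A client such as the Objective-C runtime would register one handler for the bound state and another for the dependents-initialized state, then simply wait to be called as each image advances.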

The call chain starting at runInitializers is shown below, followed by a few observations.

[Figure: runInitializers call flow]

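Plugins such as dumpdecrypted work by being injected: the constructor functions they add are picked up by doModInitFunctions, which reads them from the __mod_init_func section of the Mach-O's __DATA segment. C++ global objects also appear in this section.
During recursive initialization (recursiveInitialization), when the image being processed is the main executable, notifySingle is called to inform observers after doInitialization completes. Before doInitialization, a notification with state dyld_image_state_dependents_initialized is sent. That notification reaches libobjc's load_images, which then calls the +load methods of each Objective-C class and of each category.
The entry point of Objective-C is _objc_init. dyld reaches it through runInitializers -> recursiveInitialization -> doInitialization -> doModInitFunctions -> ... -> _objc_init.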

void _objc_init(void)
{       
    // Register for unmap first, in case some +load unmaps something
    _dyld_register_func_for_remove_image(&unmap_image);
    dyld_register_image_state_change_handler(dyld_image_state_bound,
                                             1/*batch*/, &map_2_images);
    dyld_register_image_state_change_handler(dyld_image_state_dependents_initialized, 0/*not batch*/, &load_images);
}

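_objc_init registers two handlers with dyld; their callbacks load the Objective-C classes into memory and call the +load methods, respectively. What follows is map_2_images, the classic routine for loading Objective-C classes.
The excerpt from recursiveInitialization below shows that +load is triggered before global instances are constructed or initializer functions are called.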

            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);
            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
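The ordering guarantees can be reproduced with a toy dependency graph: dependencies are initialized before the images that use them, and within one image the dependents-initialized notification (where +load runs) comes before the image's own initializers. `Image` and `recursiveInitializationSketch` are hypothetical names, and upward dependencies, locking, and timing are left out.

```cpp
#include <string>
#include <vector>

struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized = false;
};

// Depth-first, bottom-up initialization that records what would run, in order.
void recursiveInitializationSketch(Image& image, std::vector<std::string>& log) {
    if (image.initialized) return;
    image.initialized = true;  // mark first so dependency cycles terminate

    // Initialize lower-level libraries first.
    for (Image* dep : image.dependencies)
        recursiveInitializationSketch(*dep, log);

    // dyld_image_state_dependents_initialized: libobjc would run +load here.
    log.push_back("+load " + image.name);
    // doInitialization: C++ static constructors and constructor functions.
    log.push_back("initializers " + image.name);
}
```

With App depending on Framework and libSystem, and Framework depending on libSystem, the log starts with libSystem, continues with Framework, and ends with App, each time with `+load` ahead of `initializers`.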

Step 8: Find the main executable's entry point

        // find entry point for main executable
        result = (uintptr_t)sMainExecutable->getThreadPC();

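getThreadPC is called on the main executable's image to find the program entry point. It reads the LC_MAIN load command and returns entryoff (the file offset of main in the __TEXT segment) plus the address of the Mach-O header in memory. If there is no LC_MAIN command, the entry address is taken from the LC_UNIXTHREAD command instead. The result is returned to the caller.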
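The address computation is small enough to write out. In this sketch `EntryInfo` and `entryPoint` are hypothetical names; the LC_UNIXTHREAD branch assumes the recorded program counter is an unslid address that needs the ASLR slide added, which is my reading of the fallback rather than something shown in the excerpt above.

```cpp
#include <cstdint>

// What a scan of the load commands would have collected.
struct EntryInfo {
    bool hasLCMain;      // LC_MAIN present?
    uint64_t entryoff;   // file offset of main() from LC_MAIN
    bool hasUnixThread;  // LC_UNIXTHREAD present?
    uint64_t threadPC;   // initial program counter recorded in LC_UNIXTHREAD
};

// Returns the run-time entry address, or 0 when neither command exists.
uint64_t entryPoint(const EntryInfo& info, uint64_t machHeaderAddress, uint64_t slide) {
    if (info.hasLCMain)
        return machHeaderAddress + info.entryoff;  // header address + offset
    if (info.hasUnixThread)
        return info.threadPC + slide;              // assumed: unslid pc + slide
    return 0;
}
```

For a header loaded at 0x104000000 and an entryoff of 0x3f50, the entry point is 0x104003f50.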

Supplement: dyld closures

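Between steps 2 and 3 there is code that looks for a closure and, if one is found, returns the entry point obtained from it. Closures are a launch-speed optimization introduced with dyld3 at WWDC 2017. The rough steps are:

1. Execution continues only if closures are enabled (the DYLD_USE_CLOSURES environment variable is 1), the app's path is on the whitelist (currently only system apps may use closures), and the shared cache load address is not null.
2. findClosure looks for closure data in memory. If none is there, dyld looks on disk under /private/var/staged_system_apps and returns what it finds.
3. If there is still no closure data, callClosureDaemon requests it over a socket via RPC; interested readers can study that function.
4. If closure data is available, the core function launchWithClosure launches the app from the closure and returns the entry point it obtains. This function repeats the steps described above. Its implementation and internal data structures remain to be analyzed.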
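The lookup order in steps 2 and 3 is a chain of fallbacks: try the cheapest source first and stop at the first hit. The sketch below models only that control flow. `ClosureSource`, `ClosureLookup`, and `findClosureInOrder` are hypothetical names, and a closure is represented as an opaque string rather than dyld3's real binary format.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A source returns the closure for an app path, or "" when it has none.
using ClosureSource = std::function<std::string(const std::string& appPath)>;

struct ClosureLookup {
    std::string closure;  // empty when no source had a closure
    int sourceIndex;      // index of the source that answered, or -1
};

// Ask each source in order (memory, disk, daemon, ...) and stop at the first hit,
// so more expensive sources are consulted only when cheaper ones miss.
ClosureLookup findClosureInOrder(const std::string& appPath,
                                 const std::vector<ClosureSource>& sources) {
    for (std::size_t i = 0; i < sources.size(); ++i) {
        std::string closure = sources[i](appPath);
        if (!closure.empty())
            return {closure, static_cast<int>(i)};
    }
    return {"", -1};
}
```

When the result is empty, the caller falls back to the ordinary launch path described in steps 3 through 8.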

Summary

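Besides the dyld and dyld3 sources, the official dyld project contains several tools. Some useful ones:

update_dyld_shared_cache_tool: macOS ships an executable of this tool, but it cannot be built directly from the project. It forces an update of the dynamic library versions in the system's shared cache, which otherwise happens only on a system update or reinstall.
dyld_closure_util: a tool for building dyld closures. Closures were introduced in the dyld3 talk at WWDC 2017; they cache data obtained by parsing Mach-O headers at launch, such as dependent library paths, symbol lookup results, and code signatures, to speed up app launch. This tool also needs changes before it compiles.
dsc_extractor: extracts all system libraries from the dyld shared cache. This target builds directly into an executable and is useful for later study of how system libraries are implemented.

Every Mach-O is handled by an ImageLoader that takes care of loading and dependency management, orchestrated by dyld. The main executable's image is loaded by the kernel. For the loading details of the other libraries, see the link implementation described above. When an image finishes loading, dyld sends a dyld_image_state_bound notification; the well-known hooking tool fishhook also relies on listening for this notification and performs its hooks in the callback.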