Application Loading (1) -- dyld Flow Analysis

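Application Loading (1) -- dyld Flow Analysis
Application Loading (2) -- Linking dyld & objc and a First Look at Class Loading
Application Loading (3) -- Class Loading
Application Loading (4) -- Category Loading
Application Loading (5) -- Class Extensions and Associated Objects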


1. Starting from what we know


An example:
The load method in the view controller

@implementation ViewController

+ (void)load {
    NSLog(@"%s", __func__);
}

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
}

@end

The implementation in main

__attribute__((constructor)) void Func(){
    NSLog(@"%s", __func__);
}

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    
    NSLog(@"%s", __func__);
    
    @autoreleasepool {
        // Setup code that might create autoreleased objects goes here.
        appDelegateClassName = NSStringFromClass([AppDelegate class]);
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

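Run result:

  • The run result shows that the load method is called first, then the constructor function (Func, the C++-style initializer), and finally the main function.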
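
The "constructor runs before main" part of this ordering can be reproduced outside Xcode. The following is a minimal sketch in plain C++ rather than the article's Objective-C project. It assumes a compiler that supports __attribute__((constructor)) (Clang or GCC); startupLog, recordBeforeMain and constructorRanBeforeMain are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// Function-local static, so the log exists no matter how early it is first used.
std::vector<std::string>& startupLog() {
    static std::vector<std::string> log;
    return log;
}

// Runs together with the image's initializers, i.e. before main() is entered.
__attribute__((constructor)) static void recordBeforeMain() {
    startupLog().push_back("constructor");
}

// Intended to be called as the first statement of main():
// returns true if the constructor function has already run exactly once.
bool constructorRanBeforeMain() {
    const std::vector<std::string>& log = startupLog();
    return log.size() == 1 && log[0] == "constructor";
}
```

Calling constructorRanBeforeMain() at the top of main() should return true, which mirrors what the NSLog output in the example shows for Func.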

2. Entry point - the call stack at the load method


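Starting from what we know, the load method is the earliest call, so we use it as the entry point. Set a breakpoint in load, then use the lldb bt command to inspect the stack.

The printed output shows that the entry point is the _dyld_start function in dyld, and that the stack ends at load_images in libobjc.A.dylib. Using this stack trace as a guide, we now walk through the whole process.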

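2.1 Background - dyld

dyld is short for "dynamic link editor"; it is the dynamic linker that Apple's operating systems provide.
dyld-750.6 source download link. The analysis below is based on this source.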

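3. Setting off - _dyld_start


Open the source and search for _dyld_start. There are several results, mainly because each platform gets its own variant, but the logic is much the same in all of them. Let's look at the arm version.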

#if __arm__
    .text
    .align 2
__dyld_start:
    mov r8, sp      // save stack pointer
    sub sp, #16     // make room for outgoing parameters
    bic     sp, sp, #15 // force 16-byte alignment

    // call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
    ldr r0, [r8]    // r0 = mach_header
    ldr r1, [r8, #4]    // r1 = argc
    add r2, r8, #8  // r2 = argv
    adr r3, __dyld_start
    sub r3 ,r3, #0x1000 // r3 = dyld_mh
    add r4, sp, #12
    str r4, [sp, #0]    // [sp] = &startGlue

    bl  __ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
    ldr r5, [sp, #12]
    cmp r5, #0
    bne Lnew

    // traditional case, clean up stack and jump to result
    add sp, r8, #4  // remove the mach_header argument.
    bx  r0      // jump to the program's entry point

    // LC_MAIN case, set up stack for call to main()
Lnew:   mov lr, r5          // simulate return address into _start in libdyld
    mov r5, r0          // save address of main() for later use
    ldr r0, [r8, #4]        // main param1 = argc
    add r1, r8, #8      // main param2 = argv
    add r2, r1, r0, lsl #2
    add r2, r2, #4      // main param3 = &env[0]
    mov r3, r2
Lapple: ldr r4, [r3]
    add r3, #4
    cmp r4, #0
    bne Lapple          // main param4 = apple
    bx  r5

#endif /* __arm__ */
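  • The source is written in assembly and is hard to follow, so it is easiest to read the comments:
    • Call dyldbootstrap::start, which does the work that must happen before main. This is the focus of the rest of the analysis.
    • In the traditional case, clean up the stack and jump to the program's entry point.
    • In the LC_MAIN case, set up the stack for the call to main().

All that remains is to find where the load method is called from inside dyldbootstrap::start, so next we step into dyldbootstrap::start.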
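
The pointer arithmetic in the Lnew / Lapple part of the assembly is easier to follow in a higher-level form. The sketch below is only a model: it assumes the stack layout implied by the assembly comments (mach_header, argc, argv and its NULL, envp and its NULL, then the apple strings), and MainArgs, decodeLaunchStack and wordAsString are hypothetical names, not dyld code.

```cpp
#include <cstdint>

// Hypothetical holder for the values the LC_MAIN path passes to main().
// Each pointer refers to a run of pointer-sized words on the launch stack.
struct MainArgs {
    long             argc;
    const uintptr_t* argv;   // argc words followed by a 0 word
    const uintptr_t* envp;   // environment words followed by a 0 word
    const uintptr_t* apple;  // "apple" words that follow the environment
};

// Interpret one stack word as a C string pointer.
const char* wordAsString(uintptr_t word) {
    return reinterpret_cast<const char*>(word);
}

// 'sp' models r8: the stack pointer saved on entry to __dyld_start.
// Assumed layout: [mach_header][argc][argv..., 0][envp..., 0][apple..., 0]
MainArgs decodeLaunchStack(const uintptr_t* sp) {
    MainArgs args;
    args.argc = static_cast<long>(sp[1]);      // ldr r0, [r8, #4]
    args.argv = sp + 2;                        // add r1, r8, #8
    args.envp = args.argv + args.argc + 1;     // skip argv[] and its terminating 0

    // Lapple loop: step past every envp word, including the terminating 0.
    const uintptr_t* p = args.envp;
    uintptr_t word;
    do {
        word = *p;   // ldr r4, [r3]
        ++p;         // add r3, #4
    } while (word != 0);
    args.apple = p;
    return args;
}
```

On 32-bit arm each word is 4 bytes, which is why the assembly adds #4 where the sketch increments a pointer by one.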

4. dyldbootstrap::start


uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Preparation and error-handling code omitted
    ......
    
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
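  • The preparation and error-handling code is omitted.
  • The last step is the call to dyld::_main.

5. dyld::_main


This function is several hundred lines long and packed with information, so it is not pasted here. Only the overall logic is described; interested readers can download the source and read it themselves. (Recommended reading time: late at night, ideally with a Red Bull.)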

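5.1 Preparing the environment

Environment and platform details are handled case by case here, for example platform detection and the minimum supported version.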

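5.2 Shared cache

The shared cache is a technique built on top of dynamic libraries, so first a quick look at static and dynamic libraries.

  • Library: commonly used functions are usually collected into a function library that programs can use.
  • Static library: at build time the static library is "packed" together with our own code into one executable. The static library is no longer needed at run time. The drawback is duplication: every executable carries its own copy, which takes more space and makes the executable larger.
  • Dynamic library: not linked into the executable at build time; it is linked when the program is loaded to run. Compared with a static library this avoids wasting space, at the cost of some time at launch.
  • Shared cache: the system keeps commonly used dynamic libraries in a cache shared by all processes. When a new process is loaded into memory, the cache is consulted first, which saves the time of loading those dynamic libraries again.

In the source, mapSharedCache() is called to load the shared cache.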

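5.3 Instantiating the main executable

Relevant code: sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
This builds the skeleton of the main executable in preparation for the work that follows.

5.4 The link phase

Relevant source: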

link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
    gLinkContext.bindFlat = true;
    gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}

// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted 
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    if ( gLinkContext.allowInterposing ) {
        // only INSERTED libraries can interpose
        // register interposing info after all inserted libraries are bound so chaining works
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            image->registerInterposing(gLinkContext);
        }
    }
}

if ( gLinkContext.allowInterposing ) {
    //  dyld should support interposition even without DYLD_INSERT_LIBRARIES
    for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
        ImageLoader* image = sAllImages[i];
        if ( image->inSharedCache() )
            continue;
        image->registerInterposing(gLinkContext);
    }
}
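  • First the main executable is linked: link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
  • Then a loop links the inserted images, that is, the inserted dynamic libraries. After that, interposing information is registered, skipping images that are already in the shared cache.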
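
The indexing in the loops above (sAllImages[i+1]) relies on the main executable sitting at index 0 and the inserted dylibs directly after it. A minimal sketch of that ordering, with images reduced to plain names; topLevelLinkOrder is a hypothetical helper and not part of dyld:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// allImages[0] is the main executable; allImages[1..insertedCount] are the
// inserted dylibs. Returns the names in the order the snippet above links them.
std::vector<std::string> topLevelLinkOrder(const std::vector<std::string>& allImages,
                                           std::size_t insertedCount) {
    std::vector<std::string> order;
    if (allImages.empty()) return order;
    order.push_back(allImages[0]);                       // main executable first
    for (std::size_t i = 0; i < insertedCount && i + 1 < allImages.size(); ++i) {
        order.push_back(allImages[i + 1]);               // then each inserted dylib
    }
    return order;
}
```

Only the top-level link calls are modeled here; anything else in the image list is outside the scope of this sketch.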

5.5 Weak binding

Relevant code: sMainExecutable->weakBind(gLinkContext);
Once all libraries have been linked, weak symbols are bound.

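5.6 Initializing the main executable

Relevant code: initializeMainExecutable();
Using the skeleton built when the main executable was instantiated, plus the linking work done since, this step gives the main executable its "initial values". It is covered in detail later in this article.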

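5.7 Sending the notification

Relevant code: notifyMonitoringDyldMain();
Preparation is finished; this notifies any monitoring process that the program is about to enter main().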

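5.8 dyld::_main summary

There is a lot of code, but once the flow is sorted out it becomes much easier to read. It is actually quite similar to MVC in everyday development:

  • Build the VC skeleton (instantiate the main executable)
  • Get the M (shared cache)
  • Connect M, V and C to each other (link the dynamic libraries)
  • Initialize V from M (initialize the main executable & send the notification)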

6. Initializing the main executable: initializeMainExecutable source analysis


void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    
    // run initializers for main executable and everything it brings up 
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    
    // Unrelated code omitted
    ......
}

The key code is runInitializers:

  • First a for loop runs the initializers of the inserted dynamic libraries
  • Then the main executable is initialized
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
  • The key code is the call to processInitializers. Its implementation is next.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
  • The key code is the call to recursiveInitialization inside the for loop. The name tells us it is a recursive operation.
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);

    // Some code omitted
    ......
    
    // let objc know we are about to initialize this image
    uint64_t t1 = mach_absolute_time();
    fState = dyld_image_state_dependents_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
    
    // initialize this image
    bool hasInitializers = this->doInitialization(context);

    // let anyone know we finished initializing this image
    fState = dyld_image_state_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_initialized, this, NULL);
            
    // Some code omitted
    ......
    
    recursiveSpinUnLock();
}
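  • The source calls context.notifySingle twice
  • First call: lets objc know that the current image is about to be initialized
  • Second call: lets everyone know that the current image has finished initializing
  • Between the two calls is the image initialization itself: bool hasInitializers = this->doInitialization(context);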
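
The recursion and the "notify, initialize, notify" order can be modeled in a few lines. This is a simplified sketch, not dyld's implementation: it assumes that the omitted code initializes an image's dependencies first and that each image is initialized only once. Image, dependencies and recursiveInit are hypothetical stand-ins for ImageLoader and its members.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ImageLoader: a name, its dependencies and a flag.
struct Image {
    std::string         name;
    std::vector<Image*> dependencies;
    bool                initialized;
    Image(const std::string& n) : name(n), initialized(false) {}
};

// Depth-first: initialize dependencies, then notify, run initializers, notify again.
void recursiveInit(Image& image, std::vector<std::string>& log) {
    if (image.initialized) return;   // each image is initialized only once
    image.initialized = true;        // mark early so dependency cycles terminate
    for (Image* dep : image.dependencies) {
        recursiveInit(*dep, log);
    }
    log.push_back("notify dependents_initialized: " + image.name);  // objc +load happens here
    log.push_back("doInitialization: " + image.name);               // C/C++ initializers
    log.push_back("notify initialized: " + image.name);
}
```

For an App image that depends on Foundation, which in turn depends on libSystem, the log lists libSystem's three steps first, then Foundation's, then App's.
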
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress   = image->machHeader();
        info.imageFilePath      = image->getRealPath();
        info.imageFileModDate   = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
            const char* result = (*it)(state, 1, &info);
            if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
                //fprintf(stderr, "  image rejected by handler=%p\n", *it);
                // make copy of thrown string so that later catch clauses can free it
                const char* str = strdup(result);
                throw str;
            }
        }
    }
    if ( state == dyld_image_state_mapped ) {
        //  Save load addr + UUID for images from outside the shared cache
        if ( !image->inSharedCache() ) {
            dyld_uuid_info info;
            if ( image->getUUID(info.imageUUID) ) {
                info.imageLoadAddress = image->machHeader();
                addNonSharedCacheImageUUID(info);
            }
        }
    }
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    // mach message csdlc about dynamically unloaded images
    if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
        notifyKernel(*image, false);
        const struct mach_header* loadAddress[] = { image->machHeader() };
        const char* loadPath[] = { image->getPath() };
        notifyMonitoringDyld(true, 1, loadAddress, loadPath);
    }
}
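  • The code breaks down into four if statements
  • First if: calls the registered state handlers; if a handler rejects the image in the dyld_image_state_mapped state, the error string is thrown.
  • Second if: for images outside the shared cache, records the load address and UUID.
  • Third if: contains the function pointer *sNotifyObjCInit. The call is also timed, so the time spent in objc is recorded.
  • Fourth if: notification handling for dynamically unloaded images.

Next, let's see when the function pointer *sNotifyObjCInit is assigned.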

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;
    // Some code omitted
    ......
}

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
  • sNotifyObjCInit is assigned in registerObjCNotifiers, from the second parameter, init.
  • registerObjCNotifiers is called from _dyld_objc_notify_register.
  • All three parameters of _dyld_objc_notify_register are function pointers.
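
The pattern here is "store a function pointer now, call it later from notifySingle". A minimal sketch of that pattern follows. It assumes a simplified callback that only takes a path (the real init callback is invoked with a path and a mach_header, as seen in notifySingle); NotifyInit, registerInitNotifier, fakeLoadImages and loadedPaths are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified callback type.
typedef void (*NotifyInit)(const char* path);

enum ImageState { kMapped, kDependentsInitialized, kInitialized };

// Plays the role of sNotifyObjCInit: empty until someone registers.
static NotifyInit sNotifyInit = nullptr;

// Counterpart of registerObjCNotifiers: just remember the function pointer.
void registerInitNotifier(NotifyInit init) {
    sNotifyInit = init;
}

// Counterpart of the third `if` in notifySingle: call the stored pointer only
// for the dependents-initialized state, and only if something was registered.
bool notifySingle(ImageState state, const char* path) {
    if (state == kDependentsInitialized && sNotifyInit != nullptr) {
        (*sNotifyInit)(path);
        return true;
    }
    return false;
}

// Stand-in for the callback the objc runtime would register.
std::vector<std::string>& loadedPaths() {
    static std::vector<std::string> paths;
    return paths;
}
void fakeLoadImages(const char* path) {
    loadedPaths().push_back(path);
}
```

Until registerInitNotifier(&fakeLoadImages) has been called, notifySingle does nothing for any state, which is why the next step is to find who performs the registration.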

7. Finding where _dyld_objc_notify_register is called


A global search for _dyld_objc_notify_register in the dyld source finds no caller. However, the comment on its declaration gives some clues:

// Note: only for use by objc runtime
// Register handlers to be called when objc images are mapped, unmapped, and initialized.
// Dyld will call back the "mapped" function with an array of images that contain an objc-image-info section.
// Those images that are dylibs will have the ref-counts automatically bumped, so objc will no longer need to
// call dlopen() on them to keep them from being unloaded.  During the call to _dyld_objc_notify_register(),
// dyld will call the "mapped" function with already loaded objc images.  During any later dlopen() call,
// dyld will also call the "mapped" function.  Dyld will call the "init" function when dyld would be called
// initializers in that image.  This is when objc calls any +load methods in that image.
//
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped);
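  • First sentence of the comment: only for use by the objc runtime. This tells us the function is called from the objc source.
  • Last two sentences of the comment: dyld calls the "init" function at the point where it would run the initializers of an image, and this is when objc calls the +load methods in that image. This is why +load runs before the image's other initializers.

The call site turns up in the objc source, which we know fairly well: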

/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
    cache_init();
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
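  • The comment explains it: "Called by libSystem BEFORE library initialization time", that is, _objc_init is called by the libSystem library before library initialization.

8. Reaching the destination - load_images


Here is the implementation of load_images: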

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}
  • The last line calls all the +load methods
void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}

⏬⏬⏬

static void call_class_loads(void)
{
    int i;
    
    // Detach current loadable list.
    struct loadable_class *classes = loadable_classes;
    int used = loadable_classes_used;
    loadable_classes = nil;
    loadable_classes_allocated = 0;
    loadable_classes_used = 0;
    
    // Call all +loads for the detached list.
    for (i = 0; i < used; i++) {
        Class cls = classes[i].cls;
        load_method_t load_method = (load_method_t)classes[i].method;
        if (!cls) continue; 

        if (PrintLoading) {
            _objc_inform("LOAD: +[%s load]\n", cls->nameForLogging());
        }
        (*load_method)(cls, @selector(load));
    }
    
    // Destroy the detached list.
    if (classes) free(classes);
}
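  • call_load_methods calls the classes' +load methods in a loop
  • Then it calls the categories' +load methods

We have now found where the load method is called, and have a rough picture of how an application is loaded. Next, let's confirm at which step of the flow the C++ function is called.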
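
The do/while structure matters when a +load method causes more classes to become loadable. The sketch below models only the ordering logic of call_load_methods, under the assumption that a +load body may enqueue further work; Loadable, Loader and run are hypothetical names, and +load methods are modeled as std::function callbacks.

```cpp
#include <functional>
#include <string>
#include <vector>

struct Loader;

// Hypothetical model of one pending +load: a label plus an optional body.
struct Loadable {
    std::string                  name;
    std::function<void(Loader&)> body;   // may be empty
};

struct Loader {
    std::vector<Loadable>    classes;     // pending class +loads
    std::vector<Loadable>    categories;  // pending category +loads
    std::vector<std::string> log;         // order in which +loads ran

    // Detach the current list first (as call_class_loads does), then run it,
    // so that anything enqueued by a body lands in a fresh list.
    void run(std::vector<Loadable>& pending) {
        std::vector<Loadable> detached;
        detached.swap(pending);
        for (Loadable& item : detached) {
            log.push_back(item.name);
            if (item.body) item.body(*this);
        }
    }

    void callLoadMethods() {
        bool moreCategories;
        do {
            while (!classes.empty()) run(classes);   // 1. class +loads until none remain
            run(categories);                         // 2. category +loads once
            moreCategories = !categories.empty();    //    did new categories arrive meanwhile?
        } while (!classes.empty() || moreCategories); // 3. repeat if anything new arrived
    }
};
```

If a category's body enqueues a new class, that class runs on the next pass of the outer loop, after the remaining categories of the current pass.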

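9. The C++ call stack


The C++ function in the example also runs before main, so it must sit somewhere in the flow above. Again we use the call stack to see which functions are involved.

The stack shows doModInitFunctions, called from doInitialization. We already found where doInitialization is called: in recursiveInitialization, between the two context.notifySingle calls.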

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);

    // Some code omitted
    ......
    
    // let objc know we are about to initialize this image
    uint64_t t1 = mach_absolute_time();
    fState = dyld_image_state_dependents_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
    
    // initialize this image
    bool hasInitializers = this->doInitialization(context);

    // let anyone know we finished initializing this image
    fState = dyld_image_state_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_initialized, this, NULL);
            
    // Some code omitted
    ......
    
    recursiveSpinUnLock();
}

⏬⏬⏬

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context); // handles the C++ initializers
    
    CRSetCrashLogMessage2(NULL);
    
    return (fHasDashInit || fHasInitializers);
}

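Finding the call site settles the question.

The implementation of doModInitFunctions is fairly long; interested readers can download the source and read it. The goal here is only to prove when it is called, so pasting the source would add little.

Closing remarks


This article used a small example to walk through the application loading flow. I hope it helps!
dyld-750.6 source download link
Tip: when reading source code, keep hold of your goal and locate the key points. That is far more efficient. Do not let code unrelated to the goal distract you.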
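
For readers who only want the gist: conceptually, this step walks a table of function pointers recorded in the image and calls each one in order. The sketch below illustrates that idea only. It assumes a parameterless initializer type (dyld's real initializers are also passed argc, argv, envp, apple and program variables), and kModInitFuncs, runModInitFunctions, initA and initB are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified initializer type.
typedef void (*Initializer)();

// Records the order in which the sample initializers ran.
std::vector<int>& callOrder() {
    static std::vector<int> order;
    return order;
}
void initA() { callOrder().push_back(1); }
void initB() { callOrder().push_back(2); }

// Stands in for the table of initializer pointers found in the image.
const Initializer kModInitFuncs[] = { &initA, &initB };

// Call every initializer in table order and report how many were called.
std::size_t runModInitFunctions(const Initializer* funcs, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        funcs[i]();
    }
    return count;
}
```

A function marked __attribute__((constructor)), like Func in the example, ends up as one entry in such a table, which is why it appears under doModInitFunctions in the stack.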
