Preface

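There is no shortage of material online about the iOS runtime and dyld, though much of it is copied from one post to the next. I had not planned to write yet another summary, but other people's notes remain other people's: unless I organize the material and write something myself, I never feel I have really absorbed it. So this post records what I understood after reading some of that material alongside the objc source. There is no substitute for stepping through the source yourself if you want to understand the conclusions properly. If anything here is wrong, corrections are welcome.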

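dyld: the dynamic link editor

dyld is an operating-system-level component (the dynamic linker, not a dynamic library). Each time an app launches on iOS, it sets up the process's runtime environment, loads the required dynamic libraries into memory, and performs the related setup work.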

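What does dyld mainly do?

For the full story, see the article "dyld in Detail".
I have only carried over a small part of it here, since a rough idea of these steps should be enough for what follows.

Step 1: Set up the runtime environment.
Step 2: Load the shared cache. (dyld caches commonly used system libraries so that they do not have to be loaded again for every app launch.)
Step 3: Instantiate the main program.
Step 4: Load the inserted dynamic libraries.
Step 5: Link the main program.
Step 6: Link the inserted dynamic libraries.
Step 7: Perform weak symbol binding.
Step 8: Run the initializers.
Step 9: Find the entry point and return it.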

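Mach-O files

Mach-O is short for the Mach object file format, the format used for executables, dynamic libraries, and object code.
Every iOS app and every dynamic library contains a Mach-O file.
dyld can load Mach-O files into memory.
See: "Objective-C runtime mechanism (prequel): the Mach-O format"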

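Mach-O file structure

Again, just a brief summary carried over.

header: a summary of the Mach-O file, including the file type, the target CPU type, and the number of load commands.
load commands: instructions that tell dyld how to load the data in the Mach-O file.
data: the data region of the Mach-O file, holding both code and data. It consists of segments, each of which contains sections. dyld maps these segments into memory according to the corresponding load commands.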
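To make the header / load commands layout concrete, here is a minimal sketch that walks the load commands of a 64-bit Mach-O image held in a byte buffer. The two structs are local mirrors of the layout in <mach-o/loader.h> (assumed to match it field for field), and countLoadCommands and makeSyntheticImage are hypothetical helpers written for this post, not system APIs. The synthetic image contains only bare 8-byte load commands, which is enough to show the walk but is not a loadable binary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Local mirrors of the on-disk layout (assumed to match <mach-o/loader.h>).
struct MachHeader64 {
    uint32_t magic;       // 0xfeedfacf for a 64-bit Mach-O
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;    // e.g. 0x2 = MH_EXECUTE
    uint32_t ncmds;       // number of load commands
    uint32_t sizeofcmds;  // total size of all load commands, in bytes
    uint32_t flags;
    uint32_t reserved;
};

struct LoadCommand {
    uint32_t cmd;      // command type, e.g. 0x19 = LC_SEGMENT_64
    uint32_t cmdsize;  // size of this command, including these 8 bytes
};

constexpr uint32_t kMagic64 = 0xfeedfacf;
constexpr uint32_t kSegment64 = 0x19;

// Walks the load commands that follow the header and counts those of type `wanted`.
// Returns -1 if the buffer is not a well-formed, native-endian 64-bit Mach-O prefix.
int countLoadCommands(const std::vector<uint8_t>& file, uint32_t wanted) {
    MachHeader64 header;
    if (file.size() < sizeof(header)) return -1;
    std::memcpy(&header, file.data(), sizeof(header));
    if (header.magic != kMagic64) return -1;

    std::size_t offset = sizeof(header);
    int count = 0;
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        LoadCommand lc;
        if (offset + sizeof(lc) > file.size()) return -1;
        std::memcpy(&lc, file.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || offset + lc.cmdsize > file.size()) return -1;
        if (lc.cmd == wanted) ++count;
        offset += lc.cmdsize;  // each command says how far away the next one is
    }
    return count;
}

// Builds a tiny synthetic image: a header followed by bare 8-byte load commands.
std::vector<uint8_t> makeSyntheticImage(const std::vector<uint32_t>& cmds) {
    MachHeader64 header{};
    header.magic = kMagic64;
    header.filetype = 0x2;
    header.ncmds = static_cast<uint32_t>(cmds.size());
    header.sizeofcmds = static_cast<uint32_t>(cmds.size() * sizeof(LoadCommand));

    std::vector<uint8_t> file(sizeof(header) + header.sizeofcmds);
    std::memcpy(file.data(), &header, sizeof(header));
    std::size_t offset = sizeof(header);
    for (uint32_t cmd : cmds) {
        LoadCommand lc{cmd, static_cast<uint32_t>(sizeof(LoadCommand))};
        std::memcpy(file.data() + offset, &lc, sizeof(lc));
        offset += sizeof(lc);
    }
    assert(offset == file.size());
    return file;
}
```

For example, countLoadCommands(makeSyntheticImage({0x19, 0x2, 0x19}), kSegment64) counts the two segment commands in the synthetic image. A real file would additionally need fat-binary and byte-order handling, which this sketch leaves out.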

_objc_init

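_objc_init is the entry point of the runtime. It is called from libSystem, an umbrella library made up of several system libraries, which is itself loaded dynamically by dyld.
You can see this call sequence by setting a symbolic breakpoint on _objc_init. How to add such a breakpoint is covered widely online, so I will not repeat it here.

The source of _objc_init is shown below.
See the objc4 source: https://opensource.apple.com/tarballs/objc4/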

/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

Here _dyld_objc_notify_register registers three callback functions.

This function is declared in dyld. See the dyld source: https://opensource.apple.com/tarballs/dyld/

//
// Note: only for use by objc runtime
// Register handlers to be called when objc images are mapped, unmapped, and initialized.
// Dyld will call back the "mapped" function with an array of images that contain an objc-image-info section.
// Those images that are dylibs will have the ref-counts automatically bumped, so objc will no longer need to
// call dlopen() on them to keep them from being unloaded.  During the call to _dyld_objc_notify_register(),
// dyld will call the "mapped" function with already loaded objc images.  During any later dlopen() call,
// dyld will also call the "mapped" function.  Dyld will call the "init" function when dyld would be called
// initializers in that image.  This is when objc calls any +load methods in that image.
//
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped);

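_dyld_objc_notify_mapped: called when objc images are mapped into memory
_dyld_objc_notify_init: called when an objc image is initialized
_dyld_objc_notify_unmapped: called when an objc image is removed from memory

While loading each image (think of an image as a dynamic library or an executable), dyld calls these registered callbacks at the corresponding points, which lets objc build up the runtime environment. Only once the runtime environment is ready does an iOS app have something to run in.

These callbacks are what I want to focus on.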
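The registration pattern can be sketched with a toy loader. This is not dyld's implementation: ToyLoader and every method on it are hypothetical names invented for illustration, images are just strings, and the only behavior modeled is what the header comment above describes (already-loaded images are reported to "mapped" at registration time, and later loads trigger "mapped" followed by "init").

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dyld's bookkeeping; none of these names exist in dyld.
class ToyLoader {
public:
    using ImageCallback = std::function<void(const std::string&)>;

    void registerObjcNotifiers(ImageCallback mapped, ImageCallback init,
                               ImageCallback unmapped) {
        assert(!mapped_ && "only one client (the ObjC runtime) registers");
        mapped_ = std::move(mapped);
        init_ = std::move(init);
        unmapped_ = std::move(unmapped);
        // Images that were loaded before registration are reported right away.
        for (const std::string& image : loaded_) mapped_(image);
    }

    void load(const std::string& image) {
        loaded_.push_back(image);
        if (mapped_) mapped_(image);  // the image is now in memory
        if (init_) init_(image);      // the image's initializers are about to run
    }

    void unload(const std::string& image) {
        auto it = std::find(loaded_.begin(), loaded_.end(), image);
        if (it == loaded_.end()) return;
        loaded_.erase(it);
        if (unmapped_) unmapped_(image);
    }

private:
    std::vector<std::string> loaded_;
    ImageCallback mapped_, init_, unmapped_;
};

// Records the order in which the three callbacks fire.
std::vector<std::string> runLoaderDemo() {
    std::vector<std::string> log;
    ToyLoader loader;
    loader.load("libSystem");  // loaded before the runtime registers
    loader.registerObjcNotifiers(
        [&](const std::string& i) { log.push_back("mapped " + i); },
        [&](const std::string& i) { log.push_back("init " + i); },
        [&](const std::string& i) { log.push_back("unmapped " + i); });
    loader.load("MyApp");
    loader.load("Plugin");
    loader.unload("Plugin");
    return log;
}
```

The point of the sketch is that the callbacks fire once per image rather than once per process, which is why the runtime has to be prepared for repeated calls.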

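The _dyld_objc_notify_mapped callback

On the objc side this is the map_images function.
map_images calls _read_images, which does a lot of work: it reads the objc sections of the Mach-O file and initializes the runtime's in-memory structures from their contents.
Judging by the comments in the source, it roughly does the following:

Discover classes: reads the class list from the __objc_classlist section and stores the classes in a map. This step also checks for duplicate class names and logs a warning when it finds one.
Fix up remapped classes: reads the class refs from the __objc_classrefs section and remaps them.
Fix up @selector references: reads the selector refs from the __objc_selrefs section.
Fix up old objc_msgSend_fixup call sites: reads the message refs from the __objc_msgrefs section and repairs old vtable dispatch call sites.
Discover protocols: reads the protocols from the __objc_protolist section.
Fix up @protocol references: reads the protocol refs from the __objc_protorefs section and remaps them.
Realize non-lazy classes (for +load methods and static instances): reads the non-lazy classes from the __objc_nlclslist section and initializes their objc_class structures: it realizes the metaclass and superclass, sets the isa pointer, loads the method, property, and protocol lists, and attaches categories.
Discover categories: reads the categories from the __objc_catlist section. If a category's target class has already been realized (in the previous step), the class's method lists are rebuilt, with the category's methods placed in front of the class's own.

Setting the environment variable OBJC_PRINT_IMAGE_TIMES to YES in the Xcode scheme prints the time spent in each of these steps to the console.

The last step, Discover categories, came with a claim: a category's methods are placed in front of the class's own methods. Where does that come from?
Look at this source: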

    void attachLists(List* const * addedLists, uint32_t addedCount) {
        if (addedCount == 0) return;
        if (hasArray()) {
            // many lists -> many lists
            uint32_t oldCount = array()->count;
            uint32_t newCount = oldCount + addedCount;
            setArray((array_t *)realloc(array(), array_t::byteSize(newCount)));
            array()->count = newCount;
            memmove(array()->lists + addedCount, array()->lists, 
                    oldCount * sizeof(array()->lists[0]));
            memcpy(array()->lists, addedLists, 
                   addedCount * sizeof(array()->lists[0]));
        }
        ...
    }

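The key here is the pair of calls to memmove and memcpy.

// Copies __len bytes from __src to __dst; the two regions may overlap
void    *memmove(void *__dst, const void *__src, size_t __len);

// Copies __n bytes from __src to __dst; the two regions must not overlap
void    *memcpy(void *__dst, const void *__src, size_t __n);

So objc first shifts the class's existing method lists back by addedCount slots, then copies addedLists into the slots freed up at the front. addedLists holds the method lists defined in categories.
The conclusion: methods defined in a category end up in front of the class's own methods, which creates the illusion that the category's method "overrides" the original. The original method is still there; method lookup simply finds the category's version first, so the effect is the same as an override.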
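The shift-then-copy step and its effect on lookup can be reproduced in isolation. The sketch below is a simplified model, not the runtime's data structures: MethodList, ListArray, toyAttachLists, and findOwner are hypothetical names, a "method" is just a selector string, and lookup is a plain front-to-back scan standing in for the real method search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

struct MethodList {
    std::string owner;                   // the class or category that defined these methods
    std::vector<std::string> selectors;
};

struct ListArray {
    const MethodList** lists = nullptr;  // heap array of pointers to method lists
    std::size_t count = 0;
};

// Same shape as the "many lists -> many lists" branch: grow, shift old entries back,
// then copy the new entries into the front.
void toyAttachLists(ListArray& array, const MethodList* const* added, std::size_t addedCount) {
    if (addedCount == 0) return;
    std::size_t oldCount = array.count;
    std::size_t newCount = oldCount + addedCount;
    auto grown = static_cast<const MethodList**>(
        std::realloc(array.lists, newCount * sizeof(array.lists[0])));
    assert(grown != nullptr);
    // Source and destination overlap inside `grown`, so memmove is required.
    std::memmove(grown + addedCount, grown, oldCount * sizeof(grown[0]));
    // `added` is a separate buffer, so memcpy is sufficient.
    std::memcpy(grown, added, addedCount * sizeof(grown[0]));
    array.lists = grown;
    array.count = newCount;
}

// Front-to-back scan: the first list that contains the selector wins.
std::string findOwner(const ListArray& array, const std::string& selector) {
    for (std::size_t i = 0; i < array.count; ++i) {
        for (const std::string& s : array.lists[i]->selectors) {
            if (s == selector) return array.lists[i]->owner;
        }
    }
    return "";
}

// Attaches the class's own list first, then a category's, and reports who answers.
std::string ownerAfterAttachingCategory(const std::string& selector) {
    static const MethodList base{"Person", {"name", "description"}};
    static const MethodList category{"Person(Debug)", {"description"}};
    ListArray array;
    const MethodList* first[] = {&base};
    toyAttachLists(array, first, 1);
    const MethodList* later[] = {&category};
    toyAttachLists(array, later, 1);
    std::string owner = findOwner(array, selector);
    std::free(array.lists);
    return owner;
}
```

In this model, "description" resolves to the category's list because it was attached last and therefore sits first, while "name", which only the class defines, still resolves to the class.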

The _dyld_objc_notify_init callback

On the objc side this is the load_images function.

/***********************************************************************
* load_images
* Process +load in the given images which are being mapped in by dyld.
*
* Locking: write-locks runtimeLock and loadMethodLock
**********************************************************************/
void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

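As you can see, this is where +load methods get called.

prepare_load_methods does two things:

First, it walks all non-lazy classes and calls schedule_class_load on each one. If a class has a +load method, the class and the method's IMP are added to the loadable_classes list to await calling.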

/***********************************************************************
* prepare_load_methods
* Schedule +load for classes in this image, any un-+load-ed 
* superclasses in other images, and any categories in this image.
**********************************************************************/
// Recursively schedule +load for cls and any un-+load-ed superclasses.
// cls must already be connected.
static void schedule_class_load(Class cls)
{
    if (!cls) return;
    assert(cls->isRealized());  // _read_images should realize

    // Skip classes that were already handled, so each class's +load is added only once
    if (cls->data()->flags & RW_LOADED) return; 
    // Ensure superclass-first ordering
    schedule_class_load(cls->superclass);

    add_class_to_loadable_list(cls);
    cls->setInfo(RW_LOADED); // mark the class as handled
}

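This code shows that before a class is checked for a +load method, its superclass is processed first. If the superclass has a +load method, that one is added to the list first, which guarantees that a superclass's +load is called before the subclass's.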
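The superclass-first recursion is easy to check with a toy model. In the sketch below ToyClass and scheduleClassLoad are hypothetical stand-ins: the scheduled flag plays the role of RW_LOADED, and the hasLoad test stands in for the check that add_class_to_loadable_list performs.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyClass {
    std::string name;
    ToyClass* superclass;
    bool hasLoad;    // whether the class implements +load
    bool scheduled;  // plays the role of the RW_LOADED flag
};

// Recursively schedules the superclass chain before the class itself.
void scheduleClassLoad(ToyClass* cls, std::vector<std::string>& loadable) {
    if (!cls) return;
    assert(cls->superclass != cls);  // a class cannot be its own superclass
    if (cls->scheduled) return;      // already handled: never queue a class twice
    scheduleClassLoad(cls->superclass, loadable);
    if (cls->hasLoad) loadable.push_back(cls->name);
    cls->scheduled = true;
}

// Visits the classes subclass-first on purpose, to show the order is still fixed up.
std::vector<std::string> scheduleDemo() {
    ToyClass root{"Base", nullptr, true, false};
    ToyClass middle{"Middle", &root, false, false};  // no +load of its own
    ToyClass leaf{"Leaf", &middle, true, false};
    std::vector<std::string> loadable;
    for (ToyClass* cls : {&leaf, &middle, &root}) {
        scheduleClassLoad(cls, loadable);
    }
    return loadable;
}
```

Even though Leaf is visited first, Base ends up ahead of it in the list, and Middle, which has no +load, is skipped without breaking the chain.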

/***********************************************************************
* add_class_to_loadable_list
* Class cls has just become connected. Schedule it for +load if
* it implements a +load method.
**********************************************************************/
void add_class_to_loadable_list(Class cls)
{
    IMP method;

    loadMethodLock.assertLocked();

    method = cls->getLoadMethod();
    if (!method) return;  // Don't bother if cls has no +load method
    ...
    // Grow loadable_classes when it is full
    if (loadable_classes_used == loadable_classes_allocated) {
        loadable_classes_allocated = loadable_classes_allocated*2 + 16;
        loadable_classes = (struct loadable_class *)
            realloc(loadable_classes,
                              loadable_classes_allocated *
                              sizeof(struct loadable_class));
    }
    // Store the class and the IMP of its +load method in the list
    loadable_classes[loadable_classes_used].cls = cls;
    loadable_classes[loadable_classes_used].method = method;
    loadable_classes_used++;
}

Second, it walks all categories and checks whether each one has a +load method. If so, the category is added to the loadable_categories list to await calling.

/***********************************************************************
* add_category_to_loadable_list
* Category cat's parent class exists and the category has been attached
* to its class. Schedule this category for +load after its parent class
* becomes connected and has its own +load method called.
**********************************************************************/
void add_category_to_loadable_list(Category cat)
{
    IMP method;

    loadMethodLock.assertLocked();

    method = _category_getLoadMethod(cat);

    // Don't bother if cat has no +load method
    if (!method) return;
    ...
    if (loadable_categories_used == loadable_categories_allocated) {
        loadable_categories_allocated = loadable_categories_allocated*2 + 16;
        loadable_categories = (struct loadable_category *)
            realloc(loadable_categories,
                              loadable_categories_allocated *
                              sizeof(struct loadable_category));
    }

    loadable_categories[loadable_categories_used].cat = cat;
    loadable_categories[loadable_categories_used].method = method;
    loadable_categories_used++;
}

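Comparing the two functions shows that class +load methods and category +load methods are kept in separate lists, and both lists are stored as arrays. So when several categories of one class all implement +load, every one of them is called in turn, in the order the categories were compiled.

call_load_methods: calls the +load methods stored in loadable_classes, then those stored in loadable_categories. A class's own +load therefore runs before any category's +load: only after all pending class +load methods have been called does the runtime call the ones defined in categories.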
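The classes-before-categories ordering can be sketched as a loop over two queues. This is a simplified model under stated assumptions: ToyLoadQueue and callLoadMethods are hypothetical names, entries are plain strings instead of (class, IMP) pairs, and "calling" a +load just records its name.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyLoadQueue {
    std::vector<std::string> classes;     // stands in for loadable_classes
    std::vector<std::string> categories;  // stands in for loadable_categories
};

// Drains all pending class +loads, then makes one pass over category +loads,
// and repeats while either queue has been refilled in the meantime.
std::vector<std::string> callLoadMethods(ToyLoadQueue& queue) {
    std::vector<std::string> order;
    do {
        while (!queue.classes.empty()) {
            std::vector<std::string> batch;
            batch.swap(queue.classes);  // detach the batch before "calling" it
            for (const std::string& name : batch) {
                order.push_back("+[" + name + " load]");
            }
        }
        std::vector<std::string> batch;
        batch.swap(queue.categories);
        for (const std::string& name : batch) {
            order.push_back("+[" + name + " load]");
        }
    } while (!queue.classes.empty() || !queue.categories.empty());
    assert(queue.classes.empty() && queue.categories.empty());
    return order;
}

// Queues two classes and two categories in their scheduling order and runs them.
std::vector<std::string> loadOrderDemo() {
    ToyLoadQueue queue;
    queue.classes = {"Base", "Leaf"};
    queue.categories = {"Leaf(A)", "Base(B)"};
    return callLoadMethods(queue);
}
```

Detaching each batch before running it mirrors why a loop is needed at all: in the real runtime a +load can cause more images to load, which may queue further entries while the current batch is still running.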

The _dyld_objc_notify_unmapped callback

On the objc side this is the unmap_image function. It mainly performs cleanup when an image is removed.

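A few takeaways about +load

+load is called before main. When it runs, other classes may not have finished loading, so the runtime environment is not yet complete.
Calls to +load are thread-safe: the runtime holds loadMethodLock while making them.
+load is called directly through the method's address; prepare_load_methods saves the address of each class's +load. This differs from +initialize, which is invoked through objc_msgSend.
The runtime calls the +load of every class and every category, exactly once each.
If several categories of the same class define +load, all of them are called.
There is no need to call [super load] inside +load, and you should not, because the runtime calls each +load itself.
Call order:
1. A superclass before its subclasses
2. A class before its categories
3. Earlier in compile order before later

Compile order means the order in which the files appear under Compile Sources.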

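Here are a few notes on +initialize as well, since the two are so often compared.

A few takeaways about +initialize

+initialize runs the first time a message is sent to a class through objc_msgSend. If a class is never used, its +initialize is never called. +load is different: the runtime calls it whether or not the class is ever used.
+initialize is invoked through objc_msgSend, which is fundamentally different from +load, where the runtime looks up the implementation's address and calls it directly.
If both a superclass and a subclass implement +initialize, the superclass's runs before the subclass's.
If the subclass does not implement +initialize but the superclass does, the subclass inherits the superclass's implementation and it runs once for the subclass. Since it has already run once for the superclass itself, the superclass's +initialize ends up being called twice.
If both a class and one of its categories implement +initialize, the runtime calls the category's and not the class's, because category methods are placed in front of the class's own in the method list.
If several categories implement +initialize, the one from the category that comes last in compile order is called.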
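The "superclass's +initialize runs twice" case is the least intuitive of these, so here is a minimal sketch of it. It models only the behavior described above and is not the runtime's code: InitClass, resolveInitialize, ensureInitialized, and sendMessage are hypothetical names, and walking up the superclass chain stands in for ordinary message dispatch.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct InitClass {
    std::string name;
    InitClass* superclass;
    bool implementsInitialize;  // whether this class defines its own +initialize
    bool initialized;           // set once +initialize has been sent to this class
};

// Finds the class whose +initialize implementation a message send would reach.
const InitClass* resolveInitialize(const InitClass* cls) {
    for (; cls; cls = cls->superclass) {
        if (cls->implementsInitialize) return cls;
    }
    return nullptr;
}

// Sends +initialize to the superclass chain first, then to the class itself.
void ensureInitialized(InitClass* cls, std::vector<std::string>& log) {
    if (!cls || cls->initialized) return;
    ensureInitialized(cls->superclass, log);
    cls->initialized = true;
    if (const InitClass* impl = resolveInitialize(cls)) {
        log.push_back(impl->name + " implementation, receiver " + cls->name);
    }
}

// Every message send first makes sure the receiver has been initialized.
void sendMessage(InitClass* cls, std::vector<std::string>& log) {
    assert(cls != nullptr);
    ensureInitialized(cls, log);
    // ... the requested method would be dispatched here ...
}

// Parent implements +initialize, Child does not; Child is messaged twice.
std::vector<std::string> initializeDemo() {
    InitClass parent{"Parent", nullptr, true, false};
    InitClass child{"Child", &parent, false, false};
    std::vector<std::string> log;
    sendMessage(&child, log);
    sendMessage(&child, log);  // already initialized, so nothing more is logged
    return log;
}
```

The log shows Parent's implementation running once with Parent as the receiver and once with Child as the receiver, which is why +initialize implementations commonly start with a check such as self == [Parent class].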

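Summary

In _objc_init, the runtime registers three callbacks through _dyld_objc_notify_register. Whenever dyld loads an image (a binary file) into memory, the corresponding callback runs, so each of the three may be called many times.
Once dyld has finished preparing the runtime environment, it calls main and the app starts up.

References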

https://www.dllhook.com/post/238.html
https://blog.csdn.net/u013378438/article/details/80353267
https://blog.csdn.net/u013378438/article/details/86614815
