How iOS Class Is Implemented: Structure Analysis

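This article addresses the following questions:

1. What is Class
2. The memory layout of Class
3. The design philosophy of class_rw_t and class_ro_t
4. The relationship between categories and class_rw_t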

Reading the source (version objc4-781.2)

Open objc-private.h and you will find that Class is a pointer to a struct:

typedef struct objc_class *Class;

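Searching the source for "struct objc_class" turns up definitions in five header files (see the figure). The one in objc-runtime-new.h is the definition that takes effect under Objective-C 2.0; the others are fenced off by preprocessor conditions.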

[Figure: search results for "struct objc_class" in the source (searchobjc_class.jpg)]

A simplified definition of the objc_class struct:

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    class_rw_t *data() const {
        return bits.data();
    }
    ...
};

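objc_class inherits from objc_object. (C++ extends the C struct: a struct may define member functions and inherit from other types, and its members and base classes are public by default, which is the difference from a C++ class, where the default is private.) Next, the definition of objc_object: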

struct objc_object {
private:
    isa_t isa;

public:

    // ISA() assumes this is NOT a tagged pointer object
    Class ISA();

    // rawISA() assumes this is NOT a tagged pointer object or a non pointer ISA
    Class rawISA();

    // getIsa() allows this to be a tagged pointer object
    Class getIsa();
    ...
};

So for now we can think of the struct as looking roughly like this:

struct objc_class {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    class_rw_t *data() const {
        return bits.data();
    }
    ...
    Class ISA();

    // rawISA() assumes this is NOT a tagged pointer object or a non pointer ISA
    Class rawISA();

    // getIsa() allows this to be a tagged pointer object
    Class getIsa();
    ...
};

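The member below is private, so code inside the class touches isa only through the family of functions that objc_object wraps around it. That is plain encapsulation.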

private:
    isa_t isa;

Let's walk through it from top to bottom:

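The struct declares a superclass pointer of type Class, an object of type cache_t, and an object of type class_data_bits_t. The wording matters here: in Objective-C an object is always handled through a pointer, but a struct member is stored inline. A struct pointer takes 8 bytes on a 64-bit system, whereas a struct object takes the sum of the sizes of its members plus whatever padding the alignment rules require. On iOS, instance sizes are aligned to 8 bytes and the allocator hands out memory in 16-byte units, for the sake of access efficiency.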
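To make the pointer-versus-object distinction concrete, here is a minimal sketch. FakeObject, FakeClass, and the helper functions are hypothetical stand-ins, not runtime types, and the byte counts in the comments assume a 64-bit (LP64) target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; these are NOT the runtime's real types.
struct FakeObject {
    void *isa;              // 8 bytes on LP64
};

// A struct inherits publicly by default, so FakeClass exposes isa as well.
struct FakeClass : FakeObject {
    void    *superclass;    // 8 bytes
    uint32_t flags;         // 4 bytes
    uint16_t occupied;      // 2 bytes, followed by 2 bytes of tail padding
};

// Size of a pointer to the struct versus size of the struct itself.
inline std::size_t pointerSize() { return sizeof(FakeClass *); }
inline std::size_t objectSize()  { return sizeof(FakeClass); }

// Round a size up to the next multiple of `align` (which must be a power of two).
inline std::size_t alignUp(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

On an LP64 target pointerSize() is 8 while objectSize() is 24 (22 bytes of members padded to a multiple of 8), and alignUp(24, 16) gives 32, which models an allocator that works in 16-byte units.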

cache_t structure

A simplified definition of cache_t follows. All data members are kept; the member functions are omitted.

struct cache_t {
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
    explicit_atomic<struct bucket_t *> _buckets;
    explicit_atomic<mask_t> _mask;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    explicit_atomic<uintptr_t> _maskAndBuckets;
    mask_t _mask_unused;
    
    // How much the mask is shifted by.
    static constexpr uintptr_t maskShift = 48;
    
    // Additional bits after the mask which must be zero. msgSend
    // takes advantage of these additional bits to construct the value
    // `mask << 4` from `_maskAndBuckets` in a single instruction.
    static constexpr uintptr_t maskZeroBits = 4;
    
    // The largest mask value we can store.
    static constexpr uintptr_t maxMask = ((uintptr_t)1 << (64 - maskShift)) - 1;
    
    // The mask applied to `_maskAndBuckets` to retrieve the buckets pointer.
    static constexpr uintptr_t bucketsMask = ((uintptr_t)1 << (maskShift - maskZeroBits)) - 1;
    
    // Ensure we have enough bits for the buckets pointer.
    static_assert(bucketsMask >= MACH_VM_MAX_ADDRESS, "Bucket field doesn't have enough bits for arbitrary pointers.");
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    // _maskAndBuckets stores the mask shift in the low 4 bits, and
    // the buckets pointer in the remainder of the value. The mask
    // shift is the value where (0xffff >> shift) produces the correct
    // mask. This is equal to 16 - log2(cache_size).
    explicit_atomic<uintptr_t> _maskAndBuckets;
    mask_t _mask_unused;

    static constexpr uintptr_t maskBits = 4;
    static constexpr uintptr_t maskMask = (1 << maskBits) - 1;
    static constexpr uintptr_t bucketsMask = ~maskMask;
#else
#error Unknown cache mask storage type.
#endif
    
#if __LP64__
    uint16_t _flags;
#endif
    uint16_t _occupied;

public:
    ...
};

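That is still fairly long, so let's break it down. It contains conditional-compilation directives and some static variables. Only one branch of the conditional compilation is ever compiled, and static members live in static storage, so the struct reserves no space for them. How much memory does a cache_t object occupy, then? Let's trim the structure once more: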

struct cache_t {
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
    explicit_atomic<struct bucket_t *> _buckets;
    explicit_atomic<mask_t> _mask;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    explicit_atomic<uintptr_t> _maskAndBuckets;
    mask_t _mask_unused;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    explicit_atomic<uintptr_t> _maskAndBuckets;
    mask_t _mask_unused;
#else
#error Unknown cache mask storage type.
#endif
    
#if __LP64__
    uint16_t _flags;
#endif
    uint16_t _occupied;

public:
    ...
};

explicit_atomic is a struct template derived from std::atomic<T>; its size is the size of its template argument T.

template <typename T>
struct explicit_atomic : public std::atomic<T> {
    explicit explicit_atomic(T initial) noexcept : std::atomic<T>(std::move(initial)) {}
    operator T() const = delete;
    
    T load(std::memory_order order) const noexcept {
        return std::atomic<T>::load(order);
    }
    void store(T desired, std::memory_order order) noexcept {
        std::atomic<T>::store(desired, order);
    }
    static explicit_atomic<T> *from_pointer(T *ptr) {
        static_assert(sizeof(explicit_atomic<T> *) == sizeof(T *),
                      "Size of atomic must match size of original");
        explicit_atomic<T> *atomic = (explicit_atomic<T> *)ptr;
        ASSERT(atomic->is_lock_free());
        return atomic;
    }
};

And what is mask_t? A 32-bit unsigned integer, 4 bytes:

typedef uint32_t mask_t;

How is uintptr_t defined? On a 64-bit system it takes 8 bytes:

typedef unsigned long int uintptr_t;

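So on a 64-bit system the size of cache_t is 8 + 4 + 2 + 2 = 16 bytes: 8 for _maskAndBuckets (or for the _buckets pointer in the OUTLINED branch), 4 for the mask_t field, and 2 each for _flags and _occupied.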
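The arithmetic above can be checked with a small mock. FakeCache is a hypothetical struct that only mirrors the data members of the HIGH_16 and LOW_4 branches, with std::atomic standing in for explicit_atomic; the 16-byte result assumes an LP64 target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t mask_t;

// Hypothetical mirror of cache_t's data members on LP64; not the real type.
struct FakeCache {
    std::atomic<uintptr_t> _maskAndBuckets;  // 8 bytes
    mask_t   _mask_unused;                   // 4 bytes
    uint16_t _flags;                         // 2 bytes
    uint16_t _occupied;                      // 2 bytes

    // Static members live outside the object and add nothing to sizeof.
    static constexpr uintptr_t maskShift = 48;
};

// Size of one FakeCache object, excluding any static members.
inline std::size_t fakeCacheSize() { return sizeof(FakeCache); }
```

On a 64-bit build fakeCacheSize() comes out at 16, and removing the static constexpr member leaves that number unchanged.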

class_data_bits_t structure

It has a single data member, so it takes 8 bytes:

struct class_data_bits_t {
    uintptr_t bits;
};

Next, look at this const member function of objc_class, whose return value is a class_rw_t pointer:

class_rw_t *data() const {
    return bits.data();
}
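The comment "class_rw_t * plus custom rr/alloc flags" says the single word stores both a pointer and a few flag bits. The sketch below shows one way such packing can work. FakeRW, FakeDataBits, and kDataMask are hypothetical names; the mask value is an assumption modelled on the 64-bit FAST_DATA_MASK in objc-runtime-new.h and should be checked against the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for class_rw_t; 8-byte aligned so the low 3 bits are free.
struct alignas(8) FakeRW { uint32_t flags; };

// Assumed mask: keep bits 3..46 for the pointer, leave the low 3 bits for flags.
constexpr uintptr_t kDataMask = 0x00007ffffffffff8ULL;

struct FakeDataBits {
    uintptr_t bits;

    // Store a pointer while preserving the flag bits packed around it.
    void setData(FakeRW *rw) {
        assert(((uintptr_t)rw & ~kDataMask) == 0);  // pointer must fit inside the mask
        bits = (bits & ~kDataMask) | (uintptr_t)rw;
    }

    // Recover the pointer by masking the flag bits away.
    FakeRW *data() const { return (FakeRW *)(bits & kDataMask); }

    // Read one of the flag bits that share the word with the pointer.
    bool flag(unsigned bit) const { return (bits >> bit) & 1; }
};
```

Setting the pointer leaves the low flag bits untouched, and data() returns the pointer without them, which is why callers never read bits directly.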

class_rw_t structure

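A simplified definition of class_rw_t follows. This is finally the core of the matter: the four functions below return, in order, a pointer to a class_ro_t struct and objects of type method_array_t, property_array_t, and protocol_array_t.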

struct class_rw_t {
    const class_ro_t *ro() const {...}
    const method_array_t methods() const {...}
    const property_array_t properties() const {...}
    const protocol_array_t protocols() const {...}
};

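Setting class_ro_t aside for now and reading on, we find that all three array types derive from the class template list_array_tt, which implements the management functions for attaching, storing, and freeing lists.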

class method_array_t : public list_array_tt<method_t, method_list_t> 
{
    ...
};

class property_array_t : public list_array_tt<property_t, property_list_t> 
{
    ...
};

class protocol_array_t : public list_array_tt<protocol_ref_t, protocol_list_t> 
{
    ...
};

The part worth reading closely is this:

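The attachLists function of the class template is the core function behind Objective-C's dynamism. It has three branches. If the container already holds an array of lists, it grows the array with realloc, uses memmove to shift the old lists back to array()->lists + addedCount, and then uses memcpy to copy addedLists into the front. Otherwise, if there is no list yet and exactly one list is being added, it assigns list = addedLists[0] directly. In the remaining case it turns a single list into an array, with the added lists first and the old list last. In every branch the new lists end up in front of the old ones, so at the data-structure level methods, properties, and protocols all support dynamic updates.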

template <typename Element, typename List>
class list_array_tt {
    void attachLists(List* const * addedLists, uint32_t addedCount) {
        if (addedCount == 0) return;

        if (hasArray()) {
            // many lists -> many lists
            uint32_t oldCount = array()->count;
            uint32_t newCount = oldCount + addedCount;
            setArray((array_t *)realloc(array(), array_t::byteSize(newCount)));
            array()->count = newCount;
            memmove(array()->lists + addedCount, array()->lists, 
                    oldCount * sizeof(array()->lists[0]));
            memcpy(array()->lists, addedLists, 
                   addedCount * sizeof(array()->lists[0]));
        }
        else if (!list  &&  addedCount == 1) {
            // 0 lists -> 1 list
            list = addedLists[0];
        } 
        else {
            // 1 list -> many lists
            List* oldList = list;
            uint32_t oldCount = oldList ? 1 : 0;
            uint32_t newCount = oldCount + addedCount;
            setArray((array_t *)malloc(array_t::byteSize(newCount)));
            array()->count = newCount;
            if (oldList) array()->lists[addedCount] = oldList;
            memcpy(array()->lists, addedLists, 
                   addedCount * sizeof(array()->lists[0]));
        }
    }
};
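The ordering that attachLists produces is easier to see in a simplified model. The sketch below replaces the realloc/memmove/memcpy bookkeeping with a std::vector and is only a model of the ordering, not of the real memory management; MethodList, FakeListArray, and firstListContaining are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one method list is just a list of selector names.
using MethodList = std::vector<std::string>;

struct FakeListArray {
    std::vector<const MethodList *> lists;

    // Same ordering as attachLists: added lists go in front, old lists shift back.
    void attachLists(const MethodList *const *added, std::size_t addedCount) {
        if (addedCount == 0) return;
        assert(added != nullptr);
        lists.insert(lists.begin(), added, added + addedCount);
    }

    // Walk the lists front to back; return the index of the first list that has `sel`.
    int firstListContaining(const std::string &sel) const {
        for (std::size_t i = 0; i < lists.size(); i++)
            for (const auto &name : *lists[i])
                if (name == sel) return (int)i;
        return -1;
    }
};
```

Attaching a base list and then a second list that repeats one of its selectors makes firstListContaining return 0 for the repeated selector: the list attached later shadows the earlier one.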

class_ro_t structure

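Does this look familiar? It should: its lists are the oldList mentioned above. It likewise holds methods, properties, and protocols, and in addition the instance variables (ivars).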

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif

    const uint8_t * ivarLayout;
    
    const char * name;
    method_list_t * baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;
};

How do we know that the properties, methods, and other members in class_ro_t are the old lists? Here is another piece of source:

/***********************************************************************
* realizeClassWithoutSwift
* Performs first-time initialization on class cls, 
* including allocating its read-write data.
* Does not perform any Swift-side initialization.
* Returns the real class structure for the class. 
* Locking: runtimeLock must be write-locked by the caller
**********************************************************************/
static Class realizeClassWithoutSwift(Class cls, Class previously)
{
    runtimeLock.assertLocked();

    class_rw_t *rw;
    Class supercls;
    Class metacls;

    if (!cls) return nil;
    if (cls->isRealized()) return cls;
    ASSERT(cls == remapClass(cls));

    // fixme verify class is not in an un-dlopened part of the shared cache?

    auto ro = (const class_ro_t *)cls->data();
    auto isMeta = ro->flags & RO_META;
    if (ro->flags & RO_FUTURE) {
        // This was a future class. rw data is already allocated.
        rw = cls->data();
        ro = cls->data()->ro();
        ASSERT(!isMeta);
        cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
    } else {
        // Normal class. Allocate writeable class data.
        rw = objc::zalloc<class_rw_t>();
        rw->set_ro(ro);
        rw->flags = RW_REALIZED|RW_REALIZING|isMeta;
        cls->setData(rw);
    }
    ...
}
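The key move in the "Normal class" branch is that the class's data pointer first refers to the ro and afterwards to a freshly allocated rw that holds that ro. A minimal sketch of just that pointer swap, with hypothetical types (FakeRO, FakeRW, FakeClass), a hypothetical flag value, and plain new instead of the runtime's zero-filling allocator:

```cpp
#include <cassert>
#include <cstdint>

struct FakeRO { uint32_t flags; const char *name; };   // compile-time data
struct FakeRW { uint32_t flags; const FakeRO *ro; };   // runtime data

constexpr uint32_t kRealized = 1u << 31;  // hypothetical flag value

struct FakeClass {
    void *data = nullptr;   // points at a FakeRO before realization, a FakeRW after
    bool  realized = false;
};

// Wrap the class's ro in a new rw and repoint data at it. The caller owns the rw.
inline FakeRW *realize(FakeClass &cls) {
    assert(cls.data != nullptr);
    if (cls.realized) return static_cast<FakeRW *>(cls.data);
    auto *ro = static_cast<const FakeRO *>(cls.data);
    auto *rw = new FakeRW{kRealized, ro};
    cls.data = rw;
    cls.realized = true;
    return rw;
}
```

After realize() the class reaches its original ro only through rw->ro, and calling realize() a second time returns the same rw, mirroring the early return on isRealized().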

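Apple's comment is clear: "Performs first-time initialization on class cls". This function runs the first time a class is realized. The class's initial information is stored in class_ro_t, and before realization cls->data() actually points at that ro. The function allocates an rw, stores the ro in it with rw->set_ro(ro), and repoints the class at the rw with cls->setData(rw), so from then on bits.data() returns the rw pointer. And what is bits? It should look familiar; looking back, it is the class_data_bits_t member: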

struct objc_class {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    class_rw_t *data() const {
        return bits.data();
    }
    ...
};

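Here is one more piece of source, unabridged. In this version the runtime reaches extAlloc through extAllocIfNeeded, the first time a class needs writable lists (for example when a category is attached). extAlloc takes the method_list_t, property_list_t, and protocol_list_t out of ro and calls attachLists to merge them into the class_rw_ext_t (rwe) that hangs off rw.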

class_rw_ext_t *class_rw_t::extAlloc(const class_ro_t *ro, bool deepCopy)
{
    runtimeLock.assertLocked();

    auto rwe = objc::zalloc<class_rw_ext_t>();

    rwe->version = (ro->flags & RO_META) ? 7 : 0;

    method_list_t *list = ro->baseMethods();
    if (list) {
        if (deepCopy) list = list->duplicate();
        rwe->methods.attachLists(&list, 1);
    }

    // See comments in objc_duplicateClass
    // property lists and protocol lists historically
    // have not been deep-copied
    //
    // This is probably wrong and ought to be fixed some day
    property_list_t *proplist = ro->baseProperties;
    if (proplist) {
        rwe->properties.attachLists(&proplist, 1);
    }

    protocol_list_t *protolist = ro->baseProtocols;
    if (protolist) {
        rwe->protocols.attachLists(&protolist, 1);
    }

    set_ro_or_rwe(rwe, ro);
    return rwe;
}

The design philosophy of class_rw_t and class_ro_t

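Why does Apple define two such similar structs to implement Class? ro stands for read-only and rw for read-write. class_ro_t is a compile-time product: the properties, methods, protocols, and instance variables declared in a class's source file are already stored in class_ro_t at compile time. class_rw_t is a runtime product: it is designed to support the dynamism of Class, and at runtime the properties, protocols, and methods in class_ro_t are merged into the corresponding writable data structures.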
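One way to picture the split is a read-only block that is shared as-is until something needs to write, at which point a writable extension is created and seeded from it. The sketch below is a simplified model with hypothetical types (FakeRO, FakeRWExt, FakeRW); the real class_rw_t packs this into a single tagged pointer rather than a std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct FakeRO { std::vector<std::string> baseMethods; };   // compile-time, never modified

struct FakeRWExt {
    const FakeRO *ro;
    std::vector<std::vector<std::string>> methods;  // front = highest priority
};

struct FakeRW {
    const FakeRO *ro_;
    std::unique_ptr<FakeRWExt> ext_;   // stays empty until something needs to write

    explicit FakeRW(const FakeRO *ro) : ro_(ro) { assert(ro != nullptr); }
    bool hasExt() const { return ext_ != nullptr; }

    // Create the writable extension on first use and seed it with the ro's base list.
    FakeRWExt *extAllocIfNeeded() {
        if (!ext_) {
            ext_.reset(new FakeRWExt{ro_, {}});
            if (!ro_->baseMethods.empty()) ext_->methods.push_back(ro_->baseMethods);
        }
        return ext_.get();
    }

    // Read path: use the extension if it exists, otherwise fall back to the ro.
    std::size_t methodListCount() const {
        if (ext_) return ext_->methods.size();
        return ro_->baseMethods.empty() ? 0 : 1;
    }
};
```

A class that is never modified at runtime keeps reading straight from the read-only data; only the first write pays for the copy.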

Categories really can add properties

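What about categories? The source is long, but I could not resist pasting all of it. Before attachCategories attaches anything, it always executes auto rwe = cls->data()->extAllocIfNeeded();. extAllocIfNeeded() calls extAlloc() when no rwe exists yet, and extAlloc() copies the lists from ro into rwe. The category lists are then attached in front of them, which is why we always say that a category method with the same name as a method of the original class is the one that gets called. By the same logic, among several categories of one class, the same-named method from the category loaded last is always found first.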

static void
attachCategories(Class cls, const locstamped_category_t *cats_list, uint32_t cats_count,
                 int flags)
{
    if (slowpath(PrintReplacedMethods)) {
        printReplacements(cls, cats_list, cats_count);
    }
    if (slowpath(PrintConnecting)) {
        _objc_inform("CLASS: attaching %d categories to%s class '%s'%s",
                     cats_count, (flags & ATTACH_EXISTING) ? " existing" : "",
                     cls->nameForLogging(), (flags & ATTACH_METACLASS) ? " (meta)" : "");
    }

    /*
     * Only a few classes have more than 64 categories during launch.
     * This uses a little stack, and avoids malloc.
     *
     * Categories must be added in the proper order, which is back
     * to front. To do that with the chunking, we iterate cats_list
     * from front to back, build up the local buffers backwards,
     * and call attachLists on the chunks. attachLists prepends the
     * lists, so the final result is in the expected order.
     */
    constexpr uint32_t ATTACH_BUFSIZ = 64;
    method_list_t   *mlists[ATTACH_BUFSIZ];
    property_list_t *proplists[ATTACH_BUFSIZ];
    protocol_list_t *protolists[ATTACH_BUFSIZ];

    uint32_t mcount = 0;
    uint32_t propcount = 0;
    uint32_t protocount = 0;
    bool fromBundle = NO;
    bool isMeta = (flags & ATTACH_METACLASS);
    auto rwe = cls->data()->extAllocIfNeeded();

    for (uint32_t i = 0; i < cats_count; i++) {
        auto& entry = cats_list[i];

        method_list_t *mlist = entry.cat->methodsForMeta(isMeta);
        if (mlist) {
            if (mcount == ATTACH_BUFSIZ) {
                prepareMethodLists(cls, mlists, mcount, NO, fromBundle);
                rwe->methods.attachLists(mlists, mcount);
                mcount = 0;
            }
            mlists[ATTACH_BUFSIZ - ++mcount] = mlist;
            fromBundle |= entry.hi->isBundle();
        }

        property_list_t *proplist =
            entry.cat->propertiesForMeta(isMeta, entry.hi);
        if (proplist) {
            if (propcount == ATTACH_BUFSIZ) {
                rwe->properties.attachLists(proplists, propcount);
                propcount = 0;
            }
            proplists[ATTACH_BUFSIZ - ++propcount] = proplist;
        }

        protocol_list_t *protolist = entry.cat->protocolsForMeta(isMeta);
        if (protolist) {
            if (protocount == ATTACH_BUFSIZ) {
                rwe->protocols.attachLists(protolists, protocount);
                protocount = 0;
            }
            protolists[ATTACH_BUFSIZ - ++protocount] = protolist;
        }
    }

    if (mcount > 0) {
        prepareMethodLists(cls, mlists + ATTACH_BUFSIZ - mcount, mcount, NO, fromBundle);
        rwe->methods.attachLists(mlists + ATTACH_BUFSIZ - mcount, mcount);
        if (flags & ATTACH_EXISTING) flushCaches(cls);
    }

    rwe->properties.attachLists(proplists + ATTACH_BUFSIZ - propcount, propcount);

    rwe->protocols.attachLists(protolists + ATTACH_BUFSIZ - protocount, protocount);
}
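The comment about building the buffers backwards is worth tracing by hand. The sketch below models only the ordering logic: each category is reduced to a name, bufSize plays the role of ATTACH_BUFSIZ, and prepending to a vector stands in for attachLists. The function name attachInChunks is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: the front of `attached` is what a lookup would search first.
inline std::vector<std::string> attachInChunks(const std::vector<std::string> &cats,
                                               std::vector<std::string> attached,
                                               std::size_t bufSize) {
    assert(bufSize > 0);
    std::vector<std::string> buf(bufSize);
    std::size_t count = 0;

    // Prepend the filled tail of the buffer, the way attachLists prepends lists.
    auto flush = [&] {
        attached.insert(attached.begin(),
                        buf.end() - static_cast<std::ptrdiff_t>(count), buf.end());
        count = 0;
    };

    for (const auto &cat : cats) {      // iterate categories front to back
        if (count == bufSize) flush();  // buffer full: attach this chunk first
        buf[bufSize - ++count] = cat;   // fill the buffer back to front
    }
    if (count > 0) flush();
    return attached;
}
```

With categories A, B, C loaded in that order on top of a base list, the result is C, B, A, base regardless of the buffer size, so the category loaded last is searched first.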

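As shown above, the properties declared in a category are also added to the class's property list. So why do we say that a category cannot add properties, when they clearly have been added?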

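The reason is that a property is accessed through dot syntax, which ultimately calls a getter that reads an instance variable. For a property declared in a category, the getter and setter are not generated, and neither is a backing instance variable. Instance variables live in class_ro_t, and a category never adds any at runtime, so referring to the underscore-prefixed variable does not compile, and calling the missing accessor fails at runtime with an unrecognized selector. To make a property added in a category work, implement the getter and setter by hand and simulate the instance variable, typically with associated objects.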
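"Simulating" an instance variable means keeping the value somewhere outside the object, keyed by the object's address. In Objective-C this is normally done with objc_setAssociatedObject and objc_getAssociatedObject; the C++ sketch below is only a conceptual model of that side-table idea, and Person, associations, setNickname, and nickname are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

struct Person { int age; };   // hypothetical class whose layout is already fixed

using AssocKey = std::pair<uintptr_t, uintptr_t>;   // (object address, key address)

// Side table that stands in for the missing instance-variable storage.
inline std::map<AssocKey, std::string> &associations() {
    static std::map<AssocKey, std::string> table;
    return table;
}

static const char kNicknameKey = 0;   // only its address matters

inline AssocKey keyFor(const Person *p) {
    assert(p != nullptr);
    return std::make_pair((uintptr_t)p, (uintptr_t)&kNicknameKey);
}

// Hand-written "setter" and "getter" for a property the struct has no field for.
inline void setNickname(Person *p, const std::string &v) { associations()[keyFor(p)] = v; }

inline std::string nickname(const Person *p) {
    auto it = associations().find(keyFor(p));
    return it == associations().end() ? std::string() : it->second;
}
```

The object's size never changes: sizeof(Person) is still sizeof(int), and the extra value lives entirely in the side table. A real implementation also has to remove the entry when the object is destroyed, which this sketch leaves out.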

Summary

Class has many implementation details, and this article covers only its memory structure. The next one will discuss isa.

1. What is Class

A pointer to objc_class, a struct that inherits from objc_object.

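2. The memory layout of Class

isa

For an instance it points to the class object. On 32-bit it is a plain cls pointer; on 64-bit it also packs in extra information, such as whether there is a custom C++ destructor, whether there are associated objects, whether there are weak references, and whether a side table is used to store part of the reference count.

superclass

A pointer to the superclass.

cache

A cache of the methods that have already been called on the class.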

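class_rw_t

The struct that holds the dynamic data. Through the attachLists function it supports dynamic updates of methods, properties, and protocols.

class_ro_t

The static data. A class's initial information is stored in class_ro_t; at runtime the method_list_t, property_list_t, and protocol_list_t are taken out of ro and merged into rw with attachLists.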

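3. The design philosophy of class_rw_t and class_ro_t

class_rw_t is designed to support the dynamism of Class: at runtime the properties, protocols, and methods in class_ro_t are merged into the corresponding data structures. Note that this does not include instance variables, because adding or removing instance variables dynamically would corrupt the memory layout of existing instances.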

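4. The relationship between categories and class_rw_t

The attachCategories function merges the methods of categories into class_rw_t. By that point the lists from ro have already been merged into rw, and the category lists are placed in front of them, so a category method with the same name as a method of the original class is the one that gets called. By the same logic, among several categories of one class, the same-named method from the category loaded last is always found first.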