Redis Source Code Analysis: robj (redisObject)

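Earlier articles in this series covered several of Redis's data structures. In particular, the article on dict explained that Redis can be viewed as one big hashtable that stores key-value pairs. This article looks at redisObject (called robj from here on), the main structure Redis uses to store the value in a key-value pair.
The robj type is declared in server.h, and most of the code that operates on it lives in object.c.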

Fields in Detail

Compared with the other data structures, robj is fairly simple: it has only a few fields, and each has a clear meaning.

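typedef struct redisObject {
    unsigned type:4;       // data type: string, list, set, zset, hash, module, stream
    unsigned encoding:4;   // encoding (internal representation) of the value
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time).
                            * Redis keeps LRU/LFU information in these 24 bits: under LRU
                            * they hold the time of the last read/write (seconds); under LFU
                            * the high 16 bits hold the last access time (minutes) and the
                            * low 8 bits hold an approximate access counter. */
    int refcount;          // reference count
    void *ptr;             // pointer to the actual value; interpreted via type/encoding
} robj;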

There are just five core fields. Let's go through them one by one.

type (4 bits)

type indicates the kind of data stored in the current robj. Redis currently has the following types.

Identifier	Value	Meaning
OBJ_STRING	0	string
OBJ_LIST	1	list
OBJ_SET	2	set
OBJ_ZSET	3	sorted set (zset)
OBJ_HASH	4	hash
OBJ_MODULE	5	module
OBJ_STREAM	6	stream

encoding (4 bits)

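The encoding. If every type had only one representation, type and encoding would be redundant and one of them could be dropped. To reduce memory usage as much as possible, however, Redis represents each type with different encodings depending on the situation, so an extra field is needed to record which one is in use. The following encodings exist as of Redis 6.2.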

Identifier	Value	Meaning
OBJ_ENCODING_RAW	0	Raw sds string, the most basic representation for strings
OBJ_ENCODING_INT	1	Integer (a long stored directly in ptr)
OBJ_ENCODING_HT	2	dict (hash table)
OBJ_ENCODING_ZIPMAP	3	zipmap, no longer used
OBJ_ENCODING_LINKEDLIST	4	Old linked list, no longer used
OBJ_ENCODING_ZIPLIST	5	ziplist
OBJ_ENCODING_INTSET	6	intset
OBJ_ENCODING_SKIPLIST	7	skiplist
OBJ_ENCODING_EMBSTR	8	Embedded sds string
OBJ_ENCODING_QUICKLIST	9	quicklist
OBJ_ENCODING_STREAM	10	stream
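To make the relationship between the two tables concrete, here is a small sketch of which encodings each type can carry. The mapping reflects my reading of Redis 6.2 (strings: int/embstr/raw; lists: quicklist; sets: intset/hashtable; sorted sets: ziplist/skiplist; hashes: ziplist/hashtable; streams: stream) and should be treated as an assumption. encoding_allowed is a hypothetical helper, not a Redis function, and module objects are left out.

```c
#include <stdbool.h>

/* Values copied from the type and encoding tables above (Redis 6.2). */
enum { OBJ_STRING = 0, OBJ_LIST = 1, OBJ_SET = 2, OBJ_ZSET = 3,
       OBJ_HASH = 4, OBJ_MODULE = 5, OBJ_STREAM = 6 };

enum { OBJ_ENCODING_RAW = 0, OBJ_ENCODING_INT = 1, OBJ_ENCODING_HT = 2,
       OBJ_ENCODING_ZIPMAP = 3, OBJ_ENCODING_LINKEDLIST = 4,
       OBJ_ENCODING_ZIPLIST = 5, OBJ_ENCODING_INTSET = 6,
       OBJ_ENCODING_SKIPLIST = 7, OBJ_ENCODING_EMBSTR = 8,
       OBJ_ENCODING_QUICKLIST = 9, OBJ_ENCODING_STREAM = 10 };

/* Both robj fields are 4-bit bitfields, so every value must stay below 16. */
_Static_assert(OBJ_STREAM < 16, "type must fit in 4 bits");
_Static_assert(OBJ_ENCODING_STREAM < 16, "encoding must fit in 4 bits");

/* Hypothetical helper: true if a value of this type may use this encoding
 * (assumed Redis 6.2 behavior; module objects are not covered). */
static bool encoding_allowed(int type, int encoding) {
    switch (type) {
    case OBJ_STRING:
        return encoding == OBJ_ENCODING_RAW || encoding == OBJ_ENCODING_INT ||
               encoding == OBJ_ENCODING_EMBSTR;
    case OBJ_LIST:
        return encoding == OBJ_ENCODING_QUICKLIST;
    case OBJ_SET:
        return encoding == OBJ_ENCODING_INTSET || encoding == OBJ_ENCODING_HT;
    case OBJ_ZSET:
        return encoding == OBJ_ENCODING_ZIPLIST || encoding == OBJ_ENCODING_SKIPLIST;
    case OBJ_HASH:
        return encoding == OBJ_ENCODING_ZIPLIST || encoding == OBJ_ENCODING_HT;
    case OBJ_STREAM:
        return encoding == OBJ_ENCODING_STREAM;
    default:
        return false;
    }
}
```

On a running server, the OBJECT ENCODING command reports the encoding actually chosen for a key, which is an easy way to check this mapping.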

One encoding deserves a closer look: OBJ_ENCODING_EMBSTR.

robj *createEmbeddedStringObject(const char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
    struct sdshdr8 *sh = (void*)(o+1);

    o->type = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }

    sh->len = len;
    sh->alloc = len;
    sh->flags = SDS_TYPE_8;
    if (ptr == SDS_NOINIT)
        sh->buf[len] = '\0';
    else if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }
    return o;
}

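As the code shows, EMBSTR combines robj and sds: the sds header and string bytes are placed immediately after the robj, in the same allocation. The string is limited to at most 44 bytes. The robj takes 16 bytes, the sdshdr8 header takes 3 bytes, and the terminating '\0' takes 1 byte, so capping the string at 44 bytes guarantees that everything fits in 64 bytes (16+3+1+44 == 64).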
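The 44-byte limit can be checked with a little arithmetic. The sketch below re-declares simplified copies of robj and sdshdr8, assuming a 64-bit build (8-byte pointers) and GCC/Clang's packed attribute, which sds.h itself uses for its headers, and derives the limit from a 64-byte budget. EMBSTR_ALLOC_BUDGET and embstr_max_len are hypothetical names; Redis simply hard-codes OBJ_ENCODING_EMBSTR_SIZE_LIMIT as 44 in object.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of robj (LRU_BITS is 24 in Redis). */
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;     /* 4+4+24 bits share one 4-byte unsigned */
    int refcount;        /* 4 bytes */
    void *ptr;           /* 8 bytes on a 64-bit build */
} robj;

/* Simplified copy of the smallest sds header. */
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;         /* bytes in use */
    uint8_t alloc;       /* bytes allocated, excluding header and '\0' */
    unsigned char flags; /* sds type in the low 3 bits */
    char buf[];
};

/* Hypothetical: the single allocation size the article assumes. */
#define EMBSTR_ALLOC_BUDGET 64

/* Longest string that fits: budget - robj - sds header - '\0'. */
static size_t embstr_max_len(void) {
    return EMBSTR_ALLOC_BUDGET - sizeof(robj) - sizeof(struct sdshdr8) - 1;
}
```

Note that the limit counts bytes, not characters, since sds lengths are byte lengths; a multi-byte UTF-8 string reaches it sooner.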

lru (24 bits)

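Redis can automatically evict data when memory runs short (once maxmemory is reached). Which keys should be evicted, and according to what policy? Both answers involve the lru field. (Key expiration times, by contrast, are not stored in robj; each database keeps them in a separate expires dict.) Redis gives this field 24 bits, but don't assume from its name that only the LRU policy uses it: LFU uses the same field. My guess is that the author wrote the LRU policy first, hence the name, and simply reused the field when LFU was added later.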
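The lru field means different things under different eviction policies. Under LRU, it is a 24-bit Unix timestamp in seconds recording when the object was last accessed. Under LFU, the 24 bits are split in two: a 16-bit timestamp in minutes and an 8-bit counter that approximates access frequency. I won't go into the details here; they will be covered in a follow-up post.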
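The two layouts can be illustrated with plain bit operations. This is a sketch, not Redis code: the helper names are hypothetical. The 24-bit width and the 16/8 split come from the struct comment above, and the 0xFFFF minutes mask mirrors what Redis's LFUGetTimeInMinutes() does. The idle-time helper uses modular arithmetic to survive one wrap-around of the 24-bit clock (about 194 days in seconds), similar in spirit to Redis's estimateObjectIdleTime().

```c
#include <stdint.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1) /* largest 24-bit value */

/* LRU mode: keep only the low 24 bits of a seconds counter. */
static uint32_t lru_pack(uint64_t now_seconds) {
    return (uint32_t)(now_seconds & LRU_CLOCK_MAX);
}

/* Seconds since last access; the 24-bit mask makes the subtraction
 * correct even if the clock wrapped once since obj_lru was stored. */
static uint32_t lru_idle_seconds(uint32_t lru_now, uint32_t obj_lru) {
    return (lru_now - obj_lru) & LRU_CLOCK_MAX;
}

/* LFU mode: high 16 bits = minutes timestamp, low 8 bits = counter. */
static uint32_t lfu_pack(uint64_t now_minutes, uint8_t counter) {
    return ((uint32_t)(now_minutes & 0xFFFF) << 8) | counter;
}

static uint16_t lfu_minutes(uint32_t lru) { return (uint16_t)(lru >> 8); }
static uint8_t lfu_counter(uint32_t lru) { return (uint8_t)(lru & 0xFF); }
```

Either way the packed value never exceeds 0xFFFFFF, which is why both policies can share the same 24-bit bitfield.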

refcount

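The reference count: how many places currently reference this robj. refcount is what makes object reuse possible. Anyone who has studied garbage collection knows one reclamation strategy based on counters: once an object's refcount drops to 0 it is no longer in use and can be freed. Redis implements exactly this reference-counting strategy.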

*ptr

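This one is simple. The preceding fields provide metadata about the current robj, while ptr holds the address of the actual data. The exception is OBJ_ENCODING_INT, where ptr does not point anywhere: the long value itself is stored in the pointer field (see o->ptr = (void*) value in tryObjectEncoding below).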

Encoding and Decoding robj

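Redis has always pushed memory savings to the extreme, and here its author applies special encodings to string robjs to save memory. The encoding code, with comments, is as follows: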

/* Specially encode a string robj to save memory */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    /* Make sure this is a string object, the only type we encode
     * in this function. Other types use encoded memory efficient
     * representations but are handled by the commands implementing
     * the type. 
     * Only string objects are encoded here; other types are encoded by the code that implements them. */
    serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

    /* We try some specialized encoding only for objects that are
     * RAW or EMBSTR encoded, in other words objects that are still
     * in represented by an actually array of chars.
     * Objects that are not sds-encoded (i.e. already INT) are returned unchanged. */
    if (!sdsEncodedObject(o)) return o;

    /* It's not safe to encode shared objects: shared objects can be shared
     * everywhere in the "object space" of Redis and may end in places where
     * they are not handled. We handle them only as values in the keyspace. 
     * Shared objects must not be encoded, since that could affect their other users. */
    if (o->refcount > 1) return o;

    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. 
     * Check whether the string can be represented as a long. A string longer
     * than 20 characters cannot be represented as a 32- or 64-bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) {
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. 
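         * If the value parses as a long in [0, OBJ_SHARED_INTEGERS) (10000) and
         * shared integers are allowed (maxmemory unset, or an eviction policy other
         * than LRU/LFU), return the shared object, so each such number uses one robj. */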
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            /* Otherwise, if it is RAW, free the sds and store the value as a long directly in ptr (OBJ_ENCODING_INT). */
            if (o->encoding == OBJ_ENCODING_RAW) {
                sdsfree(o->ptr);
                o->encoding = OBJ_ENCODING_INT;
                o->ptr = (void*) value;
                return o;
            /* If it is EMBSTR, replace it with a new OBJ_ENCODING_INT object that stores the long. */
            } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
                decrRefCount(o);
                return createStringObjectFromLongLongForValue(value);
            }
        }
    }
    // Strings that cannot be converted to a long are handled below

    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. 
     * If the string is short (at most 44 bytes), convert it to OBJ_ENCODING_EMBSTR. */
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. 
     * 
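     * If none of the encodings above applied, make one last attempt: if the sds
     * has more than 10% free space and the string is longer than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT (44), release the unused space to save memory.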
     **/
    trimStringObjectIfNeeded(o);

    /* Return the original object. */
    return o;
}

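Assert that the object is a string; if it is not sds-encoded (RAW or EMBSTR), return it unchanged.
If the object is shared (refcount > 1), do not encode it.
If the string is at most 20 characters long and parses as a long, encode it as an integer; values in [0, 10000) reuse the shared integer objects unless maxmemory is set with an LRU/LFU policy.
Otherwise, if the string is at most 44 bytes long, store it as OBJ_ENCODING_EMBSTR.
If none of the above applies (the string is longer than 44 bytes) and more than 10% of the sds is free space, release that space to save memory.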
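The decision order above can be modeled as a standalone function. This is a simplified sketch, not Redis code: choose_encoding and parse_long_strict are hypothetical names, parse_long_strict only approximates Redis's string2l() by accepting just the canonical decimal form that fits in a long, the refcount check is omitted, and the no_shared_integers flag stands in for "maxmemory is set and the policy is LRU or LFU".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARED_INTEGERS 10000   /* OBJ_SHARED_INTEGERS in Redis */
#define EMBSTR_SIZE_LIMIT 44    /* OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

typedef enum { ENC_SHARED_INT, ENC_INT, ENC_EMBSTR, ENC_RAW } enc_choice;

/* Approximation of string2l(): canonical decimal only (no spaces,
 * no '+', no leading zeros), and the value must fit in a long. */
static bool parse_long_strict(const char *s, size_t len, long *out) {
    char buf[32], back[32], *end;
    if (len == 0 || len > 20 || memchr(s, '\0', len) != NULL) return false;
    memcpy(buf, s, len);
    buf[len] = '\0';
    errno = 0;
    long v = strtol(buf, &end, 10);
    if (errno == ERANGE || *end != '\0') return false;
    /* Round trip rejects forms like "007", "+7", " 7" or "-0". */
    snprintf(back, sizeof(back), "%ld", v);
    if (strcmp(back, buf) != 0) return false;
    *out = v;
    return true;
}

/* Hypothetical model of tryObjectEncoding()'s decision order. */
static enc_choice choose_encoding(const char *s, size_t len,
                                  bool no_shared_integers) {
    long value;
    if (parse_long_strict(s, len, &value)) {
        if (!no_shared_integers && value >= 0 && value < SHARED_INTEGERS)
            return ENC_SHARED_INT;      /* reuse shared.integers[value] */
        return ENC_INT;                 /* long stored directly in ptr */
    }
    if (len <= EMBSTR_SIZE_LIMIT) return ENC_EMBSTR;
    return ENC_RAW;                     /* Redis may also trim spare sds space */
}
```

For example, "123" maps to a shared integer, "007" stays a string (EMBSTR) because it is not in canonical form, and a 45-byte string stays RAW.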

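Where there is encoding there is decoding. The code below is fairly simple. Note that in both branches the caller ends up owning one reference to the returned object and must release it with decrRefCount():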

/* Get a decoded version of an encoded object (returned as a new object).
 * If the object is already raw-encoded just increment the ref count.
 * Returns a decoded version of the object (as a new object); if the object is already sds-encoded, only its reference count is incremented. */
robj *getDecodedObject(robj *o) {
    robj *dec;

    if (sdsEncodedObject(o)) {
        incrRefCount(o);
        return o;
    }
    if (o->type == OBJ_STRING && o->encoding == OBJ_ENCODING_INT) {
        char buf[32];

        ll2string(buf,32,(long)o->ptr);
        dec = createStringObject(buf,strlen(buf));
        return dec;
    } else {
        serverPanic("Unknown encoding type");
    }
}
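The INT branch boils down to two steps: recover the long from the pointer field, then format it as decimal text. The sketch below models just that with a hypothetical mini_obj type and decode_int_object helper (not Redis names); snprintf stands in for Redis's ll2string(), and the intptr_t casts are a portable way to express the (void*)value and (long)o->ptr conversions used in the original code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified object: either an integer kept in ptr, or a C string. */
typedef struct {
    int is_int;   /* stand-in for encoding == OBJ_ENCODING_INT */
    void *ptr;    /* for INT, the value itself rather than an address */
} mini_obj;

/* Writes the decimal text of an INT-encoded object into buf, as
 * getDecodedObject() does with ll2string() and a 32-byte buffer.
 * Returns the text length, or -1 if the object is not INT-encoded. */
static int decode_int_object(const mini_obj *o, char *buf, size_t size) {
    if (!o->is_int) return -1;
    long value = (long)(intptr_t)o->ptr;  /* cast back, no dereference */
    return snprintf(buf, size, "%ld", value);
}
```

A 32-byte buffer is ample: the longest 64-bit long, "-9223372036854775808", is 20 characters plus the terminator.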

Reference Counting and Automatic Freeing

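As mentioned above, Redis reuses some objects to save space, and objects that are no longer referenced are freed automatically. The author implements this garbage collection with reference counting. Two refcount values are special: OBJ_SHARED_REFCOUNT (INT_MAX) marks global shared objects, which are never incremented, decremented, or freed, and OBJ_STATIC_REFCOUNT (INT_MAX-1) marks objects allocated on the stack, which must never be retained. The code is fairly simple: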

void incrRefCount(robj *o) {
    if (o->refcount < OBJ_FIRST_SPECIAL_REFCOUNT) {
        o->refcount++;
    } else {
        if (o->refcount == OBJ_SHARED_REFCOUNT) {
            /* Nothing to do: this refcount is immutable. */
        } else if (o->refcount == OBJ_STATIC_REFCOUNT) {
            serverPanic("You tried to retain an object allocated in the stack");
        }
    }
}
/* Decrement the reference count and free the object once no references remain */
void decrRefCount(robj *o) {
    // Last reference: free the value, then the robj itself
    if (o->refcount == 1) {
        switch(o->type) {
        case OBJ_STRING: freeStringObject(o); break;
        case OBJ_LIST: freeListObject(o); break;
        case OBJ_SET: freeSetObject(o); break;
        case OBJ_ZSET: freeZsetObject(o); break;
        case OBJ_HASH: freeHashObject(o); break;
        case OBJ_MODULE: freeModuleObject(o); break;
        case OBJ_STREAM: freeStreamObject(o); break;
        default: serverPanic("Unknown object type"); break;
        }
        zfree(o);
    } else {
        if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0");
        if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;
    }
}
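To see the two functions working together, here is a self-contained model. It is a sketch under simplifying assumptions: mini_obj, obj_new, obj_incr and obj_decr are hypothetical names, a heap-allocated int stands in for the type-specific data behind ptr, assert replaces serverPanic, and the frees counter exists only so the behavior can be checked. The special refcount values follow the Redis 6.2 definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SHARED_REFCOUNT INT_MAX          /* OBJ_SHARED_REFCOUNT */
#define STATIC_REFCOUNT (INT_MAX - 1)    /* OBJ_STATIC_REFCOUNT */
#define FIRST_SPECIAL_REFCOUNT STATIC_REFCOUNT

typedef struct {
    int refcount;
    int *payload;  /* stand-in for the data behind robj->ptr */
} mini_obj;

static int frees = 0;  /* counts releases, for checking only */

static mini_obj *obj_new(int value) {
    mini_obj *o = malloc(sizeof(*o));
    if (o == NULL || (o->payload = malloc(sizeof(int))) == NULL) abort();
    *o->payload = value;
    o->refcount = 1;   /* the creator holds the first reference */
    return o;
}

static void obj_incr(mini_obj *o) {
    if (o->refcount < FIRST_SPECIAL_REFCOUNT) o->refcount++;
    /* Shared objects stay at INT_MAX; stack objects must not be retained. */
    else assert(o->refcount == SHARED_REFCOUNT);
}

static void obj_decr(mini_obj *o) {
    assert(o->refcount > 0);
    if (o->refcount == 1) {
        free(o->payload);  /* like freeStringObject() and friends */
        free(o);           /* like zfree(o) */
        frees++;
    } else if (o->refcount != SHARED_REFCOUNT) {
        o->refcount--;
    }
}
```

As in decrRefCount(), the object is freed when the count is 1 rather than after it reaches 0, which saves a write to memory that is about to be released.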

Summary

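To sum up, robj serves the following purposes:

It provides a uniform wrapper for values of every type.
It stores the information needed for data eviction.
It enables object reuse (sharing) and automatic freeing through reference counting.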

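This article is part of the Redis source code analysis series, which comes with a companion edition of the Redis source annotated in Chinese. If you want to study Redis in depth, stars and follows are welcome.
Chinese-annotated Redis repository: https://github.com/xindoo/Redis
Redis source code analysis column: https://zxs.io/s/1h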
This article originally appeared at https://blog.csdn.net/xindoo