redis有序集合zset的底层实现——跳跃表skiplist

redis有序集合zset的底层实现——跳跃表skiplist

redis作为一种内存KV数据库,提供了string, hash, list, set, zset等多种数据结构。其中有序集合zset在增删改查的性质上类似于C++ stl的map和Java的TreeMap,提供了一组“键-值”对,并且“键”按照“值”的顺序排序。但是与C++ stl或Java的红黑树实现不同的是,redis中有序集合的实现采用了另一种数据结构——跳跃表。跳跃表是有序单链表的一种改进,其查询、插入、删除也是O(logN)的时间复杂度。

跳跃表的原理

跳跃表的思想来自于一篇论文:Skip Lists: A Probabilistic Alternative to Balanced Trees. 如果想要深入了解跳跃表,可以阅读论文原文。这里引用论文中的一幅图对跳跃表的原理作一个简单的说明。

redis有序集合zset的底层实现——跳跃表skiplist_第1张图片

上图用a,b,c,d,e五种有序链表及其变式(变式的名字是我随便起的)说明了跳跃表的motivation.

  • [a]单链表:查询时间复杂度O(n)
  • [b]level-2单链表:每隔一个节点为一个level-2节点,每个level-2节点有2个后继指针,分别指向单链表中的下一个节点和下一个level-2节点。查询时间复杂度为O(n/2)
  • [c]level-3单链表:每隔一个节点为一个level-2节点,每隔4个节点为一个level-3节点,查询时间复杂度O(n/4)
  • [d]指数式单链表:每2^i个节点的level为i+1,查询时间复杂度为O(log2N)
  • [e]跳跃表:各个level的节点个数同指数式单链表,但出现的位置随机,查询复杂度仍然是O(log2N)吗

之所以这里关心查询复杂度,因为有序链表的插入和删除复杂度等于查询复杂度。

作为一种概率性算法,文章证明了跳跃表查询复杂度的期望是O(logN).

redis有序集合采用跳跃表的原因

为什么redis的有序集合采用跳跃表而不是红黑树呢?对于这个问题,可以在https://news.ycombinator.com/item?id=1171423找到作者本人的一个回答

  1. They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.

  2. A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

  3. They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don’t think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I’m using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the “Redis Cluster” solution that I plan to develop in the future.

可以看到redis选择跳跃表而非红黑树作为有序集合实现方式的原因并非是基于并发上的考虑,因为redis是单线程的,选用跳跃表的原因仅仅是因为跳跃表的实现相较于红黑树更加简洁。

redis跳跃表的源码

跳跃表节点,跳跃表和zset结构体的定义在server.h

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

skiplist相关函数定义在t_zset.c中。跳跃表中,一个节点的level符合一定的概率,决定一个新增节点的level的函数是zslRandomLevel

/* Returns a random level for the new skiplist node we are going to create.
 * The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL
 * (both inclusive), with a powerlaw-alike distribution where higher
 * levels are less likely to be returned. */
int zslRandomLevel(void) {
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}

该函数被用于skiplist的节点插入函数zslInsert.

你可能感兴趣的:(redis)