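The trigger also has to check whether the server is currently persisting data, that is, whether a persistence child process exists. If one does, the resize/rehash is skipped. The child is created with fork(), so parent and child share memory pages copy-on-write: the hash table data held by the child and by the main process is identical only at the moment the child is created, and diverges as soon as the main process writes. Rehashing touches a large number of pages in the main process, and every touched page would have to be copied, so it is postponed until the child has finished.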
int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size && used && size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < REDIS_HT_MINFILL));
}

/* If the percentage of used slots in the HT reaches REDIS_HT_MINFILL
 * we resize the hash table to save memory */
void tryResizeHashTables(int dbid) {
    if (htNeedsResize(server.db[dbid].dict))
        dictResize(server.db[dbid].dict);
    if (htNeedsResize(server.db[dbid].expires))
        dictResize(server.db[dbid].expires);
}
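To make the shrink condition concrete, here is a minimal standalone sketch of the same check. The values 4 and 10 are assumptions standing in for DICT_HT_INITIAL_SIZE and REDIS_HT_MINFILL, and needs_shrink is a hypothetical name, not a Redis function.

```c
#include <stdbool.h>

/* Assumed values for illustration; check dict.h / redis.h in your version. */
#define HT_INITIAL_SIZE 4   /* stands in for DICT_HT_INITIAL_SIZE */
#define HT_MINFILL      10  /* stands in for REDIS_HT_MINFILL, in percent */

/* Hypothetical helper: true when the table is non-empty, larger than its
 * initial size, and less than HT_MINFILL percent of its slots are used. */
static bool needs_shrink(long long size, long long used) {
    return size && used && size > HT_INITIAL_SIZE &&
           (used * 100 / size < HT_MINFILL);
}
```

Under these assumed values, a table with 1024 slots and 50 entries is about 4% full and would be shrunk, while one with 200 entries (about 19%) would be left alone. Because of the `used` term, a completely empty table is not shrunk by this check.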
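An expand is triggered when the number of elements is greater than or equal to the number of slots and, in addition, either resizing is currently allowed (dict_can_resize) or the elements/slots ratio exceeds dict_force_resize_ratio.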
/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    /* In short: used >= size, and either resizing is allowed or
     * used/size is greater than dict_force_resize_ratio. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?
                              d->ht[0].size : d->ht[0].used)*2);
    }
    return DICT_OK;
}
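The expand decision can be isolated into a small predicate. The sketch below is only an illustration: should_expand is a hypothetical name, and force_ratio is a parameter standing in for dict_force_resize_ratio, whose actual value should be checked in dict.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the condition in _dictExpandIfNeeded.
 * can_resize stands in for dict_can_resize, force_ratio for
 * dict_force_resize_ratio. */
static bool should_expand(size_t used, size_t size,
                          bool can_resize, size_t force_ratio) {
    if (size == 0) return true;  /* empty table: expand to the initial size */
    return used >= size && (can_resize || used / size > force_ratio);
}
```

Note that the ratio uses integer division: with a force ratio of 5 and 4 slots, 20 entries give 20/4 = 5, which is not greater than 5, so a forced expand would only happen once the table holds 24 entries.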
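As the code above shows, both shrinking and expanding end up in int dictExpand(dict *d, size_t size) (shrinking goes through dictResize, which in turn calls dictExpand):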
/* Expand or create the hash table */
int dictExpand(dict *d, size_t size)
{
    dictht n; /* the new hash table */
    size_t realsize = _dictNextPower(size);

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = (size_t) 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}
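dictExpand rounds the requested size with _dictNextPower and stores realsize-1 as sizemask. A rough sketch of why that works is below. It assumes an initial size of 4, uses the hypothetical names next_power and bucket_index, and leaves out the overflow handling a real implementation needs.

```c
#include <stddef.h>

#define HT_INITIAL_SIZE 4  /* assumed value of DICT_HT_INITIAL_SIZE */

/* Hypothetical stand-in for _dictNextPower: the smallest power of two
 * that is >= size, never below the initial size. No overflow handling. */
static size_t next_power(size_t size) {
    size_t i = HT_INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}

/* With a power-of-two table size, (hash & (size - 1)) selects the same
 * bucket as (hash % size), but with a single AND instruction. */
static size_t bucket_index(size_t hash, size_t size) {
    return hash & (size - 1);
}
```

This is why the rehash loop can compute the new bucket as the hash ANDed with d->ht[1].sizemask.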
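Every dict holds two hash tables. Until a resize is triggered, only table 0 is used and every newly set key is stored there. Once the trigger condition is met, table 1 is created with the new size and d->rehashidx is set to a value other than -1 (0 at the start), which means data migration has begun. From then on, new entries are added to table 1, while the existing entries are moved over by two mechanisms: lazy rehashing and active rehashing.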
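This is a performance consideration in Redis. With a large data set, moving all the old entries in one go would block every incoming client request for the duration of the move and cause noticeable latency. With lazy rehashing, each time a client request touches the dict, Redis checks d->rehashidx to see whether a rehash is in progress, and if so rehashes just one bucket of the hash table. The function that does this is _dictRehashStep(d);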
static void _dictRehashStep(dict *d) {
    /* Only rehash when no safe iterator is bound to the dict */
    if (d->iterators == 0) dictRehash(d,1);
}

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}
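The bucket-by-bucket migration can be tried out in isolation with a toy dict. The sketch below is not the Redis implementation: the types and the names toydict, toy_add, toy_start_rehash and toy_rehash are hypothetical, the key doubles as its own hash, and allocation errors are not checked.

```c
#include <stdlib.h>

/* Toy types for illustration only; these are not the Redis structs. */
typedef struct entry { unsigned key; struct entry *next; } entry;
typedef struct { entry **table; size_t size, used; } table;
typedef struct { table ht[2]; long rehashidx; } toydict;

/* Insert into ht[0], using the key itself as the hash value. */
static void toy_add(toydict *d, entry *e) {
    size_t h = e->key & (d->ht[0].size - 1);
    e->next = d->ht[0].table[h];
    d->ht[0].table[h] = e;
    d->ht[0].used++;
}

/* Allocate ht[1] with new_size buckets (a power of two) and mark the
 * dict as rehashing by setting rehashidx to 0. */
static void toy_start_rehash(toydict *d, size_t new_size) {
    d->ht[1].table = calloc(new_size, sizeof(entry *));
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
}

/* Move up to n non-empty buckets from ht[0] to ht[1].
 * Returns 1 if more work may remain, 0 when idle or finished. */
static int toy_rehash(toydict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n--) {
        if (d->ht[0].used == 0) {           /* all moved: swap the tables */
            free(d->ht[0].table);
            d->ht[0] = d->ht[1];
            d->ht[1] = (table){0};
            d->rehashidx = -1;
            return 0;
        }
        /* Skip empty buckets; safe because ht[0] still holds entries. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        entry *e = d->ht[0].table[d->rehashidx];
        while (e) {                          /* relink the whole chain */
            entry *next = e->next;
            size_t h = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].table[h];
            d->ht[1].table[h] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL;
    }
    return 1;
}
```

Calling toy_rehash(&d, 1) repeatedly corresponds to the lazy path, one bucket per request, while a call with a larger n corresponds to the batches used by active rehashing.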
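Active rehashing happens in the time event. The time event handler is serverCron(), and Redis registers only this one time event. serverCron() is responsible for the following:
(1) Collecting expired keys. As with rehashing, collection is split into active and lazy collection.
(3) Updating some statistics
(4) Rehashing the hash tables, which is what was described above
(5) Triggering BGSAVE / AOF rewrite and handling terminated child processes. BGSAVE and AOF are the data persistence mechanisms; I will write a separate article on each of them later
(6) Handling the different kinds of client timeouts
(7) Replication reconnection, which is part of master-slave replication; I will look at Redis replication and cluster operation separately later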
Active rehashing is mainly driven from this function:
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > (unsigned)server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}
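The static counters are what let each cron tick resume where the previous one stopped. A minimal sketch of that round-robin pattern follows, with the hypothetical helper next_db_batch writing the visited database indexes into a caller-supplied array. It illustrates the idea only and is not code from Redis.

```c
/* Hypothetical sketch of the round-robin pattern in databasesCron:
 * visit at most per_call databases per tick and resume from where the
 * previous call stopped. out must have room for per_call entries. */
static unsigned int next_db_batch(unsigned int dbnum, unsigned int per_call,
                                  unsigned int *out) {
    static unsigned int cursor = 0;  /* keeps its value across calls */
    unsigned int j;

    /* Don't visit more databases than exist. */
    if (per_call > dbnum) per_call = dbnum;
    for (j = 0; j < per_call; j++) out[j] = cursor++ % dbnum;
    return per_call;
}
```

With 5 databases and 3 per call, the first tick would visit 0, 1, 2 and the second 3, 4, 0, so no database is starved even though each tick does only a bounded amount of work.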
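Again for performance reasons, the CPU time given to each active rehash pass is limited to roughly 1 millisecond, as shown below. Each call to dictRehash processes 100 buckets, and if the while loop has been running for more than 1 ms it exits immediately so the server can get on with other work.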
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
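The same "work in batches, check the clock between batches" pattern can be sketched outside Redis. In the example below, now_ms, do_batch and run_for_ms are hypothetical names, the "work" is just a counter being decremented, and the clock comes from POSIX gettimeofday, so treat it as an illustration rather than the actual implementation.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in milliseconds, via POSIX gettimeofday. */
static long long now_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return ((long long)tv.tv_sec * 1000) + (tv.tv_usec / 1000);
}

/* Hypothetical work source: returns 0 if nothing is left, otherwise
 * consumes up to n units from *remaining and returns 1. */
static int do_batch(long *remaining, int n) {
    if (*remaining <= 0) return 0;
    *remaining -= (*remaining < n) ? *remaining : n;
    return 1;
}

/* Run batches of 100 until the work is done or more than ms
 * milliseconds have passed. The clock is only read between batches,
 * so one batch is the smallest unit that can overrun the budget. */
static int run_for_ms(long *remaining, int ms) {
    long long start = now_ms();
    int done = 0;

    while (do_batch(remaining, 100)) {
        done += 100;  /* counted per batch, so this is an upper bound */
        if (now_ms() - start > ms) break;
    }
    return done;
}
```

The batch size trades precision for overhead: a smaller batch respects the time budget more closely but reads the clock more often.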
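This article covered how Redis resizes and rehashes its hash tables. To keep latency low, Redis splits the rehash into lazy and active rehashing, which run side by side until the rehash is complete.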