If you are not yet familiar with rehashing, see the illustrated guide: 图解rehash.
For performance reasons, Redis performs rehashing in two complementary modes, lazy and active, which proceed together until the rehash completes.
The lazy rehash call chain is: _dictRehashStep -> dictRehash
(1) Source of dict.c/_dictRehashStep
/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used.
 *
 * T = O(1)
 */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}
(2) Source of dict.c/dictRehash
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * T = O(N)
 */
int dictRehash(dict *d, int n) {
    // Only run while a rehash is in progress
    if (!dictIsRehashing(d)) return 0;

    // Perform N steps of migration
    // T = O(N)
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        // If hash table 0 has no more entries, the rehash is finished
        // T = O(1)
        if (d->ht[0].used == 0) {
            // Free hash table 0's bucket array
            zfree(d->ht[0].table);
            // Make the old hash table 1 the new hash table 0
            d->ht[0] = d->ht[1];
            // Reset the old hash table 1
            _dictReset(&d->ht[1]);
            // Clear the rehash flag
            d->rehashidx = -1;
            // Return 0 to tell the caller the rehash is complete
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        // Make sure rehashidx does not go out of bounds
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // Skip empty slots in the array and find the next non-empty index
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // Point to the head of the linked list at that index
        de = d->ht[0].table[d->rehashidx];

        /* Move all the keys in this bucket from the old to the new hash HT */
        // Migrate every node in the list to the new hash table
        // T = O(1)
        while(de) {
            unsigned int h;

            // Save the pointer to the next node
            nextde = de->next;

            /* Get the index in the new hash table */
            // Compute the hash and the insertion index in the new hash table
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // Insert the node at the head of the new table's bucket
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // Update the counters
            d->ht[0].used--;
            d->ht[1].used++;

            // Move on to the next node
            de = nextde;
        }

        // Set the pointer of the just-migrated bucket to NULL
        d->ht[0].table[d->rehashidx] = NULL;

        // Advance the rehash index
        d->rehashidx++;
    }

    return 1;
}
(3) Source analysis
In _dictRehashStep, dictRehash is called with n = 1, so each call migrates just one bucket from ht[0] to ht[1]. Since _dictRehashStep is invoked by dictGetRandomKey, dictFind, dictGenericDelete, and dictAdd, it runs on every add, delete, lookup, or update against the dict, which steadily pushes the rehash forward without long pauses.
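For example, the lookup path pays for one rehash step before probing the table. The sketch below follows the dictFind of this era, simplified slightly for brevity, so treat it as an approximation rather than a verbatim quote:

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].size == 0) return NULL; /* the dict is empty */
    /* Lazy rehash: every lookup migrates one bucket if a rehash is running */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    /* While rehashing, a key may live in either table, so probe both */
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareKeys(d, key, he->key)) return he;
            he = he->next;
        }
        /* If no rehash is in progress, ht[1] is empty: stop here */
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}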
dictRehash migrates up to n buckets per call. Because ht[1] was already sized when the resize was triggered, the core of the rehash is walking ht[0], taking each key, and re-hashing it against ht[1]'s sizemask; once everything has been moved, ht[0]'s table is freed, ht[1] becomes the new ht[0], and ht[1] is reset. Throughout this process rehashidx is the crucial piece of state: it records the index in ht[0] where the next rehash step should resume.
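To make the re-bucketing arithmetic concrete, here is a tiny standalone example; the hash value 0x2D and the table sizes are made up for illustration:

#include <stdio.h>

int main(void) {
    unsigned int hash = 0x2D;          /* pretend dictHashKey returned 0x2D */
    unsigned int old_sizemask = 4 - 1; /* ht[0].size == 4 */
    unsigned int new_sizemask = 8 - 1; /* ht[1].size == 8 */

    /* Same computation as h = dictHashKey(d, de->key) & d->ht[1].sizemask */
    printf("old bucket: %u\n", hash & old_sizemask); /* prints 1 */
    printf("new bucket: %u\n", hash & new_sizemask); /* prints 5 */
    return 0;
}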
The active rehash call chain is as follows:
serverCron -> databasesCron -> incrementallyRehash -> dictRehashMilliseconds -> dictRehash
(1) Source of redis.c/serverCron
/* This is our timer interrupt, called server.hz times per second.
 *
 * Here is where we do a number of things that need to be done asynchronously.
 * For instance:
 *
 * - Active expired keys collection (it is also performed in a lazy way on
 *   lookup).
 *
 * - Software watchdog.
 *
 * - Update some statistic.
 *
 * - Incremental rehashing of the DBs hash tables.
 *
 * - Triggering BGSAVE / AOF rewrite, and handling of terminated children.
 *
 * - Clients timeout of different kinds.
 *
 * - Replication reconnection.
 *
 * - Many more...
 *
 * Everything directly called here will be called server.hz times per second,
 * so in order to throttle execution of things we want to do less frequently
 * a macro is used: run_with_period(milliseconds) { .... }
 */
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    int j;
    REDIS_NOTUSED(eventLoop);
    REDIS_NOTUSED(id);
    REDIS_NOTUSED(clientData);

    /* Software watchdog: deliver the SIGALRM that will reach the signal
     * handler if we don't return here fast enough. */
    if (server.watchdog_period) watchdogScheduleSignal(server.watchdog_period);

    /* Update the time cache. */
    updateCachedTime();

    // Sample the number of operations the server executes per second
    run_with_period(100) trackOperationsPerSecond();

    /* We have just REDIS_LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define. */
    server.lruclock = getLRUClock();

    /* Record the max memory used since the server was started. */
    if (zmalloc_used_memory() > server.stat_peak_memory)
        server.stat_peak_memory = zmalloc_used_memory();

    /* Sample the RSS here since this is a relatively slow call. */
    server.resident_set_size = zmalloc_get_rss();

    /* We received a SIGTERM, shutting down here in a safe way, as it is
     * not ok doing so inside the signal handler. */
    if (server.shutdown_asap) {
        // Try to shut down the server
        if (prepareForShutdown(0) == REDIS_OK) exit(0);
        // If shutdown failed, log the error and clear the shutdown flag
        redisLog(REDIS_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
        server.shutdown_asap = 0;
    }

    /* Show some info about non-empty databases */
    run_with_period(5000) {
        for (j = 0; j < server.dbnum; j++) {
            long long size, used, vkeys;

            // Number of slots (buckets) in the keyspace dict
            size = dictSlots(server.db[j].dict);
            // Number of keys in use
            used = dictSize(server.db[j].dict);
            // Number of keys with an expire set
            vkeys = dictSize(server.db[j].expires);
            // Log the counts
            if (used || vkeys) {
                redisLog(REDIS_VERBOSE,"DB %d: %lld keys (%lld volatile) in %lld slots HT.",j,used,vkeys,size);
                /* dictPrintStats(server.dict); */
            }
        }
    }

    /* Show information about connected clients */
    // Skipped when running in Sentinel mode
    if (!server.sentinel_mode) {
        run_with_period(5000) {
            redisLog(REDIS_VERBOSE,
                "%lu clients connected (%lu slaves), %zu bytes in use",
                listLength(server.clients)-listLength(server.slaves),
                listLength(server.slaves),
                zmalloc_used_memory());
        }
    }

    /* We need to do a few operations on clients asynchronously. */
    // Close timed-out clients and shrink oversized client buffers
    clientsCron();

    /* Handle background operations on Redis databases. */
    databasesCron();

    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    // If neither BGSAVE nor BGREWRITEAOF is running and a BGREWRITEAOF
    // is scheduled, start it now
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        // Reap the child's exit status without blocking
        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            // BGSAVE finished
            if (pid == server.rdb_child_pid) {
                backgroundSaveDoneHandler(exitcode,bysignal);
            // BGREWRITEAOF finished
            } else if (pid == server.aof_child_pid) {
                backgroundRewriteDoneHandler(exitcode,bysignal);
            } else {
                redisLog(REDIS_WARNING,
                    "Warning, detected child with unmatched pid: %ld",
                    (long)pid);
            }
            updateDictResizePolicy();
        }
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now */
        // Walk the save conditions to see whether a BGSAVE is due
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * REDIS_BGSAVE_RETRY_DELAY seconds already elapsed. */
            // Check whether some save condition has been met
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 REDIS_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == REDIS_OK))
            {
                redisLog(REDIS_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                // Run BGSAVE
                rdbSaveBackground(server.rdb_filename);
                break;
            }
        }

        /* Trigger an AOF rewrite if needed */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            server.aof_rewrite_perc &&
            // The AOF file must exceed the minimum size for BGREWRITEAOF
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            // Size of the AOF file after the last rewrite
            long long base = server.aof_rewrite_base_size ?
                            server.aof_rewrite_base_size : 1;
            // Growth of the current AOF size relative to base, in percent
            long long growth = (server.aof_current_size*100/base) - 100;
            // If the growth exceeds aof_rewrite_perc, run BGREWRITEAOF
            if (growth >= server.aof_rewrite_perc) {
                redisLog(REDIS_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }

    // Depending on the AOF fsync policy, consider flushing the AOF buffer
    // to the AOF file
    /* AOF postponed flush: Try at every cron cycle if the slow fsync
     * completed. */
    if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

    /* AOF write errors: in this case we have a buffer to flush as well and
     * clear the AOF error in case of success to make the DB writable again,
     * however to try every second is enough in case of 'hz' is set to
     * an higher frequency. */
    run_with_period(1000) {
        if (server.aof_last_write_status == REDIS_ERR)
            flushAppendOnlyFile(0);
    }

    /* Close clients that need to be closed asynchronous */
    freeClientsInAsyncFreeQueue();

    /* Clear the paused clients flag if needed. */
    clientsArePaused(); /* Don't check return value, just use the side effect. */

    /* Replication cron function -- used to reconnect to master and
     * to detect transfer failures. */
    // Reconnect to the master, send ACKs, detect transfer failures,
    // disconnect timed-out slaves, and so on
    run_with_period(1000) replicationCron();

    /* Run the Redis Cluster cron. */
    run_with_period(100) {
        if (server.cluster_enabled) clusterCron();
    }

    /* Run the Sentinel timer if we are in sentinel mode. */
    run_with_period(100) {
        if (server.sentinel_mode) sentinelTimer();
    }

    /* Cleanup expired MIGRATE cached sockets. */
    run_with_period(1000) {
        migrateCloseTimedoutSockets();
    }

    // Increment the cron loop counter
    server.cronloops++;

    return 1000/server.hz;
}
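The run_with_period macro mentioned in the header comment throttles a block by letting it run only on cron iterations where enough time has elapsed. In the Redis source of this era it is defined along the following lines (quoted from memory, so treat it as an approximation):

/* Run the body at most once every _ms_ milliseconds: either the cron period
 * itself is already >= _ms_, or the loop counter hits the right multiple. */
#define run_with_period(_ms_) \
    if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))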
(2) Source of redis.c/databasesCron
/* This function handles 'background' operations we are required to do
 * incrementally in Redis databases, such as active key expiring, resizing,
 * rehashing. */
// Expire keys, resize the hash tables, and perform active incremental rehash
void databasesCron(void) {
    // Expired keys are removed first, then the tables are resized/rehashed

    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    // Only run the active expire cycle if this server is not a slave
    if (server.active_expire_enabled && server.masterhost == NULL)
        // CYCLE_SLOW mode tries to expire as many keys as possible
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}
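Note the resize_db and rehash_db counters above: being static, they persist across calls, so each cron tick resumes at the database where the previous tick stopped. A toy illustration of this round-robin pattern (hypothetical names, not Redis code):

#include <stdio.h>

#define DBNUM 16 /* stand-in for server.dbnum */

/* The static counter survives across calls, so work resumes where the
 * previous call left off instead of restarting from database 0. */
static void visit_some_dbs(int per_call) {
    static unsigned int next_db = 0;
    int i;
    for (i = 0; i < per_call; i++) {
        printf("working on db %u\n", next_db % DBNUM);
        next_db++;
    }
}

int main(void) {
    visit_some_dbs(4); /* visits dbs 0..3 */
    visit_some_dbs(4); /* visits dbs 4..7 */
    return 0;
}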
(3) Source of redis.c/incrementallyRehash
/* Our hash table implementation performs rehashing incrementally while
 * we write/read from the hash table. Still if the server is idle, the hash
 * table will use two tables for a long time. So we try to use 1 millisecond
 * of CPU time at every call of this function to perform some rehashing.
 *
 * The function returns 1 if some rehashing was performed, otherwise 0
 * is returned. */
int incrementallyRehash(int dbid) {
    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }

    /* Expires */
    if (dictIsRehashing(server.db[dbid].expires)) {
        dictRehashMilliseconds(server.db[dbid].expires,1);
        return 1; /* already used our millisecond for this loop... */
    }

    return 0;
}
(4) Source of dict.c/dictRehashMilliseconds
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
(5) The source of dictRehash has already been shown above.
(6) Analysis
serverCron calls databasesCron on every tick; when no background child process (BGSAVE or BGREWRITEAOF) is running, databasesCron calls incrementallyRehash, which calls dictRehashMilliseconds, which finally calls dictRehash.
Compared with the single-bucket lazy steps, incrementallyRehash gets a larger time budget and migrates many more buckets: each call spends roughly 1 millisecond on rehashing, and if the rehash is still unfinished it resumes in the next cron loop.
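Note the granularity: dictRehashMilliseconds calls dictRehash(d,100), i.e. 100 buckets per batch, and checks the clock only between batches, so a call can overshoot ms by up to one batch. A minimal sketch (not from the Redis source, assuming only the dict.h API shown above) of driving a rehash to completion with this interface:

/* Finish an in-progress rehash in ~1 ms slices, as serverCron
 * effectively does across successive cron ticks. */
void rehashToCompletion(dict *d) {
    while (dictIsRehashing(d)) {
        /* Migrate buckets for about 1 ms, then hand control back
         * (Redis proper returns to the event loop between slices). */
        dictRehashMilliseconds(d, 1);
    }
}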
I am still learning; if you spot a mistake, please point it out. Thank you!
If you have better suggestions, leave a comment so we can discuss and improve together.
My sincere thanks for patiently reading this post!
Reference book: 《Redis设计与实现》 (Redis Design and Implementation), 2nd edition, by 黄健宏
Reference link: Redis——resize源码解析