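Environment: Redis source version 5.0.3. I annotated the source while reading it; the git repository is at https://gitee.com/xiaoangg/redis_annotation
Reference book: "Redis Design and Implementation" (《redis的设计与实现》)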
Contents
I. Creating and loading RDB files
II. Automatic periodic saving
III. RDB file structure
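What RDB is, what problem it solves, and how it is implemented
Redis is an in-memory database, so if the server process exits unexpectedly, the data in the database is lost.
RDB is the persistence feature that addresses this problem.
RDB saves the state of the database at a point in time into an RDB file. The RDB file is a compressed binary file from which the database state can be restored.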
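1. Creating
Two commands can generate an RDB file: SAVE and BGSAVE.
SAVE blocks the Redis server process until the RDB file has been written. On instances that hold a lot of memory this can block for a long time, so it is not recommended in production.
BGSAVE forks a child process that writes the RDB file while the parent process keeps handling command requests.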
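The difference between the two commands comes down to who does the writing. Below is a minimal sketch of that idea on a POSIX system, not the Redis implementation: `write_snapshot`, `save_blocking`, `save_in_background` and `background_save_succeeded` are hypothetical names invented for illustration, and the "snapshot" is just a string written to a file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real RDB writer: writes `data` to `path`.
 * Returns 0 on success, -1 on failure. */
static int write_snapshot(const char *path, const char *data) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    int ok = fputs(data, fp) >= 0;
    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* SAVE-style: the calling process writes the file itself and cannot do
 * anything else until the write has finished. */
int save_blocking(const char *path, const char *data) {
    return write_snapshot(path, data);
}

/* BGSAVE-style: fork a child that writes the file. The parent gets the
 * child's PID back immediately (or -1 if fork failed). */
pid_t save_in_background(const char *path, const char *data) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: works on its own copy of the parent's memory. */
        int rc = write_snapshot(path, data);
        _exit(rc == 0 ? 0 : 1);
    }
    return pid; /* Parent: free to keep serving requests. */
}

/* Reap the child and report whether the background save succeeded. */
int background_save_succeeded(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) == -1) return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In this sketch the parent learns the result from the child's exit status. That is also why a server that saves in the background needs somewhere to keep the child's PID, which is the role rdb_child_pid plays in redisServer.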
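2. Loading
Redis has no dedicated command for loading an RDB file. Whenever the server detects an RDB file at startup, it loads it automatically.
Tip: because the AOF file is usually updated more frequently than the RDB file, the server prefers the AOF file for restoring data when AOF persistence is enabled.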
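The startup rule described above can be written as a small decision function. This is only a sketch of the rule as stated in the text; `load_source` and `choose_load_source` are hypothetical names, not Redis identifiers.

```c
typedef enum { LOAD_NOTHING, LOAD_FROM_AOF, LOAD_FROM_RDB } load_source;

/* Decide which file to restore from at startup.
 * aof_enabled: nonzero if AOF persistence is turned on.
 * rdb_exists:  nonzero if an RDB file was found on disk. */
load_source choose_load_source(int aof_enabled, int rdb_exists) {
    if (aof_enabled) return LOAD_FROM_AOF; /* AOF wins whenever it is enabled */
    if (rdb_exists) return LOAD_FROM_RDB;  /* otherwise fall back to the RDB file */
    return LOAD_NOTHING;                   /* nothing to load: start empty */
}
```

Note that with AOF enabled the RDB file is not consulted at all in this sketch, even if one exists.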
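Users can set the save option in the server configuration so that the server automatically runs BGSAVE at intervals.
For example, suppose the server configuration contains the following: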
save 900 1
save 300 10
save 60 10000
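Then BGSAVE is executed as soon as any one of the three conditions above is met:
the database was modified at least once within 900 seconds;
the database was modified at least 10 times within 300 seconds;
the database was modified at least 10000 times within 60 seconds.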
The RDB-related fields in redisServer are as follows:
struct redisServer {
    // ...
    /* RDB persistence */
    // RDB persistence-related fields
    long long dirty;                /* Changes to DB from the last save */ // counter: number of modifications since the last SAVE/BGSAVE
    long long dirty_before_bgsave;  /* Used to restore dirty on failed BGSAVE */
    pid_t rdb_child_pid;            /* PID of RDB saving child */ // process ID of the RDB persistence child
    struct saveparam *saveparams;   /* Save points array for RDB */ // array of conditions that trigger RDB persistence
    int saveparamslen;              /* Number of saving points */
    char *rdb_filename;             /* Name of RDB file */
    int rdb_compression;            /* Use compression in RDB? */
    int rdb_checksum;               /* Use RDB checksum? */
    time_t lastsave;                /* Unix time of last successful save */ // Unix timestamp of the last successful SAVE/BGSAVE
    time_t lastbgsave_try;          /* Unix time of last attempted bgsave */
    time_t rdb_save_time_last;      /* Time used by last RDB save run. */
    time_t rdb_save_time_start;     /* Current RDB save start time. */
    int rdb_bgsave_scheduled;       /* BGSAVE when possible if true. */
    int rdb_child_type;             /* Type of save by active child. */
    int lastbgsave_status;          /* C_OK or C_ERR */
    int stop_writes_on_bgsave_err;  /* Don't allow writes if can't BGSAVE */
    int rdb_pipe_write_result_to_parent; /* RDB pipes used to return the state */
    int rdb_pipe_read_result_from_child; /* of each slave in diskless SYNC. */
    // ...
};
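/*
 * A condition that triggers RDB persistence.
 * For example,
 *   seconds: 900
 *   changes: 1
 * means the data was modified at least once within 900 seconds.
 */
struct saveparam {
    time_t seconds; // number of seconds
    int changes;    // number of modifications
};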
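1. Setting the save conditions
Save conditions are set with the save option, either in the configuration file or as a startup argument. The redis.conf shipped with Redis sets the conditions below by default (the values compiled into the server for the case where no save option is given at all may differ; see initServerConfig in server.c):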
save 900 1
save 300 10
save 60 10000
The configured save conditions are stored in the saveparams field of the redisServer struct (saveparams is an array).
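To make the mapping from a config line to a saveparams entry concrete, here is a minimal sketch. It is not the parser Redis uses: `parse_save_line` and `append_save_param` are hypothetical helpers, and the validation rule (seconds of at least 1, changes not negative) is an assumption made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct saveparam {
    time_t seconds;
    int changes;
};

/* Parse one config line of the form "save <seconds> <changes>".
 * Returns 1 and fills *out on success, 0 if the line does not match. */
int parse_save_line(const char *line, struct saveparam *out) {
    long seconds;
    int changes;
    char trailing;
    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (sscanf(line, "save %ld %d %c", &seconds, &changes, &trailing) != 2)
        return 0;
    if (seconds < 1 || changes < 0) return 0; /* assumed validation rule */
    out->seconds = (time_t)seconds;
    out->changes = changes;
    return 1;
}

/* Append a save point to a growable array, in the spirit of the
 * saveparams / saveparamslen pair. Returns the new array, or NULL on
 * allocation failure (the old array is left intact in that case). */
struct saveparam *append_save_param(struct saveparam *params, int *len,
                                    struct saveparam sp) {
    struct saveparam *grown =
        realloc(params, sizeof(*grown) * (size_t)(*len + 1));
    if (grown == NULL) return NULL;
    grown[*len] = sp;
    (*len)++;
    return grown;
}
```

Feeding the three lines from the example above through `parse_save_line` and `append_save_param` would yield an array of three entries: {900, 1}, {300, 10} and {60, 10000}.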
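2. The dirty counter and the lastsave field
The dirty counter records how many modifications (additions, deletions, updates) have been made to the database since the last successful SAVE/BGSAVE.
lastsave is a Unix timestamp that records when the last SAVE or BGSAVE succeeded.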
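The sketch below shows one way these two fields can be maintained, and what dirty_before_bgsave is for. It reflects my understanding of the 5.0 source rather than a copy of it, so treat the update rules as assumptions: a blocking SAVE resets dirty to 0, while a successful BGSAVE subtracts only the changes its snapshot covered. The struct and function names are hypothetical.

```c
#include <time.h>

/* Toy subset of the persistence fields in redisServer. */
struct persist_state {
    long long dirty;               /* changes since the last successful save */
    long long dirty_before_bgsave; /* value of dirty when the child was forked */
    time_t lastsave;               /* time of the last successful save */
};

/* Record n modifications to the dataset. */
void note_changes(struct persist_state *s, long long n) { s->dirty += n; }

/* A blocking SAVE finished: everything is on disk. */
void on_save_done(struct persist_state *s, time_t now) {
    s->dirty = 0;
    s->lastsave = now;
}

/* A BGSAVE child was forked: remember how many changes its snapshot covers. */
void on_bgsave_start(struct persist_state *s) {
    s->dirty_before_bgsave = s->dirty;
}

/* The BGSAVE child succeeded: changes made after the fork are still unsaved,
 * so they must stay in the counter. */
void on_bgsave_done(struct persist_state *s, time_t now) {
    s->dirty -= s->dirty_before_bgsave;
    s->lastsave = now;
}
```

For example, with 3 changes before the fork and 2 while the child is running, dirty is 2 once the child finishes, because those 2 changes are not in the file that was just written.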
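3. Checking whether the save conditions are met
The entry point for this check is the serverCron function in server.c.
By default it runs once every 100 ms.
The code that decides whether a save condition is met is shown below: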
/* If there is not a background saving/rewrite in progress check if
 * we have to save/rewrite now. */
for (j = 0; j < server.saveparamslen; j++) {
    struct saveparam *sp = server.saveparams+j;

    /* Save if we reached the given amount of changes,
     * the given amount of seconds, and if the latest bgsave was
     * successful or if, in case of an error, at least
     * CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
    if (server.dirty >= sp->changes &&
        server.unixtime-server.lastsave > sp->seconds &&
        (server.unixtime-server.lastbgsave_try >
         CONFIG_BGSAVE_RETRY_DELAY ||
         server.lastbgsave_status == C_OK))
    {
        serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
            sp->changes, (int)sp->seconds);
        rdbSaveInfo rsi, *rsiptr;
        rsiptr = rdbPopulateSaveInfo(&rsi);
        rdbSaveBackground(server.rdb_filename,rsiptr);
        break;
    }
}
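The excerpt above depends on server state, so it cannot be run on its own. The following is a standalone restatement of the same condition that can be compiled and tested in isolation. `first_due_save_point` is a hypothetical name, and `BGSAVE_RETRY_DELAY` is set to 5 seconds as an assumed stand-in for CONFIG_BGSAVE_RETRY_DELAY.

```c
#include <time.h>

#define BGSAVE_RETRY_DELAY 5 /* assumed value, in seconds */

struct saveparam {
    time_t seconds;
    int changes;
};

/* Return the index of the first save point that is due, or -1 if none is.
 * A save point is due when BOTH hold: at least `changes` modifications have
 * accumulated, AND more than `seconds` seconds have passed since the last
 * successful save. After a failed BGSAVE, nothing is due until the retry
 * delay has elapsed. */
int first_due_save_point(const struct saveparam *params, int len,
                         long long dirty, time_t now, time_t lastsave,
                         time_t lastbgsave_try, int last_bgsave_ok) {
    int retry_allowed =
        last_bgsave_ok || (now - lastbgsave_try > BGSAVE_RETRY_DELAY);
    if (!retry_allowed) return -1;
    for (int j = 0; j < len; j++) {
        if (dirty >= params[j].changes && now - lastsave > params[j].seconds)
            return j;
    }
    return -1;
}
```

Two details are easy to miss when reading the original loop: the time comparison is strict (exactly 900 elapsed seconds does not trigger the `save 900 1` point), and the elapsed time is measured from lastsave, not from the first modification.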
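The figure below shows the file structure of RDB version 9 (corrections are welcome).
An RDB file begins with the five characters REDIS. This prefix makes it possible to check quickly whether the file being loaded is an RDB file.
Next comes RDB_VERSION, which records the RDB file version (the RDB version in Redis 5.0.3 is 9). It is a 4-byte field holding the version number as decimal text, so its value is 0009.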
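As an illustration of how a reader of the file can use this 9-byte header, here is a minimal sketch of a header check. `rdb_header_version` is a hypothetical helper written for this post, not the function Redis uses when loading.

```c
#include <stdlib.h>
#include <string.h>

/* Inspect the first 9 bytes of a buffer.
 * Returns the RDB version (e.g. 9 for "REDIS0009"), or -1 if the buffer is
 * too short, does not start with "REDIS", or the version is not 4 digits. */
int rdb_header_version(const unsigned char *buf, size_t len) {
    char digits[5];
    if (len < 9) return -1;
    if (memcmp(buf, "REDIS", 5) != 0) return -1;
    memcpy(digits, buf + 5, 4);
    digits[4] = '\0';
    for (int i = 0; i < 4; i++) {
        if (digits[i] < '0' || digits[i] > '9') return -1;
    }
    return atoi(digits);
}
```

A loader would typically go one step further and refuse versions newer than the one it understands, since a newer format may contain opcodes it cannot parse.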
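The data section stores the non-empty databases. For each one it records the database ID, the size of the database, the number of keys that have an expiration time, and similar information.
The code that writes the RDB format is the rdbSaveRio function in rdb.c; read the source for the full details.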
// Generate a database dump in RDB format and send it to the specified I/O stream
/* Produces a dump of the database in RDB format sending it to the specified
 * Redis I/O channel. On success C_OK is returned, otherwise C_ERR
 * is returned and part of the output, or all the output, can be
 * missing because of I/O errors.
 *
 * When the function returns C_ERR and if 'error' is not NULL, the
 * integer pointed by 'error' is set to the value of errno just after the I/O
 * error. */
int rdbSaveRio(rio *rdb, int *error, int flags, rdbSaveInfo *rsi) {
    dictIterator *di = NULL;
    dictEntry *de;
    char magic[10];
    int j;
    uint64_t cksum;
    size_t processed = 0;

    if (server.rdb_checksum)
        rdb->update_cksum = rioGenericUpdateChecksum;
    // The file starts with "REDIS" plus the 4-digit RDB version (currently 9),
    // so magic is "REDIS0009"
    snprintf(magic,sizeof(magic),"REDIS%04d",RDB_VERSION);
    if (rdbWriteRaw(rdb,magic,9) == -1) goto werr;
    // Write the auxiliary fields (metadata such as the Redis version and creation time)
    if (rdbSaveInfoAuxFields(rdb,flags,rsi) == -1) goto werr;

    for (j = 0; j < server.dbnum; j++) {
        redisDb *db = server.db+j;
        dict *d = db->dict;
        if (dictSize(d) == 0) continue;
        di = dictGetSafeIterator(d);

        // Write the opcode (RDB_OPCODE_SELECTDB = 254) followed by the selected database number
        /* Write the SELECT DB opcode */
        if (rdbSaveType(rdb,RDB_OPCODE_SELECTDB) == -1) goto werr;
        if (rdbSaveLen(rdb,j) == -1) goto werr;

        /* Write the RESIZE DB opcode. We trim the size to UINT32_MAX, which
         * is currently the largest type we are able to represent in RDB sizes.
         * However this does not limit the actual size of the DB to load since
         * these sizes are just hints to resize the hash tables. */
        uint64_t db_size, expires_size;
        db_size = dictSize(db->dict);
        expires_size = dictSize(db->expires);
        if (rdbSaveType(rdb,RDB_OPCODE_RESIZEDB) == -1) goto werr;
        if (rdbSaveLen(rdb,db_size) == -1) goto werr;
        if (rdbSaveLen(rdb,expires_size) == -1) goto werr;

        // Iterate over every key in the database
        /* Iterate this DB writing every entry */
        while((de = dictNext(di)) != NULL) {
            sds keystr = dictGetKey(de);
            robj key, *o = dictGetVal(de);
            long long expire;

            initStaticStringObject(key,keystr);
            expire = getExpire(db,&key);
            if (rdbSaveKeyValuePair(rdb,&key,o,expire) == -1) goto werr;

            /* When this RDB is produced as part of an AOF rewrite, move
             * accumulated diff from parent to child while rewriting in
             * order to have a smaller final write. */
            if (flags & RDB_SAVE_AOF_PREAMBLE &&
                rdb->processed_bytes > processed+AOF_READ_DIFF_INTERVAL_BYTES)
            {
                processed = rdb->processed_bytes;
                aofReadDiffFromParent();
            }
        }
        dictReleaseIterator(di);
        di = NULL; /* So that we don't release it again on error. */
    }

    /* If we are storing the replication information on disk, persist
     * the script cache as well: on successful PSYNC after a restart, we need
     * to be able to process any EVALSHA inside the replication backlog the
     * master will send us. */
    if (rsi && dictSize(server.lua_scripts)) {
        di = dictGetIterator(server.lua_scripts);
        while((de = dictNext(di)) != NULL) {
            robj *body = dictGetVal(de);
            if (rdbSaveAuxField(rdb,"lua",3,body->ptr,sdslen(body->ptr)) == -1)
                goto werr;
        }
        dictReleaseIterator(di);
        di = NULL; /* So that we don't release it again on error. */
    }

    /* EOF opcode */
    if (rdbSaveType(rdb,RDB_OPCODE_EOF) == -1) goto werr;

    /* CRC64 checksum. It will be zero if checksum computation is disabled, the
     * loading code skips the check in this case. */
    cksum = rdb->cksum;
    memrev64ifbe(&cksum);
    if (rioWrite(rdb,&cksum,8) == 0) goto werr;
    return C_OK;

werr:
    if (error) *error = errno;
    if (di) dictReleaseIterator(di);
    return C_ERR;
}
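To see the outer frame of the format without the database contents, the sketch below builds the smallest layout that the function above would frame: the 9-byte magic, the EOF opcode, and the 8-byte checksum. It is a simplification under stated assumptions: the auxiliary fields and all databases are left out, the EOF opcode value 255 is taken from my reading of rdb.h, and `build_empty_rdb` and `put_u64_le` are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RDB_VERSION 9  /* RDB version used by Redis 5.0.3 */
#define SKETCH_OPCODE_EOF 255 /* assumed value of RDB_OPCODE_EOF */

/* Store v as 8 little-endian bytes regardless of host byte order. This is
 * the effect memrev64ifbe has on the checksum before it is written. */
static void put_u64_le(unsigned char *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (unsigned char)(v >> (8 * i));
}

/* Fill buf with: "REDIS0009" | EOF opcode | 8-byte checksum.
 * Returns the number of bytes written (18), or 0 if cap is too small.
 * Passing cksum == 0 corresponds to "checksum disabled". */
size_t build_empty_rdb(unsigned char *buf, size_t cap, uint64_t cksum) {
    char magic[10];
    if (cap < 18) return 0;
    snprintf(magic, sizeof(magic), "REDIS%04d", SKETCH_RDB_VERSION);
    memcpy(buf, magic, 9);        /* 9 bytes, no terminating NUL */
    buf[9] = SKETCH_OPCODE_EOF;   /* end of the data section */
    put_u64_le(buf + 10, cksum);  /* trailer */
    return 18;
}
```

Everything rdbSaveRio writes for a non-empty server (the auxiliary fields, then SELECTDB, RESIZEDB and the key-value pairs of each database) sits between byte 9 and the EOF opcode in this layout.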