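The overall process is as follows:
1. Initialization
Once master/slave replication is configured, the slave sends a PSYNC command to the master whether it is connecting for the first time or reconnecting.
If it is reconnecting and the conditions for partial resynchronization are met (detailed in 3.1), the master sends the commands held in its in-memory backlog to the slave, which completes the partial resynchronization (incremental sync). Otherwise a full resynchronization is performed.
2. Normal synchronization begins
Every write operation on the master is sent to the slave over the network as a Redis command.
Environment: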
- master 127.0.0.1:7779
- slave 127.0.0.1:9303, PID 10967, holding a single key
strace -p 10967 -s 1024 -o redis.strace.full
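Then connect to the slave and run slaveof 127.0.0.1 7779. The strace file shows the following actions on the slave side during synchronization (only the important parts are excerpted):
/* the client issues the slaveof command */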
read(6, "*3\r\n$7\r\nslaveof\r\n$9\r\n127.0.0.1\r\n$4\r\n7779\r\n", 16384) = 42
/* reply OK to the client */
write(6, "+OK\r\n", 5)
/* connect to the master */
connect(7, {sa_family=AF_INET, sin_port=htons(7779), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
/* check that the master is alive */
write(7, "PING\r\n", 6)
read(7, "+", 1) = 1
read(7, "P", 1) = 1
read(7, "O", 1) = 1
read(7, "N", 1) = 1
read(7, "G", 1) = 1
read(7, "\r", 1) = 1
read(7, "\n", 1) = 1
/* synchronization starts: send PSYNC to the master */
write(7, "PSYNC ? -1\r\n", 12) = 12
/* the master tells the slave to perform a full resynchronization */
read(7, "+", 1) = 1
read(7, "F", 1) = 1
read(7, "U", 1) = 1
read(7, "L", 1) = 1
read(7, "L", 1) = 1
read(7, "R", 1) = 1
read(7, "E", 1) = 1
read(7, "S", 1) = 1
read(7, "Y", 1) = 1
read(7, "N", 1) = 1
read(7, "C", 1) = 1
/* open a local temporary rdb file */
open("temp-1472206877.10967.rdb", O_WRONLY|O_CREAT|O_EXCL, 0644) = 8
/* receive the rdb file sent by the master */
read(7, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 77) = 77
/* write the received rdb data to the temporary rdb file */
write(8, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 77) = 77
/* rename the temporary rdb file */
rename("temp-1472206877.10967.rdb", "dump.rdb") = 0
/* open the local rdb file */
open("dump.rdb", O_RDONLY) = 9
/* load data from the rdb file into the slave */
read(9, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 4096) = 77
/* sync completed successfully; write the log entry */
open("/tmp/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=7627, ...}) = 0
write(8, "[10967] 26 Aug 18:21:17.450 * MASTER <-> SLAVE sync: Finished with success\n", 75) = 75
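The whole process matches the description in 2.1. Because we performed no operations on the master during synchronization, step 6 of 2.1 does not show up in the strace output.
The slave's redis.log reflects the same process.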
[10967] 26 Aug 18:21:17.250 * SLAVE OF 127.0.0.1:7779 enabled (user request)
[10967] 26 Aug 18:21:17.410 * Connecting to MASTER 127.0.0.1:7779
[10967] 26 Aug 18:21:17.413 * MASTER <-> SLAVE sync started
[10967] 26 Aug 18:21:17.415 * Non blocking connect for SYNC fired the event.
[10967] 26 Aug 18:21:17.418 * Master replied to PING, replication can continue...
[10967] 26 Aug 18:21:17.421 * Partial resynchronization not possible (no cached master)
[10967] 26 Aug 18:21:17.432 * Full resync from master: 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6:43774
[10967] 26 Aug 18:21:17.444 * MASTER <-> SLAVE sync: receiving 77 bytes from master
[10967] 26 Aug 18:21:17.446 * MASTER <-> SLAVE sync: Flushing old data
[10967] 26 Aug 18:21:17.447 * MASTER <-> SLAVE sync: Loading DB in memory
[10967] 26 Aug 18:21:17.450 * MASTER <-> SLAVE sync: Finished with success
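To make the handshake in the trace and log above easier to follow, here is a minimal Python sketch of the two messages involved: the inline PSYNC request the slave writes, and the reply line it reads back. The names build_psync, parse_psync_reply and PsyncReply are our own, not Redis APIs. The trace excerpt only shows the +FULLRESYNC prefix, so the sketch assumes the full reply line also carries the run ID and offset that appear in the "Full resync from master" log line.

```python
from typing import NamedTuple, Optional


class PsyncReply(NamedTuple):
    kind: str               # "FULLRESYNC" or "CONTINUE"
    run_id: Optional[str]   # only set for FULLRESYNC
    offset: Optional[int]   # only set for FULLRESYNC


def build_psync(run_id: Optional[str] = None, offset: int = -1) -> bytes:
    """Build the inline PSYNC request.

    With no cached master the slave sends "?" as the run ID and -1 as the
    offset, which is what the write() in the trace shows.
    """
    rid = run_id if run_id is not None else "?"
    return f"PSYNC {rid} {offset}\r\n".encode("ascii")


def parse_psync_reply(line: bytes) -> PsyncReply:
    """Parse one reply line (assumed format: +FULLRESYNC <runid> <offset>)."""
    text = line.decode("ascii").rstrip("\r\n")
    if text.startswith("+FULLRESYNC "):
        _, run_id, offset = text.split(" ")
        return PsyncReply("FULLRESYNC", run_id, int(offset))
    if text.startswith("+CONTINUE"):
        return PsyncReply("CONTINUE", None, None)
    raise ValueError(f"unexpected PSYNC reply: {text!r}")
```

With no arguments, build_psync() produces the same 12 bytes as the write() call in the trace. A slave that has a cached master would pass the run ID and offset it remembered instead.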
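Several important concepts:
- In-memory backlog: a fixed-size buffer on the master that holds the most recent write commands, so that writes received while a slave is disconnected can be replayed to it
- Replication offset: both master and slave keep an offset that records their current position in the replication stream
- Master run ID: the unique identifier of a master. The 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6 in the redis.log of 2.2 is a master run ID.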
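After the network connection drops, the slave tries to reconnect to the master. A partial resynchronization takes place after reconnecting when all of the following conditions hold:
1. The master run ID recorded by the slave is the same as the run ID of the master it is now connecting to
2. The slave's replication offset is behind the master's offset, for example slave at 1000 and master at 1100
3. The data at the slave's replication offset is still held in the master's in-memory backlog
Once a partial resynchronization is confirmed, the master sends the commands in its in-memory backlog to the slave over the network, which completes the partial resynchronization.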
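The three conditions above can be sketched as a small check. This is a simplified model under stated assumptions, not the Redis source: MasterState, backlog_len and can_partial_resync are hypothetical names, offsets are treated as byte counts, and the backlog is assumed to hold the most recent backlog_len bytes of the write stream.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    run_id: str        # master run ID
    offset: int        # master's current replication offset
    backlog_len: int   # bytes of recent write stream still in the backlog


def can_partial_resync(master: MasterState,
                       slave_run_id: str,
                       slave_offset: int) -> bool:
    """Return True if the three conditions for a partial resync hold."""
    # 1. The slave must remember the same master it is reconnecting to.
    if slave_run_id != master.run_id:
        return False
    # 2. The slave must not be ahead of the master.
    if slave_offset > master.offset:
        return False
    # 3. Everything after the slave's offset must still be in the backlog.
    oldest_buffered = master.offset - master.backlog_len
    return slave_offset >= oldest_buffered


master = MasterState(run_id="abc", offset=1100, backlog_len=200)
print(can_partial_resync(master, "abc", 1000))  # slave is 100 bytes behind
```

In this model a slave at offset 1000 can catch up from a master at 1100 as long as the backlog still covers those 100 bytes. If the backlog has already wrapped past them, or the run ID differs, the answer is a full resynchronization.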
Environment:
- master 10.136.30.144:7779
- slave 10.136.31.213:9303, holding one key "h"
First we strace the slave process. Then, to simulate a network outage, we add an iptables rule on the master machine that drops every packet sent to the slave.
/sbin/iptables -A OUTPUT -d 10.136.31.213 -j DROP
Next, we delete the key h on the master:
del h
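Finally, we remove the iptables rule to simulate the network recovering. Note that -F flushes every rule in the filter table, not only the one added above.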
/sbin/iptables -F
Let's look at the slave's log first:
[25667] 26 Aug 15:29:33.241 # Connection with master lost.
[25667] 26 Aug 15:29:33.241 * Caching the disconnected master state.
[25667] 26 Aug 15:29:33.241 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:29:33.241 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:29:54.240 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:29:54.262 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:29:54.263 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:15.270 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:30:15.726 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:30:15.726 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:36.728 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:30:37.272 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:30:37.279 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:37.282 * Non blocking connect for SYNC fired the event.
[25667] 26 Aug 15:30:37.289 * Master replied to PING, replication can continue...
[25667] 26 Aug 15:30:37.293 * Trying a partial resynchronization (request 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6:29265).
[25667] 26 Aug 15:30:37.300 * Successful partial resynchronization with master.
[25667] 26 Aug 15:30:37.302 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
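After noticing that the connection to the master was lost, the slave kept trying to reconnect. Once the connection succeeded, it attempted a partial resynchronization and completed it.
The strace output shows the same process. An excerpt follows:
/* reconnect to the master */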
connect(6, {sa_family=AF_INET, sin_port=htons(7779), sin_addr=inet_addr("10.136.30.144")}, 16) = -1 EINPROGRESS (Operation now in progress)
/* check that the master is alive */
write(6, "PING\r\n", 6)
read(6, "+", 1) = 1
read(6, "P", 1) = 1
read(6, "O", 1) = 1
read(6, "N", 1) = 1
read(6, "G", 1) = 1
read(6, "\r", 1) = 1
read(6, "\n", 1) = 1
/* the slave attempts a partial resynchronization and the master agrees */
write(6, "PSYNC 1d13fbd06f644eeb4b50d65f11"..., 54) = 54
read(6, "+", 1) = 1
read(6, "C", 1) = 1
read(6, "O", 1) = 1
read(6, "N", 1) = 1
read(6, "T", 1) = 1
read(6, "I", 1) = 1
read(6, "N", 1) = 1
read(6, "U", 1) = 1
read(6, "E", 1) = 1
read(6, "\r", 1) = 1
read(6, "\n", 1) = 1
/* read the commands accumulated during the outage: del h */
read(6, "*1\r\n$4\r\nPING\r\n*2\r\n$3\r\ndel\r\n$1\r\nh"..., 16384) = 188
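Both the slaveof command at the start of the first trace and the command stream read after +CONTINUE use the same wire format: an array header *N followed by N length-prefixed strings. As an illustration only, here is a small Python encoder and decoder for that format. The function names are ours. The sample stream is the visible prefix of the last read() plus a closing \r\n, which is an assumption because strace truncated the output.

```python
from typing import List


def encode_command(*args: str) -> bytes:
    """Encode a command as an array of length-prefixed strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode())
        parts.append(data + b"\r\n")
    return b"".join(parts)


def decode_commands(buf: bytes) -> List[List[bytes]]:
    """Split a buffer of back-to-back encoded commands into argument lists."""
    commands = []
    pos = 0
    while pos < len(buf):
        if buf[pos:pos + 1] != b"*":
            raise ValueError("expected array header")
        end = buf.index(b"\r\n", pos)
        argc = int(buf[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            if buf[pos:pos + 1] != b"$":
                raise ValueError("expected string header")
            end = buf.index(b"\r\n", pos)
            size = int(buf[pos + 1:end])
            start = end + 2
            args.append(buf[start:start + size])
            pos = start + size + 2  # skip the payload and its trailing CRLF
        commands.append(args)
    return commands


# Assumed sample: visible prefix of the last read() plus a closing CRLF.
sample = b"*1\r\n$4\r\nPING\r\n*2\r\n$3\r\ndel\r\n$1\r\nh\r\n"
for cmd in decode_commands(sample):
    print(cmd)
```

Decoding the sample yields a PING followed by del h, so the stream replayed from the backlog contains the master's PING as well as the write made during the outage. Encoding slaveof 127.0.0.1 7779 gives the 42 bytes seen in the first read() of the full-sync trace.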