I recently wrote a few decent articles on our company wiki and felt they were worth sharing, so I'm posting them here.
This one is about a critical-level Redis bug: it is not fixed in any 3.x release up to and including 3.2.6, and is only fixed in 4.0 RC2. In other words, even the 3.2.6 released this December is affected.
The symptom: when a cluster slot is migrated, the key count is lower after the migration.
Test setup: a two-node cluster, 10.0.0.10:20003 and 10.0.0.10:20004, each with maxmemory set to 200 MB and sized so that the tests would not run into memory exhaustion or other unrelated problems.
10.0.0.10:20004> cluster nodes
2aed426536067077179a3d23875b93b223802dea 10.0.0.10:20003 master - 0 1482132098352 6 connected
03a53320815d8b5f774810f2d41329007d60ebf4 10.0.0.10:20004 myself,master - 0 0 7 connected 0-16383
Key format: {test}_i, which guarantees that every key hashes to the same slot. {test}_i maps to slot 6918, and the slot is empty before the test:
10.0.0.10:20004> cluster keyslot {test}_i
(integer) 6918
10.0.0.10:20004> cluster countkeysinslot 6918
(integer) 0
10.0.0.10:20003> cluster countkeysinslot 6918
(integer) 0
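As an aside on why every {test}_i key lands in slot 6918: when a key contains a non-empty hash tag between { and }, CLUSTER KEYSLOT hashes only the tag. A quick sketch of the computation (CRC16/XMODEM of the hash tag, modulo 16384); it should agree with the CLUSTER KEYSLOT output above:
# Sketch: how the cluster slot of {test}_i is derived (hash tag + CRC16 % 16384)
def crc16_xmodem(data):
    crc = 0
    for byte in bytearray(data):
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key):
    s = key.find("{")
    if s != -1:
        e = key.find("}", s + 1)
        if e != -1 and e != s + 1:   # non-empty {...}: hash only the tag
            key = key[s + 1:e]
    return crc16_xmodem(key) % 16384

print key_hash_slot("{test}_0")      # same slot for every {test}_i key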
# Migration helpers, used together with `redis-trib fix` to move a slot.
# cli3 / cli4 are redis-py clients connected to 10.0.0.10:20003 / 10.0.0.10:20004.
def migrate_from_4_to_3(slot):
    # source (20004): mark the slot as migrating to 20003's node id
    cmd = "cluster setslot %s migrating 2aed426536067077179a3d23875b93b223802dea" % (slot,)
    cli4.execute_command(cmd)
    # target (20003): mark the slot as importing from 20004's node id
    cmd = "cluster setslot %s importing 03a53320815d8b5f774810f2d41329007d60ebf4" % (slot,)
    cli3.execute_command(cmd)

def migrate_from_3_to_4(slot):
    # source (20003): mark the slot as migrating to 20004's node id
    cmd = "cluster setslot %s migrating 03a53320815d8b5f774810f2d41329007d60ebf4" % (slot,)
    cli3.execute_command(cmd)
    # target (20004): mark the slot as importing from 20003's node id
    cmd = "cluster setslot %s importing 2aed426536067077179a3d23875b93b223802dea" % (slot,)
    cli4.execute_command(cmd)
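For reference, what redis-trib fix does for the slot after these states are set boils down to the loop below: fetch key names with CLUSTER GETKEYSINSLOT, push them in batches with the multi-key MIGRATE, then assign the slot to the target with CLUSTER SETSLOT ... NODE. This is only a sketch of that procedure (the function name, batch size, and timeout are made up), but it shows that the command doing the real work, and the one debugged for the rest of this post, is MIGRATE with the KEYS option.
# Sketch of the slot move that redis-trib performs once migrating/importing are set
# (move_slot_4_to_3, batch and timeout are illustrative; this is not redis-trib itself)
def move_slot_4_to_3(slot, batch=10, timeout=5000):
    target_id = "2aed426536067077179a3d23875b93b223802dea"   # 10.0.0.10:20003
    while True:
        keys = cli4.execute_command("cluster getkeysinslot %s %s" % (slot, batch))
        if not keys:
            break
        # multi-key form: empty key argument, key names after KEYS
        cli4.execute_command("migrate", "10.0.0.10", 20003, "", 0, timeout,
                             "keys", *keys)
    # finally hand the slot over to the target on both nodes
    for cli in (cli3, cli4):
        cli.execute_command("cluster setslot %s node %s" % (slot, target_id))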
10.0.0.10:20003> cluster countkeysinslot 6918
(integer) 0
10.0.0.10:20004> cluster countkeysinslot 6918
(integer) 0
# Write 20000 keys without an expiration to 10.0.0.10:20004
for i in range(20000):
    cmd = "set {test}_%s %s" % (i, i)
    print cmd, cli4.execute_command(cmd)
10.0.0.10:20004> cluster countkeysinslot 6918
(integer) 20000
# Migrate slot 6918
migrate_from_4_to_3(6918)
redis-trib fix 10.0.0.10:20004
# check
10.0.0.10:20003> cluster countkeysinslot 6918
(integer) 20000
10.0.0.10:20004> cluster countkeysinslot 6918
(integer) 0
Conclusion: all 20000 keys were migrated; no problem here.
Using the same method as above, I then tested the mixed case of {20000 keys without an expiration, 20000 keys with an expiration}. Two slots were used: {test}_i (slot 6918) and {bug}_i (slot 7910).
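The loader for the volatile keys is not shown in the original notes; it can be done the same way as the non-expiring loader, just with an EX option and TTLs short enough that some keys are still expiring while the slot is being moved. A sketch, where the id range, the TTL range, and the `cli` handle (a client for whichever node currently owns the slot) are assumptions:
import random

# Write 20000 keys WITH an expiration into the same slot (sketch).
# cli: redis-py client for the node that currently owns the slot (assumption);
# the id range and the 5-120s TTL range are made up for illustration.
for i in range(20000, 40000):
    ttl_sec = random.randint(5, 120)   # short TTLs so some keys expire mid-migration
    cmd = "set {test}_%s %s ex %s" % (i, i, ttl_sec)
    print cmd, cli.execute_command(cmd)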
If no keys happened to be expiring while the slot was being moved, the key count did not drop after the migration.
If keys were expiring while the migration was running, the key count after the migration was below 20000, and across repeated runs the number of missing keys varied: sometimes a few hundred, sometimes more than 2000.
In other words, when keys expire during a slot migration, the keys that have no expiration are affected too, and some of those non-expiring keys are lost.
When redis-trib decides whether a slot has been fully migrated, it only looks at getkeysinslot: as soon as getkeysinslot returns an empty list, it assumes the migration is done and moves on.
So, as a first experiment, I also checked countkeysinslot in the code: only when both report zero, re-check up to 10 more times before finishing, to see whether redis-trib was simply giving up too early.
# Modified redis-trib.rb
...
# Migrate all the keys from source to target using the MIGRATE command
zerocounttime = 0
while true
    keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
    cntkeysinslot = source.r.cluster("countkeysinslot",slot)
    if keys.length == 0 && cntkeysinslot == 0
        # both report empty: re-check up to 10 more times before trusting it
        while true
            keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
            cntkeysinslot = source.r.cluster("countkeysinslot",slot)
            if keys.length == 0 && cntkeysinslot == 0
                zerocounttime = zerocounttime + 1
                print "try #{zerocounttime}\n"
            end
            if zerocounttime > 10
                break
            end
        end
    end
    break if zerocounttime > 10
    zerocounttime = 0
    # do migrate keys
    ...
Conclusion: the problem still reproduced, so this was not the cause.
The Redis version we run is 3.2.1; do newer versions have the same problem?
I cloned the latest code from GitHub.
On the latest 3.x release, 3.2.6, the problem reproduces.
On 4.0 it does not reproduce. I also noticed while testing that a 4.0 redis-server cannot form a working cluster with our current 3.2 nodes.
For example, with 10.0.0.10:20003 running 3.2.1 and 10.0.0.10:20004 running 4.0, cluster nodes shows:
10.0.0.10:20003> cluster nodes
2aed426536067077179a3d23875b93b223802dea 10.0.0.10:20003 myself,master - 0 0 14 connected 5638
03a53320815d8b5f774810f2d41329007d60ebf4 10.0.0.10:20004 master,fail? - 1482152712981 1482152711815 13 disconnected 0-5637 5639-16383
10.0.0.10:20004> cluster nodes
03a53320815d8b5f774810f2d41329007d60ebf4 10.0.0.10:20004@30004 myself,master - 0 0 13 connected 0-5637 5639-16383
2aed426536067077179a3d23875b93b223802dea 10.0.0.10:20003@30003 master,fail? - 1482152711827 1482152711026 14 connected 5638
When both nodes run 4.0, cluster nodes shows:
10.0.0.10:20003> cluster nodes
03a53320815d8b5f774810f2d41329007d60ebf4 10.0.0.10:20004@30004 master - 0 1482152867134 13 connected 0-5637 5639-16383
2aed426536067077179a3d23875b93b223802dea 10.0.0.10:20003@30003 myself,master - 0 0 14 connected 5638
10.0.0.10:20004> cluster nodes
03a53320815d8b5f774810f2d41329007d60ebf4 10.0.0.10:20004@30004 myself,master - 0 0 13 connected 0-5637 5639-16383
2aed426536067077179a3d23875b93b223802dea 10.0.0.10:20003@30003 master - 0 1482152864189 14 connected 5638
So the investigation continued on 3.2.6.
I modified redis-trib to log the keys it migrates, and patched the Redis 3.2.6 sources to log the keys on both sides: the keys prepared by MIGRATE on the source node and the keys written by RESTORE on the target node.
...
# Migrate all the keys from source to target using the MIGRATE command
while true
    keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
    break if keys.length == 0
    print "\nmigrating keys: #{keys}" # debug log
    begin
        source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
    end
    ...
    print "."*keys.length if o[:dots]
    STDOUT.flush
end
...
Redis 3.2.6:
// add log for restoreCommand
...
    /* Create the key and set the TTL if any */
    dbAdd(c->db,c->argv[1],obj);
    if (ttl) setExpire(c->db,c->argv[1],mstime()+ttl);
    signalModifiedKey(c->db,c->argv[1]);
    addReply(c,shared.ok);
    server.dirty++;
    if (sdslen(c->argv[1]->ptr) < 12) { // debug log
        serverLog(LL_WARNING,"store key ok, key:%s, ttl:%lld", c->argv[1]->ptr, ttl);
    }
}
// add log for migrateCommand
...
    /* Create RESTORE payload and generate the protocol to call the command. */
    for (j = 0; j < num_keys; j++) {
        expireat = getExpire(c->db,kv[j]);
        if (expireat != -1) {
            ttl = expireat-mstime();
            if (ttl < 1) ttl = 1;
        }
        serverAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',replace ? 5 : 4));
        if (server.cluster_enabled)
            serverAssertWithInfo(c,NULL,
                rioWriteBulkString(&cmd,"RESTORE-ASKING",14));
        else
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"RESTORE",7));
        serverAssertWithInfo(c,NULL,sdsEncodedObject(kv[j]));
        serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,kv[j]->ptr,
            sdslen(kv[j]->ptr)));
        serverAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl));
        serverLog(LL_WARNING,"prepare cmd, key:%s, ttl:%lld", kv[j]->ptr, ttl); // debug log

        /* Emit the payload argument, that is the serialized object using
         * the DUMP format. */
        createDumpPayload(&payload,ov[j]);
        serverAssertWithInfo(c,NULL,
            rioWriteBulkString(&cmd,payload.io.buffer.ptr,
                sdslen(payload.io.buffer.ptr)));
        sdsfree(payload.io.buffer.ptr);

        /* Add the REPLACE option to the RESTORE command if it was specified
         * as a MIGRATE option. */
        if (replace)
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"REPLACE",7));
    }
...
// a second log added at the top of the same loop, before the ttl is computed
...
    /* Create RESTORE payload and generate the protocol to call the command. */
    for (j = 0; j < num_keys; j++) {
        serverLog(LL_WARNING,"key:%s", kv[j]->ptr); // debug log
        expireat = getExpire(c->db,kv[j]);
        if (expireat != -1) {
            ttl = expireat-mstime();
            if (ttl < 1) ttl = 1;
        }
        ...
    }
...
The problem reproduced; this time the missing keys were:
>>> len(notfound)
7
>>>
>>> notfound
[4, 3994, 3995, 3996, 3997, 3998, 3999]
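The script that produced notfound is not shown in the original notes; a minimal version might look like the sketch below, where cli_target (a client pointing at the node that now owns the slot) and no_ttl_ids (the ids that were written without an expiration) are placeholder names:
# Sketch: re-read every key that was written WITHOUT a ttl and record the missing ids
notfound = []
for i in no_ttl_ids:                      # placeholder: ids loaded without an expiration
    if not cli_target.execute_command("exists {haha}_%s" % (i,)):
        notfound.append(i)
print len(notfound)
print notfound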
The missing keys show up in both the redis-trib output and the source node's log, so the keys clearly were migrated.
On the target node, the log looks like this:
125838:M 21 Dec 11:40:36.390 # store key ok, key:{haha}_399, ttl:0
125838:M 21 Dec 11:40:36.390 # store key ok, key:{haha}_3990, ttl:0
125838:M 21 Dec 11:40:36.426 # store key ok, key:{haha}_3991, ttl:0
125838:M 21 Dec 11:40:36.445 # store key ok, key:{haha}_3992, ttl:0
125838:M 21 Dec 11:40:36.467 # store key ok, key:{haha}_3993, ttl:0
125838:M 21 Dec 11:40:36.486 # store key ok, key:{haha}_3994, ttl:1
125838:M 21 Dec 11:40:36.505 # store key ok, key:{haha}_3995, ttl:14
125838:M 21 Dec 11:40:36.524 # store key ok, key:{haha}_3996, ttl:22
125838:M 21 Dec 11:40:36.544 # store key ok, key:{haha}_3997, ttl:25
125838:M 21 Dec 11:40:36.561 # store key ok, key:{haha}_3998, ttl:30
125838:M 21 Dec 11:40:36.580 # store key ok, key:{haha}_3999, ttl:32
125838:M 21 Dec 11:40:36.599 # store key ok, key:{haha}_4, ttl:35
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_40, ttl:0
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_400, ttl:0
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_4000, ttl:0
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_4001, ttl:0
The TTL of every single missing key is wrong!
What were the TTLs on the source node?
126113:M 21 Dec 11:40:36.561 # key:{haha}_39979
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_39979, ttl:30
126113:M 21 Dec 11:40:36.561 # key:{haha}_3998
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_3998, ttl:30
126113:M 21 Dec 11:40:36.561 # key:{haha}_39980
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_39980, ttl:32
126113:M 21 Dec 11:40:36.580 # key:{haha}_39988
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39988, ttl:30
126113:M 21 Dec 11:40:36.580 # key:{haha}_39989
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39989, ttl:32
126113:M 21 Dec 11:40:36.580 # key:{haha}_3999
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_3999, ttl:32
126113:M 21 Dec 11:40:36.580 # key:{haha}_39990
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39990, ttl:35
So the TTL of every missing key was never reset and simply reused the previous key's TTL!
The problem is in the for loop of the code below. For a key without an expiration, ttl should be 0, but if an earlier key in the batch had an expiration, ttl is already greater than 0. When the non-expiring key is processed next, expireat is -1, the if branch is skipped, and ttl keeps the previous key's value, so a key that should never expire is restored with ttl > 0 and eventually expires.
/* MIGRATE host port key dbid timeout [COPY | REPLACE]
 *
 * On in the multiple keys form:
 *
 * MIGRATE host port "" dbid timeout [COPY | REPLACE] KEYS key1 key2 ... keyN */
void migrateCommand(client *c) {
    long long ttl, expireat;
    ttl = 0;
    ...
    /* Create RESTORE payload and generate the protocol to call the command. */
    /* The bug is in this for loop. For a key without an expiration, ttl should be 0,
     * but if an earlier key in the batch had an expiration, ttl is already > 0.
     * For the non-expiring key, expireat is -1, so the if below is skipped, ttl keeps
     * the previous key's value, and a key that should never expire is restored with
     * ttl > 0 and therefore expires. */
    for (j = 0; j < num_keys; j++) {
        expireat = getExpire(c->db,kv[j]);
        if (expireat != -1) {
            ttl = expireat-mstime();
            if (ttl < 1) ttl = 1;
        }
        serverAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',replace ? 5 : 4));
        if (server.cluster_enabled)
            serverAssertWithInfo(c,NULL,
                rioWriteBulkString(&cmd,"RESTORE-ASKING",14));
        else
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"RESTORE",7));
        serverAssertWithInfo(c,NULL,sdsEncodedObject(kv[j]));
        serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,kv[j]->ptr,
            sdslen(kv[j]->ptr)));
        serverAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl));

        /* Emit the payload argument, that is the serialized object using
         * the DUMP format. */
        createDumpPayload(&payload,ov[j]);
        serverAssertWithInfo(c,NULL,
            rioWriteBulkString(&cmd,payload.io.buffer.ptr,
                sdslen(payload.io.buffer.ptr)));
        sdsfree(payload.io.buffer.ptr);

        /* Add the REPLACE option to the RESTORE command if it was specified
         * as a MIGRATE option. */
        if (replace)
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"REPLACE",7));
    }
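Before checking upstream, the ttl inheritance can be reproduced deterministically with a two-key MIGRATE in which a volatile key is deliberately placed before a persistent one. A sketch, reusing cli3/cli4, migrate_from_4_to_3 and the node ids from above, and assuming slot 6918 is currently served by 10.0.0.10:20004 and is empty:
# Two keys in the same slot: a volatile one first, a persistent one second
cli4.execute_command("set {test}_vol 1 px 60000")    # about 60s left to live
cli4.execute_command("set {test}_persist 1")         # must never expire
migrate_from_4_to_3(6918)                            # set migrating/importing states

# one multi-key MIGRATE, with the volatile key deliberately first in the KEYS list
cli4.execute_command("migrate", "10.0.0.10", 20003, "", 0, 5000,
                     "keys", "{test}_vol", "{test}_persist")

# hand the slot over so the target serves reads without redirection
for cli in (cli3, cli4):
    cli.execute_command("cluster setslot 6918 node 2aed426536067077179a3d23875b93b223802dea")

# expected -1 (no expiration); on 3.2.x this prints roughly 60000 because
# {test}_persist was restored with the ttl left over from {test}_vol
print cli3.execute_command("pttl {test}_persist")
That also matches the loss pattern in the logs above: a persistent key restored with a tiny leftover ttl simply disappears a few milliseconds later.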
A look at the latest code on GitHub showed that this had been spotted and fixed by @badboy just a few days earlier, on December 14 (the patch resets ttl for every key in the batch). So sad~
https://github.com/antirez/redis/pull/3673/files