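This article walks through a manual failover in a Redis master-slave setup, including the troubleshooting along the way. The goal: when the master instance goes down, promote the slave to master so that writes can continue; once the original master is back, have it catch up on the data written to the former slave, then restore the original master and slave roles.
Environment
Operating system on both hosts: RHEL 5.4, 64-bit
Redis version: 2.6.4
Redis port on both instances: 6379
Redis password on both instances: 123
Master instance: server11 (192.168.1.112)
Slave instance: server12 (192.168.1.113)
I: Manual switchover with no persistence configured
1: Under normal conditions server11 is the master, server12 is the slave, and replication works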
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:1
- slave0:192.168.1.113,6379,online
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 config get save
- 1) "save"
- 2) ""
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 config get save
- 1) "save"
- 2) ""
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 set 5 e
- OK
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
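The role checks in this walkthrough are read off the `info` output by eye. For scripting, a small helper along these lines could pull out a single field. `repl_field` is a hypothetical name used here for illustration, not part of Redis or redis-cli. It strips carriage returns because INFO lines end in CRLF.

```shell
# repl_field FIELD
# Reads `redis-cli ... info` output on stdin and prints the value of FIELD.
# Hypothetical helper for illustration; not a redis-cli feature.
repl_field() {
    # INFO lines end in \r\n; drop the \r so the value compares cleanly.
    tr -d '\r' | awk -v key="$1" '
        # Match lines that start with "FIELD:" and print everything after the colon.
        index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
    '
}
```

With the environment in the state shown above, `/usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info | repl_field role` should print `master`.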
2: When the master goes down, the slave can still serve reads but rejects writes
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 shutdown
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- Could not connect to Redis at 192.168.1.112:6379: Connection refused
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
- (error) READONLY You can't write against a read only slave.
3: Promote the slave to master so that it accepts writes
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 SLAVEOF NO ONE
- OK
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:0
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
- OK
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
- "f"
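Promoting a slave while the master is still alive would leave two writable masters, so a script doing this step may want to confirm the master is really unreachable first. The following is a minimal sketch under that assumption; `promote_if_master_down` and the `REDIS_CLI` override variable are hypothetical names introduced here, and only `ping` and `slaveof no one` are real Redis commands.

```shell
# Path taken from the commands in this article; override REDIS_CLI to change it.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# promote_if_master_down MASTER_IP SLAVE_IP PASSWORD
# Hypothetical helper: promotes the slave only when the master does not answer PING.
promote_if_master_down() {
    # A live instance answers PING with PONG; any other result counts as "down".
    if [ "$($REDIS_CLI -h "$1" -a "$3" ping 2>/dev/null)" = "PONG" ]; then
        echo "master $1 is still up; not promoting" >&2
        return 1
    fi
    # Detach the slave from its master, turning it into a writable master.
    $REDIS_CLI -h "$2" -a "$3" slaveof no one
}
```

For the hosts in this article the call would be `promote_if_master_down 192.168.1.112 192.168.1.113 123`. A single failed PING is a weak signal; a real script would probably retry a few times before promoting.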
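4: After the original master comes back, try to have it pull the latest data from server12. In testing this did not work: server11 restarted empty because no persistence was configured, and after SLAVEOF its replication link to server12 stayed down, so server11 and server12 ended up with inconsistent data. Forcing the original roles back at this point would lose data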
- [root@server11 ~]# /usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:0
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- (nil)
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
- (nil)
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
- "f"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -p 6379 -a 123 SLAVEOF 192.168.1.113 6379
- OK
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 10 'Replication'
- # Replication
- role:slave
- master_host:192.168.1.113
- master_port:6379
- master_link_status:down
- master_last_io_seconds_ago:-1
- master_sync_in_progress:0
- master_link_down_since_seconds:517
- slave_priority:100
- slave_read_only:1
- connected_slaves:0
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:0
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- (nil)
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
- (nil)
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
- "f"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
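The output above shows `master_link_status:down`, but the test does not establish why. One thing worth checking, offered as an assumption rather than a confirmed diagnosis: both instances require the password `123`, and a slave needs `masterauth` set in order to authenticate against a password-protected master. server12 must have had it as the original slave; server11, the original master, may not. A minimal sketch of a link check follows (`link_is_up` is a hypothetical helper name):

```shell
# link_is_up
# Reads `redis-cli ... info` output on stdin and succeeds only if the
# replication link to the master is reported as up.
# Hypothetical helper for illustration.
link_is_up() {
    # INFO lines end in \r\n, so strip \r before matching the whole line.
    tr -d '\r' | grep -q '^master_link_status:up$'
}
```

If `/usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info | link_is_up` fails, running `config get masterauth` against server11 shows whether a master password is set, and `config set masterauth 123` sets it at runtime.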
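II: Test with snapshot persistence enabled on the slave
1: After restoring the original test environment, enable snapshot persistence on the slave. Because this is a test environment, it is set to save a snapshot if at least 1 key changes within 60 seconds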
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 config get save
- 1) "save"
- 2) ""
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 config get save
- 1) "save"
- 2) "60 1"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:1
- slave0:192.168.1.113,6379,online
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:slave
- master_host:192.168.1.112
- master_port:6379
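The transcript above only reads the `save` setting back; it does not show how it was changed. One way to do it at runtime is `config set save`, sketched below. `enable_snapshot` and the `REDIS_CLI` override are hypothetical names; whether the author used `config set` or edited redis.conf is not stated in the source.

```shell
# Path taken from the commands in this article; override REDIS_CLI to change it.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# enable_snapshot HOST PASSWORD SECONDS CHANGES
# Hypothetical helper: sets one snapshot rule and prints the resulting setting.
enable_snapshot() {
    # "SECONDS CHANGES" is passed as a single argument, e.g. "60 1".
    $REDIS_CLI -h "$1" -a "$2" config set save "$3 $4" >/dev/null || return 1
    # Read the value back so the caller can confirm it took effect.
    $REDIS_CLI -h "$1" -a "$2" config get save
}
```

For this test the call would be `enable_snapshot 192.168.1.113 123 60 1`. A value set with `config set` does not survive a restart unless the same `save 60 1` line is also put in redis.conf.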
2: Write test data and check that master and slave replicate correctly
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 set 5 e
- OK
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
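The check above, reading the same key from both instances, can be wrapped in a small function for repeated use. This is a sketch only: `key_in_sync` and the `REDIS_CLI` override are hypothetical names, and the addresses and password are the ones used in this article.

```shell
# Path taken from the commands in this article; override REDIS_CLI to change it.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# key_in_sync KEY
# Hypothetical helper: succeeds if master and slave return the same value for KEY.
# Returns 2 if either instance cannot be queried.
key_in_sync() {
    m=$($REDIS_CLI -h 192.168.1.112 -a 123 get "$1") || return 2
    s=$($REDIS_CLI -h 192.168.1.113 -a 123 get "$1") || return 2
    [ "$m" = "$s" ]
}
```

Replication is asynchronous, so a check made immediately after a write can briefly report a mismatch even when the link is healthy.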
3: Simulate a master crash, manually promote the slave to master, and continue writing new data
- [root@server11 ~]# killall -9 redis-server
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
- Could not connect to Redis at 192.168.1.112:6379: Connection refused
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
- (error) READONLY You can't write against a read only slave.
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 slaveof no one
- OK
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:0
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
- OK
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
- "f"
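4: Data sync and role restoration after the original master recovers. Here the data is synced by copying the slave's snapshot file over to the master before the master is started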
- [root@server12 ~]# scp /usr/local/redis2/slave_dump.rdb server11:/usr/local/redis2/master_dump.rdb
- [root@server11 ~]# /usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 2 'Replication'
- # Replication
- role:master
- connected_slaves:0
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
- "e"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
- "f"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 slaveof 192.168.1.112 6379
- OK
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 10 'Replication'
- # Replication
- role:slave
- master_host:192.168.1.112
- master_port:6379
- master_link_status:up
- master_last_io_seconds_ago:1
- master_sync_in_progress:0
- slave_priority:100
- slave_read_only:1
- connected_slaves:0
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
- "e"
- [root@server12 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
- "f"
- [root@server11 ~]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
- # Replication
- role:master
- connected_slaves:1
- slave0:192.168.1.113,6379,online
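The order of operations in step 4 matters: server11 has to load the copied snapshot before server12 is pointed back at it, because `slaveof` makes server12 replace its own data with whatever server11 holds. With `save 60 1`, the snapshot file can also lag the latest writes by up to a minute, so forcing a `save` before copying is a reasonable precaution that the transcript above does not show. The sketch below strings the steps together as a dry run. `failback`, `RUN`, and `CLI` are hypothetical names; it assumes it runs on server12 and that server12 can reach server11 over ssh, which the source does not state.

```shell
# Dry run by default: each step is printed instead of executed.
# Set RUN to an empty string (RUN= ) to actually run the commands.
RUN=${RUN-echo}
CLI=/usr/local/redis2/bin/redis-cli

# failback: hypothetical helper that restores server11 as master.
failback() {
    # 1. Force a synchronous snapshot on server12 so the RDB file is current.
    $RUN $CLI -h 192.168.1.113 -a 123 save || return 1
    # 2. Copy the snapshot to the path server11 loads at startup
    #    (file names as used in this article).
    $RUN scp /usr/local/redis2/slave_dump.rdb server11:/usr/local/redis2/master_dump.rdb || return 1
    # 3. Start Redis on server11; it loads the copied snapshot.
    $RUN ssh server11 /usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf || return 1
    # 4. Demote server12; it resyncs from server11 and discards its own data.
    $RUN $CLI -h 192.168.1.113 -a 123 slaveof 192.168.1.112 6379
}
```

Any write that reaches server12 after step 1 is not in the copied file and is lost at step 4, so writes should be paused for the duration. Checking a known key on server11 between steps 3 and 4, as the transcript above does with `get 5` and `get 6`, is a sensible gate before demoting server12.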
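Follow-up: everything in this failover procedure up to promoting the slave to master can be automated by deploying keepalived, and the final data sync of the original master and the role restoration can be driven by a shell script. The next article will cover this in detail.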