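The previous article covered setting up a Redis one-master, two-replica environment under Docker. This article continues from there and covers how to configure and start Sentinel mode under Docker, along with some points to watch out for.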
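1. Prepare the configuration files
Notes:
Although most of the settings are identical, it is still advisable to keep them in separate directories, i.e. to prepare three separate Sentinel configuration files. The main reason is that Sentinel rewrites its own configuration file at runtime (it persists state such as its own ID and, after a failover, the new master address), so separate files make the changes easy to observe during verification later. There are of course other ways to handle this.
Following the above, create a file named sentinel.conf in each of the three folders redis-6379, redis-6380 and redis-6381 with the following contents: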
/opt/redis-cluster/redis-6379/sentinel.conf
daemonize no
port 26379
protected-mode no
sentinel monitor mymaster 192.168.0.106 6380 2
logfile /data/sentinel.log
dir /data
/opt/redis-cluster/redis-6380/sentinel.conf
daemonize no
port 26380
protected-mode no
sentinel monitor mymaster 192.168.0.106 6380 2
logfile /data/sentinel.log
dir /data
/opt/redis-cluster/redis-6381/sentinel.conf
daemonize no
port 26381
protected-mode no
sentinel monitor mymaster 192.168.0.106 6380 2
logfile /data/sentinel.log
dir /data
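Since the three files differ only in the port, they can also be generated. The sketch below is one way to do it, not part of the original setup; the helper names are hypothetical. Note that `sentinel monitor` takes the quorum as its last argument (`sentinel monitor <name> <ip> <port> <quorum>`), i.e. how many Sentinels must agree the master is unreachable; the sketch assumes 2, a majority of three Sentinels.

```python
from pathlib import Path

# Values from the article; QUORUM = 2 is an assumption (a majority of three Sentinels).
MASTER_IP = "192.168.0.106"
MASTER_PORT = 6380
QUORUM = 2


def render_sentinel_conf(sentinel_port: int) -> str:
    """Return the sentinel.conf text for the Sentinel listening on sentinel_port."""
    lines = [
        "daemonize no",  # stay in the foreground so the container keeps running
        f"port {sentinel_port}",
        "protected-mode no",
        f"sentinel monitor mymaster {MASTER_IP} {MASTER_PORT} {QUORUM}",
        "logfile /data/sentinel.log",
        "dir /data",
    ]
    return "\n".join(lines) + "\n"


def write_all(base_dir: Path = Path("/opt/redis-cluster")) -> list[Path]:
    """Write a separate sentinel.conf into each redis-<port> directory."""
    written = []
    for redis_port in (6379, 6380, 6381):
        node_dir = base_dir / f"redis-{redis_port}"
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / "sentinel.conf"
        conf_path.write_text(render_sentinel_conf(20000 + redis_port))  # 6379 -> 26379
        written.append(conf_path)
    return written
```

Calling `write_all()` recreates the three files; it writes one file per Sentinel because each Sentinel rewrites its own copy.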
2. Start the Sentinels
docker run -it -d --net host --name sentinel-6380 -v /opt/redis-cluster/redis-6380/data/:/data -v /opt/redis-cluster/redis-6380/sentinel.conf:/usr/local/etc/redis/sentinel.conf redis:latest redis-sentinel /usr/local/etc/redis/sentinel.conf
docker run -it -d --net host --name sentinel-6381 -v /opt/redis-cluster/redis-6381/data/:/data -v /opt/redis-cluster/redis-6381/sentinel.conf:/usr/local/etc/redis/sentinel.conf redis:latest redis-sentinel /usr/local/etc/redis/sentinel.conf
docker run -it -d --net host --name sentinel-6379 -v /opt/redis-cluster/redis-6379/data/:/data -v /opt/redis-cluster/redis-6379/sentinel.conf:/usr/local/etc/redis/sentinel.conf redis:latest redis-sentinel /usr/local/etc/redis/sentinel.conf
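The startup commands are simple; the main points are mounting the data directory and the configuration file. --net host makes each Sentinel use the host's network stack, so it listens directly on its port (26379 to 26381) and reaches the Redis nodes at 192.168.0.106.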
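The three commands differ only in the node port. As a sketch (the function name is hypothetical), the argument list can be built per port and printed with `shlex.join`, which reproduces the commands above exactly; the same list could be passed to `subprocess.run(args, check=True)`.

```python
import shlex

IMAGE = "redis:latest"
CONF_IN_CONTAINER = "/usr/local/etc/redis/sentinel.conf"


def sentinel_run_args(redis_port: int, base_dir: str = "/opt/redis-cluster") -> list[str]:
    """Build the `docker run` argv for the Sentinel paired with redis-<redis_port>."""
    node_dir = f"{base_dir}/redis-{redis_port}"
    return [
        "docker", "run", "-it", "-d",
        "--net", "host",                       # share the host network stack
        "--name", f"sentinel-{redis_port}",
        "-v", f"{node_dir}/data/:/data",       # data dir: holds sentinel.log
        "-v", f"{node_dir}/sentinel.conf:{CONF_IN_CONTAINER}",
        IMAGE,
        "redis-sentinel", CONF_IN_CONTAINER,
    ]


for port in (6380, 6381, 6379):
    print(shlex.join(sentinel_run_args(port)))
```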
As before, verify the startup with docker ps.
All three Sentinel processes started normally.
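The docker ps check can also be scripted. This is a sketch under the assumption that the container names match the `--name` values used above; `docker ps --format '{{.Names}}'` prints one running container name per line, and the comparison is split out so it can be tried without Docker.

```python
import subprocess

EXPECTED = {"sentinel-6379", "sentinel-6380", "sentinel-6381"}


def missing_containers(ps_output: str, expected: set[str] = EXPECTED) -> set[str]:
    """Return the expected names absent from `docker ps --format '{{.Names}}'` output."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return expected - running


def check_sentinels() -> set[str]:
    # Requires the docker CLI; `docker ps` lists only running containers.
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_containers(out)
```

An empty set from `check_sentinels()` means all three Sentinels are running.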
3. Verify failover
First, look at the current master/replica status:
[root@kf202 ~]# redis-cli -p 6380
127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.106,port=6381,state=online,offset=175380,lag=1
slave1:ip=192.168.0.106,port=6379,state=online,offset=175662,lag=1
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:175662
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:175662
127.0.0.1:6380>
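Indeed, 6380 is the master and it has two replicas, 6379 and 6381.
Now stop the master node (the Redis instance on 6380).
Check the replication status again from 6379; after a short while the replica should detect that the master has gone offline.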
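Reading roles by eye is fine for three nodes. As a sketch, the `INFO replication` text can also be parsed programmatically; the helper names `parse_info` and `replicas` are hypothetical, and the sample is copied from the output above.

```python
def parse_info(text: str) -> dict[str, str]:
    """Turn `INFO replication` output into a dict, skipping '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first ':' only
        if sep:
            info[key] = value
    return info


def replicas(info: dict[str, str]) -> list[tuple[str, int, str]]:
    """Extract (ip, port, state) for each slaveN entry reported by a master."""
    result = []
    for i in range(int(info.get("connected_slaves", "0"))):
        fields = dict(kv.split("=", 1) for kv in info[f"slave{i}"].split(","))
        result.append((fields["ip"], int(fields["port"]), fields["state"]))
    return result


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.0.106,port=6381,state=online,offset=175380,lag=1
slave1:ip=192.168.0.106,port=6379,state=online,offset=175662,lag=1
"""
info = parse_info(sample)
print(info["role"], replicas(info))
```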
[root@kf202 data]# redis-cli -p 6379
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:2366655
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:2366655
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1318080
repl_backlog_histlen:1048576
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:2389792
master_link_down_since_seconds:9
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:2389792
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1341217
repl_backlog_histlen:1048576
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:2389792
master_link_down_since_seconds:11
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:2389792
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1341217
repl_backlog_histlen:1048576
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:2389792
master_link_down_since_seconds:24
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:2389792
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1341217
repl_backlog_histlen:1048576
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:2389792
master_link_down_since_seconds:26
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d1d421395d0c3121060b4b984f9896a54b1190e0
master_replid2:e50d457bcabdbb4970ff08817a4400c934dccc7d
master_repl_offset:2389792
second_repl_offset:32005
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1341217
repl_backlog_histlen:1048576
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.0.106
master_port:6381
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:2393116
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:6fd1130af8acb8600ffcfed4cba1457557530233
master_replid2:d1d421395d0c3121060b4b984f9896a54b1190e0
master_repl_offset:2393116
second_repl_offset:2389793
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1344541
repl_backlog_histlen:1048576
127.0.0.1:6379>
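After a while, the node on port 6379 switched to a new master: master_port changed from 6380 to 6381, master_link_status returned to up, and a new master_replid appeared. The fields worth watching are:
master_link_status: up or down, the state of this node's link to its master. up means connected normally; down means the master is unreachable.
master_last_io_seconds_ago: seconds since the last interaction with the master; it shows -1 while the link is down.
master_link_down_since_seconds: present only while master_link_status is down; the number of seconds the link has been down (9, 11, 24, 26 in the output above).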
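The field rules above can be turned into a small status check. This is a sketch with a hypothetical `link_summary` helper; the sample values are taken from the 6379 output above.

```python
def parse_info(text: str) -> dict[str, str]:
    """Minimal INFO parser: 'key:value' lines, '# Section' headers skipped."""
    return dict(
        line.split(":", 1)
        for line in text.splitlines()
        if ":" in line and not line.startswith("#")
    )


def link_summary(info: dict[str, str]) -> str:
    """Describe a replica's view of its master from INFO replication fields."""
    if info.get("role") != "slave":
        return "not a replica"
    master = f"{info['master_host']}:{info['master_port']}"
    if info.get("master_link_status") == "up":
        return f"link to {master} up, last I/O {info['master_last_io_seconds_ago']}s ago"
    # While the link is down, master_last_io_seconds_ago is -1 and
    # master_link_down_since_seconds counts how long the link has been down.
    return f"link to {master} down for {info.get('master_link_down_since_seconds', '?')}s"
```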
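The failover itself is decided by the Sentinels, not by the replica. The timeout after which a Sentinel considers the master down is set with sentinel down-after-milliseconds (for example sentinel down-after-milliseconds mymaster 30000); the default is 30 seconds. When a Sentinel gets no valid reply for that long, it marks the master subjectively down (SDOWN); once at least quorum Sentinels agree, the master is objectively down (ODOWN), and the Sentinels elect a leader that carries out the failover.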
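The SDOWN/ODOWN decision can be modelled in a few lines. This is a simplified sketch, not Sentinel's implementation: it ignores how Sentinels exchange opinions and epochs. It does encode that a leader needs votes from a majority of the Sentinels and at least `quorum` votes. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    name: str
    last_valid_reply_ms: int  # when this Sentinel last got a valid reply from the master


def is_sdown(view: SentinelView, now_ms: int, down_after_ms: int) -> bool:
    """Subjectively down: no valid reply for longer than down-after-milliseconds."""
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_odown(views: list[SentinelView], now_ms: int, down_after_ms: int, quorum: int) -> bool:
    """Objectively down: at least `quorum` Sentinels consider the master SDOWN."""
    agreeing = sum(is_sdown(v, now_ms, down_after_ms) for v in views)
    return agreeing >= quorum


def can_authorize_failover(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A leader needs a majority of all Sentinels and at least `quorum` votes."""
    return votes >= max(total_sentinels // 2 + 1, quorum)
```

With three Sentinels and quorum 2, two Sentinels that lost contact for more than 30 s are enough both to declare ODOWN and to elect a leader.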
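As the output above shows, once the master had been unreachable for roughly 30 seconds, 6381 was automatically elected as the new master. Now test the new master and the replica: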
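The article does not show why 6381 was chosen over 6379. Sentinel's documented ordering for candidate replicas is: lower replica-priority first (0 means never promote), then the larger processed replication offset, then the lexicographically smaller run ID. The sketch below applies only that ordering and omits the disconnection-time filter; the replica values in any test are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    port: int
    priority: int      # replica-priority (slave_priority in INFO); 0 = never promote
    repl_offset: int   # replication offset processed from the old master
    run_id: str


def pick_new_master(candidates: list[Replica]) -> Optional[Replica]:
    """Order by priority (lower wins), then offset (higher wins), then run ID (smaller wins)."""
    eligible = [r for r in candidates if r.priority != 0]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))
```

When both replicas have the default priority of 100 and the same offset, the run ID decides, which would explain a seemingly arbitrary choice such as 6381.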
127.0.0.1:6379> keys *
1) "user1"
127.0.0.1:6379> get user1
"zjtx2018"
127.0.0.1:6379> set user1 zjtx2020
(error) READONLY You can't write against a read only replica.
127.0.0.1:6379>
127.0.0.1:6381> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.0.106,port=6379,state=online,offset=0,lag=0
master_replid:6fd1130af8acb8600ffcfed4cba1457557530233
master_replid2:d1d421395d0c3121060b4b984f9896a54b1190e0
master_repl_offset:2390242
second_repl_offset:2389793
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1341667
repl_backlog_histlen:1048576
127.0.0.1:6381> keys *
1) "user1"
127.0.0.1:6381> get user1
"zjtx2018"
127.0.0.1:6381> set user1 zjtx2020
OK
127.0.0.1:6381>
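6379 rejects writes, while the new master 6381 handles reads and writes normally.
Now look at the Sentinel log.
It shows the Sentinel first detecting the master as down and then holding a vote (leader election).
Then look at the corresponding configuration files: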
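Sentinel reports state changes as events whose names start with + or -, such as +sdown, +odown, +try-failover, +vote-for-leader, +elected-leader and +switch-master. As a sketch, the failover sequence can be pulled out of sentinel.log as shown below. The sketch assumes the usual Redis log layout `<pid>:X <dd> <Mon> <yyyy> <hh:mm:ss.ms> <level> <message>`; any sample lines in a test are illustrative, not copied from the article's log.

```python
from typing import NamedTuple, Optional

FAILOVER_EVENTS = {"+sdown", "+odown", "+try-failover",
                   "+vote-for-leader", "+elected-leader", "+switch-master"}


class SentinelEvent(NamedTuple):
    time: str
    event: str
    detail: str


def parse_event(line: str) -> Optional[SentinelEvent]:
    """Parse one Sentinel log line; return None if it is not an event line."""
    parts = line.split(maxsplit=7)
    if len(parts) < 7 or not parts[0].endswith(":X"):  # X marks a Sentinel process
        return None
    event = parts[6]
    if event[0] not in "+-":
        return None  # ordinary log message, not a Sentinel event
    detail = parts[7] if len(parts) > 7 else ""
    return SentinelEvent(" ".join(parts[1:5]), event, detail)


def failover_timeline(lines: list[str]) -> list[SentinelEvent]:
    """Keep only the events that trace detection, election and promotion."""
    return [ev for ev in map(parse_event, lines) if ev and ev.event in FAILOVER_EVENTS]
```

For +switch-master the detail field reads `<master name> <old ip> <old port> <new ip> <new port>`.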
/opt/redis-cluster/redis-6381/sentinel.conf
/opt/redis-cluster/redis-6379/sentinel.conf
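The configuration files show that Sentinel has rewritten the current master entry: the sentinel monitor line now points to the new master 6381.
With that, a Docker-based environment with one master, two replicas and three Sentinels is up and running, with automatic failover.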
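To check the rewrite without opening each file, the `sentinel monitor` line can be read back. This is a sketch with hypothetical helper names; it parses only that one directive and ignores the other state lines Sentinel adds when it rewrites the file.

```python
from pathlib import Path
from typing import Optional


def monitored_master(conf_text: str, name: str = "mymaster") -> Optional[tuple[str, int, int]]:
    """Return (ip, port, quorum) from the 'sentinel monitor <name> ...' line, if present."""
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[:3] == ["sentinel", "monitor", name]:
            return parts[3], int(parts[4]), int(parts[5])
    return None


def master_from_file(path: str) -> Optional[tuple[str, int, int]]:
    """Read a sentinel.conf such as /opt/redis-cluster/redis-6381/sentinel.conf."""
    return monitored_master(Path(path).read_text())
```

Running `master_from_file` on each of the three files should report the same new master once the failover has propagated.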