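Redis high availability with Sentinel mode: when your Redis master goes down, Sentinel switches over to another Redis instance automatically, so normal use is not affected.
Sentinel has four capabilities: monitoring, notification, automatic failover, and acting as a configuration provider.
Here are links to the Redis API used here and to the Sentinel documentation.
Spring Data Redis API: https://docs.spring.io/spring-data/redis/docs/current/api/
Redis Sentinel: http://redisdoc.com/topic/sentinel.html
Let's go straight to the demo.
A robust deployment needs at least three Sentinel instances. Here I use a VMware CentOS64 virtual machine and three Redis instances.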
Edit the redis.conf configuration file (mainly the Redis …)
Edit sentinel.conf (…)
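Start the three Redis instances and the three Sentinels.
Verify: kill the master's redis-server and check whether the failover completes after the configured time.
Analyze the Sentinel logs and troubleshoot any problems.
Prerequisite: Redis has already been built with make into the target directory, with the executables under /bin and the configuration files under /etc. The basics are not repeated here.
The three Redis instances used are redis1, redis2 and redis3.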
[root@MyVMS-CentOS64-NO1 local]# cd /usr/local/
[root@MyVMS-CentOS64-NO1 local]# ll
total 60
drwxr-xr-x. 2 root root 4096 Sep 12 19:54 bin
drwxr-xr-x. 2 root root 4096 Sep 23 2011 etc
drwxr-xr-x. 2 root root 4096 Sep 23 2011 games
drwxr-xr-x. 2 root root 4096 Sep 23 2011 include
drwxr-xr-x. 2 root root 4096 Sep 23 2011 lib
drwxr-xr-x. 2 root root 4096 Sep 23 2011 lib64
drwxr-xr-x. 2 root root 4096 Sep 23 2011 libexec
drwxr-xr-x. 6 root root 4096 Sep 15 08:17 redis1
drwxr-xr-x. 6 root root 4096 Sep 15 07:40 redis2
drwxr-xr-x. 6 root root 4096 Sep 15 08:17 redis3
drwxr-xr-x. 2 root root 4096 Sep 23 2011 sbin
drwxr-xr-x. 5 root root 4096 Sep 13 01:22 share
drwxr-xr-x. 2 root root 4096 Sep 23 2011 src
drwxr-xr-x. 3 root root 4096 Sep 13 13:06 tomcat1
Below is the directory layout I created for redis1, which makes the instances easy to tell apart. redis-4.0.11 is the Redis source tree after running make.
[root@MyVMS-CentOS64-NO1 local]# cd /usr/local/redis1/
[root@MyVMS-CentOS64-NO1 redis1]# ll
total 16
drwxr-xr-x. 2 root root 4096 Sep 15 12:12 bin
drwxr-xr-x. 2 root root 4096 Sep 15 12:41 etc
drwxr-xr-x. 2 root root 4096 Sep 15 08:17 log
drwxrwxr-x. 6 root root 4096 Sep 12 21:17 redis-4.0.11
Contents of bin:
[root@MyVMS-CentOS64-NO1 redis1]# cd /usr/local/redis1/bin/
[root@MyVMS-CentOS64-NO1 bin]# ll
total 35484
-rw-r--r--. 1 root root 197 Sep 15 12:12 dump.rdb
-rwxr-xr-x. 1 root root 5597510 Sep 12 19:52 redis-benchmark
-rwxr-xr-x. 1 root root 8331160 Sep 12 19:52 redis-check-aof
-rwxr-xr-x. 1 root root 8331160 Sep 12 19:52 redis-check-rdb
-rwxr-xr-x. 1 root root 5737866 Sep 12 19:52 redis-cli
lrwxrwxrwx. 1 root root 12 Sep 12 19:52 redis-sentinel -> redis-server
-rwxr-xr-x. 1 root root 8331160 Sep 12 19:52 redis-server
Configuration files under etc:
[root@MyVMS-CentOS64-NO1 bin]# cd /usr/local/redis1/etc/
[root@MyVMS-CentOS64-NO1 etc]# ll
total 72
-rw-rw-r--. 1 root root 58877 Sep 15 11:37 redis.conf
-rw-r--r--. 1 root root 8484 Sep 15 11:37 sentinel.conf
Log files under log:
[root@MyVMS-CentOS64-NO1 etc]# cd /usr/local/redis1/log/
[root@MyVMS-CentOS64-NO1 log]# ll
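total 16
-rw-r--r--. 1 root root 14084 Sep 15 11:51 redis-sentinel_6380.log
The other two directories have the same layout.
Building highly available Redis: one master, two slaves
Redis-1   Redis-2   Redis-3
6380      6381      6382
Start the three independent Redis instances (the relative paths below assume the current directory is /):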
./usr/local/redis1/bin/redis-server /usr/local/redis1/etc/redis.conf
./usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf
./usr/local/redis3/bin/redis-server /usr/local/redis3/etc/redis.conf
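All three instances share one host in this setup, so each redis.conf has to carry its own port (6380, 6381 and 6382 above). A copy-paste slip here is easy to make, so a rough pre-flight check might look like the sketch below. The helper names conf_port and duplicate_ports are made up for this example, and it assumes the port is set with a plain "port <n>" line in each file.

```shell
# Print the port configured in a redis.conf read from stdin.
# Commented-out lines are ignored; if "port" appears twice, the last one wins.
conf_port() {
  awk '$1 == "port" { p = $2 } END { if (p != "") print p }'
}

# Read one port per line and print every port that occurs more than once.
duplicate_ports() {
  sort | uniq -d
}
```

Usage, with the paths from this article: for i in 1 2 3; do conf_port < /usr/local/redis$i/etc/redis.conf; done | duplicate_ports. If nothing is printed, no two instances share a port.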
Here we make redis1 (port 6380) the master. Run the commands below.
Make 6381 a slave of 6380:
./usr/local/redis2/bin/redis-cli -h 192.168.111.129 -p 6381 slaveof 192.168.111.129 6380
Make 6382 a slave of 6380:
./usr/local/redis3/bin/redis-cli -h 192.168.111.129 -p 6382 slaveof 192.168.111.129 6380
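A slaveof issued through redis-cli takes effect at runtime. If you would rather have a slave come up already attached to its master, the same relationship can be written into its redis.conf. The sketch below only prints the directive; replica_directive is a made-up helper name. It relies on the fact that Redis 5 and later name the directive replicaof, while 4.x (used in this article) calls it slaveof.

```shell
# Print the redis.conf directive that makes an instance replicate from a master.
# $1 = Redis major version, $2 = master IP, $3 = master port.
replica_directive() {
  if [ "$1" -ge 5 ]; then
    printf 'replicaof %s %s\n' "$2" "$3"
  else
    printf 'slaveof %s %s\n' "$2" "$3"
  fi
}
```

For the setup here, replica_directive 4 192.168.111.129 6380 prints the line that would go into the redis.conf of redis2 and redis3.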
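All three Redis instances are deployed on a single server here, so each one needs a different port.
Start the three Sentinels:
A note on the sentinel.conf files of the three Sentinels. For any setting you do not understand, see the Sentinel documentation linked above; the relevant part of the configuration is shown below. A Sentinel only needs to monitor the master: from the master it learns the replication topology and so discovers the slaves.
Part of sentinel.conf for redis1: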
---
# Use your master's address and port here in all three files; the trailing 2 is the quorum.
sentinel monitor mymaster 192.168.111.129 6380 2
sentinel down-after-milliseconds mymaster 6000
sentinel failover-timeout mymaster 60000
---
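Since the three sentinel.conf files share these lines, one way to keep them consistent is to generate the fragment instead of editing it by hand. This is only a sketch: sentinel_conf is a made-up helper, it emits just the three directives shown above, and anything else a Sentinel needs (its own port, log file and so on) still has to be set separately.

```shell
# Print a minimal sentinel.conf fragment.
# $1 = master name, $2 = master IP, $3 = master port, $4 = quorum,
# $5 = down-after-milliseconds, $6 = failover-timeout (both in milliseconds).
sentinel_conf() {
  printf 'sentinel monitor %s %s %s %s\n' "$1" "$2" "$3" "$4"
  printf 'sentinel down-after-milliseconds %s %s\n' "$1" "$5"
  printf 'sentinel failover-timeout %s %s\n' "$1" "$6"
}
```

Calling sentinel_conf mymaster 192.168.111.129 6380 2 6000 60000 reproduces the fragment above.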
cd /
./usr/local/redis1/bin/redis-server /usr/local/redis1/etc/sentinel.conf --sentinel
cd /
./usr/local/redis2/bin/redis-server /usr/local/redis2/etc/sentinel.conf --sentinel
cd /
./usr/local/redis3/bin/redis-server /usr/local/redis3/etc/sentinel.conf --sentinel
Both Redis and Sentinel are started in the background here (daemonize yes), so to see what is going on you need to look at the log files of the three Sentinels.
[root@MyVMS-CentOS64-NO1 log]# ps -ef | grep redis
root 38130 37887 0 10:45 pts/0 00:00:00 tail -f /usr/local/redis3/log/redis-sentinel_6382.log
root 38131 38088 0 10:45 pts/2 00:00:00 tail -f /usr/local/redis2/log/redis-sentinel_6381.log
root 38132 38111 0 10:45 pts/3 00:00:00 tail -f /usr/local/redis1/log/redis-sentinel_6380.log
root 38274 1 0 11:05 ? 00:00:19 ./usr/local/redis1/bin/redis-server 192.168.111.129:6380
root 38285 1 0 11:05 ? 00:00:21 ./usr/local/redis3/bin/redis-server 192.168.111.129:6382
root 38291 1 0 11:05 ? 00:00:31 ./usr/local/redis1/bin/redis-server *:26380 [sentinel]
root 38299 1 0 11:06 ? 00:00:31 ./usr/local/redis2/bin/redis-server *:26381 [sentinel]
root 38304 1 0 11:06 ? 00:00:31 ./usr/local/redis3/bin/redis-server *:26382 [sentinel]
root 38466 1 0 11:51 ? 00:00:14 ./usr/local/redis2/bin/redis-server 192.168.111.129:6381
root 38815 37710 0 13:24 pts/5 00:00:00 grep redis
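With everything daemonized, it is easy to miss a Sentinel that failed to start. A small filter over the ps output can do the counting; count_sentinels is a made-up name, and the pattern assumes the process title ends in [sentinel] as in the listing above.

```shell
# Count Sentinel processes in `ps -ef` output read from stdin.
count_sentinels() {
  grep -c 'redis-server .*\[sentinel\]'
}
```

Running ps -ef | count_sentinels on this machine should print 3.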
Check who the master is. Below are some commands for viewing Sentinel information (they must be run in a client connected to a Sentinel).
[root@MyVMS-CentOS64-NO1 /]# cd /usr/local/redis1/bin/
[root@MyVMS-CentOS64-NO1 bin]# ./redis-cli -h 192.168.111.129 -p 26380
192.168.111.129:26380> sentinel masters
1) 1) "name"
2) "mymaster"
3) "ip"
4) "192.168.111.129"
5) "port"
6) "6380" // this is the master
7) "runid"
8) "6b087ab930d63644997fdb6283224f23e3af62b0"
9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "675"
19) "last-ping-reply"
20) "675"
21) "down-after-milliseconds"
22) "6000"
23) "info-refresh"
24) "6839"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "363894"
29) "config-epoch"
30) "10"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "60000"
39) "parallel-syncs"
40) "1"
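The numbered listing above is what the interactive client shows. When redis-cli is run non-interactively with its output piped, it should print one element per line instead, that is, key and value on alternating lines. Assuming that format, a filter like the following makes the reply easier to grep; pairs_to_kv is a made-up helper name.

```shell
# Turn alternating key / value lines (as printed by a piped
# `redis-cli ... sentinel masters`) into key=value lines.
pairs_to_kv() {
  tr -d '\r' | awk 'NR % 2 == 1 { key = $0; next } { print key "=" $0 }'
}
```

For example: ./redis-cli -h 192.168.111.129 -p 26380 sentinel masters | pairs_to_kv | grep -E '^(ip|port|flags|quorum)='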
Command: info sentinel
192.168.111.129:26380> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.111.129:6380,slaves=2,sentinels=3
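The master0 line packs everything worth checking into one comma-separated record: status, address, number of slaves and number of Sentinels. A sketch for pulling a single field out of it follows; sentinel_master_field is a made-up name, and it assumes the name=value,... layout shown above.

```shell
# Read `info sentinel` output from stdin and print the value of one field
# ($1, e.g. status, address, slaves, sentinels) from each masterN: line.
sentinel_master_field() {
  tr -d '\r' | awk -v want="$1" '/^master[0-9]+:/ {
    # Drop the "masterN:" prefix, then split the rest on commas.
    n = split(substr($0, index($0, ":") + 1), kv, ",")
    for (i = 1; i <= n; i++) {
      eq = index(kv[i], "=")
      if (substr(kv[i], 1, eq - 1) == want) print substr(kv[i], eq + 1)
    }
  }'
}
```

In this setup, ./redis-cli -h 192.168.111.129 -p 26380 info sentinel | sentinel_master_field sentinels should print 3 once all three Sentinels have found each other.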
Below are the commands for viewing information from a Redis client:
[root@MyVMS-CentOS64-NO1 /]# ./usr/local/redis1/bin/redis-cli -h 192.168.111.129 -p 6380
192.168.111.129:6380> info replication
# Replication
role:master // the role is master
connected_slaves:2 // two slaves are connected
slave0:ip=192.168.111.129,port=6381,state=online,offset=84653,lag=1
slave1:ip=192.168.111.129,port=6382,state=online,offset=84506,lag=1
master_replid:64cd5666c92af9c9539bb390b3585bc0bb640916
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:84653
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:84653
Likewise, check the role of the redis2 instance.
[root@MyVMS-CentOS64-NO1 /]# ./usr/local/redis2/bin/redis-cli -h 192.168.111.129 -p 6381
192.168.111.129:6381> info replication
# Replication
role:slave // 6381, under redis2, is a slave
master_host:192.168.111.129
master_port:6380 // its master is 6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:123258
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:64cd5666c92af9c9539bb390b3585bc0bb640916
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:123258
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:123258
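To check roles from a script instead of by eye, the key:value lines of info replication are easy to filter. The sketch below uses a made-up helper, info_field. It strips carriage returns first because INFO output uses CRLF line endings. The // notes in the transcripts above are my own annotations and are not part of the real output.

```shell
# Read INFO output from stdin and print the value of one key ($1),
# e.g. role, master_port, connected_slaves.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print substr($0, length(key) + 2) }'
}
```

For example, ./usr/local/redis2/bin/redis-cli -h 192.168.111.129 -p 6381 info replication | info_field role should print slave at this point.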
OK, the master-slave relationship is now clear. Next, check what the Sentinels are monitoring by opening the three Sentinel log files.
38304:X 15 Sep 13:34:13.838 * +reboot master mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:13.922 # -sdown master mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:13.922 # -odown master mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:19.217 * +reboot slave 192.168.111.129:6381 192.168.111.129 6381 @ mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:19.275 # -sdown slave 192.168.111.129:6381 192.168.111.129 6381 @ mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:21.975 * +reboot slave 192.168.111.129:6382 192.168.111.129 6382 @ mymaster 192.168.111.129 6380
38304:X 15 Sep 13:34:22.032 # -sdown slave 192.168.111.129:6382 192.168.111.129 6382 @ mymaster 192.168.111.129 6380
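We can see that 6381 and 6382 both follow 6380. (For what sdown and odown on the master mean, see the explanation below; pay attention to them when analyzing the logs during a failover.)
sdown and odown (subjectively down and objectively down): Redis Sentinel has two different notions of being down. Subjectively down (SDOWN) is the judgment of a single Sentinel: the instance has not given it a valid reply within down-after-milliseconds. Objectively down (ODOWN) means that at least quorum Sentinels consider the master subjectively down; ODOWN applies only to masters.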
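The difference between the two states comes down to counting. As a deliberately simplified model (the function names are made up, and real Sentinels exchange these opinions over the network rather than receiving a ready-made count): a master seen as down by at least quorum Sentinels is objectively down, and actually carrying out the failover additionally needs the agreement of a majority of all Sentinels.

```shell
# Smallest number of Sentinels that forms a majority out of $1 Sentinels.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Classify a master from the number of Sentinels that see it as down ($1)
# and the configured quorum ($2). Prints up, sdown or odown.
down_state() {
  if [ "$1" -ge "$2" ]; then
    echo odown
  elif [ "$1" -ge 1 ]; then
    echo sdown
  else
    echo up
  fi
}
```

With three Sentinels and a quorum of 2 as configured here, down_state 1 2 prints sdown, down_state 2 2 prints odown, and majority 3 prints 2, so losing one Sentinel still leaves enough votes for a failover.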
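All the services are up and the logs are flowing. The sky is clear and the mood is good. Next, take down the master 6380 to test the failover.
Run:
./usr/local/redis1/bin/redis-cli -h 192.168.111.129 -p 6380 DEBUG sleep 30
This command makes the master sleep for 30 seconds, during which it cannot be reached. It simulates a master that has gone down for some reason. Since down-after-milliseconds is 6000 here, 30 seconds is long enough to trigger a failover.
(Or simply kill the master redis-server process.)
If you check the Sentinel logs, you should see a lot of activity: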
In the logs below, my annotations follow //, and each header line names the Sentinel that wrote the log (matched by PID against the ps output above).
------------redis3-----Sentinel 26382 (PID 38304)--
38304:X 15 Sep 14:01:15.476 # +sdown master mymaster 192.168.111.129 6380 // subjectively down
38304:X 15 Sep 14:01:19.100 # +new-epoch 12 // a new epoch begins
38304:X 15 Sep 14:01:19.105 # +vote-for-leader 72ffd0c42052057fc72c530d8dd1211d167c69ca 12 // votes for the Sentinel that will lead the failover
38304:X 15 Sep 14:01:19.749 # +odown master mymaster 192.168.111.129 6380 #quorum 3/2 // all three Sentinels agree (quorum is 2): objectively down
38304:X 15 Sep 14:01:19.750 # Next failover delay: I will not start a failover before Sat Sep 15 14:03:19 2018 // this Sentinel voted for another leader, so it will not start its own failover before this time
38304:X 15 Sep 14:01:20.205 # +config-update-from sentinel 72ffd0c42052057fc72c530d8dd1211d167c69ca 192.168.111.129 26381 @ mymaster 192.168.111.129 6380
38304:X 15 Sep 14:01:20.205 # +switch-master mymaster 192.168.111.129 6380 192.168.111.129 6382
38304:X 15 Sep 14:01:20.206 * +slave slave 192.168.111.129:6381 192.168.111.129 6381 @ mymaster 192.168.111.129 6382
38304:X 15 Sep 14:01:20.207 * +slave slave 192.168.111.129:6380 192.168.111.129 6380 @ mymaster 192.168.111.129 6382
38304:X 15 Sep 14:01:22.717 # +sdown slave 192.168.111.129:6380 192.168.111.129 6380 @ mymaster 192.168.111.129 6382
---------------------------------------redis1----------Sentinel 26380 (PID 38291)----
38291:X 15 Sep 14:01:19.040 # +sdown master mymaster 192.168.111.129 6380
38291:X 15 Sep 14:01:19.096 # +new-epoch 12
38291:X 15 Sep 14:01:19.103 # +vote-for-leader 72ffd0c42052057fc72c530d8dd1211d167c69ca 12
38291:X 15 Sep 14:01:19.106 # +odown master mymaster 192.168.111.129 6380 #quorum 3/2
38291:X 15 Sep 14:01:19.106 # Next failover delay: I will not start a failover before Sat Sep 15 14:03:19 2018
38291:X 15 Sep 14:01:20.203 # +config-update-from sentinel 72ffd0c42052057fc72c530d8dd1211d167c69ca 192.168.111.129 26381 @ mymaster 192.168.111.129 6380
38291:X 15 Sep 14:01:20.204 # +switch-master mymaster 192.168.111.129 6380 192.168.111.129 6382
38291:X 15 Sep 14:01:20.204 * +slave slave 192.168.111.129:6381 192.168.111.129 6381 @ mymaster 192.168.111.129 6382 // 6381 now follows 6382
38291:X 15 Sep 14:01:20.204 * +slave slave 192.168.111.129:6380 192.168.111.129 6380 @ mymaster 192.168.111.129 6382 // 6380 now follows 6382
38291:X 15 Sep 14:01:26.236 # +sdown slave 192.168.111.129:6380 192.168.111.129 6380 @ mymaster 192.168.111.129 6382 // 6380 is down, but it is still recorded as a slave of 6382
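The line that matters most in these logs is +switch-master, which carries the master name followed by the old and the new address. To pull just that event out of a long log, something like the following sketch could be used; switch_master_events is a made-up name, and it assumes the field order seen in the log lines above.

```shell
# Read a Sentinel log from stdin and print every master switch as
# "<name> <old ip>:<old port> -> <new ip>:<new port>".
switch_master_events() {
  awk '{
    for (i = 1; i <= NF - 5; i++) {
      if ($i == "+switch-master") {
        print $(i + 1), $(i + 2) ":" $(i + 3), "->", $(i + 4) ":" $(i + 5)
      }
    }
  }'
}
```

For example: switch_master_events < /usr/local/redis1/log/redis-sentinel_6380.log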
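With the 6380 master down, slave 6382 was promoted to master right away. (The rules for choosing the new master can be adjusted through configuration, for example slave-priority.)
Now let's check whether 6382 really is the master.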
[root@MyVMS-CentOS64-NO1 /]# ./usr/local/redis3/bin/redis-cli -h 192.168.111.129 -p 6382
192.168.111.129:6382> info replication
# Replication
role:master // 6382 is now the master; it has been promoted
connected_slaves:1 // 6380 is down, so only one slave remains
slave0:ip=192.168.111.129,port=6381,state=online,offset=669638,lag=1
master_replid:3ec08fb977346bdf70bb33dc53b75b507b19fe89
master_replid2:64cd5666c92af9c9539bb390b3585bc0bb640916
master_repl_offset:669785
second_repl_offset:351242
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:669785
Check again using a Sentinel client.
[root@MyVMS-CentOS64-NO1 /]# ./usr/local/redis3/bin/redis-cli -h 192.168.111.129 -p 26382
192.168.111.129:26382> SENTINEL get-master-addr-by-name mymaster // get the IP address and port of the current master
1) "192.168.111.129"
2) "6382"
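This query is the "configuration provider" role in practice: a client or script asks a Sentinel where the master currently is instead of hard-coding 6380. A rough shell version follows. Both function names are made up, current_master assumes redis-cli is on the PATH, and join_addr assumes the reply arrives as two plain lines (IP, then port) when the output is piped.

```shell
# Join the two lines printed by SENTINEL get-master-addr-by-name
# (IP, then port) into a single host:port string.
join_addr() {
  tr -d '\r' | awk 'NR == 1 { host = $0 } NR == 2 { print host ":" $0 }'
}

# Ask a Sentinel for the current master.
# $1 = Sentinel host, $2 = Sentinel port, $3 = master name.
current_master() {
  redis-cli -h "$1" -p "$2" sentinel get-master-addr-by-name "$3" | join_addr
}
```

After the failover above, current_master 192.168.111.129 26382 mymaster would be expected to print 192.168.111.129:6382.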
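The failover is complete.
Important! Only after the failover has completed should you restart the 6380 Redis that went down. Once restarted, 6380 is no longer the master it used to be; even after coming back to life it is just a slave.
(There is a pitfall during failover. In my experience, if you notice that the 6380 master is down and start it by hand while Sentinel is still preparing the failover, Sentinel gets confused and you end up with three Redis instances and no master. At that point everything has to be restarted.) My advice is to start the failed master only after the failover has finished.
Think about it: all the slaves are looking up, waiting for the new king to be crowned, and before the coronation is over the old king 6380 suddenly returns... That is bound to be quite a fight, and in the meantime nobody is in charge.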