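Introduction to Sentinel Clusters
Redis Sentinel is used to manage multiple Redis servers. It performs three main tasks:
Monitoring: Sentinel constantly checks whether your Master and Slave instances are working as expected.
Notification: When a monitored Redis instance runs into a problem, Sentinel can notify the administrator or other applications through an API.
Automatic failover: When a Master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed Master's Slaves to be the new Master and reconfigures the remaining Slaves to replicate from it. Clients that ask for the Master's address are then given the address of the new Master, so the cluster continues with the new Master in place of the failed one.
Sentinel is a distributed system: you can run several Sentinel processes in one deployment. These processes use gossip protocols to receive information about whether a Master is offline, and agreement protocols to decide whether to perform an automatic failover and which Slave to promote to Master.
Each Sentinel periodically sends messages to the other Sentinels, the Master, and the Slaves to confirm that they are alive. If a peer does not respond within a configurable period, that Sentinel provisionally treats it as down (Subjectively Down, or sdown for short).
Only when enough Sentinels in the group (at least the configured quorum) report that a Master is not responding does the system consider that Master truly down (Objectively Down, or odown for short). A voting algorithm then picks one of the remaining Slaves, promotes it to Master, and automatically updates the related configuration.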
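The two-stage judgment described above can be sketched in a few lines of shell. This is a toy model for illustration only, not how Sentinel is implemented; the function names is_sdown and is_odown and their arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of Sentinel's two-stage failure detection (illustration only).

# is_sdown MS_SINCE_LAST_REPLY DOWN_AFTER_MS
# One Sentinel's local view: the instance is subjectively down once it has
# been silent for longer than down-after-milliseconds.
is_sdown() {
    [ "$1" -gt "$2" ]
}

# is_odown QUORUM VOTE...
# Each VOTE is 1 (that Sentinel reports sdown) or 0 (it does not).
# The master is objectively down once at least QUORUM Sentinels report sdown.
is_odown() {
    odown_quorum=$1
    shift
    odown_count=0
    for odown_vote in "$@"; do
        if [ "$odown_vote" -eq 1 ]; then
            odown_count=$((odown_count + 1))
        fi
    done
    [ "$odown_count" -ge "$odown_quorum" ]
}

# Example: 31000 ms of silence against a 30000 ms threshold,
# then two of three Sentinels reporting sdown with a quorum of 2.
if is_sdown 31000 30000; then echo "sdown"; fi
if is_odown 2 1 1 0; then echo "odown"; fi
```

With a quorum of 2, a single Sentinel reporting sdown (for example is_odown 2 1 0 0) is not enough to declare the master objectively down.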
Although Sentinel ships as a separate executable, redis-sentinel, it is really just a Redis server running in a special mode. You can also start Sentinel by passing the --sentinel option when launching an ordinary Redis server.
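Server Planning
Installing Redis
Download the official release package: https://redis.io/download
1. Install the build dependencies
Redis is written in C, so gcc must be installed on the system.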
[root@rocketmq-nameserver1 ~]# yum install -y gcc automake autoconf libtool make
2. Extract the downloaded Redis package, change into its directory, and compile. Do this on each of the three servers.
[root@rocketmq-nameserver1 redis-5.0.4]# make install
This takes roughly two minutes.
Note: if an error occurs during compilation, building with make MALLOC=libc install instead may resolve it.
3. Copy the Redis binaries to /usr/bin
[root@rocketmq-nameserver1 redis-5.0.4]# cd src/
[root@rocketmq-nameserver1 src]# cp redis-cli redis-sentinel redis-server /usr/bin
Redis Configuration
Show the effective settings in redis.conf, with comments and blank lines filtered out:
[root@rocketmq-nameserver1 redis-5.0.4]# grep -Ev "^#|^$" redis.conf
bind 192.168.2.177
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /wdata/redis/data/redis.pid
loglevel notice
logfile "/wdata/redis/logs/redis.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /wdata/redis
replicaof 192.168.2.177 6379
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
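Only a few of the options above usually need to be changed; adjust the others to suit your environment. Two of them are worth explaining:
daemonize yes: run Redis in the background when it starts.
replicaof 192.168.2.177 6379: the address and port of the cluster's Master. Note that this line must be commented out on the Master itself, 192.168.2.177.
Copy the modified configuration file to the other two servers, then change bind on each server to that server's own IP address.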
for i in 178 180; do scp redis.conf root@192.168.2.$i:/wdata/redis/config; done
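The per-server differences (the bind address, and whether replicaof is active) can also be applied by a small filter instead of by hand. The sketch below is one possible approach, not part of the original setup: adapt_redis_conf is a hypothetical helper, and it assumes the template contains an active bind line and an active replicaof line, as in the listing above.

```shell
#!/bin/sh
# adapt_redis_conf NODE_IP MASTER_IP [MASTER_PORT]
# Reads a redis.conf template on stdin and writes the per-node version to stdout:
#   - bind is set to the node's own address
#   - replicaof is commented out on the master and kept on the replicas
adapt_redis_conf() {
    arc_node_ip=$1
    arc_master_ip=$2
    arc_master_port=${3:-6379}
    if [ "$arc_node_ip" = "$arc_master_ip" ]; then
        arc_replica_line="# replicaof $arc_master_ip $arc_master_port"
    else
        arc_replica_line="replicaof $arc_master_ip $arc_master_port"
    fi
    sed -e "s/^bind .*/bind $arc_node_ip/" \
        -e "s/^replicaof .*/$arc_replica_line/"
}

# Example: render the three relevant lines for the replica 192.168.2.178.
printf 'bind 192.168.2.177\nport 6379\nreplicaof 192.168.2.177 6379\n' |
    adapt_redis_conf 192.168.2.178 192.168.2.177
```

In practice the output would be redirected into /wdata/redis/config/redis.conf on the target server; the example only prints it.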
Sentinel Configuration
Show the effective settings in sentinel.conf, with comments and blank lines filtered out:
[root@rocketmq-nameserver1 redis-5.0.4]# grep -Ev "^#|^$" sentinel.conf
bind 192.168.2.177
port 26379
daemonize yes
pidfile /wdata/redis/data/redis-sentinel.pid
logfile "/wdata/redis/logs/sentinel.log"
dir /wdata/redis
sentinel monitor mymaster 192.168.2.177 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
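In general, only a few of the options above need to be changed; adjust the others as required. One of them is worth explaining:
sentinel monitor mymaster 192.168.2.177 6379 2: monitor the master at 192.168.2.177, port 6379, under the name mymaster. The trailing 2 is the quorum: once 2 Sentinels agree that the master is unreachable, it is marked objectively down and a failover is started.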
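The quorum only governs failure detection. To actually carry out the failover, the Sentinel leading it must also be authorized by a majority of all Sentinels, so the number of agreeing Sentinels needed is the larger of the quorum and the majority. The sketch below illustrates that arithmetic; failover_possible is a hypothetical helper name, not a Redis command.

```shell
#!/bin/sh
# failover_possible TOTAL_SENTINELS QUORUM AGREEING_SENTINELS
# Succeeds when enough Sentinels agree both to mark the master objectively
# down (QUORUM) and to authorize the failover (a majority of all Sentinels).
failover_possible() {
    fp_total=$1
    fp_quorum=$2
    fp_agreeing=$3
    fp_majority=$((fp_total / 2 + 1))
    fp_needed=$fp_quorum
    if [ "$fp_majority" -gt "$fp_needed" ]; then
        fp_needed=$fp_majority
    fi
    [ "$fp_agreeing" -ge "$fp_needed" ]
}

# Example: the three-Sentinel setup from this article, quorum 2.
for fp_n in 1 2 3; do
    if failover_possible 3 2 "$fp_n"; then
        echo "$fp_n of 3 Sentinels agree: failover can proceed"
    else
        echo "$fp_n of 3 Sentinels agree: no failover"
    fi
done
```

With three Sentinels and a quorum of 2 the two thresholds coincide. With five Sentinels and the same quorum, two agreeing Sentinels would mark the master as odown, but three would be needed before a failover could go ahead.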
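Distribute the modified sentinel configuration file to the other two servers, then change bind on each server to that server's own IP address.
for i in 178 180; do scp sentinel.conf root@192.168.2.$i:/wdata/redis/config; done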
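Starting Redis
Run the following on each of the three servers. The service commands rely on the init scripts listed at the end of this article, presumably installed as /etc/init.d/redis and /etc/init.d/sentinel.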
[root@rocketmq-nameserver1 ~]# service redis start
Check whether Redis started successfully:
[root@rocketmq-nameserver1 ~]# ps -ef | grep redis
or
[root@rocketmq-nameserver1 ~]# service redis status
Starting Sentinel
Run the following on each of the three servers:
[root@rocketmq-nameserver1 ~]# service sentinel start
Check whether Sentinel started successfully:
[root@rocketmq-nameserver1 ~]# ps -ef | grep sentinel
or
[root@rocketmq-nameserver1 ~]# service sentinel status
Checking the Sentinel Cluster Status
The Redis and Sentinel services are now both running. Next, verify that the cluster is healthy.
[root@rocketmq-nameserver1 ~]# redis-cli -h 192.168.2.177 -p 26379
192.168.2.177:26379> SENTINEL sentinels mymaster
1)  1) "name"
    2) "648ced6a4a5126ffe053c7190a7787ce8507122d"
    3) "ip"
    4) "192.168.2.178"
    5) "port"
    6) "26379"
    7) "runid"
    8) "648ced6a4a5126ffe053c7190a7787ce8507122d"
    9) "flags"
   10) "sentinel"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "203"
   19) "last-ping-reply"
   20) "203"
   21) "down-after-milliseconds"
   22) "30000"
   23) "last-hello-message"
   24) "425"
   25) "voted-leader"
   26) "?"
   27) "voted-leader-epoch"
   28) "0"
2)  1) "name"
    2) "25133c581dc5a5dcc41ed40d720bf417b70d6449"
    3) "ip"
    4) "192.168.2.180"
    5) "port"
    6) "26379"
    7) "runid"
    8) "25133c581dc5a5dcc41ed40d720bf417b70d6449"
    9) "flags"
   10) "sentinel"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "203"
   19) "last-ping-reply"
   20) "203"
   21) "down-after-milliseconds"
   22) "30000"
   23) "last-hello-message"
   24) "870"
   25) "voted-leader"
   26) "?"
   27) "voted-leader-epoch"
   28) "0"
The output shows that the other two Sentinels are running normally.
192.168.2.177:26379> SENTINEL masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "192.168.2.177"
    5) "port"
    6) "6379"
    7) "runid"
    8) "f2c92c055ec129186fec93d0b394a99ad120fd1d"
    9) "flags"
   10) "master"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "800"
   19) "last-ping-reply"
   20) "800"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "10034"
   25) "role-reported"
   26) "master"
   27) "role-reported-time"
   28) "120535"
   29) "config-epoch"
   30) "0"
   31) "num-slaves"
   32) "2"
   33) "num-other-sentinels"
   34) "2"
   35) "quorum"
   36) "2"
   37) "failover-timeout"
   38) "180000"
   39) "parallel-syncs"
   40) "1"
The output above shows that the master Redis instance is healthy.
192.168.2.177:26379> SENTINEL slaves mymaster
1)  1) "name"
    2) "192.168.2.180:6379"
    3) "ip"
    4) "192.168.2.180"
    5) "port"
    6) "6379"
    7) "runid"
    8) "2675617f208ace5c13161e133819be75f7079946"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "448"
   19) "last-ping-reply"
   20) "448"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "4606"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "235584"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "192.168.2.177"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "47120"
2)  1) "name"
    2) "192.168.2.178:6379"
    3) "ip"
    4) "192.168.2.178"
    5) "port"
    6) "6379"
    7) "runid"
    8) "803d8178c44a4380243523248dca0838c7964299"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "448"
   19) "last-ping-reply"
   20) "448"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "4606"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "235627"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "192.168.2.177"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "47120"
The output above shows that both Redis slaves are healthy.
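The replies above are flat lists of alternating field names and values, which makes them easy to check from a script. The sketch below assumes that redis-cli, when its output is piped rather than shown on a terminal, prints one element per line without the numbering; sentinel_field is a hypothetical helper name, and the sample input is taken from the SENTINEL masters output above.

```shell
#!/bin/sh
# sentinel_field FIELD
# Reads alternating name/value lines on stdin and prints the value that
# follows the first occurrence of FIELD in a name position (odd lines).
sentinel_field() {
    awk -v key="$1" 'NR % 2 == 1 && $0 == key { getline; print; exit }'
}

# Example with canned input. Against a live Sentinel the input could come from:
#   redis-cli -h 192.168.2.177 -p 26379 SENTINEL master mymaster
printf 'name\nmymaster\nflags\nmaster\nnum-slaves\n2\nquorum\n2\n' |
    sentinel_field flags
```

A monitoring script could compare the flags value against master and raise an alert when it contains anything else, such as s_down or o_down.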
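Common Sentinel Commands
SENTINEL masters  # List all monitored masters and their current state
SENTINEL master <master name>  # Show the state of the specified master
SENTINEL slaves <master name>  # List all slaves of the given master and their state
SENTINEL sentinels <master name>  # List all Sentinels monitoring the given master
SENTINEL get-master-addr-by-name <master name>  # Return the IP address and port of the master with the given name
SENTINEL reset <pattern>  # Reset the state of all masters whose names match the pattern
SENTINEL failover <master name>  # Force a failover as if the master were unreachable, without asking the other Sentinels for agreement; the new configuration is still sent to the other Sentinels, which update themselves accordingly
SENTINEL ckquorum <master name>  # Check whether the current Sentinel configuration can reach the quorum needed to fail over the master; useful for verifying that a Sentinel deployment is healthy, returns OK if it is
SENTINEL flushconfig  # Force Sentinel to write its runtime configuration to disk, including the current Sentinel state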
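Clients typically use SENTINEL get-master-addr-by-name to find out where to connect after a failover. Its reply has two elements, the IP address followed by the port. As a small sketch, again assuming redis-cli prints one element per line when piped, the reply can be turned into a host:port string; format_master_addr is a hypothetical helper name.

```shell
#!/bin/sh
# format_master_addr
# Reads the two-line reply (IP, then port) on stdin and prints IP:PORT.
format_master_addr() {
    awk 'NR == 1 { ip = $0 } NR == 2 { print ip ":" $0 }'
}

# Example with canned input. Against a live Sentinel the input could come from:
#   redis-cli -h 192.168.2.177 -p 26379 SENTINEL get-master-addr-by-name mymaster
printf '192.168.2.177\n6379\n' | format_master_addr
```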
Redis Init Script
#!/bin/sh
#chkconfig: 2345 55 25
#
# Simple Redis init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.

### BEGIN INIT INFO
# Provides:          redis_6379
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Redis data structure server
# Description:       Redis data structure server. See https://redis.io
### END INIT INFO

. /etc/init.d/functions

REDISPORT=6379
EXEC=/usr/bin/redis-server
CLIEXEC=/usr/bin/redis-cli

# Must match the pidfile setting in redis.conf
PIDFILE=/wdata/redis/data/redis.pid
CONF="/wdata/redis/config/redis.conf"
AUTH=""
# Must match the bind address in this server's redis.conf
BIND_IP='192.168.2.177'

start(){
    if [ -f "$PIDFILE" ]
    then
        echo "$PIDFILE exists, process is already running or crashed"
    else
        echo "Starting Redis server..."
        # Report success or failure from the exit status of the launch itself
        if $EXEC $CONF
        then
            echo "Redis is running..."
        else
            echo "Redis not running !"
        fi
    fi
}

stop(){
    if [ ! -f "$PIDFILE" ]
    then
        echo "$PIDFILE does not exist, process is not running"
    else
        PID=$(cat "$PIDFILE")
        echo "Stopping ..."
        #$CLIEXEC -h $BIND_IP -a $AUTH -p $REDISPORT shutdown
        $CLIEXEC -h $BIND_IP -p $REDISPORT shutdown
        # Wait until the process entry disappears from /proc
        while [ -x /proc/${PID} ]
        do
            echo "Waiting for Redis to shutdown ..."
            sleep 1
        done
        echo "Redis stopped."
    fi
}

status(){
    ps -ef | grep redis-server | grep -v grep >/dev/null 2>&1
    if [ $? -eq 0 ];then
        echo "redis server is running."
    else
        echo "redis server is stopped."
    fi
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Please use start, stop, restart or status as first argument"
        ;;
esac
Sentinel Init Script
#!/bin/sh
#chkconfig: 2345 55 25
#
# Simple Sentinel init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.

### BEGIN INIT INFO
# Provides:          redis_sentinel_26379
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Redis Sentinel
# Description:       Redis Sentinel monitoring service. See https://redis.io
### END INIT INFO

. /etc/init.d/functions

REDISPORT=26379
EXEC=/usr/bin/redis-sentinel
CLIEXEC=/usr/bin/redis-cli

# Must match the pidfile setting in sentinel.conf
PIDFILE=/wdata/redis/data/redis-sentinel.pid
CONF="/wdata/redis/config/sentinel.conf"
AUTH=""
# Must match the bind address in this server's sentinel.conf
BIND_IP='192.168.2.177'

start(){
    if [ -f "$PIDFILE" ]
    then
        echo "$PIDFILE exists, process is already running or crashed"
    else
        echo "Starting Sentinel server..."
        # Report success or failure from the exit status of the launch itself
        if $EXEC $CONF
        then
            echo "Sentinel is running..."
        else
            echo "Sentinel not running !"
        fi
    fi
}

stop(){
    if [ ! -f "$PIDFILE" ]
    then
        echo "$PIDFILE does not exist, process is not running"
    else
        PID=$(cat "$PIDFILE")
        echo "Stopping ..."
        #$CLIEXEC -h $BIND_IP -a $AUTH -p $REDISPORT shutdown
        $CLIEXEC -h $BIND_IP -p $REDISPORT shutdown
        # Wait until the process entry disappears from /proc
        while [ -x /proc/${PID} ]
        do
            echo "Waiting for Sentinel to shutdown ..."
            sleep 1
        done
        echo "Sentinel stopped."
    fi
}

status(){
    ps -ef | grep redis-sentinel | grep -v grep >/dev/null 2>&1
    if [ $? -eq 0 ];then
        echo "Sentinel server is running."
    else
        echo "Sentinel server is stopped."
    fi
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Please use start, stop, restart or status as first argument"
        ;;
esac