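I. Introduction
During project acceptance, our single-node Redis was flagged as a single point of failure. After consulting several blog posts (listed at the bottom of this article), we settled on keepalived to make Redis highly available. This post records the setup. The general idea:
- The front end of this project uses the PHP package Predis, and Redis mainly stores the cache and sessions. Predis did not support the del operation against a Redis cluster (it raised an error when we tried it), so we built a two-machine hot-standby Redis instead, with keepalived handling master/backup failover and VIP migration. The environment: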
Initial backup1: 192.168.203.129
Initial backup2: 192.168.203.130
VIP: 192.168.203.240
Readers unfamiliar with keepalived and VRRP can refer to these posts:
http://outofmemory.cn/wiki/keepalived-configuration
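http://hugnew.com/?p=745
After installation, keepalived is configured with state BACKUP and priority 100 on both machines. Start redis and keepalived on each machine in turn. Both machines begin in the BACKUP state. The machine whose keepalived starts first is alone in the virtual router group, so it elects itself master. The machine that starts later has the same priority as the first one, so it stays a backup node.
The keepalived priority ranges over 0-255 because the VRRP advertisement carries it in an 8-bit field; the VRRP specification reserves 0 and 255 for special cases, so ordinary nodes use 1-254. A value above 255 is reset to 100.
Before starting keepalived, start redis on both machines and set up replication between them. For the redis installation, see the previous post https://www.jianshu.com/p/acd3281d9074. To configure replication: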
# On backup2, make redis a slave of the redis on backup1
./bin/redis-cli SLAVEOF 192.168.203.129 6379
- backup1 must start keepalived first so that it becomes the master node and claims the VIP (check the interface addresses with ip a)
II. Installing keepalived
yum -y install keepalived
- Configure keepalived; both the master node and the slave node use state BACKUP
vim /etc/keepalived/keepalived.conf
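! Configuration File for keepalived
global_defs {
    router_id redis
}
# keepalived health-check script
vrrp_script chk_redis {
    # Script that keepalived runs as the health check
    script /usr/local/redis/keepalived/scripts/redis-check.sh
    # Seconds between health checks
    interval 2
    # Priority adjustment when the check fails: priority += weight
    # A non-zero exit status counts as a failure, so the priority drops by 10
    weight -10
    # Number of consecutive failures before redis is considered down
    fall 3
}
# VRRP instance
vrrp_instance redis {
    # Both master and slave start as BACKUP
    state BACKUP
    # Network interface to run VRRP on (eth0 here)
    interface eth0
    # Must be identical on both nodes
    virtual_router_id 51
    # Priority; set to 100 on both backup1 and backup2. Important: it decides how the VIP moves
    priority 100
    # Interval in seconds between VRRP advertisements, which carry the priority for comparison
    advert_int 1
    virtual_ipaddress {
        # The virtual IP
        192.168.203.240
    }
    # Health-check script to track
    track_script {
        chk_redis
    }
    # Source address for keepalived's own traffic: this machine's IP
    unicast_src_ip 192.168.203.129
    unicast_peer {
        # Address of the other keepalived node; without it, both nodes may bring up 192.168.203.240
        192.168.203.130
    }
    # Script run when keepalived is elected master
    notify_master /usr/local/redis/keepalived/scripts/redis-master.sh
    # Script run when keepalived is demoted to backup
    notify_backup /usr/local/redis/keepalived/scripts/redis-backup.sh
    # Script run when keepalived hits an error
    notify_fault /usr/local/redis/keepalived/scripts/redis-fault.sh
    # Script run when the keepalived service stops
    notify_stop /usr/local/redis/keepalived/scripts/redis-stop.sh
}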
- The keepalived configuration on backup2 differs from backup1 only as follows:
vrrp_instance redis {
    unicast_src_ip 192.168.203.130
    unicast_peer {
        192.168.203.129
    }
}
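The effect of the weight setting can be read as simple arithmetic. The following is a minimal sketch, not keepalived code: effective_priority is a hypothetical helper, and it assumes keepalived's documented rule that a negative weight is applied while the tracked script is failing and a positive weight while it is passing.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT SCRIPT_OK   (hypothetical helper, for illustration only)
# SCRIPT_OK is 1 while the tracked script passes and 0 once it is considered failed.
effective_priority() {
    local base=$1 weight=$2 ok=$3
    local prio=$base
    if [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        prio=$((base + weight))   # negative weight: applied on failure
    elif [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        prio=$((base + weight))   # positive weight: applied on success
    fi
    echo "$prio"
}

effective_priority 100 -10 1   # redis healthy: prints 100
effective_priority 100 -10 0   # redis check failed: prints 90
```

With the values from this configuration, a node whose redis check has failed advertises 90 while its healthy peer stays at 100.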
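III. Creating the monitoring scripts
Apart from redis-backup.sh, all scripts are identical on backup1 and backup2
- Create the directory for the scripts and the directory for the logs
mkdir -p /usr/local/redis/keepalived/scripts
mkdir -p /usr/local/redis/keepalived/logs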
- redis-check.sh: checks the health of the redis service
vim /usr/local/redis/keepalived/scripts/redis-check.sh
#!/bin/bash
# Log file location
logFile=/usr/local/redis/keepalived/logs/check.log
# PING the local redis service
pingRS=$(/usr/local/redis/bin/redis-cli PING)
# Exit 0 if the reply is PONG; otherwise log the failure and exit 1
if [ "$pingRS" = "PONG" ]; then
    exit 0
else
    echo "[$(date)] ping is error !" >> "$logFile"
    exit 1
fi
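If redis hangs instead of refusing the connection, the PING in the check script may block for a while. One possible hardening, shown here only as a sketch, is to wrap the call in the coreutils timeout command. ping_ok is a hypothetical helper name and the 1-second limit is an assumption, not a value from the original setup.

```shell
#!/bin/bash
# ping_ok CMD...   (hypothetical helper)
# Runs CMD with a 1-second limit and succeeds only if it prints exactly PONG.
ping_ok() {
    local reply
    reply=$(timeout 1 "$@" 2>/dev/null)
    [ "$reply" = "PONG" ]
}

# In redis-check.sh the call would look like this (path taken from the script above):
#   ping_ok /usr/local/redis/bin/redis-cli PING || exit 1
```

Passing the command as arguments keeps the helper easy to try out with a stand-in command before pointing it at redis-cli.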
- redis-master.sh: runs when keepalived is elected master
vim /usr/local/redis/keepalived/scripts/redis-master.sh
#!/bin/bash
# Absolute path to redis-cli
cliCmd=/usr/local/redis/bin/redis-cli
# Log file location
logFile=/usr/local/redis/keepalived/logs/master.log
echo "[$(date)] master" >> "$logFile"
# As the new master node, redis must stop replicating and become a master as well
$cliCmd SLAVEOF NO ONE &>> "$logFile"
- redis-backup.sh: runs when keepalived is demoted to backup
The line $cliCmd SLAVEOF 192.168.203.130 6379 differs between backup1 and backup2 and must be adjusted on each machine
vim /usr/local/redis/keepalived/scripts/redis-backup.sh
#!/bin/bash
# Log file
logFile=/usr/local/redis/keepalived/logs/backup.log
# Absolute path to redis-cli
cliCmd=/usr/local/redis/bin/redis-cli
echo "[$(date)] begin to slave ..." >> "$logFile"
# A backup node needs redis running; start it in case it is down
service redis-server start
# Sleep 0.5 s. Important: neither longer nor shorter.
# Too short: the SLAVEOF below does not succeed (cause still unknown)
# Too long, e.g. 5 s: the node may switch from backup to master during the wait, and the SLAVEOF that runs afterwards leaves redis in the wrong role
sleep 0.5
# Set up replication
# Once keepalived is a backup, redis must become a slave of its peer
# On backup2 this line is: $cliCmd SLAVEOF 192.168.203.129 6379
$cliCmd SLAVEOF 192.168.203.130 6379
echo "[$(date)] slave done !" >> "$logFile"
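The fixed sleep 0.5 is sensitive to timing. One alternative worth trying, offered only as a sketch, is to poll until redis answers and then run SLAVEOF at once. This assumes the failure with a shorter sleep comes from redis not yet accepting connections, which the original does not confirm. wait_until is a hypothetical helper; the retry count and delay are illustrative.

```shell
#!/bin/bash
# wait_until TRIES DELAY CMD...   (hypothetical helper)
# Runs CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# after each failed attempt. Returns 0 on success, 1 after giving up.
wait_until() {
    local tries=$1 delay=$2 i=0
    shift 2
    while [ "$i" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Possible use in redis-backup.sh, replacing the fixed sleep (at most about 1 s in total):
#   wait_until 10 0.1 $cliCmd PING && $cliCmd SLAVEOF 192.168.203.130 6379
```

The total wait is kept short on purpose, since the script comments warn that a long delay lets the node change state before SLAVEOF runs.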
- redis-fault.sh: runs when keepalived hits an error
vim /usr/local/redis/keepalived/scripts/redis-fault.sh
#!/bin/bash
# Desc: script run when keepalived hits an error
# Log file location
logFile=/usr/local/redis/keepalived/logs/fault.log
# Write the error message to the log
echo "[$(date)] ***** redis fault ***" >> "$logFile"
- redis-stop.sh: runs when the keepalived service stops
vim /usr/local/redis/keepalived/scripts/redis-stop.sh
#!/bin/bash
# Desc: script run when keepalived stops
# Log file
logFile=/usr/local/redis/keepalived/logs/stop.log
# Write a log entry
echo "[$(date)] stop ..." >> "$logFile"
- Important: make the scripts executable
cd /usr/local/redis/keepalived/scripts
chmod u+x *
- Add the services to /etc/rc.local so that they start at boot
vim /etc/rc.local
service redis-server start
service keepalived start
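- Start redis on backup1 (master node) and backup2 (slave node), and set up redis replication with SLAVEOF
- Start keepalived on the master node so that it claims the VIP
- Start keepalived on the slave node; it becomes the backup
- Run ip a on each machine to check whether the VIP is bound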
ip a
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f5:b3:41 brd ff:ff:ff:ff:ff:ff
inet 192.168.203.130/24 brd 192.168.203.255 scope global eth0
# The VIP is bound successfully
inet 192.168.203.240/32 scope global eth0
- From another machine, connect to redis through the virtual IP and check its current state with info
./bin/redis-cli -h 192.168.203.240
192.168.203.240:6379> info
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.203.130,port=6379,state=online,offset=12071,lag=1
master_repl_offset:12071
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11904
repl_backlog_histlen:168
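The two manual checks in this section (is the VIP bound, and which role does redis report) can also be scripted. The sketch below reads command output from standard input so it can be tried on saved output first; has_vip and redis_role are hypothetical helper names.

```shell
#!/bin/bash
# has_vip VIP   (hypothetical helper)
# Reads `ip a` output on stdin and succeeds if VIP appears as a bound inet address.
has_vip() {
    grep -qF "inet $1/"
}

# redis_role   (hypothetical helper)
# Reads redis INFO output on stdin and prints the value of the role field.
redis_role() {
    sed -n 's/^role:\([a-z]*\).*/\1/p'
}

# Possible use on a node:
#   ip a | has_vip 192.168.203.240 && echo "this node holds the VIP"
#   /usr/local/redis/bin/redis-cli INFO replication | redis_role
```

On a healthy pair, the node that holds the VIP should also be the one whose redis reports master.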
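IV. How the high availability works
1. redis on the master node goes down
- Once redis on the master node is down, the health-check script's PING fails and the script returns 1. After three consecutive failures (fall 3), the weight of -10 lowers the node's priority, and it advertises 100-10 = 90 in its VRRP advertisements
- The slave node's keepalived sees that the advertised priority (90) is lower than its own (100), so it takes over, advertises priority 100 itself, and becomes master. This triggers redis-master.sh on the slave machine: redis stops replicating and becomes a master. The node also sends ARP packets to the other machines on the LAN announcing that it now owns 192.168.203.240, and each machine caches the mapping between backup2's MAC address and 192.168.203.240. Check the addresses with ip a
- The former master is demoted to backup, which triggers redis-backup.sh: the script tries to start redis again and enables replication with SLAVEOF pointing to backup2, so redis becomes a slave. The VIP has moved
- After these steps, the failover from the failed redis on the master node is complete.
- The state transitions can be monitored in the logs
tail -f /usr/local/redis/keepalived/logs/backup.log
tail -f /usr/local/redis/keepalived/logs/master.log
- No manual intervention is needed at any point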
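2. keepalived on the master node goes down
- When keepalived on the master node dies, the VIP moves straight to the backup node, which becomes master and runs redis-master.sh. Its redis becomes a master, and applications keep reaching redis through the VIP, which completes the failover.
- Start the failed keepalived again by hand on the affected machine. Because the priorities are equal, it becomes a backup node and does not take the VIP back; the backup script runs and redis starts replicating
3. The master machine goes down
- Same as the case where keepalived on the master node goes down
4. redis on the slave node goes down
- Restart keepalived to re-enable replication; this does not take over the master role
5. keepalived going down on the slave node is the same as the slave machine going down; just start it again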
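The priority comparisons in cases 1 and 2 can be summarised in a few lines. This is a simplified sketch, not keepalived's implementation: takes_over and describe are hypothetical helpers, preemption is assumed to be enabled (no nopreempt in the configuration), and the timeout-based takeover that happens when the master stops advertising altogether is not modelled.

```shell
#!/bin/bash
# takes_over BACKUP_PRIO MASTER_PRIO   (hypothetical helper)
# Succeeds when a backup that hears the master's advertisement would preempt it,
# i.e. only when its own priority is strictly higher.
takes_over() {
    [ "$1" -gt "$2" ]
}

# describe BACKUP_PRIO MASTER_PRIO: print the outcome in words.
describe() {
    if takes_over "$1" "$2"; then
        echo "backup=$1 master=$2: backup takes over the VIP"
    else
        echo "backup=$1 master=$2: backup stays backup"
    fi
}

describe 100 90    # case 1: redis died on the master, so it advertises 90
describe 100 100   # case 2: a restarted node with equal priority does not grab the VIP back
```

This is why both nodes use priority 100: a tie keeps the VIP where it is, and only a failed health check tips the comparison.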
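V. Summary
With the steps above, redis high availability and failover work as described; this has been verified on real machines
Reference blogs:
Building a Redis + Keepalived master-slave cluster with failover: https://blog.csdn.net/ECHO_FOLLOW_HEART/article/details/51595228
Redis Chinese site: http://www.redis.cn/documentation.html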
https://blog.csdn.net/ws891033655/article/details/39834457
http://blog.51cto.com/hao360/1435297
keepalived:
Keepalived principles and practice in depth, the VRRP protocol: https://blog.csdn.net/wngua/article/details/54668794
http://outofmemory.cn/wiki/keepalived-configuration
http://hugnew.com/?p=745
http://fengchj.com/?p=2156#respond