References:
http://my.oschina.net/guol/blog/182491
http://18567.blog.51cto.com/8567/655043
http://www.qixing318.com/article/by-keepalived-redis-double-machine.html
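Background
At the time of writing, the official Redis clustering solution is still under development and testing and has not been merged into a stable release. The in-development Redis Cluster is also feature-incomplete (see the official website or http://www.redisdoc.com/en/latest/topic/cluster-spec.html), so it is not recommended for production. Our survey found that, to expose a single IP address to clients, most deployments use keepalived to run redis as a two-node hot standby as an interim solution.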
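Environment deployment
Environment overview:
Master: 192.168.1.218 redis,keepalived
Slave: 192.168.1.219 redis,keepalived
Virtual IP Address (VIP): 192.168.1.220
In the rest of this article, Master means the host 192.168.1.218 and Slave means the host 192.168.1.219; lowercase master/slave refers to the keepalived/redis role. (The only difference is the capitalization of the first letter.)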
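Design:
keepalived's custom script feature monitors the local redis service. When the check script detects that redis is failing, it stops the local keepalived, which in turn changes the master/backup roles. keepalived also runs notify scripts whenever its role changes, which gives us the hook to change the redis master/slave state. The goal is to keep the data on the redis master and slave consistent.
When running keepalived+redis there are four scenarios:
1 keepalived is down and redis is down too. The VIP simply moves away and no redis data synchronization is needed: with redis down there is nothing to sync from the master. Data that was written to the master but not yet replicated to the slave is lost.
2 keepalived is down but redis is not. After the VIP moves, redis keeps the old master/slave relationship; left unchanged, writes would go to the redis slave and never reach the master. The scripts must therefore reverse the redis master/slave relationship, leaving some time for data synchronization before reversing it.
3 keepalived is up but redis is down. The check script detects the redis failure and stops the local keepalived, so the virtual IP moves to the other server, which takes over the redis workload.
4 keepalived is up and redis is up. Nothing needs to be done.
The setup in this article handles all four scenarios. In the first one no data synchronization is needed; the scripts still attempt it by default, but the attempt simply fails. The scripts mainly exist to handle the second and third scenarios.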
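The four scenarios above can be summarized as a small lookup. This is only an illustrative sketch: `failover_case` is a hypothetical helper that is not part of the scripts in this article, and it merely restates the expected outcome for each combination of keepalived and redis state on the node that currently holds the VIP.

```bash
#!/bin/bash
# Hypothetical helper (not one of the article's scripts): maps the state of
# keepalived and redis on the current VIP holder to the expected outcome.
# Usage: failover_case <keepalived: up|down> <redis: up|down>
failover_case() {
    local keepalived_state=$1 redis_state=$2
    case "$keepalived_state/$redis_state" in
        down/down) echo "scenario 1: VIP moves; no sync possible, unreplicated writes are lost" ;;
        down/up)   echo "scenario 2: VIP moves; wait for sync, then reverse redis master/slave" ;;
        up/down)   echo "scenario 3: check script stops keepalived; VIP moves to the peer" ;;
        up/up)     echo "scenario 4: nothing to do" ;;
        *)         echo "usage: failover_case up|down up|down" >&2; return 2 ;;
    esac
}
```

For example, `failover_case up down` describes the case exercised first in the test section: redis is killed while keepalived is still running.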
Implementation steps:
-------------------Create a dedicated user-------------------
useradd -g develop redisadmin
echo ******|passwd --stdin redisadmin
Note: all of the deployment steps below are run as root (or an account with sudo privileges).
-------------------Install and configure redis-------------------
Perform the following on both Master and Slave:
1. Download the redis source
cd
wget http://download.redis.io/releases/redis-2.8.4.tar.gz
2. Install redis
tar -zxvf redis-2.8.4.tar.gz
cd redis-2.8.4
#redis does not need a configure step
make
#Run the tests
make test
####On slower machines make test may report the following error; it is harmless
#*** [err]: Test replication partial resync: no backlog in tests/integration/replication-psync.tcl
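3. Configure redis
#Create the redis home directory
mkdir -p /usr/local/redis/{bin,conf,logs}
#Copy the executables to the corresponding directory
find src/ \( -perm -0001 \) -type f -exec cp -a -R -p {} /usr/local/redis/bin \;
#Create the redis start script
vi /usr/local/redis/redis-start.sh
####The following is the configuration on the master; on the slave only the IP addresses need to change.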
#!/bin/bash
RPATH=/usr/local/redis
KPATH=/usr/local/keepalived
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP=192.168.1.218
REMOTEIP=192.168.1.219

#Start redis with the configuration file and log the result
$RPATH/bin/redis-server $RPATH/conf/redis.conf
if [ "$?" == "0" ];then
    echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis start successful." >> $LOGFILE
else
    echo "[ERROR]`date +%F/%H:%M:%S` :$LOCALIP redis start error." >> $LOGFILE
fi

#The matching redis stop script:
#!/bin/bash
RPATH=/usr/local/redis
KPATH=/usr/local/keepalived
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP=192.168.1.218
REMOTEIP=192.168.1.219

#Kill the redis-server process and log the result
kill -9 `ps -ef|grep '/bin/redis-server'|grep -v grep|awk '{print $2}'`
if [ "$?" == "0" ];then
    echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis shutdown completed!" >> $LOGFILE
else
    echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis is not started." >> $LOGFILE
fi
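#Changes to redis.conf (the start script loads /usr/local/redis/conf/redis.conf) are listed below; adjust the remaining settings for the actual production environment
daemonize yes
pidfile /usr/local/redis/redis.pid
#bind is left commented out for now to make testing easier
#bind 192.168.1.218
timeout 300
#notice is enough in production; verbose is used here to see all output in detail
loglevel verbose
logfile "/usr/local/redis/logs/redis.log"
dir /usr/local/redis/
appendonly yes
#Change the owner and permissions of the redis directory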
chown -R redisadmin:develop /usr/local/redis/
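-------------------Install and configure keepalived-------------------
1. Download the keepalived 1.2.10 source package (the latest version at the time of writing)
wget http://www.keepalived.org/software/keepalived-1.2.10.tar.gz
2. Install keepalived
The following dependencies must be installed first: make gcc libpopt-dev libnl-dev libcurl4-openssl-dev popt openssl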
cd
tar zxvf keepalived-1.2.10.tar.gz
cd keepalived-1.2.10
Note: apply the fix described under "Installation error" below before running the following steps:
./configure --prefix=/usr/local/keepalived
make && make install
Installation error (does not occur with version 1.2.9):
Fix:
vi ~/keepalived-1.2.10/keepalived/libipvs-2.6/libipvs.c
Apply the correction shown in the figure below (add line 57, comment out line 82):
3. Configure keepalived
cd /usr/local/keepalived
#Back up keepalived.conf:
mv /usr/local/keepalived/etc/keepalived/keepalived.conf /usr/local/keepalived/etc/keepalived/keepalived.conf-bak
#On Master: 192.168.1.218, create the following configuration file (adjust as needed):
vi /usr/local/keepalived/etc/keepalived/keepalived.conf
! Configuration File for keepalived
#global_defs {
#   notification_email {
#   }
#   notification_email_from [email protected]
#   router_id node3
#}
vrrp_script chk_redis {
    script "/usr/local/keepalived/etc/keepalived/scripts/redis_check.sh"
    #If the script exits non-zero and weight is negative, the priority is reduced accordingly; if the script exits 0 and weight is positive, the priority is increased accordingly; otherwise the original priority is kept.
    # weight -20
    #How often the script runs: once every 10 seconds
    interval 10
}
vrrp_instance VI_1 {
    #To avoid preempting on recovery, this must be BACKUP on both the master and the slave server; only then does nopreempt take effect.
    state BACKUP
    #state MASTER
    interface eth3
    virtual_router_id 51
    priority 100
    #Do not preempt. Setting it on the server with the higher priority is enough: when the lower-priority server starts and sees the higher-priority server as master, it does not preempt anyway.
    nopreempt
    #advert_int is the VRRP advertisement interval in seconds. A backup declares the master down after roughly 3 x advert_int, so with a value of 2 the switch to state:MASTER starts after about 2*3=6 seconds.
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass redis
    }
    virtual_ipaddress {
        192.168.1.220
    }
    track_script {
        chk_redis
    }
    notify_master /usr/local/keepalived/etc/keepalived/scripts/master.sh
    notify_backup /usr/local/keepalived/etc/keepalived/scripts/backup.sh
    notify_fault /usr/local/keepalived/etc/keepalived/scripts/fault.sh
    notify_stop /usr/local/keepalived/etc/keepalived/scripts/stop.sh
}
#On Slave: 192.168.1.219, create the corresponding configuration file:
! Configuration File for keepalived
#global_defs {
#   notification_email {
#   }
#   notification_email_from [email protected]
#   router_id node3
#}
vrrp_script chk_redis {
    script "/usr/local/keepalived/etc/keepalived/scripts/redis_check.sh"
    #If the script exits non-zero and weight is negative, the priority is reduced accordingly; if the script exits 0 and weight is positive, the priority is increased accordingly; otherwise the original priority is kept.
    # weight -20
    #How often the script runs: once every 10 seconds
    interval 10
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth5
    garp_master_delay 10
    virtual_router_id 51
    priority 90
    #nopreempt
    #advert_int is the VRRP advertisement interval in seconds. A backup declares the master down after roughly 3 x advert_int, so with a value of 2 the switch to state:MASTER starts after about 2*3=6 seconds.
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass redis
    }
    virtual_ipaddress {
        192.168.1.220
    }
    track_script {
        chk_redis
    }
    #Runs master.sh when keepalived switches to master
    notify_master /usr/local/keepalived/etc/keepalived/scripts/master.sh
    #Runs backup.sh when keepalived switches to backup
    notify_backup /usr/local/keepalived/etc/keepalived/scripts/backup.sh
    #Runs fault.sh when keepalived enters the fault state
    notify_fault /usr/local/keepalived/etc/keepalived/scripts/fault.sh
    #Runs stop.sh when keepalived stops
    notify_stop /usr/local/keepalived/etc/keepalived/scripts/stop.sh
}
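#Specify the keepalived log file
vi /usr/local/keepalived/etc/sysconfig/keepalived
#KEEPALIVED_OPTIONS="-D"
KEEPALIVED_OPTIONS="-D -d -S 0"
#Save keepalived message to keepalived.log: add the rule below to the syslog configuration (for example /etc/rsyslog.conf on systems that use rsyslog); -S 0 above selects the local0 facility
local0.* /usr/local/keepalived/logs/keepalived.log
#Still in /usr/local/keepalived, copy the installed tree into the system root
cp -r * /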
#Create the redis monitoring scripts on Master and Slave. The scripts below are the master versions; on the slave only the IP addresses need to change.
mkdir /usr/local/keepalived/etc/keepalived/scripts
vi /usr/local/keepalived/etc/keepalived/scripts/redis_check.sh
#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

ALIVE=`$REDISCLI PING`
if [ "$ALIVE" == "PONG" ]; then
    echo "[INFO]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is healthy." >> $LOGFILE
    exit 0
else
    echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is not healthy." >> $LOGFILE
    #When the local redis cannot be reached, wait one second and check again. If it has recovered, log a notice; if it is still unreachable, stop the local keepalived so the virtual IP moves to the other server.
    sleep 1
    ALIVE1=`$REDISCLI PING`
    if [ "$ALIVE1" == "PONG" ];then
        echo "[NOTICE]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis became healthy." >> $LOGFILE
        exit 0
    else
        echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is error." >> $LOGFILE
        echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP shutdown local keepalived." >> $LOGFILE
        /etc/init.d/keepalived stop
        if [ "$?" != "0" ];then
            echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP keepalived shutdown error." >> $LOGFILE
        else
            echo "[INFO]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP keepalived shutdown completed." >> $LOGFILE
        fi
        exit 1
    fi
fi
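The "check, wait one second, check again" logic in redis_check.sh can be generalized to any number of attempts. The following is a minimal sketch under stated assumptions, not the article's script: `redis_alive` is a hypothetical function name, and the probe command is passed in as arguments so the retry logic can be exercised without a running redis.

```bash
#!/bin/bash
# Sketch: PING with retries. The probe command is whatever prints PONG when
# redis is healthy, e.g. /usr/local/redis/bin/redis-cli PING.
# Usage: redis_alive <attempts> <delay_seconds> <probe_cmd> [args...]
redis_alive() {
    local attempts=$1 delay=$2 i reply
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        # A failing probe yields an empty reply instead of aborting the script.
        reply=$("$@" 2>/dev/null) || reply=""
        if [ "$reply" == "PONG" ]; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

Inside redis_check.sh this might be called as `redis_alive 2 1 "$REDISCLI" PING`, which corresponds to the two checks one second apart used above.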
#######################################################################
vi /usr/local/keepalived/etc/keepalived/scripts/master.sh
#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

#When keepalived on this server becomes master, i.e. the virtual IP moves to this host, switch the local redis to role:master
echo "[WARN]-----------keepalived change to master,change local redis to master---------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave]" >> $LOGFILE
#First switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE 2>&1
#Synchronize data
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] wait 10 sec for data sync from old master" >> $LOGFILE
#Wait 10 seconds (tune this to the actual workload) for the data to finish syncing, then switch to role:master
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] data sync from old master ok..." >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Run slaveof no one,close master/slave" >> $LOGFILE
$REDISCLI SLAVEOF NO ONE >> $LOGFILE 2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait other slave connect...." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
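master.sh waits a fixed 10 seconds before promoting redis. One possible alternative, which is not what the script above does, is to poll the `master_link_status` and `master_sync_in_progress` fields of `INFO replication` and give up after a timeout. The sketch below assumes a hypothetical function name `wait_for_sync`; the command that prints the INFO output is passed in as arguments. A link that is up with no sync in progress indicates the initial synchronization has finished, not that the replica has zero lag.

```bash
#!/bin/bash
# Sketch: wait until replication from the old master reports as established.
# The info command must print the output of "INFO replication",
# e.g. /usr/local/redis/bin/redis-cli INFO replication.
# Usage: wait_for_sync <timeout_seconds> <info_cmd> [args...]
wait_for_sync() {
    local timeout=$1 waited=0 info
    shift
    while true; do
        # redis terminates INFO lines with CRLF, so strip the carriage returns.
        info=$("$@" 2>/dev/null | tr -d '\r') || info=""
        if grep -q '^master_link_status:up$' <<<"$info" &&
           grep -q '^master_sync_in_progress:0$' <<<"$info"; then
            return 0
        fi
        # Give up once the timeout is reached (e.g. the old master is dead).
        if ((waited >= timeout)); then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

In master.sh the `sleep 10` could then become something like `wait_for_sync 10 "$REDISCLI" INFO replication`, followed by `SLAVEOF NO ONE` regardless of the result, so that scenario 1 (old master unreachable) still ends with a promoted master.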
#######################################################################
vi /usr/local/keepalived/etc/keepalived/scripts/backup.sh
#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

#When keepalived on this server becomes backup, i.e. the virtual IP moves to the other server, switch the local redis to role:slave
echo "[WARN]------------keepalived change to slave,change local redis to slave----------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE 2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
#######################################################################
vi /usr/local/keepalived/etc/keepalived/scripts/stop.sh
#!/bin/sh
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

#When keepalived on the master server stops, switch the local redis to role:slave
echo "[ERROR]-----------------keepalived stop,change local redis to slave---------------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE 2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
#######################################################################
vi /usr/local/keepalived/etc/keepalived/scripts/fault.sh
#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

#When keepalived on this server enters the fault state, switch the local redis to role:slave
echo "[ERROR]---------------keepalived is fault,change local redis to slave-------------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >>$LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE 2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
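backup.sh, stop.sh and fault.sh differ only in their banner line, so the common part could live in one shared file. The sketch below is one way to factor it out, not the article's layout: `log_state` and `demote_to_slave` are hypothetical names, and `LOGFILE`, `REDISCLI` and `SYNC_WAIT` default to the values used by the scripts above.

```bash
#!/bin/bash
# Sketch of a shared helper file; defaults match the paths used in this article.
LOGFILE=${LOGFILE:-/usr/local/keepalived/logs/redis-state.log}
REDISCLI=${REDISCLI:-/usr/local/redis/bin/redis-cli}
SYNC_WAIT=${SYNC_WAIT:-10}   # seconds to let the peer sync; tune to the workload

# Append one line in the "timestamp|pid|state:[role] message" format.
# Usage: log_state <role> <message...>
log_state() {
    local role=$1
    shift
    echo "$(date +'%Y-%m-%d:%H:%M:%S')|$$|state:[$role] $*" >> "$LOGFILE"
}

# Log a banner, wait for the peer to sync, then make the local redis a slave.
# Usage: demote_to_slave <banner> <new_master_ip> <port>
demote_to_slave() {
    local banner=$1 master_ip=$2 port=$3
    echo "$banner" >> "$LOGFILE"
    log_state master "wait $SYNC_WAIT sec for data sync"
    sleep "$SYNC_WAIT"
    log_state slave "Run 'SLAVEOF $master_ip $port'"
    "$REDISCLI" SLAVEOF "$master_ip" "$port" >> "$LOGFILE" 2>&1
}
```

Each of the three scripts would then reduce to sourcing this file and calling, for example, `demote_to_slave "[ERROR] keepalived is fault, change local redis to slave" 192.168.1.219 6379`.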
#######################################################################
Set the permissions of the monitoring scripts:
chmod -R 750 /usr/local/keepalived/etc/keepalived/scripts/
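System test
Notes:
(1). In keepalived.conf both keepalived nodes are set to BACKUP, and nopreempt is set on 218 so that it does not preempt on recovery. The plan is for 218 to act as master, so startup must follow this order: start keepalived on 218 first, wait for data synchronization to finish, then start keepalived on 219.
(2). Besides the check script redis_check.sh, keepalived runs the state-change scripts. master.sh is written so that when keepalived becomes master, redis first becomes a slave to sync data and is then switched back to master. Before starting keepalived, therefore, make sure the redis data on Master and Slave is identical. keepalived is then started first on the host of the redis master; that redis master does connect to the redis slave to sync data, but because both sides start with the same data this causes no problem.
(3). In production the firewall policy must be changed to open the required ports. For this test the firewall is simply turned off: service iptables stop.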
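For a production setup, rules along the following lines may be a starting point instead of stopping iptables. This is a hedged sketch, not a tested policy: `print_firewall_rules` is a hypothetical helper that only prints candidate commands for review. It assumes redis listens on TCP port 6379 as configured above and relies on the fact that VRRP advertisements use IP protocol 112 sent to the multicast group 224.0.0.18.

```bash
#!/bin/bash
# Sketch: print (do not apply) iptables rules that admit the peer node.
# Review the output before running the printed commands as root.
# Usage: print_firewall_rules <peer_ip> <redis_port>
print_firewall_rules() {
    local peer_ip=$1 redis_port=$2
    # redis replication traffic from the peer node.
    echo "iptables -I INPUT -p tcp -s $peer_ip --dport $redis_port -j ACCEPT"
    # VRRP advertisements between the keepalived instances.
    echo "iptables -I INPUT -p 112 -d 224.0.0.18 -j ACCEPT"
}
```

On 218, `print_firewall_rules 192.168.1.219 6379` prints the two rules for the peer. Application clients that reach redis through the VIP need a similar rule for their own source addresses.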
The test scenarios and their output follow:
-----------------------------------------Initial environment--------------------------------------------------
Set up the initial environment:
----Start redis on 218 and 219: /usr/local/redis/redis-start.sh
----Start keepalived on 218: service keepalived start; do not start keepalived on 219 yet.
Run tail -f /usr/local/keepalived/logs/keepalived.log on 218; it shows keepalived switching to master state (the configuration file sets state:backup) and binding the VIP.
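Besides reading the log, the VIP binding can be checked directly on the interface. The sketch below assumes a hypothetical helper `has_vip` that reads the output of `ip -o -4 addr show` on standard input; the interface name eth3 comes from the Master configuration above.

```bash
#!/bin/bash
# Sketch: succeed if the given IPv4 address appears in `ip -o -4 addr show`
# output read from stdin.
# Usage: ip -o -4 addr show dev eth3 | has_vip 192.168.1.220
has_vip() {
    local vip=$1
    # Fixed-string match on "inet <vip>/" so 192.168.1.22 does not match 192.168.1.220.
    grep -qF "inet $vip/"
}
```

On 218, `ip -o -4 addr show dev eth3 | has_vip 192.168.1.220 && echo "VIP is bound here"` would be expected to print the message once keepalived has entered master state.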
Check the redis log on Master 218; it shows the redis switchover as follows:
----Start keepalived on Slave 219 and check the redis log; redis has switched to the slave role:
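The current role can also be read from redis itself through `INFO replication`, which reports a `role:` line. A minimal sketch, assuming a hypothetical helper name `redis_role` that parses the INFO output from standard input:

```bash
#!/bin/bash
# Sketch: extract the replication role from `INFO replication` output on stdin.
# Usage: /usr/local/redis/bin/redis-cli INFO replication | redis_role
redis_role() {
    # INFO lines end in CRLF; drop the carriage returns before matching.
    tr -d '\r' | awk -F: '$1 == "role" { print $2 }'
}
```

Run on 219 after its keepalived has started, the pipeline in the usage line would be expected to print slave, and on 218 master.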
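-----------------------------------------Initial environment--------------------------------------------------
-----------------------------------------Scenario 3-------------------------------------------------
----Simulate scenario 3 by killing the redis process on Master 218:
keepalived on 218 is then stopped, as shown below:
keepalived on 219 correctly switches to State:Master and the VIP moves over, as shown below:
The redis monitoring log on 218 is as follows:
The redis monitoring log on 219 is as follows; it shows that 219 has switched to master and the service keeps running (although any data on 218 that was in memory and not yet written to file is lost):
----Simulate 218 recovering from the failure:
Because keepalived on 218 was stopped when the failure was detected, recovery requires starting redis on 218 first and then keepalived on 218:
Check the keepalived log on 218: keepalived on 218 goes straight into state:backup, so the service does not switch back and forth:
Check the redis log on 218: once started, redis on 218 becomes the standby of the redis server that is already running.
In summary, scenario 3 tested successfully.
-----------------------------------------Scenario 3-------------------------------------------------
-----------------------------------------Scenario 2-------------------------------------------------
----Restore the initial environment, then simulate scenario 2 by killing the keepalived process on 218 (service keepalived stop):
Check the redis monitoring log on 218:
Check the keepalived log on 219, which shows that keepalived switched over normally:
Check the redis monitoring log on 219; redis has completed the master/slave switch:
----Simulate 218 recovering from the keepalived failure (kill all keepalived processes first, then start normally) by running service keepalived start:
Check the keepalived log on 219; the keepalived state is backup, so the VIP does not move:
Check the redis monitoring log on 218:
Check the redis runtime log on 218; redis comes back as a slave, so the service does not switch:
In summary, scenario 2 tested successfully.
-----------------------------------------Scenario 2-------------------------------------------------
Scenario 4 is simply the initial environment, and scenario 1 is a combination of scenarios 2 and 3, so neither needs a separate test.
That completes the redis two-node hot standby solution.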