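A master/slave architecture cannot switch the master and slave roles automatically. When the master becomes unusable because of a Redis service fault, a host power loss, disk damage, or a similar problem, Redis replication alone does not fail over; the environment configuration has to be changed by hand before clients can use a slave Redis server. Replication also cannot scale the parallel write performance of Redis horizontally once a single Redis server can no longer meet the write load. Two core problems therefore need to be solved:
Seamless switching of the master and slave roles, so that the business does not notice the switch and is not affected
Dynamic horizontal scaling of the Redis service
Ways to implement a Redis cluster:
Client-side sharding: the application decides which Redis server each key is sent to
Proxy sharding: a proxy decides which Redis server each key is sent to; examples of such proxies are codis and twemproxy
Redis Cluster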
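As a rough illustration of the client-side sharding option listed above, the sketch below maps each key to one server with a stable hash. The node names and the pick_node helper are hypothetical and only serve the example; real applications often use consistent hashing instead.

```python
import zlib

# Hypothetical shard list for illustration; not part of the deployment in this document.
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def pick_node(key, nodes=NODES):
    """Map a key to one node using CRC32 modulo the node count."""
    # zlib.crc32 is stable across runs, unlike the built-in hash() for str.
    slot = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[slot]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:42"):
        print(key, "->", pick_node(key))
```

With plain modulo hashing, changing the number of nodes remaps most keys, which is one reason proxy sharding and Redis Cluster exist.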
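The Sentinel process monitors the working state of the master server in a Redis master/slave group. When the master fails, Sentinel can switch the master and slave roles to keep the system highly available. The feature was introduced in Redis 2.6 and became stable in Redis 2.8, so Redis 2.8 or later is recommended for production
Sentinel is a distributed system: several Sentinel processes can run in one architecture. These processes use a gossip protocol to receive information about whether the master is offline, and an agreement protocol (voting) to decide whether to perform an automatic failover and which slave to promote to master
Each Sentinel process periodically sends messages to the other Sentinels, the master, and the slaves to confirm that they are alive. If a peer does not respond within the configured time, the Sentinel provisionally considers it offline. This is the subjective judgment of a single Sentinel, called Subjectively Down (SDOWN)
Subjective down has an objective counterpart. When enough Sentinel processes in the group (at least the configured quorum) have each judged the master to be SDOWN and have exchanged that judgment with the SENTINEL is-master-down-by-addr command, the resulting conclusion that the master is offline is called Objectively Down (ODOWN)
A voting algorithm then selects one of the remaining slave nodes and promotes it to master, after which Sentinel automatically rewrites the related configuration and carries out the failover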
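The step from SDOWN to ODOWN described above can be sketched as two small functions. This is a simplified model under stated assumptions, not Sentinel's actual implementation: the function names and the millisecond timestamps are hypothetical, and the leader election that precedes a real failover is left out.

```python
def is_sdown(last_valid_reply_ms, now_ms, down_after_ms):
    """Local view of one Sentinel: no valid reply within down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sdown_votes, quorum):
    """The master is objectively down once at least `quorum` Sentinels report SDOWN."""
    return sum(1 for vote in sdown_votes if vote) >= quorum


if __name__ == "__main__":
    # Three Sentinels, quorum 2: two of them have lost contact with the master.
    votes = [is_sdown(0, 5000, 3000), is_sdown(0, 5000, 3000), is_sdown(4000, 5000, 3000)]
    print(votes, is_odown(votes, quorum=2))
```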
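The Sentinel mechanism solves automatic switching of the master and slave roles, much like MHA does for MySQL, but it does not remove the performance bottleneck of a single master
A Redis Sentinel deployment should have at least 3 Sentinel nodes, preferably an odd number
On initialization the client connects to the set of Sentinel nodes rather than to a specific Redis node; Sentinel is only a configuration center, not a proxy
The Redis data nodes in a Sentinel deployment are no different from ordinary Redis nodes; read/write splitting has to be implemented by the client program
Before Redis 3.0, production environments generally used Sentinel mode; Redis Cluster, introduced in 3.0, supports larger production environments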
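Every 10 seconds, each Sentinel runs INFO against the master and the slaves
Discover slave nodes
Confirm the master/slave relationships
Every 2 seconds, each Sentinel exchanges information through a channel on the master node
The exchange uses the __sentinel__:hello channel
Sentinels exchange their view of the nodes and information about themselves
Every 1 second, each Sentinel sends PING to the other Sentinels and to the Redis nodes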
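Sentinel requires a working Redis master/slave replication environment. On top of it, this section builds a highly available Redis architecture with one master and two slaves managed by Sentinel
Note: masterauth must be set in the master's configuration file as well, with the same value as on the slaves, because any node may become a slave after a failover
Key settings in redis.conf on all master and slave nodes
#Run on all master and slave nodes
# vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth 123456
requirepass 123456
#Run on all slave nodes
# echo "replicaof 10.0.0.58 6379" >> /apps/redis/etc/redis.conf
#Run on all master and slave nodes
# systemctl enable --now redis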
Master server
[root@centos8 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=42,lag=0
slave1:ip=10.0.0.27,port=6379,state=online,offset=42,lag=0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:42
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:42
Configure slave1
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> replicaof 10.0.0.58 6379
OK
127.0.0.1:6379> config set masterauth "123456"
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:182
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:182
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:182
Configure slave2
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> replicaof 10.0.0.58 6379
OK
127.0.0.1:6379> config set masterauth "123456"
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:182
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:182
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:182
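The INFO replication output shown above can also be checked from a script. The sketch below parses the text form of that output and reports how far each slave is behind the master; parse_info and replica_lag_bytes are hypothetical helper names, and the sample text is an excerpt of the master output shown earlier.

```python
def parse_info(text):
    """Turn 'key:value' lines from INFO output into a dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def replica_lag_bytes(master_info):
    """Return {slave ip: bytes behind the master} for every slaveN entry."""
    master_offset = int(master_info["master_repl_offset"])
    lag = {}
    for key, value in master_info.items():
        # Only slave0, slave1, ... entries; skips keys such as slave_repl_offset.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            lag[fields["ip"]] = master_offset - int(fields["offset"])
    return lag


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=42,lag=0
slave1:ip=10.0.0.27,port=6379,state=online,offset=42,lag=0
master_repl_offset:42
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["role"], replica_lag_bytes(info))
```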
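Sentinel configuration
Sentinel is actually a special Redis server: it supports some Redis commands but not many others, and it listens on port 26379/tcp by default
Sentinels do not have to run on the same hosts as the Redis servers, but they usually do; all Redis nodes use the same example configuration file
#For a source build, sentinel.conf is in the source directory; copy it to the installation directory
e.g. /apps/redis/etc/sentinel.conf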
# vim /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/tmp"
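#Address and port of the master server of the mymaster group
#The trailing 2 is the quorum: the number of Sentinels that must consider the master down before a failover is started. It is normally more than half of all Sentinel nodes (the total is usually an odd number >= 3, such as 3, 5, or 7). For example, with 3 Sentinels, 3/2 = 1.5, rounded up to 2. The quorum is the basis for declaring the master objectively down (ODOWN)
sentinel monitor mymaster 10.0.0.58 6379 2
#Password of the master in the mymaster group
sentinel auth-pass mymaster 123456
#Time after which a node of the mymaster group is considered subjectively down (SDOWN), in milliseconds; 3000 is recommended
sentinel down-after-milliseconds mymaster 30000
#Number of slaves that resynchronize with the new master at the same time after a failover; a smaller number lengthens the total synchronization time but reduces the load on the new master
sentinel parallel-syncs mymaster 1
#Timeout for repointing all slaves to the new master, in milliseconds
sentinel failover-timeout mymaster 180000
#Forbid changing the script settings at runtime
sentinel deny-scripts-reconfig yes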
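The quorum arithmetic in the comments above (more than half of the Sentinels) can be written out as a short sketch. The function names are hypothetical. The second function relies on the fact that a failover also needs a majority of all Sentinels to authorize it, which is why an even number of Sentinels adds no extra fault tolerance.

```python
def suggested_quorum(sentinel_count):
    """Smallest integer greater than half of the Sentinels, e.g. 3 -> 2, 5 -> 3."""
    return sentinel_count // 2 + 1


def tolerated_sentinel_failures(sentinel_count):
    """How many Sentinels may be down while a majority is still reachable."""
    return sentinel_count - (sentinel_count // 2 + 1)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, suggested_quorum(n), tolerated_sentinel_failures(n))
```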
Configuration of the three Sentinel servers
[root@centos8 ~]#grep -vE "^#|^$" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /tmp
sentinel monitor mymaster 10.0.0.58 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
[root@centos8 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.17:/apps/redis/etc/sentinel.conf
[root@centos8 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.27:/apps/redis/etc/sentinel.conf
All three Sentinel servers must be started
#For a source build, create a new service file on all nodes
[root@centos8 ~]#vim /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
[root@centos8 ~]#scp /lib/systemd/system/redis-sentinel.service 10.0.0.17:/lib/systemd/system/redis-sentinel.service
[root@centos8 ~]#scp /lib/systemd/system/redis-sentinel.service 10.0.0.27:/lib/systemd/system/redis-sentinel.service
#Check the directory ownership on all nodes; otherwise the service will not start
[root@centos8 ~]#chown -R redis.redis /apps/redis/
[root@centos8 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14292 May 5 13:03 /apps/redis/etc/sentinel.conf
[root@slave1 ~]#chown -R redis.redis /apps/redis/
[root@slave1 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14291 May 5 13:03 /apps/redis/etc/sentinel.conf
[root@slave2 ~]#chown -R redis.redis /apps/redis/
[root@slave2 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14291 May 5 13:03 /apps/redis/etc/sentinel.conf
#Reload the systemd unit files
[root@centos8 ~]#systemctl daemon-reload
[root@slave1 ~]#systemctl daemon-reload
[root@slave2 ~]#systemctl daemon-reload
#Start the Sentinel service
[root@centos8 ~]#systemctl start redis-sentinel
[root@slave1 ~]#systemctl start redis-sentinel
[root@slave2 ~]#systemctl start redis-sentinel
#Make sure every Sentinel has a different myid
[root@centos8 ~]# tail /apps/redis/etc/sentinel.conf
sentinel myid b0f5f4759acc78f7d12d7f8ad63ceecdd054f2f0
[root@centos7 ~]# tail /apps/redis/etc/sentinel.conf
sentinel myid 8811969413652e79ba5e787fa3a00a6022ac86f4
[root@centos7 ~]# tail /apps/redis/etc/sentinel.conf
sentinel myid 55ef9b2271ba0548eaea5cd43c200eb32c0e734f
[root@centos8 ~]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 511 0.0.0.0:26379 0.0.0.0:*
LISTEN 0 511 0.0.0.0:6379 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 511 [::]:26379 [::]:*
LISTEN 0 128 [::]:22 [::]:*
#Verify the ports the same way on the two slaves
[root@centos8 ~]# tail -f /apps/redis/log/sentinel.log
2552:X 26 May 2022 10:16:49.568 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2552:X 26 May 2022 10:16:49.568 # Redis version=6.2.4, bits=64, commit=00000000, modified=0, pid=2552, just started
2552:X 26 May 2022 10:16:49.568 # Configuration loaded
2552:X 26 May 2022 10:16:49.569 * Increased maximum number of open files to 10032 (it was originally set to 1024).
2552:X 26 May 2022 10:16:49.569 * monotonic clock: POSIX clock_gettime
2552:X 26 May 2022 10:16:49.571 * Running mode=sentinel, port=26379.
2552:X 26 May 2022 10:16:49.571 # Sentinel ID is b0f5f4759acc78f7d12d7f8ad63ceecdd054f2f0
2552:X 26 May 2022 10:16:49.571 # +monitor master mymaster 10.0.0.58 6379 quorum 2
2552:X 26 May 2022 10:17:29.551 * +sentinel sentinel 8811969413652e79ba5e787fa3a00a6022ac86f4 10.0.0.17 26379 @ mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:17:51.012 * +sentinel sentinel 55ef9b2271ba0548eaea5cd43c200eb32c0e734f 10.0.0.27 26379 @ mymaster 10.0.0.58 6379
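#Check the logs the same way on the two slaves
In the Sentinel status, pay particular attention to the last line: it shows the master IP, the number of slaves, and the number of Sentinels, which must match the actual number of servers
[root@centos8 ~]# redis-cli -a 123456 -p 26379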
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Warning: AUTH failed
127.0.0.1:26379> quit
[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.58:6379,slaves=2,sentinels=3
#Two slaves and three Sentinel servers; if the sentinels value is wrong, check whether the myid values conflict
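The check of the master0 line described above can be automated. The following sketch parses that line and compares the slave and Sentinel counts with the expected values; parse_master_line and check_topology are hypothetical helper names, and the sample line is copied from the output above.

```python
def parse_master_line(line):
    """Parse 'master0:name=...,status=...,address=ip:port,slaves=N,sentinels=M'."""
    _, _, body = line.partition(":")
    fields = dict(item.split("=", 1) for item in body.split(","))
    host, _, port = fields["address"].rpartition(":")
    return {
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "slaves": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


def check_topology(line, expected_slaves, expected_sentinels):
    """Return a list of human-readable problems; an empty list means all counts match."""
    master = parse_master_line(line)
    problems = []
    if master["status"] != "ok":
        problems.append("status is %s" % master["status"])
    if master["slaves"] != expected_slaves:
        problems.append("slaves=%d, expected %d" % (master["slaves"], expected_slaves))
    if master["sentinels"] != expected_sentinels:
        problems.append("sentinels=%d, expected %d" % (master["sentinels"], expected_sentinels))
    return problems


LINE = "master0:name=mymaster,status=ok,address=10.0.0.58:6379,slaves=2,sentinels=3"

if __name__ == "__main__":
    print(parse_master_line(LINE))
    print(check_topology(LINE, expected_slaves=2, expected_sentinels=3))
```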
#Stop Redis on the master to simulate a failure
[root@centos8 ~]# killall redis-server
#Check the Sentinel information on each node
[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3 #the master has now moved to 10.0.0.17
#Sentinel log during the failover:
[root@centos8 ~]# tail -f /apps/redis/log/sentinel.log
2552:X 26 May 2022 10:38:11.538 # +sdown master mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:38:11.605 # +odown master mymaster 10.0.0.58 6379 #quorum 3/2
2552:X 26 May 2022 10:38:11.606 # Next failover delay: I will not start a failover before Thu May 26 10:44:11 2022
2552:X 26 May 2022 10:38:11.786 # +config-update-from sentinel 8811969413652e79ba5e787fa3a00a6022ac86f4 10.0.0.17 26379 @ mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:38:11.786 # +switch-master mymaster 10.0.0.58 6379 10.0.0.17 6379
2552:X 26 May 2022 10:38:11.787 * +slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.17 6379
2552:X 26 May 2022 10:38:11.787 * +slave slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
2552:X 26 May 2022 10:38:14.864 # +sdown slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
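To pick the role change out of a log like the one above, a script can look for the +switch-master event. The sketch below assumes the line layout shown in this log (master name, old address, new address); find_switches is a hypothetical helper name and the sample is an excerpt of the log above.

```python
import re

# Matches: +switch-master <name> <old ip> <old port> <new ip> <new port>
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switches(log_text):
    """Return one dict per +switch-master event found in the log text."""
    switches = []
    for line in log_text.splitlines():
        match = SWITCH_RE.search(line)
        if match:
            switches.append({
                "name": match.group("name"),
                "old": (match.group("old_ip"), int(match.group("old_port"))),
                "new": (match.group("new_ip"), int(match.group("new_port"))),
            })
    return switches


SAMPLE_LOG = """2552:X 26 May 2022 10:38:11.538 # +sdown master mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:38:11.786 # +switch-master mymaster 10.0.0.58 6379 10.0.0.17 6379
2552:X 26 May 2022 10:38:11.787 * +slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.17 6379
"""

if __name__ == "__main__":
    print(find_switches(SAMPLE_LOG))
```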
[root@centos7 ~]# grep ^replicaof /apps/redis/etc/redis.conf
replicaof 10.0.0.17 6379
#The sentinel monitor IP in the Sentinel configuration file is rewritten as well
[root@centos7 ~]# grep "^sentinel monitor" /apps/redis/etc/sentinel.conf
sentinel monitor mymaster 10.0.0.17 6379 2
#Status of the new master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.27,port=6379,state=online,offset=398647,lag=1
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:398647
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:398647
#The other slave now points to the new master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:408160
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:408160
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:408160
#Recover the original failed master node
[root@centos8 ~]# systemctl start redis
#Sentinel automatically rewrites the line below to point to the new master
[root@centos8 ~]# grep '^replicaof' /apps/redis/etc/redis.conf
replicaof 10.0.0.17 6379
#Check the status on the original master
[root@centos8 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:471008
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:471008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:440438
repl_backlog_histlen:30571
[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3
#Check the status and the log on the current master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.27,port=6379,state=online,offset=498322,lag=1
slave1:ip=10.0.0.58,port=6379,state=online,offset=498469,lag=0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:498602
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:498602
[root@centos7 ~]# tail -f /apps/redis/log/sentinel.log
7792:X 26 May 2022 10:53:38.562 # -sdown slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
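Manually taking the master offline
sentinel failover <master name>
#Manual failover
[root@centos8 ~]# vim /apps/redis/etc/redis.conf
replica-priority 10 #Priority; the node with the smaller value is preferred as the new master
#After the master is taken offline by hand, the role moves to the node with the smaller priority value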
[root@centos7 ~]# redis-cli -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
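The effect of replica-priority on the choice of the new master can be sketched as an ordering rule. This is a simplified model, not Sentinel's code: it orders candidates by lower priority, then larger replication offset, then smaller run ID, and skips priority 0, which marks a node that must never be promoted. The Replica class, the run IDs, and the offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    addr: str
    priority: int     # replica-priority from redis.conf
    repl_offset: int  # slave_repl_offset from INFO replication
    run_id: str       # hypothetical run ID used only as a tie-breaker


def pick_new_master(replicas):
    """Return the preferred promotion candidate, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    # Lower priority first, then more replicated data, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    replicas = [
        Replica("10.0.0.58:6379", 10, 471008, "aaa"),
        Replica("10.0.0.27:6379", 100, 498322, "bbb"),
    ]
    print(pick_new_master(replicas).addr)
```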
[root@centos8 ~]# yum -y install python3 python3-redis
[root@centos8 ~]# cat sentinel_test.py
#!/usr/bin/python3
import redis
from redis.sentinel import Sentinel
#Connect to the Sentinel servers (a domain name can be used instead of an IP address)
sentinel = Sentinel([('10.0.0.58', 26379),
                     ('10.0.0.17', 26379),
                     ('10.0.0.27', 26379)
                    ],
                    socket_timeout=0.5)
redis_auth_pass='123456'
#mymaster is the master name configured for Sentinel by the operations staff; use the name from your own deployment
#Get the master address
master = sentinel.discover_master('mymaster')
print(master)
#Get the slave addresses
slave = sentinel.discover_slaves('mymaster')
print(slave)
#Get a connection to the master for writing
master = sentinel.master_for('mymaster', socket_timeout=0.5, password=redis_auth_pass, db=0)
w_ret = master.set('name', 'wang')
#w_ret is True
#Get a connection to a slave for reading (round-robin by default)
slave = sentinel.slave_for('mymaster', socket_timeout=0.5, password=redis_auth_pass, db=0)
r_ret = slave.get('name')
print(r_ret)
#Prints: b'wang'
[root@centos8 ~]# chmod +x sentinel_test.py
[root@centos8 ~]# ./sentinel_test.py
('10.0.0.58', 6379)
[('10.0.0.17', 6379), ('10.0.0.27', 6379)]
b'wang'
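During a failover the master is briefly unreachable, so a client script like the one above may want to retry a failed write. The helper below is a generic sketch under that assumption; with_retry and its parameters are hypothetical and not part of redis-py.

```python
import time


def with_retry(operation, attempts=5, delay=0.5, retry_on=(Exception,)):
    """Call operation() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

With the script above it could be used as with_retry(lambda: master.set('name', 'wang'), retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)), so that a write issued while Sentinel is promoting a slave is repeated once the new master is known.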