Redis Sentinel: Principles and Implementation

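1 Introduction to Redis clustering

A master-slave architecture cannot switch the master and slave roles automatically. When the master becomes unusable because of a Redis service fault, host power loss, disk damage, or a similar problem, replication alone provides no automatic failover: the environment configuration has to be changed by hand before traffic can move to a slave Redis server. Replication also cannot scale the write throughput of the Redis service horizontally once a single Redis server no longer meets the write demand. Two core problems therefore need to be solved:

  1. Seamless switching of the master and slave roles, so that the business does not notice and is not affected

  2. Dynamic horizontal scaling of the Redis service

Ways to implement a Redis cluster:

  • Client-side sharding: the application decides which Redis server each key is sent to

  • Proxy sharding: a proxy decides which Redis server each key is sent to; proxy programs include codis, twemproxy, and others

  • Redis Cluster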
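
As a rough illustration of the client-side sharding option, the sketch below maps each key to one server with a CRC32 hash modulo the node count. The node addresses and the `pick_node` helper are hypothetical and only serve the example; real clients and proxies may use other schemes.

```python
import zlib

# Hypothetical node list: placeholder addresses, not taken from the article.
NODES = ["10.0.0.101:6379", "10.0.0.102:6379", "10.0.0.103:6379"]


def pick_node(key, nodes=NODES):
    """Map a key to one node using CRC32 modulo the number of nodes."""
    slot = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[slot]


# The same key always lands on the same node.
for key in ("user:1", "user:2", "order:42"):
    print(key, "->", pick_node(key))
```

With plain modulo hashing, changing the number of nodes remaps most keys, which is one reason consistent hashing is often preferred in practice.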

2 How Sentinel works

2.1 Sentinel architecture and failover


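The Sentinel process monitors the state of the master server in a Redis deployment. When the master fails, Sentinel can switch the master and slave roles to keep the system highly available. The feature was introduced in Redis 2.6 and became stable in Redis 2.8, so Redis 2.8 or later is recommended for production.

Sentinel is a distributed system: several Sentinel processes can run in one architecture. They use gossip protocols to receive information about whether the master is offline, and agreement protocols to decide whether to perform an automatic failover and which slave becomes the new master.

Each Sentinel process periodically sends messages to the other Sentinels, the master, and the slaves to check whether they are alive. If a peer does not respond within the configured time, the Sentinel provisionally considers it offline. This is the subjective view that the node is down: Subjectively Down, or SDOWN.

Subjective down has a counterpart, objective down. When enough Sentinel processes (at least the configured quorum) have judged the master to be SDOWN and have exchanged that view with each other through the SENTINEL is-master-down-by-addr command, the resulting verdict that the master is offline is called Objectively Down, or ODOWN.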
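
The step from SDOWN to ODOWN can be sketched as a simple vote count. This is only an illustration under the assumption that each Sentinel's opinion is already known; the `is_objectively_down` helper and the Sentinel names are hypothetical and are not part of Redis.

```python
def is_objectively_down(sdown_votes, quorum):
    """Return True when at least `quorum` Sentinels report the master as SDOWN.

    sdown_votes maps a Sentinel name to its subjective opinion (True = down).
    """
    agreeing = sum(1 for is_down in sdown_votes.values() if is_down)
    return agreeing >= quorum


votes = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
print(is_objectively_down(votes, quorum=2))  # True: two of three agree
```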

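Using a voting algorithm, the Sentinels then pick one of the remaining slave nodes, promote it to master, rewrite the related configuration automatically, and start the failover.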
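
The text does not spell out how the slave is chosen. According to the Redis Sentinel documentation, the candidates are ranked by `replica-priority` (lower wins, 0 is never promoted), then by replication offset (more data wins), then by run ID. The sketch below models only that ordering; the `Replica` class and `choose_replica` function are hypothetical, and the real selection also discards slaves that were disconnected from the master for too long.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    addr: str
    priority: int     # replica-priority; 0 means "never promote"
    repl_offset: int  # amount of the master's replication stream processed
    run_id: str


def choose_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick a promotion candidate: lowest priority, then highest offset, then smallest run ID."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# Run IDs here are made-up placeholders.
slaves = [
    Replica("10.0.0.17:6379", priority=100, repl_offset=182, run_id="aaa"),
    Replica("10.0.0.27:6379", priority=100, repl_offset=180, run_id="bbb"),
]
print(choose_replica(slaves).addr)  # 10.0.0.17:6379 (same priority, larger offset)
```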

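The Sentinel mechanism solves the automatic switching of the master and slave roles, but it does not remove the performance bottleneck of a single master. It plays a role similar to MHA for MySQL.

A Redis Sentinel deployment should have at least 3 Sentinel nodes, preferably an odd number.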
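
One way to see why an odd count of at least three is recommended: a failover has to be authorized by a majority of the Sentinels (this comes from the Redis documentation rather than from the text above). The small sketch below, with hypothetical helper names, computes how many Sentinel failures each deployment size can tolerate.

```python
def majority(total):
    """Smallest number of Sentinels that forms a majority."""
    return total // 2 + 1


def tolerated_failures(total):
    """How many Sentinels can be lost while a majority is still reachable."""
    return total - majority(total)


for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Two Sentinels tolerate no failure, three and four both tolerate one, and five tolerate two, so an even count adds a node without adding fault tolerance.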

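When a client initializes, it connects to the set of Sentinel nodes rather than to a specific Redis node. Sentinel is only a configuration center, not a proxy.

The Redis data nodes in a Sentinel deployment are no different from ordinary Redis nodes; read/write splitting has to be implemented by the client program.

Before Redis 3.0, production environments generally used Sentinel mode. Since 3.0 introduced Redis Cluster, larger production environments can be supported.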

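2.2 The three periodic tasks in Sentinel

  • Every 10 seconds, each sentinel runs info against the master and the slaves

    • Discover slave nodes

    • Confirm the master-slave relationships

  • Every 2 seconds, each sentinel exchanges information through a channel on the master node

    • The exchange happens on the __sentinel__:hello channel

    • Sentinels share their view of the nodes and their own information

  • Every 1 second, each sentinel pings the other sentinels and the redis nodes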
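
The three intervals above can be sketched as a tick-based schedule. This is a simplified model that assumes one tick per second; the task labels and the `due_tasks` helper are hypothetical, and the real Sentinel drives these timers from its own event loop.

```python
# Period in seconds for each periodic task listed above.
TASKS = {
    "INFO to master and slaves": 10,
    "publish on __sentinel__:hello": 2,
    "PING sentinels and redis nodes": 1,
}


def due_tasks(second):
    """Return the tasks that fire on the given one-second tick."""
    return [name for name, period in TASKS.items() if second % period == 0]


for tick in (1, 2, 10):
    print(tick, due_tasks(tick))
```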

3 Implementing Sentinel

3.1 Getting started with Sentinel


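Sentinel requires a working Redis master-slave replication environment. On top of it, this section builds a highly available Redis architecture with one master, two slaves, and Sentinel.

Note: the masterauth setting must be identical in the configuration files of the master and the slaves.

Key settings in redis.conf on all master and slave nodes

# Run on all master and slave nodes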
# vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth 123456
requirepass 123456

# Run on all slave nodes
# echo "replicaof 10.0.0.58 6379" >> /apps/redis/etc/redis.conf

# Run on all master and slave nodes
# systemctl enable --now redis

Master server

[root@centos8 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=42,lag=0
slave1:ip=10.0.0.27,port=6379,state=online,offset=42,lag=0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:42
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:42

Configure slave1

[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> replicaof 10.0.0.58 6379
OK
127.0.0.1:6379> config set masterauth "123456"
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:182
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:182
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:182

Configure slave2

[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> replicaof 10.0.0.58 6379
OK
127.0.0.1:6379> config set masterauth "123456"
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:182
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:182
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:182

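3.2 Editing the Sentinel configuration file

Sentinel configuration

Sentinel is in fact a special Redis server: it supports some Redis commands but not many others, and it listens on 26379/tcp by default.

Sentinel does not have to be deployed on the same hosts as the Redis servers, but it usually is. All Redis nodes use the same example configuration file.

# If Redis was built from source, sentinel.conf is in the source directory; copy it to the installation directory

For example: /apps/redis/etc/sentinel.conf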

# vim /apps/redis/etc/sentinel.conf

bind 0.0.0.0

port 26379

daemonize yes

pidfile "redis-sentinel.pid"

logfile "sentinel_26379.log"

dir "/tmp"

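sentinel monitor mymaster 10.0.0.58 6379 2 # address and port of the master server in the mymaster group

# 2 is the quorum: the number of sentinels that must consider the master down before a failover is started. It is usually set to more than half of all sentinel nodes (the total is normally an odd number >= 3, such as 3, 5, or 7). For example, with 3 sentinels, 3/2 = 1.5, rounded up to 2. It is the basis for marking the master objectively down (ODOWN)

sentinel auth-pass mymaster 123456 # password of the master in the mymaster group

sentinel down-after-milliseconds mymaster 30000 # time after which an unresponsive node in the mymaster group is considered subjectively down (SDOWN), in milliseconds; 3000 is suggested

sentinel parallel-syncs mymaster 1 # number of slaves that sync from the new master at the same time after a failover; a smaller number makes the total sync time longer but reduces the load on the new master

sentinel failover-timeout mymaster 180000 # timeout for pointing all slaves at the new master, in milliseconds

sentinel deny-scripts-reconfig yes # do not allow the script settings to be changed at runtime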
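
To keep the three sentinel.conf files consistent, the per-master directives above could be generated from a few parameters. The sketch below is one possible approach, not part of Redis; the `render_sentinel_conf` function is hypothetical, and it derives the quorum as "more than half of the sentinels", as described above.

```python
def render_sentinel_conf(name, host, port, sentinels, password, down_after_ms=3000):
    """Render the per-master sentinel directives for one monitored master."""
    quorum = sentinels // 2 + 1  # more than half of all sentinels
    lines = [
        "sentinel monitor %s %s %d %d" % (name, host, port, quorum),
        "sentinel auth-pass %s %s" % (name, password),
        "sentinel down-after-milliseconds %s %d" % (name, down_after_ms),
        "sentinel parallel-syncs %s 1" % name,
        "sentinel failover-timeout %s 180000" % name,
    ]
    return "\n".join(lines)


print(render_sentinel_conf("mymaster", "10.0.0.58", 6379, sentinels=3, password="123456"))
```

The first line it prints for these inputs is `sentinel monitor mymaster 10.0.0.58 6379 2`, matching the directive used in this article.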

The configuration on the three sentinel servers is as follows

[root@centos8 ~]#grep -vE "^#|^$" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /tmp
sentinel monitor mymaster 10.0.0.58 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no

[root@centos8 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.17:/apps/redis/etc/sentinel.conf
[root@centos8 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.27:/apps/redis/etc/sentinel.conf

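3.3 Starting Sentinel

Sentinel must be started on all three sentinel servers

# If Redis was built from source, create a new service file on every node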
[root@centos8 ~]#vim /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target

[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/sentinel.conf --supervised systemd

ExecStop=/bin/kill -s QUIT $MAINPID
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

[root@centos8 ~]#scp /lib/systemd/system/redis-sentinel.service 10.0.0.17:/lib/systemd/system/redis-sentinel.service
[root@centos8 ~]#scp /lib/systemd/system/redis-sentinel.service 10.0.0.27:/lib/systemd/system/redis-sentinel.service

# Check the directory ownership on every node, otherwise the service will not start
[root@centos8 ~]#chown -R redis.redis /apps/redis/
[root@centos8 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14292 May  5 13:03 /apps/redis/etc/sentinel.conf

[root@slave1 ~]#chown -R redis.redis /apps/redis/
[root@slave1 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14291 May  5 13:03 /apps/redis/etc/sentinel.conf

[root@slave2 ~]#chown -R redis.redis /apps/redis/
[root@slave2 ~]#ll /apps/redis/etc/sentinel.conf
-rw-r--r-- 1 redis redis 14291 May  5 13:03 /apps/redis/etc/sentinel.conf

# Reload the systemd configuration
[root@centos8 ~]#systemctl daemon-reload
[root@slave1 ~]#systemctl daemon-reload
[root@slave2 ~]#systemctl daemon-reload

# Start the sentinel service
[root@centos8 ~]#systemctl start redis-sentinel
[root@slave1 ~]#systemctl start redis-sentinel
[root@slave2 ~]#systemctl start redis-sentinel

# Make sure every sentinel has a different myid
[root@centos8 ~]# tail /apps/redis/etc/sentinel.conf 
sentinel myid b0f5f4759acc78f7d12d7f8ad63ceecdd054f2f0
[root@centos7 ~]# tail /apps/redis/etc/sentinel.conf
sentinel myid 8811969413652e79ba5e787fa3a00a6022ac86f4
[root@centos7 ~]# tail /apps/redis/etc/sentinel.conf
sentinel myid 55ef9b2271ba0548eaea5cd43c200eb32c0e734f
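
The myid check above can be automated once the three configuration files have been collected. The sketch below assumes the file contents are already available as strings keyed by host; the `extract_myid` and `duplicated_ids` helpers are hypothetical.

```python
import re

MYID_RE = re.compile(r"^sentinel myid ([0-9a-f]+)\s*$", re.MULTILINE)


def extract_myid(conf_text):
    """Return the myid found in a sentinel.conf text, or None if absent."""
    match = MYID_RE.search(conf_text)
    return match.group(1) if match else None


def duplicated_ids(confs):
    """Map each myid that appears on more than one host to those hosts."""
    seen = {}
    for host, text in confs.items():
        myid = extract_myid(text)
        if myid is None:
            continue
        seen.setdefault(myid, []).append(host)
    return {myid: hosts for myid, hosts in seen.items() if len(hosts) > 1}


confs = {
    "10.0.0.58": "port 26379\nsentinel myid b0f5f4759acc78f7d12d7f8ad63ceecdd054f2f0\n",
    "10.0.0.17": "port 26379\nsentinel myid 8811969413652e79ba5e787fa3a00a6022ac86f4\n",
    "10.0.0.27": "port 26379\nsentinel myid 55ef9b2271ba0548eaea5cd43c200eb32c0e734f\n",
}
print(duplicated_ids(confs))  # {} when every sentinel has its own myid
```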

3.4 Verifying the Sentinel port

[root@centos8 ~]# ss -ntl
State   Recv-Q  Send-Q   Local Address:Port    Peer Address:Port Process 
LISTEN  0       511            0.0.0.0:26379        0.0.0.0:*            
LISTEN  0       511            0.0.0.0:6379         0.0.0.0:*            
LISTEN  0       128            0.0.0.0:22           0.0.0.0:*            
LISTEN  0       511               [::]:26379           [::]:*            
LISTEN  0       128               [::]:22              [::]:*            
# Run the same port check on both slaves

3.5 Checking the Sentinel log

[root@centos8 ~]# tail -f /apps/redis/log/sentinel.log 
2552:X 26 May 2022 10:16:49.568 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2552:X 26 May 2022 10:16:49.568 # Redis version=6.2.4, bits=64, commit=00000000, modified=0, pid=2552, just started
2552:X 26 May 2022 10:16:49.568 # Configuration loaded
2552:X 26 May 2022 10:16:49.569 * Increased maximum number of open files to 10032 (it was originally set to 1024).
2552:X 26 May 2022 10:16:49.569 * monotonic clock: POSIX clock_gettime
2552:X 26 May 2022 10:16:49.571 * Running mode=sentinel, port=26379.
2552:X 26 May 2022 10:16:49.571 # Sentinel ID is b0f5f4759acc78f7d12d7f8ad63ceecdd054f2f0
2552:X 26 May 2022 10:16:49.571 # +monitor master mymaster 10.0.0.58 6379 quorum 2
2552:X 26 May 2022 10:17:29.551 * +sentinel sentinel 8811969413652e79ba5e787fa3a00a6022ac86f4 10.0.0.17 26379 @ mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:17:51.012 * +sentinel sentinel 55ef9b2271ba0548eaea5cd43c200eb32c0e734f 10.0.0.27 26379 @ mymaster 10.0.0.58 6379

# Check the log on both slaves in the same way

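3.6 Checking the current Sentinel status

In the Sentinel status, pay particular attention to the last line: the master IP, the number of slaves, and the number of sentinels must match the actual set of servers.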

[root@centos8 ~]# redis-cli -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Warning: AUTH failed
127.0.0.1:26379> quit
[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.58:6379,slaves=2,sentinels=3
# Two slaves and three sentinel servers. If the sentinels value is wrong, check for conflicting myid values
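
The check on the last line of `info sentinel` can be scripted. The sketch below parses one `master0:` line as printed above and compares it with the expected topology; the `parse_master_line` and `topology_ok` helpers are hypothetical.

```python
def parse_master_line(line):
    """Turn 'master0:name=...,status=...,...' into a dict of its fields."""
    _, _, payload = line.partition(":")
    return dict(item.split("=", 1) for item in payload.split(","))


def topology_ok(state, slaves, sentinels):
    """True when the master is ok and the slave and sentinel counts match."""
    return (
        state["status"] == "ok"
        and int(state["slaves"]) == slaves
        and int(state["sentinels"]) == sentinels
    )


line = "master0:name=mymaster,status=ok,address=10.0.0.58:6379,slaves=2,sentinels=3"
state = parse_master_line(line)
print(state["address"], topology_ok(state, slaves=2, sentinels=3))
```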

3.7 Stopping the Redis master to test failover

[root@centos8 ~]# killall redis-server

# Check the sentinel information on each node
[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3		# the master role has now moved to 10.0.0.17

# Sentinel log messages during the failover:
[root@centos8 ~]# tail -f /apps/redis/log/sentinel.log
2552:X 26 May 2022 10:38:11.538 # +sdown master mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:38:11.605 # +odown master mymaster 10.0.0.58 6379 #quorum 3/2
2552:X 26 May 2022 10:38:11.606 # Next failover delay: I will not start a failover before Thu May 26 10:44:11 2022
2552:X 26 May 2022 10:38:11.786 # +config-update-from sentinel 8811969413652e79ba5e787fa3a00a6022ac86f4 10.0.0.17 26379 @ mymaster 10.0.0.58 6379
2552:X 26 May 2022 10:38:11.786 # +switch-master mymaster 10.0.0.58 6379 10.0.0.17 6379
2552:X 26 May 2022 10:38:11.787 * +slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.17 6379
2552:X 26 May 2022 10:38:11.787 * +slave slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
2552:X 26 May 2022 10:38:14.864 # +sdown slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
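
A monitoring script could watch the sentinel log for the `+switch-master` event shown above. The sketch below extracts the old and new master addresses from log text with a regular expression; the `find_switches` helper is hypothetical and relies only on the line format visible in this log.

```python
import re

SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switches(log_text):
    """Return one dict per +switch-master event found in the log text."""
    return [match.groupdict() for match in SWITCH_RE.finditer(log_text)]


LOG = (
    "2552:X 26 May 2022 10:38:11.538 # +sdown master mymaster 10.0.0.58 6379\n"
    "2552:X 26 May 2022 10:38:11.786 # +switch-master mymaster 10.0.0.58 6379 10.0.0.17 6379\n"
)
for event in find_switches(LOG):
    print(event["old_ip"], "->", event["new_ip"])  # 10.0.0.58 -> 10.0.0.17
```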

3.8 Redis configuration files are rewritten automatically after a failover

[root@centos7 ~]# grep ^replicaof /apps/redis/etc/redis.conf 
replicaof 10.0.0.17 6379

# The IP in the sentinel monitor line of the sentinel configuration file is rewritten as well
[root@centos7 ~]# grep "^sentinel monitor" /apps/redis/etc/sentinel.conf
sentinel monitor mymaster 10.0.0.17 6379 2

3.9 Current Redis status

# Status of the new master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.27,port=6379,state=online,offset=398647,lag=1
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:398647
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:398647

# The other slave now points at the new master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:408160
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:408160
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:408160

3.10 Bringing the failed original master back into the Redis deployment

# Restart the original master node that failed
[root@centos8 ~]# systemctl start redis

# Sentinel automatically rewrites the following line to point at the new master
[root@centos8 ~]# grep '^replicaof' /apps/redis/etc/redis.conf 
replicaof 10.0.0.17 6379

# Check the status on the original master
[root@centos8 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:471008
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:471008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:440438
repl_backlog_histlen:30571

[root@centos8 ~]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3

# Check the status and the log on the current master
[root@centos7 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.27,port=6379,state=online,offset=498322,lag=1
slave1:ip=10.0.0.58,port=6379,state=online,offset=498469,lag=0
master_failover_state:no-failover
master_replid:48e6470690742969691ceb965ca8c247c9268a91
master_replid2:1e6cb22b9e44690d4c6730744d24b83c0026b070
master_repl_offset:498602
second_repl_offset:261201
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:498602

[root@centos7 ~]# tail -f /apps/redis/log/sentinel.log 
7792:X 26 May 2022 10:53:38.562 # -sdown slave 10.0.0.58:6379 10.0.0.58 6379 @ mymaster 10.0.0.17 6379
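
After the old master rejoins, the `info replication` output on the new master can be used to estimate how far each slave is behind, by subtracting each slave's offset from `master_repl_offset`. The sketch below works on the text of that output; the `slave_lag_bytes` helper is hypothetical, and the result is only approximate because the offsets keep moving.

```python
def slave_lag_bytes(info_text):
    """Return {slave_ip: master_repl_offset - slave offset} from INFO replication text."""
    master_offset = None
    slaves = {}
    for line in info_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            # Lines such as slave0:ip=...,port=...,state=...,offset=...,lag=...
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves[fields["ip"]] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {ip: master_offset - offset for ip, offset in slaves.items()}


INFO = """\
role:master
connected_slaves:2
slave0:ip=10.0.0.27,port=6379,state=online,offset=498322,lag=1
slave1:ip=10.0.0.58,port=6379,state=online,offset=498469,lag=0
master_repl_offset:498602
"""
print(slave_lag_bytes(INFO))  # {'10.0.0.27': 280, '10.0.0.58': 133}
```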

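3.11 Sentinel operations

Manually take the master offline
sentinel failover <master name>

# Manual failover
[root@centos8 ~]# vim /apps/redis/etc/redis.conf 
replica-priority 10	# priority; the node with the smaller value is preferred as the new master

# After the master is taken offline manually, the role moves to the node with the smaller priority value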
[root@centos7 ~]# redis-cli -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK

3.12 Connecting to Sentinel from Python

[root@centos8 ~]# yum -y install python3 python3-redis
[root@centos8 ~]# cat sentinel_test.py 
#!/usr/bin/python3
import redis
from redis.sentinel import Sentinel

# Connect to the sentinel servers (domain names can be used instead of IP addresses)
sentinel = Sentinel([('10.0.0.58', 26379),
                     ('10.0.0.17', 26379),
                     ('10.0.0.27', 26379)],
                    socket_timeout=0.5)

redis_auth_pass = '123456'

# mymaster is the master name configured for Sentinel by the operator; use the name from your own deployment
# Get the master address
master = sentinel.discover_master('mymaster')
print(master)

# Get the slave addresses
slave = sentinel.discover_slaves('mymaster')
print(slave)

# Get a connection to the master for writes
master = sentinel.master_for('mymaster', socket_timeout=0.5, password=redis_auth_pass, db=0)
w_ret = master.set('name', 'wang')
# w_ret is True

# Get a connection to a slave for reads (round-robin by default)
slave = sentinel.slave_for('mymaster', socket_timeout=0.5, password=redis_auth_pass, db=0)
r_ret = slave.get('name')
print(r_ret)
# prints b'wang'

[root@centos8 ~]# chmod +x sentinel_test.py 
[root@centos8 ~]# ./sentinel_test.py 
('10.0.0.58', 6379)
[('10.0.0.17', 6379), ('10.0.0.27', 6379)]
b'wang'
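
During a failover, writes can fail for a short time until the client learns the new master from Sentinel. One common way to handle this is to retry the operation a few times. The sketch below is a generic helper under that assumption; `call_with_retry` and `flaky_write` are hypothetical, and the demo uses Python's built-in `ConnectionError` so that it runs without a Redis server.

```python
import time


def call_with_retry(operation, retries=5, delay=0.5, retry_on=(ConnectionError,)):
    """Call operation(); on a listed exception, wait and try again up to `retries` times."""
    last_error = None
    for _ in range(retries):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error


# Stand-in for a write that fails twice while the failover is in progress.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return True


print(call_with_retry(flaky_write, delay=0))  # True, after two failed attempts
```

With the script above, the operation would be something like `lambda: master.set('name', 'wang')`, and `retry_on` would presumably list the redis-py connection error classes instead of the built-in one.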
