Sentinel is a distributed system that monitors every server in a master-slave setup. When a failure occurs, the sentinels elect a new master by voting and reconnect all slaves to it.
What it does: monitoring, notification, and automatic failover.
Notes:
① A sentinel is itself a Redis server; it just does not serve data.
② The number of sentinels is usually odd.
redis-sentinel sentinel-<port>.conf
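Contents of the default sentinel.conf file:
# Port the sentinel listens on
port 26379
dir /tmp
# Master to monitor: <custom service name> <host> <port> <votes needed to declare the master down, usually half the number of sentinels + 1>
sentinel monitor mymaster 127.0.0.1 6379 2
# How long a server must be unreachable before it is considered down; default 30 seconds (30000 ms)
sentinel down-after-milliseconds mymaster 30000
# Number of slaves that resync with the new master at the same time after a failover. A smaller value needs less network bandwidth but makes the overall sync take longer
sentinel parallel-syncs mymaster 1
# Maximum time allowed for a failover; if it is exceeded the failover is considered failed. Default 3 minutes
sentinel failover-timeout mymaster 180000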
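To make the "half the sentinels plus one" rule of thumb concrete, here is a small Python sketch that computes the quorum and renders one config file per sentinel port. The helper names `recommended_quorum` and `render_sentinel_conf` are made up for illustration. The directives are only the ones shown in the default file above, and the master port 6381 and sentinel ports 26380 to 26382 are the ones that appear in the logs later in this article.

```python
def recommended_quorum(sentinel_count: int) -> int:
    """Half of the sentinels plus one, the rule of thumb from the config comment."""
    if sentinel_count < 1:
        raise ValueError("need at least one sentinel")
    return sentinel_count // 2 + 1


def render_sentinel_conf(port: int, master_name: str, master_host: str,
                         master_port: int, sentinel_count: int) -> str:
    """Render a minimal sentinel config using only the directives shown above."""
    quorum = recommended_quorum(sentinel_count)
    lines = [
        f"port {port}",
        "dir /tmp",
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} 30000",
        f"sentinel parallel-syncs {master_name} 1",
        f"sentinel failover-timeout {master_name} 180000",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three sentinels watching the master on 6381, as in the logs of this article.
    for port in (26380, 26381, 26382):
        print(f"# sentinel-{port}.conf")
        print(render_sentinel_conf(port, "mymaster", "127.0.0.1", 6381, 3))
```

With three sentinels the computed quorum is 2, which matches the `quorum 2` that each sentinel reports when it starts monitoring.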
Master 6381 log: 6381 acts as the master, while 6382 and 6383 connect to it as slaves
[root@VM-4-12-centos conf]# redis-server redis-6381.conf
.......
# Log: slaves 6382 and 6383 connect to master 6381
1755322:M 23 Feb 22:36:48.948 * Ready to accept connections
1755322:M 23 Feb 22:37:25.611 * Slave 127.0.0.1:6382 asks for synchronization
1755322:M 23 Feb 22:37:25.612 * Full resync requested by slave 127.0.0.1:6382
1755322:M 23 Feb 22:37:25.612 * Starting BGSAVE for SYNC with target: disk
1755322:M 23 Feb 22:37:25.612 * Background saving started by pid 1755536
1755536:C 23 Feb 22:37:25.618 * DB saved on disk
1755536:C 23 Feb 22:37:25.618 * RDB: 4 MB of memory used by copy-on-write
1755322:M 23 Feb 22:37:25.714 * Background saving terminated with success
1755322:M 23 Feb 22:37:25.714 * Synchronization with slave 127.0.0.1:6382 succeeded
1755322:M 23 Feb 22:37:59.455 * Slave 127.0.0.1:6383 asks for synchronization
1755322:M 23 Feb 22:37:59.455 * Partial resynchronization request from 127.0.0.1:6383 accepted. Sending 56 bytes of backlog starting from offset 1.
① Sentinel 26380 log:
[root@VM-4-12-centos conf]# redis-sentinel sentinel-26380.conf
......
1756633:X 23 Feb 22:41:55.941 # Sentinel ID is ba74e1a8288f1b54b5c494d18875dc684df679c6
1756633:X 23 Feb 22:41:55.941 # +monitor master mymaster 127.0.0.1 6381 quorum 2
1756633:X 23 Feb 22:41:55.941 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:41:55.946 * +slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:43:36.835 * +sentinel sentinel e554990de305b149bb5b92556e7012dd584c5988 127.0.0.1 26381 @ mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:44:40.125 * +sentinel sentinel 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025 127.0.0.1 26382 @ mymaster 127.0.0.1 6381
② Sentinel 26381 log:
[root@VM-4-12-centos conf]# redis-sentinel sentinel-26381.conf
......
1757082:X 23 Feb 22:43:34.795 # Sentinel ID is e554990de305b149bb5b92556e7012dd584c5988
1757082:X 23 Feb 22:43:34.795 # +monitor master mymaster 127.0.0.1 6381 quorum 2
1757082:X 23 Feb 22:43:34.797 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:43:34.800 * +slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:43:35.961 * +sentinel sentinel ba74e1a8288f1b54b5c494d18875dc684df679c6 127.0.0.1 26380 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:44:40.125 * +sentinel sentinel 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025 127.0.0.1 26382 @ mymaster 127.0.0.1 6381
③ Sentinel 26382 log:
[root@VM-4-12-centos conf]# redis-sentinel sentinel-26382.conf
.......
1757359:X 23 Feb 22:44:38.088 # Sentinel ID is 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025
1757359:X 23 Feb 22:44:38.088 # +monitor master mymaster 127.0.0.1 6381 quorum 2
1757359:X 23 Feb 22:44:38.088 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
1757359:X 23 Feb 22:44:38.092 * +slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757359:X 23 Feb 22:44:38.585 * +sentinel sentinel e554990de305b149bb5b92556e7012dd584c5988 127.0.0.1 26381 @ mymaster 127.0.0.1 6381
1757359:X 23 Feb 22:44:39.239 * +sentinel sentinel ba74e1a8288f1b54b5c494d18875dc684df679c6 127.0.0.1 26380 @ mymaster 127.0.0.1 6381
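The startup logs above already describe the whole topology: `+monitor` names the master and the quorum, `+slave` lists each slave, and `+sentinel` lists each peer sentinel as it is discovered. The Python sketch below rebuilds that picture from the lines of sentinel 26380. The `Topology` class and `read_topology` function are hypothetical names, and the token positions are inferred only from the excerpts in this article, so other Redis versions may need adjustments.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Address = Tuple[str, int]


@dataclass
class Topology:
    master: Optional[Address] = None
    quorum: Optional[int] = None
    slaves: Set[Address] = field(default_factory=set)
    sentinels: Set[Address] = field(default_factory=set)


def read_topology(log_lines) -> Topology:
    """Collect master, slaves and peer sentinels from sentinel startup events."""
    topo = Topology()
    for line in log_lines:
        tokens = line.split()
        if len(tokens) < 7:
            continue
        # Layout seen in the logs: pid:role day month time level event args...
        event, args = tokens[5], tokens[6:]
        if event == "+monitor":
            # args: master <name> <ip> <port> quorum <n>
            topo.master = (args[2], int(args[3]))
            topo.quorum = int(args[5])
        elif event == "+slave":
            # args: slave <ip:port> <ip> <port> @ <name> <master-ip> <master-port>
            topo.slaves.add((args[2], int(args[3])))
        elif event == "+sentinel":
            # args: sentinel <sentinel-id> <ip> <port> @ ...
            topo.sentinels.add((args[2], int(args[3])))
    return topo


# Lines copied from the sentinel 26380 log above.
SENTINEL_26380_LOG = [
    "1756633:X 23 Feb 22:41:55.941 # Sentinel ID is ba74e1a8288f1b54b5c494d18875dc684df679c6",
    "1756633:X 23 Feb 22:41:55.941 # +monitor master mymaster 127.0.0.1 6381 quorum 2",
    "1756633:X 23 Feb 22:41:55.941 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381",
    "1756633:X 23 Feb 22:41:55.946 * +slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381",
    "1756633:X 23 Feb 22:43:36.835 * +sentinel sentinel e554990de305b149bb5b92556e7012dd584c5988 127.0.0.1 26381 @ mymaster 127.0.0.1 6381",
    "1756633:X 23 Feb 22:44:40.125 * +sentinel sentinel 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025 127.0.0.1 26382 @ mymaster 127.0.0.1 6381",
]

if __name__ == "__main__":
    print(read_topology(SENTINEL_26380_LOG))
```

Running the same function over the other two sentinel logs should give the same master and slaves, with each sentinel listing the other two as peers.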
① Sentinel 26380 log: the monitored master switches to 6383
......
1756633:X 23 Feb 22:51:42.786 # +sdown master mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:51:42.863 # +odown master mymaster 127.0.0.1 6381 #quorum 2/2
1756633:X 23 Feb 22:51:42.863 # +new-epoch 1
1756633:X 23 Feb 22:51:42.863 # +try-failover master mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:51:42.866 # +vote-for-leader ba74e1a8288f1b54b5c494d18875dc684df679c6 1
1756633:X 23 Feb 22:51:42.866 # e554990de305b149bb5b92556e7012dd584c5988 voted for e554990de305b149bb5b92556e7012dd584c5988 1
1756633:X 23 Feb 22:51:42.870 # 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025 voted for e554990de305b149bb5b92556e7012dd584c5988 1
1756633:X 23 Feb 22:51:43.658 # +config-update-from sentinel e554990de305b149bb5b92556e7012dd584c5988 127.0.0.1 26381 @ mymaster 127.0.0.1 6381
1756633:X 23 Feb 22:51:43.658 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6383
1756633:X 23 Feb 22:51:43.658 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6383
1756633:X 23 Feb 22:51:43.658 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
1756633:X 23 Feb 22:52:13.748 # +sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
② Sentinel 26381 log: the monitored master switches to 6383
......
# 6381 is subjectively down (sdown)
1757082:X 23 Feb 22:51:42.792 # +sdown master mymaster 127.0.0.1 6381
# 6381 is objectively down (odown)
1757082:X 23 Feb 22:51:42.859 # +odown master mymaster 127.0.0.1 6381 #quorum 3/2
# Start the first round of voting
1757082:X 23 Feb 22:51:42.859 # +new-epoch 1
1757082:X 23 Feb 22:51:42.859 # +try-failover master mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:42.863 # +vote-for-leader e554990de305b149bb5b92556e7012dd584c5988 1
1757082:X 23 Feb 22:51:42.866 # ba74e1a8288f1b54b5c494d18875dc684df679c6 voted for ba74e1a8288f1b54b5c494d18875dc684df679c6 1
1757082:X 23 Feb 22:51:42.870 # 93f61f3036a6d5d1a90524ea9b526d9f0c5ce025 voted for e554990de305b149bb5b92556e7012dd584c5988 1
1757082:X 23 Feb 22:51:42.925 # +elected-leader master mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:42.925 # +failover-state-select-slave master mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.008 # +selected-slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.008 * +failover-state-send-slaveof-noone slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.079 * +failover-state-wait-promotion slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.606 # +promoted-slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.606 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:43.657 * +slave-reconf-sent slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
# The election is over: the odown flag on 6381 is cleared and the remaining slave is reconfigured
1757082:X 23 Feb 22:51:43.948 # -odown master mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:44.610 * +slave-reconf-inprog slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:44.610 * +slave-reconf-done slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
1757082:X 23 Feb 22:51:44.701 # +failover-end master mymaster 127.0.0.1 6381
# Appoint 6383 as the new master and have the other slaves reconnect to it
1757082:X 23 Feb 22:51:44.701 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6383
1757082:X 23 Feb 22:51:44.701 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6383
1757082:X 23 Feb 22:51:44.701 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
1757082:X 23 Feb 22:52:14.794 # +sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
③ Sentinel 26382 log: the monitored master switches to 6383
......
1757359:X 23 Feb 22:51:42.759 # +sdown master mymaster 127.0.0.1 6381
1757359:X 23 Feb 22:51:42.866 # +new-epoch 1
1757359:X 23 Feb 22:51:42.870 # +vote-for-leader e554990de305b149bb5b92556e7012dd584c5988 1
1757359:X 23 Feb 22:51:43.657 # +config-update-from sentinel e554990de305b149bb5b92556e7012dd584c5988 127.0.0.1 26381 @ mymaster 127.0.0.1 6381
1757359:X 23 Feb 22:51:43.657 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6383
1757359:X 23 Feb 22:51:43.657 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6383
1757359:X 23 Feb 22:51:43.657 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
1757359:X 23 Feb 22:52:13.730 # +sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6383
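To see how long each stage of the failover took, the key events can be pulled out of a sentinel log and timed relative to the first one. This is a rough Python sketch: `failover_timeline` and `KEY_EVENTS` are hypothetical names, the token positions are inferred from the excerpts above, and the timestamps are assumed not to cross midnight because the log lines carry no year.

```python
from datetime import datetime

# Events worth showing on a failover timeline; all of them appear in the logs above.
KEY_EVENTS = ("+sdown", "+odown", "+elected-leader", "+selected-slave",
              "+promoted-slave", "+switch-master")


def failover_timeline(log_lines):
    """Return (seconds since the first key event, event, args) tuples."""
    timeline = []
    start = None
    for line in log_lines:
        tokens = line.split()
        # Layout seen in the logs: pid:role day month time level event args...
        if len(tokens) < 6 or tokens[5] not in KEY_EVENTS:
            continue
        stamp = datetime.strptime(tokens[3], "%H:%M:%S.%f")
        if start is None:
            start = stamp
        elapsed = (stamp - start).total_seconds()
        timeline.append((elapsed, tokens[5], " ".join(tokens[6:])))
    return timeline


# Lines copied from the sentinel 26381 log above (the sentinel that led the failover).
SENTINEL_26381_LOG = [
    "1757082:X 23 Feb 22:51:42.792 # +sdown master mymaster 127.0.0.1 6381",
    "1757082:X 23 Feb 22:51:42.859 # +odown master mymaster 127.0.0.1 6381 #quorum 3/2",
    "1757082:X 23 Feb 22:51:42.925 # +elected-leader master mymaster 127.0.0.1 6381",
    "1757082:X 23 Feb 22:51:43.008 # +selected-slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381",
    "1757082:X 23 Feb 22:51:43.606 # +promoted-slave slave 127.0.0.1:6383 127.0.0.1 6383 @ mymaster 127.0.0.1 6381",
    "1757082:X 23 Feb 22:51:44.701 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6383",
]

if __name__ == "__main__":
    for elapsed, event, args in failover_timeline(SENTINEL_26381_LOG):
        print(f"{elapsed:6.3f}s  {event:<16} {args}")
```

On this excerpt, the span from the first `+sdown` to `+switch-master` is a little under two seconds.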
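In short, each sentinel fetches status from the master, the slaves, and the other sentinels. The sentinels also set up a channel among themselves, which they use to share, collect, and synchronize the information each of them has gathered.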
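In this phase the sentinels keep sending hello messages to the master and the slaves to confirm that they are running normally. They then promptly share these statuses on the sentinels' channel, so that all sentinels hold the same information over time.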
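The check behind this phase can be pictured as a simple timer: an instance counts as down, from one sentinel's point of view, once it has been silent for longer than `down-after-milliseconds`. The sketch below is a pure simulation in Python and not Sentinel's own code. `PeerState`, `is_subjectively_down`, and `status_report` are hypothetical names, and the timestamps are invented.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """What one sentinel remembers about a monitored instance (hypothetical)."""
    address: str
    last_ok_reply_ms: int  # timestamp of the last valid reply, in milliseconds


def is_subjectively_down(peer: PeerState, now_ms: int,
                         down_after_ms: int = 30000) -> bool:
    """True once the peer has been silent for longer than down_after_ms."""
    return now_ms - peer.last_ok_reply_ms > down_after_ms


def status_report(peers, now_ms: int, down_after_ms: int = 30000) -> dict:
    """Build the per-instance status a sentinel would share with its peers."""
    return {
        p.address: "sdown" if is_subjectively_down(p, now_ms, down_after_ms) else "ok"
        for p in peers
    }


# Invented timestamps: 6381 last answered at t=0 ms, the slaves answered recently.
PEERS = [
    PeerState("127.0.0.1:6381", last_ok_reply_ms=0),
    PeerState("127.0.0.1:6382", last_ok_reply_ms=29_500),
    PeerState("127.0.0.1:6383", last_ok_reply_ms=29_800),
]

if __name__ == "__main__":
    print(status_report(PEERS, now_ms=30_001))
```

The logs fit this picture: sentinel 26380 re-registers 6381 as a slave at 22:51:43.658 and flags it `+sdown` at 22:52:13.748, roughly 30 seconds later, which is consistent with the 30000 ms default.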
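Step 1: confirm whether there really is a failure:
① When one sentinel finds that the master has failed, it flags the master as SRI_S_DOWN (subjectively down) and announces this to the other sentinels.
② On receiving the notice, each of the other sentinels messages the master itself to check whether it is down; any sentinel that confirms the failure also flags the master as SRI_S_DOWN.
③ Once the number of sentinels holding the suspected-down flag reaches the quorum configured in sentinel monitor (usually set to more than half of the sentinels), the master is confirmed to be down and its flag is switched to SRI_O_DOWN (objectively down).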
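The move from subjectively down to objectively down is just a count against the quorum. A minimal Python sketch, assuming each sentinel's opinion is available as a boolean; `objectively_down` and `quorum_tag` are hypothetical names, and the short IDs are the first eight characters of the Sentinel IDs in the logs above:

```python
def count_sdown(sdown_reports: dict) -> int:
    """Number of sentinels that currently flag the master as subjectively down."""
    return sum(1 for flagged in sdown_reports.values() if flagged)


def objectively_down(sdown_reports: dict, quorum: int) -> bool:
    """The master is objectively down once enough sentinels agree it is down."""
    return count_sdown(sdown_reports) >= quorum


def quorum_tag(sdown_reports: dict, quorum: int) -> str:
    """Format the count the way the +odown log lines show it, e.g. '#quorum 2/2'."""
    return f"#quorum {count_sdown(sdown_reports)}/{quorum}"


# What sentinel 26380 may have known when it logged '#quorum 2/2' (assumed):
# itself and one peer had flagged 6381, the third report had not arrived yet.
REPORTS = {"ba74e1a8": True, "e554990d": True, "93f61f30": False}

if __name__ == "__main__":
    print(objectively_down(REPORTS, quorum=2), quorum_tag(REPORTS, quorum=2))
```

This also explains why the two `+odown` lines in the logs differ (`2/2` on sentinel 26380, `3/2` on sentinel 26381): each sentinel counts the reports it has received so far, and both counts are enough to reach a quorum of 2.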
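Step 2: the sentinels compete for the right to handle the failure:
Each sentinel sends its candidacy to the other sentinels. Every sentinel is both a candidate and a voter, and it gives its vote to the candidate whose request reaches it first. A candidate that collects more than half of all votes becomes the handler (the leader that carries out the failover); otherwise the vote is held again.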
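The first-come-first-served vote can be replayed with a few lines of Python. This is a simplified model of the description above, not Sentinel's actual implementation, which may apply additional conditions. `Voter` and `elect_leader` are hypothetical names, and the short IDs are the first eight characters of the Sentinel IDs from the logs.

```python
from collections import Counter


class Voter:
    """A sentinel that grants at most one vote per epoch."""

    def __init__(self, sentinel_id: str):
        self.sentinel_id = sentinel_id
        self.voted_for = {}  # epoch -> candidate id

    def request_vote(self, candidate_id: str, epoch: int) -> str:
        # The first request seen in an epoch gets the vote; later ones are ignored.
        return self.voted_for.setdefault(epoch, candidate_id)


def elect_leader(voters, requests, epoch: int):
    """requests: (voter_id, candidate_id) pairs in the order they arrive.

    Returns the winning candidate id, or None if nobody has more than half
    of all votes, in which case the vote has to be repeated.
    """
    by_id = {v.sentinel_id: v for v in voters}
    for voter_id, candidate_id in requests:
        by_id[voter_id].request_vote(candidate_id, epoch)
    tally = Counter(v.voted_for[epoch] for v in voters if epoch in v.voted_for)
    if not tally:
        return None
    candidate, votes = tally.most_common(1)[0]
    return candidate if votes > len(voters) // 2 else None


SENTINEL_IDS = ["ba74e1a8", "e554990d", "93f61f30"]

# Arrival order reconstructed from the logs: the two candidates vote for
# themselves, and 93f61f30 hears from e554990d before it hears from ba74e1a8.
REQUESTS = [
    ("ba74e1a8", "ba74e1a8"),
    ("e554990d", "e554990d"),
    ("93f61f30", "e554990d"),
    ("93f61f30", "ba74e1a8"),  # too late, 93f61f30 has already voted
    ("ba74e1a8", "e554990d"),  # too late
    ("e554990d", "ba74e1a8"),  # too late
]

if __name__ == "__main__":
    voters = [Voter(sid) for sid in SENTINEL_IDS]
    print(elect_leader(voters, REQUESTS, epoch=1))
```

In this replay e554990d (sentinel 26381) ends up with two of three votes, which agrees with the `+elected-leader` line in its log. If all three sentinels had voted for themselves, nobody would have a majority and the function would return `None`.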
Step 3: begin handling the failure and pick the new master:
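As a rough illustration of how such a pick could work, the Python sketch below filters out unusable slaves and ranks the rest. The ordering used here (priority first, then replication offset, then run ID) follows my understanding of the Redis Sentinel documentation and should be checked against it. The `Replica` class, its field names, and all offsets and run IDs are invented for the example; the logs above do not show why 6383 was chosen.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Facts a sentinel might hold about a slave (hypothetical field names)."""
    address: str
    online: bool
    priority: int      # lower is preferred; 0 is treated as "never promote"
    repl_offset: int   # higher means more of the old master's data was received
    run_id: str        # final tie-breaker, smaller string wins


def pick_new_master(replicas):
    """Return the preferred replica, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.online and r.priority != 0]
    if not candidates:
        return None
    # Sort key: lowest priority value, then highest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# Invented values: both slaves share a priority, 6383 has replicated slightly more.
REPLICAS = [
    Replica("127.0.0.1:6382", online=True, priority=100, repl_offset=5600, run_id="b2"),
    Replica("127.0.0.1:6383", online=True, priority=100, repl_offset=5656, run_id="a1"),
]

if __name__ == "__main__":
    chosen = pick_new_master(REPLICAS)
    print(chosen.address if chosen else "no eligible replica")
```

With these made-up numbers the function picks 127.0.0.1:6383, the same slave that was promoted in the logs.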
Summary:
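This article is a set of study notes on Redis Sentinel: what a sentinel is, how to set up sentinel mode, and the three phases of sentinel operation, namely monitoring, notification, and failover. More study notes will follow!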