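First, be clear that Sentinel is a process separate from the Redis server and does not serve key/value requests itself. In the Redis installation directory the binary is named redis-sentinel. Its main job is to monitor redis-server processes and manage master/slave roles. If your Redis is not running in master/slave mode, there is no need to set up Sentinel.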
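Two basic concepts
- S_DOWN: subjectively down. The current sentinel instance, on its own, considers a Redis server to be unavailable.
- O_DOWN: objectively down. Enough sentinel instances (at least the configured quorum) consider the master to be in S_DOWN, so the master is placed in O_DOWN. Put simply, O_DOWN means the sentinels have agreed that the master is unavailable, and a failover will be started.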
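The relationship between the two states can be sketched as follows. This is a minimal illustration, not Sentinel's actual code: the `SentinelView` structure and the function names are hypothetical, and the real Sentinel collects the other sentinels' opinions over the network rather than from an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """One sentinel's local view of the master (hypothetical structure)."""
    name: str
    last_valid_reply_ms: int  # time of the last valid PING reply it received


def is_sdown(view, now_ms, down_after_ms):
    # Subjectively down: this sentinel alone has seen no valid reply
    # for longer than the configured down-after period.
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_odown(views, now_ms, down_after_ms, quorum):
    # Objectively down: at least `quorum` sentinels report S_DOWN.
    votes = sum(1 for v in views if is_sdown(v, now_ms, down_after_ms))
    return votes >= quorum


views = [
    SentinelView("s1", last_valid_reply_ms=1000),
    SentinelView("s2", last_valid_reply_ms=2000),
    SentinelView("s3", last_valid_reply_ms=39000),
]
print(is_odown(views, now_ms=40000, down_after_ms=30000, quorum=2))
```

With a quorum of 2 the master is treated as O_DOWN here, because two of the three sentinels have not heard from it within the down-after period; with a quorum of 3 it would remain only S_DOWN on those two sentinels.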
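In Redis 2.8.7, selecting the new master works in two stages. First, the conditions below filter out unsuitable nodes.
I. Filter the candidate nodes using the following conditions:
1) Slaves in the S_DOWN, O_DOWN, or DISCONNECTED state are excluded.
2) The last PING reply must be no older than 5 times the PING period (if the PING period is 1 second, the last reply must be no more than 5 seconds old; the Redis Sentinel default is 1 second).
3) The last info_refresh must be no older than 3 times the INFO refresh period (same reasoning as 2; the Redis Sentinel default is 10 seconds).
4) The time for which the slave reports its link to the master as down (master_link_down_time) must not exceed (now - master->s_down_since_time) + (master->down_after_period * 10). The overall idea is that a slave whose data lags too far behind the master (for example, a newly started node) should not take part in the election.
5) The slave priority must not be 0 (it is set in the configuration file; the default is 100).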
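The five filter conditions above can be sketched as a single predicate. This is an illustration under stated assumptions, not the Sentinel source: the `Slave` fields and the `is_candidate` function are hypothetical names, all times are in milliseconds, and the two period constants are the defaults quoted in the list.

```python
from dataclasses import dataclass

PING_PERIOD_MS = 1000    # default PING period quoted above
INFO_PERIOD_MS = 10000   # default INFO refresh period quoted above


@dataclass
class Slave:
    """Hypothetical record of what a sentinel knows about one slave."""
    runid: str
    s_down: bool = False
    o_down: bool = False
    disconnected: bool = False
    last_ping_reply_ms: int = 0        # when the slave last answered a PING
    info_refresh_ms: int = 0           # when its INFO output was last refreshed
    master_link_down_time_ms: int = 0  # how long it reports the master link down
    priority: int = 100


def is_candidate(s, now_ms, master_s_down_since_ms, down_after_period_ms):
    # 1) Exclude slaves that are themselves down or disconnected.
    if s.s_down or s.o_down or s.disconnected:
        return False
    # 2) Last PING reply no older than 5 PING periods.
    if now_ms - s.last_ping_reply_ms > 5 * PING_PERIOD_MS:
        return False
    # 3) INFO refreshed no longer ago than 3 INFO periods.
    if now_ms - s.info_refresh_ms > 3 * INFO_PERIOD_MS:
        return False
    # 4) Master link not down for longer than the allowed window.
    max_link_down = (now_ms - master_s_down_since_ms) + down_after_period_ms * 10
    if s.master_link_down_time_ms > max_link_down:
        return False
    # 5) Priority 0 means "never promote".
    return s.priority != 0


now = 100000
healthy = Slave("a" * 40, last_ping_reply_ms=99500, info_refresh_ms=95000,
                master_link_down_time_ms=6000)
stale = Slave("b" * 40, last_ping_reply_ms=90000, info_refresh_ms=95000)
print(is_candidate(healthy, now, 95000, 30000), is_candidate(stale, now, 95000, 30000))
```

In this example the first slave passes every check, while the second is rejected by condition 2 because its last PING reply is 10 seconds old.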
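II. From the remaining candidates, choose the new master by the following sort keys, in order:
1) Lower slave_priority (set in the configuration file; the default is 100).
2) Larger replication offset (the offset grows as the slave receives replicated data from the master).
3) Smaller runid (every Redis instance has a runid, a 40-character random hexadecimal string set when Redis starts; collisions are extremely unlikely).
Note: the source comment ends by saying that the slave which processed more commands from the master is selected. This restates criterion 2 rather than adding a fourth tie-breaker: when priorities are equal, the larger replication offset wins, and the runid is compared only when the offsets are also equal.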
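The ordering above amounts to sorting by a three-part key. The sketch below assumes a hypothetical `Candidate` tuple and helper names; the runid comparison is approximated here with a lowercase string comparison, which is an assumption of this example rather than a statement about the Sentinel source.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical summary of a slave that passed the filter stage."""
    priority: int
    repl_offset: int
    runid: str


def promotion_key(c):
    # Lower priority first, then larger replication offset (hence the
    # negation), then the lexicographically smaller runid.
    return (c.priority, -c.repl_offset, c.runid.lower())


def select_slave(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    # Return the best candidate, or None when nothing passed the filter.
    return min(candidates, key=promotion_key) if candidates else None


a = Candidate(priority=100, repl_offset=500, runid="bbb")
b = Candidate(priority=100, repl_offset=900, runid="ccc")
c = Candidate(priority=50, repl_offset=10, runid="zzz")
print(select_slave([a, b, c]))
```

Here `c` is chosen despite its small offset, because priority is compared first; among `a` and `b` alone, `b` would win on replication offset.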
The original English text, for reference:
Select a suitable slave to promote. The current algorithm only uses
the following parameters:
1) None of the following conditions: S_DOWN, O_DOWN, DISCONNECTED.
2) Last time the slave replied to ping no more than 5 times the PING period.
3) info_refresh not older than 3 times the INFO refresh period.
4) master_link_down_time no more than:
(now - master->s_down_since_time) + (master->down_after_period * 10).
Basically since the master is down from our POV, the slave reports
to be disconnected no more than 10 times the configured down-after-period.
This is pretty much black magic but the idea is, the master was not
available so the slave may be lagging, but not over a certain time.
Anyway we'll select the best slave according to replication offset.
5) Slave priority can't be zero, otherwise the slave is discarded.
Among all the slaves matching the above conditions we select the slave
with, in order of sorting key:
- lower slave_priority
- bigger processed replication offset.
- lexicographically smaller runid.
Basically if runid is the same, the slave that processed more commands
from the master is selected.
runid
Redis "Run ID", a SHA1-sized random number that identifies a
given execution of Redis, so that if you are talking with an instance
having run_id == A, and you reconnect and it has run_id == B, you can be
sure that it is either a different instance or it was restarted.
util.c
void getRandomHexChars(char *p, unsigned int len);
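To illustrate what a run ID of this shape looks like, here is a minimal sketch that produces 40 random hexadecimal characters (160 bits, the size of a SHA1 digest). It is not a port of Redis's `getRandomHexChars`: the `make_run_id` name and the use of Python's `secrets` module are assumptions made for this example only.

```python
import secrets

RUN_ID_SIZE = 40  # hex characters, i.e. 160 bits


def make_run_id(size=RUN_ID_SIZE):
    # token_hex(n) returns 2 * n hexadecimal characters taken from the
    # operating system's random source.
    return secrets.token_hex(size // 2)


run_id = make_run_id()
print(run_id, len(run_id))
```

Because a fresh value is drawn at every start, a client that sees the run ID change after reconnecting can conclude that it is talking to a different or restarted instance, as the comment above describes.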