
[Redis-6.0.8] Sentinel Source Code Analysis, Part 1

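0. References

menwen: Sentinel, preface

menwen: Sentinel, part 1

menwen: Sentinel, part 2

Muten's reference write-up, Sentinel section

"Redis设计与实现" (Redis Design and Implementation), Sentinel chapter

"Redis5设计与源码分析" (Redis 5 Design and Source Code Analysis), Sentinel chapter

An animated demonstration of Raft

A very good explanatory article on Raft

Redis cluster analysis (27)

An article analyzing the asynchronous connections in hiredis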

1. A quick review

1.1 Configuration file

# Monitor a Redis master named mymaster at 127.0.0.1:6379, with a quorum of 2
sentinel monitor mymaster 127.0.0.1 6379 2
# If this Sentinel receives no valid PING reply from mymaster for 60 s, it considers mymaster down
sentinel down-after-milliseconds mymaster 60000
# Failover timeout: 180 s
sentinel failover-timeout mymaster 180000
# After a failover, only 1 replica at a time resyncs with the new master: the replicas sync
# one after another, each starting only after the previous one has finished
sentinel parallel-syncs mymaster 1

# Monitor a Redis master named resque at 192.168.1.3:6380, with a quorum of 4
sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5
 
 
 
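The quorum plays two roles in Sentinel.

First: when a single Sentinel decides that a master it monitors is down, it flags the master as S_DOWN (subjectively down). With quorum set to 2, once two Sentinels agree that the master is down, the master is flagged as O_DOWN (objectively down). A failover starts only after the master has reached O_DOWN.

Second: suppose there are 5 Sentinels and quorum is 4. Declaring O_DOWN then takes agreement from 4 Sentinels. When the failover starts, one of the 5 Sentinels is elected leader to carry it out, and a candidate must collect 4 votes to become leader, not just 3 (the simple majority).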

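1.2 Starting Sentinel

Sentinel can be started directly with the redis-server binary:

./redis-server ../sentinel.conf --sentinel
or
./redis-sentinel ../sentinel.conf


Note: the Sentinel configuration file must be writable.


Question: what if I want to customize, say, the persistence policy? How is that done in Sentinel mode?
      The startup commands above do not seem to specify any of it.
      It does not matter: the deployment is simply three ordinary redis-server processes plus three
      processes running in Sentinel mode (started in either of the two ways shown above).
      Persistence is configured in the redis.conf of the ordinary redis-server processes;
      the Sentinel processes hold no data set of their own.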

2. Source code analysis

2.1 Relevant source files

redis-6.0.8/src/server.h
redis-6.0.8/src/server.c
redis-6.0.8/src/sentinel.c

2.2 sentinelcmds

struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"role",sentinelRoleCommand,1,"ok-loading",0,NULL,0,0,0,0,0},
    {"client",clientCommand,-2,"read-only no-script",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
    {"auth",authCommand,2,"no-auth no-script ok-loading ok-stale fast",0,NULL,0,0,0,0,0},
    {"hello",helloCommand,-2,"no-auth no-script fast",0,NULL,0,0,0,0,0}
};

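As the table shows, a Sentinel accepts only a small set of commands. The focus here is the one unique to Sentinel: sentinel. Like any other command, it is dispatched to its handler, sentinelCommand. Its most important forms are:
1) SENTINEL MASTERS: returns information about every master this Sentinel monitors.
2) SENTINEL MASTER <name>: returns information about the master with the given name.
3) SENTINEL SLAVES <name>: returns information about all replicas of the named master.
4) SENTINEL SENTINELS <name>: returns information about all Sentinels of the named master.
5) SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>: if runid is *, returns whether this Sentinel considers the master at the given IP and port subjectively down. If runid is a Sentinel's ID, the command additionally requests a vote for that runid in the leader election.
6) SENTINEL RESET <pattern>: resets every monitored master whose name matches the pattern (state is refreshed and all connections are re-established).
7) SENTINEL GET-MASTER-ADDR-BY-NAME <name>: returns the IP and port of the named master.
8) SENTINEL FAILOVER <name>: manually forces a failover of the named master.
9) SENTINEL MONITOR <name> <ip> <port> <quorum>: tells this Sentinel to start monitoring a master.
10) SENTINEL FLUSHCONFIG: rewrites the configuration file to disk.
11) SENTINEL REMOVE <name>: removes the named master from monitoring.
12) SENTINEL CKQUORUM <name>: checks whether enough Sentinels are reachable to meet the configured quorum (the number needed to declare O_DOWN).
13) SENTINEL SET  [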

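2.3 Main program startup flow

2.3.1 Outline of the main flow

Starting the service with
./redis-server ../sentinel.conf --sentinel
launches a redis-server process in Sentinel mode.


[Redis-6.0.8]
// For Sentinel, main() only performs initialization
int main(int argc, char **argv) {
/* Log role marker for Sentinel; in the 6.0.8 source this check sits in serverLogRaw() */
if (server.sentinel_mode) {
            role_char = 'X'; /* Sentinel. */
        }
...
 
/* Detect whether we were started in Sentinel mode; there are two ways to start it */
server.sentinel_mode = checkForSentinelMode(argc,argv); // line 5141
 
....
if (server.sentinel_mode) {
        initSentinelConfig(); // Sentinel config defaults: listening port and protected mode
        initSentinel();       // initialize the Sentinel state, line 5160
    }
....
...
// sentinelHandleConfiguration parses each "sentinel ..." directive of the config file and applies it
loadServerConfig(configfile,options);// line 5233 main->loadServerConfig->loadServerConfigFromString->sentinelHandleConfiguration
...
 if (!server.sentinel_mode) {
        // Outside Sentinel mode: load the AOF/RDB file, print memory warnings, load cluster data, and so on.
    } else { 
      InitServerLast(); // Sentinel mode also needs the final, thread-related initialization
      sentinelIsRunning(); // line 5295: generate a random 40-character Sentinel ID if none is set, and log it
    }
...
}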

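The process above breaks down into four steps:

Check whether Sentinel mode is enabled;

Initialize the Sentinel configuration defaults and the Sentinel state;

Parse the configuration file and apply it;

Perform the final server initialization and get ready to start Sentinel's work.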

2.3.2 checkForSentinelMode

/* Returns 1 if there is --sentinel among the arguments or if
 * argv[0] contains "redis-sentinel". */
int checkForSentinelMode(int argc, char **argv) {
    int j;

    if (strstr(argv[0],"redis-sentinel") != NULL) return 1;
    for (j = 1; j < argc; j++)
        if (!strcmp(argv[j],"--sentinel")) return 1;
    return 0;
}

2.3.3 initSentinelConfig

The configuration file explains protected-mode as follows:
# Protected mode is a layer of security protection, in order to avoid that
# Redis instances left open on the internet are accessed and exploited.
#
# When protected mode is on and if:
#
# 1) The server is not binding explicitly to a set of addresses using the
#    "bind" directive.
# 2) No password is configured.
#
# The server only accepts connections from clients connecting from the
# IPv4 and IPv6 loopback addresses 127.0.0.1 and ::1, and from Unix domain
# sockets.
#
# By default protected mode is enabled. You should disable it only if
# you are sure you want clients from other hosts to connect to Redis
# even if no authentication is configured, nor a specific set of interfaces
# are explicitly listed using the "bind" directive.
protected-mode yes

The definition of REDIS_SENTINEL_PORT in sentinel.c, and the related field of struct redisServer:

#define REDIS_SENTINEL_PORT 26379
struct redisServer {
 ...
 int protected_mode;         /* Don't accept external connections. */
 ...
}

/* This function overwrites a few normal Redis config default with Sentinel
 * specific defaults. */
void initSentinelConfig(void) {
    server.port = REDIS_SENTINEL_PORT; /* Sentinel's default port replaces the normal server default */
    server.protected_mode = 0; /* Sentinel must be exposed. */
}

2.3.4 initSentinel

1. First, the definition of the sentinel state
/* Main state. */
struct sentinelState {
    char myid[CONFIG_RUN_ID_SIZE+1]; /* This sentinel ID. */
    uint64_t current_epoch;         /* Current epoch. */
    /* One Sentinel can monitor several masters; they all live in this dict. */
    dict *masters;      /* Dictionary of master sentinelRedisInstances.
                           Key is the instance name, value is the
                           sentinelRedisInstance structure pointer. */
    int tilt;           /* Are we in TILT mode? */
    int running_scripts;    /* Number of scripts in execution right now. */
    mstime_t tilt_start_time;       /* When TILT started. */
    mstime_t previous_time;         /* Last time we ran the time handler. */
    list *scripts_queue;            /* Queue of user scripts to execute. */
    char *announce_ip;  /* IP addr that is gossiped to other sentinels if
                           not NULL. */
    int announce_port;  /* Port that is gossiped to other sentinels if
                           non zero. */
    unsigned long simfailure_flags; /* Failures simulation. */
    int deny_scripts_reconfig; /* Allow SENTINEL SET ... to change script
                                  paths at runtime? */
} sentinel;

2. Now the implementation of initSentinel
/* Perform the Sentinel mode initialization. */
void initSentinel(void) {
    unsigned int j;
    
    /* Remove usual Redis commands from the command table, then just add
     * the SENTINEL command. */
    dictEmpty(server.commands,NULL);
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        serverAssert(retval == DICT_OK);

        /* Translate the command string flags description into an actual
         * set of flags. */
        if (populateCommandTableParseFlags(cmd,cmd->sflags) == C_ERR)
            serverPanic("Unsupported command flag");
    }

    /* Initialize various data structures. */
    /* Current epoch */
    sentinel.current_epoch = 0;
    /* Dictionary of monitored masters */
    sentinel.masters = dictCreate(&instancesDictType,NULL);
    /* Not in TILT mode at startup */
    sentinel.tilt = 0;
    /* Start time of TILT mode */
    sentinel.tilt_start_time = 0;
    /* Last time the time handler ran */
    sentinel.previous_time = mstime();
    /* Number of scripts currently executing */
    sentinel.running_scripts = 0;
    /* Queue of user scripts */
    sentinel.scripts_queue = listCreate();
    /* IP and port this Sentinel announces (gossips) to other Sentinels;
     * NULL and 0 mean "not configured" */
    sentinel.announce_ip = NULL;
    sentinel.announce_port = 0;
    /* Failure simulation flags */
    sentinel.simfailure_flags = SENTINEL_SIMFAILURE_NONE;
    /* Whether SENTINEL SET may change script paths at runtime */
    sentinel.deny_scripts_reconfig = SENTINEL_DEFAULT_DENY_SCRIPTS_RECONFIG;
    /* myid starts out as all zero bytes */
    memset(sentinel.myid,0,sizeof(sentinel.myid));
}

2.3.5 loadServerConfig

2.3.5.1 Implementation of loadServerConfig

/* Load the server configuration from the specified filename.
 * The function appends the additional configuration directives stored
 * in the 'options' string to the config file before loading.
 *
 * Both filename and options can be NULL, in such a case are considered
 * empty. This way loadServerConfig can be used to just load a file or
 * just load a string. */
void loadServerConfig(char *filename, char *options) {
    sds config = sdsempty();
    char buf[CONFIG_MAX_LINE+1];

    /* Load the file content */
    if (filename) {
        FILE *fp;

        if (filename[0] == '-' && filename[1] == '\0') {
            fp = stdin;
        } else {
            if ((fp = fopen(filename,"r")) == NULL) {
                serverLog(LL_WARNING,
                    "Fatal error, can't open config file '%s': %s",
                    filename, strerror(errno));
                exit(1);
            }
        }
        while(fgets(buf,CONFIG_MAX_LINE+1,fp) != NULL)
            config = sdscat(config,buf);
        if (fp != stdin) fclose(fp);
    }
    /* Append the additional options */
    if (options) {
        config = sdscat(config,"\n");
        config = sdscat(config,options);
    }
    loadServerConfigFromString(config); // the key step
    sdsfree(config);
}

2.3.5.2 Implementation of loadServerConfigFromString

void loadServerConfigFromString(char *config) {
 ...
 for (i = 0; i < totlines; i++) {
 ...
 else if (!strcasecmp(argv[0],"sentinel")) {
            /* argc == 1 is handled by main() as we need to enter the sentinel
             * mode ASAP. */
            if (argc != 1) {
                if (!server.sentinel_mode) {
                    err = "sentinel directive while not in sentinel mode";
                    goto loaderr;
                }
                err = sentinelHandleConfiguration(argv+1,argc-1);
                if (err) goto loaderr;
            }
        }
 }
 ...
}

2.3.5.3 Implementation of sentinelHandleConfiguration

/* ============================ Config handling ============================= */
char *sentinelHandleConfiguration(char **argv, int argc) {
    sentinelRedisInstance *ri;

    if (!strcasecmp(argv[0],"monitor") && argc == 5) {
        /* monitor <name> <host> <port> <quorum> */
        int quorum = atoi(argv[4]);

        if (quorum <= 0) return "Quorum must be 1 or greater.";
        if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
                                        atoi(argv[3]),quorum,NULL) == NULL)
        {
            switch(errno) {
            case EBUSY: return "Duplicated master name.";
            case ENOENT: return "Can't resolve master instance hostname.";
            case EINVAL: return "Invalid port number";
            }
        }
    } else if (!strcasecmp(argv[0],"down-after-milliseconds") && argc == 3) {
        /* down-after-milliseconds <name> <milliseconds> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->down_after_period = atoi(argv[2]);
        if (ri->down_after_period <= 0)
            return "negative or zero time parameter.";
        sentinelPropagateDownAfterPeriod(ri);
    } else if (!strcasecmp(argv[0],"failover-timeout") && argc == 3) {
        /* failover-timeout <name> <milliseconds> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->failover_timeout = atoi(argv[2]);
        if (ri->failover_timeout <= 0)
            return "negative or zero time parameter.";
    } else if (!strcasecmp(argv[0],"parallel-syncs") && argc == 3) {
        /* parallel-syncs <name> <numreplicas> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->parallel_syncs = atoi(argv[2]);
    } else if (!strcasecmp(argv[0],"notification-script") && argc == 3) {
        /* notification-script <name> <path> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        if (access(argv[2],X_OK) == -1)
            return "Notification script seems non existing or non executable.";
        ri->notification_script = sdsnew(argv[2]);
    } else if (!strcasecmp(argv[0],"client-reconfig-script") && argc == 3) {
        /* client-reconfig-script <name> <path> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        if (access(argv[2],X_OK) == -1)
            return "Client reconfiguration script seems non existing or "
                   "non executable.";
        ri->client_reconfig_script = sdsnew(argv[2]);
    } else if (!strcasecmp(argv[0],"auth-pass") && argc == 3) {
        /* auth-pass <name> <password> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->auth_pass = sdsnew(argv[2]);
    } else if (!strcasecmp(argv[0],"auth-user") && argc == 3) {
        /* auth-user <name> <username> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->auth_user = sdsnew(argv[2]);
    } else if (!strcasecmp(argv[0],"current-epoch") && argc == 2) {
        /* current-epoch <epoch> */
        unsigned long long current_epoch = strtoull(argv[1],NULL,10);
        if (current_epoch > sentinel.current_epoch)
            sentinel.current_epoch = current_epoch;
    } else if (!strcasecmp(argv[0],"myid") && argc == 2) {
        if (strlen(argv[1]) != CONFIG_RUN_ID_SIZE)
            return "Malformed Sentinel id in myid option.";
        memcpy(sentinel.myid,argv[1],CONFIG_RUN_ID_SIZE);
    } else if (!strcasecmp(argv[0],"config-epoch") && argc == 3) {
        /* config-epoch <name> <epoch> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->config_epoch = strtoull(argv[2],NULL,10);
        /* The following update of current_epoch is not really useful as
         * now the current epoch is persisted on the config file, but
         * we leave this check here for redundancy. */
        if (ri->config_epoch > sentinel.current_epoch)
            sentinel.current_epoch = ri->config_epoch;
    } else if (!strcasecmp(argv[0],"leader-epoch") && argc == 3) {
        /* leader-epoch <name> <epoch> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        ri->leader_epoch = strtoull(argv[2],NULL,10);
    } else if ((!strcasecmp(argv[0],"known-slave") ||
                !strcasecmp(argv[0],"known-replica")) && argc == 4)
    {
        sentinelRedisInstance *slave;

        /* known-replica <name> <ip> <port> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,argv[2],
                    atoi(argv[3]), ri->quorum, ri)) == NULL)
        {
            return "Wrong hostname or port for replica.";
        }
    } else if (!strcasecmp(argv[0],"known-sentinel") &&
               (argc == 4 || argc == 5)) {
        sentinelRedisInstance *si;

        if (argc == 5) { /* Ignore the old form without runid. */
            /* known-sentinel <name> <ip> <port> [runid] */
            ri = sentinelGetMasterByName(argv[1]);
            if (!ri) return "No such master with specified name.";
            if ((si = createSentinelRedisInstance(argv[4],SRI_SENTINEL,argv[2],
                        atoi(argv[3]), ri->quorum, ri)) == NULL)
            {
                return "Wrong hostname or port for sentinel.";
            }
            si->runid = sdsnew(argv[4]);
            sentinelTryConnectionSharing(si);
        }
    } else if (!strcasecmp(argv[0],"rename-command") && argc == 4) {
        /* rename-command <name> <command> <renamed-command> */
        ri = sentinelGetMasterByName(argv[1]);
        if (!ri) return "No such master with specified name.";
        sds oldcmd = sdsnew(argv[2]);
        sds newcmd = sdsnew(argv[3]);
        if (dictAdd(ri->renamed_commands,oldcmd,newcmd) != DICT_OK) {
            sdsfree(oldcmd);
            sdsfree(newcmd);
            return "Same command renamed multiple times with rename-command.";
        }
    } else if (!strcasecmp(argv[0],"announce-ip") && argc == 2) {
        /* announce-ip <ip-address> */
        if (strlen(argv[1]))
            sentinel.announce_ip = sdsnew(argv[1]);
    } else if (!strcasecmp(argv[0],"announce-port") && argc == 2) {
        /* announce-port <port> */
        sentinel.announce_port = atoi(argv[1]);
    } else if (!strcasecmp(argv[0],"deny-scripts-reconfig") && argc == 2) {
        /* deny-scripts-reconfig <yes|no> */
        if ((sentinel.deny_scripts_reconfig = yesnotoi(argv[1])) == -1) {
            return "Please specify yes or no for the "
                   "deny-scripts-reconfig options.";
        }
    } else {
        return "Unrecognized sentinel configuration statement.";
    }
    return NULL;
}

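In effect, sentinelHandleConfiguration stores the parameters parsed from the configuration file into ri, a variable of type sentinelRedisInstance *. The next step is therefore a careful look at how the sentinelRedisInstance struct is defined.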

2.3.5.4 The sentinelRedisInstance structure

typedef struct sentinelRedisInstance {
    /* Bit flags recording the type and current state of this instance */
    int flags;      /* See SRI_... defines */
    /* For a master, the name the user gave it in the configuration file */
    char *name;     /* Master name from the point of view of this sentinel. */
    char *runid;    /* Run ID of this instance, or unique ID if is a Sentinel.*/
    /* Used to implement failover */
    uint64_t config_epoch;  /* Configuration epoch. */
    /* Instance address: ip and port */
    sentinelAddr *addr; /* Master host. */
    instanceLink *link; /* Link to the instance, may be shared for Sentinels. */
    mstime_t last_pub_time;   /* Last time we sent hello via Pub/Sub. */
    mstime_t last_hello_time; /* Only used if SRI_SENTINEL is set. Last time
                                 we received a hello from this Sentinel
                                 via Pub/Sub. */
    mstime_t last_master_down_reply_time; /* Time of last reply to
                                             SENTINEL is-master-down command. */
    mstime_t s_down_since_time; /* Subjectively down since time. */
    mstime_t o_down_since_time; /* Objectively down since time. */
    /* Milliseconds without a valid reply before the instance is considered
       subjectively down; set by "sentinel down-after-milliseconds" */
    mstime_t down_after_period; /* Consider it down after that period. */
    mstime_t info_refresh;  /* Time at which we received INFO output from it. */
    dict *renamed_commands;     /* Commands renamed in this instance:
                                   Sentinel will use the alternative commands
                                   mapped on this table to send things like
                                   SLAVEOF, CONFIG, INFO, ... */

    /* Role and the first time we observed it.
     * This is useful in order to delay replacing what the instance reports
     * with our own configuration. We need to always wait some time in order
     * to give a chance to the leader to report the new configuration before
     * we do silly things. */
    int role_reported;
    mstime_t role_reported_time;
    mstime_t slave_conf_change_time; /* Last time slave master addr changed. */
     
    /* Master specific. */
    dict *sentinels;    /* Other sentinels monitoring the same master. */
    /* 
       If this instance is a master, slaves holds all of its replicas:
       the key is the replica's name (ip:port), the value is the replica's
       sentinelRedisInstance
    */
    dict *slaves;       /* Slaves for this master instance. */
    /* The quorum argument of the "sentinel monitor" directive */
    unsigned int quorum;/* Number of sentinels that need to agree on failure. */
    /* Set by the "sentinel parallel-syncs" directive */
    int parallel_syncs; /* How many slaves to reconfigure at same time. */
    char *auth_pass;    /* Password to use for AUTH against master & replica. */
    char *auth_user;    /* Username for ACLs AUTH against master & replica. */

    /* Slave specific. */
    mstime_t master_link_down_time; /* Slave replication link down time. */
    int slave_priority; /* Slave priority according to its INFO output. */
    mstime_t slave_reconf_sent_time; /* Time at which we sent SLAVE OF <new> */
    struct sentinelRedisInstance *master; /* Master instance if it's slave. */
    char *slave_master_host;    /* Master host as reported by INFO */
    int slave_master_port;      /* Master port as reported by INFO */
    int slave_master_link_status; /* Master link status as reported by INFO */
    unsigned long long slave_repl_offset; /* Slave replication offset. */
   
    /* Failover */
    char *leader;       /* If this is a master instance, this is the runid of
                           the Sentinel that should perform the failover. If
                           this is a Sentinel, this is the runid of the Sentinel
                           that this Sentinel voted as leader. */
    uint64_t leader_epoch; /* Epoch of the 'leader' field. */
    uint64_t failover_epoch; /* Epoch of the currently started failover. */
    int failover_state; /* See SENTINEL_FAILOVER_STATE_* defines. */
    /* Time of the last failover state change */
    mstime_t failover_state_change_time;
    mstime_t failover_start_time;   /* Last failover attempt start time. */
    mstime_t failover_timeout;      /* Max time to refresh failover state. */
    mstime_t failover_delay_logged; /* For what failover_start_time value we
                                       logged the failover delay. */
    struct sentinelRedisInstance *promoted_slave; /* Promoted slave instance. */
    /* Scripts executed to notify admin or reconfigure clients: when they
     * are set to NULL no script is executed. */
    /* Path of the script that notifies the administrator */
    char *notification_script;
    /* Path of the script that reconfigures clients */
    char *client_reconfig_script;
    sds info; /* cached INFO output */
} sentinelRedisInstance;

2.3.5.5 Sentinel Redis Instance

/* A Sentinel Redis Instance object is monitoring. */
#define SRI_MASTER  (1<<0)
#define SRI_SLAVE   (1<<1)
#define SRI_SENTINEL (1<<2)
#define SRI_S_DOWN (1<<3)   /* Subjectively down (no quorum). */
#define SRI_O_DOWN (1<<4)   /* Objectively down (confirmed by others). */
#define SRI_MASTER_DOWN (1<<5) /* A Sentinel with this flag set thinks that
                                   its master is down. */
#define SRI_FAILOVER_IN_PROGRESS (1<<6) /* Failover is in progress for
                                           this master. */
#define SRI_PROMOTED (1<<7)            /* Slave selected for promotion. */
#define SRI_RECONF_SENT (1<<8)     /* SLAVEOF <newmaster> sent. */
#define SRI_RECONF_INPROG (1<<9)   /* Slave synchronization in progress. */
#define SRI_RECONF_DONE (1<<10)     /* Slave synchronized with new master. */
#define SRI_FORCE_FAILOVER (1<<11)  /* Force failover with master up. */
#define SRI_SCRIPT_KILL_SENT (1<<12) /* SCRIPT KILL already sent on -BUSY */

2.3.5.6 createSentinelRedisInstance#loadServerConfig

/* ========================== sentinelRedisInstance ========================= */

/* Create a redis instance, the following fields must be populated by the
 * caller if needed:
 * runid: set to NULL but will be populated once INFO output is received.
 * info_refresh: is set to 0 to mean that we never received INFO so far.
 *
 * If SRI_MASTER is set into initial flags the instance is added to
 * sentinel.masters table.
 *
 * if SRI_SLAVE or SRI_SENTINEL is set then 'master' must be not NULL and the
 * instance is added into master->slaves or master->sentinels table.
 *
 * If the instance is a slave or sentinel, the name parameter is ignored and
 * is created automatically as hostname:port.
 *
 * The function fails if hostname can't be resolved or port is out of range.
 * When this happens NULL is returned and errno is set accordingly to the
 * createSentinelAddr() function.
 *
 * The function may also fail and return NULL with errno set to EBUSY if
 * a master with the same name, a slave with the same address, or a sentinel
 * with the same ID already exists. */

sentinelRedisInstance *createSentinelRedisInstance(char *name, int flags, char *hostname, int port, int quorum, sentinelRedisInstance *master) {
    sentinelRedisInstance *ri;
    sentinelAddr *addr;
    dict *table = NULL;
    char slavename[NET_PEER_ID_LEN], *sdsname;

    serverAssert(flags & (SRI_MASTER|SRI_SLAVE|SRI_SENTINEL));
    serverAssert((flags & SRI_MASTER) || master != NULL);

    /* Check address validity. */
    addr = createSentinelAddr(hostname,port);
    if (addr == NULL) return NULL;

    /* For slaves use ip:port as name. */
    if (flags & SRI_SLAVE) {
        anetFormatAddr(slavename, sizeof(slavename), hostname, port);
        name = slavename;
    }

    /* Make sure the entry is not duplicated. This may happen when the same
     * name for a master is used multiple times inside the configuration or
     * if we try to add multiple times a slave or sentinel with same ip/port
     * to a master. */
    if (flags & SRI_MASTER) table = sentinel.masters;
    else if (flags & SRI_SLAVE) table = master->slaves;
    else if (flags & SRI_SENTINEL) table = master->sentinels;
    sdsname = sdsnew(name);
    if (dictFind(table,sdsname)) {
        releaseSentinelAddr(addr);
        sdsfree(sdsname);
        errno = EBUSY;
        return NULL;
    }

    /* Create the instance object. */
    ri = zmalloc(sizeof(*ri));
    /* Note that all the instances are started in the disconnected state,
     * the event loop will take care of connecting them. */
    ri->flags = flags;
    ri->name = sdsname;
    ri->runid = NULL;
    ri->config_epoch = 0;
    ri->addr = addr;
    ri->link = createInstanceLink();
    ri->last_pub_time = mstime();
    ri->last_hello_time = mstime();
    ri->last_master_down_reply_time = mstime();
    ri->s_down_since_time = 0;
    ri->o_down_since_time = 0;
    ri->down_after_period = master ? master->down_after_period :
                            SENTINEL_DEFAULT_DOWN_AFTER;
    ri->master_link_down_time = 0;
    ri->auth_pass = NULL;
    ri->auth_user = NULL;
    ri->slave_priority = SENTINEL_DEFAULT_SLAVE_PRIORITY;
    ri->slave_reconf_sent_time = 0;
    ri->slave_master_host = NULL;
    ri->slave_master_port = 0;
    ri->slave_master_link_status = SENTINEL_MASTER_LINK_STATUS_DOWN;
    ri->slave_repl_offset = 0;
    ri->sentinels = dictCreate(&instancesDictType,NULL);
    ri->quorum = quorum;
    ri->parallel_syncs = SENTINEL_DEFAULT_PARALLEL_SYNCS;
    ri->master = master;
    ri->slaves = dictCreate(&instancesDictType,NULL);
    ri->info_refresh = 0;
    ri->renamed_commands = dictCreate(&renamedCommandsDictType,NULL);

    /* Failover state. */
    ri->leader = NULL;
    ri->leader_epoch = 0;
    ri->failover_epoch = 0;
    ri->failover_state = SENTINEL_FAILOVER_STATE_NONE;
    ri->failover_state_change_time = 0;
    ri->failover_start_time = 0;
    ri->failover_timeout = SENTINEL_DEFAULT_FAILOVER_TIMEOUT;
    ri->failover_delay_logged = 0;
    ri->promoted_slave = NULL;
    ri->notification_script = NULL;
    ri->client_reconfig_script = NULL;
    ri->info = NULL;

    /* Role */
    ri->role_reported = ri->flags & (SRI_MASTER|SRI_SLAVE);
    ri->role_reported_time = mstime();
    ri->slave_conf_change_time = mstime();

    /* Add into the right table. */
    dictAdd(table, ri->name, ri);
    return ri;
}

2.3.5.7 sentinelGetMasterByName#loadServerConfig

/* Master lookup by name */
sentinelRedisInstance *sentinelGetMasterByName(char *name) {
    sentinelRedisInstance *ri;
    sds sdsname = sdsnew(name);

    ri = dictFetchValue(sentinel.masters,sdsname);
    sdsfree(sdsname);
    return ri;
}

2.3.6 InitServerLast

See "Redis threading model analysis".

2.3.7 sentinelIsRunning

/* This function gets called when the server is in Sentinel mode, started,
 * loaded the configuration, and is ready for normal operations. */
void sentinelIsRunning(void) {
    int j;
   
    if (server.configfile == NULL) {
        serverLog(LL_WARNING,
            "Sentinel started without a config file. Exiting...");
        exit(1);
    } else if (access(server.configfile,W_OK) == -1) {
        serverLog(LL_WARNING,
            "Sentinel config file %s is not writable: %s. Exiting...",
            server.configfile,strerror(errno));
        exit(1);
    }

    /* If this Sentinel has yet no ID set in the configuration file, we
     * pick a random one and persist the config on disk. From now on this
     * will be this Sentinel ID across restarts. */
    for (j = 0; j < CONFIG_RUN_ID_SIZE; j++)
        if (sentinel.myid[j] != 0) break;

    if (j == CONFIG_RUN_ID_SIZE) {
        /* Pick ID and persist the config. */
        getRandomHexChars(sentinel.myid,CONFIG_RUN_ID_SIZE);
        sentinelFlushConfig();
    }

    /* Log its ID to make debugging of issues simpler. */
    serverLog(LL_WARNING,"Sentinel ID is %s", sentinel.myid);

    /* We want to generate a +monitor event for every configured master
     * at startup. */
    sentinelGenerateInitialMonitorEvents();
}

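sentinelIsRunning is simple. It does the following:

Checks that a configuration file exists and is writable (later steps may need to modify it);

Assigns an ID to a Sentinel that has no myid yet and persists it to the configuration file;

Logs the Sentinel ID;

Generates a +monitor event for every configured master.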

2.4 Sentinel-related flow in the timer task

 
// The work done for Sentinel in the timer task is Sentinel's main working logic
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
...
 /* Run the Sentinel timer if we are in sentinel mode. */
    if (server.sentinel_mode) sentinelTimer();
...
}

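void sentinelTimer(void) {
    /* Check whether Sentinel needs to enter TILT mode, and record the time of this run */
    sentinelCheckTiltCondition();
    /* Run the periodic work for every monitored master, recursing into its replicas and Sentinels */
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    /* Start the scripts waiting in the queue */
    sentinelRunPendingScripts();
    /* Clean up scripts that finished successfully and reschedule the ones that failed */
    sentinelCollectTerminatedScripts(); 
    /* Kill scripts that ran past their time limit; sentinelCollectTerminatedScripts() picks them up for retry on a later cycle */
    sentinelKillTimedoutScripts();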

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other.
     * This non-determinism avoids that Sentinels started at the same time
     * exactly continue to stay synchronized asking to be voted at the
     * same time again and again (resulting in nobody likely winning the
     * election because of split brain voting). */
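    /*
      In other words: the frequency of the periodic task changes on every run so that
      Sentinels drift out of step with each other. Without this randomness, Sentinels
      started at the same moment would keep asking for votes at the same moment, and
      the split vote would likely leave no winner.
    */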
    server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
}

 
 
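In Sentinel mode, every run of serverCron calls sentinelTimer(). That function establishes connections, periodically sends heartbeats, and collects information. Section 3.6.2 explains the work of sentinelTimer in detail.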
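The last line of sentinelTimer() is easy to check by hand. The sketch below repeats the same arithmetic with the random number passed in as a parameter; DEFAULT_HZ is assumed to equal CONFIG_DEFAULT_HZ, and the function names are made up for illustration.

```c
#define DEFAULT_HZ 10 /* assumed to match CONFIG_DEFAULT_HZ */

/* Same arithmetic as "server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ",
 * with the (non-negative) random value r supplied by the caller. */
int randomized_hz(int r) {
    return DEFAULT_HZ + r % DEFAULT_HZ;
}

/* Approximate timer period in milliseconds for a given frequency. */
int period_ms(int hz) {
    return 1000 / hz;
}
```

Under that assumption the frequency always falls between 10 and 19, so the timer fires roughly every 52 to 100 milliseconds and two Sentinels rarely stay aligned for long.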

2.4.1 sentinelCheckTiltCondition - deciding whether to enter TILT mode

/* This function checks if we need to enter the TILT mode.
 *
 * The TILT mode is entered if we detect that between two invocations of the
 * timer interrupt, a negative amount of time, or too much time has passed.
 * Note that we expect that more or less just 100 milliseconds will pass
 * if everything is fine. However we'll see a negative number or a
 * difference bigger than SENTINEL_TILT_TRIGGER milliseconds if one of the
 * following conditions happen:
 *
 * 1) The Sentinel process for some time is blocked, for every kind of
 * random reason: the load is huge, the computer was frozen for some time
 * in I/O or alike, the process was stopped by a signal. Everything.
 * 2) The system clock was altered significantly.
 *
 * Under both this conditions we'll see everything as timed out and failing
 * without good reasons. Instead we enter the TILT mode and wait
 * for SENTINEL_TILT_PERIOD to elapse before starting to act again.
 *
 * During TILT time we still collect information, we just do not act. */
void sentinelCheckTiltCondition(void) {
    mstime_t now = mstime();
    mstime_t delta = now - sentinel.previous_time;

    if (delta < 0 || delta > SENTINEL_TILT_TRIGGER) {
        sentinel.tilt = 1;
        sentinel.tilt_start_time = mstime();
        sentinelEvent(LL_WARNING,"+tilt",NULL,"#tilt mode entered");
    }
    sentinel.previous_time = mstime();
}
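The entry test above and the exit test that appears later in sentinelHandleRedisInstance reduce to two comparisons. The sketch below isolates them; the two constants are assumed values for SENTINEL_TILT_TRIGGER and SENTINEL_TILT_PERIOD, and the function names are hypothetical.

```c
typedef long long mstime_t;

#define TILT_TRIGGER_MS 2000  /* assumed value of SENTINEL_TILT_TRIGGER */
#define TILT_PERIOD_MS  30000 /* assumed value of SENTINEL_TILT_PERIOD */

/* TILT is entered when time went backwards or too much time passed
 * between two timer runs. */
int should_enter_tilt(mstime_t previous, mstime_t now) {
    mstime_t delta = now - previous;
    return delta < 0 || delta > TILT_TRIGGER_MS;
}

/* While this returns 1 the acting half is skipped. */
int tilt_still_active(mstime_t tilt_start, mstime_t now) {
    return now - tilt_start < TILT_PERIOD_MS;
}
```

Note that both comparisons are strict, so a delta of exactly TILT_TRIGGER_MS does not trigger TILT and a TILT that has lasted exactly TILT_PERIOD_MS is over.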

2.4.2 sentinelHandleDictOfRedisInstances - running the periodic tasks

/* Perform scheduled operations for all the instances in the dictionary.
 * Recursively call the function against dictionaries of slaves. */
/*
 The call site is sentinelHandleDictOfRedisInstances(sentinel.masters);
 sentinel.masters is passed in, every master is processed, and then the slaves
 and Sentinels of every master are processed as well.
*/
void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
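    /* Iterate over every instance in the dictionary */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        /* Run the periodic operations on this instance */
        sentinelHandleRedisInstance(ri);
        /* If the instance is a master */
        if (ri->flags & SRI_MASTER) {
            /* Recurse into the slaves that belong to this master */
            sentinelHandleDictOfRedisInstances(ri->slaves);
            /* Recurse into the Sentinels that monitor this master */
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            /* The failover has reached its final state: the slaves have been
               reconfigured to replicate from the new master */
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                /* Remember this master so the switch can happen after the loop */
                switch_to_promoted = ri;
            }
        }
    }
    /* If a master has to be switched to its promoted slave */
    if (switch_to_promoted)
    /* Remove the old master from the masters table and replace it with the
       promoted slave; the remaining slaves and the old master now belong
       to the new master
     */
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);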
    dictReleaseIterator(di);
}

2.4.2.1 sentinelHandleRedisInstance - handling a Redis instance

/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Monitoring half: connect to the instance and collect information */
    
    /* Every kind of instance */
    
    /* Create the network connections from this Sentinel to ri: cc, plus pc for masters and slaves */
    sentinelReconnectInstance(ri);
    /* Periodically send PING, INFO and PUBLISH commands to ri */
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* Acting half: failure detection and failover */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        /* The TILT period is not over yet: skip the rest and return */
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        /* The TILT period is over: clear the TILT flag */
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    /* Check whether the instance is subjectively down */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    /* Nothing extra is done for masters and slaves at the moment */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        /* Check whether the master is objectively down */
        sentinelCheckObjectivelyDown(ri);
        /* If a failover has to be started, set up the failover state */
        if (sentinelStartFailoverIfNeeded(ri))
            /*
               Force sending SENTINEL IS-MASTER-DOWN-BY-ADDR to all the other
               Sentinels right away, asking for their view of the master and
               for the votes needed to carry out the failover
            */
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        /* Drive the failover state machine */
        sentinelFailoverStateMachine(ri);
        /* Also send SENTINEL IS-MASTER-DOWN-BY-ADDR in the normal, non-forced
           way. The function itself skips the request unless this Sentinel sees
           the master as subjectively down; the replies refresh what the other
           Sentinels think of the master
        */
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}

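This function splits the periodic work into two parts: one monitors the instance, and the other performs failure detection (and, for masters, failover) on it.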

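2.4.2.1.1 sentinelReconnectInstance - re-establishing connections

The first function the periodic work runs is sentinelReconnectInstance(). When the configuration is loaded and the master instance is created and added to the sentinel.masters dictionary, its link starts out disconnected, so the first job is to establish the network connections between the Sentinel and the master.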

/* Create the async connections for the instance link if the link
 * is disconnected. Note that link->disconnected is true even if just
 * one of the two links (commands and pub/sub) is missing. */
void sentinelReconnectInstance(sentinelRedisInstance *ri) {
    /* Return immediately if the link of ri is not disconnected */
    if (ri->link->disconnected == 0) return;
    /* Return immediately if the address of ri is invalid */
    if (ri->addr->port == 0) return; /* port == 0 means invalid address. */
    /* The link of this instance */
    instanceLink *link = ri->link;
    /* The current time */
    mstime_t now = mstime();
    /*
       Return if less than SENTINEL_PING_PERIOD has passed since the last reconnection attempt
       #define SENTINEL_PING_PERIOD 1000 --- 1000 milliseconds, i.e. 1 second
    */
    if (now - ri->link->last_reconn_time < SENTINEL_PING_PERIOD) return;
    /* Record the time of this reconnection attempt */
    ri->link->last_reconn_time = now;

    /* Commands connection. */
    /* cc is short for commands connection */
    if (link->cc == NULL) {
        /* Start an asynchronous connection to the address of ri */
        link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        /*
         link->cc->err == 0 (so !link->cc->err is true) means the connection
         attempt above reported no error.
         A non-zero server.tls_replication means TLS is enabled for these links.
         When both hold and instanceLinkNegotiateTLS returns C_ERR, TLS
         initialization failed.
        */
        if (!link->cc->err && server.tls_replication &&
                (instanceLinkNegotiateTLS(link->cc) == C_ERR)) {
            sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #Failed to initialize TLS");
            instanceLinkCloseConnection(link,link->cc);
        } 
        /* A non-zero link->cc->err means the connection attempt failed */
        else if (link->cc->err) {
            sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                link->cc->errstr);
            instanceLinkCloseConnection(link,link->cc);
        }
        /* Otherwise the connection attempt is under way */ 
        else {
            /* Reset the number of commands waiting for a reply on this link */
            link->pending_commands = 0;
            /* Record when the commands connection was created */
            link->cc_conn_time = mstime();
            /* Point the data field of the commands connection back at the link */
            link->cc->data = link;
            /* Attach the connection to the server event loop */
            redisAeAttach(server.el,link->cc);
            /* Set the callback invoked when the connection is established */
            redisAsyncSetConnectCallback(link->cc,
                    sentinelLinkEstablishedCallback);
            /* Set the callback invoked when the connection is closed */
            redisAsyncSetDisconnectCallback(link->cc,
                    sentinelDisconnectCallback);
            /* Send AUTH if authentication is needed for this instance */
            sentinelSendAuthIfNeeded(ri,link->cc);
            /* Set the client name of the connection */
            sentinelSetClientName(ri,link->cc,"cmd");
            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }
    /* Pub / Sub */
    /* pc is short for Pub/Sub connection */
    /* Only for masters and slaves, and only if the Pub/Sub connection does not exist yet */
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
        /* Start the Pub/Sub connection to the address of ri */
        link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
         /* TLS initialization failed */
        if (!link->pc->err && server.tls_replication &&
                (instanceLinkNegotiateTLS(link->pc) == C_ERR)) {
            sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #Failed to initialize TLS");
        } 
        /* The connection attempt failed */
        else if (link->pc->err) {
            sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                link->pc->errstr);
            instanceLinkCloseConnection(link,link->pc);
        }
        /* Otherwise the connection attempt is under way */ 
        else {
            int retval;

            link->pc_conn_time = mstime();
            link->pc->data = link;
            redisAeAttach(server.el,link->pc);
            redisAsyncSetConnectCallback(link->pc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->pc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->pc);
            sentinelSetClientName(ri,link->pc,"pubsub");
            /* Now we subscribe to the Sentinels "Hello" channel. */
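            /* Send the command that subscribes to the __sentinel__:hello channel and
               register the reply callback. sentinelReceiveHelloMessages handles the
               messages arriving on that channel, which is how other Sentinels
               monitoring the same master are discovered.
            */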
            retval = redisAsyncCommand(link->pc,
                sentinelReceiveHelloMessages, ri, "%s %s",
                sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
                SENTINEL_HELLO_CHANNEL);
            if (retval != C_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                instanceLinkCloseConnection(link,link->pc);
                return;
            }
        }
    }
    /* Clear the disconnected status only if we have both the connections
     * (or just the commands connection if this is a sentinel instance). */
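    /*
       Once the commands connection exists and, for every role except Sentinel,
       the Pub/Sub connection exists as well, the link counts as connected
       and link->disconnected is set to 0
     */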
    if (link->cc && (ri->flags & SRI_SENTINEL || link->pc))
        link->disconnected = 0;
}

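About redisAsyncConnectBind

redisAsyncConnectBind() is the asynchronous connect function of hiredis, the official C client for Redis. When it returns without error, redisAeAttach() must be called to associate the connection context with the server's event loop (ae); hiredis provides adapters for several event libraries, including ae, libev, libevent and libuv. Attaching registers the hooks that install the read and write event handlers for the connection. After that, redisAsyncSetConnectCallback() and redisAsyncSetDisconnectCallback() set the callbacks that run when the connection is established and when it is closed. Why is this needed? Because the connection is asynchronous: the connect call returns before the connection is actually established, so the outcome can only be reported through callbacks.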

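About the connections between the different roles

When Sentinel connects to a master or a slave it creates both a commands connection and a Pub/Sub connection, but when it connects to another Sentinel it creates only the commands connection. Sentinel needs the channel messages received through masters and slaves to discover Sentinels it does not know yet, which is why the Pub/Sub connection is required there; Sentinels that already know each other only need the commands connection to communicate.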
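The rule above is what the final condition of sentinelReconnectInstance encodes. A minimal sketch of it, with flag values chosen for illustration and hypothetical function names:

```c
/* Role flags; the bit positions are chosen for illustration only. */
#define SRI_MASTER   (1<<0)
#define SRI_SLAVE    (1<<1)
#define SRI_SENTINEL (1<<2)

/* Only masters and slaves get a Pub/Sub connection. */
int needs_pubsub(int flags) {
    return (flags & (SRI_MASTER|SRI_SLAVE)) != 0;
}

/* Mirrors "link->cc && (ri->flags & SRI_SENTINEL || link->pc)":
 * a Sentinel link is complete with cc alone, the others need cc and pc. */
int link_is_connected(int flags, int has_cc, int has_pc) {
    return has_cc && ((flags & SRI_SENTINEL) || has_pc);
}
```

A master with only the commands connection is therefore still treated as disconnected, and the next timer run tries to create the missing Pub/Sub connection.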

Three actions follow the creation of the commands connection

sentinelSendAuthIfNeeded sends the AUTH command to authenticate. The reply callback it registers is sentinelDiscardReplyCallback, which discards the reply and executes link->pending_commands--;

sentinelSetClientName sends CLIENT SETNAME to name the connection. It also registers sentinelDiscardReplyCallback as the reply callback;

sentinelSendPing sends PING to probe the state of the connection. The reply callback it registers is sentinelPingReplyCallback, which updates the link's interaction timestamps according to the content of the reply.

Three actions follow the creation of the Pub/Sub connection

sentinelSendAuthIfNeeded sends the AUTH command to authenticate. The reply callback it registers is sentinelDiscardReplyCallback, which discards the reply and executes link->pending_commands--;

sentinelSetClientName sends CLIENT SETNAME to name the connection. It also registers sentinelDiscardReplyCallback as the reply callback;

redisAsyncCommand sends SUBSCRIBE to subscribe to the __sentinel__:hello channel. The reply callback it registers is sentinelReceiveHelloMessages, which processes the hello messages arriving through the instance (master or slave) to learn about other Sentinels and the master configuration they advertise.

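About whether the connection has been established

Once the required connections exist, the disconnected flag is cleared to mark the link as connected. On later runs the function first checks that flag: if the link is disconnected it re-creates the missing connections, and if not it returns immediately without doing anything.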

2.4.2.1.1.1 The instanceLink structure


/* The link to a sentinelRedisInstance. When we have the same set of Sentinels
 * monitoring many masters, we have different instances representing the
 * same Sentinels, one per master, and we need to share the hiredis connections
 * among them. Otherwise if 5 Sentinels are monitoring 100 masters we create
 * 500 outgoing connections instead of 5.
 *
 * So this structure represents a reference counted link in terms of the two
 * hiredis connections for commands and Pub/Sub, and the fields needed for
 * failure detection, since the ping/pong time are now local to the link: if
 * the link is available, the instance is available. This way we don't just
 * have 5 connections instead of 500, we also send 5 pings instead of 500.
 *
 * Links are shared only for Sentinels: master and slave instances have
 * a link with refcount = 1, always. */
typedef struct instanceLink {
    int refcount;          /* Number of sentinelRedisInstance owners. */
    int disconnected;      /* Non-zero if we need to reconnect cc or pc. */
    int pending_commands;  /* Number of commands sent waiting for a reply. */
    redisAsyncContext *cc; /* Hiredis context for commands. */
    redisAsyncContext *pc; /* Hiredis context for Pub / Sub. */
    mstime_t cc_conn_time; /* cc connection time. */
    mstime_t pc_conn_time; /* pc connection time. */
    mstime_t pc_last_activity; /* Last time we received any message. */
    mstime_t last_avail_time; /* Last time the instance replied to ping with
                                 a reply we consider valid. */
    mstime_t act_ping_time;   /* Time at which the last pending ping (no pong
                                 received after it) was sent. This field is
                                 set to 0 when a pong is received, and set again
                                 to the current time if the value is 0 and a new
                                 ping is sent. */
    mstime_t last_ping_time;  /* Time at which we sent the last ping. This is
                                 only used to avoid sending too many pings
                                 during failure. Idle time is computed using
                                 the act_ping_time field. */
    mstime_t last_pong_time;  /* Last time the instance replied to ping,
                                 whatever the reply was. That's used to check
                                 if the link is idle and must be reconnected. */
    mstime_t last_reconn_time;  /* Last reconnection attempt performed when
                                   the link was down. */
} instanceLink;
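The sharing described in the comment above comes down to ordinary reference counting. The sketch below shows the idea with a stripped-down structure; shared_link, link_create, link_share and link_release are hypothetical names, not the functions Sentinel uses, and plain malloc/free stands in for the Redis allocator.

```c
#include <stdlib.h>

/* A stripped-down stand-in for instanceLink. */
typedef struct shared_link {
    int refcount;     /* number of instances that own this link */
    int disconnected; /* non-zero until the connections are created */
} shared_link;

/* A new link starts with one owner and no connections. */
shared_link *link_create(void) {
    shared_link *l = malloc(sizeof(*l));
    if (l == NULL) return NULL;
    l->refcount = 1;
    l->disconnected = 1;
    return l;
}

/* Another instance representing the same Sentinel reuses the link. */
shared_link *link_share(shared_link *l) {
    l->refcount++;
    return l;
}

/* Drops one owner; frees the link when the last owner is gone.
 * Returns the number of owners left. */
int link_release(shared_link *l) {
    int left = --l->refcount;
    if (left == 0) free(l);
    return left;
}
```

With this scheme, the same Sentinel seen through 100 masters still costs one link, which is the "5 connections instead of 500" point made in the comment.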

2.4.2.1.1.2 The redisAsyncContext structure

/* Context for an async connection to Redis */
typedef struct redisAsyncContext {
    /* Hold the regular context, so it can be realloc'ed. */
    redisContext c;

    /* Setup error flags so they can be used directly. */
    int err;
    char *errstr;

    /* Not used by hiredis */
    void *data;

    /* Event library data and hooks */
    struct {
        void *data;

        /* Hooks that are called when the library expects to start
         * reading/writing. These functions should be idempotent. */
        void (*addRead)(void *privdata);
        void (*delRead)(void *privdata);
        void (*addWrite)(void *privdata);
        void (*delWrite)(void *privdata);
        void (*cleanup)(void *privdata);
        void (*scheduleTimer)(void *privdata, struct timeval tv);
    } ev;

    /* Called when either the connection is terminated due to an error or per
     * user request. The status is set accordingly (REDIS_OK, REDIS_ERR). */
    redisDisconnectCallback *onDisconnect;

    /* Called when the first write event was received. */
    redisConnectCallback *onConnect;

    /* Regular command callbacks */
    redisCallbackList replies;

    /* Address used for connect() */
    struct sockaddr *saddr;
    size_t addrlen;

    /* Subscription callbacks */
    struct {
        redisCallbackList invalid;
        struct dict *channels;
        struct dict *patterns;
    } sub;
} redisAsyncContext;

2.4.2.1.1.3 The redisContext structure

/* Context for a connection to Redis */
typedef struct redisContext {
    const redisContextFuncs *funcs;   /* Function table */

    int err; /* Error flags, 0 when there is no error */
    char errstr[128]; /* String representation of error when applicable */
    redisFD fd;
    int flags;
    char *obuf; /* Write buffer */
    redisReader *reader; /* Protocol reader */

    enum redisConnectionType connection_type;
    struct timeval *timeout;

    struct {
        char *host;
        char *source_addr;
        int port;
    } tcp;

    struct {
        char *path;
    } unix_sock;

    /* For non-blocking connect */
    struct sockadr *saddr;
    size_t addrlen;

    /* Additional private data for hiredis addons such as SSL */
    void *privdata;
} redisContext;

2.4.2.1.1.4 Attaching the Redis async connection context to the event loop

static int redisAeAttach(aeEventLoop *loop, redisAsyncContext *ac) {
    redisContext *c = &(ac->c);
    redisAeEvents *e;

    /* Nothing should be attached when something is already attached */
    if (ac->ev.data != NULL)
        return C_ERR;

    /* Create container for context and r/w events */
    e = (redisAeEvents*)zmalloc(sizeof(*e));
    e->context = ac;
    e->loop = loop;
    e->fd = c->fd;
    e->reading = e->writing = 0;

    /* Register functions to start/stop listening for events */
    ac->ev.addRead = redisAeAddRead;
    ac->ev.delRead = redisAeDelRead;
    ac->ev.addWrite = redisAeAddWrite;
    ac->ev.delWrite = redisAeDelWrite;
    ac->ev.cleanup = redisAeCleanup;
    ac->ev.data = e;

    return C_OK;
}

2.4.2.1.2 sentinelSendPeriodicCommands - sending the monitoring commands

/* Send periodic PING, INFO, and PUBLISH to the Hello channel to
 * the specified master or slave instance. */
void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
    mstime_t now = mstime();
    mstime_t info_period, ping_period;
    int retval;

    /* Return ASAP if we have already a PING or INFO already pending, or
     * in the case the instance is not properly connected. */
    if (ri->link->disconnected) return;

    /* For INFO, PING, PUBLISH that are not critical commands to send we
     * also have a limit of SENTINEL_MAX_PENDING_COMMANDS. We don't
     * want to use a lot of memory just because a link is not working
     * properly (note that anyway there is a redundant protection about this,
     * that is, the link will be disconnected and reconnected if a long
     * timeout condition is detected. */
    if (ri->link->pending_commands >=
        SENTINEL_MAX_PENDING_COMMANDS * ri->link->refcount) return;

    /* If this is a slave of a master in O_DOWN condition we start sending
     * it INFO every second, instead of the usual SENTINEL_INFO_PERIOD
     * period. In this state we want to closely monitor slaves in case they
     * are turned into masters by another Sentinel, or by the sysadmin.
     *
     * Similarly we monitor the INFO output more often if the slave reports
     * to be disconnected from the master, so that we can have a fresh
     * disconnection time figure. */
    if ((ri->flags & SRI_SLAVE) &&
        ((ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS)) ||
         (ri->master_link_down_time != 0)))
    {
        info_period = 1000;
    } else {
        info_period = SENTINEL_INFO_PERIOD;
    }

    /* We ping instances every time the last received pong is older than
     * the configured 'down-after-milliseconds' time, but every second
     * anyway if 'down-after-milliseconds' is greater than 1 second. */
    ping_period = ri->down_after_period;
    if (ping_period > SENTINEL_PING_PERIOD) ping_period = SENTINEL_PING_PERIOD;

    /* Send INFO to masters and slaves, not sentinels. */
    if ((ri->flags & SRI_SENTINEL) == 0 &&
        (ri->info_refresh == 0 ||
        (now - ri->info_refresh) > info_period))
    {
        retval = redisAsyncCommand(ri->link->cc,
            sentinelInfoReplyCallback, ri, "%s",
            sentinelInstanceMapCommand(ri,"INFO"));
        if (retval == C_OK) ri->link->pending_commands++;
    }

    /* Send PING to all the three kinds of instances. */
    if ((now - ri->link->last_pong_time) > ping_period &&
               (now - ri->link->last_ping_time) > ping_period/2) {
        sentinelSendPing(ri);
    }

    /* PUBLISH hello messages to all the three kinds of instances. */
    if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
        sentinelSendHello(ri);
    }
}
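The timing rules in this function can be pulled out into three small helpers. This is only a sketch: the helper names are hypothetical, PING_PERIOD_MS follows the SENTINEL_PING_PERIOD value quoted earlier, and INFO_PERIOD_MS is an assumed value for SENTINEL_INFO_PERIOD.

```c
typedef long long mstime_t;

#define PING_PERIOD_MS 1000  /* SENTINEL_PING_PERIOD as quoted earlier */
#define INFO_PERIOD_MS 10000 /* assumed value of SENTINEL_INFO_PERIOD */

/* PING period: down-after-milliseconds, capped at one second. */
mstime_t ping_period(mstime_t down_after) {
    return down_after > PING_PERIOD_MS ? PING_PERIOD_MS : down_after;
}

/* INFO period: one second for a slave whose master is in trouble or
 * whose replication link is down, the normal period otherwise. */
mstime_t info_period(int is_slave, int master_odown_or_failover,
                     mstime_t master_link_down_time) {
    if (is_slave && (master_odown_or_failover || master_link_down_time != 0))
        return 1000;
    return INFO_PERIOD_MS;
}

/* A PING goes out when the last pong is older than the period and the
 * last ping is older than half the period. */
int should_send_ping(mstime_t now, mstime_t last_pong, mstime_t last_ping,
                     mstime_t period) {
    return (now - last_pong) > period && (now - last_ping) > period / 2;
}
```

The second condition in should_send_ping is what keeps a silent instance from being flooded: even with no pong at all, pings are spaced at least half a period apart.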

2.4.2.1.3 sentinelCheckSubjectivelyDown - deciding whether an instance is subjectively down

/* ===================== SENTINEL availability checks ======================= */

/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->link->act_ping_time)
        elapsed = mstime() - ri->link->act_ping_time;
    else if (ri->link->disconnected)
        elapsed = mstime() - ri->link->last_avail_time;

    /* Check if we are in need for a reconnection of one of the
     * links, because we are detecting low activity.
     *
     * 1) Check if the command link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have a
     *    pending ping for more than half the timeout. */
    if (ri->link->cc &&
        (mstime() - ri->link->cc_conn_time) >
        SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        ri->link->act_ping_time != 0 && /* There is a pending ping... */
        /* The pending ping is delayed, and we did not receive
         * error replies as well. */
        (mstime() - ri->link->act_ping_time) > (ri->down_after_period/2) &&
        (mstime() - ri->link->last_pong_time) > (ri->down_after_period/2))
    {
        instanceLinkCloseConnection(ri->link,ri->link->cc);
    }

    /* 2) Check if the pubsub link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have no
     *    activity in the Pub/Sub channel for more than
     *    SENTINEL_PUBLISH_PERIOD * 3.
     */
    if (ri->link->pc &&
        (mstime() - ri->link->pc_conn_time) >
         SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        (mstime() - ri->link->pc_last_activity) > (SENTINEL_PUBLISH_PERIOD*3))
    {
        instanceLinkCloseConnection(ri->link,ri->link->pc);
    }

    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}
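The core of the S_DOWN decision is how the elapsed time is measured. The sketch below covers only the first condition (the instance is not replying) and leaves out the second one (a master that keeps reporting itself as a slave); the function names are hypothetical.

```c
typedef long long mstime_t;

/* How long the instance has been unreachable from our point of view:
 * time since the oldest unanswered PING if one is pending, otherwise
 * time since the link was last available if it is disconnected. */
mstime_t unreachable_for(mstime_t now, mstime_t act_ping_time,
                         int disconnected, mstime_t last_avail_time) {
    if (act_ping_time) return now - act_ping_time;
    if (disconnected) return now - last_avail_time;
    return 0;
}

/* First S_DOWN condition: no valid reply for longer than
 * down-after-milliseconds. */
int is_sdown(mstime_t elapsed, mstime_t down_after) {
    return elapsed > down_after;
}
```

With down-after-milliseconds set to 60000, for example, an instance whose oldest pending PING was sent 60001 ms ago is flagged, while one at exactly 60000 ms is not, because the comparison is strict.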

2.4.2.1.4 sentinelCheckObjectivelyDown - deciding whether a master is objectively down

/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    unsigned int quorum = 0, odown = 0;

    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */
        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {
        if ((master->flags & SRI_O_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            master->flags |= SRI_O_DOWN;
            master->o_down_since_time = mstime();
        }
    } else {
        if (master->flags & SRI_O_DOWN) {
            sentinelEvent(LL_WARNING,"-odown",master,"%@");
            master->flags &= ~SRI_O_DOWN;
        }
    }
}
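The vote counting above can be modelled with a plain array in place of the sentinels dictionary. In this sketch is_odown and its parameters are hypothetical; each array element stands for the SRI_MASTER_DOWN flag of one other Sentinel.

```c
#include <stddef.h>

/* Returns 1 when the master counts as objectively down.
 * self_sdown:      this Sentinel sees the master as subjectively down
 * others_say_down: one entry per other Sentinel, non-zero if it reported
 *                  the master as down
 * quorum:          the configured quorum for this master */
int is_odown(int self_sdown, const int *others_say_down, size_t n,
             unsigned int quorum) {
    unsigned int votes;

    /* Without our own S_DOWN the reports of others are not even counted. */
    if (!self_sdown) return 0;

    votes = 1; /* the current Sentinel */
    for (size_t i = 0; i < n; i++)
        if (others_say_down[i]) votes++;
    return votes >= quorum;
}
```

One consequence worth noticing: a Sentinel that can still reach the master never marks it O_DOWN, no matter how many other Sentinels report it as down.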

2.4.2.1.5 sentinelFailoverStateMachine - running the failover on a master

void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    serverAssert(ri->flags & SRI_MASTER);

    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}
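On the success path the states are visited in a fixed order. The sketch below lists that order with shortened, hypothetical names. The first three transitions are the ones set by the handler functions that follow; the last two (leaving WAIT_PROMOTION and RECONF_SLAVES) are set outside this switch, and their placement here reflects my reading of the source rather than anything shown in this section.

```c
/* Shortened stand-ins for the SENTINEL_FAILOVER_STATE_* constants. */
typedef enum {
    FS_NONE,
    FS_WAIT_START,
    FS_SELECT_SLAVE,
    FS_SEND_SLAVEOF_NOONE,
    FS_WAIT_PROMOTION,
    FS_RECONF_SLAVES,
    FS_UPDATE_CONFIG
} failover_state;

/* Next state when the current step succeeds. Aborts and timeouts,
 * which reset the failover, are not modelled. */
failover_state next_on_success(failover_state s) {
    switch (s) {
    case FS_WAIT_START:         return FS_SELECT_SLAVE;
    case FS_SELECT_SLAVE:       return FS_SEND_SLAVEOF_NOONE;
    case FS_SEND_SLAVEOF_NOONE: return FS_WAIT_PROMOTION;
    case FS_WAIT_PROMOTION:     return FS_RECONF_SLAVES;
    case FS_RECONF_SLAVES:      return FS_UPDATE_CONFIG;
    default:                    return s; /* FS_NONE and FS_UPDATE_CONFIG */
    }
}
```

FS_UPDATE_CONFIG has no case in the switch of sentinelFailoverStateMachine because it is handled in sentinelHandleDictOfRedisInstances, where the master is switched to the promoted slave.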

2.4.2.1.5.1 sentinelFailoverWaitStart - starting the failover

/* ---------------- Failover state machine implementation ------------------- */
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader;
    int isleader;

    /* Check if we are the leader for the failover epoch. */
    leader = sentinelGetLeader(ri, ri->failover_epoch);
    isleader = leader && strcasecmp(leader,sentinel.myid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and it is not a forced failover via
     * SENTINEL FAILOVER, then I can't continue with the failover. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        int election_timeout = SENTINEL_ELECTION_TIMEOUT;

        /* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
         * and the configured failover timeout. */
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;
        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }
    sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
    if (sentinel.simfailure_flags & SENTINEL_SIMFAILURE_CRASH_AFTER_ELECTION)
        sentinelSimFailureCrash();
    ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
    ri->failover_state_change_time = mstime();
    sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");
}
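The abort rule for a Sentinel that did not win the election can be isolated as follows. ELECTION_TIMEOUT_MS is an assumed value for SENTINEL_ELECTION_TIMEOUT, and both function names are hypothetical.

```c
typedef long long mstime_t;

#define ELECTION_TIMEOUT_MS 10000 /* assumed value of SENTINEL_ELECTION_TIMEOUT */

/* The election timeout is the smaller of the fixed election timeout
 * and the configured failover-timeout. */
mstime_t election_timeout(mstime_t failover_timeout) {
    return failover_timeout < ELECTION_TIMEOUT_MS ? failover_timeout
                                                  : ELECTION_TIMEOUT_MS;
}

/* Returns 1 when a failover still waiting for its election must be aborted:
 * we are not the leader, the failover was not forced, and the election
 * timeout has expired since the failover started. */
int should_abort_not_elected(int isleader, int forced, mstime_t now,
                             mstime_t failover_start_time,
                             mstime_t failover_timeout) {
    if (isleader || forced) return 0;
    return now - failover_start_time > election_timeout(failover_timeout);
}
```

With the failover-timeout of 180000 from the sample configuration, the fixed election timeout is the one that applies, so a Sentinel that has not been elected gives up after about ten seconds under the assumed constant.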

2.4.2.1.5.2 sentinelFailoverSelectSlave - selecting the slave to promote

void sentinelFailoverSelectSlave(sentinelRedisInstance *ri) {
    sentinelRedisInstance *slave = sentinelSelectSlave(ri);

    /* We don't handle the timeout in this state as the function aborts
     * the failover or go forward in the next state. */
    if (slave == NULL) {
        sentinelEvent(LL_WARNING,"-failover-abort-no-good-slave",ri,"%@");
        sentinelAbortFailover(ri);
    } else {
        sentinelEvent(LL_WARNING,"+selected-slave",slave,"%@");
        slave->flags |= SRI_PROMOTED;
        ri->promoted_slave = slave;
        ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
        ri->failover_state_change_time = mstime();
        sentinelEvent(LL_NOTICE,"+failover-state-send-slaveof-noone",
            slave, "%@");
    }
}

2.4.2.1.5.3 sentinelFailoverSendSlaveOfNoOne - turning the slave into a master

void sentinelFailoverSendSlaveOfNoOne(sentinelRedisInstance *ri) {
    int retval;

    /* We can't send the command to the promoted slave if it is now
     * disconnected. Retry again and again with this state until the timeout
     * is reached, then abort the failover. */
    if (ri->promoted_slave->link->disconnected) {
        if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }

    /* Send SLAVEOF NO ONE command to turn the slave into a master.
     * We actually register a generic callback for this command as we don't
     * really care about the reply. We check if it worked indirectly observing
     * if INFO returns a different role (master instead of slave). */
    retval = sentinelSendSlaveOf(ri->promoted_slave,NULL,0);
    if (retval != C_OK) return;
    sentinelEvent(LL_NOTICE, "+failover-state-wait-promotion",
        ri->promoted_slave,"%@");
    ri->failover_state = SENTINEL_FAILOVER_STATE_WAIT_PROMOTION;
    ri->failover_state_change_time = mstime();
}

2.4.2.1.5.4 sentinelFailoverReconfNextSlave - making the slaves replicate from the new master

/* Send SLAVEOF with the new master address to all the remaining slaves that
 * still don't appear to have the configuration updated. */
void sentinelFailoverReconfNextSlave(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    int in_progress = 0;

    di = dictGetIterator(master->slaves);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *slave = dictGetVal(de);

        if (slave->flags & (SRI_RECONF_SENT|SRI_RECONF_INPROG))
            in_progress++;
    }
    dictReleaseIterator(di);

    di = dictGetIterator(master->slaves);
    while(in_progress < master->parallel_syncs &&
          (de = dictNext(di)) != NULL)
    {
        sentinelRedisInstance *slave = dictGetVal(de);
        int retval;

        /* Skip the promoted slave, and already configured slaves. */
        if (slave->flags & (SRI_PROMOTED|SRI_RECONF_DONE)) continue;

        /* If too much time elapsed without the slave moving forward to
         * the next state, consider it reconfigured even if it is not.
         * Sentinels will detect the slave as misconfigured and fix its
         * configuration later. */
        if ((slave->flags & SRI_RECONF_SENT) &&
            (mstime() - slave->slave_reconf_sent_time) >
            SENTINEL_SLAVE_RECONF_TIMEOUT)
        {
            sentinelEvent(LL_NOTICE,"-slave-reconf-sent-timeout",slave,"%@");
            slave->flags &= ~SRI_RECONF_SENT;
            slave->flags |= SRI_RECONF_DONE;
        }

        /* Nothing to do for instances that are disconnected or already
         * in RECONF_SENT state. */
        if (slave->flags & (SRI_RECONF_SENT|SRI_RECONF_INPROG)) continue;
        if (slave->link->disconnected) continue;

        /* Send SLAVEOF with the address of the promoted slave. */
        retval = sentinelSendSlaveOf(slave,
                master->promoted_slave->addr->ip,
                master->promoted_slave->addr->port);
        if (retval == C_OK) {
            slave->flags |= SRI_RECONF_SENT;
            slave->slave_reconf_sent_time = mstime();
            sentinelEvent(LL_NOTICE,"+slave-reconf-sent",slave,"%@");
            in_progress++;
        }
    }
    dictReleaseIterator(di);

    /* Check if all the slaves are reconfigured and handle timeout. */
    sentinelFailoverDetectEnd(master);
}
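
The throttling in the function above can be sketched in isolation. This is a minimal sketch, not the Sentinel implementation: the flag bits and the function name reconf_next are hypothetical stand-ins for the SRI_* flags and for sentinelFailoverReconfNextSlave(), and setting a flag stands in for sending SLAVEOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the real SRI_* values are
 * defined in sentinel.c. */
#define RECONF_SENT   (1<<0)
#define RECONF_INPROG (1<<1)
#define RECONF_DONE   (1<<2)
#define PROMOTED      (1<<3)

/* Select replicas to reconfigure so that at most `parallel_syncs` of them
 * are in flight at once. Returns how many were newly selected. */
int reconf_next(int *flags, int n, int parallel_syncs) {
    int in_progress = 0, started = 0;

    assert(flags != NULL && n >= 0);

    /* First pass: count replicas that are already being reconfigured. */
    for (int i = 0; i < n; i++)
        if (flags[i] & (RECONF_SENT|RECONF_INPROG)) in_progress++;

    /* Second pass: start new ones while there is budget left. */
    for (int i = 0; i < n && in_progress < parallel_syncs; i++) {
        /* Skip the promoted replica and replicas that are already done. */
        if (flags[i] & (PROMOTED|RECONF_DONE)) continue;
        /* Skip replicas that are already in flight. */
        if (flags[i] & (RECONF_SENT|RECONF_INPROG)) continue;
        flags[i] |= RECONF_SENT;   /* stands in for sending SLAVEOF */
        in_progress++;
        started++;
    }
    return started;
}
```

With parallel_syncs set to 1, a second call selects nothing until the first replica has moved to RECONF_DONE, which is the "one replica at a time" behaviour that the parallel-syncs setting describes.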

2.4.2.1.6 sentinelAskMasterStateToOtherSentinels - update the master's state

/* If we think the master is down, we start sending
 * SENTINEL IS-MASTER-DOWN-BY-ADDR requests to other sentinels
 * in order to get the replies that allow to reach the quorum
 * needed to mark the master in ODOWN state and trigger a failover. */
#define SENTINEL_ASK_FORCED (1<<0)
void sentinelAskMasterStateToOtherSentinels(sentinelRedisInstance *master, int flags) {
    dictIterator *di;
    dictEntry *de;

    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        mstime_t elapsed = mstime() - ri->last_master_down_reply_time;
        char port[32];
        int retval;

        /* If the master state from other sentinel is too old, we clear it. */
        if (elapsed > SENTINEL_ASK_PERIOD*5) {
            ri->flags &= ~SRI_MASTER_DOWN;
            sdsfree(ri->leader);
            ri->leader = NULL;
        }

        /* Only ask if master is down to other sentinels if:
         *
         * 1) We believe it is down, or there is a failover in progress.
         * 2) Sentinel is connected.
         * 3) We did not receive the info within SENTINEL_ASK_PERIOD ms. */
        if ((master->flags & SRI_S_DOWN) == 0) continue;
        if (ri->link->disconnected) continue;
        if (!(flags & SENTINEL_ASK_FORCED) &&
            mstime() - ri->last_master_down_reply_time < SENTINEL_ASK_PERIOD)
            continue;

        /* Ask */
        ll2string(port,sizeof(port),master->addr->port);
        retval = redisAsyncCommand(ri->link->cc,
                    sentinelReceiveIsMasterDownReply, ri,
                    "%s is-master-down-by-addr %s %s %llu %s",
                    sentinelInstanceMapCommand(ri,"SENTINEL"),
                    master->addr->ip, port,
                    sentinel.current_epoch,
                    (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
                    sentinel.myid : "*");
        if (retval == C_OK) ri->link->pending_commands++;
    }
    dictReleaseIterator(di);
}
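
Two details of the function above are easy to miss: the three conditions that gate each request, and the last argument, which is "*" for a plain state query and the Sentinel's own run ID once a failover has started (a vote request). The sketch below isolates both. The names should_ask and format_is_master_down are hypothetical, and ASK_PERIOD_MS is assumed to mirror SENTINEL_ASK_PERIOD.

```c
#include <assert.h>
#include <stdio.h>

#define ASK_PERIOD_MS 1000   /* assumption: mirrors SENTINEL_ASK_PERIOD */
#define ASK_FORCED    (1<<0) /* same bit as SENTINEL_ASK_FORCED above */

/* Decide whether to query one peer Sentinel, following the three
 * conditions listed in the comment of the original function. */
int should_ask(int master_sdown, int link_disconnected, int flags,
               long long since_last_reply_ms) {
    if (!master_sdown) return 0;
    if (link_disconnected) return 0;
    if (!(flags & ASK_FORCED) && since_last_reply_ms < ASK_PERIOD_MS)
        return 0;
    return 1;
}

/* Format the request text into buf and return the snprintf() result.
 * `failover_started` mirrors failover_state > SENTINEL_FAILOVER_STATE_NONE. */
int format_is_master_down(char *buf, size_t len, const char *ip, int port,
                          unsigned long long epoch, int failover_started,
                          const char *myid) {
    assert(buf != NULL && ip != NULL && myid != NULL);
    return snprintf(buf, len,
                    "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    ip, port, epoch, failover_started ? myid : "*");
}
```

Before a failover starts, the formatted request ends in "*"; afterwards it ends in the run ID, so the same command doubles as a request for a vote.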

2.4.2.2 sentinelHandleDictOfRedisInstances - handle the master switch

/* Perform scheduled operations for all the instances in the dictionary.
 * Recursively call the function against dictionaries of slaves. */
void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        sentinelHandleRedisInstance(ri);
        if (ri->flags & SRI_MASTER) {
            sentinelHandleDictOfRedisInstances(ri->slaves);
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                switch_to_promoted = ri;
            }
        }
    }
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}

2.4.3 sentinelRunPendingScripts - run script jobs

/* Run pending scripts if we are not already at max number of running
 * scripts. */
void sentinelRunPendingScripts(void) {
    listNode *ln;
    listIter li;
    mstime_t now = mstime();

    /* Find jobs that are not running and run them, from the top to the
     * tail of the queue, so we run older jobs first. */
    listRewind(sentinel.scripts_queue,&li);    
    /* Stop once the maximum number of scripts is already running. */
    while (sentinel.running_scripts < SENTINEL_SCRIPT_MAX_RUNNING &&
           (ln = listNext(&li)) != NULL)
    {
        sentinelScriptJob *sj = ln->value;
        pid_t pid;

        /* Skip if already running. */
        if (sj->flags & SENTINEL_SCRIPT_RUNNING) continue;

        /* Skip if it's a retry, but not enough time has elapsed. */
        if (sj->start_time && sj->start_time > now) continue;
        /* Set the running flag and record the start time. */
        sj->flags |= SENTINEL_SCRIPT_RUNNING;
        sj->start_time = mstime();
        /* Increment the attempt counter. */
        sj->retry_num++;
        /* Fork a child process to run the script. */
        pid = fork();
        /* fork() failed: report the error. */
        if (pid == -1) {
            /* Parent (fork error).
             * We report fork errors as signal 99, in order to unify the
             * reporting with other kind of errors. */
            sentinelEvent(LL_WARNING,"-script-error",NULL,
                          "%s %d %d", sj->argv[0], 99, 0);
            sj->flags &= ~SENTINEL_SCRIPT_RUNNING;
            sj->pid = 0;
        } 
        /* Code executed by the child process. */
        else if (pid == 0) 
        {
            /* Child */
            execve(sj->argv[0],sj->argv,environ);
            /* If we are here an error occurred. */
            _exit(2); /* Don't retry execution. */
        } 
        /* Parent: record the script's pid and bump the count of running scripts. */
        else {
            sentinel.running_scripts++;
            sj->pid = pid;
            /* Emit the event. */
            sentinelEvent(LL_DEBUG,"+script-child",NULL,"%ld",(long)pid);
        }
    }
}

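Running a script requires forking a child process.

Child process: executes a script job that is not already running and whose scheduled start time has arrived.

Parent process: updates the bookkeeping for the script, such as the number of running scripts and the pid of the child executing it.

Once the parent has updated this information, it moves on to the next function, sentinelCollectTerminatedScripts().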
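
The fork/exec split described above can be tried outside Redis with a small helper. This is a minimal sketch under stated assumptions: run_script is a hypothetical name, and unlike Sentinel, which returns to its event loop and reaps children later with wait3(WNOHANG), the helper blocks in waitpid() to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork a child, exec argv[0] in it, and wait for it to terminate.
 * Returns the child's exit code, or -1 if fork() or waitpid() failed or
 * the child was terminated by a signal. */
int run_script(char *const argv[]) {
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid == -1) return -1;          /* fork error, no child exists */
    if (pid == 0) {
        /* Child: replace the process image with the script. */
        execve(argv[0], argv, environ);
        _exit(2);                      /* only reached if execve() failed */
    }

    /* Parent: wait for this specific child, retrying on EINTR. */
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR) return -1;

    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return -1;                         /* terminated by a signal */
}
```

Passing { "/bin/sh", "-c", "exit 1", NULL } yields 1, the "please retry" code, while a path that cannot be executed yields 2 because the child falls through to _exit(2), matching the "Don't retry execution" comment in the original.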

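2.4.4 sentinelCollectTerminatedScripts - clean up finished scripts

If a script run in a child process finished successfully, it is removed from the script queue.
If the script was terminated by a signal or exited with code 1, and the retry limit has not been reached, its next start time is set and it runs again in a later cycle.
If the script failed and cannot be retried, it is also removed from the script queue.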

/* Check for scripts that terminated, and remove them from the queue if the
 * script terminated successfully. If instead the script was terminated by
 * a signal, or returned exit code "1", it is scheduled to run again if
 * the max number of retries did not already elapsed. */
void sentinelCollectTerminatedScripts(void) {
    int statloc;
    pid_t pid;
    /* Reap exited children; with WNOHANG, wait3() returns immediately if none has exited. */
    while ((pid = wait3(&statloc,WNOHANG,NULL)) > 0) {
        int exitcode = WEXITSTATUS(statloc);
        int bysignal = 0;
        listNode *ln;
        sentinelScriptJob *sj;
        /* Get the signal that terminated the script, if any. */
        if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);
        sentinelEvent(LL_DEBUG,"-script-child",NULL,"%ld %d %d",
            (long)pid, exitcode, bysignal);
        /* Look up the running script's list node by pid. */
        ln = sentinelGetScriptListNodeByPid(pid);
        if (ln == NULL) {
            serverLog(LL_WARNING,"wait3() returned a pid (%ld) we can't find in our scripts execution queue!", (long)pid);
            continue;
        }
        sj = ln->value;

        /* If the script was terminated by a signal or returns an
         * exit code of "1" (that means: please retry), we reschedule it
         * if the max number of retries is not already reached. */
        /* Terminated by a signal or exit code 1, and the retry limit has not been reached. */
        if ((bysignal || exitcode == 1) &&
            sj->retry_num != SENTINEL_SCRIPT_MAX_RETRY)
        {
            /* Clear the running flag. */
            sj->flags &= ~SENTINEL_SCRIPT_RUNNING;
            sj->pid = 0;
            /* Schedule the next execution time. */
            sj->start_time = mstime() +
                             sentinelScriptRetryDelay(sj->retry_num);
        }
        /* The script succeeded or cannot be retried. */
        else
        {
            /* Otherwise let's remove the script, but log the event if the
             * execution did not terminate in the best of the ways. */
            /* Emit a script-error event. */
            if (bysignal || exitcode != 0) {
                sentinelEvent(LL_WARNING,"-script-error",NULL,
                              "%s %d %d", sj->argv[0], bysignal, exitcode);
            }
            /* Remove the script from the queue. */
            listDelNode(sentinel.scripts_queue,ln);
            /* Free the script job and all associated data. */
            sentinelReleaseScriptJob(sj);
        }
        /* One fewer script is running now. */
        sentinel.running_scripts--;
    }
}
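
The retry-or-remove decision made by the function above fits in a few lines. The following is a sketch under assumptions rather than the Sentinel code: classify_exit and retry_delay_ms are hypothetical names, MAX_RETRY and RETRY_DELAY_MS are assumed to mirror SENTINEL_SCRIPT_MAX_RETRY and SENTINEL_SCRIPT_RETRY_DELAY, and the doubling backoff is an assumption about sentinelScriptRetryDelay(), whose body is not shown in this section.

```c
#include <assert.h>

#define MAX_RETRY      10      /* assumption: SENTINEL_SCRIPT_MAX_RETRY */
#define RETRY_DELAY_MS 30000   /* assumption: SENTINEL_SCRIPT_RETRY_DELAY */

enum script_action { SCRIPT_REMOVE_OK, SCRIPT_REMOVE_ERROR, SCRIPT_RETRY };

/* Map a child's termination status to what happens to its queue entry. */
enum script_action classify_exit(int bysignal, int exitcode, int retry_num) {
    assert(retry_num >= 0);
    /* Signal or exit code 1 means "please retry", unless retries ran out. */
    if ((bysignal || exitcode == 1) && retry_num != MAX_RETRY)
        return SCRIPT_RETRY;
    /* Any other failure is logged and the job is dropped. */
    if (bysignal || exitcode != 0) return SCRIPT_REMOVE_ERROR;
    return SCRIPT_REMOVE_OK;
}

/* Assumed backoff: the delay doubles with every further retry. */
long long retry_delay_ms(int retry_num) {
    long long delay = RETRY_DELAY_MS;
    while (retry_num-- > 1) delay *= 2;
    return delay;
}
```

Under these assumptions the first retry waits 30 s, the second 60 s, and the third 120 s, while an exit code of 2 removes the job immediately without a retry.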

2.4.4 sentinelKillTimedoutScripts - kill timed-out scripts

/* Kill scripts in timeout, they'll be collected by the
 * sentinelCollectTerminatedScripts() function. */
/*
Sentinel lets a script run for at most 60 s; a script that exceeds this limit is killed.
*/
void sentinelKillTimedoutScripts(void) {
    listNode *ln;
    listIter li;
    mstime_t now = mstime();
    /* Iterate over the script queue. */
    listRewind(sentinel.scripts_queue,&li);
    /* Act on scripts that are running and have been running for more than 60 s. */
    while ((ln = listNext(&li)) != NULL) {
        sentinelScriptJob *sj = ln->value;

        if (sj->flags & SENTINEL_SCRIPT_RUNNING &&
            (now - sj->start_time) > SENTINEL_SCRIPT_MAX_RUNTIME)
        {
            /* Emit a script-timeout event. */
            sentinelEvent(LL_WARNING,"-script-timeout",NULL,"%s %ld",
                sj->argv[0], (long)sj->pid);
            /* Kill the child process running the script. */
            kill(sj->pid,SIGKILL);
        }
    }
}
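
Note that the function above only sends the signal; the killed child is still reaped by sentinelCollectTerminatedScripts(), where it arrives with bysignal set and therefore follows the retry path. The sketch below shows what the parent observes after a SIGKILL. The names timed_out and kill_and_collect are hypothetical, and the child simply blocks in pause() to stand in for a hung script.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same comparison as in the original: strictly more than the limit. */
int timed_out(long long now_ms, long long start_ms, long long max_runtime_ms) {
    assert(max_runtime_ms >= 0);
    return (now_ms - start_ms) > max_runtime_ms;
}

/* Fork a child that never exits on its own, kill it with SIGKILL, and
 * return the signal the parent sees in the wait status (-1 on error,
 * 0 if the child was not terminated by a signal). */
int kill_and_collect(void) {
    int status;
    pid_t pid = fork();

    if (pid == -1) return -1;
    if (pid == 0) {
        for (;;) pause();              /* child: block until signalled */
    }
    if (kill(pid, SIGKILL) == -1) return -1;
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

kill_and_collect() returns SIGKILL, which is the value that would end up in bysignal in the collector.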

2.4.5 Adjusting server.hz to guard against split brain

Official documentation: split brain

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

Configuration: quorum = 1
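// M1 is the master
// R1 is the replica
// S1 and S2 are Sentinels

In this setup, if the master M1 fails, R1 is promoted to master, because the two Sentinels can agree on the failure
with the configured quorum = 1 and then carry out the failover, as shown below: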
+----+           +------+
| M1 |----//-----| [M1] |
| S1 |           | S2   |
+----+           +------+

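Once the failover has been carried out, two masters exist in a perfectly symmetrical way. Clients may write to either
master indefinitely, which can have serious consequences such as contention for server resources, contention for
application services, and data corruption.

For this reason, it is best not to deploy this way.

To limit a related problem, split-brain voting during leader election, sentinelTimer(), the main function of Sentinel
mode, changes the frequency of the server's periodic tasks after every run: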

server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;

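Continuously changing the frequency of Redis's periodic tasks desynchronizes the Sentinels from one another. This
non-determinism prevents Sentinels that started at the same time from staying in lockstep and asking to be voted for
at the same moment again and again, which would likely leave the election without a winner because of split-brain voting.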
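
The effect of the randomized frequency can be checked with a small sketch. It assumes that DEFAULT_HZ mirrors CONFIG_DEFAULT_HZ with a value of 10; next_hz and period_ms are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_HZ 10   /* assumption: mirrors CONFIG_DEFAULT_HZ */

/* Same expression as in sentinelTimer(): a value in [10, 19]. */
int next_hz(void) {
    return DEFAULT_HZ + rand() % DEFAULT_HZ;
}

/* Timer period in milliseconds for a given frequency (integer division). */
int period_ms(int hz) {
    assert(hz > 0);
    return 1000 / hz;
}
```

Under this assumption the timer period varies between 100 ms (hz = 10) and 52 ms (hz = 19) from one run to the next, so two Sentinels started together soon drift apart.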

