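Contents
I. Choosing which nodes to communicate with
1. Every 0.1 s, if any other node is disconnected, try to reconnect to it
2. Every 1 s, sample 5 random nodes and ping the one whose last PONG is oldest
3. Every 0.1 s, if a node has gone more than cluster-node-timeout/2 without a successful exchange, send it a ping
II. Information carried in a gossip ping
1. The sending node's own information
2. Gossip about 1/10 of the other nodes; if 1/10 is fewer than 3, gossip about at least 3 other nodes is included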
All of this happens in clusterCron(), which is executed 10 times every second. Step 1, reconnecting to disconnected nodes:
/* Check if we have disconnected nodes and re-establish the connection. */
di = dictGetSafeIterator(server.cluster->nodes);
while((de = dictNext(di)) != NULL) {
    clusterNode *node = dictGetVal(de);
    ...
        link = createClusterLink(node);
        link->fd = fd;
        node->link = link;
        aeCreateFileEvent(server.el,link->fd,AE_READABLE,
                clusterReadHandler,link);
        /* Queue a PING in the new connection ASAP: this is crucial
         * to avoid false positives in failure detection.
         *
         * If the node is flagged as MEET, we send a MEET message instead
         * of a PING one, to force the receiver to add us in its node
         * table. */
        old_ping_sent = node->ping_sent;
        clusterSendPing(link, node->flags & CLUSTER_NODE_MEET ?
                CLUSTERMSG_TYPE_MEET : CLUSTERMSG_TYPE_PING);
        if (old_ping_sent) {
            /* If there was an active ping before the link was
             * disconnected, we want to restore the ping time, otherwise
             * replaced by the clusterSendPing() call. */
            node->ping_sent = old_ping_sent;
        }
        /* We can clear the flag after the first packet is sent.
         * If we'll never receive a PONG, we'll never send new packets
         * to this node. Instead after the PONG is received and we
         * are no longer in meet/handshake status, we want to send
         * normal PING packets. */
        node->flags &= ~CLUSTER_NODE_MEET;
        serverLog(LL_DEBUG,"Connecting with Node %.40s at %s:%d",
                node->name, node->ip, node->port+CLUSTER_PORT_INCR);
    }
}
dictReleaseIterator(di);
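Example:
The log shows a reconnection attempt to the unreachable node every 0.1 s, each one failing with "Connection refused" (12222 is that node's cluster bus port, i.e. its client port 2222 plus CLUSTER_PORT_INCR):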
76879:M 09 Jun 17:00:49.276 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.276 . I/O error reading from node link: Connection refused
76879:M 09 Jun 17:00:49.376 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.376 . I/O error reading from node link: Connection refused
76879:M 09 Jun 17:00:49.477 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.477 . I/O error reading from node link: Connection refused
76879:M 09 Jun 17:00:49.577 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.578 . I/O error reading from node link: Connection refused
76879:M 09 Jun 17:00:49.678 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.678 . I/O error reading from node link: Connection refused
76879:M 09 Jun 17:00:49.778 . Connecting with Node 64cdc10096644b5bc3624f41ade916983806c47c at 10.200.35.93:12222
76879:M 09 Jun 17:00:49.778 . I/O error reading from node link: Connection refused
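Why save and restore ping_sent around clusterSendPing()? In the Redis source, sending a PING (but not a MEET) stamps the target node's ping_sent with the current time, so without the restore every reconnect would reset the clock that failure detection measures. A minimal sketch of just that bookkeeping follows; sketch_node, sketch_send_ping and sketch_reconnect are hypothetical names standing in for the real structures, and times are assumed to be in milliseconds.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for clusterNode: only the fields
 * needed to show the ping_sent bookkeeping. Times are in milliseconds. */
typedef struct {
    bool meet;            /* stands in for the CLUSTER_NODE_MEET flag */
    long long ping_sent;  /* 0 means no PING is outstanding */
} sketch_node;

/* Models the side effect of sending a packet: a PING (not a MEET)
 * records the send time in ping_sent. */
static void sketch_send_ping(sketch_node *n, bool is_meet, long long now) {
    if (!is_meet) n->ping_sent = now;
}

/* Models the reconnect path: queue a PING or MEET right away, but keep
 * the timestamp of a PING that was already outstanding before the link
 * broke, so the failure-detection clock is not reset by the reconnect. */
static void sketch_reconnect(sketch_node *n, long long now) {
    long long old_ping_sent = n->ping_sent;
    sketch_send_ping(n, n->meet, now);
    if (old_ping_sent) n->ping_sent = old_ping_sent;
    n->meet = false;      /* the MEET flag is cleared after the first packet */
}
```

With an outstanding PING sent at t=1000, a reconnect at t=5000 leaves ping_sent at 1000; with no outstanding PING it becomes 5000.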
Step 2, pinging one node every second (iteration counts the calls to clusterCron()):
/* Ping some random node 1 time every 10 iterations, so that we usually ping
 * one random node every second. */
if (!(iteration % 10)) {
    int j;
    /* Check a few random nodes and ping the one with the oldest
     * pong_received time. */
    for (j = 0; j < 5; j++) {
        de = dictGetRandomKey(server.cluster->nodes);
        clusterNode *this = dictGetVal(de);
        /* Don't ping nodes disconnected or with a ping currently active. */
        if (this->link == NULL || this->ping_sent != 0) continue;
        if (this->flags & (CLUSTER_NODE_MYSELF|CLUSTER_NODE_HANDSHAKE))
            continue;
        if (min_pong_node == NULL || min_pong > this->pong_received) {
            min_pong_node = this;
            min_pong = this->pong_received;
        }
    }
    if (min_pong_node) {
        serverLog(LL_DEBUG,"Pinging node %.40s", min_pong_node->name);
        clusterSendPing(min_pong_node->link, CLUSTERMSG_TYPE_PING);
    }
}
Example:
One node is pinged every second. Each reply is a PONG (packet type 1) carrying gossip about 3 other nodes:
76879:M 09 Jun 17:00:49.879 . Pinging node 178424affacc711aa18b46f67751072576592944 /*ping*/
76879:M 09 Jun 17:00:49.879 . --- Processing packet of type 1, 2520 bytes
76879:M 09 Jun 17:00:49.880 . pong packet received: 0x2aadd9849e00
76879:M 09 Jun 17:00:49.880 . GOSSIP 64cdc10096644b5bc3624f41ade916983806c47c 10.200.35.93:2222 master
76879:M 09 Jun 17:00:49.880 . GOSSIP 3f81377c4930f5a90479e6ec2f93941e00c5ad67 10.200.35.94:2222 slave
76879:M 09 Jun 17:00:49.880 . GOSSIP bb7664a96fc83d3f31c2649ec37894a4944ed38b 10.200.35.93:3333 master
76879:M 09 Jun 17:00:50.882 . Pinging node 8a3f1674530b066e84149d2f107c400551066b7d /*ping*/
76879:M 09 Jun 17:00:50.882 . --- Processing packet of type 1, 2520 bytes
76879:M 09 Jun 17:00:50.882 . pong packet received: 0x2aadd984a800
76879:M 09 Jun 17:00:50.882 . GOSSIP 3f81377c4930f5a90479e6ec2f93941e00c5ad67 10.200.35.94:2222 slave
76879:M 09 Jun 17:00:50.882 . GOSSIP e353bf55e229998bc77408b5b0fe8194cbcd2c99 10.200.35.93:4444 master
76879:M 09 Jun 17:00:50.882 . GOSSIP 64cdc10096644b5bc3624f41ade916983806c47c 10.200.35.93:2222 master
76879:M 09 Jun 17:00:51.886 . Pinging node 3f81377c4930f5a90479e6ec2f93941e00c5ad67 /*ping*/
76879:M 09 Jun 17:00:51.887 . --- Processing packet of type 1, 2520 bytes
76879:M 09 Jun 17:00:51.887 . pong packet received: 0x2aadd984b200
76879:M 09 Jun 17:00:51.887 . GOSSIP 178424affacc711aa18b46f67751072576592944 10.200.35.94:3333 slave
76879:M 09 Jun 17:00:51.887 . GOSSIP 64cdc10096644b5bc3624f41ade916983806c47c 10.200.35.93:2222 master
76879:M 09 Jun 17:00:51.887 . GOSSIP bb7664a96fc83d3f31c2649ec37894a4944ed38b 10.200.35.93:3333 master
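The selection rule of step 2 (skip disconnected nodes, nodes with a PING already in flight, ourselves and nodes in handshake, then take the oldest pong_received among the samples) can be isolated as a small function. This is a sketch under two assumptions: pick_node is a hypothetical flattened node struct, and the sampled indices are passed in rather than drawn with dictGetRandomKey(), which keeps the example deterministic.

```c
#include <stdbool.h>

/* Hypothetical simplified node; the fields mirror what the Redis loop reads. */
typedef struct {
    bool connected;            /* stands in for link != NULL */
    long long ping_sent;       /* nonzero: a PING is still awaiting its PONG */
    bool myself_or_handshake;  /* CLUSTER_NODE_MYSELF or CLUSTER_NODE_HANDSHAKE */
    long long pong_received;   /* time of the last PONG */
} pick_node;

/* Returns the index of the eligible sampled node with the oldest
 * pong_received, or -1 if no sampled node is eligible. picks[] holds the
 * sampled indices (5 in Redis); a node may appear more than once. */
static int pick_oldest_pong(const pick_node *nodes, const int *picks, int npicks) {
    int best = -1;
    for (int j = 0; j < npicks; j++) {
        const pick_node *n = &nodes[picks[j]];
        if (!n->connected || n->ping_sent != 0) continue;
        if (n->myself_or_handshake) continue;
        if (best == -1 || nodes[best].pong_received > n->pong_received)
            best = picks[j];
    }
    return best;
}
```

Sampling only 5 nodes keeps the per-second cost constant regardless of cluster size, while still biasing the choice toward nodes we have not heard from recently.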
Step 3, resetting a link whose PING has gone unanswered for more than half the node timeout. The link is freed here; the reconnection code from step 1 re-creates it and queues a new PING on a later iteration:
/* If we are waiting for the PONG more than half the cluster
 * timeout, reconnect the link: maybe there is a connection
 * issue even if the node is alive. */
if (node->link && /* is connected */
    now - node->link->ctime >
    server.cluster_node_timeout && /* was not already reconnected */
    node->ping_sent && /* we already sent a ping */
    node->pong_received < node->ping_sent && /* still waiting pong */
    /* and we are waiting for the pong more than timeout/2 */
    now - node->ping_sent > server.cluster_node_timeout/2)
{
    /* Disconnect the link, it will be reconnected automatically. */
    freeClusterLink(node->link);
}
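The condition above can be read as a pure predicate. A sketch, assuming hypothetical flattened arguments in place of the node and link fields and all times in milliseconds:

```c
#include <stdbool.h>

/* Sketch of the link-reset test quoted above. Returns true when the link
 * should be freed so that it gets re-created on a later cron iteration. */
static bool should_reset_link(bool connected, long long link_ctime,
                              long long ping_sent, long long pong_received,
                              long long now, long long node_timeout) {
    return connected &&
           now - link_ctime > node_timeout &&   /* link is not freshly made */
           ping_sent != 0 &&                    /* a PING was sent */
           pong_received < ping_sent &&         /* its PONG has not arrived */
           now - ping_sent > node_timeout / 2;  /* waited over timeout/2 */
}
```

The link-age check is what prevents a loop: a link that was just re-created is not torn down again until it has existed for a full node timeout.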
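To summarize:
1. Every 0.1 s, try to reconnect to any node whose link is down.
2. Every second, sample 5 random nodes and ping the one whose last PONG is oldest.
3. Every 0.1 s, ping any node that has gone more than cluster-node-timeout/2 without a successful exchange; if a PING has been waiting for its PONG longer than that, drop the link so that it is reconnected and a new PING is sent.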
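The plain "ping a stale node" half of point 3 comes from a check in clusterCron() that is not quoted here. The sketch below assumes that check fires when no PING is outstanding and the last PONG is older than node_timeout/2; this is a reading of the Redis source, not a quote of it, and should_ping_now is a hypothetical name.

```c
#include <stdbool.h>

/* Sketch (assumed condition): ping a connected node right away when no
 * PING is in flight and its last PONG is older than half the node timeout.
 * All times are in milliseconds. */
static bool should_ping_now(bool connected, long long ping_sent,
                            long long pong_received, long long now,
                            long long node_timeout) {
    return connected &&
           ping_sent == 0 &&                        /* no PING in flight */
           now - pong_received > node_timeout / 2;  /* last PONG is stale */
}
```

Together with step 2, this bounds how long any node can go unpinged to roughly half the node timeout, even if random sampling keeps missing it.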
The ping itself is assembled in clusterSendPing(). Besides the message header, which describes the sender, it appends a number of gossip sections computed as follows:
void clusterSendPing(clusterLink *link, int type) {
    unsigned char *buf;
    clusterMsg *hdr;
    int gossipcount = 0; /* Number of gossip sections added so far. */
    int wanted; /* Number of gossip sections we want to append if possible. */
    int totlen; /* Total packet length. */
    /* freshnodes is the max number of nodes we can hope to append at all:
     * nodes available minus two (ourself and the node we are sending the
     * message to). However practically there may be less valid nodes since
     * nodes in handshake state, disconnected, are not considered. */
    int freshnodes = dictSize(server.cluster->nodes)-2;
    /* How many gossip sections we want to add? 1/10 of the number of nodes
     * and anyway at least 3. Why 1/10?
     *
     * If we have N masters, with N/10 entries, and we consider that in
     * node_timeout we exchange with each other node at least 4 packets
     * (we ping in the worst case in node_timeout/2 time, and we also
     * receive two pings from the host), we have a total of 8 packets
     * in the node_timeout*2 failure reports validity time. So we have
     * that, for a single PFAIL node, we can expect to receive the following
     * number of failure reports (in the specified window of time):
     *
     * PROB * GOSSIP_ENTRIES_PER_PACKET * TOTAL_PACKETS:
     *
     * PROB = probability of being featured in a single gossip entry,
     *        which is 1 / NUM_OF_NODES.
     * ENTRIES = 10.
     * TOTAL_PACKETS = 2 * 4 * NUM_OF_MASTERS.
     *
     * If we assume we have just masters (so num of nodes and num of masters
     * is the same), with 1/10 we always get over the majority, and specifically
     * 80% of the number of nodes, to account for many masters failing at the
     * same time.
     *
     * Since we have non-voting slaves that lower the probability of an entry
     * to feature our node, we set the number of entries per packet as
     * 10% of the total nodes we have. */
    wanted = floor(dictSize(server.cluster->nodes)/10);
    if (wanted < 3) wanted = 3;
    if (wanted > freshnodes) wanted = freshnodes;
    ...
}
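The arithmetic in the comment can be written out. Assuming N nodes that are all masters, each packet carries N/10 gossip entries, each entry names a given node with probability 1/N, and 2 × 4 × N packets are exchanged in the 2 × node_timeout validity window, so the expected number of failure reports about one PFAIL node is:

```latex
\mathbb{E}[\text{reports}]
  = \underbrace{\frac{1}{N}}_{\text{PROB}}
    \cdot \underbrace{\frac{N}{10}}_{\text{entries per packet}}
    \cdot \underbrace{2 \cdot 4 \cdot N}_{\text{TOTAL\_PACKETS}}
  = \frac{8N}{10} = 0.8\,N
```

Since 0.8N is above N/2, the expected count clears a majority of the masters, which is the 80% figure the comment refers to.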
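Example:
In a cluster of 3 masters and 3 slaves, 6/10 rounds down to 0, which is below 3, so each ping carries information about 3 other nodes (freshnodes = 6 - 2 = 4, so the upper cap does not apply).
One node pings e353bf55e229998bc77408b5b0fe8194cbcd2c99:
76879:M 09 Jun 17:00:52.889 . Pinging node e353bf55e229998bc77408b5b0fe8194cbcd2c99
On e353bf55e229998bc77408b5b0fe8194cbcd2c99, the incoming ping carries gossip about 3 nodes: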
77004:M 09 Jun 17:00:52.889 . --- Processing packet of type 0, 2520 bytes
77004:M 09 Jun 17:00:52.889 . Ping packet received: (nil)
77004:M 09 Jun 17:00:52.889 . ping packet received: (nil)
77004:M 09 Jun 17:00:52.889 . GOSSIP 64cdc10096644b5bc3624f41ade916983806c47c xxx:2222 master
77004:M 09 Jun 17:00:52.889 . GOSSIP 3f81377c4930f5a90479e6ec2f93941e00c5ad67 xxx:2222 slave
77004:M 09 Jun 17:00:52.889 . GOSSIP 178424affacc711aa18b46f67751072576592944 xxx:3333 slave
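The three lines that compute wanted can be checked against this example with a small function. A sketch that mirrors the quoted logic; gossip_wanted is a hypothetical name, and it takes the total number of known nodes, including ourselves.

```c
/* Sketch of how many gossip entries one PING carries, following the quoted
 * clusterSendPing() lines: 1/10 of the known nodes (rounded down), at least
 * 3, but never more than the nodes other than ourselves and the receiver. */
static int gossip_wanted(int total_nodes) {
    int freshnodes = total_nodes - 2;  /* minus ourselves and the receiver */
    int wanted = total_nodes / 10;     /* integer division, i.e. floor */
    if (wanted < 3) wanted = 3;
    if (wanted > freshnodes) wanted = freshnodes;
    return wanted;
}
```

For the 6-node cluster above this gives 3; a 100-node cluster gets 10 entries per ping, and a 4-node cluster is capped at 2 because only 2 other nodes exist to gossip about.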