Gossip协议是Cassandra维护各节点状态的一个重要组件,下面我们以Gossip协议三次握手为线索逐步分析Gossip协议源码。
Gossip协议通过判断节点的generation和version 来确认节点状态信息新旧,如果节点重启,则generation加一,version每次从零开始计算。所以 generation是大版本号,version为小版本号,理解这个概念对后面的握手逻辑有很大帮助。
Gossip协议最重要的一个属性是endpointStateMap ,这个map以address为key,以EndpointState为value维护节点自身状态信息。EndopointState 包含了节点 net_version,host_id,rpc_address,release_version,dc,rack,load,status,tokens 信息。总体来说,所有节点维护的endpointStateMap应该是一致的,如果出现不一致信息或者新增,替换,删除节点 ,这中间的状态维护就要靠Gossip来实现了。
另外一个重要属性subscribers ,当节点状态变更时候,gossip 会通知各个subscribers。
Gossip启动时候,会每隔一秒会在集群中随机选择一个节点发送一条GossipDigestSyn消息,开始和其他节点的通信,如下图:
接下来我们根据上面的了流程图一步步分析gossip代码,GossipDigestSyn 消息是在GossipTask构造的,
1 // syn消息包含 集群名字,分区器,和gDigests消息
2 GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),DatabaseDescriptor.getPartitionerName(),gDigests);
3 MessageOut message = new MessageOut(MessagingService.Verb.GOSSIP_DIGEST_SYN,digestSynMessage,GossipDigestSyn.serializer);
GossipDigestSyn 消息的主要部分在gDigests里面,gDigests是通过方法Gossiper.instance.makeRandomGossipDigest(gDigests) 生成的,
private void makeRandomGossipDigest(List gDigests)
02 {
03 EndpointState epState;
04 int generation = 0 ;
05 int maxVersion = 0 ;
06
07 // local epstate will be part of endpointStateMap
08 //当前节点维护的节点列表
09 List endpoints = new ArrayList(endpointStateMap.keySet());
10 //乱序处理
11 Collections.shuffle(endpoints, random);
12 for (InetAddress endpoint : endpoints)
13 {
14 epState = endpointStateMap.get(endpoint);
15 if (epState != null )
16 {
17 //获取generation版本号
18 generation = epState.getHeartBeatState().getGeneration();
19 //EndpointState包含了tokens,hostid,status,load等信息,所以冒泡排序获取其中最大的maxVersion
20 maxVersion = getMaxEndpointStateVersion(epState);
21 }
22 gDigests.add( new GossipDigest(endpoint, generation, maxVersion));
23 }
24
25 if (logger.isTraceEnabled())
26 {
27 StringBuilder sb = new StringBuilder();
28 for (GossipDigest gDigest : gDigests)
29 {
30 sb.append(gDigest);
31 sb.append( " " );
32 }
33 logger.trace( "Gossip Digests are : {}" , sb);
34 }
35 }
A节点发出GossipDigestSyn后,B节点会通过GossipDigestSynVerbHandler 来处理GossipDigestSyn 消息,具体处理逻辑在Gossiper.instance.examineGossiper中,
01 void examineGossiper(List gDigestList, List deltaGossipDigestList, Map deltaEpStateMap)
02 {
03
04 for ( GossipDigest gDigest : gDigestList )
05 {
06 int remoteGeneration = gDigest.getGeneration();
07 int maxRemoteVersion = gDigest.getMaxVersion();
08 /* Get state associated with the end point in digest */
09 EndpointState epStatePtr = endpointStateMap.get(gDigest.getEndpoint());
10 /*
11 Here we need to fire a GossipDigestAckMessage. If we have some data associated with this endpoint locally
12 then we follow the "if" path of the logic. If we have absolutely nothing for this endpoint we need to
13 request all the data for this endpoint.
14 */
15 if (epStatePtr != null)
16 {
17 int localGeneration = epStatePtr.getHeartBeatState().getGeneration();
18 /* get the max version of all keys in the state associated with this endpoint */
19 int maxLocalVersion = getMaxEndpointStateVersion(epStatePtr);
20 if (remoteGeneration == localGeneration && maxRemoteVersion == maxLocalVersion)
21 //如果generation和version版本号都一致,说明本地节点和远程节点版本号都一致,则跳过下面逻辑
22 continue;
23
24 if (remoteGeneration > localGeneration)
25 {
26 /* we request everything from the gossiper */
27 //如果远程节点generation版本大于本地,则向远程节点请求所有该endpoint信息
28 requestAll(gDigest, deltaGossipDigestList, remoteGeneration);
29 }
30 else if (remoteGeneration < localGeneration)
31 {
32 /* send all data with generation = localgeneration and version > 0 */
33 //如果远程节点generation 小于本地,则向远程节点发送该endpoint信息
34 sendAll(gDigest, deltaEpStateMap, 0);
35 }
36 else if (remoteGeneration == localGeneration)
37 {
38 /*
39 If the max remote version is greater then we request the remote endpoint send us all the data
40 for this endpoint with version greater than the max version number we have locally for this
41 endpoint.
42 If the max remote version is lesser, then we send all the data we have locally for this endpoint
43 with version greater than the max remote version.
44 */
45 //如果remoteVersion大于本地,则请求信息,小于本地则发送信息
46 if (maxRemoteVersion > maxLocalVersion)
47 {
48 deltaGossipDigestList.add(new GossipDigest(gDigest.getEndpoint(), remoteGeneration, maxLocalVersion));
49 }
50 else if (maxRemoteVersion < maxLocalVersion)
51 {
52 /* send all data with generation = localgeneration and version > maxRemoteVersion */
53 sendAll(gDigest, deltaEpStateMap, maxRemoteVersion);
54 }
55 }
56 }
57 else
58 {
59 /* We are here since we have no data for this endpoint locally so request everything. */
60 requestAll(gDigest, deltaGossipDigestList, remoteGeneration);
61 }
62 }
63 }
上面方法对比版本号以后,主要处理逻辑在senall方法和requestAll方法,继续跟进:
1 private void requestAll(GossipDigest gDigest, List deltaGossipDigestList, int remoteGeneration)
2 {
3 /* We are here since we have no data for this endpoint locally so request everthing. */
4 //生成一个Digest,等待对方节点发送消息
5 deltaGossipDigestList.add( new GossipDigest(gDigest.getEndpoint(), remoteGeneration, 0 ));
6 if (logger.isTraceEnabled())
7 logger.trace( "requestAll for {}" , gDigest.getEndpoint());
8 }
9 private void sendAll(GossipDigest gDigest, Map deltaEpStateMap, int maxRemoteVersion)
10 {
11 EndpointState localEpStatePtr = getStateForVersionBiggerThan(gDigest.getEndpoint(), maxRemoteVersion);
12 if (localEpStatePtr != null )
13 //将endpintState信息通过ack 消息发送给对方
14 deltaEpStateMap.put(gDigest.getEndpoint(), localEpStatePtr);
15 }
到这里我们发现向对方节点发送的ack消息已经构造完成了,包含了deltaGossipDigestList(对方节点信息最新,我们需要对方节点给我们发endpointState) 和 deltaEpStateMap(当前节点新,我们发送给对方节点) 。
Gossip 通过GossipDigestAckVerbHandler 处理ack消息,主要逻辑有两块:
1.如果deltaEpStateMap有数据,则说明需要更新本地applicationState,执行Gossiper.instance.applyStateLocally方法
2.如果deltaGossipDigestList 有数据,则说明对方节点需要更新,构造EndpointState,并发送ack2消息给对方
GossipDigestAck2VerbHandler 用来处理 ack2消息,主要逻辑也在Gossiper.instance.applyStateLocally中,我们看一下Gossiper.instance.applyStateLocally的逻辑:
01 void applyStateLocally(Map epStateMap)
02 {
03 for (Entry entry : epStateMap.entrySet())
04 {
05 InetAddress ep = entry.getKey();
06 if ( ep.equals(FBUtilities.getBroadcastAddress()) && !isInShadowRound())
07 continue ;
08 if (justRemovedEndpoints.containsKey(ep))
09 {
10 if (logger.isTraceEnabled())
11 logger.trace( "Ignoring gossip for {} because it is quarantined" , ep);
12 continue ;
13 }
14
15 EndpointState localEpStatePtr = endpointStateMap.get(ep);
16 EndpointState remoteState = entry.getValue();
17
18 /*
19 If state does not exist just add it. If it does then add it if the remote generation is greater.
20 If there is a generation tie, attempt to break it by heartbeat version.
21 */
22 if (localEpStatePtr != null)
23 {
24 int localGeneration = localEpStatePtr.getHeartBeatState().getGeneration();
25 int remoteGeneration = remoteState.getHeartBeatState().getGeneration();
26 long localTime = System.currentTimeMillis()/1000;
27 if (logger.isTraceEnabled())
28 logger.trace("{} local generation {}, remote generation {}", ep, localGeneration, remoteGeneration);
29
30 // We measure generation drift against local time, based on the fact that generation is initialized by time
31 if (remoteGeneration > localTime + MAX_GENERATION_DIFFERENCE)
32 {
33 // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
34 logger.warn("received an invalid gossip generation for peer {}; local time = {}, received generation = {}", ep, localTime, remoteGeneration);
35 }
36 else if (remoteGeneration > localGeneration)
37 {
38 if (logger.isTraceEnabled())
39 logger.trace("Updating heartbeat state generation to {} from {} for {}", remoteGeneration, localGeneration, ep);
40 // major state change will handle the update by inserting the remote state directly
41 //通知订阅者节点状态发生变化
42 handleMajorStateChange(ep, remoteState);
43 }
44 else if (remoteGeneration == localGeneration) // generation has not changed, apply new states
45 {
46 /* find maximum state */
47 int localMaxVersion = getMaxEndpointStateVersion(localEpStatePtr);
48 int remoteMaxVersion = getMaxEndpointStateVersion(remoteState);
49 if (remoteMaxVersion > localMaxVersion)
50 {
51 // apply states, but do not notify since there is no major change
52 //更新新的状态,因为是version变化,不做订阅者通知
53 applyNewStates(ep, localEpStatePtr, remoteState);
54 }
55 else if (logger.isTraceEnabled())
56 logger.trace( "Ignoring remote version {} <= {} for {}" , remoteMaxVersion, localMaxVersion, ep);
57
58 if (!localEpStatePtr.isAlive() && !isDeadState(localEpStatePtr)) // unless of course, it was dead
59 markAlive(ep, localEpStatePtr);
60 }
61 else
62 {
63 if (logger.isTraceEnabled())
64 logger.trace( "Ignoring remote generation {} < {}" , remoteGeneration, localGeneration);
65 }
66 }
67 else
68 {
69 // this is a new node, report it to the FD in case it is the first time we are seeing it AND it's not alive
70 FailureDetector.instance.report(ep);
71 //通知订阅者有新节点加入
72 handleMajorStateChange(ep, remoteState);
73 }
74 }
75
76 boolean any30 = anyEndpointOn30();
77 if (any30 != anyNodeOn30)
78 {
79 logger.info(any30
80 ? "There is at least one 3.0 node in the cluster - will store and announce compatible schema version"
81 : "There are no 3.0 nodes in the cluster - will store and announce real schema version" );
82
83 anyNodeOn30 = any30;
84 executor.submit(Schema.instance::updateVersionAndAnnounce);
85 }
86 }
到这里Gossip 三次握手的全过程就分析完了。
点击这里→了解更多精彩内容