ES选举-类Bully算法

Bully算法

bully算法是一个分布式系统中动态选择master节点的算法,进程号最大的非失效的节点将被选为master。
算法用三种消息类型:

1)选举消息 (Election Message: Sent to announce election.)
2)应答消息(Answer (Alive) Message: Responds to the Election message.)
3)选举成功消息 (Coordinator (Victory) Message: Sent by winner of the election to announce victory.)

当一个进程P从失败中恢复,或者接收到主节点失效信息,进程P将做以下事情:

1)如果进程P有最大的进程ID,那么它则会向其他节点广播Coordinator (Victory) Message。否则进程P向进程号比它大的进程发送Election Message
2)如果进程P发送Election Message后,没有接收到应答,它就会向其他节点广播Coordinator (Victory) Message,并成为master。
3)如果进程P接收到比它进程号更高的进程的Answer (Alive) Message信息,那么它这次的选举就失败了,等待接收其他节点的Coordinator (Victory) Message。
4)如果进程P接收到比它进程号低的进程的Election message,那么它会向该节点返回一个Answer (Alive) Message,并启动选举进程,并向比它更高的进程发送Election Message。
5)如果进程P接收到Coordinator (Victory) Message,那么它就会把发送这条消息的节点看作为master进程。

ES Master选举过程

我看的源码是5.6版本的。因此以下的解释都是依据5.6的源码来说的。
当master节点失效之后,各个节点find master的过程如下:
1)每个节点会ping配置文件中discovery.zen.ping.unicast.hosts的IP,找到所有存活的节点并过滤
2)找到非本身的active master

  List activeMasters = new ArrayList<>();
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        // We can't include the local node in pingMasters list, otherwise we may up electing ourselves without
        // any check / verifications from other nodes in ZenDiscover#innerJoinCluster()
        if (pingResponse.master() != null && !localNode.equals(pingResponse.master())) {
            activeMasters.add(pingResponse.master());
        }
    }

3)找到所有的可成为master的节点集合masterCandidates ,包含自己

 // nodes discovered during pinging
    List masterCandidates = new ArrayList<>();
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        if (pingResponse.node().isMasterNode()) {
            masterCandidates.add(new ElectMasterService.MasterCandidate(pingResponse.node(), pingResponse.getClusterStateVersion()));
        }
    }

4)如果activeMasters 为空,也就是说不存在活着的master节点,同时当前活着的节点满足配置中discovery.zen.minimum_master_nodes的数量,那么就从masterCandidates 挑出ID最小的节点,让其成为master节点。如果activeMasters 不为空,则从中选择最小的ID成为Master节点即可。

if (activeMasters.isEmpty()) {
        if (electMaster.hasEnoughCandidates(masterCandidates)) {
            final ElectMasterService.MasterCandidate winner = electMaster.electMaster(masterCandidates);
            logger.trace("candidate {} won election", winner);
            return winner.getNode();
        } else {
            // if we don't have enough master nodes, we bail, because there are not enough master to elect from
            logger.warn("not enough master nodes discovered during pinging (found [{}], but needed [{}]), pinging again",
                        masterCandidates, electMaster.minimumMasterNodes());
            return null;
        }
    } else {
        assert !activeMasters.contains(localNode) : "local node should never be elected as master when other nodes indicate an active master";
        // lets tie break between discovered nodes
        return electMaster.tieBreakActiveMasters(activeMasters);
    }

electMaster.electMaster方法和electMaster.tieBreakActiveMasters方法则都是从集合中选取最小节点的ID:

  public MasterCandidate electMaster(Collection candidates) {
    assert hasEnoughCandidates(candidates);
    List sortedCandidates = new ArrayList<>(candidates);
    sortedCandidates.sort(MasterCandidate::compare);
    return sortedCandidates.get(0);
}
public DiscoveryNode tieBreakActiveMasters(Collection activeMasters) {
    return activeMasters.stream().min(ElectMasterService::compareNodes).get();
}

如果当前不存在active master,那么activeMasters 一定为空,则从masterCandidates 从选出id最小的节点即可。
如果当前存在active master,且当前节点不是active maste,那么从activeMasters 中选出id最小的节点。
如果当前存在active master,且当前节点是active maste,那么activeMasters 为空,从masterCandidates 中选出id最小的节点即自己。

在我的感觉中,当前active master的个数要么为空,要么为1,这边不知道为什么要用一个链表。。。为了防止脑裂情况出现吗??

你可能感兴趣的:(ES选举-类Bully算法)