Leader election, also called master election, is one of the most classic ZooKeeper use cases. So why is leader election needed?
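In a cluster coordinated through ZooKeeper, one of the services (think of them as servers) has to be elected Leader, and that Leader is then responsible for managing the cluster. The other servers become Followers of this Leader. When the Leader fails, the next Leader must be elected quickly from among the Followers. This is the Leader mechanism; below we briefly describe how to implement Leader Election with ZooKeeper.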
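The core idea is as follows. First create a persistent node, for example "/election" (it has to be persistent, because ZooKeeper does not allow ephemeral nodes to have children). Each participating server then creates a node of type SEQUENCE|EPHEMERAL under it, for example "/election/n_". With the SEQUENCE flag, ZooKeeper automatically appends a sequence number that is larger than any number it has already assigned under that parent. The server whose node carries the smallest sequence number becomes the Leader.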
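The "smallest sequence number wins" rule can be sketched without a ZooKeeper server at all. The class below is only an illustration: the name ElectionOrderSketch and its methods are made up for this post, and the child names are hard-coded examples of what a listing of "/election" might return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the ZooKeeper recipes.
class ElectionOrderSketch {
    static final String PREFIX = "n_";

    // Parse the sequence number out of a child name such as "n_0000000007".
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(PREFIX.length()));
    }

    // The leader is the child with the smallest sequence number.
    static String leaderOf(List<String> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        // Copy before sorting: a children listing comes back in no particular order.
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionOrderSketch::sequenceOf));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<String> children =
            Arrays.asList("n_0000000012", "n_0000000003", "n_0000000007");
        System.out.println(leaderOf(children));
    }
}
```

Sorting by the parsed number rather than by the raw string keeps the comparison correct even if the prefix or padding ever differs between nodes.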
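In practice we also need to guarantee that, when the Leader fails, the system can quickly pick the next server as Leader. A simple approach is to have every Follower watch the Leader's node. When the Leader fails, its ephemeral node is deleted automatically, which triggers the watch of every server watching it. All of them are notified of the failure and start the next election. This causes the "herd effect", which gets worse as the number of servers and the network latency grow.

To avoid the herd effect, the recipe works differently: each Follower sets a watch on exactly one node, namely the node with the largest sequence number among those smaller than its own. A Follower re-runs the election only when that watch fires, and in the common case, where the node it was watching belonged to the Leader, it becomes the next Leader. This makes the election fast, because each election involves almost nothing more than a single Follower.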
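To make the "watch only your predecessor" rule concrete, here is a minimal sketch that works on plain sequence numbers. WatchTargetSketch and watchTarget are hypothetical names used only for this illustration, and the numbers 3, 7 and 12 are arbitrary.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the ZooKeeper recipes.
class WatchTargetSketch {
    // Returns the sequence number this participant should watch: the largest
    // number strictly smaller than its own. Returns null when the participant
    // holds the smallest number, i.e. it is the leader and watches nothing.
    static Integer watchTarget(TreeSet<Integer> sequenceNumbers, int self) {
        return sequenceNumbers.lower(self);
    }

    public static void main(String[] args) {
        TreeSet<Integer> live = new TreeSet<>(Arrays.asList(3, 7, 12));
        for (int self : live) {
            Integer target = watchTarget(live, self);
            System.out.println(self + " -> "
                + (target == null ? "leader, no watch" : "watches " + target));
        }
    }
}
```

Every node has at most one watcher under this scheme, so the deletion of any node wakes up at most one participant instead of the whole cluster.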
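Now let's look at how the source code implements this, in org.apache.zookeeper.recipes.leader.LeaderElectionSupport.
All of the logic lives in this class. We start with the start() method.
As you can see, it first calls makeOffer() and then determineElectionStatus():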
    /**
     * Entry point of the election.
     */
    public synchronized void start() {
        state = State.START;
        // Broadcast that the election has started
        dispatchEvent(EventType.START);
        LOG.info("Starting leader election support");
        if (zooKeeper == null) {
            throw new IllegalStateException(
                "No instance of zookeeper provided. Hint: use setZooKeeper()");
        }
        if (hostName == null) {
            throw new IllegalStateException(
                "No hostname provided. Hint: use setHostName()");
        }
        try {
            makeOffer();
            determineElectionStatus();
        } catch (KeeperException | InterruptedException e) {
            becomeFailed(e);
        }
    }
Next comes makeOffer(). Its main job is to create the ephemeral node:
    /**
     * Where the election really begins: creates this participant's node
     * under the root node.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void makeOffer() throws KeeperException, InterruptedException {
        state = State.OFFER;
        dispatchEvent(EventType.OFFER_START);
        LeaderOffer newLeaderOffer = new LeaderOffer();
        byte[] hostnameBytes;
        synchronized (this) {
            newLeaderOffer.setHostName(hostName);
            hostnameBytes = hostName.getBytes();
            newLeaderOffer.setNodePath(zooKeeper.create(rootNodeName + "/" + "n_",
                hostnameBytes, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                // Ephemeral sequential node
                CreateMode.EPHEMERAL_SEQUENTIAL));
            leaderOffer = newLeaderOffer;
        }
        LOG.debug("Created leader offer {}", leaderOffer);
        dispatchEvent(EventType.OFFER_COMPLETE);
    }
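Then comes determineElectionStatus().
This method lists all the nodes under the root node. The one with the smallest sequence number becomes the leader; every other node sets a watch on the node immediately before it.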
    /**
     * Picks the node with the smallest sequence number; the machine that
     * owns it is the leader.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void determineElectionStatus() throws KeeperException, InterruptedException {
        state = State.DETERMINE;
        dispatchEvent(EventType.DETERMINE_START);
        LeaderOffer currentLeaderOffer = getLeaderOffer();
        String[] components = currentLeaderOffer.getNodePath().split("/");
        // Our id is the sequence number ZooKeeper appended after the "n_" prefix
        currentLeaderOffer.setId(Integer.valueOf(components[components.length - 1].substring("n_".length())));
        List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(rootNodeName, false));
        /*
         * For each leader offer, find out where we fit in. If we're first, we
         * become the leader. If we're not elected the leader, attempt to stat the
         * offer just less than us. If they exist, watch for their failure, but if
         * they don't, become the leader.
         */
        for (int i = 0; i < leaderOffers.size(); i++) {
            LeaderOffer leaderOffer = leaderOffers.get(i);
            if (leaderOffer.getId().equals(currentLeaderOffer.getId())) {
                LOG.debug("There are {} leader offers. I am {} in line.", leaderOffers.size(), i);
                dispatchEvent(EventType.DETERMINE_COMPLETE);
                if (i == 0) {
                    // The smallest offer becomes the leader
                    becomeLeader();
                } else {
                    // Everyone else is a non-leader and watches its predecessor
                    becomeReady(leaderOffers.get(i - 1));
                }
                /* Once we've figured out where we are, we're done. */
                break;
            }
        }
    }
A node that did not become leader watches the node before it. If that node fails, the method above is run again:
    private void becomeReady(LeaderOffer neighborLeaderOffer)
        throws KeeperException, InterruptedException {
        LOG.info(
            "{} not elected leader. Watching node: {}",
            getLeaderOffer().getNodePath(),
            neighborLeaderOffer.getNodePath());
        /*
         * Make sure to pass an explicit Watcher because we could be sharing this
         * zooKeeper instance with someone else.
         *
         * Set a watch on the previous node. If that node is deleted,
         * determineElectionStatus() is called again.
         */
        Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this);
        if (stat != null) {
            dispatchEvent(EventType.READY_START);
            LOG.debug(
                "We're behind {} in line and they're alive. Keeping an eye on them.",
                neighborLeaderOffer.getNodePath());
            state = State.READY;
            dispatchEvent(EventType.READY_COMPLETE);
        } else {
            /*
             * If the stat fails, the node has gone missing between the call to
             * getChildren() and exists(). We need to try and become the leader.
             */
            LOG.info(
                "We were behind {} but it looks like they died. Back to determination.",
                neighborLeaderOffer.getNodePath());
            determineElectionStatus();
        }
    }
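The effect of this "watch, then re-determine" loop can be simulated in memory. The sketch below is not the recipe's code and does not talk to ZooKeeper: FailoverSketch, join, fail and redetermine are hypothetical names, and a TreeSet of sequence numbers stands in for the children of the election node.

```java
import java.util.TreeSet;

// Hypothetical in-memory simulation; not part of the ZooKeeper recipes.
class FailoverSketch {
    // Sequence numbers of the participants that are still alive.
    private final TreeSet<Integer> live = new TreeSet<>();

    void join(int id) {
        live.add(id);
    }

    // Simulates the deletion of a participant's ephemeral node. Returns the id
    // of the single participant whose watch fires (the next one in line), or
    // null if nobody was watching the failed node.
    Integer fail(int failed) {
        Integer successor = live.higher(failed);
        live.remove(failed);
        return successor;
    }

    // What the notified participant does: look at its position again.
    String redetermine(int self) {
        Integer predecessor = live.lower(self);
        return predecessor == null ? "LEADER" : "WATCHING " + predecessor;
    }

    public static void main(String[] args) {
        FailoverSketch election = new FailoverSketch();
        election.join(1);
        election.join(2);
        election.join(3);
        Integer notified = election.fail(2); // a non-leader in the middle dies
        System.out.println(notified + ": " + election.redetermine(notified));
        notified = election.fail(1);         // now the leader dies
        System.out.println(notified + ": " + election.redetermine(notified));
    }
}
```

Note the first case: when a node in the middle of the line fails, the notified participant does not become leader, it simply moves its watch to the new predecessor. It takes over only once nothing smaller than its own number is left.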
More annotated source can be found here:
https://github.com/haha174/zookeeper/commit/1174717483578074654bbc6a8a1e4744b9c255a9