Leader election, also known as master election, is one of the most classic ZooKeeper use cases.
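In a distributed environment, the same application is deployed on many machines. Some business logic, such as expensive computations or network I/O processing, only needs to run on one machine in the cluster, and the other machines can share its result. This greatly reduces duplicated work and improves performance. Choosing that one machine is exactly the Leader election problem.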
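ZooKeeper is used to elect one Leader among all the services (think of them as servers), and that Leader is then responsible for managing the cluster. The other servers in the cluster become the Leader's Followers. When the Leader fails, the next Leader must be elected quickly from among the Followers. This is the Leader mechanism. Below we briefly describe how to implement Leader Election with ZooKeeper.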
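The core idea is as follows. First create a PERSISTENT node, for example "/election". It cannot be EPHEMERAL, because ZooKeeper does not allow ephemeral nodes to have children. Then every participating server creates a SEQUENCE|EPHEMERAL child under it (CreateMode.EPHEMERAL_SEQUENTIAL in the Java API), for example "/election/n_". With the SEQUENCE flag, ZooKeeper automatically appends a sequence number that is larger than any number it previously assigned under that parent. The server whose node has the smallest sequence number becomes the Leader.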
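The idea above can be modelled without a ZooKeeper server. The following is a minimal in-memory sketch, not ZooKeeper's API: the class SequentialElectionSketch and its methods are hypothetical stand-ins for create(..., CreateMode.EPHEMERAL_SEQUENTIAL) and for session expiry. It mimics how ZooKeeper appends a 10-digit, zero-padded sequence number to the node name, and treats the owner of the smallest number as the Leader.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a PERSISTENT parent such as "/election"
// whose children are EPHEMERAL_SEQUENTIAL nodes. Not a ZooKeeper API.
class SequentialElectionSketch {
  // Simplified per-parent counter; real ZooKeeper sequence numbers can have gaps.
  private int counter = 0;
  // Child name -> owner (e.g. a hostname). With a shared prefix and zero padding,
  // string order equals numeric order, so the TreeMap's first key is the smallest.
  private final TreeMap<String, String> children = new TreeMap<>();

  // Mimics create(prefix, data, acl, EPHEMERAL_SEQUENTIAL): appends a 10-digit counter.
  String createSequential(String prefix, String owner) {
    String name = prefix + String.format(Locale.ROOT, "%010d", counter++);
    children.put(name, owner);
    return name;
  }

  // Mimics the ephemeral node being removed when its owner's session ends.
  void sessionExpired(String name) {
    children.remove(name);
  }

  // The owner of the smallest sequence number is the Leader; null if there are no offers.
  String leader() {
    Map.Entry<String, String> first = children.firstEntry();
    return first == null ? null : first.getValue();
  }
}
```

Because every child shares the n_ prefix and the suffix is zero-padded, plain string ordering matches numeric ordering in this sketch; the recipe code later in this post parses the number explicitly instead.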
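In practice we also need to make sure that when the Leader fails, the system quickly picks the next server as Leader. A simple approach is for every Follower to watch the Leader's node. When the Leader fails, its ephemeral node is deleted automatically once its session expires, which triggers the watch of every server watching the Leader. Those servers learn that the Leader has failed and start the next round of Leader election. However, this causes the "herd effect": every Follower is woken up at the same moment and they all query ZooKeeper at once. The effect is especially noticeable when the cluster has many servers and network latency is high.

To avoid the herd effect, the ZooKeeper recipe works as follows: each Follower sets a watch only on the node with the largest sequence number among the nodes whose sequence numbers are smaller than its own, i.e. its immediate predecessor. A Follower runs the election check only when its own watch fires. If the deleted predecessor was the Leader, that Follower becomes the next Leader; otherwise it simply starts watching its new predecessor. Failover is therefore fast, because each deletion wakes up essentially a single Follower.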
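To make the herd effect concrete, the sketch below (hypothetical helper names, no ZooKeeper calls) computes which nodes would be notified when one offer is deleted, under the two strategies just described: every Follower watching the Leader, versus each Follower watching only its immediate predecessor. Sequence numbers are parsed from names such as n_0000000004 and may have gaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparison of the two watch strategies; no ZooKeeper involved.
class HerdEffectSketch {
  // Parses the sequence suffix, e.g. "n_0000000007" -> 7.
  static long seq(String name) {
    return Long.parseLong(name.substring(name.lastIndexOf('_') + 1));
  }

  private static List<String> sorted(List<String> children) {
    List<String> copy = new ArrayList<>(children);
    copy.sort(Comparator.comparingLong(HerdEffectSketch::seq));
    return copy;
  }

  // Strategy 1: every Follower watches the Leader (the smallest offer).
  static List<String> notifiedWhenAllWatchLeader(List<String> children, String deleted) {
    List<String> offers = sorted(children);
    List<String> notified = new ArrayList<>();
    if (!offers.isEmpty() && offers.get(0).equals(deleted)) {
      notified.addAll(offers.subList(1, offers.size())); // everyone wakes up
    }
    return notified; // deleting a Follower notifies nobody
  }

  // Strategy 2 (the recipe): each Follower watches only its immediate predecessor.
  static List<String> notifiedWhenWatchingPredecessor(List<String> children, String deleted) {
    List<String> offers = sorted(children);
    List<String> notified = new ArrayList<>();
    int i = offers.indexOf(deleted);
    if (i >= 0 && i + 1 < offers.size()) {
      notified.add(offers.get(i + 1)); // only the direct successor wakes up
    }
    return notified;
  }
}
```

With four offers, deleting the Leader wakes three watchers under the first strategy but only one under the second. Deleting a Follower wakes nobody under the first strategy and only that Follower's successor under the second, which then just re-watches its new predecessor.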
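Summary: in cluster management, when all clients try to create the same node, only one of them succeeds. A slight variation is to let every request succeed, but in a defined order, so that one possible result in ZK looks like this: /currentMaster/{sessionId}-1, /currentMaster/{sessionId}-2, /currentMaster/{sessionId}-3, ... Each time, the machine with the smallest sequence number is chosen as Master. If that machine goes down, the node it created disappears as soon as its session expires, and the machine that now holds the smallest number becomes Master.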
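When node names carry a different prefix per client, as in {sessionId}-N, the Master has to be chosen by the numeric sequence suffix rather than by sorting the full names. A minimal sketch with hypothetical names; the session-ID prefixes in the test are made-up placeholders, and the 10-digit suffix mirrors what ZooKeeper appends to sequential nodes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: picks the Master among children named "{sessionId}-N".
class MasterPickSketch {
  // Parses whatever follows the last '-' as the sequence number.
  static long sequenceOf(String child) {
    return Long.parseLong(child.substring(child.lastIndexOf('-') + 1));
  }

  // The child with the smallest sequence number wins; empty if there are no children.
  static Optional<String> pickMaster(List<String> children) {
    return children.stream().min(Comparator.comparingLong(MasterPickSketch::sequenceOf));
  }
}
```

A plain lexicographic minimum over the test names would pick 05de-0000000003, because the session-ID prefix dominates the string comparison.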
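In a search system, if every machine in the cluster builds its own full index, this is not only time-consuming but also gives no guarantee that the index data is consistent across machines. So the Master builds the full index and then synchronizes it to the other machines in the cluster. In addition, as a disaster-recovery measure for Master election, the master can be specified manually at any time: when the application cannot obtain the master information from ZK, it can fetch the master from some other location, for example over HTTP.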
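The manual fallback can be expressed as a simple lookup chain. This is a sketch under assumptions: both lookups are hypothetical Supplier hooks, one backed by the ZooKeeper election result and one by a manually maintained source such as an HTTP endpoint. Nothing here is a ZooKeeper API.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical resolver: ZooKeeper first, manual source as the fallback.
class MasterResolverSketch {
  static Optional<String> resolveMaster(Supplier<Optional<String>> fromZooKeeper,
                                        Supplier<Optional<String>> manualOverride) {
    try {
      Optional<String> elected = fromZooKeeper.get();
      if (elected.isPresent()) {
        return elected; // normal case: use the elected Master
      }
    } catch (RuntimeException e) {
      // ZooKeeper unreachable or the read failed: fall through to the manual source.
    }
    return manualOverride.get();
  }
}
```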
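HBase also uses ZooKeeper to elect the HMaster dynamically. In HBase, ZK stores the address of the ROOT table and the address of the HMaster. Each HRegionServer also registers itself in ZooKeeper as an ephemeral node, so the HMaster can always tell which HRegionServers are alive. If the HMaster fails, a new HMaster is elected to take over, which removes the HMaster as a single point of failure.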
The following example comes from zookeeper-3.4.*/recipes.
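I. Election state transitions
1. EventType.START
2. makeOffer(): OFFER_START, OFFER_COMPLETE; prepares for the vote by creating the leader offer node
3. determineElectionStatus(): DETERMINE_START, DETERMINE_COMPLETE; carries out the vote
Outcome: becomeLeader() or becomeReady()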
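II. The becomeReady method is the important one
1) Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this); sets a watch on the preceding (neighbor) offer; see the process() method
2) If stat is null, meaning the neighbor disappeared between getChildren() and exists(), run determineElectionStatus() again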
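These two steps (watch the neighbor; if it is already gone, determine again) can be traced with a ZooKeeper-free model. In this hypothetical sketch, getChildren stands in for zooKeeper.getChildren(root, false) and exists for zooKeeper.exists(path, watcher) != null. The event names follow the order in which LeaderElectionSupport dispatches them, and the loop replaces the recipe's recursive call to determineElectionStatus().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical model of determineElectionStatus() + becomeReady(); no ZooKeeper involved.
class ReadyStepSketch {
  // Event names recorded in dispatch order.
  final List<String> events = new ArrayList<>();

  // "n_0000000003" -> 3
  static long seq(String name) {
    return Long.parseLong(name.substring(name.lastIndexOf('_') + 1));
  }

  /** Returns null if elected, otherwise the predecessor node now being watched. */
  String determine(String self, Supplier<List<String>> getChildren,
      Predicate<String> exists) {
    while (true) {
      events.add("DETERMINE_START");
      List<String> offers = new ArrayList<>(getChildren.get());
      offers.sort(Comparator.comparingLong(ReadyStepSketch::seq));
      int i = offers.indexOf(self);
      if (i < 0) {
        throw new IllegalStateException("own offer " + self + " not found");
      }
      events.add("DETERMINE_COMPLETE");
      if (i == 0) {
        events.add("ELECTED_START");
        events.add("ELECTED_COMPLETE");
        return null;
      }
      String predecessor = offers.get(i - 1);
      events.add("READY_START");
      if (exists.test(predecessor)) {
        events.add("READY_COMPLETE");
        return predecessor; // a real client now waits for NodeDeleted on this path
      }
      // Predecessor vanished between getChildren() and exists(): go around again,
      // just as becomeReady() calls determineElectionStatus().
    }
  }
}
```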
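III. Implementing the LeaderElectionAware interface
The implementation of onElectionEvent(EventType) has to act differently depending on the event type. For example, during makeOffer you might send a message to each client to prepare for the vote; how such messages are sent is up to you to implement (this is my personal view).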
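A listener might look like the sketch below. So that it compiles without the recipe on the classpath, it declares local copies of EventType and LeaderElectionAware with the same constants and method as the recipe's; ExclusiveWorker and its active flag are hypothetical. It reacts only to ELECTED_COMPLETE, READY_COMPLETE, FAILED and STOP_START, and ignores the intermediate events.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSketch {
  // Local copy of LeaderElectionSupport.EventType's constants.
  enum EventType {
    START, OFFER_START, OFFER_COMPLETE, DETERMINE_START, DETERMINE_COMPLETE,
    ELECTED_START, ELECTED_COMPLETE, READY_START, READY_COMPLETE, FAILED,
    STOP_START, STOP_COMPLETE
  }

  // Same shape as the recipe's LeaderElectionAware interface.
  interface LeaderElectionAware {
    void onElectionEvent(EventType eventType);
  }

  // Hypothetical listener: does the exclusive work only while elected.
  static class ExclusiveWorker implements LeaderElectionAware {
    volatile boolean active;
    final List<String> log = new ArrayList<>();

    @Override
    public void onElectionEvent(EventType eventType) {
      switch (eventType) {
        case ELECTED_COMPLETE:
          active = true;
          log.add("leader: start exclusive work");
          break;
        case READY_COMPLETE:
          active = false;
          log.add("follower: standing by");
          break;
        case FAILED:
        case STOP_START:
          active = false; // the Javadoc caveat: stop serving once the offer is gone
          log.add("stop exclusive work");
          break;
        default:
          break; // intermediate events need no action here
      }
    }
  }
}
```

With the real recipe you would implement org.apache.zookeeper.recipes.leader.LeaderElectionAware instead and register the listener with addListener() before calling start().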
// LeaderElectionSupport.java
package org.apache.zookeeper.recipes.leader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * A leader election support library implementing the ZooKeeper election recipe.
 *
 * This support library is meant to simplify the construction of an exclusive
 * leader system on top of Apache ZooKeeper. Any application that can become the
 * leader (usually a process that provides a service, exclusively) would
 * configure an instance of this class with their hostname, at least one
 * listener (an implementation of {@link LeaderElectionAware}), and an instance
 * of {@link ZooKeeper}. Once configured, invoking {@link #start()} will cause
 * the client to create a leader offer in ZooKeeper. The library then
 * determines if it has been elected the leader using the algorithm described
 * below. The client application can follow all state transitions via the
 * listener callback.
 *
 * Leader election algorithm
 *
 * The library enters a START state when {@link #start()} is called. Through
 * each state transition, a state start and a state complete event are sent to
 * all listeners. A leader offer is then created in ZooKeeper. A leader offer is
 * an ephemeral sequential node that indicates a process that can act as a
 * leader for this service. A read of all leader offers is then performed. The
 * offer with the lowest sequence number is said to be the leader. The process
 * elected leader will transition to the leader state. All other processes will
 * transition to a ready state. Internally, the library creates a ZooKeeper
 * watch on the leader offer immediately preceding its own in sequence order
 * (conceptually N - 1, where N is the process's sequence ID). If that offer
 * disappears due to a process failure, the watching process will run through
 * the election determination process again to see if it should become the
 * leader. Note that sequence IDs may not be contiguous due to failed
 * processes. A process may revoke its offer to be the leader at any time by
 * calling {@link #stop()}.
 *
 * Guarantees (not) Made and Caveats
 *
 * - It is possible for a (poorly implemented) process to create a leader
 *   offer, get the lowest sequence ID, but have something terrible occur where
 *   it maintains its connection to ZK (and thus its ephemeral leader offer
 *   node) but doesn't actually provide the service in question. It is up to
 *   the user to ensure any failure to become the leader - and whatever that
 *   means in the context of the user's application - results in a revocation
 *   of its leader offer (i.e. that {@link #stop()} is called).
 * - It is possible for ZK timeouts and retries to play a role in service
 *   liveness. In other words, if process A has the lowest sequence ID but
 *   requires a few attempts to read the other leader offers' sequence IDs,
 *   election can seem slow. Users should apply timeouts during the
 *   determination process if they need to hit a specific SLA.
 * - The library makes a "best effort" to detect catastrophic failures of the
 *   process. It is possible that an unforeseen event results in (for
 *   instance) an unchecked exception that propagates past normal error
 *   handling code. This normally doesn't matter as the same exception would
 *   almost certainly destroy the entire process and thus the connection to ZK
 *   and the leader offer, resulting in another round of leader determination.
 */
public class LeaderElectionSupport implements Watcher {
  private static final Logger logger = LoggerFactory
      .getLogger(LeaderElectionSupport.class);
  private ZooKeeper zooKeeper;
  private State state;
  private Set<LeaderElectionAware> listeners;
  private String rootNodeName;
  private LeaderOffer leaderOffer;
  private String hostName;
  public LeaderElectionSupport() {
    state = State.STOP;
    listeners = Collections.synchronizedSet(new HashSet<LeaderElectionAware>());
  }
  /**
   * Start the election process. This method will create a leader offer,
   * determine its status, and either become the leader or become ready. An
   * instance of {@link ZooKeeper} and a hostname must be configured first (see
   * {@link #setZooKeeper(ZooKeeper)} and {@link #setHostName(String)});
   * otherwise an {@link IllegalStateException} is thrown.
   *
   * Any (anticipated) failures result in a failed event being sent to all
   * listeners.
   */
  public synchronized void start() {
    state = State.START;
    dispatchEvent(EventType.START);
    logger.info("Starting leader election support");
    if (zooKeeper == null) {
      throw new IllegalStateException(
          "No instance of zookeeper provided. Hint: use setZooKeeper()");
    }
    if (hostName == null) {
      throw new IllegalStateException(
          "No hostname provided. Hint: use setHostName()");
    }
    try {
      makeOffer();
      determineElectionStatus();
    } catch (KeeperException e) {
      becomeFailed(e);
      return;
    } catch (InterruptedException e) {
      becomeFailed(e);
      return;
    }
  }
  /**
   * Stops all election services and revokes any outstanding leader offer.
   * Note that the {@link ZooKeeper} session itself is not closed.
   */
  public synchronized void stop() {
    state = State.STOP;
    dispatchEvent(EventType.STOP_START);
    logger.info("Stopping leader election support");
    if (leaderOffer != null) {
      try {
        zooKeeper.delete(leaderOffer.getNodePath(), -1);
        logger.info("Removed leader offer {}", leaderOffer.getNodePath());
      } catch (InterruptedException e) {
        becomeFailed(e);
      } catch (KeeperException e) {
        becomeFailed(e);
      }
    }
    dispatchEvent(EventType.STOP_COMPLETE);
  }
  private void makeOffer() throws KeeperException, InterruptedException {
    state = State.OFFER;
    dispatchEvent(EventType.OFFER_START);
    leaderOffer = new LeaderOffer();
    leaderOffer.setHostName(hostName);
    leaderOffer.setNodePath(zooKeeper.create(rootNodeName + "/" + "n_",
        hostName.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL));
    logger.debug("Created leader offer {}", leaderOffer);
    dispatchEvent(EventType.OFFER_COMPLETE);
  }
  private void determineElectionStatus() throws KeeperException,
      InterruptedException {
    state = State.DETERMINE;
    dispatchEvent(EventType.DETERMINE_START);
    String[] components = leaderOffer.getNodePath().split("/");
    leaderOffer.setId(Integer.valueOf(components[components.length - 1]
        .substring("n_".length())));
    List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(
        rootNodeName, false));
    /*
     * For each leader offer, find out where we fit in. If we're first, we
     * become the leader. If we're not elected the leader, attempt to stat the
     * offer just less than us. If they exist, watch for their failure, but if
     * they don't, become the leader.
     */
    for (int i = 0; i < leaderOffers.size(); i++) {
      LeaderOffer leaderOffer = leaderOffers.get(i);
      if (leaderOffer.getId().equals(this.leaderOffer.getId())) {
        logger.debug("There are {} leader offers. I am {} in line.",
            leaderOffers.size(), i);
        dispatchEvent(EventType.DETERMINE_COMPLETE);
        if (i == 0) {
          becomeLeader();
        } else {
          becomeReady(leaderOffers.get(i - 1));
        }
        /* Once we've figured out where we are, we're done. */
        break;
      }
    }
  }
  private void becomeReady(LeaderOffer neighborLeaderOffer)
      throws KeeperException, InterruptedException {
    dispatchEvent(EventType.READY_START);
    logger.info("{} not elected leader. Watching node:{}",
        leaderOffer.getNodePath(), neighborLeaderOffer.getNodePath());
    /*
     * Make sure to pass an explicit Watcher because we could be sharing this
     * zooKeeper instance with someone else.
     */
    Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this);
    if (stat != null) {
      logger.debug(
          "We're behind {} in line and they're alive. Keeping an eye on them.",
          neighborLeaderOffer.getNodePath());
      state = State.READY;
      dispatchEvent(EventType.READY_COMPLETE);
    } else {
      /*
       * If the stat fails, the node has gone missing between the call to
       * getChildren() and exists(). We need to try and become the leader.
       */
      logger.info(
          "We were behind {} but it looks like they died. Back to determination.",
          neighborLeaderOffer.getNodePath());
      determineElectionStatus();
    }
  }
  private void becomeLeader() {
    state = State.ELECTED;
    dispatchEvent(EventType.ELECTED_START);
    logger.info("Becoming leader with node:{}", leaderOffer.getNodePath());
    dispatchEvent(EventType.ELECTED_COMPLETE);
  }
  private void becomeFailed(Exception e) {
    logger.error("Failed in state {} - Exception:{}", state, e);
    state = State.FAILED;
    dispatchEvent(EventType.FAILED);
  }
  /**
   * Fetch the (user supplied) hostname of the current leader. Note that by the
   * time this method returns, state could have changed so do not depend on this
   * to be strongly consistent. This method has to read all leader offers from
   * ZooKeeper to determine who the leader is (i.e. there is no caching) so
   * consider the performance implications of frequent invocation. If there are
   * no leader offers this method returns null.
   *
   * @return hostname of the current leader
   * @throws KeeperException
   * @throws InterruptedException
   */
  public String getLeaderHostName() throws KeeperException,
      InterruptedException {
    List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(
        rootNodeName, false));
    if (leaderOffers.size() > 0) {
      return leaderOffers.get(0).getHostName();
    }
    return null;
  }
  private List<LeaderOffer> toLeaderOffers(List<String> strings)
      throws KeeperException, InterruptedException {
    List<LeaderOffer> leaderOffers = new ArrayList<LeaderOffer>(strings.size());
    /*
     * Turn each child of rootNodeName into a leader offer. This is a tuple of
     * the sequence number and the node name.
     */
    for (String offer : strings) {
      String hostName = new String(zooKeeper.getData(
          rootNodeName + "/" + offer, false, null));
      leaderOffers.add(new LeaderOffer(Integer.valueOf(offer.substring("n_"
          .length())), rootNodeName + "/" + offer, hostName));
    }
    /*
     * We sort leader offers by sequence number (which may not be zero-based or
     * contiguous) and keep their paths handy for setting watches.
     */
    Collections.sort(leaderOffers, new LeaderOffer.IdComparator());
    return leaderOffers;
  }
  @Override
  public void process(WatchedEvent event) {
    if (event.getType().equals(Watcher.Event.EventType.NodeDeleted)) {
      if (!event.getPath().equals(leaderOffer.getNodePath())
          && state != State.STOP) {
        logger.debug(
            "Node {} deleted. Need to run through the election process.",
            event.getPath());
        try {
          determineElectionStatus();
        } catch (KeeperException e) {
          becomeFailed(e);
        } catch (InterruptedException e) {
          becomeFailed(e);
        }
      }
    }
  }
  private void dispatchEvent(EventType eventType) {
    logger.debug("Dispatching event:{}", eventType);
    synchronized (listeners) {
      if (listeners.size() > 0) {
        for (LeaderElectionAware observer : listeners) {
          observer.onElectionEvent(eventType);
        }
      }
    }
  }
  /**
   * Adds {@code listener} to the list of listeners who will receive events.
   *
   * @param listener
   */
  public void addListener(LeaderElectionAware listener) {
    listeners.add(listener);
  }
  /**
   * Remove {@code listener} from the list of listeners who receive events.
   *
   * @param listener
   */
  public void removeListener(LeaderElectionAware listener) {
    listeners.remove(listener);
  }
  @Override
  public String toString() {
    return "{ state:" + state + " leaderOffer:" + leaderOffer + " zooKeeper:"
        + zooKeeper + " hostName:" + hostName + " listeners:" + listeners
        + " }";
  }
  /**
   * Gets the ZooKeeper root node to use for this service.
   *
   * For instance, a root node of {@code /mycompany/myservice} would be the
   * parent of all leader offers for this service. Obviously all processes that
   * wish to contend for leader status need to use the same root node. Note: We
   * assume this node already exists.
   *
   * @return a znode path
   */
  public String getRootNodeName() {
    return rootNodeName;
  }
  /**
   * Sets the ZooKeeper root node to use for this service.
   *
   * For instance, a root node of {@code /mycompany/myservice} would be the
   * parent of all leader offers for this service. Obviously all processes that
   * wish to contend for leader status need to use the same root node. Note: We
   * assume this node already exists.
   */
  public void setRootNodeName(String rootNodeName) {
    this.rootNodeName = rootNodeName;
  }
  /**
   * The {@link ZooKeeper} instance to use for all operations. It must be set
   * before {@link #start()} is called.
   */
  public ZooKeeper getZooKeeper() {
    return zooKeeper;
  }
  public void setZooKeeper(ZooKeeper zooKeeper) {
    this.zooKeeper = zooKeeper;
  }
  /**
   * The hostname of this process. Mostly used as a convenience for logging and
   * to respond to {@link #getLeaderHostName()} requests.
   */
  public String getHostName() {
    return hostName;
  }
  public void setHostName(String hostName) {
    this.hostName = hostName;
  }
  /**
   * The type of event.
   */
  public static enum EventType {
    START, OFFER_START, OFFER_COMPLETE, DETERMINE_START, DETERMINE_COMPLETE,
    ELECTED_START, ELECTED_COMPLETE, READY_START, READY_COMPLETE, FAILED,
    STOP_START, STOP_COMPLETE,
  }
  /**
   * The internal state of the election support service.
   */
  public static enum State {
    START, OFFER, DETERMINE, ELECTED, READY, FAILED, STOP
  }
}
// LeaderOffer.java
package org.apache.zookeeper.recipes.leader;
import java.util.Comparator;
/**
 * A leader offer is a numeric id / path pair. The id is the sequential node id
 * assigned by ZooKeeper whereas the path is the absolute path to the ZNode.
 */
public class LeaderOffer {
  private Integer id;
  private String nodePath;
  private String hostName;
  public LeaderOffer() {
    // Default constructor
  }
  public LeaderOffer(Integer id, String nodePath, String hostName) {
    this.id = id;
    this.nodePath = nodePath;
    this.hostName = hostName;
  }
  @Override
  public String toString() {
    return "{ id:" + id + " nodePath:" + nodePath + " hostName:" + hostName
        + " }";
  }
  public Integer getId() {
    return id;
  }
  public void setId(Integer id) {
    this.id = id;
  }
  public String getNodePath() {
    return nodePath;
  }
  public void setNodePath(String nodePath) {
    this.nodePath = nodePath;
  }
  public String getHostName() {
    return hostName;
  }
  public void setHostName(String hostName) {
    this.hostName = hostName;
  }
  /**
   * Compare two instances of {@link LeaderOffer} using only the {@code id}
   * member.
   */
  public static class IdComparator implements Comparator<LeaderOffer> {
    @Override
    public int compare(LeaderOffer o1, LeaderOffer o2) {
      return o1.getId().compareTo(o2.getId());
    }
  }
}
// LeaderElectionAware.java
package org.apache.zookeeper.recipes.leader;
import org.apache.zookeeper.recipes.leader.LeaderElectionSupport.EventType;
/**
 * An interface to be implemented by clients that want to receive election
 * events.
 */
public interface LeaderElectionAware {
  /**
   * Called during each state transition. Currently, low-level events are
   * provided at the beginning and end of each state. For instance, START may
   * be followed by OFFER_START, OFFER_COMPLETE, DETERMINE_START,
   * DETERMINE_COMPLETE, and so on.
   *
   * @param eventType
   */
  public void onElectionEvent(EventType eventType);
}