集群是为一组互联的完整计算机,一起作为一个统一的计算资源而工作,给人以一台机器的感觉。
集群有三大优点,所以很多系统都以集群的形式出现:
elasticsearch以集群的形式服务也是基于上面原因。
分布式集群业界一般有两种模型,中心化集群和去中心化集群。
elasticsearch集群属于中心化集群。
主要包含了es的gateway模块(元数据读写)、cluster模块(集群状态处理)、discovery模块(集群发现服务)三大模块完成集群功能。
流程解析:
在集群节点启动时,先初始化nodeid表明自己身份。优先loadstate导入元数据信息,在clusterstate中解析出nodeid;如果没有clusterstate元数据则直接uuid生成nodeid。
/**
* scans the node paths and loads existing metaData file. If not found a new meta data will be generated
* and persisted into the nodePaths
*/
private static NodeMetaData loadOrCreateNodeMetaData(Settings settings, Logger logger,
NodePath... nodePaths) throws IOException {
final Path[] paths = Arrays.stream(nodePaths).map(np -> np.path).toArray(Path[]::new);
NodeMetaData metaData = NodeMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, paths);
if (metaData == null) {
metaData = new NodeMetaData(generateNodeId(settings));
}
// we write again to make sure all paths have the latest state file
NodeMetaData.FORMAT.write(metaData, paths);
return metaData;
}
在node类启动时候通过ZenDiscovery.innerJoinCluster加入集群。该方法保证加入群集或失败后生成一个新的连接线程尝试加入集群。下面是该方法解析:
1、findMaster。寻找自己所属集群的master。
①首先通过pingAndWait同步获得所有ping得到的节点fullPingResponses、加上自己本地节点
java
List
...
fullPingResponses.add(new ZenPing.PingResponse(localNode, null, clusterService.state()));
②然后进行初步过滤得到pingResponses(如果我们启用了discovery.zen.master_election.ignore_non_master_pings则就会把那些node.master = false那些节点都忽略掉)
“`java
static List
// nodes discovered during pinging
List masterCandidates = new ArrayList<>();
for (ZenPing.PingResponse pingResponse : pingResponses) {
if (pingResponse.node().isMasterNode()) {
masterCandidates.add(new ElectMasterService.MasterCandidate(pingResponse.node(), pingResponse.getClusterStateVersion()));
}
}
if (activeMasters.isEmpty()) {
if (electMaster.hasEnoughCandidates(masterCandidates)) {
final ElectMasterService.MasterCandidate winner = electMaster.electMaster(masterCandidates);
logger.trace("candidate {} won election", winner);
return winner.getNode();
} else {
// if we don't have enough master nodes, we bail, because there are not enough master to elect from
logger.warn("not enough master nodes discovered during pinging (found [{}], but needed [{}]), pinging again",
masterCandidates, electMaster.minimumMasterNodes());
return null;
}
} else {
assert !activeMasters.contains(localNode) : "local node should never be elected as master when other nodes indicate an active master";
// lets tie break between discovered nodes
return electMaster.tieBreakActiveMasters(activeMasters);
}
2、判断找到的master是否是自己节点
①如果是本节点则waitToBeElectedAsMaster(wait的原因是discovery.zen.minimum_master_nodes,一般配置为集群数量的一半防止脑裂,等待足够的节点join加入认同你为master,选举结束),然后启动nodesFD.updateNodesAndPing
②如果不是则发送join到master(joinElectedMaster:node join master、master反connect该node),加入不成功则重新启动加入流程(startNewThreadIfNotRunning)。
if (clusterService.localNode().equals(masterNode)) {
... nodeJoinController.waitToBeElectedAsMaster(requiredJoins, masterElectionWaitForJoinsTimeout,
new NodeJoinController.ElectionCallback() {
@Override
public void onElectedAsMaster(ClusterState state) {
joinThreadControl.markThreadAsDone(currentThread);
...
nodesFD.updateNodesAndPing(state); // start the nodes FD
}
@Override
public void onFailure(Throwable t) {
...
joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
}
}
);
} else {
...
// send join request
final boolean success = joinElectedMaster(masterNode);
// finalize join through the cluster state update thread
final DiscoveryNode finalMasterNode = masterNode;
clusterService.submitStateUpdateTask("finalize_join (" + masterNode + ")", new LocalClusterUpdateTask() {
@Override
public ClusterTasksResult execute(ClusterState currentState) throws Exception {
if (!success) {
// failed to join. Try again...
joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
return unchanged();
}
...
}
@Override
public void onFailure(String source, @Nullable Exception e) {
...
joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
}
});
}
在加入集群之后都执行了clusterService.submitStateUpdateTask,提交状态变化。主要做了以下一些事情:
发布状态
// if we are the master, publish the new state to all nodes
// we publish here before we send a notification to all the listeners, since if it fails
// we don't want to notify
//作为master,发布ClusterState到各个node节点
if (newClusterState.nodes().isLocalNodeElectedMaster()) {
logger.debug("publishing cluster state version [{}]", newClusterState.version());
try {
clusterStatePublisher.accept(clusterChangedEvent, ackListener);
} catch (Discovery.FailedToCommitClusterStateException t) {
final long version = newClusterState.version();
logger.warn(
(Supplier>) () -> new ParameterizedMessage(
"failing [{}]: failed to commit cluster state version [{}]", taskInputs.summary, version),
t);
// ensure that list of connected nodes in NodeConnectionsService is in-sync with the nodes of the current cluster state
nodeConnectionsService.connectToNodes(previousClusterState.nodes());
nodeConnectionsService.disconnectFromNodesExcept(previousClusterState.nodes());
taskOutputs.publishingFailed(t);
return;
}
}
state元数据读写参照elasticsearch源码分析之Gateway(六)
错误检测分为两种:MasterFaultDetection和NodeFaultDetection。
这样通过以上几个关键步骤完成了集群建设容错框架,具体集群的不同角色会提供不同服务功能。
参考
《操作系统第六版》
《分布式系统原理介绍》
es源码