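Here's a fun way to put it. If you think of an ES cluster as a dynasty, its three states look like this:
Green: a golden age of peace and prosperity, all is well across the realm
Yellow: treacherous ministers hold power, and the realm is in peril
Red: the emperor no longer holds court. If this can be tolerated, what cannot?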
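When it's green, the men plow and the women weave, everyone goes about their business, and you can leave it alone. When it's yellow, well, what dynasty never had a treacherous minister? You can live with it, and usually the cluster recovers on its own after a short wait. Red, though, is serious, very serious. OK, now that we have an intuitive feel for it, what is actually going on?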
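Green: everything is normal
Yellow: at least one replica shard is missing
Red: at least one primary shard is missing
At this point it all clicks: is it really that simple? Got it. But hold on, don't close the blog just yet. As a programmer, a programmer with standards, can you be fobbed off that easily? You have to see the code. The code is the real truth. If a metaphor isn't enough for you, let's keep going, this time for real.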
curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "mycluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 778,
  "active_shards" : 1556,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
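If you want to act on that response from code rather than by eye, the fields of interest can be pulled out of the JSON text. The following is only a rough sketch: the class name HealthJsonPeek and its methods are made up for illustration, and the regular expressions assume the flat, pretty-printed shape shown above. A real client would use a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch: peeks at a few fields of the
// _cluster/health response. This is not a general JSON parser.
final class HealthJsonPeek {
    // Matches a string field of the form "status" : "green".
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"(\\w+)\"");

    // Returns the value of "status", or null if the field is absent.
    static String status(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Returns the value of a non-negative integer field, or -1 if it is absent.
    static long number(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(\\d+)");
        Matcher m = p.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String json = "{ \"cluster_name\" : \"mycluster\", \"status\" : \"green\", "
                + "\"active_primary_shards\" : 778, \"active_shards\" : 1556, "
                + "\"unassigned_shards\" : 0 }";
        System.out.println("status=" + status(json)
                + ", unassigned=" + number(json, "unassigned_shards"));
    }
}
```

A monitoring script could, for instance, alert whenever status(json) is not "green" or number(json, "unassigned_shards") is greater than zero.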
So what does all this returned information mean? I opened the IDE and went digging... and finally found:
Cluster status
public enum ClusterHealthStatus {
    GREEN((byte) 0),
    YELLOW((byte) 1),
    RED((byte) 2);
    //....
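Shard routing state
public enum ShardRoutingState {
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),
    /**
     * The shard is initializing (probably recovering from either a peer shard or the gateway).
     */
    INITIALIZING((byte) 2),
    /**
     * The shard is started.
     */
    STARTED((byte) 3),
    /**
     * The shard is being relocated.
     */
    RELOCATING((byte) 4);
    //....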
Shard routing
public class ImmutableShardRouting implements Streamable, Serializable, ShardRouting {
    //...
    @Override
    public boolean unassigned() {
        return state == ShardRoutingState.UNASSIGNED;
    }

    @Override
    public boolean initializing() {
        return state == ShardRoutingState.INITIALIZING;
    }

    @Override
    public boolean active() {
        return started() || relocating();
    }

    @Override
    public boolean started() {
        return state == ShardRoutingState.STARTED;
    }

    @Override
    public boolean relocating() {
        return state == ShardRoutingState.RELOCATING;
    }
    //...
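Having read this far, things start to make sense: a shard that is started or relocating is an active shard, and if it also happens to be a primary, it is an active primary shard. So that's how it works. But now you'll say these are only entity classes, small fry at best. Don't worry, let's keep reading.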
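To make the "active" rule concrete, here is a small standalone sketch. It does not use the real Elasticsearch classes: ShardCounting, Copy and the two helper methods are hypothetical names invented for this example, and only the started-or-relocating rule is taken from the code above.

```java
import java.util.Arrays;
import java.util.List;

// Standalone illustration; none of these types are Elasticsearch classes.
final class ShardCounting {
    // The same four states as ShardRoutingState.
    enum State { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // Hypothetical, simplified stand-in for one ShardRouting entry.
    static final class Copy {
        final State state;
        final boolean primary;

        Copy(State state, boolean primary) {
            this.state = state;
            this.primary = primary;
        }

        // Same rule as active() above: started or relocating.
        boolean active() {
            return state == State.STARTED || state == State.RELOCATING;
        }
    }

    // Number of active copies (primary and replicas) of one shard.
    static int activeCopies(List<Copy> copies) {
        int n = 0;
        for (Copy c : copies) {
            if (c.active()) {
                n++;
            }
        }
        return n;
    }

    // True if some copy is both the primary and active.
    static boolean primaryActive(List<Copy> copies) {
        for (Copy c : copies) {
            if (c.primary && c.active()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Copy> shard0 = Arrays.asList(
                new Copy(State.RELOCATING, true),      // primary on the move, still active
                new Copy(State.INITIALIZING, false));  // replica still recovering
        System.out.println(activeCopies(shard0) + " active, primary active: "
                + primaryActive(shard0));
    }
}
```

Note that a relocating primary still counts as an active primary, while an initializing replica does not count as active at all.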
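The clusterHealth() method of the TransportClusterHealthAction class is responsible for computing cluster health. It also inherits a fine tradition from its parent class: these operations are executed on the master node. If you didn't send the request to the master node, no problem, it will forward it for you. Before that there is some handling for waiting on conditions, which we'll ignore for now and go straight to the point.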
private ClusterHealthResponse clusterHealth(ClusterHealthRequest request, ClusterState clusterState) {
    if (logger.isTraceEnabled()) {
        logger.trace("Calculating cluster health based on cluster state, version [{}]", clusterState.version());
    }
    // First of all, validate: this mainly compares the routingTable against the metaData.
    // For example, the user asked for 5 shards when creating the index, but the routingTable only has 4: that is a failure.
    RoutingTableValidation validation = clusterState.routingTable().validate(clusterState.metaData());
    ClusterHealthResponse response = new ClusterHealthResponse(clusterName.value(), validation.failures());
    response.numberOfNodes = clusterState.nodes().size();
    response.numberOfDataNodes = clusterState.nodes().dataNodes().size();

    String[] concreteIndices;
    try {
        concreteIndices = clusterState.metaData().concreteIndicesIgnoreMissing(request.indices());
    } catch (IndexMissingException e) {
        return response;
    }

    // The whole computation has three levels (shard, index, cluster); each level applies the same logic
    for (String index : concreteIndices) {
        IndexRoutingTable indexRoutingTable = clusterState.routingTable().index(index);
        IndexMetaData indexMetaData = clusterState.metaData().index(index);
        if (indexRoutingTable == null) {
            continue;
        }
        ClusterIndexHealth indexHealth = new ClusterIndexHealth(index, indexMetaData.numberOfShards(), indexMetaData.numberOfReplicas(), validation.indexFailures(indexMetaData.index()));

        for (IndexShardRoutingTable shardRoutingTable : indexRoutingTable) {
            ClusterShardHealth shardHealth = new ClusterShardHealth(shardRoutingTable.shardId().id());
            for (ShardRouting shardRouting : shardRoutingTable) {
                if (shardRouting.active()) {
                    // the shard is active, i.e. started or relocating, as seen above
                    shardHealth.activeShards++;
                    if (shardRouting.relocating()) {
                        // the shard is relocating, the one he is relocating to will be in initializing state, so we don't count it
                        shardHealth.relocatingShards++; // count the relocating ones
                    }
                    if (shardRouting.primary()) {
                        shardHealth.primaryActive = true; // and it happens to be a primary
                    }
                } else if (shardRouting.initializing()) {
                    shardHealth.initializingShards++; // count the initializing ones
                } else if (shardRouting.unassigned()) {
                    shardHealth.unassignedShards++; // the unassigned ones
                }
            }
            if (shardHealth.primaryActive) {
                if (shardHealth.activeShards == shardRoutingTable.size()) {
                    // all copies of this shard are active
                    shardHealth.status = ClusterHealthStatus.GREEN;
                } else {
                    shardHealth.status = ClusterHealthStatus.YELLOW;
                }
            } else {
                // if the primary is not active, this shard is red and, barring surprises, so is the whole cluster
                shardHealth.status = ClusterHealthStatus.RED;
            }
            indexHealth.shards.put(shardHealth.getId(), shardHealth);
        }

        for (ClusterShardHealth shardHealth : indexHealth) {
            if (shardHealth.isPrimaryActive()) {
                indexHealth.activePrimaryShards++;
            }
            indexHealth.activeShards += shardHealth.activeShards;
            indexHealth.relocatingShards += shardHealth.relocatingShards;
            indexHealth.initializingShards += shardHealth.initializingShards;
            indexHealth.unassignedShards += shardHealth.unassignedShards;
        }
        // assume a healthy green to start with
        indexHealth.status = ClusterHealthStatus.GREEN;
        if (!indexHealth.getValidationFailures().isEmpty()) {
            indexHealth.status = ClusterHealthStatus.RED;
        } else if (indexHealth.getShards().isEmpty()) {
            // might be since none has been created yet (two phase index creation)
            indexHealth.status = ClusterHealthStatus.RED;
        } else {
            for (ClusterShardHealth shardHealth : indexHealth) {
                if (shardHealth.getStatus() == ClusterHealthStatus.RED) {
                    // a single red shard makes the index health red
                    indexHealth.status = ClusterHealthStatus.RED;
                    break;
                }
                if (shardHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                    indexHealth.status = ClusterHealthStatus.YELLOW;
                }
            }
        }
        response.indices.put(indexHealth.getIndex(), indexHealth);
    }

    for (ClusterIndexHealth indexHealth : response) {
        response.activePrimaryShards += indexHealth.activePrimaryShards;
        response.activeShards += indexHealth.activeShards;
        response.relocatingShards += indexHealth.relocatingShards;
        response.initializingShards += indexHealth.initializingShards;
        response.unassignedShards += indexHealth.unassignedShards;
    }

    response.status = ClusterHealthStatus.GREEN;
    if (!response.getValidationFailures().isEmpty()) {
        response.status = ClusterHealthStatus.RED;
    } else if (clusterState.blocks().hasGlobalBlock(RestStatus.SERVICE_UNAVAILABLE)) {
        // a global block reports SERVICE_UNAVAILABLE, e.g. no master has been elected or the state has not been recovered yet
        response.status = ClusterHealthStatus.RED;
    } else {
        // the loop below is exactly the sentence from the official documentation:
        // The cluster status is controlled by the worst index status.
        for (ClusterIndexHealth indexHealth : response) {
            if (indexHealth.getStatus() == ClusterHealthStatus.RED) {
                response.status = ClusterHealthStatus.RED;
                break;
            }
            if (indexHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                response.status = ClusterHealthStatus.YELLOW;
            }
        }
    }
    return response;
}
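Stripped of the counters, the method boils down to two rules: decide each shard's color, then let the worst color win at the index level and again at the cluster level. The following is a condensed sketch of just those two rules, not the Elasticsearch implementation. HealthRollup, shardStatus and worst are hypothetical names, and the sketch deliberately leaves out validation failures, global blocks and the "index has no shards yet" case, all of which the real method maps to RED.

```java
import java.util.Arrays;
import java.util.List;

// Condensed illustration of the roll-up; not Elasticsearch code.
final class HealthRollup {
    // Declared from best to worst so that compareTo() orders them by severity.
    enum Status { GREEN, YELLOW, RED }

    // Shard level: RED without an active primary, GREEN if every copy is active, else YELLOW.
    static Status shardStatus(boolean primaryActive, int activeCopies, int totalCopies) {
        if (!primaryActive) {
            return Status.RED;
        }
        return activeCopies == totalCopies ? Status.GREEN : Status.YELLOW;
    }

    // Index and cluster level: start from GREEN and keep the worst status seen.
    static Status worst(List<Status> statuses) {
        Status result = Status.GREEN;
        for (Status s : statuses) {
            if (s.compareTo(result) > 0) {
                result = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One index with two shards, one primary plus one replica each.
        Status shard0 = shardStatus(true, 2, 2);   // both copies active
        Status shard1 = shardStatus(true, 1, 2);   // replica not active yet
        Status indexA = worst(Arrays.asList(shard0, shard1));
        // A second index whose only shard has lost its primary.
        Status indexB = worst(Arrays.asList(shardStatus(false, 0, 2)));
        Status cluster = worst(Arrays.asList(indexA, indexB));
        System.out.println(indexA + " " + indexB + " " + cluster);
    }
}
```

The same worst() helper serves both levels, which is the "same logic, computed at each level" structure pointed out in the comments of the original method.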