Elasticsearch Source Code Analysis: Cluster Health

        Here is a playful way to put it: if we compare an ES cluster to a dynasty, its three states look like this:

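  • Green: a golden age of peace, and the whole country is thriving

  • Yellow: treacherous ministers hold power, and the country is in peril

  • Red: the emperor no longer holds court, which is beyond what anyone can tolerate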

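        When the cluster is green, the men farm and the women weave, everyone goes about their business, and there is nothing to manage. Yellow? Which dynasty never had treacherous ministers; that can be tolerated too. Red, however, is serious, very serious, even though in many cases (a full cluster restart, for example) the cluster recovers on its own after a short wait. So much for the intuition; what is actually going on?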

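  • Green: everything is normal

  • Yellow: replicas are missing (every primary shard is active, but at least one replica is not)

  • Red: a primary shard is missing (at least one primary shard is not active)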
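
The three definitions above can be sketched as a small decision function for a single shard and its copies. This is only an illustration: `ShardGroupStatusSketch`, `statusOf`, and its parameters are hypothetical names, not Elasticsearch classes.

```java
// A minimal sketch with hypothetical names; not part of Elasticsearch.
final class ShardGroupStatusSketch {

    enum Status { GREEN, YELLOW, RED }

    /**
     * @param primaryActive whether the primary copy of the shard is active
     * @param activeCopies  number of active copies (primary plus replicas)
     * @param totalCopies   number of copies the shard is supposed to have
     */
    static Status statusOf(boolean primaryActive, int activeCopies, int totalCopies) {
        if (!primaryActive) {
            return Status.RED;    // the primary is missing
        }
        if (activeCopies == totalCopies) {
            return Status.GREEN;  // every copy is active
        }
        return Status.YELLOW;     // the primary is fine, but some replica is missing
    }

    public static void main(String[] args) {
        System.out.println(statusOf(true, 2, 2));   // GREEN
        System.out.println(statusOf(true, 1, 2));   // YELLOW
        System.out.println(statusOf(false, 0, 2));  // RED
    }
}
```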

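        At this point it all clicks: is it really that simple? Got it. But wait, don't close the blog just yet. As a programmer, one with ambition, can you be brushed off that easily? You have to see the code; the code is the ground truth. If metaphors don't satisfy you, let's carry on, this time for real.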

Checking cluster health

curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "mycluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 778,
  "active_shards" : 1556,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

The cluster health classes

What do all these returned fields mean? I opened the IDE, dug around for a while, and finally found the following.

Cluster status

public enum ClusterHealthStatus {
    GREEN((byte) 0),
    YELLOW((byte) 1),
    RED((byte) 2);
    //....

Shard routing state

public enum ShardRoutingState {
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),
    /**
     * The shard is initializing (probably recovering from either a peer shard or the gateway).
     */
    INITIALIZING((byte) 2),
    /**
     * The shard is started.
     */
    STARTED((byte) 3),
    /**
     * The shard is being relocated.
     */
    RELOCATING((byte) 4);
    //....

Shard routing entries

public class ImmutableShardRouting implements Streamable, Serializable, ShardRouting {
    //...
    @Override
    public boolean unassigned() {
        return state == ShardRoutingState.UNASSIGNED;
    }

    @Override
    public boolean initializing() {
        return state == ShardRoutingState.INITIALIZING;
    }

    @Override
    public boolean active() {
        return started() || relocating();
    }

    @Override
    public boolean started() {
        return state == ShardRoutingState.STARTED;
    }

    @Override
    public boolean relocating() {
        return state == ShardRoutingState.RELOCATING;
    }
 //...

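By now things look clearer: a shard that is started or relocating is an active shard, and if it also happens to be a primary, it is an active primary shard. So that's how it works. But you may object that these are only entity classes, bit players at best. Don't worry, let's keep reading.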
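
To make the "active" rule concrete, here is a minimal sketch that counts active shards and active primaries over a list of shard copies. `ActiveShardSketch`, `Copy`, and the two counting methods are hypothetical stand-ins for illustration; only the rule `active = started or relocating` comes from the code above.

```java
import java.util.Arrays;
import java.util.List;

// A minimal sketch with hypothetical names; not part of Elasticsearch.
final class ActiveShardSketch {

    enum State { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // Hypothetical stand-in for one shard routing entry.
    static final class Copy {
        final State state;
        final boolean primary;

        Copy(State state, boolean primary) {
            this.state = state;
            this.primary = primary;
        }

        // Same rule as in the article: started or relocating counts as active.
        boolean active() {
            return state == State.STARTED || state == State.RELOCATING;
        }
    }

    static int countActive(List<Copy> copies) {
        int n = 0;
        for (Copy c : copies) {
            if (c.active()) {
                n++;
            }
        }
        return n;
    }

    static int countActivePrimaries(List<Copy> copies) {
        int n = 0;
        for (Copy c : copies) {
            if (c.active() && c.primary) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Copy> copies = Arrays.asList(
                new Copy(State.RELOCATING, true),     // active primary
                new Copy(State.STARTED, false),       // active replica
                new Copy(State.INITIALIZING, false),  // not active yet
                new Copy(State.UNASSIGNED, true));    // primary, but not active
        System.out.println(countActive(copies));          // 2
        System.out.println(countActivePrimaries(copies)); // 1
    }
}
```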

Computing cluster health

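        The clusterHealth() method of the TransportClusterHealthAction class is responsible for computing cluster health. It also inherits a fine tradition from its parent class: these operations run on the master node, and if you did not send the request to the master, no problem, it forwards the request for you. Before the computation there is some handling of wait conditions, which we set aside for now and go straight to the point.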

private ClusterHealthResponse clusterHealth(ClusterHealthRequest request, ClusterState clusterState) {
        if (logger.isTraceEnabled()) {
            logger.trace("Calculating health based on state version [{}]", clusterState.version());
        }
        
        // First, validate: this mainly checks the routingTable against the metaData.
        // For example, if the user asked for 5 shards when creating an index but the routingTable only holds 4, validation fails.
        RoutingTableValidation validation = clusterState.routingTable().validate(clusterState.metaData());
        
        ClusterHealthResponse response = new ClusterHealthResponse(clusterName.value(), validation.failures());
        response.numberOfNodes = clusterState.nodes().size();
        response.numberOfDataNodes = clusterState.nodes().dataNodes().size();

        String[] concreteIndices;
        try {
            concreteIndices = clusterState.metaData().concreteIndicesIgnoreMissing(request.indices());
        } catch (IndexMissingException e) {
            return response;
        }
        // The computation works at three levels (shard, index, cluster), applying the same logic at each level
        for (String index : concreteIndices) {
            IndexRoutingTable indexRoutingTable = clusterState.routingTable().index(index);
            IndexMetaData indexMetaData = clusterState.metaData().index(index);
            if (indexRoutingTable == null) {
                continue;
            }
            ClusterIndexHealth indexHealth = new ClusterIndexHealth(index, indexMetaData.numberOfShards(), indexMetaData.numberOfReplicas(), validation.indexFailures(indexMetaData.index()));

            for (IndexShardRoutingTable shardRoutingTable : indexRoutingTable) {
                ClusterShardHealth shardHealth = new ClusterShardHealth(shardRoutingTable.shardId().id());
                for (ShardRouting shardRouting : shardRoutingTable) {
                    if (shardRouting.active()) { // the shard is active, i.e. started or relocating
                        shardHealth.activeShards++;
                        if (shardRouting.relocating()) {
                            // the shard is relocating, the one it is relocating to will be in initializing state, so we don't count it
                            shardHealth.relocatingShards++; // count shards being relocated
                        }
                        if (shardRouting.primary()) {
                            shardHealth.primaryActive = true; // and it happens to be the primary
                        }
                    } else if (shardRouting.initializing()) {
                        shardHealth.initializingShards++; // count initializing shards
                    } else if (shardRouting.unassigned()) {
                        shardHealth.unassignedShards++; // count unassigned shards
                    }
                }
                if (shardHealth.primaryActive) {
                    if (shardHealth.activeShards == shardRoutingTable.size()) { // every copy of this shard is active
                        shardHealth.status = ClusterHealthStatus.GREEN;
                    } else {
                        shardHealth.status = ClusterHealthStatus.YELLOW;
                    }
                } else {
                    // the primary is not active: this shard is red, which in turn makes its index and the whole cluster red
                    shardHealth.status = ClusterHealthStatus.RED;
                }
                indexHealth.shards.put(shardHealth.getId(), shardHealth);
            }

            for (ClusterShardHealth shardHealth : indexHealth) {
                if (shardHealth.isPrimaryActive()) {
                    indexHealth.activePrimaryShards++;
                }
                indexHealth.activeShards += shardHealth.activeShards;
                indexHealth.relocatingShards += shardHealth.relocatingShards;
                indexHealth.initializingShards += shardHealth.initializingShards;
                indexHealth.unassignedShards += shardHealth.unassignedShards;
            }
            // start by assuming the index is healthy (green)
            indexHealth.status = ClusterHealthStatus.GREEN;
            if (!indexHealth.getValidationFailures().isEmpty()) {
                indexHealth.status = ClusterHealthStatus.RED;
            } else if (indexHealth.getShards().isEmpty()) { // might be since none has been created yet (two phase index creation)
                indexHealth.status = ClusterHealthStatus.RED;
            } else {
                for (ClusterShardHealth shardHealth : indexHealth) {
                    if (shardHealth.getStatus() == ClusterHealthStatus.RED) { // a single red shard makes the whole index red
                        indexHealth.status = ClusterHealthStatus.RED;
                        break;
                    }
                    if (shardHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                        indexHealth.status = ClusterHealthStatus.YELLOW;
                    }
                }
            }

            response.indices.put(indexHealth.getIndex(), indexHealth);
        }

        for (ClusterIndexHealth indexHealth : response) {
            response.activePrimaryShards += indexHealth.activePrimaryShards;
            response.activeShards += indexHealth.activeShards;
            response.relocatingShards += indexHealth.relocatingShards;
            response.initializingShards += indexHealth.initializingShards;
            response.unassignedShards += indexHealth.unassignedShards;
        }
        response.status = ClusterHealthStatus.GREEN;
        if (!response.getValidationFailures().isEmpty()) {
            response.status = ClusterHealthStatus.RED;
        } else if (clusterState.blocks().hasGlobalBlock(RestStatus.SERVICE_UNAVAILABLE)) { // a global block, e.g. no master elected or cluster state not yet recovered
            response.status = ClusterHealthStatus.RED;
        } else {
            // The loop below implements this sentence from the official documentation:
            // The cluster status is controlled by the worst index status.
            for (ClusterIndexHealth indexHealth : response) {
                if (indexHealth.getStatus() == ClusterHealthStatus.RED) {
                    response.status = ClusterHealthStatus.RED;
                    break;
                }
                if (indexHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                    response.status = ClusterHealthStatus.YELLOW;
                }
            }
        }

        return response;
    }
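
The roll-up from shard to index to cluster boils down to "the worst status wins". The sketch below illustrates only that rule, under simplifying assumptions: `WorstStatusSketch` and `worst` are hypothetical names, and the special cases in the method above (validation failures, an index with no shards yet, the global block check) are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// A minimal sketch with hypothetical names; not part of Elasticsearch.
final class WorstStatusSketch {

    // Declared from best to worst, so compareTo() orders by severity.
    enum Status { GREEN, YELLOW, RED }

    // Start from GREEN and keep the most severe status seen.
    static Status worst(List<Status> statuses) {
        Status result = Status.GREEN;
        for (Status s : statuses) {
            if (s.compareTo(result) > 0) {
                result = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shard statuses of two imaginary indices.
        Status indexA = worst(Arrays.asList(Status.GREEN, Status.GREEN));
        Status indexB = worst(Arrays.asList(Status.GREEN, Status.YELLOW));
        // The cluster status is the worst of the index statuses.
        Status cluster = worst(Arrays.asList(indexA, indexB));
        System.out.println(indexA);   // GREEN
        System.out.println(indexB);   // YELLOW
        System.out.println(cluster);  // YELLOW
    }
}
```

The original method reaches the same result with an explicit loop that breaks on the first RED and remembers YELLOW otherwise; ordering the enum by severity is just a compact way to express that.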

