Spring Cloud Config, Part 3: service timeouts and blocking caused by server-side and client-side health checks on config-server

We hit a production problem with Spring Cloud: when config-server could not reach git, the microservices in the cluster went down one after another.

To track the problem down, we added logging at the entry point:

org.springframework.cloud.config.server.environment.EnvironmentController.java

    @RequestMapping("/{name}/{profiles}/{label:.*}")
    public Environment labelled(@PathVariable String name, @PathVariable String profiles,
            @PathVariable String label) {
        if (name != null && name.contains("(_)")) {
            // "(_)" is uncommon in a git repo name, but "/" cannot be matched
            // by Spring MVC
            name = name.replace("(_)", "/");
        }
        if (label != null && label.contains("(_)")) {
            // "(_)" is uncommon in a git branch name, but "/" cannot be matched
            // by Spring MVC
            label = label.replace("(_)", "/");
        }
        StopWatch sw = new StopWatch("labelled");
        sw.start();
        logger.info("EnvironmentController.labelled() start, name={}, profiles={}, label={}", name, profiles, label);
        Environment environment = this.repository.findOne(name, profiles, label);
        sw.stop();
        logger.info("EnvironmentController.labelled() end, name={}, profiles={}, label={}, elapsed={} ms, elapsed={} s", name, profiles, label, sw.getTotalTimeMillis(), sw.getTotalTimeSeconds());
        return environment;
    }

Logging was also added to ConfigServerHealthIndicator.java, the entry point of the health check:

@Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        StopWatch sw = new StopWatch("doHealthCheck");
        sw.start();
        logger.info("ConfigServerHealthIndicator.doHealthCheck() start, builder={}", builder);
        builder.up();
        List<Map<String, Object>> details = new ArrayList<>();
        for (String name : this.repositories.keySet()) {
            Repository repository = this.repositories.get(name);
            String application = (repository.getName() == null)? name : repository.getName();
            String profiles = repository.getProfiles();

            try {
                Environment environment = this.environmentRepository.findOne(application, profiles, repository.getLabel());

                HashMap<String, Object> detail = new HashMap<>();
                detail.put("name", environment.getName());
                detail.put("label", environment.getLabel());
                if (environment.getProfiles() != null && environment.getProfiles().length > 0) {
                    detail.put("profiles", Arrays.asList(environment.getProfiles()));
                }

                if (!CollectionUtils.isEmpty(environment.getPropertySources())) {
                    List<String> sources = new ArrayList<>();
                    for (PropertySource source : environment.getPropertySources()) {
                        sources.add(source.getName());
                    }
                    detail.put("sources", sources);
                }
                details.add(detail);
            } catch (Exception e) {
                HashMap<String, String> map = new HashMap<>();
                map.put("application", application);
                map.put("profiles", profiles);
                builder.withDetail("repository", map);
                builder.down(e);
                return;
            }
        }
        builder.withDetail("repositories", details);
        sw.stop();
        logger.info("ConfigServerHealthIndicator.doHealthCheck() end, elapsed={} ms, elapsed={} s, builder={}", sw.getTotalTimeMillis(), sw.getTotalTimeSeconds(), builder);
    }

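Analysing the timing logs showed that EnvironmentController and ConfigServerHealthIndicator were being called far too often. Both calls eventually reach JGitEnvironmentRepository.fetch(), which sends a request to git and takes roughly 5 seconds to time out.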

With so many requests arriving, the server could not keep up, and its threads stayed blocked for long periods.
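The effect can be reproduced in isolation. The following is a minimal sketch, not code from config-server: the class name BlockingFetchDemo, the pool sizes, and the sleep that stands in for the blocking git fetch are all assumptions made for illustration. It shows that when every request blocks for a fixed time and there are more requests than worker threads, the slowest request waits for several blocking periods in a row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockingFetchDemo {

    /**
     * Submits `requests` tasks that each block for `blockMillis` to a pool of
     * `workers` threads. Returns the longest time, in milliseconds, between
     * submission and completion of any single request.
     */
    static long worstLatencyMillis(int workers, int requests, long blockMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long submitted = System.nanoTime();
            for (int i = 0; i < requests; i++) {
                results.add(pool.submit(() -> {
                    // Hypothetical stand-in for a fetch that blocks until it times out.
                    Thread.sleep(blockMillis);
                    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted);
                }));
            }
            long worst = 0;
            for (Future<Long> result : results) {
                worst = Math.max(worst, result.get());
            }
            return worst;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same load and same blocking time, different numbers of worker threads.
        System.out.println("2 workers, 6 requests: " + worstLatencyMillis(2, 6, 50) + " ms");
        System.out.println("6 workers, 6 requests: " + worstLatencyMillis(6, 6, 50) + " ms");
    }
}
```

With a blocking time of about 5 seconds instead of 50 milliseconds, a backlog only a few requests deep per thread is already enough to push callers past typical client timeouts.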

Analysis:

1. The EnvironmentController calls are initiated by each microservice module. Why?

2. The ConfigServerHealthIndicator calls come from config-server's own health check. Lengthening the check interval alleviates the problem:

    consul:
      host: 10.200.110.100
      port: 8500
      enabled: true
      discovery:
        enabled: true
        hostname: 10.200.110.100
        healthCheckInterval: 30s
        queryPassing: true

 

The EnvironmentController requests turn out to be triggered by the health check on the client side of config-server. Looking at the source:

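After each client connects to the registry and obtains a config-server instance, it calls the code above to fetch the Environment data from the config server. After going live we ran into a problem: the logs showed this logic being invoked continuously, roughly once every 20-odd seconds, with the application name app. The official Spring Cloud Config documentation explains that Config Server uses a health indicator to check whether the configured EnvironmentRepository is working. By default it asks the EnvironmentRepository for the configuration of an application named app, and the EnvironmentRepository instance responds with the default configuration. In other words, while the health indicator is enabled (which is the default), findOne is called over and over to check that the configuration is available and that no exception is thrown.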

The following code, from org.springframework.cloud.config.server.config.ConfigServerHealthIndicator, is where the default repository entry named app is registered:

@ConfigurationProperties("spring.cloud.config.server.health")
public class ConfigServerHealthIndicator extends AbstractHealthIndicator {

    private EnvironmentRepository environmentRepository;

    private Map<String, Repository> repositories = new LinkedHashMap<>();

    public ConfigServerHealthIndicator(EnvironmentRepository environmentRepository) {
        this.environmentRepository = environmentRepository;
    }

    @PostConstruct
    public void init() {
        if (this.repositories.isEmpty()) {
            this.repositories.put("app", new Repository());
        }
    }
       //...
}

To stop these checks on the client side, set health.config.enabled=false, which turns the client's config-server health indicator off.

See the source: org.springframework.cloud.config.client.ConfigClientAutoConfiguration.java

@Configuration
public class ConfigClientAutoConfiguration {
    @Configuration
    @ConditionalOnClass(HealthIndicator.class)
    @ConditionalOnBean(ConfigServicePropertySourceLocator.class)
    @ConditionalOnProperty(value = "health.config.enabled", matchIfMissing = true)
    protected static class ConfigServerHealthIndicatorConfiguration {

        @Bean
        public ConfigServerHealthIndicator configServerHealthIndicator(
                ConfigServicePropertySourceLocator locator,
                ConfigClientHealthProperties properties, Environment environment) {
            return new ConfigServerHealthIndicator(locator, environment, properties);
        }
    }
//...
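Disabling the indicator removes the check entirely. If the check is still wanted, one possible alternative is to cache its result for a while so that frequent health polls do not each reach config-server. The sketch below is only an illustration of that idea under stated assumptions: CachedCheck is a hypothetical helper, not a Spring Cloud class, and the supplier passed to it stands in for whatever call actually contacts the server.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: remembers the last result of a check for ttlMillis. */
public class CachedCheck<T> {

    private final Supplier<T> delegate;   // the expensive check (assumed to call the server)
    private final long ttlMillis;         // how long a result stays valid
    private final LongSupplier clock;     // time source in milliseconds, injectable for tests

    private T lastValue;
    private long lastRefresh;
    private boolean loaded;

    public CachedCheck(Supplier<T> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached result, running the delegate again only when no result
     * exists yet or the cached one is at least ttlMillis old. The method is
     * synchronized, so concurrent callers wait for a single refresh instead of
     * each triggering their own.
     */
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - lastRefresh >= ttlMillis) {
            lastValue = delegate.get();
            lastRefresh = now;
            loaded = true;
        }
        return lastValue;
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for a real remote check.
        CachedCheck<String> check =
                new CachedCheck<>(() -> "UP", 30_000, System::currentTimeMillis);
        System.out.println(check.get()); // runs the delegate
        System.out.println(check.get()); // within 30 s: served from the cache
    }
}
```

The trade-off is staleness: a failure of config-server is reported up to one TTL late, so the TTL should be chosen with the registry's health check interval in mind.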

 
