Sentinel Circuit Breaking and Degradation

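Series

  • Sentinel workflow overview
  • Sentinel resource node tree structure
  • Sentinel sliding window overview
  • Sentinel flow control
  • Sentinel responsibility-chain slots overview
  • Sentinel circuit breaking and degradation
  • Sentinel Dashboard and application communication
  • Sentinel console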

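Introduction

  • Modern microservice architectures are distributed and consist of many services that call one another, forming complex call chains. Instability in any one service is amplified along the chain: if a single link becomes unstable, the failure can cascade layer by layer until the entire chain is unavailable. We therefore need to circuit-break and degrade calls to unstable weak dependencies, temporarily cutting off the unstable call so that a local problem does not cause an overall avalanche. As a means of self-protection, circuit breaking is usually configured on the client (caller) side.

  • Sentinel's circuit breaking is post-hoc: whether the current request is rejected depends on statistics gathered from earlier requests. After each call completes, DegradeSlot updates metrics such as the slow-request ratio and the exception ratio in its exit phase and decides whether to open the circuit breaker. Once the breaker is open, subsequent requests are rejected immediately.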

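Circuit-breaking strategies

  • Slow call ratio (SLOW_REQUEST_RATIO): uses the ratio of slow calls as the threshold. You must set the maximum allowed response time (RT); a request whose response time exceeds it is counted as a slow call. When the number of requests within the statistics interval (statIntervalMs) reaches the configured minimum request count and the slow-call ratio exceeds the threshold, requests are rejected automatically for the duration of the circuit-breaking window. After the window elapses, the breaker enters the probing state (HALF-OPEN): if the next request completes within the slow-call RT, the breaker closes; otherwise it opens again.

  • Exception ratio (ERROR_RATIO): when the number of requests within the statistics interval (statIntervalMs) reaches the configured minimum request count and the exception ratio exceeds the threshold, requests are rejected automatically for the duration of the circuit-breaking window. After the window elapses, the breaker enters the probing state (HALF-OPEN): if the next request completes without an error, the breaker closes; otherwise it opens again. The threshold for the exception ratio lies in [0.0, 1.0], representing 0% to 100%.

  • Exception count (ERROR_COUNT): when the number of exceptions within the statistics interval exceeds the threshold, the breaker opens automatically. After the circuit-breaking window elapses, the breaker enters the probing state (HALF-OPEN): if the next request completes without an error, the breaker closes; otherwise it opens again.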

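Circuit breaker states

  • A circuit breaker has three states: OPEN, HALF_OPEN, and CLOSED.
  • OPEN: the breaker is open and all requests are rejected.
  • HALF_OPEN: the probing state. If the next request succeeds, the breaker closes; otherwise it opens again.
  • CLOSED: the breaker is closed and requests pass through normally.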
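The transitions between these three states can be written down as a small transition function. This is only an illustrative sketch: the Event enum and the next() method are hypothetical names invented for this example and are not part of Sentinel.

```java
// Hypothetical model of the three states; Event and next() are illustrative names.
class StateTransitionDemo {
    enum State { CLOSED, OPEN, HALF_OPEN }
    enum Event { THRESHOLD_EXCEEDED, RETRY_TIMEOUT_ARRIVED, PROBE_SUCCEEDED, PROBE_FAILED }

    // Returns the state that follows `current` when `event` occurs.
    // Events that do not apply to the current state leave it unchanged.
    static State next(State current, Event event) {
        switch (current) {
            case CLOSED:
                return event == Event.THRESHOLD_EXCEEDED ? State.OPEN : current;
            case OPEN:
                return event == Event.RETRY_TIMEOUT_ARRIVED ? State.HALF_OPEN : current;
            case HALF_OPEN:
                if (event == Event.PROBE_SUCCEEDED) return State.CLOSED;
                if (event == Event.PROBE_FAILED) return State.OPEN;
                return current;
            default:
                return current;
        }
    }

    public static void main(String[] args) {
        State s = State.CLOSED;
        Event[] events = {Event.THRESHOLD_EXCEEDED, Event.RETRY_TIMEOUT_ARRIVED,
                          Event.PROBE_FAILED, Event.RETRY_TIMEOUT_ARRIVED, Event.PROBE_SUCCEEDED};
        for (Event e : events) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```

Note that a failed probe sends the breaker back to OPEN, so it has to wait for another retry timeout before it can probe again.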

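Degrade rules

  • The figure above describes the parameters of a circuit-breaking rule.
  • The figure above describes how a circuit-breaking rule is configured.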
public abstract class AbstractRule implements Rule {
    private String resource;
    private String limitApp;
}

public class DegradeRule extends AbstractRule {
    private int grade = RuleConstant.DEGRADE_GRADE_RT;
    private double count;
    private int timeWindow;
    private int minRequestAmount = RuleConstant.DEGRADE_DEFAULT_MIN_REQUEST_AMOUNT;
    private double slowRatioThreshold = 1.0d;
    private int statIntervalMs = 1000;
}
  • The code above shows how the rule parameters are defined in the source.

Rule parameters for the slow-call strategy

[
    {
        "count": 3000,
        "grade": 0,
        "limitApp": "default",
        "minRequestAmount": 100,
        "resource": "degrade01",
        "slowRatioThreshold": 0.5,
        "statIntervalMs": 1000,
        "timeWindow": 5
    }
]

Rule parameters for the exception-ratio strategy

{
    "count": 0.3,
    "grade": 1,
    "limitApp": "default",
    "minRequestAmount": 200,
    "resource": "degrade02",
    "slowRatioThreshold": 1,
    "statIntervalMs": 1000,
    "timeWindow": 5
}

Rule parameters for the exception-count strategy

{
    "count": 1000,
    "grade": 2,
    "limitApp": "default",
    "minRequestAmount": 300,
    "resource": "degrade03",
    "slowRatioThreshold": 1,
    "statIntervalMs": 1000,
    "timeWindow": 5
}
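To make the three example rules concrete, the sketch below checks each of them against made-up window statistics. It is not Sentinel code: shouldOpen and its parameter list are hypothetical, the statistics are invented for illustration, and the check is a simplification of what the circuit breakers shown later in this article do (in particular, it leaves out the special handling of a slow-ratio threshold of exactly 1.0).

```java
// Hypothetical helper that mirrors the meaning of grade, count,
// slowRatioThreshold and minRequestAmount in the rules above.
class RuleEvaluationDemo {
    static final int GRADE_RT = 0;               // slow call ratio
    static final int GRADE_EXCEPTION_RATIO = 1;  // exception ratio
    static final int GRADE_EXCEPTION_COUNT = 2;  // exception count

    // total:  requests completed in the statistics interval
    // slow:   requests whose RT exceeded `count` ms (only used for GRADE_RT)
    // errors: requests that ended with an exception
    static boolean shouldOpen(int grade, double count, double slowRatioThreshold,
                              int minRequestAmount, long total, long slow, long errors) {
        if (total < minRequestAmount) {
            return false; // too few requests to judge
        }
        switch (grade) {
            case GRADE_RT:
                return (double) slow / total > slowRatioThreshold;
            case GRADE_EXCEPTION_RATIO:
                return (double) errors / total > count;
            case GRADE_EXCEPTION_COUNT:
                return errors > count;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // degrade01: 70 of 120 requests slower than 3000 ms, ratio 0.58 > 0.5
        System.out.println(shouldOpen(GRADE_RT, 3000, 0.5, 100, 120, 70, 0));
        // degrade02: 80 of 250 requests failed, ratio 0.32 > 0.3
        System.out.println(shouldOpen(GRADE_EXCEPTION_RATIO, 0.3, 1, 200, 250, 0, 80));
        // degrade03: 1001 failures out of 5000 requests, 1001 > 1000
        System.out.println(shouldOpen(GRADE_EXCEPTION_COUNT, 1000, 1, 300, 5000, 0, 1001));
        // degrade01 again, but only 80 requests: below minRequestAmount, stays closed
        System.out.println(shouldOpen(GRADE_RT, 3000, 0.5, 100, 80, 80, 0));
    }
}
```

The first three calls print true and the last prints false: even though every one of the 80 requests was slow, the rule requires at least 100 requests before it draws a conclusion.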

DegradeSlot circuit-breaking process

Rule construction

public final class DegradeRuleManager {

    private static CircuitBreaker newCircuitBreakerFrom(/*@Valid*/ DegradeRule rule) {
        switch (rule.getGrade()) {
            // The slow-call strategy returns a ResponseTimeCircuitBreaker
            case RuleConstant.DEGRADE_GRADE_RT:
                return new ResponseTimeCircuitBreaker(rule);
            // The exception-ratio and exception-count strategies return an ExceptionCircuitBreaker
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO:
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT:
                return new ExceptionCircuitBreaker(rule);

            default:
                return null;
        }
    }
}
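  • The slow-call strategy builds a ResponseTimeCircuitBreaker from the DegradeRule.
  • The exception-ratio and exception-count strategies build an ExceptionCircuitBreaker from the DegradeRule.

Circuit-breaking execution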

public class DegradeSlot extends AbstractLinkedProcessorSlot<DefaultNode> {

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        // Check whether the request must be rejected by a circuit breaker
        performChecking(context, resourceWrapper);

        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

    void performChecking(Context context, ResourceWrapper r) throws BlockException {
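        // Get all circuit breakers registered for this resource
        List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
        if (circuitBreakers == null || circuitBreakers.isEmpty()) {
            return;
        }
        // Check each breaker in turn; reject the request if any of them refuses it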
        for (CircuitBreaker cb : circuitBreakers) {
            if (!cb.tryPass(context)) {
                throw new DegradeException(cb.getRule().getLimitApp(), cb.getRule());
            }
        }
    }

    @Override
    public void exit(Context context, ResourceWrapper r, int count, Object... args) {
        Entry curEntry = context.getCurEntry();
        if (curEntry.getBlockError() != null) {
            fireExit(context, r, count, args);
            return;
        }

        List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
        if (circuitBreakers == null || circuitBreakers.isEmpty()) {
            fireExit(context, r, count, args);
            return;
        }

        if (curEntry.getBlockError() == null) {
            // passed request
            for (CircuitBreaker circuitBreaker : circuitBreakers) {
                circuitBreaker.onRequestComplete(context);
            }
        }

        fireExit(context, r, count, args);
    }
}
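  • In entry, every CircuitBreaker registered for the resource is consulted to decide whether the request should be rejected.
  • In exit, onRequestComplete records the statistics of the completed request.
  • The circuit breaker implementations are ResponseTimeCircuitBreaker and ExceptionCircuitBreaker.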
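The split between entry (read the state) and exit (update statistics, then the state) is what makes the breaker post-hoc. The following sketch reproduces that behaviour with a deliberately tiny, hypothetical CountingBreaker; it is not Sentinel's implementation and ignores time windows and recovery entirely.

```java
import java.util.ArrayList;
import java.util.List;

class PostHocDemo {
    // Hypothetical minimal breaker: opens once more than maxErrors failures were recorded.
    static class CountingBreaker {
        private final int maxErrors;
        private int errors;
        private boolean open;

        CountingBreaker(int maxErrors) {
            this.maxErrors = maxErrors;
        }

        // Entry phase: only reads the state computed by earlier requests.
        boolean tryPass() {
            return !open;
        }

        // Exit phase: updates the statistics first, then the state.
        void onRequestComplete(boolean failed) {
            if (failed) {
                errors++;
            }
            if (errors > maxErrors) {
                open = true;
            }
        }
    }

    // Sends `calls` failing requests through entry/exit and records which ones were admitted.
    static List<Boolean> run(CountingBreaker cb, int calls) {
        List<Boolean> admitted = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            boolean pass = cb.tryPass();
            admitted.add(pass);
            if (pass) {
                // As in DegradeSlot.exit, only requests that were not blocked are counted.
                cb.onRequestComplete(true);
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        System.out.println(run(new CountingBreaker(2), 5)); // [true, true, true, false, false]
    }
}
```

The third request is the one that pushes the error count over the limit, yet it is still admitted, because the decision for it was made before it ran. Only the fourth request sees the open breaker.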
public abstract class AbstractCircuitBreaker implements CircuitBreaker {

    protected final DegradeRule rule;
    protected final int recoveryTimeoutMs;
    private final EventObserverRegistry observerRegistry;
    protected final AtomicReference<State> currentState = new AtomicReference<>(State.CLOSED);
    protected volatile long nextRetryTimestamp;

    AbstractCircuitBreaker(DegradeRule rule, EventObserverRegistry observerRegistry) {
        this.observerRegistry = observerRegistry;
        this.rule = rule;
        this.recoveryTimeoutMs = rule.getTimeWindow() * 1000;
    }

    public boolean tryPass(Context context) {
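        // CLOSED: the breaker is closed, so the request passes
        if (currentState.get() == State.CLOSED) {
            return true;
        }

        // OPEN: reject the request unless the retry time has arrived
        // and this request wins the transition to HALF_OPEN
        if (currentState.get() == State.OPEN) {
            return retryTimeoutArrived() && fromOpenToHalfOpen(context);
        }

        return false;
    }

    protected boolean retryTimeoutArrived() {
        // The time for the next probe has arrived
        return TimeUtil.currentTimeMillis() >= nextRetryTimestamp;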
    }

    protected boolean fromOpenToHalfOpen(Context context) {
        if (currentState.compareAndSet(State.OPEN, State.HALF_OPEN)) {
            notifyObservers(State.OPEN, State.HALF_OPEN, null);
            Entry entry = context.getCurEntry();
            entry.whenTerminate(new BiConsumer<Context, Entry>() {
                @Override
                public void accept(Context context, Entry entry) {
                    if (entry.getBlockError() != null) {
                        // The probe request was blocked, so fall back from HALF_OPEN to OPEN
                        currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
                        notifyObservers(State.HALF_OPEN, State.OPEN, 1.0d);
                    }
                }
            });
            return true;
        }
        return false;
    }
}
  • tryPass essentially just checks the current circuit breaker state; that state was computed from the statistics when earlier requests completed.
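The interplay of nextRetryTimestamp and the compare-and-set in tryPass can be tried out in isolation. The sketch below is a simplified stand-in, not the Sentinel class: the clock is passed in as a parameter so the behaviour is deterministic, and open() assumes that the next retry time is "now plus the recovery timeout", which is an assumption since that part of the source is not shown above.

```java
import java.util.concurrent.atomic.AtomicReference;

class RetryWindowDemo {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final long recoveryTimeoutMs;
    private volatile long nextRetryTimestamp;

    RetryWindowDemo(int timeWindowSeconds) {
        this.recoveryTimeoutMs = timeWindowSeconds * 1000L;
    }

    // Assumption: opening the breaker schedules the next probe one timeout from now.
    void open(long nowMs) {
        state.set(State.OPEN);
        nextRetryTimestamp = nowMs + recoveryTimeoutMs;
    }

    boolean tryPass(long nowMs) {
        if (state.get() == State.CLOSED) {
            return true;
        }
        if (state.get() == State.OPEN) {
            // Only the caller that wins the compare-and-set becomes the probe request.
            return nowMs >= nextRetryTimestamp
                    && state.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return false; // HALF_OPEN: a probe is already in flight
    }

    State state() {
        return state.get();
    }

    public static void main(String[] args) {
        RetryWindowDemo cb = new RetryWindowDemo(5); // timeWindow = 5 seconds
        cb.open(1_000);
        System.out.println(cb.tryPass(3_000)); // false: still inside the window
        System.out.println(cb.tryPass(6_000)); // true: this request becomes the probe
        System.out.println(cb.tryPass(6_001)); // false: breaker is HALF_OPEN
    }
}
```

While the breaker is HALF_OPEN, every request other than the probe is rejected until the probe completes and moves the state to CLOSED or back to OPEN.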

ExceptionCircuitBreaker

public class ExceptionCircuitBreaker extends AbstractCircuitBreaker {

    private final int strategy;
    private final int minRequestAmount;
    private final double threshold;
    private final LeapArray<SimpleErrorCounter> stat;

    @Override
    public void onRequestComplete(Context context) {
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        Throwable error = entry.getError();
        SimpleErrorCounter counter = stat.currentWindow().value();
        if (error != null) {
            counter.getErrorCount().add(1);
        }
        counter.getTotalCount().add(1);

        handleStateChangeWhenThresholdExceeded(error);
    }

    private void handleStateChangeWhenThresholdExceeded(Throwable error) {
        if (currentState.get() == State.OPEN) {
            return;
        }
        
        if (currentState.get() == State.HALF_OPEN) {
            // In detecting request
            if (error == null) {
                fromHalfOpenToClose();
            } else {
                fromHalfOpenToOpen(1.0d);
            }
            return;
        }
        
        List<SimpleErrorCounter> counters = stat.values();
        long errCount = 0;
        long totalCount = 0;
        for (SimpleErrorCounter counter : counters) {
            errCount += counter.errorCount.sum();
            totalCount += counter.totalCount.sum();
        }
        if (totalCount < minRequestAmount) {
            return;
        }
        double curCount = errCount;
        if (strategy == DEGRADE_GRADE_EXCEPTION_RATIO) {
            // Use errorRatio
            curCount = errCount * 1.0d / totalCount;
        }
        if (curCount > threshold) {
            transformToOpen(curCount);
        }
    }
}
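  • The breaker sums errCount and totalCount over the statistics window, computes the exception ratio (or uses the exception count directly), and changes the breaker state accordingly.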
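The counters read by the breaker only cover the current statistics interval. As a rough illustration of that idea, here is a single-bucket window counter that discards its numbers whenever a request falls into a new interval. It is a hypothetical simplification written for this article and does not reproduce how Sentinel's LeapArray is implemented; the time is passed in explicitly to keep the example deterministic.

```java
// Hypothetical single-bucket counter; WindowCounterDemo is not a Sentinel class.
class WindowCounterDemo {
    private final long intervalMs;
    private long windowStart = -1;
    private long errorCount;
    private long totalCount;

    WindowCounterDemo(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Records one completed request at time nowMs. If the request belongs to a
    // new interval, the statistics of the previous interval are dropped first.
    synchronized void record(long nowMs, boolean failed) {
        long start = nowMs - nowMs % intervalMs;
        if (start != windowStart) {
            windowStart = start;
            errorCount = 0;
            totalCount = 0;
        }
        if (failed) {
            errorCount++;
        }
        totalCount++;
    }

    synchronized long errors() {
        return errorCount;
    }

    synchronized long total() {
        return totalCount;
    }

    public static void main(String[] args) {
        WindowCounterDemo c = new WindowCounterDemo(1000); // statIntervalMs = 1000
        c.record(100, true);
        c.record(900, false);
        System.out.println(c.errors() + "/" + c.total()); // 1/2
        c.record(1000, true); // first request of the next interval
        System.out.println(c.errors() + "/" + c.total()); // 1/1
    }
}
```

Because old numbers are dropped at the interval boundary, a burst of errors only counts towards the threshold together with requests from the same interval.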

ResponseTimeCircuitBreaker

public class ResponseTimeCircuitBreaker extends AbstractCircuitBreaker {

    private static final double SLOW_REQUEST_RATIO_MAX_VALUE = 1.0d;
    private final long maxAllowedRt;
    private final double maxSlowRequestRatio;
    private final int minRequestAmount;
    private final LeapArray<SlowRequestCounter> slidingCounter;

    @Override
    public void onRequestComplete(Context context) {
        SlowRequestCounter counter = slidingCounter.currentWindow().value();
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        long completeTime = entry.getCompleteTimestamp();
        if (completeTime <= 0) {
            completeTime = TimeUtil.currentTimeMillis();
        }
        long rt = completeTime - entry.getCreateTimestamp();
        if (rt > maxAllowedRt) {
            counter.slowCount.add(1);
        }
        counter.totalCount.add(1);

        handleStateChangeWhenThresholdExceeded(rt);
    }

    private void handleStateChangeWhenThresholdExceeded(long rt) {
        if (currentState.get() == State.OPEN) {
            return;
        }
        
        if (currentState.get() == State.HALF_OPEN) {
            // In detecting request
            // TODO: improve logic for half-open recovery
            if (rt > maxAllowedRt) {
                fromHalfOpenToOpen(1.0d);
            } else {
                fromHalfOpenToClose();
            }
            return;
        }

        List<SlowRequestCounter> counters = slidingCounter.values();
        long slowCount = 0;
        long totalCount = 0;
        for (SlowRequestCounter counter : counters) {
            slowCount += counter.slowCount.sum();
            totalCount += counter.totalCount.sum();
        }
        if (totalCount < minRequestAmount) {
            return;
        }
        double currentRatio = slowCount * 1.0d / totalCount;
        if (currentRatio > maxSlowRequestRatio) {
            transformToOpen(currentRatio);
        }
        if (Double.compare(currentRatio, maxSlowRequestRatio) == 0 &&
                Double.compare(maxSlowRequestRatio, SLOW_REQUEST_RATIO_MAX_VALUE) == 0) {
            transformToOpen(currentRatio);
        }
    }
}
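  • The breaker sums slowCount and totalCount, computes the slow-request ratio, and uses it to decide whether to open. When the threshold is 1.0, a ratio equal to the threshold also opens the breaker.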
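The extra equality check at the end of handleStateChangeWhenThresholdExceeded is easy to overlook. A ratio can never be greater than 1.0, so with a threshold of 1.0 the plain "greater than" comparison would never open the breaker. The sketch below isolates just that decision; shouldOpen is a hypothetical helper, and it assumes total is greater than zero (in the real code the minRequestAmount check runs first).

```java
class SlowRatioEdgeDemo {
    // Mirrors the two comparisons at the end of handleStateChangeWhenThresholdExceeded.
    // Assumes total > 0.
    static boolean shouldOpen(long slow, long total, double maxSlowRequestRatio) {
        double ratio = slow * 1.0d / total;
        if (ratio > maxSlowRequestRatio) {
            return true;
        }
        // Special case: a threshold of 1.0 is reached when every request is slow.
        return Double.compare(ratio, maxSlowRequestRatio) == 0
                && Double.compare(maxSlowRequestRatio, 1.0d) == 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldOpen(100, 100, 1.0)); // true: all requests slow, threshold 1.0
        System.out.println(shouldOpen(99, 100, 1.0));  // false: one fast request is enough
        System.out.println(shouldOpen(50, 100, 0.5));  // false: equal to 0.5 is not above 0.5
        System.out.println(shouldOpen(51, 100, 0.5));  // true
    }
}
```

For any threshold below 1.0 the comparison stays strict, so a ratio that merely equals the threshold does not open the breaker.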
