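Series
- Introduction to the Sentinel workflow
- How the Sentinel resource node tree is built
- Introduction to Sentinel's sliding window
- Sentinel flow control
- Introduction to Sentinel's slot responsibility chain
- Sentinel circuit breaking and degradation
- Communication between Sentinel Dashboard and applications
- Sentinel console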
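Introduction
Modern microservice architectures are distributed and consist of many services. Services call one another and form complex call chains. Instability in any one service is amplified along the chain: if a single link becomes unstable, the failure can cascade layer by layer until the whole chain is unavailable. We therefore need to apply circuit breaking and degradation to calls to unstable weak dependencies, temporarily cutting off the unstable calls so that a local problem does not turn into a system-wide avalanche. As a means of self-protection, circuit breaking is usually configured on the client (caller) side.
Sentinel's circuit breaking is post-hoc: whether the current request is rejected is decided from statistics collected on earlier requests. On every call, DegradeSlot records metrics in the exit phase, after the call has completed (slow requests and failed requests), and decides whether the breaker should open. If the breaker is open, the next request is rejected immediately.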
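Circuit-breaking strategies
Slow-request ratio (SLOW_REQUEST_RATIO): the threshold is the ratio of slow requests. You must set the allowed slow-request RT (the maximum response time); a request whose response time exceeds this value counts as slow. When the number of requests within the statistics interval (statIntervalMs) reaches the configured minimum request amount and the slow-request ratio exceeds the threshold, requests are rejected automatically for the following circuit-breaking duration. After that duration the breaker enters the probing state (HALF-OPEN): if the next request's response time does not exceed the slow-request RT, circuit breaking ends; if it does, the breaker opens again.
Exception ratio (ERROR_RATIO): when the number of requests within the statistics interval (statIntervalMs) reaches the configured minimum request amount and the exception ratio exceeds the threshold, requests are rejected automatically for the following circuit-breaking duration. After that duration the breaker enters the probing state (HALF-OPEN): if the next request completes without an error, circuit breaking ends; otherwise the breaker opens again. The exception-ratio threshold ranges over [0.0, 1.0], which stands for 0% to 100%.
Exception count (ERROR_COUNT): when the number of exceptions within the statistics interval exceeds the threshold, the breaker opens automatically. After the circuit-breaking duration the breaker enters the probing state (HALF-OPEN): if the next request completes without an error, circuit breaking ends; otherwise the breaker opens again.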
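The three strategies share one shape: ignore the window if it saw too little traffic, then compare a ratio or a count against the threshold. The sketch below illustrates that arithmetic. `TripDecision`, `Strategy`, and `shouldTrip` are hypothetical names invented for this example and are not Sentinel classes; the special handling of a slow-request threshold of exactly 1.0 is deliberately left out.

```java
/** Hypothetical helper (not a Sentinel class): decides whether one statistics window trips the breaker. */
final class TripDecision {
    enum Strategy { SLOW_REQUEST_RATIO, ERROR_RATIO, ERROR_COUNT }

    /**
     * @param totalCount       requests completed in the statistics interval
     * @param badCount         slow requests (SLOW_REQUEST_RATIO) or failed requests (other strategies)
     * @param minRequestAmount minimum traffic required before the breaker may trip
     * @param threshold        ratio in [0.0, 1.0] for the ratio strategies, a count for ERROR_COUNT
     */
    static boolean shouldTrip(Strategy strategy, long totalCount, long badCount,
                              int minRequestAmount, double threshold) {
        // Too little traffic in the window: never trip, whatever the ratio is.
        if (totalCount < minRequestAmount) {
            return false;
        }
        if (strategy == Strategy.ERROR_COUNT) {
            // Compare the raw number of failures against the threshold.
            return badCount > threshold;
        }
        // Both ratio strategies compare badCount / totalCount against the threshold.
        double ratio = badCount * 1.0d / totalCount;
        return ratio > threshold;
    }
}
```

For example, with a minimum request amount of 100 and a threshold of 0.5, a window of 120 requests of which 70 were slow gives a ratio of about 0.58 and trips the breaker, while a window of 80 requests never does.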
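Circuit breaker states
- A circuit breaker has three states: OPEN, HALF_OPEN, and CLOSED.
- OPEN: the breaker is open and every request is rejected.
- HALF_OPEN: the probing state. If the next request succeeds, circuit breaking ends; otherwise the breaker opens again.
- CLOSED: the breaker is closed and requests pass through normally.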
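The states and their transitions can be written down as a small transition function. This is only an illustrative sketch: `BreakerStates`, `Event`, and `next` are hypothetical names, and the event names are my own labels for the conditions described in the text.

```java
/** Hypothetical sketch of the three-state machine; not Sentinel code. */
final class BreakerStates {
    enum State { CLOSED, OPEN, HALF_OPEN }

    /** Labels invented for this example. */
    enum Event { THRESHOLD_EXCEEDED, RECOVERY_TIMEOUT_ELAPSED, PROBE_SUCCEEDED, PROBE_FAILED }

    /** Returns the state that follows {@code current} when {@code event} happens. */
    static State next(State current, Event event) {
        switch (current) {
            case CLOSED:
                // Statistics crossed the threshold: start rejecting requests.
                return event == Event.THRESHOLD_EXCEEDED ? State.OPEN : current;
            case OPEN:
                // After the circuit-breaking duration, one request is let through as a probe.
                return event == Event.RECOVERY_TIMEOUT_ELAPSED ? State.HALF_OPEN : current;
            case HALF_OPEN:
                if (event == Event.PROBE_SUCCEEDED) {
                    return State.CLOSED;
                }
                if (event == Event.PROBE_FAILED) {
                    return State.OPEN;
                }
                return current;
            default:
                throw new IllegalStateException("unknown state " + current);
        }
    }
}
```

Events that do not apply to the current state leave it unchanged, so a failed probe can only reopen a breaker that is actually in HALF_OPEN.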
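Degrade rules
- The figure above explains the parameters of a circuit-breaking rule.
- The figure above shows how a circuit-breaking rule is configured.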
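public abstract class AbstractRule implements Rule {
    // Name of the resource the rule applies to
    private String resource;
    // Origin (caller) the rule applies to; "default" does not distinguish origins
    private String limitApp;
}
public class DegradeRule extends AbstractRule {
    // Strategy: slow-request ratio (the default), exception ratio, or exception count
    private int grade = RuleConstant.DEGRADE_GRADE_RT;
    // Threshold: maximum RT in ms, exception ratio, or exception count, depending on grade
    private double count;
    // Circuit-breaking duration, in seconds
    private int timeWindow;
    // Minimum number of requests in a statistics interval before the breaker may open
    private int minRequestAmount = RuleConstant.DEGRADE_DEFAULT_MIN_REQUEST_AMOUNT;
    // Slow-request ratio threshold; only used by the slow-request strategy
    private double slowRatioThreshold = 1.0d;
    // Length of the statistics interval, in ms
    private int statIntervalMs = 1000;
}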
- The code above shows how the rule parameters are defined in the source.
Rule parameters for the slow-request strategy
[
{
"count": 3000,
"grade": 0,
"limitApp": "default",
"minRequestAmount": 100,
"resource": "degrade01",
"slowRatioThreshold": 0.5,
"statIntervalMs": 1000,
"timeWindow": 5
}
]
Rule parameters for the exception-ratio strategy
{
"count": 0.3,
"grade": 1,
"limitApp": "default",
"minRequestAmount": 200,
"resource": "degrade02",
"slowRatioThreshold": 1,
"statIntervalMs": 1000,
"timeWindow": 5
}
Rule parameters for the exception-count strategy
{
"count": 1000,
"grade": 2,
"limitApp": "default",
"minRequestAmount": 300,
"resource": "degrade03",
"slowRatioThreshold": 1,
"statIntervalMs": 1000,
"timeWindow": 5
}
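Since the meaning of `count` changes with `grade`, it is easy to write a rule whose values do not fit its strategy, for example an exception-ratio rule with a count of 3000. The sketch below runs a few sanity checks over the fields used in the JSON examples. `DegradeRuleCheck` and `isPlausible` are hypothetical names, and the checks themselves are assumptions derived from the parameter descriptions in this article; Sentinel's own validation may differ.

```java
/** Hypothetical sanity check for the rule fields shown in the JSON examples; not Sentinel code. */
final class DegradeRuleCheck {
    // Values of "grade" as used in the JSON examples.
    static final int GRADE_SLOW_RATIO = 0;
    static final int GRADE_ERROR_RATIO = 1;
    static final int GRADE_ERROR_COUNT = 2;

    static boolean isPlausible(String resource, int grade, double count, int timeWindow,
                               int minRequestAmount, double slowRatioThreshold, int statIntervalMs) {
        // Checks shared by every strategy (assumed, see the lead-in).
        boolean baseOk = resource != null && !resource.trim().isEmpty()
                && count >= 0 && timeWindow > 0 && minRequestAmount > 0 && statIntervalMs > 0;
        if (!baseOk) {
            return false;
        }
        switch (grade) {
            case GRADE_SLOW_RATIO:
                // count is the maximum RT in ms; the ratio threshold must lie in [0.0, 1.0].
                return slowRatioThreshold >= 0.0d && slowRatioThreshold <= 1.0d;
            case GRADE_ERROR_RATIO:
                // count is itself a ratio in [0.0, 1.0].
                return count <= 1.0d;
            case GRADE_ERROR_COUNT:
                // count is an absolute number of exceptions.
                return true;
            default:
                return false;
        }
    }
}
```

All three example rules above pass these checks, for instance `DegradeRuleCheck.isPlausible("degrade02", 1, 0.3, 5, 200, 1, 1000)`.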
How DegradeSlot performs circuit breaking
Building the circuit breakers from rules
public final class DegradeRuleManager {
    private static CircuitBreaker newCircuitBreakerFrom(/*@Valid*/ DegradeRule rule) {
        switch (rule.getGrade()) {
            // The slow-request strategy is backed by ResponseTimeCircuitBreaker
            case RuleConstant.DEGRADE_GRADE_RT:
                return new ResponseTimeCircuitBreaker(rule);
            // Exception ratio and exception count are both backed by ExceptionCircuitBreaker
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO:
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT:
                return new ExceptionCircuitBreaker(rule);
            default:
                return null;
        }
    }
}
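- For the slow-request strategy, a ResponseTimeCircuitBreaker is created from the DegradeRule.
- For the exception-ratio and exception-count strategies, an ExceptionCircuitBreaker is created from the DegradeRule.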
Executing the check
public class DegradeSlot extends AbstractLinkedProcessorSlot<DefaultNode> {
    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        // Check whether the request must be rejected before passing it down the chain
        performChecking(context, resourceWrapper);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
    void performChecking(Context context, ResourceWrapper r) throws BlockException {
        // Fetch every circuit breaker registered for this resource
        List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
        if (circuitBreakers == null || circuitBreakers.isEmpty()) {
            return;
        }
        // Reject the request as soon as any breaker refuses it
        for (CircuitBreaker cb : circuitBreakers) {
            if (!cb.tryPass(context)) {
                throw new DegradeException(cb.getRule().getLimitApp(), cb.getRule());
            }
        }
    }
    @Override
    public void exit(Context context, ResourceWrapper r, int count, Object... args) {
        Entry curEntry = context.getCurEntry();
        // A blocked request never reached the resource, so it is not counted
        if (curEntry.getBlockError() != null) {
            fireExit(context, r, count, args);
            return;
        }
        List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
        if (circuitBreakers == null || circuitBreakers.isEmpty()) {
            fireExit(context, r, count, args);
            return;
        }
        if (curEntry.getBlockError() == null) {
            // The request passed: report its completion to every breaker
            for (CircuitBreaker circuitBreaker : circuitBreakers) {
                circuitBreaker.onRequestComplete(context);
            }
        }
        fireExit(context, r, count, args);
    }
}
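- In entry, every CircuitBreaker registered for the resource is asked whether the request may pass.
- In exit, onRequestComplete records the statistics of the completed request.
- There are two circuit breaker implementations: ResponseTimeCircuitBreaker and ExceptionCircuitBreaker.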
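The entry/exit split is what makes the breaker post-hoc: the request that pushes the statistics over the threshold is itself allowed through, and only the following request is rejected. The sketch below reproduces that flow outside Sentinel. `PostHocGuard`, its `Breaker` interface, `RejectedException`, and the toy `CountingBreaker` are all hypothetical and exist only for this illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of the check-on-entry, record-on-exit flow; not Sentinel code. */
final class PostHocGuard {
    /** Minimal stand-in for a circuit breaker. */
    interface Breaker {
        boolean tryPass();
        void onRequestComplete(Throwable error);
    }

    static final class RejectedException extends RuntimeException {
        RejectedException() {
            super("rejected by circuit breaker");
        }
    }

    static <T> T call(Breaker breaker, Callable<T> task) throws Exception {
        // Entry phase: only the state computed by earlier requests is consulted.
        if (!breaker.tryPass()) {
            throw new RejectedException();
        }
        Throwable error = null;
        try {
            return task.call();
        } catch (Exception e) {
            error = e;
            throw e;
        } finally {
            // Exit phase: record the outcome; rejected requests never get here.
            breaker.onRequestComplete(error);
        }
    }

    /** Toy breaker: opens once more than maxErrors failures were recorded and never recovers. */
    static final class CountingBreaker implements Breaker {
        private final int maxErrors;
        private int errors;
        private boolean open;

        CountingBreaker(int maxErrors) {
            this.maxErrors = maxErrors;
        }

        @Override
        public boolean tryPass() {
            return !open;
        }

        @Override
        public void onRequestComplete(Throwable error) {
            if (error != null && ++errors > maxErrors) {
                open = true;
            }
        }
    }
}
```

With `new PostHocGuard.CountingBreaker(1)`, the first two failing calls still reach the task and propagate its exception; the second one opens the breaker on exit, so the third call is the first to be rejected.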
public abstract class AbstractCircuitBreaker implements CircuitBreaker {
    protected final DegradeRule rule;
    protected final int recoveryTimeoutMs;
    private final EventObserverRegistry observerRegistry;
    protected final AtomicReference<State> currentState = new AtomicReference<>(State.CLOSED);
    protected volatile long nextRetryTimestamp;
    AbstractCircuitBreaker(DegradeRule rule, EventObserverRegistry observerRegistry) {
        this.observerRegistry = observerRegistry;
        this.rule = rule;
        // timeWindow is configured in seconds
        this.recoveryTimeoutMs = rule.getTimeWindow() * 1000;
    }
    public boolean tryPass(Context context) {
        // CLOSED: the breaker is closed, the request passes
        if (currentState.get() == State.CLOSED) {
            return true;
        }
        // OPEN: reject, unless the retry time has arrived and this request wins the probe
        if (currentState.get() == State.OPEN) {
            return retryTimeoutArrived() && fromOpenToHalfOpen(context);
        }
        // HALF_OPEN: a probe is already in flight, reject everything else
        return false;
    }
    protected boolean retryTimeoutArrived() {
        // True once the time for the next probe has been reached
        return TimeUtil.currentTimeMillis() >= nextRetryTimestamp;
    }
    protected boolean fromOpenToHalfOpen(Context context) {
        if (currentState.compareAndSet(State.OPEN, State.HALF_OPEN)) {
            notifyObservers(State.OPEN, State.HALF_OPEN, null);
            Entry entry = context.getCurEntry();
            entry.whenTerminate(new BiConsumer<Context, Entry>() {
                @Override
                public void accept(Context context, Entry entry) {
                    if (entry.getBlockError() != null) {
                        // The probe request was blocked further down the chain: fall back to OPEN
                        currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
                        notifyObservers(State.HALF_OPEN, State.OPEN, 1.0d);
                    }
                }
            });
            return true;
        }
        return false;
    }
}
- tryPass only inspects the current breaker state; that state was computed from the statistics recorded when earlier requests completed.
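The compareAndSet in fromOpenToHalfOpen is what guarantees that only one request becomes the probe when many threads arrive after the retry time. The sketch below demonstrates the idea in isolation. `SingleProbeDemo` and `countAdmitted` are hypothetical names, and the clock is passed in as a parameter (an assumption made so the example is deterministic) instead of being read from a time utility.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: a CAS on the state admits exactly one probe request. */
final class SingleProbeDemo {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);
    private final long nextRetryTimestamp;

    SingleProbeDemo(long nextRetryTimestamp) {
        this.nextRetryTimestamp = nextRetryTimestamp;
    }

    boolean tryPass(long nowMs) {
        State current = state.get();
        if (current == State.CLOSED) {
            return true;
        }
        if (current == State.OPEN) {
            // Only the thread whose CAS succeeds moves OPEN -> HALF_OPEN and may pass.
            return nowMs >= nextRetryTimestamp && state.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        // HALF_OPEN: another thread already owns the probe.
        return false;
    }

    /** Starts the given number of threads against one open breaker and counts admitted requests. */
    static int countAdmitted(int threads) {
        SingleProbeDemo breaker = new SingleProbeDemo(0L);
        AtomicInteger admitted = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                    if (breaker.tryPass(1L)) {
                        admitted.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return admitted.get();
    }
}
```

However many threads race, `SingleProbeDemo.countAdmitted(n)` admits a single request: every other thread either loses the CAS or already observes HALF_OPEN.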
ExceptionCircuitBreaker
public class ExceptionCircuitBreaker extends AbstractCircuitBreaker {
    private final int strategy;
    private final int minRequestAmount;
    private final double threshold;
    private final LeapArray<SimpleErrorCounter> stat;
    @Override
    public void onRequestComplete(Context context) {
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        Throwable error = entry.getError();
        // Record the outcome in the current statistics window
        SimpleErrorCounter counter = stat.currentWindow().value();
        if (error != null) {
            counter.getErrorCount().add(1);
        }
        counter.getTotalCount().add(1);
        handleStateChangeWhenThresholdExceeded(error);
    }
    private void handleStateChangeWhenThresholdExceeded(Throwable error) {
        if (currentState.get() == State.OPEN) {
            return;
        }
        if (currentState.get() == State.HALF_OPEN) {
            // This was the probe request: close on success, reopen on failure
            if (error == null) {
                fromHalfOpenToClose();
            } else {
                fromHalfOpenToOpen(1.0d);
            }
            return;
        }
        // CLOSED: sum the counters over the statistics interval
        List<SimpleErrorCounter> counters = stat.values();
        long errCount = 0;
        long totalCount = 0;
        for (SimpleErrorCounter counter : counters) {
            errCount += counter.errorCount.sum();
            totalCount += counter.totalCount.sum();
        }
        if (totalCount < minRequestAmount) {
            return;
        }
        double curCount = errCount;
        if (strategy == DEGRADE_GRADE_EXCEPTION_RATIO) {
            // Use the error ratio instead of the raw count
            curCount = errCount * 1.0d / totalCount;
        }
        if (curCount > threshold) {
            transformToOpen(curCount);
        }
    }
}
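- errCount and totalCount are summed over the statistics interval; the exception count, or the exception ratio derived from them, is compared with the threshold to decide whether the breaker state changes.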
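The counters live in a time window, so old failures stop mattering once the statistics interval has passed. The sketch below is a much simplified stand-in for that window, assuming a single bucket that spans the whole interval and resets when time moves into the next one. `ErrorWindowSketch` is a hypothetical class; it is not thread-safe and is not how LeapArray is implemented.

```java
/** Hypothetical single-bucket window of error statistics; a simplification, not LeapArray. */
final class ErrorWindowSketch {
    private final long intervalMs;
    private long windowStart = -1;
    private long errorCount;
    private long totalCount;

    ErrorWindowSketch(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        this.intervalMs = intervalMs;
    }

    /** Records one completed request at the given timestamp (injected for testability). */
    void record(long nowMs, boolean failed) {
        roll(nowMs);
        if (failed) {
            errorCount++;
        }
        totalCount++;
    }

    long total(long nowMs) {
        roll(nowMs);
        return totalCount;
    }

    double errorRatio(long nowMs) {
        roll(nowMs);
        return totalCount == 0 ? 0.0d : errorCount * 1.0d / totalCount;
    }

    /** Resets the counters when the timestamp falls into a different interval. */
    private void roll(long nowMs) {
        long start = nowMs - nowMs % intervalMs;
        if (start != windowStart) {
            windowStart = start;
            errorCount = 0;
            totalCount = 0;
        }
    }
}
```

With a 1000 ms interval, two failures and one success recorded between 0 and 999 ms give a ratio of two thirds; the first request recorded at 1000 ms starts a fresh window.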
ResponseTimeCircuitBreaker
public class ResponseTimeCircuitBreaker extends AbstractCircuitBreaker {
    private static final double SLOW_REQUEST_RATIO_MAX_VALUE = 1.0d;
    private final long maxAllowedRt;
    private final double maxSlowRequestRatio;
    private final int minRequestAmount;
    private final LeapArray<SlowRequestCounter> slidingCounter;
    @Override
    public void onRequestComplete(Context context) {
        SlowRequestCounter counter = slidingCounter.currentWindow().value();
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        // Fall back to the current time if the entry has no completion timestamp yet
        long completeTime = entry.getCompleteTimestamp();
        if (completeTime <= 0) {
            completeTime = TimeUtil.currentTimeMillis();
        }
        long rt = completeTime - entry.getCreateTimestamp();
        if (rt > maxAllowedRt) {
            counter.slowCount.add(1);
        }
        counter.totalCount.add(1);
        handleStateChangeWhenThresholdExceeded(rt);
    }
    private void handleStateChangeWhenThresholdExceeded(long rt) {
        if (currentState.get() == State.OPEN) {
            return;
        }
        if (currentState.get() == State.HALF_OPEN) {
            // This was the probe request: reopen if it was slow, close otherwise
            // TODO: improve logic for half-open recovery
            if (rt > maxAllowedRt) {
                fromHalfOpenToOpen(1.0d);
            } else {
                fromHalfOpenToClose();
            }
            return;
        }
        // CLOSED: sum the counters over the statistics interval
        List<SlowRequestCounter> counters = slidingCounter.values();
        long slowCount = 0;
        long totalCount = 0;
        for (SlowRequestCounter counter : counters) {
            slowCount += counter.slowCount.sum();
            totalCount += counter.totalCount.sum();
        }
        if (totalCount < minRequestAmount) {
            return;
        }
        double currentRatio = slowCount * 1.0d / totalCount;
        if (currentRatio > maxSlowRequestRatio) {
            transformToOpen(currentRatio);
        }
        // A ratio cannot exceed 1.0, so a threshold of 1.0 trips when every request is slow
        if (Double.compare(currentRatio, maxSlowRequestRatio) == 0 &&
                Double.compare(maxSlowRequestRatio, SLOW_REQUEST_RATIO_MAX_VALUE) == 0) {
            transformToOpen(currentRatio);
        }
    }
}
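- slowCount and totalCount are summed over the statistics interval, and the resulting slow-request ratio is compared with the threshold to decide whether the breaker opens.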
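The second comparison in handleStateChangeWhenThresholdExceeded covers an edge case: a ratio can never be greater than 1.0, so with a threshold of 1.0 a plain "greater than" test would never open the breaker, even if every request were slow. The sketch below isolates that comparison. `SlowRatioCheck` and `exceeds` are hypothetical names introduced for the example.

```java
/** Hypothetical helper isolating the slow-ratio comparison, including the 1.0 edge case. */
final class SlowRatioCheck {
    static boolean exceeds(long slowCount, long totalCount, double maxSlowRequestRatio) {
        if (totalCount <= 0) {
            // No traffic, nothing to compare.
            return false;
        }
        double currentRatio = slowCount * 1.0d / totalCount;
        if (currentRatio > maxSlowRequestRatio) {
            return true;
        }
        // With a threshold of exactly 1.0, equality is the only way to trip.
        return Double.compare(currentRatio, maxSlowRequestRatio) == 0
                && Double.compare(maxSlowRequestRatio, 1.0d) == 0;
    }
}
```

For thresholds below 1.0 the comparison stays strict: five slow requests out of ten do not exceed a threshold of 0.5, while ten slow requests out of ten do trip a threshold of 1.0.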