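1. Introduction

Spring-Retry provides declarative retry support for Spring applications.
Related: Spring Retry (annotation-based version)

Advantages:

Non-invasive: it is built on Spring AOP and can be used either declaratively (annotations) or programmatically.
Configurable: the number of attempts, the interval between them, circuit breaking and so on can all be tuned.
General-purpose: it covers the vast majority of retry scenarios.

Drawbacks:

Retries are driven by exceptions: spring-retry catches the Throwable (or subclass) thrown by the business method and decides what to do with it. If the decision to retry depends on a returned data object rather than on a failure, that result must first be converted into a Throwable subclass and thrown.
With declarative retry, the @Recover annotation cannot name the method it belongs to, so a class containing several retryable methods needs care to keep their recovery methods apart.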

2. Configuring policies

Example:

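import java.util.Collections;
import java.util.Random;

import com.alibaba.fastjson.JSON; // assumed source of JSON.toJSONString
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.springframework.retry.RecoveryCallback;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

@Slf4j
public class SpringRetry1 {
    // Business method
    public static Boolean vpmsRetryCoupon(User user) {
        // Build the retry template
        RetryTemplate retryTemplate = new RetryTemplate();
        // Retry policy: mainly the number of attempts
        SimpleRetryPolicy policy = new SimpleRetryPolicy(2, Collections.<Class<? extends Throwable>, Boolean>singletonMap(Exception.class, true));
        // Back-off policy: mainly the interval between attempts
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(100);
        // Populate the template
        retryTemplate.setRetryPolicy(policy);
        retryTemplate.setBackOffPolicy(fixedBackOffPolicy);

        // Wrap the business logic in a RetryCallback; the first call and every retry run this same code
        final RetryCallback<Boolean, RuntimeException> retryCallback = new RetryCallback<Boolean, RuntimeException>() {
            // RetryContext carries the state of the current retry operation
            @Override
            public Boolean doWithRetry(RetryContext context) {
                boolean result = pushCouponByVpmsaa(user);
                log.info(JSON.toJSONString(user));
                if (!result) {
                    // Important: a retry is triggered only by throwing an exception
                    throw new RuntimeException();
                }
                return true;
            }
        };
        // RecoveryCallback: the fallback that runs once the retry limit has been reached
        final RecoveryCallback<Boolean> recoveryCallback = new RecoveryCallback<Boolean>() {
            @Override
            public Boolean recover(RetryContext context) throws Exception {
                System.out.println("Retries exhausted...");
                return false;
            }
        };

        Boolean execute = false;
        try {
            // retryTemplate.execute starts the whole flow
            execute = retryTemplate.execute(retryCallback, recoveryCallback);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return execute;
    }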

    // Simulated downstream call: succeeds only when the random number is 8
    public static Boolean pushCouponByVpmsaa(User user) {
        Random random = new Random();
        int a = random.nextInt(10);
        System.out.println("a: " + a);
        // The argument is mutated on every attempt
        user.setName(user.getName() + a);
        return a == 8;
    }

    @Data
    public static class User {
        private String name;
    }
}

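The code above implements a programmatic retry. The building blocks to note are:

RetryTemplate: the concrete implementation of RetryOperations; it combines RetryListener[], BackOffPolicy and RetryPolicy.
RetryPolicy: the retry strategy or condition. The code above retries on Exception and its subclasses, with at most 2 attempts in total (the first call included).
BackOffPolicy: the back-off strategy, which defines how long to wait before the next attempt.
RetryCallback: wraps the business logic that should be retried.
RecoveryCallback: the fallback executed after all attempts have failed.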

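2.1 RetryPolicy strategies

NeverRetryPolicy: the RetryCallback is invoked exactly once; no retry is allowed.
AlwaysRetryPolicy: retries indefinitely until the call succeeds; careless use leads to an infinite loop.
SimpleRetryPolicy: retries a fixed number of times, 3 attempts by default; this is the policy RetryTemplate uses by default.
TimeoutRetryPolicy: retries are allowed until a timeout expires; the default timeout is 1 second.
CircuitBreakerRetryPolicy: a retry policy with circuit breaking, configured through three parameters: openTimeout, resetTimeout and delegate.
- delegate: the policy that actually decides whether a retry is allowed, typically a count-based SimpleRetryPolicy or a time-based TimeoutRetryPolicy. The delegate works in global mode rather than per call, so its attempt count or timeout must be chosen with that in mind. Once the global failure count reaches the configured limit the circuit opens, and subsequent calls go straight to the fallback.
- openTimeout: the window within which failures open the circuit. If the failures exhaust the delegate's attempts in less than openTimeout, the circuit opens.
- resetTimeout: the reset time of the breaker, i.e. how long it takes before the circuit is closed again.
CompositeRetryPolicy: combines several policies in one of two modes. In optimistic mode a retry is allowed as soon as one policy allows it; in pessimistic mode a retry is refused as soon as one policy refuses it. In either mode every policy in the composite is evaluated.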
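
To make the two composition modes concrete, here is a minimal model that does not depend on the library. It is not Spring Retry's CompositeRetryPolicy: the names CompositeSketch and canRetry are invented for this illustration, and each "policy" is reduced to a predicate over the number of attempts made so far.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Simplified model of optimistic vs. pessimistic composition (not the Spring Retry API).
final class CompositeSketch {

    // Each hypothetical policy answers: "may we retry after `attempts` attempts?"
    static boolean canRetry(List<IntPredicate> policies, boolean optimistic, int attempts) {
        boolean any = false;
        boolean all = true;
        // Every policy is consulted, whatever the mode.
        for (IntPredicate policy : policies) {
            boolean allowed = policy.test(attempts);
            any |= allowed;
            all &= allowed;
        }
        return optimistic ? any : all;
    }

    public static void main(String[] args) {
        IntPredicate atMostThree = attempts -> attempts < 3;
        IntPredicate atMostFive = attempts -> attempts < 5;
        List<IntPredicate> policies = Arrays.asList(atMostThree, atMostFive);
        System.out.println(canRetry(policies, true, 4));  // optimistic: true
        System.out.println(canRetry(policies, false, 4)); // pessimistic: false
    }
}
```

After four attempts only the second predicate still allows a retry, so the optimistic combination continues while the pessimistic one stops.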

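2.2 BackOffPolicy strategies

NoBackOffPolicy: no back-off; the retry happens immediately.
FixedBackOffPolicy: waits a fixed time. Parameters: sleeper (how to wait; the default is Thread.sleep) and backOffPeriod (the wait time, 1 second by default).
UniformRandomBackOffPolicy: waits a random time. Parameters: sleeper, minBackOffPeriod and maxBackOffPeriod. The wait is drawn from [minBackOffPeriod, maxBackOffPeriod]; the defaults are 500 ms and 1500 ms.
ExponentialBackOffPolicy: exponential back-off. Parameters: sleeper, initialInterval, maxInterval and multiplier. initialInterval is the first wait (100 ms by default), maxInterval caps the wait (30 seconds by default), and multiplier scales it: next wait = current wait * multiplier.
ExponentialRandomBackOffPolicy: exponential back-off with a random multiplier. A fixed multiplier can make many clients retry at the same moment, which amounts to a DDoS on the service; randomising the wait avoids this.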
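
The exponential rule (next wait = current wait * multiplier, capped at maxInterval) is easy to check by hand. The sketch below only reproduces that arithmetic; it is not the library's ExponentialBackOffPolicy, the class and method names are made up, and the multiplier of 2.0 is an assumed value chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the exponential back-off arithmetic (not the Spring Retry API).
final class ExponentialBackOffSketch {

    // Returns the first `count` wait times in milliseconds.
    static List<Long> intervals(long initialInterval, double multiplier, long maxInterval, int count) {
        List<Long> result = new ArrayList<>();
        long current = initialInterval;
        for (int i = 0; i < count; i++) {
            // Wait for the current interval, never longer than maxInterval.
            result.add(Math.min(current, maxInterval));
            // The next interval is the current one times the multiplier, again capped.
            current = (long) Math.min(current * multiplier, (double) maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // 100 ms initial interval and 30 s cap as in the text; multiplier 2.0 is assumed.
        System.out.println(intervals(100L, 2.0, 30_000L, 11));
        // [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 30000, 30000]
    }
}
```

With these values the cap is reached on the tenth wait, after which every wait stays at 30 seconds.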

3. Source code analysis

This is the code that actually runs when a call is retried.

public class RetryTemplate implements RetryOperations {
    protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,
            RecoveryCallback<T> recoveryCallback, RetryState state)
            throws E, ExhaustedRetryException {

        RetryPolicy retryPolicy = this.retryPolicy;
        BackOffPolicy backOffPolicy = this.backOffPolicy;

        // Initialise the context from the retry policy (it holds the retry count and similar data)
        RetryContext context = open(retryPolicy, state);
        // Store the context in a ThreadLocal
        RetrySynchronizationManager.register(context);

        Throwable lastException = null;

        boolean exhausted = false;
        try {

            // Interceptors: invoke RetryListener#open (retry listeners)
            boolean running = doOpenInterceptors(retryCallback, context);
            // The listeners decide whether execution may continue
            if (!running) {
                throw new TerminatedRetryException(
                        "Retry terminated abnormally by interceptor before first attempt");
            }

            // Get or Start the backoff context...
            BackOffContext backOffContext = null;
            Object resource = context.getAttribute("backOffContext");

            if (resource instanceof BackOffContext) {
                backOffContext = (BackOffContext) resource;
            }
            // Otherwise let the back-off policy create one via start()
            if (backOffContext == null) {
                backOffContext = backOffPolicy.start(context);
                if (backOffContext != null) {
                    context.setAttribute("backOffContext", backOffContext);
                }
            }
            // Strategy pattern: ask the policy whether another attempt is allowed (note that this is a loop)
            while (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {

                try {
                    if (this.logger.isDebugEnabled()) {
                        this.logger.debug("Retry: count=" + context.getRetryCount());
                    }
                    // Business logic; it must throw an exception for a retry to happen
                    lastException = null;
                    return retryCallback.doWithRetry(context);
                }
                catch (Throwable e) {    // catch the failure
                    lastException = e;
                    try {
                        // Record the failure so the policy can count it
                        registerThrowable(retryPolicy, state, context, e);
                    }
                    catch (Exception ex) {
                        throw new TerminatedRetryException("Could not register throwable",
                                ex);
                    }
                    finally {
                        // Invoke RetryListener#onError
                        doOnErrorInterceptors(retryCallback, context, e);
                    }
                    // If a retry is still allowed, back off first, e.g. sleep briefly before retrying
                    if (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
                        try {
                            backOffPolicy.backOff(backOffContext);
                        }
                        catch (BackOffInterruptedException ex) {
                            // The back-off was interrupted: abort the retry
                            throw ex;
                        }
                    }
                    // Stateful retry: if the exception calls for a rollback, rethrow it immediately
                    if (shouldRethrow(retryPolicy, context, state)) {
                        throw RetryTemplate.<E>wrapIfNecessary(e);
                    }

                }

                // Stateful retry with the GLOBAL_STATE attribute: leave the loop at once.
                // This point is reached only when the exception does not call for a rollback;
                // CircuitBreakerRetryPolicy exits the loop here.
                if (state != null && context.hasAttribute(GLOBAL_STATE)) {
                    break;
                }
            }
            // Retries exhausted: run the RecoveryCallback if there is one, otherwise throw
            exhausted = true;
            return handleRetryExhausted(recoveryCallback, context, state);

        }
        catch (Throwable e) {
            throw RetryTemplate.<E>wrapIfNecessary(e);
        }
        finally {
            close(retryPolicy, context, state, lastException == null || exhausted);
            // Invoke RetryListener#close, e.g. to collect retry statistics
            doCloseInterceptors(retryCallback, context, lastException);
            RetrySynchronizationManager.clear();
        }

    }
}

[Figure: flow of the retry source code]

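4. Stateful and stateless retry

As the source shows, Spring-Retry is essentially a while loop around the business logic, governed by the configured RetryPolicy and BackOffPolicy. Stateful or not, if one attempt modifies a request object that was passed by reference, every later attempt sees the modified object.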
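
The loop and the shared-reference effect can be reproduced without the library. The sketch below is a deliberately reduced model, not the RetryTemplate implementation: RetryLoopSketch and its execute method are hypothetical names, there is no back-off, and any Exception counts as retryable.

```java
import java.util.concurrent.Callable;

// Library-free sketch of a stateless retry loop (not the Spring Retry implementation).
final class RetryLoopSketch {

    // Calls `task` up to maxAttempts times; returns `fallback` once the attempts are used up.
    static <T> T execute(Callable<T> task, int maxAttempts, T fallback) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            try {
                return task.call();
            } catch (Exception e) {
                attempts++; // an exception is the only signal that triggers another attempt
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The same mutable object is seen by every attempt.
        StringBuilder name = new StringBuilder("user");
        Boolean ok = execute(() -> {
            name.append('x'); // the mutation survives into later attempts
            throw new IllegalStateException("push failed");
        }, 3, Boolean.FALSE);
        System.out.println(ok + " " + name); // false userxxx
    }
}
```

Each of the three attempts appends to the same StringBuilder, which is why a retried method should avoid mutating its arguments or should work on a copy.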

So what is the difference between stateful and stateless retry?

A stateful retry calls the method below with an additional RetryState argument.

    protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,
            RecoveryCallback<T> recoveryCallback, RetryState state)


/**
 * Stateful retry is characterised by having to recognise the items that are
 * being processed, so this interface is used primarily to provide a cache key in
 * between failed attempts. It also provides a hints to the
 * {@link RetryOperations} for optimisations to do with avoidable cache hits and
 * switching to stateless retry if a rollback is not needed.
 * 
 * @author Dave Syer
 *
 */
public interface RetryState {

    /**
     * Key representing the state for a retry attempt. Stateful retry is
     * characterised by having to recognise the items that are being processed,
     * so this value is used as a cache key in between failed attempts.
     * In short: the key is used to look up the cached context object.
     * @return the key that this state represents
     */
    Object getKey();

    /**
     * Indicate whether a cache lookup can be avoided. If the key is known ahead
     * of the retry attempt to be fresh (i.e. has never been seen before) then a
     * cache lookup can be avoided if this flag is true.
     * In short: decides whether the context is taken from the cache or a new one is always created.
     * @return true if the state does not require an explicit check for the key
     */
    boolean isForceRefresh();

    /**
     * Check whether this exception requires a rollback. The default is always
     * true, which is conservative, so this method provides an optimisation for
     * switching to stateless retry if there is an exception for which rollback
     * is unnecessary. Example usage would be for a stateful retry to specify a
     * validation exception as not for rollback.
     * In short: an exception matched here stops the retry loop and is rethrown immediately.
     * @param exception the exception that caused a retry attempt to fail
     * @return true if this exception should cause a rollback
     */
    boolean rollbackFor(Throwable exception);

}

In the source in section 3, org.springframework.retry.support.RetryTemplate#open obtains the retry context object.

    protected RetryContext open(RetryPolicy retryPolicy, RetryState state) {
        // No state: this is a stateless retry
        if (state == null) {
            // Build a fresh context from the retryPolicy
            return doOpenInternal(retryPolicy);
        }
        // isForceRefresh: always start from a new context
        Object key = state.getKey();
        if (state.isForceRefresh()) {
            // Build a context from the retryPolicy and associate it with the state for caching
            return doOpenInternal(retryPolicy, state);
        }

        // The cache has no entry for this key: create the context
        if (!this.retryContextCache.containsKey(key)) {
            // Build a context from the retryPolicy and associate it with the state for caching
            return doOpenInternal(retryPolicy, state);
        }
        // Fetch the cached context by key
        RetryContext context = this.retryContextCache.get(key);
        if (context == null) {
            if (this.retryContextCache.containsKey(key)) {
                throw new RetryException(
                        "Inconsistent state for failed item: no history found. "
                                + "Consider whether equals() or hashCode() for the item might be inconsistent, "
                                + "or if you need to supply a better ItemKeyGenerator");
            }
            // The cache could have been expired in between calls to
            // containsKey(), so we have to live with this:
            return doOpenInternal(retryPolicy, state);
        }

        // Clear the status attributes left on the cached context.
        context.removeAttribute(RetryContext.CLOSED);
        context.removeAttribute(RetryContext.EXHAUSTED);
        context.removeAttribute(RetryContext.RECOVERED);
        return context;

    }

In other words, stateful versus stateless comes down to whether the context object is fetched from the cache.
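
The lookup can be condensed into a small model. This is a sketch under simplifying assumptions, not the library code: ContextCacheSketch and Context are hypothetical stand-ins, a null key represents a stateless call, and new contexts are cached eagerly for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of stateless vs. stateful context lookup (not the Spring Retry classes).
final class ContextCacheSketch {

    // Hypothetical stand-in for RetryContext: it only tracks the failure count.
    static final class Context {
        int failures;
    }

    private final Map<Object, Context> cache = new HashMap<>();

    // key == null models a stateless call; forceRefresh mirrors RetryState#isForceRefresh.
    Context open(Object key, boolean forceRefresh) {
        if (key == null) {
            return new Context();          // stateless: fresh context, never cached
        }
        if (forceRefresh || !cache.containsKey(key)) {
            Context fresh = new Context(); // stateful, but starting from a clean context
            cache.put(key, fresh);
            return fresh;
        }
        return cache.get(key);             // stateful: reuse the history of earlier calls
    }

    public static void main(String[] args) {
        ContextCacheSketch template = new ContextCacheSketch();
        template.open("circuit", false).failures++;
        template.open("circuit", false).failures++;
        System.out.println(template.open("circuit", false).failures); // 2
        System.out.println(template.open(null, false).failures);      // 0
    }
}
```

Two failed calls under the same key accumulate in one context, whereas a stateless call always starts from zero. That shared history is what the circuit breaker policy relies on.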

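5. Use cases for stateful and stateless retry

5.1 Stateless retry

Stateless retry (most scenarios): the context lives within a single thread, and one invocation runs through the complete retry policy, i.e. every invocation gets the full configured retry budget.
See the example in section 2, Configuring policies.

5.2 Stateful retry

Scenario 1: declaring the exceptions that require a rollback when retrying database operations

This scenario relies on the rollbackClassifier of RetryState.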

    public static Boolean vpmsRetryCoupon() {
        // Build the retry template
        RetryTemplate retryTemplate = new RetryTemplate();
        // Retry policy: mainly the number of attempts
        SimpleRetryPolicy policy = new SimpleRetryPolicy(3, Collections.<Class<? extends Throwable>, Boolean>singletonMap(Exception.class, true));
        // Back-off policy: mainly the interval between attempts
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(100);
        // Populate the template
        retryTemplate.setRetryPolicy(policy);
        retryTemplate.setBackOffPolicy(fixedBackOffPolicy);

        // Name of this state; once the context is cached it is looked up by this key
        Object key = "mykey";
        // Whether to create a new context on every call or to look it up in the cache,
        // i.e. global mode (the circuit breaker policy, for instance, reads it from the cache)
        boolean isForceRefresh = true;  // true: create a new context every time
        // Roll back on DataAccessException
        BinaryExceptionClassifier rollbackClassifier =
                new BinaryExceptionClassifier(Collections.<Class<? extends Throwable>>singleton(DataAccessException.class));
        RetryState state = new DefaultRetryState(key, isForceRefresh, rollbackClassifier);

        // Wrap the business logic in a RetryCallback; the first call and every retry run this same code
        final RetryCallback<Boolean, RuntimeException> retryCallback = new RetryCallback<Boolean, RuntimeException>() {
            // RetryContext carries the state of the current retry operation
            @Override
            public Boolean doWithRetry(RetryContext context) {
                // SQLWarningException is a subclass of DataAccessException.
                // Important: a retry is triggered only by throwing an exception
                throw new SQLWarningException("msg", null);
            }
        };
        // RecoveryCallback: the fallback that runs once the retry limit has been reached
        final RecoveryCallback<Boolean> recoveryCallback = new RecoveryCallback<Boolean>() {
            @Override
            public Boolean recover(RetryContext context) throws Exception {
                System.out.println("Retries exhausted...");
                return false;
            }
        };


        Boolean execute = false;
        try {
            // retryTemplate.execute starts the whole flow
            execute = retryTemplate.execute(retryCallback, recoveryCallback, state);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return execute;
    }

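When the exception is thrown, no retry takes place; the exception is propagated straight to the caller (that is the rollback).

Scenario 2: circuit breaker policy

Note: the SimpleRetryPolicy wrapped by CircuitBreakerRetryPolicy is a global configuration, so its failure count is shared across calls. A failed request is not retried; it only adds 1 to the SimpleRetryPolicy count. Once requests have failed, within openTimeout, as many times in a row as the SimpleRetryPolicy allows, the circuit opens.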

    @SneakyThrows
    public static void cirOpen() {

        RetryTemplate template = new RetryTemplate();
        // Once the shared failure count reaches the delegate's limit (10 here),
        // later requests go straight to the fallback.
        CircuitBreakerRetryPolicy retryPolicy =
                new CircuitBreakerRetryPolicy(new SimpleRetryPolicy(10));
        retryPolicy.setOpenTimeout(5000);
        retryPolicy.setResetTimeout(10000);

        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(0);

        template.setRetryPolicy(retryPolicy);
        template.setBackOffPolicy(fixedBackOffPolicy);
        Object key = "circuit";
        // Roll back on DataAccessException
        BinaryExceptionClassifier rollbackClassifier =
                new BinaryExceptionClassifier(Collections.<Class<? extends Throwable>>singleton(DataAccessException.class));
        // false: the context is shared globally through the cache.
        RetryState state = new DefaultRetryState(key, false, rollbackClassifier);

        for (int i = 0; i < 10; i++) {
            try {
                User user = new User();
                // User.name is a String, so the loop index is converted
                user.setName(String.valueOf(i));
                String result = template.execute(new RetryCallback<String, RuntimeException>() {
                    @SneakyThrows
                    @Override
                    public String doWithRetry(RetryContext context) throws RuntimeException {
                        Thread.sleep(200);
                        String name = user.getName();
                        throw new RuntimeException("timeout:" + name);
                    }
                }, new RecoveryCallback<String>() {
                    @Override
                    public String recover(RetryContext context) throws Exception {
                        return user.getName() + ": fallback after failure";
                    }
                }, state);
                log.info("Result: " + result);
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        }
    }

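Circuit breaker source analysis:

Interpretation: if 10 failures occur within 5 s, the circuit opens (and the recovery time is 10 s).

The failure count is checked first.

10 or more failures:
1.1 more than 10 s elapsed: close the circuit and reset the failure count and the timer;
1.2 less than 5 s elapsed: open the circuit (fail fast) and reset the timer;
1.3 between 5 s and 10 s elapsed: the circuit is open (fail fast).
Fewer than 10 failures:
2.1 more than 5 s elapsed: reset the failure count and the timer;
2.2 less than 5 s elapsed: run the business logic (nothing else is done).

Once the circuit is open, requests fail fast. In case 1.2 the following requests must arrive more than 5 s apart for the timer to advance, because every request that enters resets startTime. If two requests are less than openTimeout apart (request A sets startTime to the current time, request B follows at once, so System.currentTimeMillis() - this.start is still smaller than this.openWindow), each of them takes the else if (time < this.openWindow) branch and is short-circuited straight away.

Note: 10 is the failure count configured on SimpleRetryPolicy; 5 s is openTimeout; 10 s is resetTimeout.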

public boolean isOpen() {
    // start is captured when the context is created.
    long time = System.currentTimeMillis() - this.start;
    // Ask the delegate policy whether attempts remain (the count is shared globally)
    boolean retryable = this.policy.canRetry(this.context);
    // No attempts left
    if (!retryable) {
        // More than resetTimeout has elapsed: close the circuit
        if (time > this.timeout) {
            logger.trace("Closing");
            // Recreate the context (the count goes back to 0)
            this.context = createDelegateContext(policy, getParent());
            // Reset startTime
            this.start = System.currentTimeMillis();
            retryable = this.policy.canRetry(this.context);
        }
        // Less than openTimeout has elapsed: reset startTime every time and go straight to the fallback
        else if (time < this.openWindow) {
            if ((Boolean) getAttribute(CIRCUIT_OPEN) == false) {
                logger.trace("Opening circuit");
                setAttribute(CIRCUIT_OPEN, true);
            }
            this.start = System.currentTimeMillis();
            return true;
        }
        // Otherwise the elapsed time lies between openTimeout and resetTimeout: go straight to the fallback.
    } else {
        // Attempts remain, but openTimeout has passed without reaching the failure limit
        // (in the example: fewer than 10 failures in 5 s), so rebuild the context
        if (time > this.openWindow) {
            logger.trace("Resetting context");
            this.start = System.currentTimeMillis();
            this.context = createDelegateContext(policy, getParent());
        }
        // else: within openTimeout and below the failure limit, let the business logic run.
    }
    if (logger.isTraceEnabled()) {
        logger.trace("Open: " + !retryable);
    }
    setAttribute(CIRCUIT_OPEN, !retryable);
    return ! retryable;
}
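
The branching in isOpen can be restated as a pure function of two inputs, which makes cases 1.1 to 2.2 easy to verify. This is only a model of the decision, not the library class: CircuitDecisionSketch, Action and decide are hypothetical names, and the elapsed time is passed in instead of being read from the system clock.

```java
// Simplified, clock-free model of the circuit decision (not the Spring Retry class).
final class CircuitDecisionSketch {

    enum Action { CLOSE_AND_RESET, OPEN_AND_RESTART_TIMER, STAY_OPEN, RESET_COUNTERS, PROCEED }

    // exhausted: the delegate policy has no attempts left; elapsedMs: time since `start`.
    static Action decide(boolean exhausted, long elapsedMs, long openTimeoutMs, long resetTimeoutMs) {
        if (exhausted) {
            if (elapsedMs > resetTimeoutMs) {
                return Action.CLOSE_AND_RESET;        // case 1.1
            }
            if (elapsedMs < openTimeoutMs) {
                return Action.OPEN_AND_RESTART_TIMER; // case 1.2
            }
            return Action.STAY_OPEN;                  // case 1.3
        }
        if (elapsedMs > openTimeoutMs) {
            return Action.RESET_COUNTERS;             // case 2.1
        }
        return Action.PROCEED;                        // case 2.2
    }

    public static void main(String[] args) {
        // openTimeout = 5 s and resetTimeout = 10 s, as in the example.
        System.out.println(decide(true, 12_000, 5_000, 10_000));  // CLOSE_AND_RESET
        System.out.println(decide(true, 1_000, 5_000, 10_000));   // OPEN_AND_RESTART_TIMER
        System.out.println(decide(true, 7_000, 5_000, 10_000));   // STAY_OPEN
        System.out.println(decide(false, 6_000, 5_000, 10_000));  // RESET_COUNTERS
        System.out.println(decide(false, 1_000, 5_000, 10_000));  // PROCEED
    }
}
```

Passing the elapsed time as a parameter keeps the model deterministic, so each branch can be exercised without sleeping.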

With CircuitBreakerRetryPolicy, the context is given the state.global attribute. Even when the configured exception is retryable, the logic below exits the loop.

if (state != null && context.hasAttribute(GLOBAL_STATE)) {
    break;
}

See the analysis of org.springframework.retry.support.RetryTemplate#doExecute above.
