Spring Cloud Feign: retrying timed-out calls with Retryer

  • Getting to know the Retryer interface
  • Getting to know RetryableException
  • Getting to know FeignException
  • How we apply it in practice

A brief introduction to the Retryer interface

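  • As the source code below shows, the Retryer interface extends Cloneable.

  • Retryer declares a method continueOrPropagate, which takes a RetryableException and returns void.

  • Retryer also declares a clone() method whose return type is Retryer.

  • The interface contains a static nested class, Default, which implements Retryer.

    • Default has a no-argument constructor and a constructor that takes parameters.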

The source code is as follows:

package feign;

import static java.util.concurrent.TimeUnit.SECONDS;

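/**
 * Cloned for each invocation to {@code Client.execute(Request, Request.Options)}.
 * Implementations may keep state to determine whether retry operations should continue.
 */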
public interface Retryer extends Cloneable {

  /**
   * If retry is permitted, return (possibly after sleeping). Otherwise propagate the exception.
   */
  void continueOrPropagate(RetryableException e);

  Retryer clone();

  public static class Default implements Retryer {

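    // Maximum number of attempts (the first call counts as attempt 1)
    private final int maxAttempts;
    // Initial retry interval, in milliseconds
    private final long period;
    // Maximum retry interval, in milliseconds
    private final long maxPeriod;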
    int attempt;
    long sleptForMillis;

    // No-argument constructor of Default:
    // initial interval 100 ms, maximum interval 1 s, at most 5 attempts
    public Default() {
      this(100, SECONDS.toMillis(1), 5);
    }

    // Initial interval, maximum interval, maximum attempts; attempt starts at 1
    public Default(long period, long maxPeriod, int maxAttempts) {
      this.period = period;
      this.maxPeriod = maxPeriod;
      this.maxAttempts = maxAttempts;
      this.attempt = 1;
    }

    // visible for testing;
    protected long currentTimeMillis() {
      return System.currentTimeMillis();
    }

    // Implements Retryer.continueOrPropagate
    public void continueOrPropagate(RetryableException e) {
      // Once the attempt counter has reached maxAttempts, rethrow the RetryableException
      if (attempt++ >= maxAttempts) {
        throw e;
      }

      long interval;
      if (e.retryAfter() != null) {
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
          interval = maxPeriod;
        }
        if (interval < 0) {
          return;
        }
      } else {
        interval = nextMaxInterval();
      }
      try {
        Thread.sleep(interval);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      sleptForMillis += interval;
    }

    /**
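     * Calculates the time interval to a retry attempt. The interval increases exponentially with each
     * attempt, at a rate of nextInterval *= 1.5 (where 1.5 is the backoff factor), up to the maximum interval.
     * @return time from now until the next attempt. The upstream Javadoc says nanoseconds, but the value is passed to Thread.sleep, so it is effectively milliseconds.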
     */
    long nextMaxInterval() {
      long interval = (long) (period * Math.pow(1.5, attempt - 1));
      return interval > maxPeriod ? maxPeriod : interval;
    }

    @Override
    public Retryer clone() {
      return new Default(period, maxPeriod, maxAttempts);
    }
  }

  /**
   * Implementation that never retries a request. It propagates the RetryableException.
   */
  Retryer NEVER_RETRY = new Retryer() {

    @Override
    public void continueOrPropagate(RetryableException e) {
      throw e;
    }

    @Override
    public Retryer clone() {
      return this;
    }
  };
}
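
To make the backoff concrete, here is a minimal, dependency-free sketch that replays the arithmetic of Retryer.Default without actually sleeping. The class name BackoffSchedule and the method intervals are hypothetical helpers introduced only for illustration; the formula and the attempt counter follow the source above.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

    // Same formula as Retryer.Default#nextMaxInterval:
    // period * 1.5^(attempt - 1), capped at maxPeriod.
    static long nextMaxInterval(long period, long maxPeriod, int attempt) {
        long interval = (long) (period * Math.pow(1.5, attempt - 1));
        return Math.min(interval, maxPeriod);
    }

    // Replays continueOrPropagate without sleeping and collects the waits
    // that would occur before the exception is finally propagated.
    static List<Long> intervals(long period, long maxPeriod, int maxAttempts) {
        List<Long> waits = new ArrayList<>();
        int attempt = 1; // same starting value as Retryer.Default
        // The real code throws once (attempt++ >= maxAttempts); otherwise it sleeps.
        while (attempt++ < maxAttempts) {
            waits.add(nextMaxInterval(period, maxPeriod, attempt));
        }
        return waits;
    }

    public static void main(String[] args) {
        // Defaults from the no-argument constructor: 100 ms, 1 s, 5 attempts.
        System.out.println(intervals(100, 1000, 5));
    }
}
```

With the default arguments this should print [150, 225, 337, 506]: four waits, which means five calls in total before the RetryableException is propagated.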

A brief introduction to RetryableException

  • It extends FeignException, so it is also a RuntimeException.
  • It holds a field retryAfter of type Long.
  • It has two constructors:
    • RetryableException(String message, Throwable cause, Date retryAfter)
    • RetryableException(String message, Date retryAfter)
  • It also has a no-argument method retryAfter() that returns a Date.

The source code is as follows:

package feign;

import java.util.Date;

/**
 * This exception is raised when the Response is deemed to be retryable, typically via a feign.codec.ErrorDecoder when the status is 503.
 */
public class RetryableException extends FeignException {

  private static final long serialVersionUID = 1L;

  private final Long retryAfter;

  /**
   * retryAfter - usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Throwable cause, Date retryAfter) {
    super(message, cause);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

  /**
   * retryAfter - usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Date retryAfter) {
    super(message);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

  /**
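   * HTTP 503: Service Unavailable.
   * Sometimes corresponds to the Util.RETRY_AFTER header present in a 503 response. Other times it is parsed from an application-specific response. Null if unknown.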
   */
  public Date retryAfter() {
    return retryAfter != null ? new Date(retryAfter) : null;
  }
}
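
The way Retryer.Default consumes retryAfter can be isolated into a small, dependency-free sketch. The class RetryAfterInterval and the method sleepMillis are hypothetical names used only for illustration, and the -1 return value is a convention of this sketch (the real code simply returns without sleeping when the date has already passed).

```java
import java.util.Date;

public class RetryAfterInterval {

    // Mirrors the retryAfter branch of Retryer.Default#continueOrPropagate.
    // Returns the milliseconds to sleep, or -1 when the retry proceeds without sleeping.
    static long sleepMillis(Date retryAfter, long nowMillis, long maxPeriod, long fallbackInterval) {
        if (retryAfter == null) {
            // No hint from the server: fall back to the exponential backoff value.
            return fallbackInterval;
        }
        long interval = retryAfter.getTime() - nowMillis;
        if (interval > maxPeriod) {
            interval = maxPeriod; // never wait longer than maxPeriod
        }
        // A date in the past means "retry right away".
        return interval < 0 ? -1 : interval;
    }

    public static void main(String[] args) {
        long now = 1_000_000L; // hypothetical "current time" in epoch milliseconds
        System.out.println(sleepMillis(new Date(now + 400), now, 1000, 150));  // 400
        System.out.println(sleepMillis(new Date(now + 5000), now, 1000, 150)); // 1000
        System.out.println(sleepMillis(new Date(now - 10), now, 1000, 150));   // -1
        System.out.println(sleepMillis(null, now, 1000, 150));                 // 150
    }
}
```

The cap means that a server asking for a long pause through Retry-After is still retried after at most maxPeriod, which is 1 second with the defaults.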

A brief introduction to FeignException

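  • It extends RuntimeException.
  • It has a private int field status that holds the HTTP status code.
  • It has three static factory methods:
    • errorReading(Request request, Response ignored, IOException cause)
    • errorStatus(String methodKey, Response response)
    • errorExecuting(Request request, IOException cause)
  • Failures caused by I/O errors can be retried, because errorExecuting wraps the IOException in a RetryableException. A status error such as 404 comes back from errorStatus as a plain FeignException, so it is not retried.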

The source code is as follows:

package feign;

import java.io.IOException;

import static java.lang.String.format;

public class FeignException extends RuntimeException {

  private static final long serialVersionUID = 0;
  // HTTP status
  private int status;

  protected FeignException(String message, Throwable cause) {
    super(message, cause);
  }

  protected FeignException(String message) {
    super(message);
  }

  protected FeignException(int status, String message) {
    super(message);
    this.status = status;
  }

  public int status() {
    return this.status;
  }

  static FeignException errorReading(Request request, Response ignored, IOException cause) {
    return new FeignException(
        format("%s reading %s %s", cause.getMessage(), request.method(), request.url()),
        cause);
  }

  public static FeignException errorStatus(String methodKey, Response response) {
    String message = format("status %s reading %s", response.status(), methodKey);
    try {
      if (response.body() != null) {
        String body = Util.toString(response.body().asReader());
        message += "; content:\n" + body;
      }
    } catch (IOException ignored) { // NOPMD
    }
    return new FeignException(response.status(), message);
  }

  static FeignException errorExecuting(Request request, IOException cause) {
    return new RetryableException(
        format("%s executing %s %s", cause.getMessage(), request.method(), request.url()), cause,
        null);
  }
}
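
The distinction between the two exception types matters because the caller (SynchronousMethodHandler.invoke, which also appears in the stack trace later in this post) appears to hand only RetryableException to the Retryer, while any other FeignException escapes immediately. The following is a simplified sketch of that control flow, not Feign's actual implementation: CallException, RetryableCallException and CountingRetryer are hypothetical stand-ins for FeignException, RetryableException and Retryer, and the sleeping is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryLoopSketch {

    // Hypothetical stand-in for feign.FeignException.
    static class CallException extends RuntimeException {
        CallException(String message) { super(message); }
    }

    // Hypothetical stand-in for feign.RetryableException.
    static class RetryableCallException extends CallException {
        RetryableCallException(String message) { super(message); }
    }

    // Hypothetical stand-in for Retryer.Default, reduced to the attempt budget.
    static class CountingRetryer {
        private final int maxAttempts;
        private int attempt = 1;

        CountingRetryer(int maxAttempts) { this.maxAttempts = maxAttempts; }

        void continueOrPropagate(RetryableCallException e) {
            if (attempt++ >= maxAttempts) {
                throw e; // budget exhausted: propagate
            }
        }
    }

    // Repeats the call until it succeeds, a non-retryable exception escapes,
    // or the retryer rethrows the retryable one.
    static <T> T invoke(Supplier<T> call, CountingRetryer retryer) {
        while (true) {
            try {
                return call.get();
            } catch (RetryableCallException e) {
                retryer.continueOrPropagate(e);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String result = invoke(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RetryableCallException("connect timed out");
            }
            return "ok";
        }, new CountingRetryer(5));
        System.out.println(result + " after " + calls.get() + " calls"); // ok after 3 calls
    }
}
```

A fresh retryer per invocation is what clone() provides in the real interface: the attempt counter is state, so it must not be shared between calls.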

How do we apply the retry mechanism in a project?

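From the introduction above we know the Retryer interface, its Default class, and the retryable exception class RetryableException. We can implement our own retry policy by overriding Retryer's continueOrPropagate method, for example: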

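import feign.RetryableException;
import feign.Retryer;
import java.util.function.Supplier;
import java.util.stream.Stream;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ConnectTimeoutRetryer extends Retryer.Default {
    // Keywords that mark a failure as retryable
    Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out");

    public ConnectTimeoutRetryer() {
        super();
    }

    @Override
    public void continueOrPropagate(RetryableException e) {
        // Kibana analysis of production logs shows that Feign connect timeouts always carry
        // "connect timed out" in the cause. If the cause contains none of the keywords,
        // log "ConnectTimeoutRetryerFeign failed" and rethrow e without retrying.
        // A missing cause or message counts as "no match", which avoids a NullPointerException.
        String causeMessage = e.getCause() == null ? null : e.getCause().getMessage();
        if (causeMessage == null || streamSupplier.get().noneMatch(causeMessage::contains)) {
            log.warn("ConnectTimeoutRetryerFeign failed", e);
            throw e;
        }
        // SLF4J treats the trailing Throwable as the exception to log,
        // so the second {} stays unfilled and is printed literally.
        log.error("begin to retry:{} ,{}", e.getMessage(), e);
        super.continueOrPropagate(e);
    }

    // Override clone() so that each invocation gets a fresh ConnectTimeoutRetryer
    @Override
    public Retryer clone() {
        return new ConnectTimeoutRetryer();
    }
}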

This solution mainly addresses timeouts in Feign calls between microservices, caused for example by an unstable network.

Below is the stack trace logged during a retry:

2020-05-28 21:17:08,954 [hystrix-zis-zzzz-193] ERROR [com.xxxx.common.service.share.feign.ConnectTimeoutRetryer] [?:?] [trace=xxx,span=xxx] - begin to retry:connect timed out executing POST http://xxx.com/search/rrr ,{}
feign.RetryableException: connect timed out executing POST http://xxx.com/search/rrr
    at feign.FeignException.errorExecuting(FeignException.java:67)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:104)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:108)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
    at org.springframework.cloud.sleuth.instrument.hystrix.SleuthHystrixConcurrencyStrategy$HystrixTraceCallable.call(SleuthHystrixConcurrencyStrategy.java:188)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at feign.Client$Default.convertAndSend(Client.java:133)
    at feign.Client$Default.execute(Client.java:73)
    at org.springframework.cloud.sleuth.instrument.web.client.feign.TraceFeignClient.execute(TraceFeignClient.java:92)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97)
    ... 32 common frames omitted

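Drawback: this solution does resolve Feign call timeouts between microservices, but Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out"); is not flexible enough. A retry happens only when the cause in the stack trace contains "connect timed out"; every other RetryableException is rethrown immediately without a retry.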
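
One possible direction for easing this limitation, offered only as a sketch and not as part of the original solution, is to move the matching into a small helper that takes its keyword list from outside (for example from configuration), ignores case, tolerates null messages, and walks the whole cause chain. The class RetryKeywordMatcher, the method shouldRetry, the depth limit of 16 and the second keyword below are all hypothetical choices made for illustration.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RetryKeywordMatcher {

    // Lower-cased keywords; in a real service these might come from configuration (assumption).
    private final List<String> keywords = new ArrayList<>();

    public RetryKeywordMatcher(List<String> keywords) {
        for (String keyword : keywords) {
            this.keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if any exception in the cause chain has a message containing a keyword.
    public boolean shouldRetry(Throwable error) {
        int depth = 0;
        // The depth limit guards against unusually long or cyclic cause chains.
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RetryKeywordMatcher matcher =
            new RetryKeywordMatcher(Arrays.asList("connect timed out", "connection refused"));
        Throwable timeout =
            new RuntimeException("executing POST", new SocketTimeoutException("connect timed out"));
        System.out.println(matcher.shouldRetry(timeout));                                // true
        System.out.println(matcher.shouldRetry(new RuntimeException("Read timed out"))); // false
        System.out.println(matcher.shouldRetry(new RuntimeException((String) null)));    // false
    }
}
```

A custom Retryer could call shouldRetry(e) in place of the hard-coded stream check. Keywords should be added with care: a connect timeout means the request never reached the server, whereas retrying after a read timeout may repeat a request that was already processed.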