An Analysis of OkHttp's Retry Mechanism and Common Exceptions

OkHttp's Retry Mechanism

OkHttp can retry when a network connection fails:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.

To understand OkHttp's retry mechanism, the class to look at is RetryAndFollowUpInterceptor: when a network exception occurs, all of OkHttp's retries for it happen there. We start with its #intercept(Chain chain) method; the snippet below has the non-core logic removed:

  //StreamAllocation init...
  Response priorResponse = null;
    while (true) {
      if (canceled) {
        streamAllocation.release();
        throw new IOException("Canceled");
      }

      Response response;
      boolean releaseConnection = true;
      try {
        response = realChain.proceed(request, streamAllocation, null, null);
        releaseConnection = false;
      } catch (RouteException e) {
        // Connection failures during the socket connect phase are wrapped in a RouteException
        // and thrown; #recover is then called to try to recover.
        // The attempt to connect via a route failed. The request will not have been sent.
        if (!recover(e.getLastConnectException(), false, request)) {
          throw e.getLastConnectException();
        }
        releaseConnection = false;
        continue;
      } catch (IOException e) {
        // Network exceptions thrown during the request phase, after the socket has connected.
        // An attempt to communicate with a server failed. The request may have been sent.
        boolean requestSendStarted = !(e instanceof ConnectionShutdownException);
        if (!recover(e, requestSendStarted, request)) throw e;
        releaseConnection = false;
        continue;
      } finally {
        // We're throwing an unchecked exception. Release any resources.
        if (releaseConnection) {
          streamAllocation.streamFailed(null);
          streamAllocation.release();
        }
      }

      // ... follow-up handling (redirects, authentication) omitted ...
    }

Next, the core recover method:

/**
   * Report and attempt to recover from a failure to communicate with a server. Returns true if
   * {@code e} is recoverable, or false if the failure is permanent. Requests with a body can only
   * be recovered if the body is buffered or if the failure occurred before the request has been
   * sent.
   */
  private boolean recover(IOException e, boolean requestSendStarted, Request userRequest) {
    streamAllocation.streamFailed(e);

    // The application layer has forbidden retries.
    if (!client.retryOnConnectionFailure()) return false;

    // We can't send the request body again: the request has started and its body is unrepeatable.
    if (requestSendStarted && userRequest.body() instanceof UnrepeatableRequestBody) return false;

    // This exception is fatal.
    if (!isRecoverable(e, requestSendStarted)) return false;

    // No more routes to attempt.
    if (!streamAllocation.hasMoreRoutes()) return false;

    // For failure recovery, use the same route selector with a new connection.
    return true;
  }

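This method first calls streamAllocation.streamFailed(e) to record the exception, which in turn records the failed route in RouteDatabase so that its priority is lowered and the next request to the same address does not go straight back to the route that just failed. If no more routes are available, the connection cannot be retried.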

public final class RouteDatabase {
  private final Set<Route> failedRoutes = new LinkedHashSet<>();

  /** Records a failure connecting to {@code failedRoute}. */
  public synchronized void failed(Route failedRoute) {
    failedRoutes.add(failedRoute);
  }

  /** Records success connecting to {@code route}. */
  public synchronized void connected(Route route) {
    failedRoutes.remove(route);
  }

  /** Returns true if {@code route} has failed recently and should be avoided. */
  public synchronized boolean shouldPostpone(Route route) {
    return failedRoutes.contains(route);
  }
}
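
To make the "lower the priority" idea concrete, here is a minimal sketch of how a selector could consume shouldPostpone. It is not OkHttp's RouteSelector code: routes are reduced to plain strings, and the names PostponeSketch, FailedRoutes and order are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PostponeSketch {
  /** Same bookkeeping idea as RouteDatabase, with routes reduced to plain strings. */
  static final class FailedRoutes {
    private final Set<String> failed = new LinkedHashSet<>();

    synchronized void failed(String route) { failed.add(route); }

    synchronized void connected(String route) { failed.remove(route); }

    synchronized boolean shouldPostpone(String route) { return failed.contains(route); }
  }

  /** Returns the candidates with recently failed routes moved to the end; order is otherwise kept. */
  static List<String> order(List<String> candidates, FailedRoutes db) {
    List<String> fresh = new ArrayList<>();
    List<String> postponed = new ArrayList<>();
    for (String route : candidates) {
      if (db.shouldPostpone(route)) {
        postponed.add(route);
      } else {
        fresh.add(route);
      }
    }
    fresh.addAll(postponed); // failed routes are still tried, just last
    return fresh;
  }
}
```

In this sketch a failed route is never dropped, only deferred, and a later successful connection (connected) restores its normal position.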

Next, we take a closer look at the isRecoverable method:

  private boolean isRecoverable(IOException e, boolean requestSendStarted) {
    // If there was a protocol problem, don't recover.
    if (e instanceof ProtocolException) {
      return false;
    }

    // If there was an interruption don't recover, but if there was a timeout connecting to a route
    // we should try the next route (if there is one)
    if (e instanceof InterruptedIOException) {
      return e instanceof SocketTimeoutException && !requestSendStarted;
    }

    // Look for known client-side or negotiation errors that are unlikely to be fixed by trying
    // again with a different route.
    if (e instanceof SSLHandshakeException) {
      // If the problem was a CertificateException from the X509TrustManager,
      // do not retry.
      if (e.getCause() instanceof CertificateException) {
        return false;
      }
    }
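    // The HostnameVerifier checks whether the host is valid; if it is not, an
    // SSLPeerUnverifiedException is thrown. It can also be thrown when the handshake's
    // session is read, which is still part of the handshake.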
    if (e instanceof SSLPeerUnverifiedException) {
      // e.g. a certificate pinning error.
      return false;
    }

    // An example of one we might want to retry with a different route is a problem connecting to a
    // proxy and would manifest as a standard IOException. Unless it is one we know we should not
    // retry, we return true and try a new route.
    return true;
  }
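
The rules above can be tried out in isolation. The following is a standalone copy of the same decision logic using only JDK exception types; the class name RecoverableRules and the sample exceptions in main are assumptions made for this sketch, not part of OkHttp.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

/** Standalone copy of the isRecoverable rules shown above, for experimentation only. */
final class RecoverableRules {
  static boolean isRecoverable(IOException e, boolean requestSendStarted) {
    if (e instanceof ProtocolException) return false;
    if (e instanceof InterruptedIOException) {
      // Only a timeout before the request was sent is worth another route.
      return e instanceof SocketTimeoutException && !requestSendStarted;
    }
    if (e instanceof SSLHandshakeException && e.getCause() instanceof CertificateException) {
      return false;
    }
    if (e instanceof SSLPeerUnverifiedException) return false;
    return true;
  }

  public static void main(String[] args) {
    SSLHandshakeException badCert = new SSLHandshakeException("untrusted certificate");
    badCert.initCause(new CertificateException("no trust anchor"));

    System.out.println(isRecoverable(new ConnectException("refused"), false));       // true
    System.out.println(isRecoverable(new SocketTimeoutException("connect"), false)); // true
    System.out.println(isRecoverable(new SocketTimeoutException("read"), true));     // false
    System.out.println(isRecoverable(badCert, false));                               // false
  }
}
```

Note that SocketTimeoutException is a subclass of InterruptedIOException, which is why a timeout is handled inside the InterruptedIOException branch.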

Analysis of common network exceptions:

UnknownHostException

Causes:
  • The network is disconnected
  • DNS server failure
  • DNS hijacking
Solutions:
  • HttpDNS
  • A sensible fallback strategy
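
One possible shape for such a fallback strategy is sketched below: try the primary resolver first and, only when it throws UnknownHostException, fall back to an app-supplied table of IP addresses. The names FallbackResolver and Lookup, the fallback table, and how that table gets populated (for example from an HttpDNS response) are all assumptions for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class FallbackResolver {
  /** Minimal lookup abstraction so the primary resolver can be swapped or faked. */
  interface Lookup {
    List<InetAddress> lookup(String hostname) throws UnknownHostException;
  }

  private final Lookup primary;
  private final Map<String, List<String>> fallbackIps; // hypothetical, supplied by the app

  FallbackResolver(Lookup primary, Map<String, List<String>> fallbackIps) {
    this.primary = primary;
    this.fallbackIps = fallbackIps;
  }

  /** Resolves via the primary lookup; on UnknownHostException uses the fallback table. */
  List<InetAddress> resolve(String hostname) throws UnknownHostException {
    try {
      return primary.lookup(hostname);
    } catch (UnknownHostException e) {
      List<String> ips = fallbackIps.get(hostname);
      if (ips == null || ips.isEmpty()) throw e; // nothing to fall back to
      List<InetAddress> result = new ArrayList<>();
      for (String ip : ips) {
        result.add(InetAddress.getByName(ip)); // IP literal: format check only, no DNS query
      }
      return result;
    }
  }

  /** Primary lookup backed by the platform resolver. */
  static Lookup systemDns() {
    return hostname -> Arrays.asList(InetAddress.getAllByName(hostname));
  }
}
```

The Lookup interface deliberately has the same method shape as OkHttp's Dns interface, so a resolver like this could be adapted and installed through OkHttpClient.Builder#dns.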


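InterruptedIOException

Causes:
  • The request thread was interrupted during the read/write phase of the request
Solutions:
  • Check whether the interruption is consistent with the business logic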

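SocketTimeoutException

Causes:
  • Low bandwidth, high latency
  • A congested path or an overloaded server
  • A temporary failure of a routing node
Solutions:
  • Configure retries sensibly
  • Retry with a different IP

Note in particular: OkHttp does not internally retry a SocketTimeoutException caused by, for example, a read or write timeout during the request.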


So if the app layer cares about this exception in particular, it should add custom interceptors that handle it specifically.
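
As a rough sketch of what such handling could look like, the helper below retries an operation only when it fails with SocketTimeoutException, up to a fixed number of extra attempts. TimeoutRetry, Attempt and withRetry are hypothetical names; in a real interceptor the attempt would presumably wrap chain.proceed(request).

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

final class TimeoutRetry {
  /** One attempt of an I/O operation (hypothetical abstraction for this sketch). */
  interface Attempt<T> {
    T run() throws IOException;
  }

  /**
   * Runs the attempt and retries only on SocketTimeoutException, allowing up to
   * maxRetries extra attempts. Any other IOException propagates immediately.
   */
  static <T> T withRetry(Attempt<T> attempt, int maxRetries) throws IOException {
    if (maxRetries < 0) throw new IllegalArgumentException("maxRetries < 0");
    SocketTimeoutException last = null;
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return attempt.run();
      } catch (SocketTimeoutException e) {
        last = e; // remember the failure and try again if attempts remain
      }
    }
    throw last; // every attempt timed out
  }
}
```

A retry after a read timeout can make the server process the same request twice, so this kind of handling is best limited to idempotent requests.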

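SSLHandshakeException

Causes:
  • TLS protocol negotiation failed or the handshake format is incompatible
  • The CA that issued the server certificate is unknown
  • The server certificate is self-signed rather than signed by a CA
  • The server configuration is missing an intermediate CA (incomplete certificate chain)
  • The server hostname does not match (SNI)
  • A man-in-the-middle attack
Solutions:
  • Specify SNI
  • Certificate pinning
  • Downgrade to HTTP ...
  • Contact the SA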

SSLPeerUnverifiedException

Causes:
  • The certificate's hostname verification failed
Solutions:
  • Specify SNI
  • Certificate pinning
  • Downgrade to HTTP ...
  • Contact the SA
