An Analysis of OkHttp's Retry Mechanism

OkHttp can retry when a network connection fails:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.

To understand OkHttp's retry mechanism, the class to focus on is RetryAndFollowUpInterceptor: all retries triggered by network exceptions are handled there. We start with its #intercept(Chain chain) method. The snippet below has the non-core logic removed:

  //StreamAllocation init...
  Response priorResponse = null;
    while (true) {
      if (canceled) {
        streamAllocation.release();
        throw new IOException("Canceled");
      }

      Response response;
      boolean releaseConnection = true;
      try {
        response = realChain.proceed(request, streamAllocation, null, null);
        releaseConnection = false;
      } catch (RouteException e) {
        // Thrown during the socket connect phase: every connection failure is wrapped in a
        // RouteException. The request has not been sent, so recover() is called to try to recover.
        // The attempt to connect via a route failed. The request will not have been sent.
        if (!recover(e.getLastConnectException(), false, request)) {
          throw e.getLastConnectException();
        }
        releaseConnection = false;
        continue;
      } catch (IOException e) {
        // Network exceptions thrown while exchanging the request, after the socket has connected.
        // An attempt to communicate with a server failed. The request may have been sent.
        boolean requestSendStarted = !(e instanceof ConnectionShutdownException);
        if (!recover(e, requestSendStarted, request)) throw e;
        releaseConnection = false;
        continue;
      } finally {
        // We're throwing an unchecked exception. Release any resources.
        if (releaseConnection) {
          streamAllocation.streamFailed(null);
          streamAllocation.release();
        }
      }

      // ... follow-up handling (redirects, authentication) omitted ...
    }
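
The control flow above can be modelled without OkHttp. The sketch below is a simplified, hypothetical stand-in, not OkHttp code: the names `RetryLoopSketch`, `Attempt` and `execute` are invented for illustration. It keeps running an attempt until it succeeds, or until a recover predicate declines to retry, at which point the last exception is rethrown.

```java
import java.io.IOException;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the retry loop; not OkHttp code. */
final class RetryLoopSketch {
  /** One network attempt; stands in for realChain.proceed(...). */
  interface Attempt<T> {
    T run() throws IOException;
  }

  /**
   * Runs {@code attempt} until it succeeds. On failure, {@code recover} decides
   * whether another attempt is allowed; if it is not, the exception propagates.
   */
  static <T> T execute(Attempt<T> attempt, Predicate<IOException> recover) throws IOException {
    while (true) {
      try {
        return attempt.run();
      } catch (IOException e) {
        if (!recover.test(e)) throw e; // permanent failure: give up
        // recoverable failure: loop and try again
      }
    }
  }
}
```

With a predicate that always returns true and an attempt that fails twice before succeeding, `execute` runs the attempt three times and returns the third result. Note that this loop by itself has no upper bound. In the real interceptor the bound comes from recover(), which returns false once no routes are left.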

Next, look at the core recover method:

/**
   * Report and attempt to recover from a failure to communicate with a server. Returns true if
   * {@code e} is recoverable, or false if the failure is permanent. Requests with a body can only
   * be recovered if the body is buffered or if the failure occurred before the request has been
   * sent.
   */
  private boolean recover(IOException e, boolean requestSendStarted, Request userRequest) {
    streamAllocation.streamFailed(e);

    // The application layer has forbidden retries.
    if (!client.retryOnConnectionFailure()) return false;

    // We can't send the request body again: the request has already started being sent and
    // its body cannot be replayed.
    if (requestSendStarted && userRequest.body() instanceof UnrepeatableRequestBody) return false;

    // This exception is fatal.
    if (!isRecoverable(e, requestSendStarted)) return false;

    // No more routes to attempt.
    if (!streamAllocation.hasMoreRoutes()) return false;

    // For failure recovery, use the same route selector with a new connection.
    return true;
  }

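This method first calls streamAllocation.streamFailed(e) to record the failure. That call in turn records the failed route in RouteDatabase so that its priority is lowered, and the next request to the same address does not start with the route that just failed. If no further routes are available, the connection cannot be retried.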

public final class RouteDatabase {
  private final Set<Route> failedRoutes = new LinkedHashSet<>();

  /** Records a failure connecting to {@code failedRoute}. */
  public synchronized void failed(Route failedRoute) {
    failedRoutes.add(failedRoute);
  }

  /** Records success connecting to {@code route}. */
  public synchronized void connected(Route route) {
    failedRoutes.remove(route);
  }

  /** Returns true if {@code route} has failed recently and should be avoided. */
  public synchronized boolean shouldPostpone(Route route) {
    return failedRoutes.contains(route);
  }
}
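
The excerpt does not show how the set of failed routes is consumed. As a minimal sketch, assume the route selector simply moves recently failed routes to the end of the candidate list. The class `PostponingRouteOrder` and the use of plain strings as routes are hypothetical and for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: routes that failed recently are tried only after the others. */
final class PostponingRouteOrder {
  private final Set<String> failedRoutes = new LinkedHashSet<>();

  /** Records a failure connecting to {@code route}. */
  synchronized void failed(String route) {
    failedRoutes.add(route);
  }

  /** Records a successful connection, which clears the earlier failure. */
  synchronized void connected(String route) {
    failedRoutes.remove(route);
  }

  /** Returns the candidates with recently failed routes moved to the end. */
  synchronized List<String> order(List<String> candidates) {
    List<String> preferred = new ArrayList<>();
    List<String> postponed = new ArrayList<>();
    for (String route : candidates) {
      (failedRoutes.contains(route) ? postponed : preferred).add(route);
    }
    preferred.addAll(postponed);
    return preferred;
  }
}
```

A failed route is demoted rather than dropped, so it can still be used as a last resort, and a later successful connection restores its original position.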

Next, let's focus on the isRecoverable method:

  private boolean isRecoverable(IOException e, boolean requestSendStarted) {
    // If there was a protocol problem, don't recover.
    if (e instanceof ProtocolException) {
      return false;
    }

    // If there was an interruption don't recover, but if there was a timeout connecting to a route
    // we should try the next route (if there is one)
    if (e instanceof InterruptedIOException) {
      return e instanceof SocketTimeoutException && !requestSendStarted;
    }

    // Look for known client-side or negotiation errors that are unlikely to be fixed by trying
    // again with a different route.
    if (e instanceof SSLHandshakeException) {
      // If the problem was a CertificateException from the X509TrustManager,
      // do not retry.
      if (e.getCause() instanceof CertificateException) {
        return false;
      }
    }
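    // A HostnameVerifier checks whether the host is valid; if it is not, an
    // SSLPeerUnverifiedException is thrown. The exception is raised while the session is
    // obtained (getSession), so it is still part of the handshake.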
    if (e instanceof SSLPeerUnverifiedException) {
      // e.g. a certificate pinning error.
      return false;
    }

    // An example of one we might want to retry with a different route is a problem connecting to a
    // proxy and would manifest as a standard IOException. Unless it is one we know we should not
    // retry, we return true and try a new route.
    return true;
  }
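
To see which failures this logic treats as retryable, the same checks can be run against plain JDK exceptions. The class below is a standalone copy made for experimentation. The name `RecoverableCheck` is hypothetical, and the two nested SSL handshake conditions are folded into one `if` with the same result.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

/** Standalone copy of the checks above, for experimentation only. */
final class RecoverableCheck {
  static boolean isRecoverable(IOException e, boolean requestSendStarted) {
    // Protocol problems are never retried.
    if (e instanceof ProtocolException) return false;

    // SocketTimeoutException is a subclass of InterruptedIOException, so this branch
    // decides it too: only a timeout before the request was sent is retried.
    if (e instanceof InterruptedIOException) {
      return e instanceof SocketTimeoutException && !requestSendStarted;
    }

    // Certificate rejected by the trust manager: another route will not help.
    if (e instanceof SSLHandshakeException && e.getCause() instanceof CertificateException) {
      return false;
    }

    // Hostname verification or certificate pinning failure.
    if (e instanceof SSLPeerUnverifiedException) return false;

    // Anything else (for example a refused connection) may succeed on another route.
    return true;
  }
}
```

Under this logic a `SocketTimeoutException` with `requestSendStarted == false` (a connect timeout) is recoverable, while the same exception with `requestSendStarted == true` (a read or write timeout) is not. A `java.net.ConnectException` is recoverable in both cases.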

Analysis of common network exceptions:

UnknownHostException

Causes:

Network outage
DNS server failure
DNS hijacking

Solutions:

HttpDNS
A sensible fallback strategy
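
A fallback strategy for name resolution can be sketched as two lookup sources tried in order. Everything here is hypothetical and for illustration only: `FallbackResolver` and `Lookup` are invented names, and the fallback source could be an HttpDNS client or a list of known-good addresses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

/** Hypothetical sketch of a DNS lookup with a fallback source; not an OkHttp API. */
final class FallbackResolver {
  /** Same shape as a DNS lookup: hostname in, addresses out. */
  interface Lookup {
    List<InetAddress> lookup(String hostname) throws UnknownHostException;
  }

  private final Lookup primary;
  private final Lookup fallback;

  FallbackResolver(Lookup primary, Lookup fallback) {
    this.primary = primary;
    this.fallback = fallback;
  }

  /** Tries the primary source first; uses the fallback if it fails or returns nothing. */
  List<InetAddress> lookup(String hostname) throws UnknownHostException {
    try {
      List<InetAddress> result = primary.lookup(hostname);
      if (!result.isEmpty()) return result;
    } catch (UnknownHostException ignored) {
      // Fall through to the fallback source.
    }
    return fallback.lookup(hostname);
  }
}
```

OkHttp accepts a custom resolver through `OkHttpClient.Builder.dns(...)`, whose `Dns` interface has the same `lookup(String)` shape, so a resolver along these lines could be plugged in there.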

InterruptedIOException

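Causes:

The request thread was interrupted during the read/write phase of the request

Solutions:

Check whether the interruption is consistent with the business logic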

SocketTimeoutException

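Causes:

Low bandwidth, high latency
Congested path, overloaded server
Temporary failure of a routing node

Solutions:

Configure retries sensibly
Retry with a different IP

Note in particular: OkHttp does not internally retry a SocketTimeoutException caused by a read or write timeout during the request.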

So if the application layer cares about this exception in particular, it should add a custom interceptor that handles it specifically.
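
The core of such special handling is a bounded retry around the call. The helper below is a minimal, hypothetical sketch with no OkHttp dependency: `TimeoutRetry`, `Call` and `withRetry` are invented names. It retries only on SocketTimeoutException and lets every other IOException propagate immediately.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: retry a call a bounded number of times on SocketTimeoutException. */
final class TimeoutRetry {
  /** The operation to retry; stands in for one execution of a request. */
  interface Call<T> {
    T execute() throws IOException;
  }

  /** Runs {@code call}; retries up to {@code maxRetries} more times if it times out. */
  static <T> T withRetry(Call<T> call, int maxRetries) throws IOException {
    int retriesUsed = 0;
    while (true) {
      try {
        return call.execute();
      } catch (SocketTimeoutException e) {
        if (retriesUsed++ >= maxRetries) throw e; // retries exhausted
      }
    }
  }
}
```

Inside an OkHttp application interceptor, the wrapped call would be `chain.proceed(chain.request())`. Because a read or write timeout can occur after the server has received the request, this kind of retry is safest for idempotent requests.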

SSLHandshakeException

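Causes:

TLS protocol negotiation failed or the handshake format is incompatible
The CA that issued the server certificate is unknown
The server certificate is self-signed rather than signed by a CA
The server configuration is missing an intermediate CA (incomplete certificate chain)
The server hostname does not match (SNI)
A man-in-the-middle attack

Solutions:

Specify the SNI
Certificate pinning
Downgrade to HTTP...
Contact the SA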

SSLPeerUnverifiedException

Causes:

The domain name check on the certificate failed

Solutions:

Specify the SNI
Certificate pinning
Downgrade to HTTP...
Contact the SA
