Notes on Fixing Keep-Alive Not Taking Effect in OkHttp Requests

I. Reproducing the Problem

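    While optimizing a slow list screen, where a series of API calls is needed to load the list data, I first used the Android Studio Profiler to find the slow calls and optimized the code around them. That made loading nearly 50% faster, but the experience was still not ideal. Comparing the same APIs on other platforms then showed that our non-first requests were far slower than theirs.
    So I started investigating with a packet capture tool combined with the network library's logs. Although every request carried the header "Connection":"Keep-Alive", the Charles capture showed that the connection was not actually kept alive: every request re-established the connection, including the SSL handshake, which is expensive and added considerable latency to each request.
The Charles captures are shown below: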

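  1. First request

    [Figure: Charles capture of the first request. Keep-Alive is not in effect, which is expected]

    The first request has to go through the TCP three-way handshake to establish the connection, so this is fine.

  2. Second request

    [Figure: Charles capture of the second request. Keep-Alive is still not in effect, which is not expected]

    If keep-alive had taken effect, Charles would not report Kept-Alive as No, and the second request would not need a new handshake before exchanging data. The first request took 201 ms and the second 256 ms, even though the two were issued only a short time apart, so the connection was clearly not reused.

So although the request header included Keep-Alive, the connection was still not reused.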

II. Locating the Problem

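    Stepping through the OkHttp request code showed that the lookup for a reusable connection in the ConnectionPool, performed by OkHttp's StreamAllocation class, always failed its eligibility check, so a new connection was created for every request. Setting Connection:Keep-Alive on the request header cannot change that, which is why connections were never reused.
    In OkHttp, both synchronous and asynchronous requests are issued from RealCall, as follows:

1. Issuing an OkHttp request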

Call call = httpClient.newCall(request);
Response res = call.execute();

2. RealCall.execute():

@Override public Response execute() throws IOException {
  synchronized (this) {
    if (executed) throw new IllegalStateException("Already Executed");
    executed = true;
  }
  transmitter.timeoutEnter();
  transmitter.callStart();
  try {
    client.dispatcher().executed(this);
    return getResponseWithInterceptorChain();
  } finally {
    client.dispatcher().finished(this);
  }
}

3. getResponseWithInterceptorChain(): runs OkHttp's interceptor chain to perform the network request:

  Response getResponseWithInterceptorChain() throws IOException {
    // Build a full stack of interceptors.
    List<Interceptor> interceptors = new ArrayList<>();
    interceptors.addAll(client.interceptors());
    interceptors.add(new RetryAndFollowUpInterceptor(client));
    interceptors.add(new BridgeInterceptor(client.cookieJar()));
    interceptors.add(new CacheInterceptor(client.internalCache()));
    interceptors.add(new ConnectInterceptor(client));
    if (!forWebSocket) {
      interceptors.addAll(client.networkInterceptors());
    }
    interceptors.add(new CallServerInterceptor(forWebSocket));

    Interceptor.Chain chain = new RealInterceptorChain(interceptors, transmitter, null, 0,
        originalRequest, this, client.connectTimeoutMillis(),
        client.readTimeoutMillis(), client.writeTimeoutMillis());

    boolean calledNoMoreExchanges = false;
    try {
      Response response = chain.proceed(originalRequest);
      if (transmitter.isCanceled()) {
        closeQuietly(response);
        throw new IOException("Canceled");
      }
      return response;
    } catch (IOException e) {
      calledNoMoreExchanges = true;
      throw transmitter.noMoreExchanges(e);
    } finally {
      if (!calledNoMoreExchanges) {
        transmitter.noMoreExchanges(null);
      }
    }
  }

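The role of each interceptor is listed here; see any OkHttp source-code walkthrough for the details:

Interceptor                  Role
RetryAndFollowUpInterceptor  Retry interceptor. Handles retries and redirects; for example, if an exception occurs during the request, the request is retried.
BridgeInterceptor            Bridge interceptor. Bridges the application and network layers: converts application-level request data into its network-level form, and converts the network-level response back into its application-level form.
CacheInterceptor             Cache interceptor. Reads and updates the cache; a custom cache can be configured.
ConnectInterceptor           Connection interceptor. Obtains a connection internally.
CallServerInterceptor        Call-server interceptor. The last interceptor in the chain; sends the data to the server and reads the response.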

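Since this is a connection problem, go straight to ConnectInterceptor. In OkHttp 3.x releases before 3.14 it obtains its connection through StreamAllocation#newStream(), which is the code path traced below (the RealCall snippets above use Transmitter, which replaced StreamAllocation in 3.14):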

4. StreamAllocation#newStream():

// First look up a RealConnection in the ConnectionPool
RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
    writeTimeout, pingIntervalMillis, connectionRetryEnabled, doExtensiveHealthChecks);
HttpCodec resultCodec = resultConnection.newCodec(client, chain, this);
synchronized (connectionPool) {
  codec = resultCodec;
  return resultCodec;
}

5. StreamAllocation#findHealthyConnection():

  private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
      int writeTimeout, int pingIntervalMillis, boolean connectionRetryEnabled,
      boolean doExtensiveHealthChecks) throws IOException {
    while (true) {
      // Look up a candidate connection
      RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
          pingIntervalMillis, connectionRetryEnabled);

      // If this is a brand new connection, we can skip the extensive health checks.
      synchronized (connectionPool) {
        if (candidate.successCount == 0) {
          return candidate;
        }
      }
      if (!candidate.isHealthy(doExtensiveHealthChecks)) {
        noNewStreams();
        continue;
      }
      return candidate;
    }
  }

6. StreamAllocation#findConnection():

  private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
      int pingIntervalMillis, boolean connectionRetryEnabled) throws IOException {
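      // As the name suggests, this flag records whether a RealConnection already stored in the
      // connection pool was found. Following its assignments is key to understanding the logic.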
    boolean foundPooledConnection = false; 
    RealConnection result = null;
    Route selectedRoute = null;
    Connection releasedConnection;
    Socket toClose;
    synchronized (connectionPool) {
      releasedConnection = this.connection;
      toClose = releaseIfNoNewStreams();
      if (this.connection != null) {
        result = this.connection;
        releasedConnection = null;
      }
      ...
      if (result == null) {
        //(1) Look for an existing connection in the pool. The code below then checks whether
        // connection is null, so get() must assign it: the third argument passes this
        // StreamAllocation out, and the connection field is set through that reference.
        Internal.instance.get(connectionPool, address, this, null);
        if (connection != null) {
          foundPooledConnection = true; // a connection cached in the pool was found
          result = connection;
        } else {
          selectedRoute = route;
        }
      }
    }
    closeQuietly(toClose);
    if (releasedConnection != null) {
      eventListener.connectionReleased(call, releasedConnection);
    }
    if (foundPooledConnection) { 
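    // When wrapping OkHttp, this callback can be used to report that the request
    // reused a connection, which is useful for monitoring network requests.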
      eventListener.connectionAcquired(call, result);
    }
    // If a connection was found, result has been assigned the connection taken from the
    // ConnectionPool and is returned directly.
    if (result != null) {
      return result;
    }
    ...
  }

(1) The implementation of Internal.instance.get(connectionPool, address, this, null):
The instance field of the Internal class is public. Searching for where it is assigned shows that it is initialized in the static block of OkHttpClient:
okhttp3.OkHttpClient:

public class OkHttpClient implements Cloneable, Call.Factory, WebSocket.Factory {
  static {
    Internal.instance = new Internal() {
       // get() simply delegates to ConnectionPool#get()
      @Override public RealConnection get(ConnectionPool pool, Address address,
          StreamAllocation streamAllocation, Route route) {
        return pool.get(address, streamAllocation, route);
      }

      @Override public boolean equalsNonHost(Address a, Address b) {
        return a.equalsNonHost(b);
      }

      @Override public void put(ConnectionPool pool, RealConnection connection) {
        pool.put(connection);
      }

      @Override public RouteDatabase routeDatabase(ConnectionPool connectionPool) {
        return connectionPool.routeDatabase;
      }

      @Override public int code(Response.Builder responseBuilder) {
        return responseBuilder.code;
      }

      @Override public boolean isInvalidHttpUrlHost(IllegalArgumentException e) {
        return e.getMessage().startsWith(HttpUrl.Builder.INVALID_HOST);
      }

      @Override public StreamAllocation streamAllocation(Call call) {
        return ((RealCall) call).streamAllocation();
      }

      @Override public Call newWebSocketCall(OkHttpClient client, Request originalRequest) {
        return RealCall.newRealCall(client, originalRequest, true);
      }
    };
  }
}

Next, look at ConnectionPool#get():

  @Nullable RealConnection get(Address address, StreamAllocation streamAllocation, Route route) {
    assert (Thread.holdsLock(this));
    for (RealConnection connection : connections) {
      // A connection is returned only if RealConnection#isEligible() returns true
      if (connection.isEligible(address, route)) {
        streamAllocation.acquire(connection, true);
        return connection;
      }
    }
    return null;
  }

7. Next, RealConnection#isEligible():

  public boolean isEligible(Address address, @Nullable Route route) {
    // If this connection is not accepting new streams, we're done.
    if (allocations.size() >= allocationLimit || noNewStreams) return false;

    //(1) The check below is the key one
    if (!Internal.instance.equalsNonHost(this.route.address(), address)) return false;

    // If the host exactly matches, we're done: this connection can carry the address.
    if (address.url().host().equals(this.route().address().url().host())) {
      return true; // This connection is a perfect match.
    }
    // 1. This connection must be HTTP/2.
    if (http2Connection == null) return false;
    return true; // The caller's address can be carried by this connection.
  }

Position (1) is Internal.instance.equalsNonHost(); as shown earlier, it is implemented in OkHttpClient:

@Override public boolean equalsNonHost(Address a, Address b) {
      return a.equalsNonHost(b);
 }

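8. The key code and the fix: Address#equalsNonHost():

  // For a connection to be reused, every equals() check below must return true. Check them
  // one by one to find the failing one, then override equals() in that class.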
  boolean equalsNonHost(Address that) {
    return this.dns.equals(that.dns) //false
        && this.proxyAuthenticator.equals(that.proxyAuthenticator)
        && this.protocols.equals(that.protocols)
        && this.connectionSpecs.equals(that.connectionSpecs)
        && this.proxySelector.equals(that.proxySelector)
        && equal(this.proxy, that.proxy)
        && equal(this.sslSocketFactory, that.sslSocketFactory) //false
        && equal(this.hostnameVerifier, that.hostnameVerifier)
        && equal(this.certificatePinner, that.certificatePinner)
        && this.url().port() == that.url().port();
  }

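Looking at each comparison in turn: the first five call equals() directly on the implementation object, so stepping through in a debugger shows which equals() check fails. If a custom implementation class does not override equals(), Object#equals() is used, which compares references. dns, proxySelector, sslSocketFactory, hostnameVerifier and certificatePinner are the ones likely to have been replaced with custom implementations.
In my case dns and sslSocketFactory were customized without overriding equals(), so equalsNonHost() failed.

In practice, debug your own project to see which implementation class lacks an equals() override and makes equalsNonHost() return false, then override equals() in that custom class.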
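
The "check them one by one" step can also be done without a debugger. The following is a minimal sketch using only the JDK: AddressDiff, snapshot() and IdentityOnlyDns are hypothetical names invented for illustration and are not part of OkHttp. It compares two sets of named components with equals() and reports the names that fail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical debugging helper; not part of OkHttp. */
final class AddressDiff {
    private AddressDiff() {}

    /**
     * Returns the names of the components whose equals() check fails.
     * Both maps are expected to hold the same keys, e.g. "dns" or "sslSocketFactory".
     */
    static List<String> mismatches(Map<String, Object> first, Map<String, Object> second) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Object> entry : first.entrySet()) {
            Object other = second.get(entry.getKey());
            // Without an equals() override this is a plain reference comparison.
            if (!Objects.equals(entry.getValue(), other)) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    /** Stand-in for a custom Dns implementation that does not override equals(). */
    static final class IdentityOnlyDns {}

    /** Collects the components of one configuration under stable names. */
    static Map<String, Object> snapshot(Object dns, List<String> protocols) {
        Map<String, Object> components = new LinkedHashMap<>();
        components.put("dns", dns);
        components.put("protocols", protocols);
        return components;
    }
}
```

In a real project the map values would presumably come from the accessors of the clients being compared, such as dns() and sslSocketFactory() on OkHttpClient; the sketch only shows the comparison itself.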

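Putting the source analysis together gives the following UML sequence diagram of the call flow:

[Figure: UML sequence diagram of ConnectInterceptor]

In my case the customized dns and sslSocketFactory failed the equals() check. The fix for dns: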

public class CustomDns implements Dns {
    // DNS interceptors
    List<DnsInterceptor> dnsInterceptorList = new ArrayList<>();

    public void addDnsInterceptor(DnsInterceptor dnsInterceptor) {
       ...
    }

    @Override
    public List<InetAddress> lookup(String hostname) throws UnknownHostException {
        ...
    }
    
    // Override equals() and hashCode()
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        CustomDns that = (CustomDns) o;
        return Objects.equals(dnsInterceptorList, that.dnsInterceptorList);
    }
    
    @Override
    public int hashCode() {
        return Objects.hash(dnsInterceptorList);
    }
}
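
One caveat about an equals() that delegates to a list: List#equals() compares element by element, so the elements need value-based equality too. The sketch below is a simplified, JDK-only stand-in for CustomDns; ListBackedDns, IdentityInterceptor and ValueInterceptor are hypothetical names used only to illustrate the point.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the article's CustomDns; only the equality logic is modeled. */
final class ListBackedDns {
    private final List<Object> interceptors = new ArrayList<>();

    ListBackedDns add(Object interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        // List.equals compares element by element using each element's equals().
        return interceptors.equals(((ListBackedDns) o).interceptors);
    }

    @Override public int hashCode() {
        return interceptors.hashCode();
    }
}

/** Hypothetical interceptor without equals(): compared by reference only. */
final class IdentityInterceptor {}

/** Hypothetical interceptor with value-based equality on its configuration. */
final class ValueInterceptor {
    private final String rule;

    ValueInterceptor(String rule) {
        this.rule = rule;
    }

    @Override public boolean equals(Object o) {
        return o instanceof ValueInterceptor && rule.equals(((ValueInterceptor) o).rule);
    }

    @Override public int hashCode() {
        return rule.hashCode();
    }
}
```

Two ListBackedDns objects holding separately created IdentityInterceptor instances are still unequal, while two holding ValueInterceptor("a") are equal. If the interceptor list is empty or the same interceptor instances are shared, the list-based equals() is sufficient on its own.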

For sslSocketFactory, OkHttpClient's default implementation is:
okhttp3.OkHttpClient#OkHttpClient(okhttp3.OkHttpClient.Builder):

OkHttpClient(Builder builder) {
  ...
  if (builder.sslSocketFactory != null || !isTLS) {
    this.sslSocketFactory = builder.sslSocketFactory;
    this.certificateChainCleaner = builder.certificateChainCleaner;
  } else {
    X509TrustManager trustManager = Util.platformTrustManager();
    // Reached when no custom socket factory is passed to OkHttpClient.Builder
    this.sslSocketFactory = newSslSocketFactory(trustManager);
    this.certificateChainCleaner = CertificateChainCleaner.get(trustManager);
  }
  if (sslSocketFactory != null) {
    Platform.get().configureSslSocketFactory(sslSocketFactory);
  }
  ...
}

Next, okhttp3.OkHttpClient#newSslSocketFactory() is implemented as follows:

private static SSLSocketFactory newSslSocketFactory(X509TrustManager trustManager) {
  try {
    SSLContext sslContext = Platform.get().getSSLContext();
    sslContext.init(null, new TrustManager[] { trustManager }, null);
    //(1)
    return sslContext.getSocketFactory();
  } catch (GeneralSecurityException e) {
    throw new AssertionError("No System TLS", e); // The system has no TLS. Just give up.
  }
}

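(1) Each call to getSocketFactory() here returns a newly created instance by default, so equals() returns false. The SSLSocketFactory is what OkHttp uses to create its socket connections, and it can safely be a singleton, so making it one fixes the problem.
The change is as follows: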

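public class HttpsHelper {
    // TRUST_MANAGER is a javax.net.ssl.X509TrustManager defined elsewhere in the project.
    // volatile is required for double-checked locking to be safe.
    private static volatile SSLSocketFactory sSocketFactory;

    // Usage, from the code in this class that configures the OkHttpClient.Builder:
    //   clientBuilder.sslSocketFactory(getSSLSocketFactory(), TRUST_MANAGER);

    // Keep a single SSLSocketFactory instance so that equalsNonHost() sees the same object.
    private static SSLSocketFactory getSSLSocketFactory() {
        if (sSocketFactory == null) {
            synchronized (HttpsHelper.class) {
                if (sSocketFactory == null) {
                    SSLContext sslContext = Platform.get().getSSLContext();
                    try {
                        sslContext.init(null, new TrustManager[]{TRUST_MANAGER}, new SecureRandom());
                        sSocketFactory = sslContext.getSocketFactory();
                    } catch (KeyManagementException e) {
                        // Fail loudly instead of silently returning null.
                        throw new IllegalStateException("Failed to initialize SSLContext", e);
                    }
                }
            }
        }
        return sSocketFactory;
    }
}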

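After the change, non-first requests no longer performed a new SSL handshake, and the latency improvement was clear: the calls dropped from roughly 230~240 ms to under 100 ms.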
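
As an alternative to the double-checked locking in HttpsHelper, the initialization-on-demand holder idiom gives the same single-instance guarantee with less code. The sketch below is an assumption-laden variant, not the author's code: SharedSslSocketFactory is a hypothetical name, and it uses the JDK's SSLContext.getInstance("TLS") with the default trust managers instead of Platform.get().getSSLContext() and the project's TRUST_MANAGER.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper; uses the JDK default trust managers, unlike the article's TRUST_MANAGER. */
final class SharedSslSocketFactory {
    private SharedSslSocketFactory() {}

    /** The JVM initializes Holder lazily and exactly once, so no explicit locking is needed. */
    private static final class Holder {
        static final SSLSocketFactory INSTANCE = create();
    }

    private static SSLSocketFactory create() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // null arguments select the default key managers, trust managers and SecureRandom.
            context.init(null, null, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }

    /** Always returns the same instance, so reference-based equals() succeeds. */
    static SSLSocketFactory get() {
        return Holder.INSTANCE;
    }
}
```

Because every caller receives the same object, the reference comparison on sslSocketFactory in equalsNonHost() passes without any equals() override.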

III. Summary

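(1) HTTP has supported persistent connections by default since version 1.1, and requests to the same host issued within a short time reuse the kept-alive connection. However, adding Connection:Keep-Alive to the request header does not by itself keep the connection alive; the network layer has to implement it. In OkHttp, connections are established by the ConnectInterceptor and are ultimately backed by sockets, and on top of them sits a cache called ConnectionPool that maintains a queue of socket connections. Its basic structure is: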

public final class ConnectionPool {
  private static final Executor executor = new ThreadPoolExecutor(0 ,
      Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
      new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp ConnectionPool", true));
  private final int maxIdleConnections;
  private final long keepAliveDurationNs;
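  // Connections are cached in a deque
  private final Deque<RealConnection> connections = new ArrayDeque<>();
  // Defaults: up to 5 idle connections, each kept alive for 5 minutes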
  public ConnectionPool() {
    this(5, 5, TimeUnit.MINUTES);
  }
  public ConnectionPool(int maxIdleConnections, long keepAliveDuration, TimeUnit timeUnit) {
   this.maxIdleConnections = maxIdleConnections;
   this.keepAliveDurationNs = timeUnit.toNanos(keepAliveDuration);
   // Put a floor on the keep alive duration, otherwise cleanup will spin loop.
   if (keepAliveDuration <= 0) {
    throw new IllegalArgumentException("keepAliveDuration <= 0: " + keepAliveDuration);
  }
}
}
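
To make the roles of maxIdleConnections and keepAliveDuration concrete, here is a toy model of an idle-connection pool. It is a deliberately simplified sketch, not OkHttp's implementation: ToyConnectionPool and its methods are hypothetical, connections are represented only by a string key, and the current time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Toy model of an idle-connection pool; names and structure are illustrative, not OkHttp's. */
final class ToyConnectionPool {
    /** One pooled connection: the key it serves and when it became idle. */
    static final class Entry {
        final String key;
        final long idleSinceNanos;

        Entry(String key, long idleSinceNanos) {
            this.key = key;
            this.idleSinceNanos = idleSinceNanos;
        }
    }

    private final int maxIdleConnections;
    private final long keepAliveNanos;
    private final Deque<Entry> idle = new ArrayDeque<>();

    ToyConnectionPool(int maxIdleConnections, long keepAliveNanos) {
        if (keepAliveNanos <= 0) throw new IllegalArgumentException("keepAliveNanos <= 0");
        this.maxIdleConnections = maxIdleConnections;
        this.keepAliveNanos = keepAliveNanos;
    }

    /** Returns an idle connection for the key to the pool. */
    synchronized void put(String key, long nowNanos) {
        idle.addLast(new Entry(key, nowNanos));
        evict(nowNanos);
    }

    /** Removes and returns a live idle connection for the key, or null if a new one is needed. */
    synchronized Entry acquire(String key, long nowNanos) {
        evict(nowNanos);
        for (Iterator<Entry> it = idle.iterator(); it.hasNext(); ) {
            Entry entry = it.next();
            if (entry.key.equals(key)) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    /** Drops entries idle longer than keep-alive, then the earliest-added ones above the limit. */
    private void evict(long nowNanos) {
        idle.removeIf(entry -> nowNanos - entry.idleSinceNanos > keepAliveNanos);
        while (idle.size() > maxIdleConnections) {
            idle.removeFirst();
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

In this model the key plays the role that isEligible() and equalsNonHost() play in OkHttp: if two requests produce keys that never compare equal, acquire() always returns null and every request pays for a new connection.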

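(2) For network-layer problems, rely on packet captures. Not everything is the server's fault; many issues can be optimized on the client. When enabling connection reuse, coordinate with the server side and check whether it adds load on the APIs. In general it should not increase server resource usage, since it is a capability of HTTP itself; in my case it removes frequent SSL handshakes, which should reduce server CPU usage to some extent.
(3) Get familiar with the source code of the open-source frameworks you use, and read it with a specific question in mind. Purposeful reading works much better; when I used to read aimlessly, it was actually hard to follow.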

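Finally, some suggestions for optimizing network requests:

  • 1. Make as few API calls as possible: skip calls that are not needed and merge those that can be merged;
  • 2. Connect directly by IP and enable Keep-Alive to reuse existing connections, saving the cost of re-establishing them;
  • 3. Split business scenarios horizontally and issue independent calls in parallel;
  • 4. Use pagination (batching) for APIs that return large amounts of data;
  • 5. Avoid frequent (looped) serialization; outside of network caching, prefer Parcelable to Serializable;
  • 6. Rethink the product logic: are that many calls really needed, and must they run sequentially? Can the interaction be simplified?
  • 7. When changing or extending features, watch the impact on performance.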
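
Suggestion 3 (issuing independent calls in parallel) can be sketched with the JDK's CompletableFuture. ParallelFetch and fetchBoth() are hypothetical names, and the two suppliers stand in for real network calls; this is one possible shape under those assumptions, not a recommendation of a specific threading setup.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: runs two independent calls concurrently and merges the results. */
final class ParallelFetch {
    private ParallelFetch() {}

    static String fetchBoth(Supplier<String> first, Supplier<String> second) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(first, executor);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(second, executor);
            // join() blocks until both results are ready; total time is roughly the slower call.
            return a.thenCombine(b, (x, y) -> x + "," + y).join();
        } finally {
            executor.shutdown();
        }
    }
}
```

For example, ParallelFetch.fetchBoth(() -> "list", () -> "banner") returns "list,banner". With sequential calls the total latency is the sum of both calls; here it is bounded by the slower one.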
