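I. Reproducing the problem
While optimizing a slow list screen, where the list data is assembled from a series of API calls, I first used Android Studio's Profiler to find the slow calls and optimized the code around them. That made loading nearly 50% faster, but the experience still fell short. Comparing the same APIs on other clients showed that our non-first requests were far slower than theirs.
So I turned to a packet-capture tool together with the network library's logs. Although every request carried the header "Connection": "Keep-Alive", Charles showed that the connection was not actually kept alive: each request established a new connection, which means another SSL handshake. That is expensive and made overall request latency quite high.
The Charles captures are shown below:
- First request
The first request goes through TCP's three-way handshake to establish the connection, which is expected.
- Second request
If keep-alive had taken effect, Charles would not show Kept-Alive as No, and the second request would not need another handshake before exchanging data. The first request took 201 ms and the second 256 ms, even though the two were issued only a short time apart, so the connection was clearly not reused.
So adding Keep-Alive to the request headers did not by itself lead to connection reuse.
II. Locating the problem
Stepping through OkHttp's request code showed that the condition StreamAllocation uses to find a reusable connection in the ConnectionPool always evaluated to false, so a new connection was created every time. The Connection: Keep-Alive request header has no effect in that situation, and connections were never reused.
In OkHttp, both synchronous and asynchronous requests are issued through RealCall, as follows:
1. Issuing an OkHttp request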
Call call = httpClient.newCall(request);
Response res = call.execute();
2. RealCall.execute():
@Override public Response execute() throws IOException {
synchronized (this) {
if (executed) throw new IllegalStateException("Already Executed");
executed = true;
}
transmitter.timeoutEnter();
transmitter.callStart();
try {
client.dispatcher().executed(this);
return getResponseWithInterceptorChain();
} finally {
client.dispatcher().finished(this);
}
}
3. getResponseWithInterceptorChain(): invokes OkHttp's interceptors to perform the network request:
Response getResponseWithInterceptorChain() throws IOException {
// Build a full stack of interceptors.
List<Interceptor> interceptors = new ArrayList<>();
interceptors.addAll(client.interceptors());
interceptors.add(new RetryAndFollowUpInterceptor(client));
interceptors.add(new BridgeInterceptor(client.cookieJar()));
interceptors.add(new CacheInterceptor(client.internalCache()));
interceptors.add(new ConnectInterceptor(client));
if (!forWebSocket) {
interceptors.addAll(client.networkInterceptors());
}
interceptors.add(new CallServerInterceptor(forWebSocket));
Interceptor.Chain chain = new RealInterceptorChain(interceptors, transmitter, null, 0,
originalRequest, this, client.connectTimeoutMillis(),
client.readTimeoutMillis(), client.writeTimeoutMillis());
boolean calledNoMoreExchanges = false;
try {
Response response = chain.proceed(originalRequest);
if (transmitter.isCanceled()) {
closeQuietly(response);
throw new IOException("Canceled");
}
return response;
} catch (IOException e) {
calledNoMoreExchanges = true;
throw transmitter.noMoreExchanges(e);
} finally {
if (!calledNoMoreExchanges) {
transmitter.noMoreExchanges(null);
}
}
}
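The role of each interceptor is listed here; for details, see any article that walks through the OkHttp source:
Interceptor | Role |
---|---|
RetryAndFollowUpInterceptor | Retry interceptor. Handles retries and redirects; for example, if an exception occurs during the request, the request is retried. |
BridgeInterceptor | Bridge interceptor. Bridges application-level and network-level data: on the way out it converts the application request into a network request, and on the way back it converts the network response into an application response. |
CacheInterceptor | Cache interceptor. Reads and updates the cache; a custom cache can be configured. |
ConnectInterceptor | Connection interceptor. Obtains a connection internally. |
CallServerInterceptor | Call-server interceptor. The last interceptor in the chain; sends data to the server and reads the response. |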
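To make the chain mechanics concrete, here is a minimal sketch of the same chain-of-responsibility idea using only the JDK. MiniInterceptor, MiniChain and MiniChainDemo are hypothetical names invented for this illustration and are not OkHttp classes; requests and responses are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an interceptor: it may act before and after calling the next one. */
interface MiniInterceptor {
    String intercept(MiniChain chain);
}

/** Hypothetical chain that hands the request to the interceptors in list order. */
final class MiniChain {
    private final List<MiniInterceptor> interceptors;
    private final int index;
    final String request;

    MiniChain(List<MiniInterceptor> interceptors, int index, String request) {
        this.interceptors = interceptors;
        this.index = index;
        this.request = request;
    }

    /** Passes the (possibly modified) request to the interceptor at the current index. */
    String proceed(String request) {
        if (index >= interceptors.size()) {
            throw new IllegalStateException("no interceptor produced a response");
        }
        MiniChain next = new MiniChain(interceptors, index + 1, request);
        return interceptors.get(index).intercept(next);
    }
}

final class MiniChainDemo {
    static String run(String request) {
        List<MiniInterceptor> interceptors = new ArrayList<>();
        // "bridge": decorates the request on the way in and the response on the way out.
        interceptors.add(chain -> chain.proceed(chain.request + "+headers") + "+decoded");
        // "connect": would obtain a connection here, then passes the request on unchanged.
        interceptors.add(chain -> chain.proceed(chain.request));
        // "call server": last in the list, so it answers instead of calling proceed().
        interceptors.add(chain -> "response(" + chain.request + ")");
        return new MiniChain(interceptors, 0, request).proceed(request);
    }
}
```

Calling MiniChainDemo.run("GET /list") returns "response(GET /list+headers)+decoded": each interceptor sees the request on the way down and the response on the way back up, which is why the connection step sits just before the final call-server step.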
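Since this is a connection problem, start with ConnectInterceptor. In the OkHttp version traced here it obtains its connection through StreamAllocation#newStream(). (The RealCall snippets above show the newer Transmitter API, which replaced StreamAllocation in OkHttp 3.14; the pool lookup follows the same idea.)
4. StreamAllocation#newStream():
// Look up a RealConnection from the ConnectionPool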
RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
writeTimeout, pingIntervalMillis, connectionRetryEnabled, doExtensiveHealthChecks);
HttpCodec resultCodec = resultConnection.newCodec(client, chain, this);
synchronized (connectionPool) {
codec = resultCodec;
return resultCodec;
}
5. StreamAllocation#findHealthyConnection():
private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
int writeTimeout, int pingIntervalMillis, boolean connectionRetryEnabled,
boolean doExtensiveHealthChecks) throws IOException {
while (true) {
// Find a connection
RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
pingIntervalMillis, connectionRetryEnabled);
// If this is a brand new connection, we can skip the extensive health checks.
synchronized (connectionPool) {
if (candidate.successCount == 0) {
return candidate;
}
}
if (!candidate.isHealthy(doExtensiveHealthChecks)) {
noNewStreams();
continue;
}
return candidate;
}
}
6. StreamAllocation#findConnection():
private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
int pingIntervalMillis, boolean connectionRetryEnabled) throws IOException {
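// As the name suggests, this flag records whether a RealConnection already stored in the connection pool was found.
// Following where it is assigned is key to understanding the logic.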
boolean foundPooledConnection = false;
RealConnection result = null;
Route selectedRoute = null;
Connection releasedConnection;
Socket toClose;
synchronized (connectionPool) {
releasedConnection = this.connection;
toClose = releaseIfNoNewStreams();
if (this.connection != null) {
result = this.connection;
releasedConnection = null;
}
...
if (result == null) {
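//(1) Look for an existing connection in the pool. The code below then checks whether connection is null,
// so get() must assign it: the StreamAllocation's this reference is passed as the third argument,
// and get() sets the connection field through that reference.
Internal.instance.get(connectionPool, address, this, null);
if (connection != null) {
foundPooledConnection = true; // A connection cached in the pool was found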
result = connection;
} else {
selectedRoute = route;
}
}
}
closeQuietly(toClose);
if (releasedConnection != null) {
eventListener.connectionReleased(call, releasedConnection);
}
if (foundPooledConnection) {
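// When wrapping OkHttp, this callback can be used to report that the request reused a connection,
// which is useful for monitoring network requests
eventListener.connectionAcquired(call, result);
}
// If a connection was found, result holds the connection obtained from the ConnectionPool
// and is returned directly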
if (result != null) {
return result;
}
...
}
(1) Implementation of Internal.instance.get(connectionPool, address, this, null):
The instance field of the Internal class is public. Searching for where it is assigned shows that it is initialized in the static block of OkHttpClient:
okhttp3.OkHttpClient:
public class OkHttpClient implements Cloneable, Call.Factory, WebSocket.Factory {
static {
Internal.instance = new Internal() {
// get() simply delegates to ConnectionPool.get()
@Override public RealConnection get(ConnectionPool pool, Address address,
StreamAllocation streamAllocation, Route route) {
return pool.get(address, streamAllocation, route);
}
@Override public boolean equalsNonHost(Address a, Address b) {
return a.equalsNonHost(b);
}
@Override public void put(ConnectionPool pool, RealConnection connection) {
pool.put(connection);
}
@Override public RouteDatabase routeDatabase(ConnectionPool connectionPool) {
return connectionPool.routeDatabase;
}
@Override public int code(Response.Builder responseBuilder) {
return responseBuilder.code;
}
@Override public boolean isInvalidHttpUrlHost(IllegalArgumentException e) {
return e.getMessage().startsWith(HttpUrl.Builder.INVALID_HOST);
}
@Override public StreamAllocation streamAllocation(Call call) {
return ((RealCall) call).streamAllocation();
}
@Override public Call newWebSocketCall(OkHttpClient client, Request originalRequest) {
return RealCall.newRealCall(client, originalRequest, true);
}
};
}
}
Now look at ConnectionPool#get():
@Nullable RealConnection get(Address address, StreamAllocation streamAllocation, Route route) {
assert (Thread.holdsLock(this));
for (RealConnection connection : connections) {
// A connection is returned only when RealConnection.isEligible() returns true
if (connection.isEligible(address, route)) {
streamAllocation.acquire(connection, true);
return connection;
}
}
return null;
}
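The effect of this lookup can be illustrated with a small JDK-only model. This is a sketch under stated assumptions, not OkHttp code: MiniAddress, MiniPool, PlainResolver and ValueResolver are hypothetical names, and the "address" is reduced to a host plus one pluggable component.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component with no equals() override: instances are compared by reference. */
final class PlainResolver { }

/** Hypothetical component whose equality depends on its configuration value. */
final class ValueResolver {
    private final String config;

    ValueResolver(String config) { this.config = config; }

    @Override public boolean equals(Object o) {
        return o instanceof ValueResolver && config.equals(((ValueResolver) o).config);
    }

    @Override public int hashCode() { return config.hashCode(); }
}

/** Hypothetical, much simplified address: a host plus one pluggable component. */
final class MiniAddress {
    final String host;
    final Object resolver;

    MiniAddress(String host, Object resolver) {
        this.host = host;
        this.resolver = resolver;
    }

    boolean matches(MiniAddress that) {
        return resolver.equals(that.resolver) && host.equals(that.host);
    }
}

/** Hypothetical pool: reuses an entry only when an existing address matches the requested one. */
final class MiniPool {
    private final List<MiniAddress> pooled = new ArrayList<>();
    int newConnections = 0;

    /** Returns true when an existing entry was reused, false when a new one had to be created. */
    boolean acquire(MiniAddress address) {
        for (MiniAddress candidate : pooled) {
            if (candidate.matches(address)) return true;
        }
        pooled.add(address);
        newConnections++;
        return false;
    }
}
```

With two separately created PlainResolver instances the second acquire() misses and a second "connection" is created, even though the host is identical. With two ValueResolver instances built from the same configuration the second acquire() is a hit.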
7. Next, RealConnection#isEligible():
public boolean isEligible(Address address, @Nullable Route route) {
// Not eligible if the connection has no free allocation slots or accepts no new streams
if (allocations.size() >= allocationLimit || noNewStreams) return false;
//(1) The key check is here
if (!Internal.instance.equalsNonHost(this.route.address(), address)) return false;
// If the host exactly matches, we're done: this connection can carry the address.
if (address.url().host().equals(this.route().address().url().host())) {
return true; // This connection is a perfect match.
}
// 1. This connection must be HTTP/2.
if (http2Connection == null) return false;
return true; // The caller's address can be carried by this connection.
}
Position (1) calls Internal.instance.equalsNonHost(), which, as shown earlier, is implemented in OkHttpClient:
@Override public boolean equalsNonHost(Address a, Address b) {
return a.equalsNonHost(b);
}
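8. The key location and the fix: Address#equalsNonHost():
// For a connection to be reused, equals() must return true for every object below. Check them one by one to find the culprit,
// then override its equals() method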
boolean equalsNonHost(Address that) {
return this.dns.equals(that.dns) //false
&& this.proxyAuthenticator.equals(that.proxyAuthenticator)
&& this.protocols.equals(that.protocols)
&& this.connectionSpecs.equals(that.connectionSpecs)
&& this.proxySelector.equals(that.proxySelector)
&& equal(this.proxy, that.proxy)
&& equal(this.sslSocketFactory, that.sslSocketFactory) //false
&& equal(this.hostnameVerifier, that.hostnameVerifier)
&& equal(this.certificatePinner, that.certificatePinner)
&& this.url().port() == that.url().port();
}
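Looking at each field's equals() in turn: the first five call equals() on the implementing class directly, so it is straightforward to debug which comparison fails. If an implementing class does not define equals(), Object.equals() is used, which compares references. dns, proxySelector, sslSocketFactory, hostnameVerifier and certificatePinner are the ones most likely to have been replaced with custom implementations.
In my case dns and sslSocketFactory had been customized without overriding equals(), so equalsNonHost() failed:
In your own project, debug to find which implementation lacks an equals() override and makes equalsNonHost() return false, then override equals() in that custom class.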
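One way to check the components one by one is a small helper that compares two configuration snapshots and reports which entries fail equals(). The sketch below uses only the JDK; EqualsDiagnostics, mismatches() and snapshot() are hypothetical names, and plain Object instances stand in for a custom Dns and SSLSocketFactory that do not override equals(). In a real app the values would presumably come from the client's own getters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class EqualsDiagnostics {
    /** Returns the names of the components whose equals() check fails between two snapshots. */
    static List<String> mismatches(Map<String, Object> first, Map<String, Object> second) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Object> entry : first.entrySet()) {
            Object other = second.get(entry.getKey());
            if (!Objects.equals(entry.getValue(), other)) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    /** Builds a snapshot the way an app might rebuild its configuration for each request. */
    static Map<String, Object> snapshot() {
        Map<String, Object> components = new LinkedHashMap<>();
        // Stand-in for a custom Dns without equals(): a new instance every time.
        components.put("dns", new Object());
        // Lists compare by value, so this entry matches across snapshots.
        components.put("protocols", Arrays.asList("h2", "http/1.1"));
        // Stand-in for an SSLSocketFactory that is recreated every time.
        components.put("sslSocketFactory", new Object());
        components.put("port", 443);
        return components;
    }
}
```

Comparing two snapshots with EqualsDiagnostics.mismatches(snapshot(), snapshot()) yields the list [dns, sslSocketFactory], which mirrors the two fields marked false in equalsNonHost() above.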
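Putting the source analysis together gives the following UML call sequence diagram:
In my case the customized dns and sslSocketFactory failed the equals() check. The fix is as follows: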
public class CustomDns implements Dns {
// DNS interceptors
List<DnsInterceptor> dnsInterceptorList = new ArrayList<>();
public void addDnsInterceptor(DnsInterceptor dnsInterceptor) {
...
}
@Override
public List<InetAddress> lookup(String hostname) throws UnknownHostException {
...
}
// Override equals and hashCode
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
CustomDns that = (CustomDns) o;
return Objects.equals(dnsInterceptorList, that.dnsInterceptorList);
}
@Override
public int hashCode() {
return Objects.hash(dnsInterceptorList);
}
}
For sslSocketFactory, the default behavior of OkHttpClient is:
okhttp3.OkHttpClient#OkHttpClient(okhttp3.OkHttpClient.Builder):
OkHttpClient(Builder builder) {
...
if (builder.sslSocketFactory != null || !isTLS) {
this.sslSocketFactory = builder.sslSocketFactory;
this.certificateChainCleaner = builder.certificateChainCleaner;
} else {
X509TrustManager trustManager = Util.platformTrustManager();
// If no custom sslSocketFactory is passed to OkHttpClient.Builder, execution reaches this branch
this.sslSocketFactory = newSslSocketFactory(trustManager);
this.certificateChainCleaner = CertificateChainCleaner.get(trustManager);
}
if (sslSocketFactory != null) {
Platform.get().configureSslSocketFactory(sslSocketFactory);
}
...
}
okhttp3.OkHttpClient#newSslSocketFactory() is implemented as follows:
private static SSLSocketFactory newSslSocketFactory(X509TrustManager trustManager) {
try {
SSLContext sslContext = Platform.get().getSSLContext();
sslContext.init(null, new TrustManager[] { trustManager }, null);
//(1)
return sslContext.getSocketFactory();
} catch (GeneralSecurityException e) {
throw new AssertionError("No System TLS", e); // The system has no TLS. Just give up.
}
}
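(1) Each call to this method builds a fresh SSLContext, and the SSLSocketFactory it returns is a different object instance every time, so equals() returns false. The SSLSocketFactory is only used by OkHttp when establishing socket connections, so it can safely be a singleton; making it one solves the problem.
The change is as follows: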
public class HttpsHelper {
// volatile is needed for double-checked locking to be safe
private static volatile SSLSocketFactory sSocketFactory;
// TRUST_MANAGER: javax.net.ssl.X509TrustManager
// Usage when building the client:
// clientBuilder.sslSocketFactory(getSSLSocketFactory(), TRUST_MANAGER);
// Keep the SSLSocketFactory as a singleton
private static SSLSocketFactory getSSLSocketFactory() {
if (sSocketFactory == null) {
synchronized (HttpsHelper.class) {
if (sSocketFactory == null) {
SSLContext sslContext = Platform.get().getSSLContext();
try {
sslContext.init(null, new TrustManager[]{TRUST_MANAGER}, new SecureRandom());
sSocketFactory = sslContext.getSocketFactory();
} catch (KeyManagementException e) {
// Do not swallow the failure: a null factory would only surface later
throw new IllegalStateException("Failed to initialize SSLContext", e);
}
}
}
}
return sSocketFactory;
}
}
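As an alternative to double-checked locking, the same singleton can be sketched with the initialization-on-demand holder idiom, using only JDK classes. This is an illustrative variant, not the code used in the project: SharedSslSocketFactory is a hypothetical name, and it assumes the platform's default trust managers are acceptable, whereas the version above passes its own TRUST_MANAGER.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

final class SharedSslSocketFactory {
    private SharedSslSocketFactory() { }

    /** The JVM initializes this nested class lazily, once, and in a thread-safe way. */
    private static final class Holder {
        static final SSLSocketFactory INSTANCE = create();
    }

    private static SSLSocketFactory create() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // Assumption: null arguments select the default key managers, trust managers
            // and secure random; a real app may pass its own trust manager instead.
            context.init(null, null, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }

    /** Always returns the same instance, so reference-based equals() succeeds. */
    static SSLSocketFactory get() {
        return Holder.INSTANCE;
    }
}
```

Because get() always hands back the same object, any comparison that falls back to reference equality sees the two factories as equal, which is the property the connection pool lookup depends on.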
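After the change, non-first requests no longer performed a new SSL handshake, and the improvement in request time was clear: from roughly 230-240 ms down to under 100 ms.
III. Summary
(1) HTTP has supported persistent connections by default since version 1.1, and requests to the same host issued within a short time can reuse a kept-alive connection. However, adding Connection: Keep-Alive to the request headers does not by itself keep the connection alive; the networking layer has to implement it. In OkHttp, connections are obtained by the ConnectInterceptor and are ultimately backed by sockets, while a cache called ConnectionPool maintains a queue of those socket connections. Its basic structure is: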
public final class ConnectionPool {
private static final Executor executor = new ThreadPoolExecutor(0 ,
Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp ConnectionPool", true));
private final int maxIdleConnections;
private final long keepAliveDurationNs;
// Connections are cached in a deque
private final Deque<RealConnection> connections = new ArrayDeque<>();
// Defaults: at most 5 idle connections, kept alive for 5 minutes
public ConnectionPool() {
this(5, 5, TimeUnit.MINUTES);
}
public ConnectionPool(int maxIdleConnections, long keepAliveDuration, TimeUnit timeUnit) {
this.maxIdleConnections = maxIdleConnections;
this.keepAliveDurationNs = timeUnit.toNanos(keepAliveDuration);
// Put a floor on the keep alive duration, otherwise cleanup will spin loop.
if (keepAliveDuration <= 0) {
throw new IllegalArgumentException("keepAliveDuration <= 0: " + keepAliveDuration);
}
}
}
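How the two limits interact can be sketched with a simplified JDK-only model. IdlePoolSketch is a hypothetical class, not OkHttp's implementation: it tracks only the time at which each connection became idle, and it assumes markIdle() is called with non-decreasing timestamps so that the oldest entry is always at the head of the deque.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IdlePoolSketch {
    private final int maxIdle;
    private final long keepAliveNanos;
    // Time (in nanoseconds) at which each connection became idle, oldest first.
    private final Deque<Long> idleSince = new ArrayDeque<>();

    IdlePoolSketch(int maxIdle, long keepAliveNanos) {
        if (keepAliveNanos <= 0) {
            throw new IllegalArgumentException("keepAliveNanos <= 0: " + keepAliveNanos);
        }
        this.maxIdle = maxIdle;
        this.keepAliveNanos = keepAliveNanos;
    }

    /** Records that a connection became idle at the given time. */
    void markIdle(long nowNanos) {
        idleSince.addLast(nowNanos);
    }

    /**
     * Evicts the oldest idle entries while they have been idle for at least the keep-alive
     * duration or while there are more idle entries than allowed. Returns the evicted count.
     */
    int cleanup(long nowNanos) {
        int evicted = 0;
        while (!idleSince.isEmpty()
                && (nowNanos - idleSince.peekFirst() >= keepAliveNanos
                    || idleSince.size() > maxIdle)) {
            idleSince.removeFirst();
            evicted++;
        }
        return evicted;
    }

    int idleCount() {
        return idleSince.size();
    }
}
```

The practical consequence for the problem in this article: a pooled connection is only useful if a later request finds it eligible before it is evicted, so a failing equals() check leaves idle connections sitting in the pool until cleanup removes them.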
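(2) For network-layer problems, rely on packet captures. Not everything is a server-side issue; many problems can be optimized on the client as well. When enabling connection reuse, though, coordinate with the server side and check whether it causes load problems for the APIs. In general it does not increase server resource usage, since this is a capability of HTTP itself. In my case it actually reduces the number of SSL handshakes, which lowers server CPU usage to some extent.
(3) Get familiar with the source code of the open-source frameworks you use, and read with a specific question in mind. Purposeful reading works better; when I used to read aimlessly, the code was much harder to understand.
Finally, some suggestions for optimizing network requests:
- 1. Minimize API requests: skip calls you do not need and merge those that can be merged;
- 2. Connect directly by IP and enable Keep-Alive to reuse existing connections, reducing the time spent re-establishing connections;
- 3. Split business scenarios horizontally and issue independent requests in parallel;
- 4. Use pagination (batching) for APIs that return large amounts of data;
- 5. Avoid frequent (looped) serialization; outside network caching, prefer Parcelable to Serializable;
- 6. Question the product logic: does it really need that many requests, issued sequentially? Can the interaction be improved?
- 7. When changing or extending features, watch the impact on performance.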
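Suggestion 3 (issue independent requests in parallel) can be sketched with the JDK's CompletableFuture. ParallelRequestsSketch and loadBoth() are hypothetical names, and the two suppliers stand in for independent API calls; a real app would plug in its own request code and executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class ParallelRequestsSketch {
    /** Runs two independent tasks concurrently and combines their results. */
    static String loadBoth(Supplier<String> first, Supplier<String> second) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(first, executor);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(second, executor);
            // join() waits for both tasks; the total time is roughly that of the slower one.
            return a.thenCombine(b, (x, y) -> x + "|" + y).join();
        } finally {
            executor.shutdown();
        }
    }
}
```

For example, ParallelRequestsSketch.loadBoth(() -> "banner", () -> "items") returns "banner|items". Requests that depend on each other's results still have to run in sequence.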