What are HTTP Persistent Connections?
HTTP persistent connections, also called HTTP keep-alive or HTTP connection reuse, use a single TCP connection to send and receive multiple HTTP request/response pairs, instead of opening a new connection for every pair. Using persistent connections is very important for improving HTTP performance.
Persistent connections have several advantages:
Network friendly. Fewer TCP connections are set up and torn down, so there is less network traffic.
Reduced latency on subsequent requests, because the initial TCP handshake is avoided.
Long-lasting connections give TCP sufficient time to determine the congestion state of the network and to react appropriately.
The advantages are even more pronounced with HTTPS (HTTP over SSL/TLS). There, persistent connections also reduce the number of costly SSL/TLS handshakes needed to establish security associations, in addition to saving the initial TCP connection setup.
In HTTP/1.1, persistent connections are the default behavior of any connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server. However, the protocol provides means for a client and a server to signal the closing of a TCP connection.
What makes a connection reusable?
Since TCP is by nature a stream-based protocol, reusing an existing connection requires that HTTP have a way to indicate where the previous response ends and the next one begins. Thus, all messages on the connection MUST have a self-defined message length (i.e., one not defined by closure of the connection). Self-demarcation is achieved in one of two ways: by setting the Content-Length header, or, for an entity body sent with chunked transfer encoding, by starting each chunk with its size and ending the response body with a special last chunk of size zero.
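To make the chunked case concrete, here is a minimal sketch of how a reader can find the end of a chunked body without relying on the connection being closed. The class and method names (ChunkedBodySketch, readLine, decodeChunked) are invented for illustration and are not part of the JDK; the real protocol handler is more thorough about malformed input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedBodySketch {

    // Reads one CRLF-terminated line from the stream and returns it without the CRLF.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                continue; // skip CR; the LF that follows ends the line
            }
            if (b == '\n') {
                break;
            }
            line.write(b);
        }
        return line.toString("US-ASCII");
    }

    // Decodes a chunked body and stops exactly at the end of the message,
    // leaving the bytes of any following response unread in the stream.
    static byte[] decodeChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore chunk extensions, if any
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            if (hex.isEmpty()) {
                throw new EOFException("missing chunk size");
            }
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Last chunk: skip optional trailer lines up to the blank line.
                while (!readLine(in).isEmpty()) {
                    // trailers are ignored in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
    }

    public static void main(String[] args) throws IOException {
        // Two chunks, the last chunk, then the first bytes of a "next response".
        byte[] wire = "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\nNEXT"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(wire);
        System.out.println(new String(decodeChunked(in), StandardCharsets.US_ASCII));
        System.out.println(in.available() + " bytes left for the next response");
    }
}
```

Because the decoder knows where the body ends, the bytes that follow it stay in the stream, which is what lets the same connection carry the next response.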
What happens if there are proxy servers in between?
Since a persistent connection applies to only one transport link, it is important that a proxy server signal persistent or non-persistent connections separately to its clients and to the origin servers (or other proxy servers) it talks to. From an HTTP client's or server's perspective, as far as connection persistence is concerned, the presence or absence of proxy servers is transparent.
What does the current JDK do for Keep-Alive?
The JDK supports both HTTP/1.1 and HTTP/1.0 persistent connections.
When the application finishes reading the response body or when the application calls close() on the InputStream returned by URLConnection.getInputStream(), the JDK's HTTP protocol handler will try to clean up the connection and if successful, put the connection into a connection cache for reuse by future HTTP requests.
Support for HTTP Keep-Alive is transparent to the application. However, it can be controlled through the system properties http.keepAlive and http.maxConnections, as well as through the request and response headers specified by HTTP/1.1.
The system properties that control the behavior of Keep-Alive are:
http.keepAlive=<boolean>
default: true
Indicates if keep alive (persistent) connections should be supported.
http.maxConnections=<int>
default: 5
Indicates the maximum number of connections per destination to be kept alive at any given time.
The HTTP header that influences connection persistence is:
Connection: close
If the "Connection" header is specified with the value "close" in either the request or the response header fields, it indicates that the connection should not be considered 'persistent' after the current request/response is complete.
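As a rough sketch, the two properties and the header can be applied from code as follows. The class and method names (KeepAliveConfigSketch, configureKeepAlive, openNonPersistent) and the example URL are invented for illustration. It is an assumption here that the JDK reads these properties when its networking classes are first loaded, so it is safest to set them before the first HTTP request or to pass them as -D options on the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAliveConfigSketch {

    // Sets the two documented keep-alive properties programmatically.
    // Assumption: call this before any HTTP connection is opened.
    static void configureKeepAlive(boolean enabled, int maxConnections) {
        System.setProperty("http.keepAlive", Boolean.toString(enabled));
        System.setProperty("http.maxConnections", Integer.toString(maxConnections));
    }

    // Prepares a request that tells the server not to keep the connection
    // open after this request/response. Nothing is sent until the caller
    // connects or reads from the returned object.
    static HttpURLConnection openNonPersistent(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Connection", "close");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        configureKeepAlive(true, 10);
        // Placeholder URL; no network traffic happens in this sketch.
        HttpURLConnection conn = openNonPersistent(new URL("http://example.com/"));
        System.out.println(System.getProperty("http.maxConnections"));
        System.out.println(conn.getRequestProperty("Connection"));
    }
}
```

The command-line equivalent of the first method would be along the lines of java -Dhttp.keepAlive=true -Dhttp.maxConnections=10, followed by the application's main class.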
The current implementation doesn't buffer the response body. This means that the application has to either finish reading the response body or call close() to abandon the rest of it before the connection can be reused. Furthermore, the current implementation will not do blocking reads when cleaning up the connection, so if the whole response body is not yet available, the connection will not be reused.
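One way to make sure a body is always fully consumed is a small helper like the one below. This is a sketch only: DrainSketch and drainAndClose are hypothetical names, and the in-memory stream in main stands in for the stream returned by URLConnection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainSketch {

    // Reads and discards everything left in the stream, then closes it.
    // Returns the number of bytes that were discarded.
    static long drainAndClose(InputStream in) throws IOException {
        if (in == null) {
            return 0; // nothing to drain
        }
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream body = in) { // closed even if a read fails
            int n;
            while ((n = body.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body the application does not care about.
        InputStream fakeBody = new ByteArrayInputStream(new byte[20000]);
        System.out.println(drainAndClose(fakeBody) + " bytes discarded");
    }
}
```

Reading to end-of-stream before closing is the case where the handler has nothing left to clean up, so the connection is the most likely to be returned to the cache.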
What's new in Tiger (JDK 1.5)?
When the application encounters an HTTP 400 or 500 response, it may ignore the IOException and then issue another HTTP request. In this case, the underlying TCP connection won't be kept alive: the response body is still waiting to be consumed, so the socket connection is not cleaned up and is therefore not available for reuse. What the application needs to do is call HttpURLConnection.getErrorStream() after catching the IOException, read the response body, then close the stream. However, some existing applications do not do this and, as a result, do not benefit from persistent connections. To address this problem, we have introduced a workaround.
The workaround involves buffering the response body if the response code is >=400, up to a certain amount and within a time limit, thus freeing up the underlying socket connection for reuse. The rationale is that when the server responds with a >=400 error (a client error or a server error, for example "404 Not Found"), it usually sends a small response body explaining whom to contact and what to do to recover.
Several new Sun implementation-specific properties have been introduced to help clean up connections after an error response from the server.
The major one is:
sun.net.http.errorstream.enableBuffering=<boolean>
default: false
With the above system property set to true (the default is false), when the response code is >=400, the HTTP handler will try to buffer the response body, freeing up the underlying socket connection for reuse. Thus, even if the application doesn't call getErrorStream(), read the response body, and then call close(), the underlying socket connection may still be kept alive and reused.
The following two system properties provide further control over the error stream buffering behavior:
sun.net.http.errorstream.timeout=<int> in milliseconds
default: 300 milliseconds
sun.net.http.errorstream.bufferSize=<int> in bytes
default: 4096 bytes
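A short sketch of switching the workaround on from code follows. ErrorStreamBufferingSketch and enableErrorStreamBuffering are hypothetical names, and the values 500 and 8192 are arbitrary examples. As these are Sun implementation-specific properties, other JDK implementations may ignore them, and it is assumed here that they have to be set before the HTTP handler is first used.

```java
public class ErrorStreamBufferingSketch {

    // Enables buffering of >=400 response bodies and sets its two limits.
    // Assumption: call this before the first HTTP request is made.
    static void enableErrorStreamBuffering(int timeoutMillis, int bufferSizeBytes) {
        System.setProperty("sun.net.http.errorstream.enableBuffering", "true");
        System.setProperty("sun.net.http.errorstream.timeout",
                Integer.toString(timeoutMillis));
        System.setProperty("sun.net.http.errorstream.bufferSize",
                Integer.toString(bufferSizeBytes));
    }

    public static void main(String[] args) {
        // Example values only: wait up to 500 ms, buffer up to 8192 bytes.
        enableErrorStreamBuffering(500, 8192);
        System.out.println(System.getProperty("sun.net.http.errorstream.enableBuffering"));
        System.out.println(System.getProperty("sun.net.http.errorstream.timeout"));
        System.out.println(System.getProperty("sun.net.http.errorstream.bufferSize"));
    }
}
```

Buffering is a safety net for code that cannot be changed; reading and closing the error stream explicitly remains the more dependable way to get the connection reused.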
What can you do to help with Keep-Alive?
Do not abandon a connection by ignoring the response body. Doing so may result in idle TCP connections that need to be garbage collected when they are no longer referenced.
If getInputStream() successfully returns, read the entire response body.
When calling getInputStream() from HttpURLConnection, if an IOException occurs, catch the exception and call getErrorStream() to get the response body (if there is any).
Reading the response body cleans up the connection even if you are not interested in the response content itself. If the response body is long and you are not interested in the rest of it after seeing the beginning, you can close the InputStream. Be aware, however, that more data could be on its way, in which case the connection may not be cleaned up for reuse.
Here's a code example that complies with the above recommendations:
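import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Body of a method such as main(String[] args); processBuf() is the application's own handler.
byte[] buf = new byte[4096];
URLConnection conn = null;
try {
    URL a = new URL(args[0]);
    conn = a.openConnection();
    InputStream is = conn.getInputStream();
    int ret = 0;
    // read the entire response body
    while ((ret = is.read(buf)) > 0) {
        processBuf(buf, ret); // only the first ret bytes are valid
    }
    // close the inputstream
    is.close();
} catch (IOException e) {
    try {
        if (conn instanceof HttpURLConnection) {
            HttpURLConnection httpConn = (HttpURLConnection) conn;
            int respCode = httpConn.getResponseCode();
            // getErrorStream() returns null when there is no error body
            InputStream es = httpConn.getErrorStream();
            if (es != null) {
                int ret = 0;
                // read the response body
                while ((ret = es.read(buf)) > 0) {
                    processBuf(buf, ret);
                }
                // close the errorstream
                es.close();
            }
        }
    } catch (IOException ex) {
        // deal with the exception
    }
}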
If you know ahead of time that you won't be interested in the response body, you should issue a HEAD request instead of a GET request, for example when you are only interested in the meta information of a web resource, or when testing its validity, accessibility, and recent modification. Here's a code snippet:
URL a = new URL(args[0]);
URLConnection urlc = a.openConnection();
HttpURLConnection httpc = (HttpURLConnection) urlc;
// only interested in the length of the resource
httpc.setRequestMethod("HEAD");
int len = httpc.getContentLength();