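A couple of days ago a script of mine that calls an HTTP service broke. On closer inspection, the service turned out to be sending these response headers: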
HTTP/1.1 200 OK
Server: xxxxxxxxxx
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length:2014
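The response headers declare a Content-Type that specifies charset=utf-8, but the text in the response body is actually GBK-encoded. That broke the request script I had written.
The script depends on Apache HttpClient, declared as follows: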
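To make the mismatch concrete, here is a minimal standalone Java sketch. It is not part of the original script: the class name CharsetMismatchDemo and the helper decode are made up for illustration, and it assumes the JDK in use ships the GBK charset (standard JDK distributions normally do). It encodes a short Chinese string as GBK, as the server does, and then decodes it both ways.

```java
import java.nio.charset.Charset;

public class CharsetMismatchDemo {
    // Hypothetical helper: decode raw body bytes with the named charset.
    static String decode(byte[] body, String charsetName) {
        return new String(body, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";
        // What the server actually sends: GBK-encoded bytes.
        byte[] body = original.getBytes(Charset.forName("GBK"));

        // Trusting the (wrong) charset from the header garbles the text.
        System.out.println(original.equals(decode(body, "UTF-8"))); // prints false
        // Decoding with the real charset restores it.
        System.out.println(original.equals(decode(body, "GBK")));   // prints true
    }
}
```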
pom.xml:
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.0.1</version>
</dependency>
The original script used DefaultHttpClient to send the request, and used EntityUtils to implement its own ResponseHandler, similar to BasicResponseHandler, roughly like this:
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
def httpClient = new DefaultHttpClient();
def makeResponseHandler(charset) {
{ response ->
def statusLine = response.statusLine;
if (statusLine.statusCode >= 300) {
throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
}
def entity = response.entity;
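// toString uses the charset declared in Content-Type if there is one; `charset` is only the fallback
entity ? EntityUtils.toString(entity, charset) : null;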
} as ResponseHandler
}
def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));
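The HTTP service I originally had to call returned no Content-Type header at all, so calling EntityUtils.toString(entity, defaultCharset) was enough to control which character encoding was used to decode the response body.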
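The problem is that the service now sends an incorrect Content-Type, and EntityUtils.toString(entity, defaultCharset) gives the charset from Content-Type priority over defaultCharset. As a result, the script above can no longer force a particular character encoding.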
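The precedence rule just described can be modelled in a few lines of plain Java. This is only a simplified sketch of the behaviour as I understand it, not HttpClient's actual code: the class CharsetPrecedenceSketch and both methods are hypothetical, and the header parsing is deliberately naive (it does not handle quoted parameter values, for example).

```java
import java.util.Locale;

public class CharsetPrecedenceSketch {
    // Hypothetical helper: extract the charset parameter from a Content-Type value, or null if absent.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // The declared charset wins; the caller's default is used only when nothing is declared.
    static String effectiveCharset(String contentType, String defaultCharset) {
        String declared = charsetFromContentType(contentType);
        return declared != null ? declared : defaultCharset;
    }

    public static void main(String[] args) {
        // No Content-Type: the default is used, which is why the old script worked.
        System.out.println(effectiveCharset(null, "GBK"));
        // Wrong Content-Type: the declared charset overrides the default.
        System.out.println(effectiveCharset("text/html; charset=utf-8", "GBK"));
    }
}
```

Under this model, passing 'GBK' as the default has no effect once the server declares any charset at all, which matches the failure I ran into.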
So what to do? The most direct fix is to get hold of the response body as a byte array and then decode it however I like:
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
def httpClient = new DefaultHttpClient();
def makeResponseHandler(charset) {
{ response ->
def statusLine = response.statusLine;
if (statusLine.statusCode >= 300) {
throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
}
def entity = response.entity;
// Read the raw bytes and decode them ourselves, ignoring whatever Content-Type claims
def bytes = entity ? EntityUtils.toByteArray(entity) : null;
bytes ? new String(bytes, charset) : null;
} as ResponseHandler
}
def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));
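The same idea does not depend on HttpClient. As a rough sketch using only the Java standard library, the body stream can be drained into a byte array and decoded with a charset of my choosing. The class ForcedCharsetReader and the method readAs are hypothetical names, and the sketch assumes the whole body fits comfortably in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ForcedCharsetReader {
    // Hypothetical helper: read the whole stream, then decode with the given charset,
    // regardless of what any response header declared.
    static String readAs(InputStream in, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), Charset.forName(charsetName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate a GBK-encoded response body.
        byte[] body = "\u4e2d\u6587".getBytes(Charset.forName("GBK"));
        String text = readAs(new ByteArrayInputStream(body), "GBK");
        System.out.println("\u4e2d\u6587".equals(text));
    }
}
```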
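I wonder whether there is a better way? I'm still far too unfamiliar with HttpClient.
The best fix would of course be for the team providing the HTTP service to correct the response headers, but that means going through all sorts of tedious process, and a tool I'm working on couldn't wait, so I had to hack around it =_=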