Forcing the response character encoding when using Apache HttpClient 4.0

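A couple of days ago a script of mine that calls an HTTP service broke. On closer inspection, it turned out the service was sending these response headers: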
HTTP/1.1 200 OK
Server: xxxxxxxxxx
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length:2014

The response headers declare a Content-Type with charset=utf-8, but the text in the response body is actually GBK-encoded. That mismatch is what broke my request script.
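To make the symptom concrete, here is a minimal JDK-only sketch of what happens when GBK bytes are decoded as UTF-8. The class name and the sample text are made up for illustration, and nothing here touches HttpClient itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the decoding mismatch.
final class CharsetMismatchDemo {

    // Decode a raw response body with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // two Chinese characters, as a stand-in body
        byte[] body = original.getBytes(gbk); // what the server actually sends

        // Trusting the declared charset=utf-8 garbles the text.
        String asUtf8 = decode(body, StandardCharsets.UTF_8);
        // Decoding with the real encoding recovers it.
        String asGbk = decode(body, gbk);

        System.out.println("decoded as UTF-8 matches original: " + asUtf8.equals(original));
        System.out.println("decoded as GBK matches original: " + asGbk.equals(original));
    }
}
```

The GBK byte sequence is not valid UTF-8, so the decoder substitutes replacement characters and the original text cannot be recovered from the resulting string.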

The Apache HttpClient dependencies are as follows:
pom.xml:
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.0.1</version>
</dependency>


The original script used DefaultHttpClient to send the request and, with the help of EntityUtils, implemented its own ResponseHandler similar to BasicResponseHandler, roughly like this:
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

def httpClient = new DefaultHttpClient();
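def makeResponseHandler(charset) {
  { response ->
      // Treat any status of 300 or above as a failure, as BasicResponseHandler does
      def statusLine = response.statusLine;
      if (statusLine.statusCode >= 300) {
        throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
      }

      // charset is only a default here: a charset declared in Content-Type takes precedence
      def entity = response.entity;
      entity ? EntityUtils.toString(entity, charset) : null;
  } as ResponseHandler
}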

def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));


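The HTTP service I was calling originally sent no Content-Type in its response headers, so using EntityUtils.toString(entity, defaultCharset) this way was enough to control which character encoding was used to decode the response body.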

The problem is that the service now sends a wrong Content-Type, and EntityUtils.toString(entity, defaultCharset) gives the charset in Content-Type priority over defaultCharset, so the script above can no longer force the character encoding.
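The precedence described above can be modelled roughly as follows. This is a simplified sketch of the behaviour, not the library's actual source; the class and method names are hypothetical, and the header parsing is deliberately naive.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical model of "declared charset wins, default is only a fallback".
final class CharsetPrecedenceSketch {

    // Extract the charset parameter from a Content-Type value, or null if there is none.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Drop optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // The declared charset is used when present; otherwise the caller's default applies.
    static Charset effectiveCharset(String contentType, String defaultCharset) {
        String declared = charsetFromContentType(contentType);
        return Charset.forName(declared != null ? declared : defaultCharset);
    }

    public static void main(String[] args) {
        // No Content-Type: the default is honoured.
        System.out.println(effectiveCharset(null, "GBK"));
        // Wrong Content-Type: the default is silently ignored.
        System.out.println(effectiveCharset("text/html; charset=utf-8", "GBK"));
    }
}
```

Under this model, passing 'GBK' as the default helps only as long as the server declares no charset at all, which matches what the script ran into.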

So what to do? The most direct approach is to get hold of the response body as a byte array and then decode it however you like:
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

def httpClient = new DefaultHttpClient();
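def makeResponseHandler(charset) {
  { response ->
      def statusLine = response.statusLine;
      if (statusLine.statusCode >= 300) {
        throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
      }

      // Read the raw bytes and decode them ourselves, ignoring the charset declared in Content-Type
      def entity = response.entity;
      def bytes = entity ? EntityUtils.toByteArray(entity) : null;
      // Note: under Groovy truth an empty byte array is false, so an empty body also yields null
      bytes ? new String(bytes, charset) : null;
  } as ResponseHandler
}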

def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));


Is there a better way? I don't know; I'm still not familiar enough with HttpClient.
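One variation that might be worth trying, untested against the real service, is to decode the body stream directly with a forced charset instead of buffering it into a byte array first. The sketch below uses only the JDK; the class and method names are hypothetical, and in the script the stream would presumably come from entity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: decode a stream with a charset chosen by the caller.
final class ForcedCharsetReader {

    // Read the whole stream as text using the forced charset, then close it.
    static String readAll(InputStream in, Charset forced) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, forced)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587";
        // A ByteArrayInputStream stands in for the response body stream.
        InputStream body = new ByteArrayInputStream(original.getBytes(gbk));
        System.out.println(readAll(body, gbk).equals(original));
    }
}
```

This is still the same workaround of ignoring the declared charset. It only avoids holding the raw bytes and the decoded string in memory at the same time, which matters little for small responses.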

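Ideally the team providing the HTTP service would fix the response header on their side, but that means going through all sorts of tedious processes, and a tool I'm working on can't wait, so I had to hack around it =_=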
