HttpClient PostMethod garbled-text problem

This is an old article, but I couldn't resist reposting it for reference.

 

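Tags: UTF-8 encoding http-client java
The UTF-8 encoding problem with HttpClient POST
Apache HttpClient ( http://jakarta.apache.org/commons/httpclient/ ) is a pure-Java client-side toolkit for the HTTP protocol, and its protocol support is quite comprehensive. For more details, see the article "HttpClient入门" (Getting Started with HttpClient) on the IBM website ( http://www-128.ibm.com/developerworks/cn/opensource/os-httpclient/ ).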

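Problem analysis
In practice, however, I found that HttpClient does not encode POST parameters as UTF-8 when called in the most basic way. The articles I found online did not get to the heart of the matter, so I read through some of the commons-httpclient-3.0.1 source code. First, in PostMethod, I found the generateRequestEntity() method: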

 

    /**
     * Generates a request entity from the post parameters, if present. Calls
     * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
     *
     * @since 3.0
     */
    protected RequestEntity generateRequestEntity() {
        if (!this.params.isEmpty()) {
            // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
            // This is to avoid potential encoding issues. Form url encoded strings
            // are ASCII by definition but the content type may not be. Treating the content
            // as bytes allows us to keep the current charset without worrying about how
            // this charset will effect the encoding of the form url encoded string.
            String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
            ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
                EncodingUtil.getAsciiBytes(content),
                FORM_URL_ENCODED_CONTENT_TYPE
            );
            return entity;
        } else {
            return super.generateRequestEntity();
        }
    }
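
The effect of the charset argument passed to formUrlEncode() can be illustrated with the JDK alone. The following is a minimal sketch that uses java.net.URLEncoder as a stand-in for HttpClient's EncodingUtil, on the assumption that both percent-encode the bytes produced by the given charset; the class name FormEncodingDemo and the helper encodeField are made up for illustration and are not part of HttpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative class, not part of HttpClient.
class FormEncodingDemo {

    // Percent-encodes a single form value using the named charset.
    static String encodeField(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string used in the test below
        // (written with Unicode escapes so the source file encoding does not matter).
        String text = "\u4e2d\u6587";

        // UTF-8 can represent both characters, three bytes each.
        System.out.println("UTF-8:      " + encodeField(text, "UTF-8"));

        // ISO-8859-1 cannot represent them, so the original text is lost
        // before the request ever leaves the client.
        System.out.println("ISO-8859-1: " + encodeField(text, "ISO-8859-1"));
    }
}
```

If the charset cannot represent the characters, no server-side setting can recover them afterwards, which is why the fix has to happen on the client.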


 

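So the HTTP request parameters added via NameValuePair are ultimately converted into a RequestEntity and submitted to the HTTP server. Next, in EntityEnclosingMethod, the parent class of PostMethod, I found the following code: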

 

    /**
     * Returns the request's charset. The charset is parsed from the request entity's
     * content type, unless the content type header has been set manually.
     *
     * @see RequestEntity#getContentType()
     *
     * @since 3.0
     */
    public String getRequestCharSet() {
        if (getRequestHeader("Content-Type") == null) {
            // check the content type from request entity
            // We can't call getRequestEntity() since it will probably call
            // this method.
            if (this.requestEntity != null) {
                return getContentCharSet(
                    new Header("Content-Type", requestEntity.getContentType()));
            } else {
                return super.getRequestCharSet();
            }
        } else {
            return super.getRequestCharSet();
        }
    }

 

 

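Solution
The two snippets above show how HttpClient derives the request charset from the "Content-Type" header, and how that charset is then applied when encoding the submitted content. Following this principle, we only need to override getRequestCharSet() so that it returns the name of the charset we want. That resolves the garbled text when submitting a POST request in UTF-8 or any other non-default encoding.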
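
The lookup that getRequestCharSet() performs can be pictured with a simplified sketch: take the charset parameter from the Content-Type value if there is one, otherwise fall back to a default. This is not HttpClient's actual parsing code. The class CharsetResolverDemo and the method charsetOf are hypothetical names, and the ISO-8859-1 fallback is an assumption based on the default content charset of HttpClient 3.x.

```java
import java.util.Locale;

// Illustrative class, not part of HttpClient.
class CharsetResolverDemo {

    // Assumed fallback, matching the HttpClient 3.x default content charset.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type value, or the default if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are parameters such as charset=UTF-8
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        // No charset parameter: the default is used, which is what causes the garbled text.
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        // Explicit charset parameter: it wins over the default.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

Overriding getRequestCharSet() bypasses this lookup entirely and always answers with the charset the caller wants.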

 

 

Testing
First, deploy a page named test.jsp under Tomcat's ROOT web application to serve as the test page. The main code fragment is as follows:

 

    <%@ page contentType="text/html;charset=UTF-8"%>
    <%@ page session="false" %>
    <%
    request.setCharacterEncoding("UTF-8");
    String val = request.getParameter("TEXT");
    System.out.println(">>>> The result is " + val);
    %>
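
The call to request.setCharacterEncoding("UTF-8") matters because the server must decode the percent-encoded body with the same charset the client used to encode it. The sketch below imitates that decoding step with java.net.URLDecoder. It is not the servlet container's implementation, and the class FormDecodingDemo and method parameter are illustrative names only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative class, not part of the Servlet API or Tomcat.
class FormDecodingDemo {

    // Looks up one parameter in a form-url-encoded body, decoding with the named charset.
    static String parameter(String body, String name, String charset) {
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip malformed pairs without '='
            }
            if (decode(pair.substring(0, eq), charset).equals(name)) {
                return decode(pair.substring(eq + 1), charset);
            }
        }
        return null; // parameter not present
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Body as a client using UTF-8 would send it for TEXT = "\u4e2d\u6587".
        String body = "TEXT=%E4%B8%AD%E6%96%87";

        // Matching charset: the two original characters come back.
        System.out.println(parameter(body, "TEXT", "UTF-8").length());

        // Mismatched charset: each of the six bytes becomes its own character.
        System.out.println(parameter(body, "TEXT", "ISO-8859-1").length());
    }
}
```

A mismatch in either direction produces garbled text, so the client-side charset and the server-side setCharacterEncoding() value have to agree.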
 

 

Next, write a test class. The main code is as follows:

 

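    import java.io.IOException;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;

    // The enclosing class name is illustrative; the original shows only its members.
    public class Utf8PostTest {

        public static void main(String[] args) throws IOException {
            String url = "http://localhost:8080/test.jsp";
            PostMethod postMethod = new UTF8PostMethod(url);
            // Fill in the value of each form field
            NameValuePair[] data = {
                new NameValuePair("TEXT", "中文"),
            };
            // Put the form values into the PostMethod
            postMethod.setRequestBody(data);
            // Execute the PostMethod and release the connection afterwards
            HttpClient httpClient = new HttpClient();
            try {
                httpClient.executeMethod(postMethod);
            } finally {
                postMethod.releaseConnection();
            }
        }

        // Inner class for UTF-8 support
        public static class UTF8PostMethod extends PostMethod {
            public UTF8PostMethod(String url) {
                super(url);
            }

            @Override
            public String getRequestCharSet() {
                // Always encode the form parameters as UTF-8
                return "UTF-8";
            }
        }
    }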

 

Run this test program, and the Tomcat console output correctly prints ">>>> The result is 中文".

 

 
