Today, while parsing some JSON data, I got a pile of data like this: {"errNum":0,"errMsg":"success","retData":[{"title":"\u6536\u5e9f\u54c1\u5927\u53d4\u521a\u4e0a\u53f0\uff0c\u5c31\u60e8\u906d\u8bc4\u59d4\u706d\u706f\uff0c\u4f46\u63a5\u4e0b\u6765\u5168\u573a\u90fd\u9707\u60ca\u4e86\uff01","url":"http:\/\/toutiao.com\/group\/6263036756505920002\/","abstract":"\u8ba2\u9605\u6211\u83b7\u53d6\u66f4\u591a\u7cbe\u5f69\u5185\u5bb9\uff01","image_url":"http:\/\/p1.pstatp.com\/list\/2f90009a31a7ee8bb15"}]}
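This is because, to transmit Chinese text more safely, the JSON has been Unicode-escaped: each non-ASCII character is written as \uXXXX.
So it seemed that, before parsing the JSON, we first have to convert the Unicode escapes in the JSON data back into the Chinese we actually use.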
1. Decoding Unicode escapes in the string fields of JSON returned by an HTTP request
// Note: run on a whole JSON document, this also turns \" into ", which can
// corrupt the JSON; it is safest on individual string values.
public static String decodeUnicode(String theString) {
    char aChar;
    int len = theString.length();
    StringBuilder outBuffer = new StringBuilder(len);
    for (int x = 0; x < len;) {
        aChar = theString.charAt(x++);
        if (aChar == '\\') {
            aChar = theString.charAt(x++);
            if (aChar == 'u') {
                // Read the four hex digits xxxx
                int value = 0;
                for (int i = 0; i < 4; i++) {
                    aChar = theString.charAt(x++);
                    switch (aChar) {
                    case '0': case '1': case '2': case '3': case '4':
                    case '5': case '6': case '7': case '8': case '9':
                        value = (value << 4) + aChar - '0';
                        break;
                    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
                        value = (value << 4) + 10 + aChar - 'a';
                        break;
                    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                        value = (value << 4) + 10 + aChar - 'A';
                        break;
                    default:
                        throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
                    }
                }
                outBuffer.append((char) value);
            } else {
                // Simple escapes; anything else (such as an escaped slash) is appended as-is
                if (aChar == 't') aChar = '\t';
                else if (aChar == 'r') aChar = '\r';
                else if (aChar == 'n') aChar = '\n';
                else if (aChar == 'f') aChar = '\f';
                outBuffer.append(aChar);
            }
        } else {
            outBuffer.append(aChar);
        }
    }
    return outBuffer.toString();
}

2. Converting the encoding of an ordinary string that contains Unicode
public static String reEncoding(String text, String newEncoding) {
    String str = null;
    try {
        // getBytes() without an argument encodes with the platform default charset
        str = new String(text.getBytes(), newEncoding);
    } catch (UnsupportedEncodingException e) {
        log.error("Unsupported character encoding: " + newEncoding);
        throw new RuntimeException(e);
    }
    return str;
}
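Because reEncoding relies on the platform default charset, its result can differ from machine to machine. The sketch below names both charsets explicitly. It is only an illustration: the class name CharsetDemo and the method redecode are hypothetical, and it assumes the text was garbled by decoding UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    /**
     * Repairs text that was decoded with the wrong charset.
     * This only works when the wrong charset maps every byte back
     * losslessly, as ISO-8859-1 does.
     */
    static String redecode(String garbled, Charset wrong, Charset right) {
        return new String(garbled.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file stays ASCII
        String original = "\u82f9\u679c";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as ISO-8859-1 yields six garbled characters
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);

        String repaired = redecode(garbled,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

Note that this kind of conversion changes how bytes are interpreted; it does not touch \uXXXX escape sequences, which are plain ASCII text in any of these charsets.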
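3. A rather odd approach that I stumbled on by accident during testing. I have not worked out why it behaves this way yet (if anyone understands the principle, please let me know, thanks)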
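I fetched the JSON data with an HttpClient POST request and got data containing Unicode escapes, which had to be converted to Chinese. The conversion felt a bit slow, so I tried again with an HttpURLConnection GET request. Without any decoding on my side, the data came out of Gson parsing magically converted to Chinese: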
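This was quite a surprise. The request and processing time was also shorter than with the HttpClient POST request, so I switched to the HttpURLConnection GET approach without hesitation.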
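A likely explanation for this behaviour: \uXXXX escapes are part of the JSON string syntax (RFC 8259), so a conforming parser such as Gson turns them into real characters while parsing, whichever HTTP client fetched the text. The stdlib-only sketch below is not Gson's implementation. It only mimics that one step on a raw string, and the names EscapeDemo and unescapeUnicode are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeDemo {

    // A backslash, the letter u, then exactly four hex digits
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /**
     * Replaces every four-digit Unicode escape with the character it names.
     * Caveat: an escaped backslash that happens to be followed by "u" plus
     * four hex digits would be decoded too, which a real parser avoids.
     */
    static String unescapeUnicode(String raw) {
        Matcher m = UNICODE_ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The raw body as received: the escapes are literal ASCII characters
        String raw = "{\"errMsg\":\"success\",\"abstract\":\"\\u8ba2\\u9605\"}";
        System.out.println(raw);
        System.out.println(unescapeUnicode(raw));
    }
}
```

If this explanation holds, the response text is the same for both clients, and the Chinese characters appear as soon as a JSON parser reads the string values.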
① First, here is the HttpURLConnection GET approach:
@Test
public void test() {
    try {
        long start = System.currentTimeMillis();
        URL url = new URL("http://apis.baidu.com/songshuxiansheng/news/news");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.addRequestProperty("apikey", "0fc807e45a37ce264f45d169646f4a9e");
        String dataString = new String(GsonTools.IsToByte(connection.getInputStream()), "utf-8");
        HeadlineJson newsJson = GsonTools.getObjectData(dataString, HeadlineJson.class);
        List<Headline> list = newsJson.getRetData();
        System.out.println(list.toString());
        long end = System.currentTimeMillis();
        System.out.println("timeGap:" + (end - start));
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
public static <T> T getObjectData(String jsonString, Class<T> type) {
    T t = null;
    try {
        Gson gson = new Gson();
        t = gson.fromJson(jsonString, type);
    } catch (JsonSyntaxException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return t;
}
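The test above also calls GsonTools.IsToByte, which is not shown in this post. The following is a minimal sketch of what such a helper might look like, assuming it simply reads the whole stream into a byte array. The class StreamDemo and both method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamDemo {

    /** Reads the stream until end-of-stream and returns all bytes. */
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Reads the whole stream and decodes it as UTF-8 text. */
    static String readUtf8(InputStream in) throws IOException {
        return new String(readAllBytes(in), StandardCharsets.UTF_8);
    }
}
```

Collecting all bytes first and decoding once avoids splitting a multi-byte UTF-8 character across two reads.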
@Test
public void TestHeadLine() {
    long start = System.currentTimeMillis();
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    String url = "http://apis.baidu.com/songshuxiansheng/news/news";
    String jsonString = HttpUtils.getBaiDuString2(url, params);
    HeadlineJson lineJson = GsonTools.getObjectData(jsonString, HeadlineJson.class);
    System.out.println(lineJson.toString());
    long end = System.currentTimeMillis();
    System.out.println("timeGap:" + (end - start));
}
public static String getBaiDuString(String url, List<NameValuePair> params) {
    String serverDataString = null;
    HttpPost post = new HttpPost(url);
    try {
        post.setEntity(new UrlEncodedFormEntity(params, HTTP.UTF_8));
        post.addHeader("apikey", UrlUtils.BAIDU_API_KEY);
        HttpClient client = new DefaultHttpClient();
        HttpResponse response = client.execute(post);
        int code = response.getStatusLine().getStatusCode();
        System.out.println("StatusCode:" + code);
        if (code == 200) {
            serverDataString = decodeUnicode(EntityUtils.toString(response.getEntity()));
            // serverDataString = EntityUtils.toString(response.getEntity());
            System.out.println("String data received successfully\nServerData:" + serverDataString);
        }
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return serverDataString;
}
public static String getBaiDuString2(String url, List<NameValuePair> params) {
    String serverDataString = null;
    HttpGet get = new HttpGet(url);
    try {
        get.addHeader("apikey", UrlUtils.BAIDU_API_KEY);
        HttpClient client = new DefaultHttpClient();
        HttpResponse response = client.execute(get);
        int code = response.getStatusLine().getStatusCode();
        System.out.println("StatusCode:" + code);
        if (code == 200) {
            // serverDataString = decodeUnicode(EntityUtils.toString(response.getEntity()));
            serverDataString = EntityUtils.toString(response.getEntity());
            System.out.println("String data received successfully\nServerData:" + serverDataString);
        }
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return serverDataString;
}
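I also compared the elapsed time of HttpClient (the Apache library, which Android also bundled) against HttpURLConnection. In my tests the HttpURLConnection requests were clearly faster, so I went with HttpURLConnection.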
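4. Java itself also provides a decoding method for URLs, URLDecoder. Note that it decodes percent-encoding (%XX sequences and '+'), not \uXXXX escapes. In the line below, the Java compiler has already turned the escapes in the string literal into 苹果 before decode() is called:
System.out.println(URLDecoder.decode("\u82f9\u679c", "utf-8"));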
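For comparison, here is a small sketch of the input that URLDecoder is actually designed for: percent-encoded text. The class UrlDecodeDemo and the helper decodeUtf8 are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlDecodeDemo {

    /** Decodes percent-encoded text as UTF-8. */
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of the same two Chinese characters, percent-encoded
        String decoded = decodeUtf8("%E8%8B%B9%E6%9E%9C");
        System.out.println(decoded.equals("\u82f9\u679c")); // prints true

        // A literal backslash-u sequence passes through unchanged
        String untouched = decodeUtf8("\\u82f9");
        System.out.println(untouched.length()); // prints 6
    }
}
```

So a string that still contains literal backslash-u sequences at run time needs a JSON parser or a decoder like the one in section 1, not URLDecoder.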