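Recently, Chinese text has been showing up garbled whenever our production system is accessed from a mobile phone, so I took the opportunity to look into character sets.
Reference: Java 正确的做字符串编码转换 (Doing string encoding conversion correctly in Java)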
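Java strings are held internally as Unicode; for display, they are converted from Unicode to the encoding the OS uses.
GBK: one Chinese character takes 2 bytes
UTF-8: a common Chinese character takes 3 bytes (an ASCII character takes 1)
ISO-8859-1: every character takes 1 byte, so Chinese characters cannot be represented at all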
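The byte counts above are easy to check directly. The following is a minimal sketch: the class name ByteCountDemo and the helper encodedLength are made up for illustration, and it assumes the JDK in use ships the GBK charset (standard desktop JDKs do). The character 我 is written as a Unicode escape so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCountDemo {

    // Number of bytes the string occupies once encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String han = "\u6211"; // the single Chinese character 我
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        System.out.println("GBK:        " + encodedLength(han, gbk));                         // 2
        System.out.println("UTF-8:      " + encodedLength(han, StandardCharsets.UTF_8));      // 3
        // ISO-8859-1 cannot represent 我, so the encoder substitutes a single '?' byte.
        System.out.println("ISO-8859-1: " + encodedLength(han, StandardCharsets.ISO_8859_1)); // 1
        System.out.println("UTF-8 'a':  " + encodedLength("a", StandardCharsets.UTF_8));      // 1
    }
}
```

Note that the single ISO-8859-1 byte is a substitute, not the character itself, which is why that encoding keeps turning up as a source of "?" later on.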
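In Java, String.getBytes() returns a byte array in the platform's default encoding, which means the result can differ from one operating system to another!
1. str.getBytes(); with no charset argument, the JVM default charset is used. On JDK 8 this is System.getProperty("file.encoding"), which normally follows the OS locale. It is not the encoding of the source file, although Eclipse by default launches programs with -Dfile.encoding set to the configured text file encoding, which makes it look that way.
2. str.getBytes("charset"); // with an explicit charset, the Unicode characters held internally are encoded into a byte array in that charset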
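To make the difference between the two calls concrete, here is a small sketch. DefaultCharsetDemo and its helper names are hypothetical; the only library calls used are String.getBytes, Charset.defaultCharset and StandardCharsets, which also avoids the checked UnsupportedEncodingException of the String-name overload.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Depends on the JVM default charset, so results can differ between machines.
    static byte[] platformDependent(String s) {
        return s.getBytes();
    }

    // Always UTF-8, regardless of OS or -Dfile.encoding.
    static byte[] portable(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Upper-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes():      " + hex(platformDependent(s))); // varies by platform
        System.out.println("getBytes(UTF_8): " + hex(portable(s)));          // E68891E4BBAC
    }
}
```

Passing the charset explicitly everywhere is the simplest way to keep behavior identical between a developer machine and production.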
Garbled text always comes down to the same thing: the encoding the bytes were written in does not match the encoding used to decode them. For example:
System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK
String lm = null;
lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
lm = new String("我们".getBytes("GBK"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F
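new String("我们".getBytes("ISO-8859-1"), "UTF-8") is evaluated in this order:
1. The source file is GBK-encoded, so at compile time "我们" is first converted to Unicode.
2. The Unicode string is encoded to an ISO-8859-1 byte array. ISO-8859-1 has no Chinese characters, so each one is replaced by '?' (0x3F).
3. That byte array is decoded as UTF-8.
The result is garbled, and because the original characters were already lost in step 2, it cannot be repaired afterwards.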
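The steps above can be traced byte by byte. This is a sketch under stated assumptions: MismatchTrace, hex and encodeThenDecode are illustrative names, and the literal is written with Unicode escapes so that step 1 (the compiler reading the source file) cannot go wrong.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchTrace {

    // Upper-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode with one charset, then decode the resulting bytes with another.
    static String encodeThenDecode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        // ISO-8859-1 cannot encode the characters: they are already '?' before decoding.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1)));                             // 3F3F
        System.out.println(encodeThenDecode(s, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // ??

        // GBK keeps the information, but the bytes are not valid UTF-8.
        System.out.println(hex(s.getBytes(gbk)));                                        // CED2C3C7
        System.out.println(encodeThenDecode(s, gbk, StandardCharsets.UTF_8).length());   // 4
    }
}
```

In the GBK case each of the four bytes is malformed as UTF-8 on its own, so the decoder emits four replacement characters, which a GBK console then prints as "????".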
package org.wxy.demo.test;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

import org.junit.jupiter.api.Test;

/**
 * Doing string encoding conversion correctly in Java
 * https://blog.csdn.net/h12kjgj/article/details/73496528
 *
 * @author wang
 */
public class CharsetTest {

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK

        System.out.println("\n============GBK============");
        String gbk = new String("我们".getBytes("GBK"), "GBK");
        // Bytes in the default charset (GBK here), then in UTF-8
        System.out.println(bytes2HexString(gbk.getBytes())); // CED2C3C7
        System.out.println(bytes2HexString(gbk.getBytes("UTF-8"))); // E68891E4BBAC
        // Deprecated overload: encodes with the platform default charset
        String encode = URLEncoder.encode(gbk);
        System.out.println(gbk + "(default)\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(gbk, "GBK");
        System.out.println(gbk + "GBK\t\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(gbk, "UTF-8");
        System.out.println(gbk + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

        System.out.println("\n============UTF-8============");
        // Source literal (GBK file) => Unicode at compile time => UTF-8 bytes => same String again
        String utf = new String("我们".getBytes("UTF-8"), "UTF-8");
        System.out.println(bytes2HexString(utf.getBytes())); // CED2C3C7
        System.out.println(bytes2HexString(utf.getBytes("UTF-8"))); // E68891E4BBAC
        encode = URLEncoder.encode(utf);
        System.out.println(utf + "(default)\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(utf, "GBK");
        System.out.println(utf + "GBK\t\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(utf, "UTF-8");
        System.out.println(utf + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

        System.out.println("\n============Garbled============");
        System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK
        String lm = null;
        // Encoding and decoding charsets do not match, so the text is garbled
        lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
        System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
        lm = new String("我们".getBytes("GBK"), "UTF-8");
        System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F
    }
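
    /*
     * Convert a byte array to an upper-case hexadecimal string
     */
    public static String bytes2HexString(byte[] b) {
        StringBuilder r = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            // Mask with 0xFF so negative bytes are not sign-extended
            String hex = Integer.toHexString(b[i] & 0xFF);
            if (hex.length() == 1) {
                r.append('0');
            }
            r.append(hex.toUpperCase());
        }
        return r.toString();
    }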

    /*
     * Convert a hexadecimal string to a byte array
     * (returns null for null, empty, or odd-length input)
     */
    public static byte[] hexString2Bytes(String hex) {
        if ((hex == null) || (hex.equals(""))) {
            return null;
        } else if (hex.length() % 2 != 0) {
            return null;
        } else {
            hex = hex.toUpperCase();
            int len = hex.length() / 2;
            byte[] b = new byte[len];
            char[] hc = hex.toCharArray();
            for (int i = 0; i < len; i++) {
                int p = 2 * i;
                // High nibble from the first digit, low nibble from the second
                b[i] = (byte) (charToByte(hc[p]) << 4 | charToByte(hc[p + 1]));
            }
            return b;
        }
    }

    /*
     * Convert one hexadecimal digit to its numeric value
     */
    private static byte charToByte(char c) {
        return (byte) "0123456789ABCDEF".indexOf(c);
    }
}
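The hex strings printed by the class above can also be read in the other direction: turn the hex back into bytes and decode them with a chosen charset. The sketch below is self-contained and uses its own hypothetical helpers (HexDecodeDemo, fromHex, decode) rather than the methods of CharsetTest; it assumes well-formed hex input and a JDK that ships GBK.

```java
import java.nio.charset.Charset;

class HexDecodeDemo {

    // Parse a hex string such as "CED2C3C7" into bytes (two digits per byte).
    static byte[] fromHex(String hex) {
        if (hex == null || hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decode hex-encoded bytes with the named charset.
    static String decode(String hex, String charsetName) {
        return new String(fromHex(hex), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String expected = "\u6211\u4EEC"; // 我们
        System.out.println(decode("CED2C3C7", "GBK").equals(expected));       // true
        System.out.println(decode("E68891E4BBAC", "UTF-8").equals(expected)); // true
        System.out.println(decode("CED2C3C7", "UTF-8").equals(expected));     // false
    }
}
```

The same four or six bytes only mean 我们 when paired with the charset that produced them.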
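Many online sources say that Tomcat decodes GET request parameters as ISO-8859-1 by default, but the tests below suggest otherwise: the default appears to be UTF-8. That agrees with the Tomcat 8 documentation, where the connector's URIEncoding attribute defaults to UTF-8; the ISO-8859-1 default applies to Tomcat 7 and earlier. It could still depend on my current machine's environment, so I will try again on another computer later.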
Current environment
Windows 10 + JDK 8 + Eclipse (text file encoding GBK) + Tomcat 8.5
URL-encoding the parameter as GBK, UTF-8 and ISO-8859-1:
@Test
public void test1() throws UnsupportedEncodingException {
    String str = "{\"approveComment\":\"我们\",\"approveResult\":\"0\"}";
    // %7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "GBK"));
    // %7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "UTF-8"));
    // %7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "ISO-8859-1"));
}
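What the server does with such a URL is essentially the reverse operation, so the outcome of each request can be previewed without a container. The following sketch pairs URLEncoder with URLDecoder from the JDK; UrlRoundTrip and roundTrip are illustrative names, and the checked UnsupportedEncodingException is wrapped so the helper is easy to call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlRoundTrip {

    // Percent-encode with one charset, then percent-decode with another.
    static String roundTrip(String s, String encodeWith, String decodeWith) {
        try {
            return URLDecoder.decode(URLEncoder.encode(s, encodeWith), decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们

        // Matching charsets: the text survives.
        System.out.println(roundTrip(s, "UTF-8", "UTF-8").equals(s)); // true
        System.out.println(roundTrip(s, "GBK", "GBK").equals(s));     // true

        // Mismatched charsets: the text is garbled.
        System.out.println(roundTrip(s, "UTF-8", "GBK").equals(s));   // false
        System.out.println(roundTrip(s, "GBK", "UTF-8").equals(s));   // false

        // ISO-8859-1 cannot encode the text at all, even when both sides agree.
        System.out.println(roundTrip(s, "ISO-8859-1", "ISO-8859-1")); // ??
    }
}
```

The client and the server have to agree on one charset, and that charset has to be able to represent the text in the first place.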
Servlet:
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    // Query-string parameters have already been decoded by the container at this point
    String keyword = req.getParameter("keyword");
    System.out.println("keyword=" + keyword);
    // Set the response content type
    resp.setContentType("text/html");
    PrintWriter out = resp.getWriter();
    out.println("test");
}
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"????","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword = {"approveComment":"我们","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled; why?):
keyword={"approveComment":"??","approveResult":"0"}
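A likely answer to the "why" for the ISO-8859-1 request is already visible in the URL: %3F%3F is simply two literal '?' characters, so the Chinese text was lost on the client side before the request was sent. The sketch below checks this with CharsetEncoder.canEncode; LossyEncodeDemo and its helpers are hypothetical names introduced for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

class LossyEncodeDemo {

    // True if every character of s exists in the named charset.
    static boolean canRepresent(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset", e);
        }
    }

    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们
        System.out.println(canRepresent(s, "ISO-8859-1")); // false
        System.out.println(encode(s, "ISO-8859-1"));       // %3F%3F

        // Whatever charset the server decodes with, "%3F%3F" is just "??".
        System.out.println(decode("%3F%3F", "UTF-8"));      // ??
        System.out.println(decode("%3F%3F", "GBK"));        // ??
        System.out.println(decode("%3F%3F", "ISO-8859-1")); // ??
    }
}
```

This is why the third request prints "??" under every server configuration tried here: no server-side setting can restore characters that were never transmitted.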
(Why is the result the same as with the default?)
1. GET request with the GBK-encoded parameter
keyword = {"approveComment":"??","approveResult":"0"}
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword = {"approveComment":"我们","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled: the UTF-8 bytes read as GBK):
keyword = {"approveComment":"鎴戜滑","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"??","approveResult":"0"}
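The output 鎴戜滑 for the UTF-8 request is what the six UTF-8 bytes of 我们 look like when they are read as three GBK characters. In this particular case the damage happens to be reversible, because every byte pair is a valid GBK character. The sketch below shows both directions; MojibakeRepair, misdecode and repair are made-up names, and the repair should not be relied on in general, since byte sequences that are invalid in the assumed charset are replaced and lost.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeRepair {

    // Simulate the bug: bytes written in `actual` are decoded as `assumed`.
    static String misdecode(String s, Charset actual, Charset assumed) {
        return new String(s.getBytes(actual), assumed);
    }

    // Undo it: re-encode with the wrongly assumed charset, decode with the real one.
    static String repair(String garbled, Charset assumed, Charset actual) {
        return new String(garbled.getBytes(assumed), actual);
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        String garbled = misdecode(s, StandardCharsets.UTF_8, gbk);
        System.out.println(garbled.length());                                     // 3
        System.out.println(repair(garbled, gbk, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

Fixing the decoding charset at the source remains the proper solution; a repair like this is at best a diagnostic aid.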
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"????","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"??????","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled; why?):
keyword = {"approveComment":"??","approveResult":"0"}
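In this last run the UTF-8 request prints six question marks and the GBK request four, one per byte, which is consistent with the server decoding the query string with a single-byte charset such as ISO-8859-1 (an assumption; the setting used for this run is not stated above). If that is the case, the String still carries every original byte, and the '?' only appear when it is printed to a console that cannot display those characters. The sketch below illustrates the well-known re-decoding workaround; Latin1Recover and its helpers are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1Recover {

    // What a server does when it treats every raw byte as one ISO-8859-1 character.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps bytes and characters one to one, so the raw bytes can be
    // recovered exactly and decoded again with the charset the client really used.
    static String recover(String latin1Decoded, Charset actual) {
        return new String(latin1Decoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // 我们
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        String fromUtf8 = decodeAsLatin1(s.getBytes(StandardCharsets.UTF_8));
        String fromGbk = decodeAsLatin1(s.getBytes(gbk));
        System.out.println(fromUtf8.length()); // 6
        System.out.println(fromGbk.length());  // 4

        System.out.println(recover(fromUtf8, StandardCharsets.UTF_8).equals(s)); // true
        System.out.println(recover(fromGbk, gbk).equals(s));                     // true
    }
}
```

The workaround only helps when the intermediate charset really is ISO-8859-1 and the client's charset is known; configuring the server to decode with the client's charset avoids the problem altogether.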