Java String Encoding Conversion and Its Use in Tomcat

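Recently, Chinese text came out garbled whenever our system was accessed from a phone in the production environment. I took the opportunity to look into character sets.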

Reference: "Java 正确的做字符串编码转换" (Doing string encoding conversion correctly in Java)

String encoding conversion

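When a Java source file is compiled, the compiler (javac) decodes the file into characters according to the file's encoding and stores string literals as Unicode. At run time a String is a sequence of Unicode (UTF-16) code units. So as long as the compiler is told the correct source encoding, the same string literal yields exactly the same Unicode data, no matter which encoding the source file was saved in.

For display, the Unicode data is converted into the encoding used by the OS (for example, the console).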
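To make "stored as Unicode" concrete, here is a minimal sketch that prints the UTF-16 code units of the test string. The class and method names are made up for illustration. The literal "\u6211\u4EEC" is "我们" written with Unicode escapes, so the source compiles to the same string whatever encoding the file is saved in.

```java
class CodeUnits {
    /** Returns the UTF-16 code units of s as upper-case hex, separated by spaces. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two-character test string used throughout this article.
        System.out.println(codeUnits("\u6211\u4EEC")); // 6211 4EEC
    }
}
```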

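GBK: a Chinese character takes 2 bytes
UTF-8: a Chinese character usually takes 3 bytes
ISO-8859-1: every character takes 1 byte, and Chinese characters cannot be represented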
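The sizes above can be checked directly. This is a small sketch with hypothetical names; "\u6211" is the character 我, and it assumes the JDK in use ships the GBK charset (full JDK distributions do).

```java
import java.nio.charset.Charset;

class EncodedLength {
    /** Number of bytes s occupies when encoded with the named charset. */
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String han = "\u6211"; // one Chinese character
        System.out.println(encodedLength(han, "GBK"));        // 2
        System.out.println(encodedLength(han, "UTF-8"));      // 3
        // ISO-8859-1 cannot represent the character; it is replaced by one '?' byte.
        System.out.println(encodedLength(han, "ISO-8859-1")); // 1
        System.out.println(encodedLength("a", "UTF-8"));      // 1
    }
}
```

Note that the single byte produced for ISO-8859-1 is not a compact form of the character. It is a replacement, and the original character is gone.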

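What getBytes does

In Java, String.getBytes() returns a byte array encoded in the JVM's default charset, which normally follows the operating system. The result can therefore differ from one machine to another!
1. str.getBytes(); with no charset argument, the default charset is used. On JDK 8 this is the value of System.getProperty("file.encoding"), which is taken from the OS locale or from the -Dfile.encoding startup option, not from the encoding of the source file. (Since JDK 18 the default is UTF-8.)

2. str.getBytes("charset"); with an explicit charset, the string's internal Unicode data is encoded into a byte array in that charset.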
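A short sketch of the difference between the two forms. The class name and the hex helper are illustrative only. The first hex line depends on the machine, which is why it carries no expected output; the other two are fixed by the charset passed in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    /** Upper-case hex dump of a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u6211\u4EEC"; // the test string from this article
        System.out.println("default charset: " + Charset.defaultCharset());
        // Depends on the default charset, so the output varies by machine.
        System.out.println(hex(s.getBytes()));
        // Explicit charsets give the same bytes everywhere.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));   // E68891E4BBAC
        System.out.println(hex(s.getBytes(Charset.forName("GBK"))));   // CED2C3C7
    }
}
```

Passing a Charset object instead of a charset name also avoids the checked UnsupportedEncodingException.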

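Garbled text

Garbled text is essentially always caused by a mismatch between the encoding the bytes were produced in and the encoding used to decode them, or by encoding into a charset that cannot represent the characters at all. For example: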

System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK
String lm = null;
lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
lm = new String("我们".getBytes("GBK"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F

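new String("我们".getBytes("ISO-8859-1"), "UTF-8") is evaluated in this order:

1. The source file is GBK, so at compile time "我们" is decoded into Unicode.

2. The Unicode string is encoded into an ISO-8859-1 byte array. ISO-8859-1 has no Chinese characters, so each one is replaced by ? (0x3F) and the array is 3F 3F.

3. The byte array is decoded as UTF-8, which gives "??".

The text is now garbled, and the original characters cannot be recovered because they were already lost in step 2.

Example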
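The steps above can be traced in code. This is a sketch with made-up names; it shows the bytes after step 2 and the string after step 3, and contrasts them with a matched encode/decode pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSteps {
    /** Upper-case hex dump of a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes s with one charset and decodes the bytes with another. */
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u6211\u4EEC"; // the test string from this article

        // Step 2: encoding to ISO-8859-1 replaces each character with '?'.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(latin1)); // 3F3F

        // Step 3: decoding those bytes as UTF-8 gives two question marks.
        System.out.println(new String(latin1, StandardCharsets.UTF_8)); // ??

        // A matched pair keeps the string intact.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original)); // true
    }
}
```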

package org.wxy.demo.test;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

import org.junit.jupiter.api.Test;

/**
 * Java 正确的做字符串编码转换
 * https://blog.csdn.net/h12kjgj/article/details/73496528
 *
 * @author wang
 */
public class CharsetTest {

	public static void main(String[] args) throws UnsupportedEncodingException {
		System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK

		System.out.println("\n============GBK============");
		String gbk = new String("我们".getBytes("GBK"), "GBK"); // back to the same Unicode string
		System.out.println(bytes2HexString(gbk.getBytes())); // CED2C3C7
		System.out.println(bytes2HexString(gbk.getBytes("UTF-8"))); // E68891E4BBAC
		String encode = URLEncoder.encode(gbk); // deprecated overload, uses the default charset
		System.out.println(gbk + "(default)\t" + encode); // %CE%D2%C3%C7
		encode = URLEncoder.encode(gbk, "GBK");
		System.out.println(gbk + "GBK\t\t" + encode); // %CE%D2%C3%C7
		encode = URLEncoder.encode(gbk, "UTF-8");
		System.out.println(gbk + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

		System.out.println("\n============UTF-8============");
		// getBytes: GBK source => Unicode => UTF-8 bytes
		String utf = new String("我们".getBytes("UTF-8"), "UTF-8");
		System.out.println(bytes2HexString(utf.getBytes())); // CED2C3C7
		System.out.println(bytes2HexString(utf.getBytes("UTF-8"))); // E68891E4BBAC
		encode = URLEncoder.encode(utf); // deprecated overload, uses the default charset
		System.out.println(utf + "(default)\t" + encode); // %CE%D2%C3%C7
		encode = URLEncoder.encode(utf, "GBK");
		System.out.println(utf + "GBK\t\t" + encode); // %CE%D2%C3%C7
		encode = URLEncoder.encode(utf, "UTF-8");
		System.out.println(utf + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

		System.out.println("\n============Garbled============");
		System.out.println("Default charset (file.encoding): " + System.getProperty("file.encoding")); // GBK
		String lm = null;
		lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
		System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
		lm = new String("我们".getBytes("GBK"), "UTF-8");
		System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F
	}

	/*
	 * Byte array to hex string
	 */
	public static String bytes2HexString(byte[] b) {
		StringBuilder r = new StringBuilder();
		for (int i = 0; i < b.length; i++) {
			String hex = Integer.toHexString(b[i] & 0xFF);
			if (hex.length() == 1) {
				hex = '0' + hex;
			}
			r.append(hex.toUpperCase());
		}
		return r.toString();
	}

	/*
	 * Hex string to byte array
	 */
	public static byte[] hexString2Bytes(String hex) {
		if ((hex == null) || (hex.equals(""))) {
			return null;
		} else if (hex.length() % 2 != 0) {
			return null;
		} else {
			hex = hex.toUpperCase();
			int len = hex.length() / 2;
			byte[] b = new byte[len];
			char[] hc = hex.toCharArray();
			for (int i = 0; i < len; i++) {
				int p = 2 * i;
				b[i] = (byte) (charToByte(hc[p]) << 4 | charToByte(hc[p + 1]));
			}
			return b;
		}
	}

	/*
	 * Hex digit character to its numeric value
	 */
	private static byte charToByte(char c) {
		return (byte) "0123456789ABCDEF".indexOf(c);
	}
}

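Tomcat

Many articles online say that Tomcat decodes GET request parameters as ISO-8859-1 by default, but the tests below suggest otherwise: the default appears to be UTF-8. This agrees with the Tomcat documentation: since Tomcat 8 the connector's URIEncoding defaults to UTF-8, while Tomcat 7 and earlier defaulted to ISO-8859-1, which is what the older articles describe. It may still depend on my local environment, so I will repeat the test on another machine later.

Environment

Win10 + JDK 8 + Eclipse (text file encoding GBK) + Tomcat 8.5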


server.xml (default)


URL-encoding the parameter as GBK, UTF-8, and ISO-8859-1:

	@Test
	public void test1() throws UnsupportedEncodingException {
		String str = "{\"approveComment\":\"我们\",\"approveResult\":\"0\"}";
		// %7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
		System.out.println(URLEncoder.encode(str, "GBK"));

		// %7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
		System.out.println(URLEncoder.encode(str, "UTF-8"));

		// %7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
		System.out.println(URLEncoder.encode(str, "ISO-8859-1"));
	}
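Before running the requests against Tomcat, the outcome can be predicted with java.net.URLDecoder. The sketch below uses URLDecoder as a stand-in for the server's query-string decoding (it is not Tomcat's own code) and decodes only the "我们" part of each encoded value with each candidate server charset. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    /** Percent-decodes a value, interpreting the bytes with the named charset. */
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u6211\u4EEC"; // the test string from this article
        // The value as encoded by the client with GBK, UTF-8, and ISO-8859-1.
        String[] sent = {"%CE%D2%C3%C7", "%E6%88%91%E4%BB%AC", "%3F%3F"};
        String[] serverCharsets = {"UTF-8", "GBK", "ISO-8859-1"};

        for (String charset : serverCharsets) {
            for (String value : sent) {
                boolean intact = original.equals(decodeAs(value, charset));
                System.out.println(charset + " <- " + value + " : "
                        + (intact ? "intact" : "garbled"));
            }
        }
    }
}
```

Only the pairs where the client and the server use the same charset, and that charset can represent the characters, come out intact. The ISO-8859-1 value is garbled in every row because it already contains literal question marks.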

Servlet:

@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
	String keyword = req.getParameter("keyword");
	System.out.println("keyword=" + keyword);

	// Set the response content type
	resp.setContentType("text/html");

	PrintWriter out = resp.getWriter();
	out.println("test");
}

1. GET request with the GBK-encoded value

http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D

Output (garbled):

keyword={"approveComment":"????","approveResult":"0"}

2. GET request with the UTF-8-encoded value

http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D

Output:

keyword = {"approveComment":"我们","approveResult":"0"}


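3. GET request with the ISO-8859-1-encoded value

http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D

Output (garbled; the value was already turned into ?? (%3F%3F) when it was encoded as ISO-8859-1, so the server has nothing left to recover):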

keyword={"approveComment":"??","approveResult":"0"}


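server.xml (UTF-8)

(The results are the same as with the default configuration. That is expected if the default URIEncoding is already UTF-8, as it is in Tomcat 8.5.)

1. GET request with the GBK-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):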
keyword = {"approveComment":"????","approveResult":"0"}


2. GET request with the UTF-8-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword = {"approveComment":"我们","approveResult":"0"}


3. GET request with the ISO-8859-1-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):

keyword = {"approveComment":"??","approveResult":"0"}


server.xml (GBK)

1. GET request with the GBK-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword = {"approveComment":"我们","approveResult":"0"}


2. GET request with the UTF-8-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"鎴戜滑","approveResult":"0"}


3. GET request with the ISO-8859-1-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):

keyword = {"approveComment":"??","approveResult":"0"}


server.xml (ISO-8859-1)

1. GET request with the GBK-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"????","approveResult":"0"}


2. GET request with the UTF-8-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword = {"approveComment":"??????","approveResult":"0"}


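3. GET request with the ISO-8859-1-encoded value
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled; the value was already turned into ?? (%3F%3F) when it was encoded as ISO-8859-1, so the server has nothing left to recover):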
keyword = {"approveComment":"??","approveResult":"0"}
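In the ISO-8859-1 configuration, case 2 is different from the other garbled cases: the UTF-8 bytes were not lost, only misread, because ISO-8859-1 maps every byte value to a character. A common workaround, sketched here with hypothetical names and with URLDecoder standing in for the server, is to turn the string back into bytes with ISO-8859-1 and decode those bytes as UTF-8. It only applies when the server really decoded the value as ISO-8859-1 and the client sent UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class Latin1Recovery {
    /** Re-reads a string that was wrongly decoded as ISO-8859-1 as UTF-8. */
    static String redecodeAsUtf8(String decodedAsLatin1) {
        // ISO-8859-1 maps chars 0x00-0xFF back to the same byte values.
        byte[] rawBytes = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What a servlet would see if the UTF-8 value were decoded as ISO-8859-1.
        String seenByServlet = URLDecoder.decode("%E6%88%91%E4%BB%AC", "ISO-8859-1");
        System.out.println(seenByServlet.length()); // 6

        String repaired = redecodeAsUtf8(seenByServlet);
        System.out.println(repaired.equals("\u6211\u4EEC")); // true
    }
}
```

The same trick does not help in cases 1 and 3: the GBK bytes are not valid UTF-8, and the ISO-8859-1 value never contained the characters in the first place.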
