Fixing garbled characters in the URL returned by this.class.getResource(), with a source-code walkthrough:

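The problem:

When I fetched this file and printed its path, the path came out garbled. I then tried using the JDK's file.encoding charset to convert the path to a byte array and decode that byte array with the same charset, but the result was still garbled. (The source-code walkthrough further down explains why.)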

  String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
  System.out.println(path);

  // Try decoding with the system encoding (UTF-8); this does not help either
  String encode = System.getProperties().getProperty("file.encoding");
  System.out.println(encode);
  path = new String(path.getBytes(encode),encode);
  System.out.println(path);

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
UTF-8
/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
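
Why the round trip changes nothing: the string returned by getPath() contains only ASCII characters (the Chinese characters have already been replaced by %xy escapes), so encoding it to bytes and decoding it with the same charset gives back the identical string. A minimal sketch, using a hard-coded path copied from the output above instead of a real resource lookup; the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode the string to bytes and decode it again with the same charset.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True when every char is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(ch -> ch < 128);
    }

    public static void main(String[] args) {
        String path = "/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt";
        System.out.println(isAscii(path));                // true
        System.out.println(roundTrip(path).equals(path)); // true
    }
}
```

The escapes are a URL-encoding layer on top of the text, not a charset mismatch, so they have to be undone with a URL decoder.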

Solution:

String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
path = URLDecoder.decode(path,"utf-8");

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/中国.txt

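The reason is that ClassLoader's getResource method returns a URL whose path has been percent-encoded using UTF-8. Characters such as Chinese characters and spaces are replaced by %xy escapes, which look garbled when printed. Decoding the path with URLDecoder's decode method restores the original path, including the Chinese characters and spaces.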
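
One caveat worth knowing: URLDecoder implements application/x-www-form-urlencoded decoding, so it also turns a literal + into a space, which can corrupt a path that really contains a +. A possible alternative is to let java.net.URI do the decoding, since URI.getPath() decodes %xy escapes as UTF-8 and leaves + alone. The sketch below compares the two on a hard-coded, hypothetical path rather than an actual resource lookup; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class PathDecodeDemo {
    // Form-style decoding: %xy escapes are decoded, and '+' becomes a space.
    static String viaUrlDecoder(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // URI-style decoding: %xy escapes are decoded as UTF-8, '+' is kept.
    static String viaUri(String rawUrl) {
        return URI.create(rawUrl).getPath();
    }

    public static void main(String[] args) {
        String rawPath = "/D:/classes/template/%E4%B8%AD%E5%9B%BD%20a+b.txt";
        System.out.println(viaUrlDecoder(rawPath));    // ends with "中国 a b.txt"
        System.out.println(viaUri("file:" + rawPath)); // ends with "中国 a+b.txt"
    }
}
```

With a real resource, calling toURI() on the URL returned by getResource should give a comparable URI for file: URLs; resources packed inside a JAR behave differently and are outside the scope of this sketch.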

 

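Source-code walkthrough:

Here is the source of URLDecoder.decode(path,"utf-8"). It is mainly responsible for handling sequences such as %e4%b8%ad%e5%9b%bd that appear when Chinese characters are encoded.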

/**
 * Decodes a {@code application/x-www-form-urlencoded} string using a specific
 * encoding scheme.
 * The supplied encoding is used to determine
 * what characters are represented by any consecutive sequences of the
 * form "{@code %xy}".
 *
 * Note: The World Wide Web Consortium Recommendation states that
 * UTF-8 should be used. Not doing so may introduce
 * incompatibilities.
 *
 * @param s the {@code String} to decode
 * @param enc The name of a supported character encoding.
 * @return the newly decoded {@code String}
 * @exception UnsupportedEncodingException
 *            If character encoding needs to be consulted, but
 *            named character encoding is not supported
 * @see URLEncoder#encode(java.lang.String, java.lang.String)
 * @since 1.4
 */
public static String decode(String s, String enc)
    throws UnsupportedEncodingException {

    boolean needToChange = false;
    int numChars = s.length();
    StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
    int i = 0;

    if (enc.length() == 0) {
        throw new UnsupportedEncodingException("URLDecoder: empty string enc parameter");
    }

    char c;
    byte[] bytes = null;
    while (i < numChars) {
        c = s.charAt(i);
        switch (c) {
        case '+':
            sb.append(' ');
            i++;
            needToChange = true;
            break;
        case '%':
            /*
             * Starting with this instance of %, process all
             * consecutive substrings of the form %xy. Each
             * substring %xy will yield a byte. Convert all
             * consecutive bytes obtained this way to whatever
             * character(s) they represent in the provided
             * encoding.
             */
            try {
                // (numChars-i)/3 is an upper bound for the number
                // of remaining bytes
                if (bytes == null)
                    bytes = new byte[(numChars-i)/3];
                int pos = 0;

                while ( ((i+2) < numChars) && (c=='%')) {
                    // Parse the two characters at positions i+1 and i+2 as a hexadecimal integer
                    int v = Integer.parseInt(s.substring(i+1,i+3),16);
                    if (v < 0)
                        throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
                    bytes[pos++] = (byte) v;
                    i+= 3;
                    if (i < numChars)
                        c = s.charAt(i);
                }

                // A trailing, incomplete byte encoding such as
                // "%x" will cause an exception to be thrown
                if ((i < numChars) && (c=='%'))
                    throw new IllegalArgumentException(
                        "URLDecoder: Incomplete trailing escape (%) pattern");

                // Decode the bytes collected from the hex escapes using enc (utf-8 in our case)
                sb.append(new String(bytes, 0, pos, enc));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "URLDecoder: Illegal hex characters in escape (%) pattern - "
                    + e.getMessage());
            }
            needToChange = true;
            break;
        default:
            sb.append(c);
            i++;
            break;
        }
    }

    return (needToChange? sb.toString() : s);
}
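
To see the branches of this method in action, here is a small sketch that feeds it a few hand-made inputs. The helper name tryDecode and the sample strings are made up for illustration; the expected behavior follows from the source quoted above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeBranchesDemo {
    // Returns the decoded text, or a short "error: ..." marker if decode rejects the input.
    static String tryDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException e) {
            return "error: IllegalArgumentException";
        } catch (UnsupportedEncodingException e) {
            return "error: UnsupportedEncodingException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("a+b"));       // '+' branch: prints "a b"
        System.out.println(tryDecode("%e4%b8%ad")); // '%' branch: three bytes decoded as one character
        System.out.println(tryDecode("plain.txt")); // default branch: returned unchanged
        System.out.println(tryDecode("%e4%b8%a"));  // incomplete trailing escape: error
        System.out.println(tryDecode("%zz"));       // non-hex digits: error
    }
}
```

Note that malformed escapes surface as IllegalArgumentException, an unchecked exception, so callers decoding untrusted paths may want to catch it explicitly.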

Breaking the source down:

1. Extract the hex digits marked by % and convert them into a byte array

String str1 = "%e4%b8%ad";
int i = 0;
int j = 0;
byte[] bb = new byte[3];
while ((i + 2 < str1.length()) && (str1.charAt(i) == '%')) {
    // Take the two hex digits that follow the % sign
    String hex = str1.substring(i + 1, i + 3);
    // Parse the hex digits into an int (for example, "e4" becomes 228)
    int i1 = Integer.parseInt(hex, 16);
    // Narrow the int to a byte and store it in the byte array
    bb[j] = (byte) i1;
    j++;
    i += 3;
}
// The array now holds 3 bytes; decode them as utf-8 into a string
String s11 = new String(bb, "utf-8");
System.out.println(s11);

Output: 中

2. Convert the byte array into a string

byte[] ss = new byte []{(byte) 228,(byte) 184,(byte) 173};
String s = new String(ss, "utf-8");
System.out.println(s);

Output: 中
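
Going the other way shows where the escapes come from. The sketch below is a hand-rolled illustration, not the JDK's actual encoder, and the method name toPercentEscapes is hypothetical. It writes every UTF-8 byte of a string as a %xy escape, whereas a real URL encoder leaves safe ASCII characters untouched.

```java
import java.nio.charset.StandardCharsets;

class PercentEscapeDemo {
    // Writes each UTF-8 byte of s as a lowercase %xy escape.
    static String toPercentEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask with 0xff so that negative bytes print as two hex digits.
            sb.append('%').append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u4e2d\u56fd" is the string 中国, written with Unicode escapes.
        System.out.println(toPercentEscapes("\u4e2d\u56fd")); // %e4%b8%ad%e5%9b%bd
    }
}
```

The result matches the escaped file name in the printed path, which confirms that the path was encoded from UTF-8 bytes.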
