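Does encoding conversion lose information?
This is a question worth settling. Based on what I have found so far, the answer is yes, information can be lost. The reasoning is as follows: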
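// Requires java.net.URLEncoder and java.net.URLDecoder; both calls can throw UnsupportedEncodingException.
String m = URLEncoder.encode("聶", "iso-8859-1"); // 聶 has no ISO-8859-1 mapping, so it is encoded as '?'
System.out.println(m); // prints %3F
String g = URLDecoder.decode(m, "gbk"); // decoding cannot bring the character back
System.out.println(g); // prints ?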
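The encoding step reduces the character to a single '?' byte, because ISO-8859-1 has no mapping for it. That information is lost, so the original can no longer be restored.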
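The same loss can be observed directly at the byte level, without URL encoding. The following is a minimal sketch: the class name EncodingLossDemo and the helper roundTrip are illustrative names that do not appear in the original notes, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingLossDemo {
    // Hypothetical helper: encode text with one charset, decode the bytes with another.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed to be available in this JDK
        String original = "\u8076";           // the character 聶

        // ISO-8859-1 cannot represent the character, so getBytes substitutes '?' (byte 63).
        byte[] lossy = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(lossy)); // prints [63]

        // Encoding with the wrong charset destroys the character for good.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1, gbk).equals("?")); // prints true

        // Encoding and decoding with the same capable charset is lossless.
        System.out.println(roundTrip(original, gbk, gbk).equals(original)); // prints true
    }
}
```

Once the encoder has substituted '?', no choice of decoder can recover the character, because the bytes that distinguished it no longer exist.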
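Eclipse, on the other hand, is said to lose no information when you switch the encoding used for display. That is because no encoding step is involved, only decoding. However you switch, the original bytes stay unchanged, and all you are doing is looking for a suitable way to decode them.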
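// Requires java.net.URLEncoder and java.net.URLDecoder; both calls can throw UnsupportedEncodingException.
String mk = URLEncoder.encode("聶", "gbk"); // both GBK bytes survive as %XX%XX
String i = URLDecoder.decode(mk, "iso-8859-1"); // wrong charset: the result is garbled
System.out.println("i = " + i);
String ik = URLDecoder.decode(mk, "gbk"); // same bytes, correct charset
System.out.println("ik = " + ik); // prints ik = 聶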
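The point that decoding alone does not damage the underlying bytes can be sketched as follows. The class name DecodeOnlyDemo and the helper redecode are illustrative, not from the original notes, and the GBK charset is assumed to be available. This in-memory version works because ISO-8859-1 maps every byte value to exactly one character, so the raw bytes can be recovered from the garbled string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeOnlyDemo {
    // Hypothetical helper: undo a wrong decoding by recovering the raw bytes
    // and decoding them again with the right charset.
    public static String redecode(String wronglyDecoded, Charset wrong, Charset right) {
        return new String(wronglyDecoded.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed to be available in this JDK
        String original = "\u8076";           // the character 聶

        byte[] raw = original.getBytes(gbk);  // the stored bytes; they never change below

        // A wrong view of the bytes: garbled, but every byte still maps to one char.
        String garbled = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original)); // prints false

        // The garbled string still carries the original bytes unchanged.
        System.out.println(Arrays.equals(garbled.getBytes(StandardCharsets.ISO_8859_1), raw)); // prints true

        // Choosing the right decoder afterwards restores the character.
        System.out.println(redecode(garbled, StandardCharsets.ISO_8859_1, gbk).equals(original)); // prints true
    }
}
```

The Eclipse case is simpler still: the file on disk is never rewritten, so each change of display encoding decodes the same bytes afresh.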
Added 2010.06.08:
In the world of the Java runtime, garbled text (compile-time garbling is not covered here) originates in two places, which correspond to the two functions I mentioned above. (Sometimes a framework calls one of them for us, so what you receive is already a String converted from the byte array that arrived over the network.)
1. A file saved in one encoding is read and parsed with another.
This always produces garbled text, and it is what we frequently see
when opening a file in the operating system.
2. A byte stream received over the network is decoded with the wrong encoding,
so the resulting Unicode string is wrong.
3. A correct Unicode string is encoded with an encoding that does not match the console's
and is sent to the console, where it is displayed garbled.
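The cases in this list can be simulated in memory. The sketch below is only an illustration under stated assumptions: the class GarbleSourcesDemo and its helpers are hypothetical names, the "console" is a byte buffer read back with a chosen charset rather than a real terminal, and the GBK charset is assumed to be available.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GarbleSourcesDemo {
    // Cases 1 and 2: bytes produced in one encoding are decoded with an assumed one.
    public static String decodeWith(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    // Case 3: a correct string is written through a PrintStream using programCharset,
    // while the simulated console interprets the bytes as consoleCharset.
    public static String shownOnConsole(String text, String programCharset, Charset consoleCharset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, programCharset);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + programCharset, e);
        }
        return new String(sink.toByteArray(), consoleCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed to be available in this JDK
        String text = "\u8076";               // the character 聶

        // File or network bytes written as UTF-8 but read as GBK: garbled.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeWith(utf8Bytes, gbk).equals(text)); // prints false

        // Program writes UTF-8, console expects GBK: garbled.
        System.out.println(shownOnConsole(text, "UTF-8", gbk).equals(text)); // prints false

        // Program and console agree on GBK: displayed correctly.
        System.out.println(shownOnConsole(text, "GBK", gbk).equals(text)); // prints true
    }
}
```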
from:
http://www.codeweblog.com/java-depth-analysis-of-the-character-encoding/