Does encoding conversion lose information?

 


    That is the question. From what I have found so far, the answer is yes: information can be lost. Here is why:

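// Needs java.net.URLEncoder and java.net.URLDecoder; both calls throw the
// checked UnsupportedEncodingException.

// ISO-8859-1 has no mapping for "聶", so the encoder substitutes '?' (byte 0x3F)
String m = URLEncoder.encode("聶", "iso-8859-1");
System.out.println(m); // %3F

// Decoding %3F as GBK can only give back "?"; the original bytes are gone
String g = URLDecoder.decode(m, "gbk");
System.out.println(g); // ?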

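 After the encoding step, the character has been reduced to the single byte 0x3F ('?'). The original information is lost, so it can never be restored.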

 

 

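By contrast, switching the display encoding in Eclipse does not lose information, because no encoding step takes place, only decoding. However often you switch, nothing is lost: the only purpose is to find a decoding that suits the bytes, and the originally encoded bytes never change.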

 

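// Encode once with GBK; mk holds the percent-escaped GBK bytes
String mk = URLEncoder.encode("聶", "gbk");

// Decoding those bytes as ISO-8859-1 gives garbled text, but mk itself is untouched
String i = URLDecoder.decode(mk, "iso-8859-1");
System.out.println("i = " + i);

// Decoding the same bytes as GBK restores the original character
String ik = URLDecoder.decode(mk, "gbk");
System.out.println("ik = " + ik); // ik = 聶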
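The same point can be made without URL escaping, working directly on a byte array. The following is only a minimal sketch: the class name DecodeOnlyDemo and the helper view are made up for illustration, and it assumes the JVM ships the GBK charset. It decodes one set of bytes in two ways and checks that the bytes themselves are never altered.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DecodeOnlyDemo {
    // Hypothetical helper: decode the same bytes with the named charset.
    // Decoding builds a new String and never modifies the input array.
    static String view(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u8076"; // the character 聶
        byte[] raw = original.getBytes(Charset.forName("GBK")); // stands in for the bytes on disk
        byte[] snapshot = raw.clone();

        System.out.println("as ISO-8859-1: " + view(raw, "ISO-8859-1")); // garbled
        System.out.println("as GBK:        " + view(raw, "GBK"));        // the original character
        System.out.println("bytes unchanged: " + Arrays.equals(raw, snapshot));
    }
}
```

Because only the interpretation changes, trying a wrong charset costs nothing: the right one can always be tried next.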

 

 

Addendum, 2010.06.08:

Summary: where garbled text comes from


In the Java runtime (compile-time cases aside), garbled text originates in two places, namely the two functions mentioned above. Sometimes a framework calls one of them for us, so what we receive is already a String converted from a byte array that arrived over the network.

  • getBytes(String charset): encodes a unicode String with the specified charset. If that charset (for example, iso-8859-1) has no such character, the character is encoded as 0x3F (a question mark). The information is lost and cannot be restored.
  • new String(byte[] bytes, String charset): decodes a byte array with the specified charset. If some bytes are not valid in that charset (for example, part of a byte array that is not well-formed UTF-8 is decoded as UTF-8), the resulting unicode string contains "\uFFFD" at that position. This is the 'REPLACEMENT CHARACTER', which is usually displayed as a question mark.
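Both replacement behaviours can be observed in a few lines. This is a rough sketch, not the original author's code: the class name ReplacementDemo and its two helpers are hypothetical, and the lone byte 0xFF is just one convenient example of input that is never valid in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Encoding: a character the target charset cannot represent becomes '?' (0x3F).
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decoding: a byte sequence that is malformed in the charset becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeLatin1("\u8076"); // the character 聶
        System.out.printf("%02X%n", encoded[0]); // 3F

        String decoded = decodeUtf8(new byte[] {(byte) 0xFF}); // 0xFF never occurs in UTF-8
        System.out.println(decoded.equals("\uFFFD")); // true
    }
}
```

In both directions the substitution is silent: no exception is thrown, which is why the damage is often noticed only when the text is displayed.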

    As a result, garbled text typically appears in the following situations:

         1. A file written in one encoding is read and parsed with another.
             This is certain to produce garbled text, and it is what we frequently
             see when opening a file in the operating system.
         2. A byte stream received over the network is decoded with the wrong encoding,
             so the resulting unicode string is wrong.
         3. A unicode string is encoded correctly, but with an encoding that differs from
             the console's, and is then sent to the console for display. It will be garbled.
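The first situation is easy to reproduce with a scratch file. The sketch below is an illustration under stated assumptions: the class name WrongCharsetRead, the helper writeThenRead, and the temp-file prefix are all made up, and GBK is assumed to be available in the JVM. The second situation behaves the same way, with a network stream in place of the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WrongCharsetRead {
    // Hypothetical helper: write text in one charset, then read the file back in another.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt"); // scratch file
            try {
                Files.write(tmp, text.getBytes(writeAs));
                return new String(Files.readAllBytes(tmp), readAs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String text = "\u8076"; // the character 聶

        // Same charset on both sides: the text survives.
        System.out.println(text.equals(writeThenRead(text, gbk, gbk)));                    // true
        // GBK bytes parsed as UTF-8: the result is garbled.
        System.out.println(text.equals(writeThenRead(text, gbk, StandardCharsets.UTF_8))); // false
    }
}
```

The file on disk is identical in both calls; only the charset used by the reader differs.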

 

from:

http://www.codeweblog.com/java-depth-analysis-of-the-character-encoding/

 

 
