Demonstrating Character Set Encodings

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/**
 * Charset encoding test.  Run the same input string, which contains
 * some non-ascii characters, through several Charset encoders and dump out
 * the hex values of the resulting byte sequences.
 */
public class EncodeTest {
    public static void main(String[] args) {
        // This is the character sequence to encode
        String input = "\u00bfMa\u00f1ana?";
        String[] charsetNames = {
                "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE",
                "UTF-16LE", "UTF-16"
        };
        // Encode the same input once per charset
        for (String charsetName : charsetNames) {
            doEncode(Charset.forName(charsetName), input);
        }
    }

    private static void doEncode(Charset cs, String input) {
        ByteBuffer bb = cs.encode(input);
        System.out.println("Charset: " + cs.name());
        System.out.println("  input :" + input);
        System.out.println("Encoded: " );
        for (int i = 0; bb.hasRemaining(); i++) {
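            // Mask the signed byte to an unsigned value in the range 0..255
            int ival = bb.get() & 0xff;
            // Treat that value as a character code purely for display
            char c = (char) ival;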
            // Keep tabular alignment pretty
            if (i < 10) System.out.print(" ");
            // Print index number
            System.out.print("  " + i + ": ");
            // Better formatted output is coming someday...
            if (ival < 16) System.out.print("0");
            // Print the hex value of  the byte
            System.out.print(Integer.toHexString(ival));
            // If the byte seems to be the value of a
            // printable character, print it.  No guarantee
            // it will be.
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                System.out.println("");
            } else {
                System.out.println(" (" + c + ")");
            }
        }
        System.out.println("");
    }
}


Output
Charset: US-ASCII
  input :¿Mañana?
Encoded: 
   0: 3f (?)
   1: 4d (M)
   2: 61 (a)
   3: 3f (?)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: ISO-8859-1
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 4d (M)
   2: 61 (a)
   3: f1 (ñ)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: UTF-8
  input :¿Mañana?
Encoded: 
   0: c2 (Â)
   1: bf (¿)
   2: 4d (M)
   3: 61 (a)
   4: c3 (Ã)
   5: b1 (±)
   6: 61 (a)
   7: 6e (n)
   8: 61 (a)
   9: 3f (?)

Charset: UTF-16BE
  input :¿Mañana?
Encoded: 
   0: 00
   1: bf (¿)
   2: 00
   3: 4d (M)
   4: 00
   5: 61 (a)
   6: 00
   7: f1 (ñ)
   8: 00
   9: 61 (a)
  10: 00
  11: 6e (n)
  12: 00
  13: 61 (a)
  14: 00
  15: 3f (?)

Charset: UTF-16LE
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 00
   2: 4d (M)
   3: 00
   4: 61 (a)
   5: 00
   6: f1 (ñ)
   7: 00
   8: 61 (a)
   9: 00
  10: 6e (n)
  11: 00
  12: 61 (a)
  13: 00
  14: 3f (?)
  15: 00

Charset: UTF-16
  input :¿Mañana?
Encoded: 
   0: fe (þ)
   1: ff (ÿ)
   2: 00
   3: bf (¿)
   4: 00
   5: 4d (M)
   6: 00
   7: 61 (a)
   8: 00
   9: f1 (ñ)
  10: 00
  11: 61 (a)
  12: 00
  13: 6e (n)
  14: 00
  15: 61 (a)
  16: 00
  17: 3f (?)


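UTF-16BE and UTF-16LE encode each 16-bit character value as a 2-byte quantity.
A decoder for such data must therefore either know in advance how the data was
encoded or have a way to determine the byte order from the encoded stream
itself. The UTF-16 encoding recognizes a byte-order mark: the Unicode character
\uFEFF. The byte-order mark has its special meaning only when it occurs at the
beginning of an encoded stream. If the value is encountered later, it is mapped
according to its defined Unicode value (zero-width no-break space). A foreign,
little-endian system might prepend \uFEFF, which then appears in the stream as
the bytes FF FE, and encode the stream as UTF-16LE. Using the UTF-16 encoding
to prepend and recognize the byte-order mark lets systems with different
internal byte orders exchange Unicode data.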
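
The decoding side of this can be sketched as follows. This is a minimal illustration, not part of the original listing: the class name BomDecodeSketch and the helper decode are made up for the example, and the byte values are hand-written encodings of "Ma". It relies on the behavior documented for java.nio.charset.Charset, where the UTF-16 decoder consumes a leading byte-order mark and falls back to big-endian when none is present, while UTF-16LE treats the mark as an ordinary character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDecodeSketch {
    // Hypothetical helper: decode raw byte values with the given charset.
    static String decode(Charset cs, int... values) {
        byte[] raw = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            raw[i] = (byte) values[i];
        }
        return cs.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) {
        // "Ma" preceded by a big-endian mark (FE FF)
        String big = decode(StandardCharsets.UTF_16,
                0xfe, 0xff, 0x00, 0x4d, 0x00, 0x61);
        // "Ma" preceded by a little-endian mark (FF FE)
        String little = decode(StandardCharsets.UTF_16,
                0xff, 0xfe, 0x4d, 0x00, 0x61, 0x00);
        // No mark at all: the UTF-16 decoder assumes big-endian
        String unmarked = decode(StandardCharsets.UTF_16,
                0x00, 0x4d, 0x00, 0x61);
        // UTF-16LE does not consume the mark; it stays as \uFEFF
        String kept = decode(StandardCharsets.UTF_16LE,
                0xff, 0xfe, 0x4d, 0x00, 0x61, 0x00);

        System.out.println(big);            // Ma
        System.out.println(little);         // Ma
        System.out.println(unmarked);       // Ma
        System.out.println(kept.length());  // 3
    }
}
```

The last case is why a decoder that already knows the byte order should be given data without a mark: otherwise the mark ends up in the decoded text as a leading \uFEFF character.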

UTF-16BE: no byte-order mark; big-endian (high-order byte first)
UTF-16LE: no byte-order mark; little-endian (low-order byte first)
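
To see the difference on the encoding side, the short sketch below encodes a single character with all three UTF-16 variants. The class name Utf16MarkSketch and the helper hex are hypothetical names chosen for this illustration; String.getBytes(Charset) and StandardCharsets are standard JDK APIs.

```java
import java.nio.charset.StandardCharsets;

public class Utf16MarkSketch {
    // Hypothetical helper: render bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String input = "M";
        // Big-endian, no mark
        System.out.println(hex(input.getBytes(StandardCharsets.UTF_16BE))); // 00 4d
        // Little-endian, no mark
        System.out.println(hex(input.getBytes(StandardCharsets.UTF_16LE))); // 4d 00
        // Plain UTF-16: big-endian mark first, then big-endian data
        System.out.println(hex(input.getBytes(StandardCharsets.UTF_16)));   // fe ff 00 4d
    }
}
```

This matches the output listing above, where the UTF-16 result is two bytes longer than the UTF-16BE and UTF-16LE results and starts with fe ff.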


For more information, see Chapter 6 of Java NIO, published by O'Reilly.
