The BOM header has to be stripped

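  When dom4j parses XML and the input starts with a BOM (byte order mark), it fails with errors that seem to make no sense. The fix is posted below.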
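
The symptom can be reproduced without dom4j. The following is a minimal sketch, assuming the SAX parser bundled with the JDK; the class name BomRepro and the method rejects are made up for illustration. When the document reaches the parser as characters, the BOM arrives as U+FEFF ahead of the first "<", and the parser is expected to reject it with a SAXParseException.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of dom4j or of the original post.
final class BomRepro {
    /** Returns true if the parser rejects the text with a SAXParseException. */
    static boolean rejects(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Same document, without and with a leading U+FEFF.
        System.out.println("plain:    rejected=" + rejects("<root/>"));
        System.out.println("with BOM: rejected=" + rejects("\uFEFF<root/>"));
    }
}
```

The exact message text depends on the parser and locale, so the sketch only checks the exception type.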

Byte signatures (BOMs) found at the start of a text file:

00 00 FE FF UTF-32, big-endian
FF FE 00 00 UTF-32, little-endian
FE FF UTF-16, big-endian
FF FE UTF-16, little-endian
EF BB BF UTF-8
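
The table above can be expressed as a small lookup. This is only a sketch: the class name BomSniffer and the method detect are hypothetical. It tests the longer signatures first, because the UTF-32 little-endian BOM (FF FE 00 00) begins with the same two bytes as the UTF-16 little-endian BOM (FF FE).

```java
// Hypothetical helper that maps leading bytes to the encoding named by a BOM.
final class BomSniffer {
    // Longest signatures first, so FF FE 00 00 is tested before FF FE.
    private static final int[][] SIGNATURES = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF},
        {0xFE, 0xFF},
        {0xFF, 0xFE},
    };
    private static final String[] NAMES = {
        "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE",
    };

    /** Returns the encoding indicated by a leading BOM, or null if none matches. */
    static String detect(byte[] head) {
        for (int s = 0; s < SIGNATURES.length; s++) {
            int[] sig = SIGNATURES[s];
            if (head.length < sig.length) {
                continue; // not enough bytes for this signature
            }
            boolean match = true;
            for (int i = 0; i < sig.length; i++) {
                // Mask to compare as unsigned, since Java bytes are signed.
                if ((head[i] & 0xFF) != sig[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return NAMES[s];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<'};
        byte[] utf32le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, '<', 0x00};
        System.out.println(detect(utf8));    // UTF-8
        System.out.println(detect(utf32le)); // UTF-32LE
        System.out.println(detect(utf16le)); // UTF-16LE
    }
}
```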

 

    Fix for the "Content is not allowed in prolog." exception raised during SAX parsing:

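import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Returns a Reader over the stream with any leading BOM already consumed.
public Reader getReader(InputStream is) throws IOException {
    // The pushback buffer must hold at least the 3 bytes that may be unread.
    PushbackInputStream pis = new PushbackInputStream(is, 1024);
    String bomEncoding = getBOMEncoding(pis);
    // No BOM found: fall back to UTF-8.
    String charset = (bomEncoding == null) ? "UTF-8" : bomEncoding;
    return new BufferedReader(new InputStreamReader(pis, charset));
}

// Consumes a BOM if one is present and returns the matching charset name,
// or returns null and leaves the stream untouched if there is no BOM.
// Only the UTF-8 and UTF-16 signatures are checked; UTF-32 is not detected.
protected String getBOMEncoding(PushbackInputStream is) throws IOException {
    String encoding = null;
    int[] bytes = new int[3];
    bytes[0] = is.read();
    bytes[1] = is.read();
    bytes[2] = is.read();
    if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
        encoding = "UTF-16BE";
        // The third byte is content, not BOM. Never unread the EOF marker (-1).
        if (bytes[2] != -1) {
            is.unread(bytes[2]);
        }
    }
    else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
        encoding = "UTF-16LE";
        if (bytes[2] != -1) {
            is.unread(bytes[2]);
        }
    }
    else if (bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        encoding = "UTF-8";
    }
    else {
        // No BOM: push everything back in reverse order, skipping EOF markers.
        for (int i = bytes.length - 1; i >= 0; i--) {
            if (bytes[i] != -1) {
                is.unread(bytes[i]);
            }
        }
    }
    return encoding;
}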
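
To see the effect end to end, the sketch below pushes a BOM-prefixed document through the same pushback technique and on to a SAX parser. It rests on several assumptions: it uses the JDK's built-in javax.xml.parsers.SAXParser instead of dom4j so that it runs without extra dependencies, it handles only the UTF-8 BOM, and the class BomStripDemo with its two helper methods is hypothetical. With dom4j, the resulting Reader could be handed to SAXReader.read(Reader) instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; only the UTF-8 BOM is handled here.
final class BomStripDemo {
    /** Skips a UTF-8 BOM if present and returns a UTF-8 Reader over the rest. */
    static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        PushbackInputStream pis = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int b = pis.read();
            if (b == -1) {
                break; // stream is shorter than a BOM
            }
            head[n++] = (byte) b;
        }
        boolean bom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pis.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return new InputStreamReader(pis, StandardCharsets.UTF_8);
    }

    /** Parses the stream and returns the name of the root element. */
    static String rootName(InputStream in) throws Exception {
        final String[] root = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(utf8ReaderWithoutBom(in)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName; // remember only the first element
                    }
                }
            });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "\uFEFF<?xml version=\"1.0\"?><config><item/></config>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(new ByteArrayInputStream(xml)));
    }
}
```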

 
