Java读带有BOM的UTF-8文件乱码解决方法

Java default io reader does not recognize all BOM markers. It it known to be fixed in JDK6, but I havent tested it yet. You can use UnicodeReader class to overcome problems and auto-recognize bom markers. It will give a transparent behaviour to underlying inputstreams.

Example code using UnicodeReader class
Here is an example method to read text file. It will recognize bom marker and skip it while reading.


   public static char[] loadFile(String file) throws IOException {
      // read text file, auto recognize bom marker or use
      // system default if markers not found.
      BufferedReader reader = null;
      CharArrayWriter writer = null;
      UnicodeReader r = new UnicodeReader(new FileInputStream(file), null);

      char[] buffer = new char[16 * 1024];   // 16k buffer
      int read;
      try {
         reader = new BufferedReader(r);
         writer = new CharArrayWriter();
         while( (read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
         }
         writer.flush();
         return writer.toCharArray();
      } catch (IOException ex) {
         throw ex;
      } finally {
         try {
            writer.close(); reader.close(); r.close();
         } catch (Exception ex) { }
      }
   }

Example code to write UTF-8 with bom marker
Write bom marker bytes to start of empty file and all proper text editors have no problems using a correct charset while reading files. Java's OutputStreamWriter does not write utf8 bom marker bytes.


   public static void saveFile(String file, String data, boolean append) throws IOException {
      BufferedWriter bw = null;
      OutputStreamWriter osw = null;

      File f = new File(file);
      FileOutputStream fos = new FileOutputStream(f, append);
      try {
         // write UTF8 BOM mark if file is empty
         if (f.length() < 1) {
            final byte[] bom = new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF };
            fos.write(bom);
         }

         osw = new OutputStreamWriter(fos, "UTF-8");
         bw = new BufferedWriter(osw);
         if (data != null) bw.write(data);
      } catch (IOException ex) {
         throw ex;
      } finally {
         try { bw.close(); fos.close(); } catch (Exception ex) { }
      }
   }

你可能感兴趣的:(java,F#)