Reading text files in UTF-8, Unicode, and other character encodings with Java

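When Java reads a data stream as text (strings), always specify the stream's encoding; otherwise the platform's default charset is used, which may not match the file.
To read a file written in a specific character encoding, pass the matching charset name to the reader, for example UTF-8, UTF-16, UNICODE, GBK, GB2312, ISO-8859-1, or Big5.
Sample code: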
// f is the file to read; instead of "UTF-8" you can pass e.g. "UNICODE" or "UTF-16"
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(f), "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
} // try-with-resources closes the reader and the underlying file stream
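
To see why the charset passed to InputStreamReader matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once with the right charset and once with the wrong one. The class name DecodeDemo and the helper firstLine are made up for this illustration, and an in-memory byte array stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Hypothetical helper: reads the first line of the given bytes,
    // decoding them with the given charset.
    public static String firstLine(byte[] data, Charset cs) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), cs))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes, so the encoding
        // of this source file itself does not affect the result.
        String text = "\u4e2d\u6587";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // 3 bytes per character

        String right = firstLine(utf8, StandardCharsets.UTF_8);
        String wrong = firstLine(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(text)); // true
        System.out.println(wrong.equals(text)); // false
        System.out.println(wrong.length());     // 6: every byte became one character
    }
}
```

Decoding with ISO-8859-1 does not fail; it silently turns each byte into a separate character, which is why a wrong charset usually shows up as garbled text rather than as an exception.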
 

 

The following code converts a GB2312 file to a UTF-8 file:
import java.io.*;

public class inputtest {
 
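  public static void main(String[] args) {
    // Expect two arguments: the input file and the output file.
    if (args.length < 2) {
      System.err.println("Usage: java inputtest <infile> <outfile>");
      System.exit(1);
    }
    try { convert(args[0], args[1], "GB2312", "UTF-8"); } // the source encoding could also be "Big5"
    catch (Exception e) {
      System.err.println(e.getMessage());
      System.exit(1);
    }
  }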

  public static void convert(String infile, String outfile, String from, String to)
       throws IOException // also covers UnsupportedEncodingException, a subclass of IOException
  {
    // set up byte streams
    InputStream in;
    if (infile != null) in = new FileInputStream(infile);
    else in = System.in;
    OutputStream out;
    if (outfile != null) out = new FileOutputStream(outfile);
    else out = System.out;

    // Use default encoding if no encoding is specified.
    if (from == null) from = System.getProperty("file.encoding");
    if (to == null) to = System.getProperty("file.encoding");

    // Set up character streams. try-with-resources closes both of them
    // (close() flushes the writer first) even if the copy fails part-way.
    try (Reader r = new BufferedReader(new InputStreamReader(in, from));
         Writer w = new BufferedWriter(new OutputStreamWriter(out, to))) {

      // Copy characters from input to output.  The InputStreamReader
      // converts from the input encoding to Unicode, and the OutputStreamWriter
      // converts from Unicode to the output encoding.  Characters that cannot be
      // represented in the output encoding are output as '?'
      char[] buffer = new char[4096];
      int len;
      while ((len = r.read(buffer)) != -1)
        w.write(buffer, 0, len);
    }
  }

}
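
The comment in convert says that characters the output encoding cannot represent are written as '?'. The sketch below checks that behavior in isolation. The class name UnmappableDemo and the helper encode are invented for this example, and ISO-8859-1 is chosen as the target only because it cannot represent Chinese characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {

    // Hypothetical helper: encodes text through an OutputStreamWriter
    // and returns the raw bytes that were written.
    public static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is already closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "A\u4e2d"; // 'A' followed by the Chinese character U+4E2D

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        System.out.println(latin1.length);    // 2
        System.out.println((char) latin1[1]); // ?
        System.out.println(utf8.length);      // 4: 1 byte for 'A', 3 for U+4E2D
    }
}
```

Because the substitution happens without any error, a conversion to a narrower encoding can lose data silently; converting to UTF-8, as the program above does, avoids this for any well-formed input text.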

 
