dom4j fails to load XML files containing Chinese characters

When dom4j parses an XML file that contains Chinese characters, loading fails with this error: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xdd26) was found in the element content of the document.

Here is the simplest possible example. The file D:/log/test.xml is GBK-encoded, with the following content:

[Figure 1: contents of test.xml]

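import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.util.zip.GZIPInputStream;

import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class XmlTest {

    public static void main(String[] args) {
        SAXReader saxReader = new SAXReader();
        String fileName = "D:\\log\\test.xml";
        File file = new File(fileName);
        Document document = null;
        try {
            if (fileName.endsWith(".xml.gz")) {
                // Pass the decompressed bytes directly so the parser applies the XML encoding declaration
                document = saxReader.read(new GZIPInputStream(new FileInputStream(file)));
            } else {
                // FileReader decodes bytes with the platform default charset, not the file's actual encoding
                document = saxReader.read(new FileReader(file));
                // Byte-stream alternative that avoids the problem:
                //document = saxReader.read(new BufferedInputStream(new FileInputStream(file)));
            }
            Element root = document.getRootElement();
            System.out.println(root.asXML());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}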
If the XmlTest source file is saved as UTF-8, running it fails with: An invalid XML character (Unicode: 0xdd26) was found in the element content of the document.

If XmlTest is saved as GBK instead, it runs without problems.

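The cause: when FileReader converts bytes to characters and no encoding is specified, it uses the platform's default charset (FileReader only gained constructors that accept a Charset in Java 11). When the program is launched from an IDE such as Eclipse, that default typically follows the encoding configured for the project or the launched class, which is why changing the encoding of XmlTest changes the outcome. With a UTF-8 default, the GBK bytes of test.xml are decoded as UTF-8, and the parser receives characters that do not match what the file actually contains.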
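The byte-to-character step that FileReader performs can be reproduced in isolation. The minimal Java sketch below (the class name CharsetDecodeDemo and the sample text are illustrative, not from the original) encodes a string as GBK, as it would be stored in test.xml, then decodes those bytes once as UTF-8 and once as GBK. It assumes the JVM provides the GBK charset, as standard JDK builds normally do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {

    // Decodes raw bytes with the given charset: the same byte-to-char step FileReader performs
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // "中文" written as escapes so the encoding of this source file does not matter
        String text = "\u4e2d\u6587";
        // What a GBK-encoded file contains on disk
        byte[] gbkBytes = text.getBytes(gbk);

        // FileReader without an explicit charset would use this one
        System.out.println("default charset: " + Charset.defaultCharset());
        // Wrong charset: the GBK bytes are not valid UTF-8, so the text comes out garbled
        System.out.println("decoded as UTF-8 matches: " + decode(gbkBytes, StandardCharsets.UTF_8).equals(text));
        // Correct charset: the original text is recovered
        System.out.println("decoded as GBK matches:   " + decode(gbkBytes, gbk).equals(text));
    }
}
```

Running it should report a mismatch for UTF-8 and a match for GBK, whatever the default charset printed on the first line.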

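So when loading XML files with dom4j, it is better to pass a byte stream, e.g. saxReader.read(new BufferedInputStream(new FileInputStream(file)));

or simply saxReader.read(file);, and to avoid FileReader or BufferedReader. Given raw bytes, the XML parser determines the encoding itself from the XML declaration (a GBK file should therefore declare encoding="GBK"; without a declaration the parser assumes UTF-8), instead of depending on a decoding that already happened outside the parser.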
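To illustrate why a byte stream avoids the problem, the sketch below uses the JDK's built-in DOM parser (javax.xml.parsers) instead of dom4j, so it runs without extra dependencies; since SAXReader normally delegates to a JAXP SAX parser underneath, the same byte-versus-character rule is assumed to apply to it. The class name XmlEncodingDemo and the one-element sample document are hypothetical stand-ins for test.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {

    static final Charset GBK = Charset.forName("GBK");

    // Bytes of a small GBK-encoded document (hypothetical content standing in for test.xml)
    static byte[] sampleGbkXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><root>\u4e2d\u6587</root>";
        return xml.getBytes(GBK);
    }

    // Byte input: the parser reads encoding="GBK" from the declaration and decodes accordingly
    static String parseBytes(byte[] data) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(data));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Character input: the bytes are decoded before the parser sees them, so the
    // declaration no longer controls decoding and the charset passed here decides the result
    static String parseChars(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = sampleGbkXml();
        System.out.println("bytes:        " + parseBytes(data).equals("\u4e2d\u6587"));
        System.out.println("reader (GBK): " + parseChars(data, GBK).equals("\u4e2d\u6587"));
    }
}
```

In dom4j terms, parseBytes corresponds to saxReader.read(InputStream) or saxReader.read(file). If a Reader cannot be avoided, the analogue of parseChars is to name the charset explicitly, for example saxReader.read(new InputStreamReader(new FileInputStream(file), "GBK")), rather than relying on FileReader's default.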