How to read a UTF-8 XML file in Java

While unmarshalling XML with JAXB, I ran into this exception: Invalid byte 1 of 1-byte UTF-8 sequence

public static Object unmarshal(InputStream xml, Class<?> clazz) {
    Object obj = null;

    try {
        JAXBContext jc = JAXBContext.newInstance(clazz.getPackage().getName());
        Unmarshaller u = jc.createUnmarshaller();

        obj = u.unmarshal(xml);
    } catch (JAXBException e) {
        // Pass the original exception along as the cause so its stack trace is kept.
        throw new RuntimeException("Can't unmarshal this xml file, please check the error message: " + e.getMessage(), e);
    }

    return obj;		
}

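The problem occurs at u.unmarshal(xml), which ends up calling SAXParser.parse(). This is an encoding problem: the parser reads the raw bytes as UTF-8 and hits a byte sequence that is not valid UTF-8. To work around it, we decode the input stream as UTF-8 ourselves and then let the SAX parser read the resulting character stream. The fix looks like this: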

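import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import org.xml.sax.InputSource;

public static Object unmarshal(InputStream xml, Class<?> clazz) {
    try {
        JAXBContext jc = JAXBContext.newInstance(clazz.getPackage().getName());
        Unmarshaller u = jc.createUnmarshaller();

        // Decode the bytes as UTF-8 ourselves so the parser receives characters, not raw bytes.
        // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException.
        Reader reader = new InputStreamReader(xml, StandardCharsets.UTF_8);
        // No setEncoding() call: InputSource ignores the encoding when it is given a character stream.
        InputSource is = new InputSource(reader);

        return u.unmarshal(is);
    } catch (JAXBException e) {
        // Keep the original exception as the cause so its stack trace is not lost.
        throw new RuntimeException("Can't unmarshal this xml file, please check the error message: " + e.getMessage(), e);
    }
}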
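
To see why wrapping the stream in a Reader makes the error go away, here is a small JDK-only sketch (no JAXB needed). It assumes a made-up document that declares UTF-8 but was actually saved as ISO-8859-1; the class name Utf8ReaderDemo and all of its methods are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class Utf8ReaderDemo {

    /** Collects the character data reported by the parser. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Assumed input: claims UTF-8 in its declaration but is encoded as ISO-8859-1. */
    static byte[] mislabeledXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX parser", e);
        }
    }

    /** Hands the raw bytes to the parser; returns false if the parser rejects them. */
    static boolean parsesFromBytes(byte[] xml) {
        try {
            newParser().parse(new ByteArrayInputStream(xml), new TextCollector());
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    /** Decodes the bytes as UTF-8 first, then parses the character stream. */
    static String textViaReader(byte[] xml) {
        TextCollector handler = new TextCollector();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xml), StandardCharsets.UTF_8)) {
            newParser().parse(new InputSource(reader), handler);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Parsing via Reader failed", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        byte[] xml = mislabeledXml();
        System.out.println("raw bytes accepted: " + parsesFromBytes(xml));
        System.out.println("text via Reader:    " + textViaReader(xml));
    }
}
```

Note what the Reader does with the bad byte: InputStreamReader replaces malformed input with the replacement character U+FFFD instead of failing. So the exception disappears, but the original character is lost. If the content matters, it is probably better to fix the file's real encoding at the source.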

The method can be improved further with generics:

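import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

public static <T> T unmarshal(InputStream xml, Class<T> clazz) {
    try {
        JAXBContext jc = JAXBContext.newInstance(clazz.getPackage().getName());
        Unmarshaller u = jc.createUnmarshaller();

        // Decode the bytes as UTF-8 ourselves; StandardCharsets.UTF_8 avoids
        // the checked UnsupportedEncodingException.
        Reader reader = new InputStreamReader(xml, StandardCharsets.UTF_8);
        Source source = new StreamSource(reader);

        // Unmarshalling by declared type returns a JAXBElement<T>, so no cast is needed.
        JAXBElement<T> element = u.unmarshal(source, clazz);
        return element.getValue();
    } catch (JAXBException e) {
        // Keep the original exception as the cause so its stack trace is not lost.
        throw new RuntimeException("Can't unmarshal this xml file, please check the error message: " + e.getMessage(), e);
    }
}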
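
One optional variation, sketched here as an assumption rather than as part of the original fix: if you would rather fail fast on bytes that are not valid UTF-8 than have them silently replaced, the Reader can be built from a CharsetDecoder configured to report errors. StrictUtf8, strictReader, decodeStrict and isValidUtf8 are hypothetical names; only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {

    /** A UTF-8 Reader that reports malformed input instead of replacing it. */
    static Reader strictReader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    /** Reads all characters through the strict reader; used here only to exercise it. */
    static String decodeStrict(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = strictReader(new ByteArrayInputStream(bytes))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Malformed input surfaces as a CharacterCodingException, which is an IOException.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Returns true if the bytes decode cleanly as UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof CharacterCodingException) {
                return false;
            }
            throw e;
        }
    }
}
```

In the unmarshal method, strictReader(xml) could stand in for new InputStreamReader(xml, ...). A badly encoded file would then most likely surface as a failure during unmarshalling instead of as replacement characters in the resulting object; how exactly the error is wrapped depends on the JAXB implementation.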

 
