How it was solved:
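I took one complete record from the import data package to debug with.
Following the error message, I checked whether the XF_SUMMARY element really lacked its closing tag. By eye, it looked fine.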
Source file:
<XF_SUMMARY>XXXX相关信息</XF_SUMMARY>
I deleted that element's content and imported again. It still failed, but this time the error named a different element:
2014-07-22 13:31:41 错误 [con.err] org.dom4j.DocumentException: Error on line 36 of document : The element type "APPELLEE_POLITY" must be terminated by the matching end-tag "</APPELLEE_POLITY>". Nested exception: The element type "APPELLEE_POLITY" must be terminated by the matching end-tag "</APPELLEE_POLITY>".
2014-07-22 13:31:41 错误 [con.err] at org.dom4j.io.SAXReader.read(SAXReader.java:482)
2014-07-22 13:31:41 错误 [con.err] at org.dom4j.io.SAXReader.read(SAXReader.java:343)
After I had repeated this and cleared the data of every element the parser complained about, the import succeeded.
So what was the cause? Why would tags that look intact be reported as missing their end tags during parsing?
The output of the following code reveals the cause:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public static Document changerXMLCode(File xmlFile) throws IOException, DocumentException {
    SAXReader reader = new SAXReader();
    // Read the whole file into memory (the file is closed automatically)
    byte[] bytes = Files.readAllBytes(xmlFile.toPath());
    // Decode as GBK, because our XML files are declared as GBK
    String xmlDate = new String(bytes, "GBK");
    // Remove "&#"-prefixed character-reference fragments before parsing
    xmlDate = xmlDate.replaceAll("&#[1-9]+|&#\\w{0,3};?", "");
    // Re-encode the string and parse it into a Document
    Document document = reader.read(new ByteArrayInputStream(xmlDate.getBytes("GBK")));
    return document;
}
Inspecting the value of xmlDate showed what was really going on:
<?xml version="1.0" encoding="GBK"?>
<XF_JUBAO>
  <Jubao>
    <APPELLEE_SEX>鐢?/APPELLEE_SEX>
    <APPELLEE_NATION>姹夋棌</APPELLEE_NATION>
    <APPELLEE_POLITY>涓浗鍏变骇鍏氬厷鍛?/APPELLEE_POLITY>
    <XF_QUESTIONTYPE>宸ㄩ璐骇鏉ユ簮涓嶆槑</XF_QUESTIONTYPE>
    <APPELLEE_NAME>鐜嬪繝</APPELLEE_NAME>
    <APPELLEE_ADDR>娉板窞甯傚叴鍖栧競宸ュ晢閾惰</APPELLEE_ADDR>
  </Jubao>
</XF_JUBAO>
Sure enough, some closing tags had been destroyed by the garbled text, for example:
<APPELLEE_POLITY>涓浗鍏变骇鍏氬厷鍛?/APPELLEE_POLITY>
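The garbling above is what you get when UTF-8 bytes are decoded as GBK. The following is a minimal sketch, using only the JDK and assuming the exported file is UTF-8 while its declaration says GBK. It reproduces the effect and adds a rough check for such mislabeled files. The class and method names (EncodingCheck, declaredEncoding, looksMislabeled) are hypothetical, and the check is only a heuristic: text in a legacy encoding can occasionally also be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingCheck {
    // Captures the value of encoding="..." in an XML declaration
    private static final Pattern DECL =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // The declaration is plain ASCII, so peeking at it through ISO-8859-1 is safe
    static String declaredEncoding(byte[] xml) {
        String head = new String(xml, 0, Math.min(xml.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // A strict decoder throws on malformed input instead of substituting replacement characters
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static boolean hasNonAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) { // Java bytes are signed: 0x80..0xFF are negative
                return true;
            }
        }
        return false;
    }

    // Heuristic: declared as something other than UTF-8, contains non-ASCII bytes,
    // and yet decodes cleanly as UTF-8
    static boolean looksMislabeled(byte[] xml) {
        String declared = declaredEncoding(xml);
        return declared != null
                && !declared.equalsIgnoreCase("UTF-8")
                && hasNonAscii(xml)
                && isValidUtf8(xml);
    }

    public static void main(String[] args) {
        String nation = "\u6c49\u65cf"; // an APPELLEE_NATION value like the sample record's
        String xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><APPELLEE_NATION>"
                + nation + "</APPELLEE_NATION>";
        // Simulate the faulty export: the declaration says GBK, the bytes are UTF-8
        byte[] exported = xml.getBytes(StandardCharsets.UTF_8);

        // Decoding as GBK pairs the 6 UTF-8 bytes of the value into 3 unrelated characters
        String asGbk = new String(exported, Charset.forName("GBK"));
        System.out.println(asGbk);
        System.out.println("mislabeled? " + looksMislabeled(exported)); // mislabeled? true

        // The same document really encoded as GBK is not flagged
        byte[] proper = xml.getBytes(Charset.forName("GBK"));
        System.out.println("mislabeled? " + looksMislabeled(proper)); // mislabeled? false
    }
}
```

The string literals use Unicode escapes so the example compiles regardless of the source file's own encoding, which is a small instance of the same problem.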
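The root cause turned out to be the export: when the XML data package was exported, its encoding was not set to GBK. The file declares GBK, but its bytes are UTF-8 (the UTF-8 experiment below confirms this). A Chinese character takes 3 bytes in UTF-8, whereas a GBK decoder reads such bytes in pairs. When a value has an odd number of characters, its UTF-8 byte count is odd, so the last byte got paired with the "<" of the closing tag. That pair is not valid GBK and came out as "?", which is exactly the "?/APPELLEE_POLITY>" seen above. Values with an even byte count, such as APPELLEE_NATION, were merely garbled and kept their closing tags.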
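Fixing this on the export side means deriving the declaration and the bytes from one and the same charset. The sketch below uses only the JDK and is an assumption about how such an exporter could look; ConsistentExport, writeXml and rootText are hypothetical names, not the original project's code. With dom4j, one common way to end up with this mismatch is handing XMLWriter a Writer (for example a FileWriter using the platform default charset): OutputFormat.setEncoding then changes only the declaration text, not the bytes. Handing XMLWriter an OutputStream instead lets it encode with the format's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

class ConsistentExport {
    // The declaration and the bytes come from the same Charset,
    // so the declared encoding cannot disagree with what is written
    static void writeXml(String body, Charset charset, OutputStream out) {
        try {
            Writer w = new OutputStreamWriter(out, charset);
            w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            w.write(body);
            w.flush(); // flush, but leave closing the stream to the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parse raw bytes; the parser picks the charset from the XML declaration
    static String rootText(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String nation = "\u6c49\u65cf"; // sample non-ASCII value
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeXml("<APPELLEE_NATION>" + nation + "</APPELLEE_NATION>", Charset.forName("GBK"), buf);
        System.out.println(nation.equals(rootText(buf.toByteArray()))); // true
    }
}
```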
For another round of debugging, I changed the parsing code to UTF-8:
public static Document changerXMLCode(File xmlFile) throws IOException, DocumentException {
    SAXReader reader = new SAXReader();
    byte[] bytes = Files.readAllBytes(xmlFile.toPath());
    // Changed to UTF-8
    String xmlDate = new String(bytes, "utf-8");
    xmlDate = xmlDate.replaceAll("&#[1-9]+|&#\\w{0,3};?", "");
    // Re-encode and parse into a Document; changed to UTF-8
    Document document = reader.read(new ByteArrayInputStream(xmlDate.getBytes("utf-8")));
    return document;
}
I imported the data package again; xmlDate now contained:
<?xml version="1.0" encoding="GBK"?>
<XF_JUBAO>
  <Jubao>
    <APPELLEE_SEX>男</APPELLEE_SEX>
    <APPELLEE_NATION>汉族</APPELLEE_NATION>
    <APPELLEE_POLITY>职级</APPELLEE_POLITY>
    <XF_QUESTIONTYPE>问题</XF_QUESTIONTYPE>
    <APPELLEE_NAME>名称</APPELLEE_NAME>
    <APPELLEE_ADDR>地址</APPELLEE_ADDR>
  </Jubao>
</XF_JUBAO>
Everything now displayed correctly, yet the import inexplicably failed again with the same error as before:
2014-07-22 13:52:01 错误 [con.err] org.dom4j.DocumentException: Error on line 36 of document : The element type "APPELLEE_POLITY" must be terminated by the matching end-tag "</APPELLEE_POLITY>". Nested exception: The element type "APPELLEE_POLITY" must be terminated by the matching end-tag "</APPELLEE_POLITY>".
2014-07-22 13:52:01 错误 [con.err] at org.dom4j.io.SAXReader.read(SAXReader.java:482)
2014-07-22 13:52:01 错误 [con.err] at org.dom4j.io.SAXReader.read(SAXReader.java:343)
Why? Further debugging showed that the problem had come full circle, because the XML itself declares GBK as its encoding:
<?xml version="1.0" encoding="GBK"?>
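Because the parser is handed bytes, it trusts this declaration, so the UTF-8 bytes produced by getBytes("utf-8") were decoded as GBK once again, exactly as in the first attempt. Changing the declaration to UTF-8 as well, so that it agrees with the bytes and with the parsing code, made the import work.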
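Both ways of making the declaration and the parser input agree can be sketched with the JDK's built-in DOM parser, used here as a stand-in for dom4j's SAXReader (which also has a read(Reader) overload). Option 1 rewrites the declared encoding to UTF-8 before re-encoding, which is the fix described above. Option 2 skips the byte round trip entirely: according to the org.xml.sax.InputSource documentation, when the parser is given a character stream it ignores the encoding declaration. DeclarationFix and its methods are hypothetical names, and the regex assumes a simple, well-formed declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeclarationFix {
    // Option 1: rewrite encoding="..." to UTF-8, then encode the text as UTF-8,
    // so the declaration matches the bytes the parser will receive
    static byte[] relabelAsUtf8(String xml) {
        String fixed = xml.replaceFirst(
                "(<\\?xml[^>]*encoding=[\"'])[^\"']*([\"'])", "$1UTF-8$2");
        return fixed.getBytes(StandardCharsets.UTF_8);
    }

    // Option 2: pass characters instead of bytes; the declaration is then ignored
    static Document parseChars(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    static Document parseBytes(byte[] xml) {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }

    private static Document parse(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sex = "\u7537"; // sample non-ASCII value
        // Text already decoded correctly, but still declaring GBK (as in xmlDate above)
        String xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><APPELLEE_SEX>" + sex + "</APPELLEE_SEX>";
        String a = parseBytes(relabelAsUtf8(xml)).getDocumentElement().getTextContent();
        String b = parseChars(xml).getDocumentElement().getTextContent();
        System.out.println(sex.equals(a) + " " + sex.equals(b)); // true true
    }
}
```

For code shaped like changerXMLCode, option 2 is the smaller change: the file has already been decoded into xmlDate, so it could be passed to the parser through a StringReader instead of being turned back into bytes with getBytes.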