How to read an XML file without validation

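I ran into a problem today. I was using dom4j to read an XML file that declares a DTD, but I did not have that DTD file. When I read the XML file with SAXReader, it failed with the following error: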

java.io.FileNotFoundException: [DTD file name] (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.io.FileInputStream.<init>(FileInputStream.java:66)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1192)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1089)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1002)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at org.dom4j.io.SAXReader.read(SAXReader.java:465)
at org.dom4j.io.SAXReader.read(SAXReader.java:321)


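It looked as though Xerces was validating the document automatically. The XML file itself is fine: I only wanted to read some data from it, not validate it, and I could not remove the DTD reference from the file. I figured that turning off the default validation would be enough. After looking through the dom4j documentation, I switched off every setting I could think of: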

	reader.setValidation(false);
	reader.setIncludeInternalDTDDeclarations(false);
	reader.setIncludeExternalDTDDeclarations(false);
	reader.setFeature("http://apache.org/xml/features/validation", false);


That still did not work, so I had no choice but to step through the source. Eventually I found this code in Xerces' XMLDocumentScannerImpl:

	if (((fValidation || fLoadExternalDTD)
	        && (fValidationManager == null || !fValidationManager.isCachedDTD()))) {
	    // This handles the case of a DOCTYPE that had neither an internal subset or an external subset.
	    fDTDScanner.setInputSource(fExternalSubsetSource);
	    fExternalSubsetSource = null;
	    if (!fDisallowDoctype)
	        setScannerState(SCANNER_STATE_DTD_EXTERNAL_DECLS);
	    else
	        setScannerState(SCANNER_STATE_PROLOG);
	    setDriver(fContentDriver);
	    if (fDTDDriver == null)
	        fDTDDriver = new DTDDriver();
	    return fDTDDriver.next();
	}


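So both fValidation and fLoadExternalDTD have to be false, otherwise the scanner still tries to open the external DTD. After digging through more of the code, I finally found a solution:

	// Create a non-validating reader; setValidation(false) states the same thing explicitly.
	SAXReader reader = new SAXReader(false);
	reader.setValidation(false);
	// Tell Xerces not to load the external DTD when it is not validating.
	reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
	Document document = reader.read(xmlFile);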


Here setValidation(false) sets fValidation to false, and setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) sets fLoadExternalDTD to false.
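
The stack trace above shows that dom4j was delegating to the Xerces parser bundled with the JDK (the com.sun.org.apache.xerces.internal packages), so the same feature can also be tried without dom4j on the classpath. The following is only a minimal sketch under that assumption, not the dom4j code from this post: the class name NoDtdLoadDemo, the helper rootElementName, and the file name no-such-file.dtd are made up for illustration. It parses an XML string whose DOCTYPE points at a DTD that does not exist and returns the name of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: uses the JDK's built-in SAX parser instead of dom4j.
final class NoDtdLoadDemo {

    // The Xerces feature used in this post to stop external DTD loading.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses the given XML text without validating and without loading the
    // external DTD, and returns the qualified name of the root element.
    static String rootElementName(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);                 // no DTD validation
            factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not fetch the DTD
            SAXParser parser = factory.newSAXParser();

            String[] root = new String[1];
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    // The first startElement event belongs to the root element.
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
            return root[0];
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the helper easy to call.
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The DOCTYPE references a DTD file that is assumed not to exist.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE config SYSTEM \"no-such-file.dtd\">\n"
                + "<config><item>1</item></config>";
        System.out.println(rootElementName(xml));
    }
}
```

Note that this feature URI is specific to Xerces-based parsers. A different SAX implementation may not recognize it, in which case setFeature would throw an exception rather than silently ignore the setting.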

I hope this helps anyone else who runs into the same problem.
