Here is roughly what the exception stack looks like:
org.dom4j.DocumentException: Error on line 1 of document : 前言中不允许有内容。 Nested exception: 前言中不允许有内容。
    at org.dom4j.io.SAXReader.read(SAXReader.java:482)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.apobates.parser.RssParser.build(RssParser.java:38)
    at com.apobates.machine.reader.Reader.mainParser(Reader.java:57)
    at com.apobates.machine.reader.Reader.load(Reader.java:37)
    at com.apobates.test.ParserEntityTest.main(ParserEntityTest.java:41)
Nested exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; 前言中不允许有内容。
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.apobates.parser.RssParser.build(RssParser.java:38)
    at com.apobates.machine.reader.Reader.mainParser(Reader.java:57)
    at com.apobates.machine.reader.Reader.load(Reader.java:37)
    at com.apobates.test.ParserEntityTest.main(ParserEntityTest.java:41)
Nested exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; 前言中不允许有内容。
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.apobates.parser.RssParser.build(RssParser.java:38)
    at com.apobates.machine.reader.Reader.mainParser(Reader.java:57)
    at com.apobates.machine.reader.Reader.load(Reader.java:37)
    at com.apobates.test.ParserEntityTest.main(ParserEntityTest.java:41)
Document doc=DocumentHelper.parseText(responseText);
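Translated, the message reads: "Content is not allowed in prolog." Now there is something to go on. Let's look at what an XML prolog can contain: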
The prolog refers to the information that appears before the start tag of the document or root element. It includes information that applies to the document as a whole, such as character encoding, document structure, and style sheets.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<!--catalog last updated 2000-11-01-->
The definition above comes from MSDN: http://msdn.microsoft.com/en-us/library/vstudio/ms256037(v=vs.100).aspx
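To make the rule concrete, here is a minimal sketch of what happens when something that is not part of the prolog appears before the XML declaration. It uses the JDK's built-in parser rather than dom4j so that it runs without extra dependencies. The class name `PrologCheck` and the sample `<catalog/>` document are made up for illustration, and the byte order mark (BOM) is only one common trigger, not necessarily the one behind the stack trace above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of dom4j or the original project.
final class PrologCheck {

    // Returns true if the text parses as XML, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse from a character stream, as DocumentHelper.parseText does with a String.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException (a SAXException) is what reports the prolog error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><catalog/>";
        System.out.println("clean document parses: " + parses(xml));
        // A BOM character in front of the declaration counts as content in the prolog.
        System.out.println("with leading BOM parses: " + parses("\uFEFF" + xml));
        // So does any other stray text.
        System.out.println("with leading text parses: " + parses("junk" + xml));
    }
}
```

The parser may also print its own "[Fatal Error]" line to standard error for the two rejected inputs; that is the default error handler at work and can be ignored here.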
The prolog of the answers feed XML is:
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Internet Explorer Category - All Threads</title>
    <description />
    <language>en-us</language>
    <a10:link href="http://answers.microsoft.com/en-us/ie/forum?tab=Threads&amp;threadType=all" />
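That prolog looks well-formed, and the exception points at line 1, column 1, so one plausible explanation is that the string handed to `DocumentHelper.parseText` carries something invisible in front of `<?xml`, such as a BOM left over from decoding the HTTP response. This is an assumption, since nothing above confirms it. Under that assumption, a minimal sketch of a workaround is to drop everything before the first `<` before parsing. The class `XmlText` and the method `stripBeforeProlog` are hypothetical names, not dom4j API.

```java
// Hypothetical helper, not part of dom4j or the original project.
final class XmlText {

    private XmlText() {
    }

    // Removes any characters (BOM, whitespace, stray text) that precede the first '<'.
    // Text without any '<' is returned unchanged so the parser can report the real problem.
    static String stripBeforeProlog(String text) {
        int start = text.indexOf('<');
        return start <= 0 ? text : text.substring(start);
    }

    public static void main(String[] args) {
        // Assumed input: a response body that starts with a BOM character.
        String responseText = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><rss version=\"2.0\"/>";
        String cleaned = stripBeforeProlog(responseText);
        System.out.println("removed " + (responseText.length() - cleaned.length()) + " leading character(s)");
        System.out.println(cleaned);
    }
}
```

With this in place, the failing call could become `DocumentHelper.parseText(XmlText.stripBeforeProlog(responseText))`. It hides the symptom rather than fixing how the response was decoded, so it is worth checking where the extra characters come from as well.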
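import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

// Let SAXReader fetch the feed from the URL itself instead of parsing a pre-fetched String.
SAXReader xmlReader = new SAXReader();
List<RssNews> rs = new ArrayList<RssNews>();
try {
    Document doc = xmlReader.read(new URL("http://answers.microsoft.com/en-us/feed/f/ie"));
    // Select every <item> element in the feed.
    List<Node> list = doc.selectNodes("//item");
    // ETC
} catch (MalformedURLException e) {
    // The feed URL is malformed.
    e.printStackTrace();
} catch (DocumentException e) {
    // The feed could not be read or parsed as XML.
    e.printStackTrace();
}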