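1. DOM provides an easy-to-use API. Its advantage over SAX and StAX is that it supports XPath. However, it forces the entire document to be read into memory. That is fine for small documents, hurts performance for large ones, and is simply not feasible for very large ones.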
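As a rough illustration of the DOM trade-off, the sketch below loads a whole document into memory and then queries it with XPath. The class name `DomXPathSketch`, the method `titlesCheaperThan`, and the inline sample XML are assumptions made for this example; only the standard JAXP classes are real.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class DomXPathSketch {

    // Returns the titles of all books whose price is below maxPrice.
    static List<String> titlesCheaperThan(String xml, double maxPrice) {
        try {
            // DOM: the whole document is parsed into an in-memory tree first.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Plain decimal notation, because XPath 1.0 has no exponent syntax.
            String limit = BigDecimal.valueOf(maxPrice).toPlainString();
            String expr = "/catalog/book[price < " + limit + "]/title";
            // XPath can then address any part of the tree.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse or query document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book><title>King Lear</title><price>6.95</price></book>"
                + "<book><title>1984</title><price>12.95</price></book>"
                + "</catalog>";
        System.out.println(titlesCheaperThan(xml, 10.0));
    }
}
```

The query itself is a one-liner, but the full tree has to exist in memory before it can run, which is the cost described above.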
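2. SAX addresses the memory problem by acting as a "push" parser: for every structure the parser encounters in the document it generates a corresponding event, and the programmer handles only the events of interest. The drawback is that SAX usually generates a large number of events the programmer does not care about. In addition, the SAX API offers no way to iterate over the document; the parser simply pushes the whole event stream at the handler from start to finish.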
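The push model can be sketched as follows: the parser drives the callbacks, and the handler reacts only to the elements it cares about. `SaxPushSketch`, its `authors` method, and the inline sample XML are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
class SaxPushSketch {

    // Collects the distinct text values of all <author> elements.
    static Set<String> authors(String xml) {
        final Set<String> found = new TreeSet<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inAuthor;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("author".equals(qName)) {
                    inAuthor = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so accumulate.
                if (inAuthor) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("author".equals(qName)) {
                    inAuthor = false;
                    found.add(text.toString().trim());
                }
            }
        };
        try {
            // The parser calls the handler for every event; we cannot pause or pull.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book><author>William Shakespeare</author></book>"
                + "<book><author>George Orwell</author></book>"
                + "<book><author>William Shakespeare</author></book>"
                + "</catalog>";
        System.out.println(authors(xml));
    }
}
```

Every event still reaches the handler, even though all but the `author` events are ignored; the application has no way to ask for "the next event" on its own schedule.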
3. Parsing XML with StAX
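StAX, the Streaming API for XML, is generally regarded as one of the most efficient ways to process XML, which makes it particularly well suited to complex processing pipelines such as data binding and SOAP messages. The information set StAX creates is very small and becomes a candidate for garbage collection right away. XML processing therefore has a small footprint, which makes StAX suitable not only for devices with small heaps, such as mobile phones, but also for long-running server-side applications.
Unlike SAX, StAX can also write XML documents, which reduces the number of APIs you have to deal with.
StAX offers two parsing models: the cursor model and the iterator model.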
Catalog.xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book sku="123_xaa">
    <title>King Lear</title>
    <author>William Shakespeare</author>
    <price>6.95</price>
    <category>classics</category>
  </book>
  <book sku="988_yty">
    <title>Hamlet</title>
    <author>William Shakespeare</author>
    <price>5.95</price>
    <category>classics</category>
  </book>
  <book sku="434_asd">
    <title>1984</title>
    <author>George Orwell</author>
    <price>12.95</price>
    <category>classics</category>
  </book>
  <book sku="876_pep">
    <title>Java Generics and Collections</title>
    <authors>
      <author>Maurice Naftalin</author>
      <author>Phillip Wadler</author>
    </authors>
    <price>34.99</price>
    <category>programming</category>
  </book>
</catalog>
Using the StAX cursor model: XMLStreamReader
import static java.lang.System.out;

import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

/**
 * StAX cursor model
 * @author K
 */
public class StaxCursor {
    private static final String db = "/ch02/Catalog.xml";

    // we'll hold values here as we find them
    private Set<String> uniqueAuthors;

    public static void main(String... args) {
        StaxCursor p = new StaxCursor();
        p.find();
    }

    // constructor
    public StaxCursor() {
        uniqueAuthors = new TreeSet<String>();
    }

    // parse the document and offload work to helpers
    public void find() {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // forward-only, most efficient way to read
        XMLStreamReader reader = null;
        // get ahold of the file
        final InputStream is = StaxCursor.class.getResourceAsStream(db);
        if (is == null) {
            out.println("Cannot find resource: " + db);
            return;
        }
        // whether current event represents elem, attrib, etc
        int eventType;
        String current = "";
        try {
            // create the reader from the stream
            reader = xif.createXMLStreamReader(is);
            // work with stream and get the type of event we're inspecting
            while (reader.hasNext()) {
                // because this is Cursor, we get an integer token to next event
                eventType = reader.next();
                // do different work depending on current event
                switch (eventType) {
                case XMLEvent.START_ELEMENT:
                    // save element name for later
                    current = reader.getName().toString();
                    printSkus(current, reader);
                    break;
                case XMLEvent.CHARACTERS:
                    findAuthors(current, reader);
                    break;
                case XMLEvent.END_ELEMENT:
                    // text after a closing tag no longer belongs to that element
                    current = "";
                    break;
                }
            } // end loop
            reader.close();
            out.println("Unique Authors=" + uniqueAuthors);
        } catch (XMLStreamException e) {
            out.println("Cannot parse: " + e);
        }
    }

    // get the name and value of the book's sku attribute
    private void printSkus(String current, XMLStreamReader r) {
        if ("book".equals(current)) {
            String k = r.getAttributeName(0).toString();
            String v = r.getAttributeValue(0);
            out.println("AttribName " + k + "=" + v);
        }
    }

    // inspect author elements and read their values.
    private void findAuthors(String current, XMLStreamReader r)
            throws XMLStreamException {
        if ("author".equals(current)) {
            String v = r.getText().trim();
            // can get whitespace value, so ignore
            if (v.length() > 0) {
                uniqueAuthors.add(v);
            }
        }
    }
}
Using the StAX iterator model: the iterator API is more flexible and easier to extend
import static java.lang.System.out;

import java.io.InputStream;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

/**
 * StAX iterator model
 * @author K
 */
public class StaxIterator {
    private static final String db = "/ch02/Catalog.xml";

    public static void main(String... args) {
        StaxIterator p = new StaxIterator();
        p.find();
    }

    public void find() {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // iterator-style reader: each step returns an XMLEvent object
        XMLEventReader reader = null;
        // get ahold of the file
        final InputStream is = StaxIterator.class.getResourceAsStream(db);
        if (is == null) {
            out.println("Cannot find resource: " + db);
            return;
        }
        try {
            // create the reader from the stream
            reader = xif.createXMLEventReader(is);
            // walk the stream and inspect each event
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartElement()) {
                    // look up the sku attribute; null if the element has none
                    Attribute sku = e.asStartElement()
                            .getAttributeByName(new QName("sku"));
                    if (sku != null) {
                        out.println(sku);
                    }
                }
            } // end loop
            reader.close();
        } catch (XMLStreamException e) {
            out.println("Cannot parse: " + e);
        }
    }
}
Writing an XML stream with the StAX cursor API
import static java.lang.System.out;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriteStax {
    private static final String REPAIR_NS = "javax.xml.stream.isRepairingNamespaces";
    private static final String NS = "http://ns.example.com/books";

    public static void main(String... args) {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        // autobox
        factory.setProperty(REPAIR_NS, true);
        // setup a destination file; try-with-resources closes it for us
        try (FileOutputStream fos = new FileOutputStream("result.xml")) {
            // create the writer
            final XMLStreamWriter xsw = factory.createXMLStreamWriter(fos);
            xsw.setDefaultNamespace(NS);

            // open the document. Can also add encoding, etc
            xsw.writeStartDocument("1.0");
            xsw.writeComment("Powered by StAX");

            // make enclosing book
            xsw.writeStartElement("book");
            xsw.writeNamespace("b", NS);
            xsw.writeAttribute("sku", "345_iui");

            // make title child element
            xsw.writeStartElement(NS, "title");
            xsw.writeCharacters("White Noise");
            xsw.writeEndElement(); // close title

            xsw.writeEndElement(); // close book

            // end the document only after all content has been written
            xsw.writeEndDocument();

            // clean up: flush and close the writer before the stream closes
            xsw.flush();
            xsw.close();
            out.print("All done.");
        } catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (XMLStreamException xse) {
            xse.printStackTrace();
        }
    }
}
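This API is very flexible and lets you write XML with varying degrees of normalization and validity. You can quickly and cleanly generate XML fragments suitable for the payload of a SOAP body, or for anywhere else you want to paste in some markup.
In general, when choosing between the two models: if you want to be able to modify the event stream and work with a more flexible API, choose the iterator. If you want faster runtime performance and a smaller footprint, use the cursor API.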
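To make "modifying the event stream" concrete, here is a minimal sketch that copies events from an XMLEventReader to an XMLEventWriter and swaps in a replacement event along the way. The class `StaxRewriteSketch`, the method `upperCaseTitles`, and the choice of upper-casing titles are assumptions made purely for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; the names are illustrative only.
class StaxRewriteSketch {

    // Copies the document, upper-casing the text inside <title> elements.
    static String upperCaseTitles(String xml) {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent text chunks into one event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = xif.createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            boolean inTitle = false;
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartElement()
                        && "title".equals(e.asStartElement().getName().getLocalPart())) {
                    inTitle = true;
                } else if (e.isEndElement()
                        && "title".equals(e.asEndElement().getName().getLocalPart())) {
                    inTitle = false;
                } else if (inTitle && e.isCharacters()) {
                    // Events are objects, so one can be replaced by a new one.
                    e = events.createCharacters(
                            e.asCharacters().getData().toUpperCase(Locale.ROOT));
                }
                writer.add(e);
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot rewrite document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>King Lear</title><price>6.95</price></book></catalog>";
        System.out.println(upperCaseTitles(xml));
    }
}
```

With the cursor API there is no event object to hand on, so the same task would mean re-issuing each write call by hand; that is the flexibility the iterator model trades some speed and memory for.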
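Filters can improve both the performance and the clarity of an application: they instruct the parser to deliver only the events we are interested in, which makes cursor-model parsing more efficient. Implement the accept method of the StreamFilter interface and use the filter to construct the XMLStreamReader. When using XMLEventReader, all you need to do is implement the accept method of the EventFilter interface.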
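A minimal sketch of the iterator-side variant follows: an EventFilter that accepts only `book` start elements is passed to `XMLInputFactory.createFilteredReader`, so the loop body never sees any other event. The class `StaxEventFilterSketch` and its `skus` method are hypothetical names. A cursor-side StreamFilter is passed to the `createFilteredReader` overload that takes an XMLStreamReader in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;

// Hypothetical helper class; the names are illustrative only.
class StaxEventFilterSketch {

    // Returns the sku attribute of every <book> element, in document order.
    static List<String> skus(String xml) {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // accept() decides, event by event, what the application gets to see.
        EventFilter booksOnly = event -> event.isStartElement()
                && "book".equals(event.asStartElement().getName().getLocalPart());
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader filtered = xif.createFilteredReader(
                    xif.createXMLEventReader(new StringReader(xml)), booksOnly);
            while (filtered.hasNext()) {
                // Safe cast: the filter only lets <book> start elements through.
                StartElement book = filtered.nextEvent().asStartElement();
                Attribute sku = book.getAttributeByName(new QName("sku"));
                if (sku != null) {
                    result.add(sku.getValue());
                }
            }
            filtered.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book sku=\"123_xaa\"><title>King Lear</title></book>"
                + "<book sku=\"988_yty\"><title>Hamlet</title></book>"
                + "</catalog>";
        System.out.println(skus(xml));
    }
}
```

Keeping the selection logic in the filter leaves the processing loop free of event-type checks, which is where the gain in clarity comes from.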