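XML is everywhere in Java applications, from web.xml to the configuration files of every framework, and most of the JWSDP is about XML: JAXP (Java API for XML Processing), JAXB (Java Architecture for XML Binding), SJSXP (Sun Java Streaming XML Parser), and so on. See the JWSDP (Java Web Services Developer Pack) page: http://www.oracle.com/technetwork/java/webservicespack-jsp-140788.html. This article only scratches the surface of a very broad topic. Readers who want to go further can read two books: Java and XML, and Java Web Services.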
part 1: java xml
1. XML parsers
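Most Java programmers know that the Java platform only specifies the APIs, while the implementations are developed by various vendors. Java's XML support arrived somewhat late, and the established technologies are: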
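1.1 DOM from w3c.org: a cross-language, cross-platform Document Object Model, used both in the browser and on the server. A Document is a readable and writable in-memory tree. It does well at non-sequential (random-access) reading and at writing, and is usually combined with XPath.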
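To make "random access plus XPath" concrete, here is a minimal sketch using only the JDK's built-in DOM and XPath APIs (javax.xml.parsers and javax.xml.xpath). The class name DomXPathDemo and the tiny RSS-like input are illustrative assumptions. It reads the second item directly by position and modifies the last item in place, which is exactly what a sequential SAX pass cannot do.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: DOM tree + XPath, JDK only
public class DomXPathDemo {

    /** Loads the whole document into an in-memory, read/write DOM tree. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Random access: XPath jumps straight to the node instead of walking the tree in order. */
    public static String titleAt(String xml, int position) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/rss/channel/item[" + position + "]/title", parse(xml));
    }

    /** Write access: change a node in place, then read it back through XPath. */
    public static String retitleLastItem(String xml, String newTitle) throws Exception {
        Document doc = parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node title = (Node) xpath.evaluate("/rss/channel/item[last()]/title", doc, XPathConstants.NODE);
        title.setTextContent(newTitle); // the tree is mutable
        return xpath.evaluate("/rss/channel/item[last()]/title", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><item><title>a</title></item>"
                + "<item><title>b</title></item></channel></rss>";
        System.out.println(titleAt(xml, 2));                 // b
        System.out.println(retitleLastItem(xml, "changed")); // changed
    }
}
```

The cost of this flexibility is that the entire document must fit in memory before the first query runs.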
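1.2 SAX from saxproject.org: also cross-language, built on an event-callback processing model. Compared with DOM it uses far less memory, at the cost of read-only, sequential (forward-only) access.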
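The callback model looks like this in practice. The sketch below uses the JDK's built-in SAX parser via JAXP; SaxTitleCollector and collect are hypothetical names. The parser pushes events into the handler, and the handler keeps only the state it needs (here, the text of each title element), so nothing resembling a tree is ever built.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element
public class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may fire several times for one text node, so accumulate
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            inTitle = false;
        }
    }

    public static List<String> collect(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxTitleCollector handler = new SaxTitleCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<rss><item><title>a</title></item><item><title>b</title></item></rss>")); // [a, b]
    }
}
```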
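1.3 StAX, led by BEA under JSR 173 (https://www.jcp.org/en/jsr/detail?id=173), tries to stake out the ground between DOM and SAX: it streams like SAX, but the application pulls events from the parser instead of receiving callbacks. I haven't used it, so I won't say much.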
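For readers who want to see the pull model, here is a minimal sketch using the javax.xml.stream API that ships with Java SE 6 and later. StaxTitleReader and readTitles are hypothetical names. Note that the loop, not the parser, decides when to advance.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo: pull parsing with StAX's cursor API
public class StaxTitleReader {

    public static List<String> readTitles(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // the application pulls the next event
                if (event == XMLStreamConstants.START_ELEMENT && "title".equals(reader.getLocalName())) {
                    // getElementText() consumes the text and leaves the cursor on </title>
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close(); // XMLStreamReader is not AutoCloseable
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(readTitles("<rss><item><title>a</title></item><item><title>b</title></item></rss>")); // [a, b]
    }
}
```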
All of the above are APIs (the selling points), not parsers. One parser I know of is Apache Xerces: http://xerces.apache.org/.
It supports:
SAX 2.0.2
DOM Level 3 Core, Load and Save
DOM Level 2 Core, Events, Traversal and Range
JAXP 1.4
StAX 1.0 Event API (javax.xml.stream.events)
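2. What is JAXP
JAXP is Java's wrapper around SAX and DOM (it also covers StAX, XSLT and XPath): through the single JAXP API you can use either DOM or SAX. To parse XML with SAX, use SAXParserFactory; to parse with DOM, use DocumentBuilderFactory.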
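3. What is JAXB
JAXB builds a bridge between Java objects and XML so that you don't have to deal with DOM, SAX or StAX: what you face is either XML or Java beans. A Marshaller converts Java objects to XML (one instance per XML document, or several instances in one document); an Unmarshaller restores Java object instances from the data in the XML. Package: javax.xml.bind. (JAXB shipped with the JDK from Java SE 6 and was removed in Java 11, where it has to be added as a separate dependency.)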
part 2: dom4j
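1. dom4j is not an XML parser. Unlike JDOM, which is built from concrete classes, dom4j provides a set of abstract XML interfaces. The top-level interface is Node; Attribute, Branch, CDATA, CharacterData, Comment, Document (not the W3C Document), DocumentType, Element, Entity, ProcessingInstruction and Text are all sub-interfaces of Node.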
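2. Default factory: DocumentFactory
There are also several special-purpose sub-factories: BeanDocumentFactory, DatatypeDocumentFactory, DatatypeElementFactory, DOMDocumentFactory, IndexedDocumentFactory, NonLazyDocumentFactory, UserDataDocumentFactory
A word on DOMDocumentFactory: it extends DocumentFactory and implements org.w3c.dom.DOMImplementation. The nodes it creates implement both the dom4j and the W3C interfaces, so if a method accepts an org.w3c.dom.Element, you can pass it an org.dom4j.Element created by a DOMDocumentFactory instance.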
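3. dom4j can also work with SAX, DOM or StAX; the readers live in the org.dom4j.io package. org.dom4j.io.SAXReader obtains its underlying SAX parser through JAXP; org.dom4j.io.DOMReader converts an existing org.w3c.dom.Document into a dom4j Document; org.dom4j.io.XPP3Reader uses the XPP3 pull parser; StAX input is handled by org.dom4j.io.STAXEventReader. The first three all parse through a method named read.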
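4. Serialization here means writing the document out to a String, a file or the console, which is done with org.dom4j.io.XMLWriter. Besides that there are:
org.dom4j.io.DOMWriter, which converts an org.dom4j.Document and returns an org.w3c.dom.Document;
org.dom4j.io.SAXWriter, which replays the document as SAX events into an org.xml.sax.ContentHandler.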
5. Parsing an RSS URL with dom4j
5.1 Using ElementHandler
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        // Called once for every <item> under /rss/channel
        reader.addHandler("/rss/channel/item", new ElementHandler() {
            final ItemChildElementHandler titleHandler = new ItemChildElementHandler();
            final ItemChildElementHandler linkHandler = new ItemChildElementHandler();
            final ItemChildElementHandler dateHandler = new ItemChildElementHandler();
            final ItemChildElementHandler descripHandler = new ItemChildElementHandler();

            @Override
            public void onStart(ElementPath elementPath) {
                // Clear text left over from the previous <item>, then register
                // handlers for the children of the current <item> (relative paths)
                titleHandler.reset();
                linkHandler.reset();
                dateHandler.reset();
                descripHandler.reset();
                elementPath.addHandler("title", titleHandler);
                elementPath.addHandler("link", linkHandler);
                elementPath.addHandler("pubDate", dateHandler);
                elementPath.addHandler("description", descripHandler);
            }

            @Override
            public void onEnd(ElementPath elementPath) {
                elementPath.removeHandler("title");
                elementPath.removeHandler("link");
                elementPath.removeHandler("pubDate");
                elementPath.removeHandler("description");
                try {
                    URL curURL = processRemoteLink(linkHandler.getNodeContent(), url); // helper, not shown
                    Date curDate = processDate(dateHandler.getNodeContent());         // helper, not shown
                    items.add(new RssItem(curURL, titleHandler.getNodeContent(),
                            descripHandler.getNodeContent(), curDate));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
                // Prune the finished <item> so the in-memory tree does not keep growing
                elementPath.getCurrent().detach();
            }
        });
        try {
            reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
        return !items.isEmpty();
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    private class ItemChildElementHandler implements ElementHandler {
        private String tagName;
        private String tagText;

        void reset() {
            tagName = null;
            tagText = null;
        }

        @Override
        public void onStart(ElementPath elementPath) {
            Element elt = elementPath.getCurrent();
            tagName = elt.getName();
        }

        @Override
        public void onEnd(ElementPath elementPath) {
            Element elt = elementPath.getCurrent();
            tagText = elt.getText();
        }

        @SuppressWarnings("unused")
        public String getNodeNames() {
            return tagName;
        }

        public String getNodeContent() {
            return tagText;
        }
    }
}
import java.io.Serializable;
import java.net.URL;
import java.util.Date;
import net.jcip.annotations.ThreadSafe; // assumed: the JCIP annotations jar

@ThreadSafe
public class RssItem implements Serializable {
    private static final long serialVersionUID = 673250215751499564L;
    /** Link of the item */
    private final URL url;
    /** Title of the item */
    private final String title;
    /** Short description of the item */
    private final String description;
    /** Publication date of the item */
    private final Date date;

    public RssItem(URL url, String title, String description, Date date) {
        this.url = url;
        this.title = title;
        this.description = description;
        this.date = date;
    }

    public URL getUrl() { return url; }
    public String getTitle() { return title; }
    public String getDescription() { return description; }
    public Date getDate() { return date; }

    @Override
    public int hashCode() {
        //ETC
    }

    @Override
    public boolean equals(Object obj) {
        //ETC
    }

    @Override
    public String toString() {
        //ETC
    }
}
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

public class SAXRssParserTest {
    public static void main(String[] args) {
        String b = "http://news.baidu.com/n?cmd=7&loc=4075&name=%D1%CC%CC%A8&tn=rss";
        final long beginTime = System.nanoTime();
        SAXRssParser sap = new SAXRssParser();
        try {
            if (sap.parser(new URL(b))) {
                List<RssItem> news = sap.getEntryList();
                System.out.println("size:" + news.size());
                for (RssItem ri : news) {
                    System.out.println("title:" + ri.getTitle() + "@" + ri.getDate());
                    System.out.println("link:" + ri.getUrl());
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
        final long endTime = System.nanoTime();
        System.out.println("used Second: " + (endTime - beginTime) / 1.0e9);
    }
}
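The parser calls two helpers, processRemoteLink and processDate, that the article does not show. Here is one possible sketch, assuming that links may be relative to the feed URL and that pubDate uses the RFC 822 format RSS 2.0 prescribes (for example "Mon, 01 Jul 2013 08:30:00 +0800"). The RssHelpers class is hypothetical, and the signatures are inferred from the call sites: processRemoteLink throws MalformedURLException, processDate does not.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical reconstruction of the helpers used by SAXRssParser
public final class RssHelpers {
    private RssHelpers() {}

    /** Resolves a possibly relative <link> against the feed's own URL. */
    public static URL processRemoteLink(String link, URL feedUrl) throws MalformedURLException {
        if (link == null) {
            throw new MalformedURLException("item has no <link>");
        }
        return new URL(feedUrl, link.trim()); // absolute links are returned unchanged
    }

    /** Parses an RFC 822 pubDate; returns null when the text cannot be parsed. */
    public static Date processDate(String pubDate) {
        if (pubDate == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so create one per call; English day/month names
        SimpleDateFormat rfc822 = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        try {
            return rfc822.parse(pubDate.trim());
        } catch (ParseException e) {
            return null;
        }
    }
}
```

On Java 8 and later, java.time.format.DateTimeFormatter.RFC_1123_DATE_TIME is a thread-safe alternative for the date parsing.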
5.2 Using VisitorSupport
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;
import org.dom4j.io.SAXReader;

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        final Document document;
        try {
            document = reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
            return false;
        }
        final RssVisitorSupport rvs = new RssVisitorSupport(url);
        document.accept(rvs); // walks every node depth-first, calling visit(...)
        items.addAll(rvs.getNews());
        return rvs.getTotalStep() > 0;
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    class RssVisitorSupport extends VisitorSupport {
        private int step = 0;
        private RssItemBuilder build = null;
        private final List<RssItem> news;
        private final URL referURL;

        public RssVisitorSupport(final URL referURL) {
            this.referURL = referURL;
            this.news = new ArrayList<>();
        }

        @Override
        public void visit(Element node) {
            String eleName = node.getName();
            if (eleName.equals("item")) {
                build = new RssItemBuilder(); // start collecting a new item
                step++;
            }
            if (build == null) {
                return; // outside an <item>, e.g. the channel's own <title>
            }
            if (eleName.equals("title")) {
                build.setTitle(node.getText());
            }
            if (eleName.equals("link")) {
                try {
                    build.setURL(processRemoteLink(node.getText(), referURL)); // helper, not shown
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
            if (eleName.equals("pubDate")) {
                build.setDate(processDate(node.getText())); // helper, not shown
            }
            if (eleName.equals("description")) {
                build.setDescription(node.getText());
            }
            if (!build.isEmpty()) {
                news.add(build.build());
                build = null; // reset, otherwise the same item is added again
            }
        }

        public int getTotalStep() {
            return step;
        }

        public List<RssItem> getNews() {
            return news;
        }
    }
}
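Because RssItem is designed as an immutable object, RssVisitorSupport collects the fields in an RssItemBuilder, an application of the Builder pattern. For more on the Builder design pattern, see this article:
In my informal tests against a few RSS feeds, the ranking was: VisitorSupport > ElementHandler > Iterator.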
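The article uses RssItemBuilder without showing it. The following is a hypothetical reconstruction, sketched only from how RssVisitorSupport calls it (setTitle, setURL, setDate, setDescription, isEmpty, build). The assumed meaning of isEmpty() is "not ready yet": it stays true until all four fields are set, which is the only reading under which the visitor's "add and reset" logic works. A minimal stand-in RssItem is nested so the block compiles on its own.

```java
import java.net.URL;
import java.util.Date;

// Hypothetical reconstruction: the article never shows RssItemBuilder
public final class RssItemBuilderSketch {
    private RssItemBuilderSketch() {}

    /** Minimal stand-in for the article's immutable RssItem (equals/hashCode omitted). */
    public static final class RssItem {
        private final URL url;
        private final String title;
        private final String description;
        private final Date date;

        public RssItem(URL url, String title, String description, Date date) {
            this.url = url;
            this.title = title;
            this.description = description;
            this.date = new Date(date.getTime()); // Date is mutable, so keep a private copy
        }

        public URL getUrl() { return url; }
        public String getTitle() { return title; }
        public String getDescription() { return description; }
        public Date getDate() { return new Date(date.getTime()); }
    }

    /** Mutable, single-use builder: fields arrive one element at a time while the visitor walks the tree. */
    public static final class RssItemBuilder {
        private URL url;
        private String title;
        private String description;
        private Date date;

        public RssItemBuilder setURL(URL url) { this.url = url; return this; }
        public RssItemBuilder setTitle(String title) { this.title = title; return this; }
        public RssItemBuilder setDescription(String description) { this.description = description; return this; }
        public RssItemBuilder setDate(Date date) { this.date = date; return this; }

        /** Assumed semantics: true until every field has been set. */
        public boolean isEmpty() {
            return url == null || title == null || description == null || date == null;
        }

        public RssItem build() {
            if (isEmpty()) {
                throw new IllegalStateException("RSS item is incomplete");
            }
            return new RssItem(url, title, description, date);
        }
    }
}
```

The builder holds the mutable, half-finished state, so the object that escapes into the result list can stay fully immutable.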
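6. JAXB example
Scenario: the back-office applications I used to write always had a function-management menu. Before I knew about JAXB, I would write an XML file for it and, at application startup, parse it with one of the parsers into a singleton.
6.1 The function-management menu XML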
<?xml version="1.0" encoding="UTF-8"?>
<root ico="sec">
    <group name="Member Management" link="/user" symbol="sec_1">
        <item> <anchor>Member List</anchor> <id>child_1_1</id> <link>/user</link> </item>
        <item> <anchor>Personal Info</anchor> <id>child_1_2</id> <link>/user/person</link> </item>
        <item> <anchor>Company Info</anchor> <id>child_1_3</id> <link>/user/company</link> </item>
        <item> <anchor>Security Questions</anchor> <id>child_1_4</id> <link>/user/secret</link> </item>
        <item> <anchor>Credit History</anchor> <id>child_1_5</id> <link>/user/trust</link> </item>
    </group>
    <group name="Product Management" link="/product" symbol="sec_2">
        <item> <anchor>Product List</anchor> <id>child_2_1</id> <link>/product</link> </item>
        <item> <anchor>Trading Accounts</anchor> <id>child_2_2</id> <link>/product/account</link> </item>
        <item> <anchor>Extended Fields</anchor> <id>child_2_3</id> <link>/product/field</link> </item>
        <item> <anchor>Type Templates</anchor> <id>child_2_4</id> <link>/product/field/template</link> </item>
    </group>
    <group name="Order Management" link="/order" symbol="sec_3">
        <item> <anchor>Order List</anchor> <id>child_3_1</id> <link>/order</link> </item>
        <item> <anchor>Inventory Management</anchor> <id>child_3_2</id> <link>/order/inventory</link> </item>
        <item> <anchor>Review Management</anchor> <id>child_3_3</id> <link>/order/pointer</link> </item>
    </group>
    <group name="Finance Management" symbol="sec_4">
        <item> <anchor>Online Banking Channels</anchor> <id>child_4_1</id> <link>/channel</link> </item>
        <item> <anchor>Top-up Records</anchor> <id>child_4_2</id> <link>/channel/cache</link> </item>
        <item> <anchor>Bank Card Management</anchor> <id>child_4_3</id> <link>/bank/card</link> </item>
        <item> <anchor>Bill Management</anchor> <id>child_4_4</id> <link>/bill</link> </item>
        <item> <anchor>Cash Records</anchor> <id>child_4_5</id> <link>/bank/saction</link> </item>
        <item> <anchor>Alipay Transfer Records</anchor> <id>child_4_6</id> <link>/bill/ali</link> </item>
    </group>
    <group name="News Management" link="/news" symbol="sec_5">
        <item> <anchor>News List</anchor> <id>child_5_1</id> <link>/news</link> </item>
        <item> <anchor>News Categories</anchor> <id>child_5_2</id> <link>/news/category</link> </item>
        <item> <anchor>News Headline Labels</anchor> <id>child_5_3</id> <link>/news/level</link> </item>
    </group>
    <group name="System Management" symbol="sec_6">
        <item> <anchor>Complaints/Feedback</anchor> <id>child_6_1</id> <link>/feedback</link> </item>
        <item> <anchor>Activity Log</anchor> <id>child_6_2</id> <link>/user/active</link> </item>
        <item> <anchor>Member Levels</anchor> <id>child_6_3</id> <link>/user/level</link> </item>
        <item> <anchor>SMS</anchor> <id>child_6_4</id> <link>/recaptcha</link> </item>
        <item> <anchor>Site Messages</anchor> <id>child_6_5</id> <link>/message</link> </item>
        <item> <anchor>Keywords</anchor> <id>child_6_6</id> <link>/word</link> </item>
    </group>
</root>
Before JAXB, the menu was parsed with dom4j (SAXReader plus XPath) into a singleton:
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import net.project.entity.Group;
import net.project.entity.GroupItem;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

/**
 * Traditional parsing: dom4j SAXReader plus XPath
 * @author xiaofanku
 * 20130701
 */
public class ParserManagerPanel {
    private static ParserManagerPanel instance = null;
    private final List<Group> group;

    private ParserManagerPanel(InputStream stream) {
        this.group = new ArrayList<Group>();
        try {
            parser(new SAXReader().read(stream));
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    private void parser(final Document doc) {
        List<Node> list = doc.selectNodes("//group");
        for (Node node : list) {
            Element currentGroup = (Element) node;
            Group mg = new Group();
            String defaultLink = currentGroup.attributeValue("link");
            if (defaultLink == null || defaultLink.isEmpty()) {
                defaultLink = "-"; // some groups have no link attribute
            }
            mg.setLink(defaultLink);
            mg.setName(currentGroup.attributeValue("name"));
            mg.setSymbol(currentGroup.attributeValue("symbol"));
            List<Node> groupChild = currentGroup.selectNodes("./item");
            for (Node currentItem : groupChild) {
                Node anchor = currentItem.selectSingleNode("./anchor");
                Node idEle = currentItem.selectSingleNode("./id");
                Node link = currentItem.selectSingleNode("./link");
                if (anchor == null || idEle == null || link == null) {
                    continue; // skip an incomplete <item> instead of catching NullPointerException
                }
                GroupItem item = new GroupItem();
                item.setAnchor(anchor.getText());
                item.setId(idEle.getText());
                item.setLink(link.getText());
                mg.getItems().add(item);
            }
            group.add(mg);
        }
    }

    // synchronized: without it, two threads could both see null and parse the file twice
    public static synchronized ParserManagerPanel getInstance(InputStream input) {
        if (instance == null) {
            instance = new ParserManagerPanel(input);
        }
        return instance;
    }

    public List<Group> getStruct() {
        return group;
    }
}
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class Group implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private String symbol;
    private String link;
    private List<GroupItem> items = null;

    public Group() {
        items = new ArrayList<>();
    }
    //SET/GET
}

import java.io.Serializable;

public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;
    private String anchor;
    private String id;
    private String link;

    public GroupItem() {
    }
    //SET/GET
}
With JAXB, the same beans only need annotations, plus a GroupPanel class mapped to the <root> element:
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "group")
public class Group implements Serializable {
    private static final long serialVersionUID = 1L;
    @XmlAttribute
    private String name;
    @XmlAttribute
    private String symbol;
    @XmlAttribute(required = false)
    private String link;
    @XmlElement(name = "item") // each <item> child becomes one list entry
    private List<GroupItem> items = null;

    public Group() {
        items = new ArrayList<>();
    }
    //GET/SET
}

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "item")
public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;
    @XmlElement
    private String anchor;
    @XmlElement
    private String id;
    @XmlElement
    private String link;

    public GroupItem() {
    }
    //GET/SET
}

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "root")
public class GroupPanel {
    @XmlElement(name = "group")
    private List<Group> groups = null;
    @XmlAttribute
    private String ico;

    public GroupPanel() {
        groups = new ArrayList<>();
    }
    //GET/SET
}
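Loading the menu then takes three statements (unmarshal throws JAXBException, which the caller must handle):
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

JAXBContext jc = JAXBContext.newInstance(GroupPanel.class, Group.class, GroupItem.class);
Unmarshaller u = jc.createUnmarshaller();
// Note: new File("/managerGroup.xml") points at the filesystem root, not at the classpath
GroupPanel gs = (GroupPanel) u.unmarshal(new File("/managerGroup.xml"));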