Java XML and dom4j: The ABCs

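XML shows up everywhere in Java applications, from web.xml to the configuration files of the various frameworks. Most of the components in the WSDP (Java Web Services Developer Pack) are XML technologies as well: JAXP (Java API for XML Processing), JAXB (Java Architecture for XML Binding), SJSXP (Sun Java Streaming XML Parser), and so on. The WSDP page is at http://www.oracle.com/technetwork/java/webservicespack-jsp-140788.html. This article is only a starting point, because the topic is broad. After reading it, you may want to continue with two books: Java and XML, and Java Web Services.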

Part 1: Java XML

1. XML parsers

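Most Java programmers know that the platform owner mainly defines the APIs, while the implementations come from different vendors. For XML the standard API arrived a little late: DOM and SAX were already in use before JAXP appeared, and StAX was added to it later still. The known processing models are: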

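1.1 DOM from w3c.org: a language-neutral, platform-neutral document object model, used both in the browser and on the server. A Document is a readable and writable in-memory tree. It works well for non-sequential reads and for writes, and is usually combined with XPath.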

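1.2 SAX from saxproject.org: also available across languages, a processing model based on event callbacks. Its advantage over DOM is lower memory use; it reads the document sequentially.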

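1.3 StAX, led by BEA under JSR 173 (https://www.jcp.org/en/jsr/detail?id=173): a pull-style streaming API that tries to cover the ground between DOM and SAX. I have not used it, so I will not say more about it.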
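
To give a feel for how the tree model and the pull model differ, here is a minimal sketch that uses only JDK classes. The class name ModelSketch and the sample feed are made up for illustration. The same document is read once as a DOM tree queried with XPath and once with a StAX cursor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ModelSketch {
    // Hypothetical sample feed, for illustration only
    public static final String XML =
        "<rss><channel><title>feed</title>"
        + "<item><title>first</title></item>"
        + "<item><title>second</title></item>"
        + "</channel></rss>";

    // DOM: load the whole tree, then jump straight to the nodes with XPath
    public static List<String> titlesWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/rss/channel/item/title", doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    // StAX: the caller pulls events one by one and tracks its own position
    public static List<String> titlesWithStax(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        boolean inItem = false;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals("item")) {
                    inItem = true;
                } else if (inItem && r.getLocalName().equals("title")) {
                    // Reads the text and leaves the cursor on the closing tag
                    titles.add(r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("item")) {
                inItem = false;
            }
        }
        r.close();
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithDom(XML));
        System.out.println(titlesWithStax(XML));
    }
}
```

Both methods return the two item titles and skip the channel title. The DOM version pays for a full in-memory tree and gets random access in return; the StAX version keeps almost nothing in memory but has to remember where it is.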


DOM, SAX, and StAX are all APIs, not parsers. One parser I know of is Apache Xerces: http://xerces.apache.org/.

It supports:

    SAX 2.0.2
    DOM Level 3 Core, Load and Save
    DOM Level 2 Core, Events, Traversal and Range
    JAXP 1.4
    StAX 1.0 Event API (javax.xml.stream.events)


2. What is JAXP

JAXP is Java's wrapper around SAX and DOM: with JAXP alone you can use either one. Use SAXParserFactory to parse XML with SAX, and DocumentBuilderFactory to parse it with DOM.
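
The two entry points can be sketched as follows, using only JDK classes. The class name JaxpSketch, the helper method names, and the sample XML are assumptions made for this example. Each method counts the elements with a given tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpSketch {
    // Hypothetical sample document
    public static final String XML =
        "<root><group name=\"a\"/><group name=\"b\"/></root>";

    // SAX through JAXP: the parser calls back into the handler for each start tag
    public static int countWithSax(String xml, String tag) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    // DOM through JAXP: build the tree first, then query it
    public static int countWithDom(String xml, String tag) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countWithSax(XML, "group"));
        System.out.println(countWithDom(XML, "group"));
    }
}
```

Neither method names a concrete parser. The factories pick whichever implementation is available at run time, which is the point of JAXP.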


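3. What is JAXB

JAXB builds a bridge between Java objects and XML, so you do not have to deal with DOM, SAX, or StAX. What you work with is either XML or Java beans. A Marshaller writes Java objects to XML (one instance per document, or several instances in one document), and an Unmarshaller restores the data in the XML to Java object instances. The package is javax.xml.bind.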


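Part 2: dom4j

1. dom4j is not an XML parser. Unlike JDOM, it provides a set of abstract XML interfaces. The top-level interface is Node; Attribute, Branch, CDATA, CharacterData, Comment, Document (not the W3C Document), DocumentType, Element, Entity, ProcessingInstruction, and Text are all sub-interfaces of Node.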


2. The default factory: DocumentFactory

There are also several special-purpose sub-factories: BeanDocumentFactory, DatatypeDocumentFactory, DatatypeElementFactory, DOMDocumentFactory, IndexedDocumentFactory, NonLazyDocumentFactory, UserDataDocumentFactory


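A word on DOMDocumentFactory: it extends DocumentFactory and implements org.w3c.dom.DOMImplementation. If a method accepts an org.w3c.dom.Element, you can pass it an org.dom4j.Element created by a DOMDocumentFactory instance, because the elements this factory creates also implement the W3C interface.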


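3. dom4j can also build its tree from SAX, DOM, or a pull parser. The SAX parser it uses is obtained through JAXP. The readers are all in the org.dom4j.io package: org.dom4j.io.DOMReader, org.dom4j.io.SAXReader, and org.dom4j.io.XPP3Reader. Note that XPP3 is a separate XML pull parser rather than StAX; for StAX, dom4j has org.dom4j.io.STAXEventReader. The parsing method on the first three is read.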


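4. Serialization: this means writing to a String, a file, or the console, and org.dom4j.io.XMLWriter handles it. Beyond that there are:

org.dom4j.io.DOMWriter, which takes an org.dom4j.Document and returns an org.w3c.dom.Document,

org.dom4j.io.SAXWriter, which writes the document as events to an org.xml.sax.ContentHandler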


5. Parsing an RSS URL with dom4j

5.1 Using ElementHandler

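import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class SAXRssParser{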
	private final SAXReader reader;
	private final List<RssItem> items;
	
	public SAXRssParser() {
		super();
		this.reader =  new SAXReader();
		this.items = new ArrayList<>();
	}

	public boolean parser(final URL url) {
		// Fire this handler for every item element under /rss/channel
		reader.addHandler("/rss/channel/item",new ElementHandler(){
			final ItemChildElementHandler titleHandler=new ItemChildElementHandler();
			final ItemChildElementHandler linkHandler=new ItemChildElementHandler();
			final ItemChildElementHandler dateHandler=new ItemChildElementHandler();
			final ItemChildElementHandler descripHandler=new ItemChildElementHandler();
			
			@Override
			public void onStart(ElementPath elementPath) {
				// While inside an item, capture its four child elements
				elementPath.addHandler("title",titleHandler);
				elementPath.addHandler("link",linkHandler);
				elementPath.addHandler("pubDate",dateHandler);
				elementPath.addHandler("description",descripHandler);
			}
			
			@Override
			public void onEnd(ElementPath elementPath) {
				// The item is complete: detach the child handlers, then build the RssItem
				elementPath.removeHandler("title");
				elementPath.removeHandler("link");
				elementPath.removeHandler("pubDate");
				elementPath.removeHandler("description");
				try {
					// processRemoteLink and processDate are helper methods not shown in this listing
					URL curURL = processRemoteLink(linkHandler.getNodeContent(),url);
					Date curDate = processDate(dateHandler.getNodeContent());
					items.add(
							new RssItem(
									curURL, 
									titleHandler.getNodeContent(), 
									descripHandler.getNodeContent(),
									curDate));
				} catch (MalformedURLException e) {
					e.printStackTrace();
				}
			}
		});
		try {
			reader.read(url);
		} catch (DocumentException e) {
			e.printStackTrace();
		}
		return !items.isEmpty();
	}
	
	
	public List<RssItem> getEntryList(){
		return items;
	}
	
	private class ItemChildElementHandler implements ElementHandler{
		private String tagName;
		private String tagText;
		
		@Override
		public void onStart(ElementPath elementPath) {
			// Remember which child element this handler is bound to
			Element elt = elementPath.getCurrent();
			tagName=elt.getName();
		}
		@Override
		public void onEnd(ElementPath elementPath) {
			// The text is only complete once the closing tag has been reached
			Element elt = elementPath.getCurrent();
			tagText=elt.getText();
		}
		
		@SuppressWarnings("unused")
		public String getNodeNames(){
			return tagName;
		}
		
		public String getNodeContent(){
			return tagText;
		}
	}
}
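
The listing above calls processRemoteLink and processDate without showing them. The following is only a guess at what such helpers could look like, written against the JDK alone. The class name RssHelpers and both method bodies are assumptions, not the original code. The first resolves a possibly relative link against the feed URL, the second parses the RFC 822 style date that RSS uses in pubDate.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Date;

public class RssHelpers {

    // Hypothetical helper: turn an item link into an absolute URL.
    public static URL processRemoteLink(String link, URL feedUrl)
            throws MalformedURLException {
        if (link == null) {
            throw new MalformedURLException("item has no link");
        }
        // The two-argument constructor resolves a relative link against feedUrl
        // and leaves an absolute link unchanged.
        return new URL(feedUrl, link.trim());
    }

    // Hypothetical helper: parse a pubDate such as "Mon, 01 Jul 2013 08:30:00 +0800".
    // Returns null instead of throwing, so callers need no extra try/catch.
    public static Date processDate(String text) {
        if (text == null) {
            return null;
        }
        try {
            ZonedDateTime parsed =
                ZonedDateTime.parse(text.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return Date.from(parsed.toInstant());
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        URL feed = new URL("http://example.com/feed/rss.xml");
        System.out.println(processRemoteLink("/news/1.html", feed));
        System.out.println(processDate("Mon, 01 Jul 2013 08:30:00 +0800"));
    }
}
```

Returning null for an unparseable date keeps the signature free of checked exceptions, which matches how the listing calls processDate. Whether a missing date should be tolerated is a design decision left to the caller.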

The RssItem object

@ThreadSafe
public class RssItem implements Serializable{
	private static final long serialVersionUID = 673250215751499564L;
	/**
	 * Link address of the entry
	 */
	private final URL url;
	/**
	 * Title of the entry
	 */
	private final String title;
	/**
	 * Short description of the entry
	 */
	private final String description;
	/**
	 * Publication date of the entry
	 */
	private final Date date;
	
	public RssItem(
			URL url, 
			String title, 
			String description, 
			Date date) {
		super();
		this.url = url;
		this.title = title;
		this.description = description;
		this.date = date;
	}
	public URL getUrl() {
		return url;
	}
	public String getTitle() {
		return title;
	}
	public String getDescription() {
		return description;
	}
	public Date getDate() {
		return date;
	}
	@Override
	public int hashCode() {
                //ETC
	}
	@Override
	public boolean equals(Object obj) {
                //ETC
	}
	@Override
	public String toString() {
                //ETC
	}
}

Test

import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

public class SAXRssParserTest {

	public static void main(String[] args) {
		String b="http://news.baidu.com/n?cmd=7&loc=4075&name=%D1%CC%CC%A8&tn=rss";
		final long beginTime=System.nanoTime();
		SAXRssParser sap=new SAXRssParser();
		try{
			if(sap.parser(new URL(b))){
				List<RssItem> news=sap.getEntryList();
				System.out.println("size:"+news.size());
				for(RssItem ri:news){
					System.out.println("title:"+ri.getTitle()+"@"+ri.getDate());
					System.out.println("link:"+ri.getUrl());
				}
			}
		}catch(MalformedURLException e){
			e.printStackTrace();
		}
		final long endTime=System.nanoTime();
		System.out.println("elapsed seconds: "+(endTime-beginTime)/1.0e9);
	}

}

5.2 Using VisitorSupport

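import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;
import org.dom4j.io.SAXReader;

public class SAXRssParser{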
	private final SAXReader reader;
	private final List<RssItem> items;
	
	public SAXRssParser() {
		super();
		this.reader =  new SAXReader();
		this.items = new ArrayList<>();
	}

	public boolean parser(final URL url) {
		final Document document;
		try {
			// read(url) throws the checked DocumentException, so it must be handled here
			document=reader.read(url);
		} catch (DocumentException e) {
			e.printStackTrace();
			return false;
		}
		final RssVisitorSupport rvs=new RssVisitorSupport(url);
		document.accept(rvs);
		items.addAll(rvs.getNews());
		return rvs.getTotalStep()>0;
	}
	
	public List<RssItem> getEntryList(){
		return items;
	}
	
	class RssVisitorSupport extends VisitorSupport{
		private int step=0;
		private RssItemBuilder build=null;
		private final List<RssItem> news;
		private final URL referURL;
		
		public RssVisitorSupport(final URL referURL){
			this.referURL=referURL;
			this.news=new ArrayList<>();
		}
		@Override
		public void visit(Element node) {
			// Called once for every element, in document order
			String eleName=node.getName();
			if(eleName.equals("item")){
				build=new RssItemBuilder();
				step++;
			}
			if (eleName.equals("title") && build!=null) {
				build.setTitle(node.getText());
			}
			if (eleName.equals("link") && build!=null) {
				try{
					build.setURL(processRemoteLink(node.getText(),referURL));
				}catch(MalformedURLException e){
					e.printStackTrace();
				}
			}
			if (eleName.equals("pubDate") && build!=null) {
				build.setDate(processDate(node.getText()));
			}
			if (eleName.equals("description") && build!=null) {
				build.setDescription(node.getText());
			}
			if(build!=null && !build.isEmpty()){
				news.add(build.build());
				build=null;// reset the builder, otherwise duplicate entries appear
			}
		}
		public int getTotalStep(){
			return step;
		}
		public List<RssItem> getNews(){
			return news;
		}
	}
}

Because RssItem is designed as an immutable object, RssVisitorSupport works with an RssItemBuilder, which follows the Builder pattern. For more on the Builder design pattern, see:

Builder Design Pattern in Java
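
RssItemBuilder itself is not shown in this article. As a rough idea of its shape, here is a self-contained sketch. Everything in it is an assumption: the class BuilderSketch, the trimmed stand-in for RssItem, and above all the meaning of isEmpty, which is taken here to be "at least one field is still missing", since that is what the visitor code seems to rely on.

```java
import java.net.URL;
import java.util.Date;

public class BuilderSketch {

    // Stand-in for the article's immutable RssItem, reduced to its four fields
    public static final class RssItem {
        public final URL url;
        public final String title;
        public final String description;
        public final Date date;

        public RssItem(URL url, String title, String description, Date date) {
            this.url = url;
            this.title = title;
            this.description = description;
            this.date = date;
        }
    }

    // Hypothetical builder: collects fields one by one, then creates the item
    public static final class RssItemBuilder {
        private URL url;
        private String title;
        private String description;
        private Date date;

        public RssItemBuilder setURL(URL url) { this.url = url; return this; }
        public RssItemBuilder setTitle(String title) { this.title = title; return this; }
        public RssItemBuilder setDescription(String d) { this.description = d; return this; }
        public RssItemBuilder setDate(Date date) { this.date = date; return this; }

        // Assumed meaning: true while any of the four fields is still unset
        public boolean isEmpty() {
            return url == null || title == null || description == null || date == null;
        }

        public RssItem build() {
            if (isEmpty()) {
                throw new IllegalStateException("item is incomplete");
            }
            return new RssItem(url, title, description, date);
        }
    }

    public static void main(String[] args) throws Exception {
        RssItemBuilder b = new RssItemBuilder().setTitle("t").setDescription("d");
        System.out.println(b.isEmpty());
        b.setDate(new Date(0L)).setURL(new URL("http://example.com/a"));
        System.out.println(b.isEmpty());
        System.out.println(b.build().title);
    }
}
```

Under this reading, an item that lacks one of the four child elements never becomes complete and is silently dropped when the next item starts. That may or may not be the behavior you want.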

I tried a few RSS URLs and found this ordering: VisitorSupport > ElementHandler > Iterator


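6. JAXB example

Scenario: the back-office applications I used to write all had an admin menu. Before I knew about JAXB, I would create an XML file for it and use some parser to build a singleton from it at application startup.

6.1 The admin menu XML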

<?xml version="1.0" encoding="UTF-8"?>
<root ico="sec">
	<group name="会员管理" link="/user" symbol="sec_1">
		<item>
			<anchor>会员列表</anchor>
			<id>child_1_1</id>
			<link>/user</link>
		</item>
		<item>
			<anchor>个人信息</anchor>
			<id>child_1_2</id>
			<link>/user/person</link>
		</item>
		<item>
			<anchor>企业信息</anchor>
			<id>child_1_3</id>
			<link>/user/company</link>
		</item>
		<item>
			<anchor>安全问题</anchor>
			<id>child_1_4</id>
			<link>/user/secret</link>
		</item>
		<item>
			<anchor>信用记录</anchor>
			<id>child_1_5</id>
			<link>/user/trust</link>
		</item>
	</group>
	<group name="商品管理" link="/product" symbol="sec_2">
		<item>
			<anchor>商品列表</anchor>
			<id>child_2_1</id>
			<link>/product</link>
		</item>
		<item>
			<anchor>交易帐号</anchor>
			<id>child_2_2</id>
			<link>/product/account</link>
		</item>
		<item>
			<anchor>扩展字段</anchor>
			<id>child_2_3</id>
			<link>/product/field</link>
		</item>
		<item>
			<anchor>类型模板</anchor>
			<id>child_2_4</id>
			<link>/product/field/template</link>
		</item>
	</group>
	<group name="订单管理" link="/order" symbol="sec_3">
		<item>
			<anchor>订单列表</anchor>
			<id>child_3_1</id>
			<link>/order</link>
		</item>
		<item>
			<anchor>清单管理</anchor>
			<id>child_3_2</id>
			<link>/order/inventory</link>
		</item>
		<item>
			<anchor>点评管理</anchor>
			<id>child_3_3</id>
			<link>/order/pointer</link>
		</item>
	</group>
	<group name="财务管理" symbol="sec_4">
		<item>
			<anchor>网银交易渠道</anchor>
			<id>child_4_1</id>
			<link>/channel</link>
		</item>
		<item>
			<anchor>充值记录</anchor>
			<id>child_4_2</id>
			<link>/channel/cache</link>
		</item>
		<item>
			<anchor>银行卡管理</anchor>
			<id>child_4_3</id>
			<link>/bank/card</link>
		</item>	
		<item>
			<anchor>帐单管理</anchor>
			<id>child_4_4</id>
			<link>/bill</link>
		</item>
		<item>
			<anchor>现金记录</anchor>
			<id>child_4_5</id>
			<link>/bank/saction</link>
		</item>
		<item>
			<anchor>支付宝转账记录</anchor>
			<id>child_4_6</id>
			<link>/bill/ali</link>
		</item>
	</group>
	<group name="新闻管理" link="/news" symbol="sec_5">
		<item>
			<anchor>新闻列表</anchor>
			<id>child_5_1</id>
			<link>/news</link>
		</item>
		<item>
			<anchor>新闻栏目</anchor>
			<id>child_5_2</id>
			<link>/news/category</link>
		</item>
		<item>
			<anchor>新闻标题标识</anchor>
			<id>child_5_3</id>
			<link>/news/level</link>
		</item>
	</group>
	<group name="系统管理" symbol="sec_6">
		<item>
			<anchor>投诉/意见反馈</anchor>
			<id>child_6_1</id>
			<link>/feedback</link>
		</item>
		<item>
			<anchor>活跃日志</anchor>
			<id>child_6_2</id>
			<link>/user/active</link>
		</item>
		<item>
			<anchor>会员等级</anchor>
			<id>child_6_3</id>
			<link>/user/level</link>
		</item>
		<item>
			<anchor>手机短信</anchor>
			<id>child_6_4</id>
			<link>/recaptcha</link>
		</item>
		<item>
			<anchor>站内消息</anchor>
			<id>child_6_5</id>
			<link>/message</link>
		</item>
		<item>
			<anchor>关键词</anchor>
			<id>child_6_6</id>
			<link>/word</link>
		</item>
	</group>
</root>


6.2 Parsing the XML file above with dom4j's SAXReader

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import net.project.entity.Group;
import net.project.entity.GroupItem;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
/**
 * Conventional parsing with dom4j's SAXReader
 * @author xiaofanku
 * 20130701
 */
public class ParserManagerPanel {
	private static ParserManagerPanel instance=null;
	private final List<Group> group;
	
	private ParserManagerPanel(InputStream stream){
		this.group=new ArrayList<Group>();
		
		try{
			parser(new SAXReader().read(stream));
		}catch(DocumentException e){
			e.printStackTrace();
		}
	}
	private void parser(final Document doc){
		
			List<Node> list = doc.selectNodes("//group");
			for (Iterator<Node> iter = list.iterator(); iter.hasNext(); ) {
				Element currentGroup=(Element)iter.next();
				
				Group mg=new Group();
				String defaultLink=currentGroup.attributeValue("link");
				if(defaultLink==null || defaultLink.isEmpty()){
					defaultLink="-";
				}
				mg.setLink(defaultLink);
				mg.setName(currentGroup.attributeValue("name"));
				mg.setSymbol(currentGroup.attributeValue("symbol"));
				
				List<Node> groupChild=currentGroup.selectNodes("./item");
				for(Node currentItem:groupChild){
					Element anchor=(Element)currentItem.selectSingleNode("./anchor");
					Element idEle=(Element)currentItem.selectSingleNode("./id");
					Element link=(Element)currentItem.selectSingleNode("./link");
					// Skip malformed items explicitly instead of catching NullPointerException
					if(anchor==null || idEle==null || link==null){
						continue;
					}
					GroupItem item=new GroupItem();
					item.setAnchor(anchor.getText());
					item.setId(idEle.getText());
					item.setLink(link.getText());
					mg.getItems().add(item);
				}
				group.add(mg);
			}

		
	}
	// synchronized so that concurrent first calls cannot create two instances
	public static synchronized ParserManagerPanel getInstance(InputStream input){
		if(instance==null){
			instance=new ParserManagerPanel(input);
		}
		return instance;
	}
	public List<Group> getStruct(){
		return group;
	}
}

The objects involved

public class Group implements Serializable{
	
	private static final long serialVersionUID = 1L;
	private String name;
	private String symbol;
	private String link;
	private List<GroupItem> items= null;
	
	public Group() {
		super();
		items=new ArrayList<>();
	}
        //SET/GET
}
public class GroupItem implements Serializable{
	private static final long serialVersionUID = 1L;
	private String anchor;
	private String id;
	private String link;
	
	public GroupItem() {
		super();
	}
        //SET/GET
}


6.3 With JAXB you only need to add a few annotations; dom4j is not needed at all to turn the XML into objects

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="group")
public class Group implements Serializable{
	
	private static final long serialVersionUID = 1L;

	@XmlAttribute
	private String name;
	
	@XmlAttribute
	private String symbol;
	
	@XmlAttribute(required = false)
	private String link;
	
	@XmlElement(name="item")
	private List<GroupItem> items= null;
	
	public Group() {
		super();
		items=new ArrayList<>();
	}
        //GET/SET
}

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="item")
public class GroupItem implements Serializable{
	private static final long serialVersionUID = 1L;
	@XmlElement
	private String anchor;
	
	@XmlElement
	private String id;
	
	@XmlElement
	private String link;
	
	public GroupItem() {
		super();
	}
        //GET/SET
}

Add one new class

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="root")
public class GroupPanel {

	@XmlElement(name="group")
	private List<Group> groups= null;
	
	@XmlAttribute
	private String ico;
	
	public GroupPanel() {
		super();
		groups=new ArrayList<>();
	}
        //GET/SET
}


Finally, the test code that calls it

			JAXBContext jc = JAXBContext.newInstance(GroupPanel.class, Group.class, GroupItem.class);
			Unmarshaller u = jc.createUnmarshaller();
			GroupPanel gs = (GroupPanel) u.unmarshal(new File("/managerGroup.xml"));

