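Copyright notice:
This article was written by 冰云 and first published on CSDN. It may not be used for any commercial purpose without permission.
Some of the code in this article is quoted from the DOM4J documentation.
Reposting is welcome, but please keep the article and this copyright notice intact.
To get in touch, send email to: icecloud(AT)sina.com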
|
Attribute | Defines an XML attribute.
Branch | Defines common behavior for nodes that can contain child nodes, such as XML elements (Element) and documents (Document).
CDATA | Defines an XML CDATA section.
CharacterData | A marker interface that identifies character-based nodes, such as CDATA, Comment, and Text.
Comment | Defines the behavior of an XML comment.
Document | Defines an XML document.
DocumentType | Defines an XML DOCTYPE declaration.
Element | Defines an XML element.
ElementHandler | Defines a handler for Element objects.
ElementPath | Used by ElementHandler to obtain information about the path level currently being processed.
Entity | Defines an XML entity.
Node | Defines polymorphic behavior for all XML nodes in dom4j.
NodeFilter | Defines the behavior of a filter or predicate applied to dom4j nodes.
ProcessingInstruction | Defines an XML processing instruction.
Text | Defines an XML text node.
Visitor | Used to implement the Visitor pattern.
XPath | Represents an XPath expression obtained by parsing a string.
|
// Read XML from a file: takes a file name and returns the XML document
public Document read(String fileName)
        throws MalformedURLException, DocumentException {
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File(fileName));
    return document;
}
|
public Element getRootElement(Document doc){
return doc.getRootElement();
}
|
// Iterate over all child elements
for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    // do something
}
// Iterate over child elements named foo
for ( Iterator i = root.elementIterator("foo"); i.hasNext(); ) {
    Element foo = (Element) i.next();
    // do something
}
// Iterate over attributes
for ( Iterator i = root.attributeIterator(); i.hasNext(); ) {
    Attribute attribute = (Attribute) i.next();
    // do something
}
|
public void treeWalk() {
    treeWalk(getRootElement());
}
public void treeWalk(Element element) {
    for (int i = 0, size = element.nodeCount(); i < size; i++) {
        Node node = element.node(i);
        if (node instanceof Element) {
            treeWalk((Element) node);
        }
        else { // do something....
        }
    }
}
|
public class MyVisitor extends VisitorSupport {
    public void visit(Element element){
        System.out.println(element.getName());
    }
    public void visit(Attribute attr){
        System.out.println(attr.getName());
    }
}
Usage: root.accept(new MyVisitor());
|
public void bar(Document document) {
    List list = document.selectNodes( "//foo/bar" );
    Node node = document.selectSingleNode( "//foo/bar/author" );
    String name = node.valueOf( "@name" );
}
|
public void findLinks(Document document)
        throws DocumentException {
    List list = document.selectNodes( "//a/@href" );
    for (Iterator iter = list.iterator(); iter.hasNext(); ) {
        Attribute attribute = (Attribute) iter.next();
        String url = attribute.getValue();
    }
}
|
// XML to string
Document document = ...;
String text = document.asXML();
// String to XML
String text = "<person> <name>James</name> </person>";
Document document = DocumentHelper.parseText(text);
|
public Document styleDocument(Document document, String stylesheet)
        throws Exception {
    // load the transformer using JAXP
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(
        new StreamSource( stylesheet )
    );
    // now lets style the given document
    DocumentSource source = new DocumentSource( document );
    DocumentResult result = new DocumentResult();
    transformer.transform( source, result );
    // return the transformed document
    Document transformedDoc = result.getDocument();
    return transformedDoc;
}
|
public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    Element root = document.addElement("root");
    Element author1 =
        root
            .addElement("author")
            .addAttribute("name", "James")
            .addAttribute("location", "UK")
            .addText("James Strachan");
    Element author2 =
        root
            .addElement("author")
            .addAttribute("name", "Bob")
            .addAttribute("location", "US")
            .addText("Bob McWhirter");
    return document;
}
|
FileWriter out = new FileWriter( "foo.xml" );
document.write(out);
// close the writer so that buffered output is flushed to the file
out.close();
|
public void write(Document document)
        throws IOException {
    // Write to a given file
    XMLWriter writer = new XMLWriter(
        new FileWriter( "output.xml" )
    );
    writer.write( document );
    writer.close();
    // Pretty-printed format
    OutputFormat format = OutputFormat.createPrettyPrint();
    writer = new XMLWriter( System.out, format );
    writer.write( document );
    // Compact format
    format = OutputFormat.createCompactFormat();
    writer = new XMLWriter( System.out, format );
    writer.write( document );
}
|
holen.xml
|
<?xml version="1.0" encoding="UTF-8"?>
<books>
<!--This is a test for dom4j, holen, 2004.9.11-->
<book show="yes">
<title>Dom4j Tutorials</title>
</book>
<book show="yes">
<title>Lucene Studing</title>
</book>
<book show="no">
<title>Lucene in Action</title>
</book>
<owner>O'Reilly</owner>
</books>
|
|
/**
 * Creates an XML document; the file name is given by the argument.
 * @param filename name of the file to create
 * @return the result of the operation: 0 for failure, 1 for success
 */
public int createXMLFile(String filename){
    /** Result of the operation: 0 for failure, 1 for success */
    int returnValue = 0;
    /** Create the document object */
    Document document = DocumentHelper.createDocument();
    /** Create the root element, books */
    Element booksElement = document.addElement("books");
    /** Add a comment */
    booksElement.addComment("This is a test for dom4j, holen, 2004.9.11");
    /** Add the first book node */
    Element bookElement = booksElement.addElement("book");
    /** Add the show attribute */
    bookElement.addAttribute("show","yes");
    /** Add the title node */
    Element titleElement = bookElement.addElement("title");
    /** Set the content of title */
    titleElement.setText("Dom4j Tutorials");
    /** Build the other two book nodes the same way */
    bookElement = booksElement.addElement("book");
    bookElement.addAttribute("show","yes");
    titleElement = bookElement.addElement("title");
    titleElement.setText("Lucene Studing");
    bookElement = booksElement.addElement("book");
    bookElement.addAttribute("show","no");
    titleElement = bookElement.addElement("title");
    titleElement.setText("Lucene in Action");
    /** Add the owner node */
    Element ownerElement = booksElement.addElement("owner");
    ownerElement.setText("O'Reilly");
    try{
        /** Write the content of document to the file */
        XMLWriter writer = new XMLWriter(
            new FileWriter(new File(filename)));
        writer.write(document);
        writer.close();
        /** Succeeded, so return 1 */
        returnValue = 1;
    }
    catch(Exception ex){
        ex.printStackTrace();
    }
    return returnValue;
}
|
|
<?xml version="1.0" encoding="UTF-8"?>
<books><!--This is a test for dom4j, holen, 2004.9.11--><book show="yes"><title>Dom4j Tutorials</title></book><book show="yes"><title>Lucene Studing</title></book><book show="no"><title>Lucene in Action</title></book><owner>O'Reilly</owner></books>
|
|
/**
 * Modifies the content of an XML file and saves it as a new file.
 * Shows how to add, modify, and remove nodes in dom4j.
 * @param filename the file to modify
 * @param newfilename the file in which to save the modified document
 * @return the result of the operation: 0 for failure, 1 for success
 */
public int ModiXMLFile(String filename,String newfilename){
    int returnValue = 0;
    try{
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(new File(filename));
        /** Change 1: if the show attribute of a book node is yes, change it to no */
        /** First locate the targets with XPath */
        List list = document.selectNodes("/books/book/@show" );
        Iterator iter = list.iterator();
        while(iter.hasNext()){
            Attribute attribute = (Attribute)iter.next();
            if(attribute.getValue().equals("yes")){
                attribute.setValue("no");
            }
        }
        /**
         * Change 2: set the content of owner to Tshinghua,
         * add a date node under owner with the content 2004-09-11,
         * and give the date node a type attribute
         */
        list = document.selectNodes("/books/owner" );
        iter = list.iterator();
        if(iter.hasNext()){
            Element ownerElement = (Element)iter.next();
            ownerElement.setText("Tshinghua");
            Element dateElement = ownerElement.addElement("date");
            dateElement.setText("2004-09-11");
            dateElement.addAttribute("type","Gregorian calendar");
        }
        /** Change 3: if the content of a title is Dom4j Tutorials, remove that title node */
        list = document.selectNodes("/books/book");
        iter = list.iterator();
        while(iter.hasNext()){
            Element bookElement = (Element)iter.next();
            Iterator iterator = bookElement.elementIterator("title");
            while(iterator.hasNext()){
                Element titleElement=(Element)iterator.next();
                if(titleElement.getText().equals("Dom4j Tutorials")){
                    bookElement.remove(titleElement);
                }
            }
        }
        try{
            /** Write the content of document to the file */
            XMLWriter writer = new XMLWriter(
                new FileWriter(new File(newfilename)));
            writer.write(document);
            writer.close();
            /** Succeeded, so return 1 */
            returnValue = 1;
        }
        catch(Exception ex){
            ex.printStackTrace();
        }
    }
    catch(Exception ex){
        ex.printStackTrace();
    }
    return returnValue;
}
|
|
/**
 * Formats an XML document and avoids problems with Chinese characters.
 * @param filename the file to format
 * @return the result of the operation: 0 for failure, 1 for success
 */
public int formatXMLFile(String filename){
    int returnValue = 0;
    try{
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(new File(filename));
        XMLWriter writer = null;
        /** Pretty-printed output, similar to how IE displays XML */
        OutputFormat format = OutputFormat.createPrettyPrint();
        /** Specify the XML encoding */
        format.setEncoding("GBK");
        /**
         * Write through an OutputStream so that XMLWriter encodes the bytes as GBK itself.
         * A FileWriter would use the platform default encoding, which may not match
         * the encoding declared in the XML header.
         */
        writer = new XMLWriter(
            new java.io.FileOutputStream(new File(filename)), format);
        writer.write(document);
        writer.close();
        /** Succeeded, so return 1 */
        returnValue = 1;
    }
    catch(Exception ex){
        ex.printStackTrace();
    }
    return returnValue;
}
|
Dom4jDemo.java
|
package com.holen.dom4j;
import java.io.File;
import java.io.FileWriter;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
/**
 * @author Holen Chen
 */
|
dom4j Study Notes (Part 2)
(1) Removing nodes and attributes
The output is:
1. The book node whose type is society is removed correctly
<?xml version="1.0" encoding="UTF-8"?>
<root><book type="science"><Name>Java</Name><price>100</price></book><author><name>chb</name><sex>boy</sex></author></root>
2. The sex node cannot be removed this way
<?xml version="1.0" encoding="UTF-8"?>
<root><book type="science"><Name>Java</Name><price>100</price></book><author><name>chb</name><sex>boy</sex></author></root>
3. This way the sex node is removed correctly
<?xml version="1.0" encoding="UTF-8"?>
<root><book type="science"><Name>Java</Name><price>100</price></book><author><name>chb</name></author></root>
4. The type attribute of the book node is removed correctly
<?xml version="1.0" encoding="UTF-8"?>
<root><book><Name>Java</Name><price>100</price></book><author><name>chb</name></author></root>
Analysis:
In the second output the sex node was not removed. To see why, look at the dom4j API:
public boolean remove(Element element)
Removes the given Element if the node is an immediate child of this branch. If the given node is not an immediate child of this branch then the Node.detach() method should be used instead.
element - is the element to be removed
This shows that remove works only on a branch's immediate children, not on its grandchildren. The sex node is not an immediate child of root, so calling remove on root removes nothing. It is an immediate child of author, which is why the third attempt succeeds.
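The code that produced the outputs above is not included in the text, so the following is only a minimal sketch of the same behavior. The class name, the method names, and the sample XML are assumptions made for illustration. It compares remove() called on the root, remove() called on the direct parent, and Node.detach().

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

// Hypothetical demo class; the names and the sample XML are not from the article.
class RemoveVsDetachSketch {
    static final String XML =
        "<root><book type=\"science\"><Name>Java</Name></book>"
        + "<author><name>chb</name><sex>boy</sex></author></root>";

    // Parses the sample XML, wrapping the checked exception for brevity.
    static Document parse() {
        try {
            return DocumentHelper.parseText(XML);
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls remove() on root; sex is a grandchild of root, so nothing is removed.
    static boolean removeViaRoot() {
        Document doc = parse();
        Element sex = doc.getRootElement().element("author").element("sex");
        return doc.getRootElement().remove(sex);
    }

    // Calls remove() on author, the immediate parent of sex.
    static boolean removeViaParent() {
        Document doc = parse();
        Element author = doc.getRootElement().element("author");
        return author.remove(author.element("sex"));
    }

    // Detaches sex from whatever parent it has and returns the remaining author XML.
    static String detachSex() {
        Document doc = parse();
        Element author = doc.getRootElement().element("author");
        author.element("sex").detach();
        return author.asXML();
    }

    public static void main(String[] args) {
        System.out.println(removeViaRoot());   // false
        System.out.println(removeViaParent()); // true
        System.out.println(detachSex());
    }
}
```

detach() is convenient when the code holds only the node to delete, for example one returned by an XPath query, and does not know its parent.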
(2) Merging two Documents into one
First, look at an approach that fails.
(1) Adding with the add() method
Calling the CombineDocument function produces the following error:
org.dom4j.IllegalAddException: The node "org.dom4j.tree.DefaultElement@17bd6a1 [Element: <author attributes: []/>]" could not be added to the element "root" because: The Node already has an existing parent of "root"
at org.dom4j.tree.AbstractElement.addNode(AbstractElement.java:1521)
at org.dom4j.tree.AbstractElement.add(AbstractElement.java:1002)
at xml_chb.dom4j_chb.CombineDocument(dom4j_chb.java:189)
at xml_chb.dom4j_chb.main(dom4j_chb.java:199)
Exception in thread "main"
In other words, the author node already has a parent (root) and cannot be added to another node.
(2) Using the appendContent() method
That is, change doc_book.getRootElement().add(author);
to: doc_book.getRootElement().appendContent(author);
The output is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<book type="science"><Name>Java</Name><price>100</price></book>
<book type="society"><Name>Society security</Name><price>130</price></book>
<name>chb</name><sex>boy</sex>
</root>
The author node itself is missing: only its child nodes were added. Still, the result suggests that appendContent is on the right track.
Let's look at the dom4j API:
public void appendContent(Branch branch)
Appends the content of the given branch to this branch instance. This method behaves like the Collection.addAll(java.util.Collection) method.
branch - is the branch whose content will be added to me.
(3) Using appendContent correctly
Change: Element author=(Element)doc_author.selectSingleNode("//author");
doc_book.getRootElement().appendContent(author);
to: doc_book.getRootElement().appendContent(doc_author.getRootElement());
The output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<book type="science"><Name>Java</Name><price>100</price></book>
<book type="society"><Name>Society security</Name><price>130</price></book>
<author><name>chb</name><sex>boy</sex></author>
</root>
This is the correct result.
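The CombineDocument function itself is not shown in the text, so the three attempts can only be approximated. The sketch below is one such approximation: the class name, the method names, and the two sample documents are assumptions, and element() is used in place of the XPath lookup so that no XPath engine is needed.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.IllegalAddException;

// Hypothetical demo class; names and sample XML are not from the article.
class CombineDocumentsSketch {
    static final String BOOKS =
        "<root><book type=\"science\"><Name>Java</Name></book></root>";
    static final String AUTHORS =
        "<root><author><name>chb</name><sex>boy</sex></author></root>";

    // Parses XML text, wrapping the checked exception for brevity.
    static Document parse(String xml) {
        try {
            return DocumentHelper.parseText(xml);
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
    }

    // Attempt 1: add() an element that still has a parent in the other document.
    static boolean addIsRejected() {
        Element author = parse(AUTHORS).getRootElement().element("author");
        try {
            parse(BOOKS).getRootElement().add(author);
            return false;
        } catch (IllegalAddException e) {
            return true; // the node already has an existing parent
        }
    }

    // Attempt 2: appendContent(author) copies only the children of author.
    static String appendAuthorContent() {
        Document books = parse(BOOKS);
        Element author = parse(AUTHORS).getRootElement().element("author");
        books.getRootElement().appendContent(author);
        return books.getRootElement().asXML();
    }

    // Attempt 3: appendContent(root of the author document) copies author itself.
    static String appendAuthorRootContent() {
        Document books = parse(BOOKS);
        books.getRootElement().appendContent(parse(AUTHORS).getRootElement());
        return books.getRootElement().asXML();
    }

    public static void main(String[] args) {
        System.out.println(addIsRejected());
        System.out.println(appendAuthorContent());
        System.out.println(appendAuthorRootContent());
    }
}
```

Because appendContent copies the content of the branch it is given, the branch to pass is the parent of the nodes that should appear in the target, not the node itself.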
(4) Another workable approach
1. "Dom4j encoding problems solved for good", by lonsen
http://www.5inet.net/Develop/Java/036579,Dom4j_BianMaWenDiCheDeJieJue.aspx
2. "Encoding problems in Dom4j" (from the series "No compromise on the Chinese-character problem"), by 盛忠良
http://blog.lupaworld.com/blog/htm/do_showone/tid_2261.html
3. "Some thoughts on Java encoding problems"
http://www.jspcn.net/htmlnews/11049393353751902.html
4. Forum thread: "org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence." when parsing Chinese characters with dom4j
http://dev.9983.com/ku/5403/4683267.asp
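My own summary:
1. Analysis and resolution of the "org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence." exception:
Analysis:
The exception is thrown by the reader.read(file); statement below:
SAXReader reader = new SAXReader();
Document doc = reader.read(file);
The cause of the exception is:
The XML file being read is actually encoded in GBK or some other encoding, but its content declares UTF-8 with <?xml version="1.0" encoding="utf-8"?>, so the parser fails when it decodes the bytes as UTF-8.
Note: following the online article "Java/J2EE中文问题终极解决之道" (a guide to resolving Chinese-character problems in Java/J2EE), the underlying cause is an encoding mismatch. The operating system's default encoding is GBK while the XML declares utf-8. Classes that rely on the platform default, such as FileWriter, write GBK bytes, and the parser, which trusts the declaration, decodes them as UTF-8. The fix is to make the file's actual encoding match the encoding used by the Java API that reads or writes it.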
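The mismatch can be reproduced without any file on disk. The sketch below is an illustration under stated assumptions: it uses the JDK's built-in DocumentBuilder rather than SAXReader so that it runs without dom4j, and the class name, the method names, and the sample element are hypothetical. The GBK bytes for the two Chinese characters are written out by hand so that the example does not depend on the GBK charset being installed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class; uses the JDK parser as a stand-in for SAXReader.
class EncodingMismatchSketch {
    static final String HEAD = "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>";
    static final String TAIL = "</name>";

    // Same document with the text encoded in GBK, although the header says utf-8.
    static byte[] gbkBytes() {
        byte[] head = HEAD.getBytes(StandardCharsets.US_ASCII);
        byte[] text = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4}; // U+4E2D U+6587 in GBK
        byte[] tail = TAIL.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(text, 0, text.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // Same document with the text really encoded as UTF-8.
    static byte[] utf8Bytes() {
        return (HEAD + "\u4e2d\u6587" + TAIL).getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if the bytes parse as XML, false if the parser rejects them.
    static boolean parses(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("GBK bytes, utf-8 header:   " + parses(gbkBytes()));
        System.out.println("UTF-8 bytes, utf-8 header: " + parses(utf8Bytes()));
    }
}
```

The first byte of the GBK text looks like the start of a two-byte UTF-8 sequence, but the byte after it is not a valid continuation byte, which is the kind of malformed sequence the exception message reports.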
Solution:
Case 1: the XML file is generated by dom4j.
Fix: use org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
new FileOutputStream(fileName));
instead of
xmlWriter = new XMLWriter(new FileWriter(fileName));
so that the XML file is written as UTF-8.
For details, see reference 1:
"Dom4j encoding problems solved for good", by lonsen
http://www.5inet.net/Develop/Java/036579,Dom4j_BianMaWenDiCheDeJieJue.aspx
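As a rough check of the fix for case 1, the sketch below writes a document through an OutputStream with an explicit encoding and reads it back. It is an illustration only: the class name and method name are hypothetical, and a ByteArrayOutputStream stands in for the FileOutputStream so that no file is created.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

// Hypothetical demo class; the in-memory stream stands in for a FileOutputStream.
class Utf8WriteSketch {
    // Writes a one-element document as UTF-8 bytes, parses it back, returns the text.
    static String roundTrip(String text) {
        try {
            Document document = DocumentHelper.createDocument();
            document.addElement("name").setText(text);

            OutputFormat format = OutputFormat.createCompactFormat();
            format.setEncoding("UTF-8"); // encoding for both the header and the bytes

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLWriter writer = new XMLWriter(bytes, format);
            writer.write(document);
            writer.close();

            Document parsed = new SAXReader()
                .read(new ByteArrayInputStream(bytes.toByteArray()));
            return parsed.getRootElement().getText();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

When XMLWriter is given an OutputStream it performs the character encoding itself, using the encoding set on the OutputFormat, so the declared encoding and the actual bytes cannot drift apart the way they can with a FileWriter.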
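Case 2: reader.read() throws the exception when parsing XML text that the user entered on a JSP page.
Fix:
Convert the XML content to UTF-8 before calling read (use the methods that take an explicit encoding):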
public void validate(FacesContext context, UIComponent component, Object obj)
        throws ValidatorException {
    String xmldescription = (String) obj;
    // encode explicitly as UTF-8; getBytes() without an argument uses the platform default
    byte[] bytes = xmldescription.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    RelationXmlParser.isXmlOK("E://jiangcm//templateXMLSchema.xsd",bytes);
    ……
}
public static boolean isXmlOK(String xsdFile, byte[] targetXml) throws SAXException, IOException, DocumentException
{
    SAXReader reader = new SAXReader();
    ……
    InputStream in = new ByteArrayInputStream(targetXml);
    // decode with the same encoding that was used to produce the bytes
    InputStreamReader utf8In = new InputStreamReader(in,"utf-8");
    ……
}