http://my.oschina.net/xpbug/blog/104412
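Everyone knows what XML is and what it looks like. Go a little deeper into how XML is actually used, though, and not everyone can follow. XML is a self-describing document format: its job is to carry content, and it has nothing to do with presentation. So extracting the data from XML and presenting it in a sensible way is the main part of XML programming. This article surveys the features of XML in breadth.
XML has a mountain of official documents, specs, and tutorials. But they are too specialized, too formal, and hard to read, with lots of text, few examples, and material scattered across many documents. What is needed is a short article that explains the technologies around XML in plain language and from a bird's-eye view, with a few small examples. That is why I wrote this article; writing it is also a way for me to study and summarize.
[Figure: structure of the code used in this article]
First, here is the XML example used throughout this article; all the code that follows is based on it.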

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="bookStore.xsl"?>
    <!DOCTYPE bookStore PUBLIC "bookStore.dtd" "bookStore.dtd">
    <bookStore name="java"
               xmlns="http://joey.org/bookStore"
               xmlns:audlt="http://japan.org/book/audlt"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="bookStore.xsd">
        <keeper>
            <name>Joey</name>
        </keeper>
        <books>
            <book id="1">
                <title>XML</title>
                <author>Steve</author>
            </book>
            <book id="2">
                <title>JAXP</title>
                <author>Bill</author>
            </book>
            <book id="3" audlt:color="yellow">
                <audlt:age>&gt;18</audlt:age>
                <title>Love</title>
                <author>teacher</author>
            </book>
        </books>
    </bookStore>

    <?xml version="1.0" encoding="UTF-8"?>

    <?xml-stylesheet type="text/css" href="cd_catalog.css"?>

or

    <?xml-stylesheet type="text/xsl" href="simple.xsl"?>

    <note xmlns="http://www.w3schools.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3schools.com note.xsd">
        ...
    </note>

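The encoding attribute names the character encoding used to store the XML. It tells the parser which encoding to use when decoding the document. For international use, prefer UTF-8. For plain English text, UTF-8 needs only one byte per character, so the XML file does not grow too large.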
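
To make the size claim concrete, here is a minimal sketch that counts encoded bytes. The class and method names (`Utf8SizeDemo`, `utf8Length`) are made up for this illustration and are not part of the article's code.

```java
import java.nio.charset.StandardCharsets;

class Utf8SizeDemo {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Three ASCII letters take three bytes in UTF-8.
        System.out.println(utf8Length("XML"));
        // Two CJK characters (written as escapes) take three bytes each.
        System.out.println(utf8Length("\u4E66\u5E97"));
        // The same ASCII text in UTF-16 takes two bytes per letter.
        System.out.println("XML".getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

So for mostly-ASCII documents UTF-8 is the compact choice, while text outside ASCII costs more bytes per character.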
Namespace syntax includes the declaration part: the default namespace xmlns="
Namespaces solve two problems.

    document.getElementsByTagNameNS("http://japan.org/book/audlt", "age");
    document.getElementsByTagName("audlt:age");

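
The two lookups above can also be tried from Java. This is a small sketch, assuming a cut-down document that reuses the article's two namespace URIs; the class name and helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookupDemo {
    // Cut-down version of the bookStore example, kept inline for the sketch.
    static final String XML =
        "<books xmlns='http://joey.org/bookStore' xmlns:audlt='http://japan.org/book/audlt'>"
      + "<book id='3'><audlt:age>&gt;18</audlt:age><title>Love</title></book></books>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the *NS lookups to work
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lookup by namespace URI + local name: independent of the prefix used in the file.
    static int countByNamespace(String uri, String localName) {
        return parse(XML).getElementsByTagNameNS(uri, localName).getLength();
    }

    // Lookup by qualified name: only matches the exact "prefix:name" text.
    static int countByQualifiedName(String qName) {
        return parse(XML).getElementsByTagName(qName).getLength();
    }

    public static void main(String[] args) {
        System.out.println(countByNamespace("http://japan.org/book/audlt", "age"));
        System.out.println(countByQualifiedName("audlt:age"));
        System.out.println(countByQualifiedName("age"));
    }
}
```

The namespace-based lookup keeps working if the document author picks a different prefix, which is the main reason to prefer it.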
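The validity of an XML document is checked against a DTD or an XSD, the two schema languages for XML. XSD is newer than DTD and more capable.
The XML in this article declares a reference to a DTD, so an XML parser loads the DTD automatically to validate the document. This requires two things of the parser: first, validation mode must be turned on; second, the parser must know where to load the DTD from. The XML parser may be in JavaScript, Java, or a browser. The location is determined from the PUBLIC ID or the SYSTEM ID. Consider the following declaration: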

    <!DOCTYPE bookStore SYSTEM "bookStore.dtd">

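The declaration above has no PUBLIC ID, only a SYSTEM ID: SYSTEM ID = the directory of the XML document + "/bookStore.dtd". In other words, the SYSTEM ID here is a path relative to the XML document.
A declaration with a PUBLIC ID: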

    <!DOCTYPE bookStore PUBLIC "bookStore.dtd" "bookStore.dtd">

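Here the PUBLIC ID is also "bookStore.dtd". The parser tries to load the DTD using these two IDs and throws an exception if it cannot. In Java, we can implement the EntityResolver interface to tell the parser where the DTD really is. See the Java part for details.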
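
To see which IDs the parser hands over, here is a rough sketch that records the PUBLIC ID passed to an EntityResolver and serves a DTD from memory. The `note` document and its one-line DTD are invented for the illustration; only the "bookStore.dtd" identifier comes from the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class DoctypeIdsDemo {
    // Hypothetical one-element DTD, served from memory instead of from disk.
    static final String DTD = "<!ELEMENT note (#PCDATA)>";

    // Parses the XML and returns the PUBLIC IDs the parser asked the resolver for.
    static List<String> requestedPublicIds(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setEntityResolver((publicId, systemId) -> {
                seen.add(String.valueOf(publicId)); // "null" when only a SYSTEM ID is declared
                return new InputSource(new StringReader(DTD));
            });
            db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(requestedPublicIds(
            "<!DOCTYPE note PUBLIC \"bookStore.dtd\" \"bookStore.dtd\"><note>hi</note>"));
        System.out.println(requestedPublicIds(
            "<!DOCTYPE note SYSTEM \"bookStore.dtd\"><note>hi</note>"));
    }
}
```

Because the resolver returns its own InputSource, the parser never goes looking for the file that the SYSTEM ID points at.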
The DTD used in this article is bookStore.dtd. [DTD listing not preserved]
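Validating XML against an XSD only requires an XSD file and a parser with XSD validation turned on. How to enable it is shown in the Java code later. The XSD used in this article is: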

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="bookStore" type="bookStoreType"/>

        <xsd:complexType name="bookStoreType">
            <xsd:sequence>
                <xsd:element name="keeper" type="keeperType"></xsd:element>
                <xsd:element name="books" type="booksType"></xsd:element>
            </xsd:sequence>
            <xsd:attribute name="name" type="xsd:string"></xsd:attribute>
        </xsd:complexType>

        <xsd:complexType name="keeperType">
            <xsd:sequence>
                <xsd:element name="name" type="xsd:string"></xsd:element>
            </xsd:sequence>
        </xsd:complexType>

        <xsd:complexType name="booksType">
            <xsd:sequence>
                <xsd:element name="book" type="bookType"></xsd:element>
            </xsd:sequence>
        </xsd:complexType>

        <xsd:complexType name="bookType">
            <xsd:sequence>
                <xsd:element name="title" type="xsd:string"></xsd:element>
                <xsd:element name="author" type="xsd:string"></xsd:element>
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:int"></xsd:attribute>
        </xsd:complexType>
    </xsd:schema>

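
As a quick way to try XSD validation on its own, here is a minimal sketch using javax.xml.validation. It assumes a deliberately cut-down, hypothetical schema (a single `book` element) rather than the full bookStore.xsd above, and the class name is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema: a <book> with a required integer id and one <title>.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='book'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='title' type='xsd:string'/>"
      + "</xsd:sequence><xsd:attribute name='id' type='xsd:int' use='required'/>"
      + "</xsd:complexType></xsd:element></xsd:schema>";

    // Returns true when the XML is well-formed and valid against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book id='1'><title>XML</title></book>"));
        System.out.println(isValid("<book id='one'><title>XML</title></book>"));
    }
}
```

The second document fails because `id` is declared as `xsd:int`, which is the kind of type check a DTD cannot express.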
As the snippet below shows, XML can be transformed by a stylesheet into other formats such as HTML or plain text. The stylesheet can be CSS or XSL.

    <?xml-stylesheet type="text/xsl" href="bookStore.xsl"?>

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:b="http://joey.org/bookStore"
                    xmlns:a="http://japan.org/book/audlt">
        <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"></xsl:output>
        <xsl:template match="/">
            <html>
                <body>
                    <h2>Book Store&lt;&lt;<xsl:value-of select="/b:bookStore/@name"></xsl:value-of>&gt;&gt;</h2>
                    <div>
                        There are <xsl:value-of select="count(/b:bookStore/b:books/b:book)"></xsl:value-of> books.
                    </div>
                    <div>
                        Keeper of this store is <xsl:value-of select="/b:bookStore/b:keeper/b:name"></xsl:value-of>
                    </div>
                    <xsl:for-each select="/b:bookStore/b:books/b:book">
                        <div> Book:
                            <span>title=<xsl:value-of select="b:title"></xsl:value-of></span>;
                            <span>author=<xsl:value-of select="b:author"></xsl:value-of></span>
                            <xsl:if test="@a:color">
                                <span style="color:yellow">H Book, require age<xsl:value-of select="a:age"></xsl:value-of></span>
                            </xsl:if>
                        </div>
                    </xsl:for-each>
                </body>
            </html>
        </xsl:template>
    </xsl:stylesheet>

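JavaScript support for XML differs between IE and Firefox/Chrome. IE uses an ActiveXObject to create an XML instance, while Firefox, Chrome, and the other mainstream browsers follow the W3C specification. The resulting XML document can be manipulated through its DOM methods, or with the help of frameworks such as Dojo and jQuery.
The following example uses JavaScript to run an XSLT transformation on XML and produce HTML.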

    // Parse an XML string into a DOM document.
    function createXMLDoc(xmlStr) {
        var xmlDoc;
        if (window.DOMParser) {
            // FF, Chrome
            var parser = new DOMParser();
            xmlDoc = parser.parseFromString(xmlStr, "text/xml");
        } else if (window.ActiveXObject) {
            // Internet Explorer
            xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
            xmlDoc.async = false;   // boolean, not the string "false"
            xmlDoc.loadXML(xmlStr);
        }
        return xmlDoc;
    }

    // Apply the XSL document to the XML document.
    function transform(xmlDoc, xslDoc) {
        if (window.XSLTProcessor) {
            // Chrome, FF
            var xslp = new XSLTProcessor();
            xslp.importStylesheet(xslDoc);
            return xslp.transformToFragment(xmlDoc, document);
        } else if (window.ActiveXObject) {
            // IE
            return xmlDoc.transformNode(xslDoc);
        }
    }

    var xmlStr = [
        '...'   // the XML document, one string per line (markup not preserved)
    ].join('');
    var xslStr = [
        '...'   // the XSL stylesheet, one string per line (markup not preserved)
    ].join('');

    var xmlDoc = createXMLDoc(xmlStr);
    var xslDoc = createXMLDoc(xslStr);
    var dom = transform(xmlDoc, xslDoc);
    console.log(dom.childNodes[0].outerHTML);

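Java's support for XML is called JAXP (Java API for XML Processing). JAXP became a standard part of J2SE 1.4, and since then the JRE has shipped with XML processing libraries. JAXP also allows third-party XML parsers; different parsers have different strengths and weaknesses, and users can choose their own. All of them, however, must implement the interfaces that JAXP defines. To master JAXP you need to know the topics described in the rest of this article.
I will not describe each interface and class in prose; the JAXP class library is introduced later through code and comments. Before describing SAX, StAX, and DOM, a high-level comparison is in order: what are the strengths and weaknesses of each parsing approach, and how should you choose among them?
First, XML can be parsed with SAX, StAX, and DOM, and generated with StAX and DOM. XPath is a tool for querying a DOM. XSLT is a tool for transforming XML. [Figure: overview of parsing, generation, XPath, and XSLT in JAXP]
In terms of data structure, XML parsing falls into two families: streaming and tree. Streaming is further divided into SAX and StAX; tree means DOM. SAX and StAX both parse the XML sequentially and produce read events, and we get the content we want by handling those events. DOM loads the whole document into memory at once as a tree.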
Streaming VS DOM
Pull VS Push
Feature | SAX | StAX | DOM |
API Type | Push, Streaming | Pull, Streaming | Tree, In memory |
Support XPath? | No | No | Yes |
Read XML | Yes | Yes | Yes |
Write XML | No | Yes | Yes |
CRUD | No | No | Yes |
Parsing Validation (DTD, XSD) | Yes | Optional (not supported by the parser embedded in the JDK) | Yes |
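
The push/pull row of the table can be sketched side by side. This is only an illustration on a tiny inline document; the class name, method names, and sample XML are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVsPullDemo {
    static final String XML =
        "<books><book><title>XML</title></book><book><title>JAXP</title></book></books>";

    // Push (SAX): the parser drives and calls back into our handler.
    static List<String> titlesWithSax(String xml) {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("title")) text = new StringBuilder();
            }
            @Override public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }
            @Override public void endElement(String uri, String local, String qName) {
                if (qName.equals("title")) { titles.add(text.toString()); text = null; }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return titles;
    }

    // Pull (StAX): our loop drives and asks the reader for the next event.
    static List<String> titlesWithStax(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("title")) {
                    titles.add(r.getElementText());
                }
            }
            r.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(titlesWithSax(XML));
        System.out.println(titlesWithStax(XML));
    }
}
```

With SAX the state ("am I inside a title?") has to live in handler fields; with StAX it lives in the ordinary control flow of the loop, which is why StAX code tends to read more simply.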
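SAXParser delegates to XMLReader. To use SAXParser you pass it a DefaultHandler, which implements the four handler interfaces (ContentHandler, ErrorHandler, DTDHandler, and EntityResolver). You can also use XMLReader directly and call its parse method; you just have to set each handler before parsing. SAXParser follows an event-driven design: as it reads the bytes of the XML, it passes events to the handlers.
The reading is actually done by XMLReader, and all events are produced by XMLReader. So to make a non-XML file look like XML, you only need to write your own XMLReader that emits synthetic events while reading the non-XML file; the handlers will then process the file as if it were XML. This mechanism is used in the XSLT section.
About simulating XML
SAX can make reading a non-XML file look like reading an XML file by constructing XML read events. The only requirement is a custom XMLReader.
ContentHandler handles the read events for the various kinds of data in the XML. Its events include startDocument, startElement, characters, endElement, and endDocument, among others.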
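ErrorHandler handles the warnings and errors that occur while parsing XML. It has three methods: warning(), error(), and fatalError(). warning() and error() deal with validation (DTD or XSD) problems. Such problems do not prevent the XML from being parsed, so you can swallow the exception instead of rethrowing it, and parsing will not be interrupted. fatalError() reports a structural (well-formedness) error. It cannot be suppressed: even if the handler does not throw, the parser throws an exception itself.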
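
The difference between recoverable and fatal errors can be sketched like this. The inline DTD and documents are invented for the example, and the behavior shown is what the parser bundled with the JDK does by default; other parsers may be configurable to continue after a fatal error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ErrorHandlerDemo {
    // Hypothetical internal DTD: a <book> must contain exactly one <title>.
    static final String DOCTYPE =
        "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]>";

    // Parses with DTD validation on and returns a log of what happened.
    static List<String> parse(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void warning(SAXParseException e) { log.add("warning"); }
            @Override public void error(SAXParseException e) { log.add("error"); }      // swallowed
            @Override public void fatalError(SAXParseException e) { log.add("fatal"); } // not rethrown
            @Override public void endDocument() { log.add("endDocument"); }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setValidating(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            log.add("aborted"); // the parser itself gave up
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Invalid but well-formed: errors are reported, parsing still reaches the end.
        System.out.println(parse(DOCTYPE + "<book><author>x</author></book>"));
        // Not well-formed: fatalError is reported and the parser throws anyway.
        System.out.println(parse(DOCTYPE + "<book><title>XML</book>"));
    }
}
```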
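A DTD can contain ENTITY and NOTATION declarations, which are user-defined. The XML parser cannot interpret user-defined unparsed entities or notations, so it hands them to the DTDHandler. DTDHandler has only two methods, notationDecl and unparsedEntityDecl; we implement them to check whether our NOTATION declarations are correct.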
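
A minimal sketch of a DTDHandler at work, assuming a made-up document with one notation (`gif`) and one unparsed entity (`cover`) declared in its internal DTD subset:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class DtdHandlerDemo {
    // Hypothetical document: a "gif" notation and an unparsed entity "cover" that uses it.
    static final String XML =
        "<!DOCTYPE book ["
      + "<!ELEMENT book EMPTY>"
      + "<!NOTATION gif SYSTEM 'image/gif'>"
      + "<!ENTITY cover SYSTEM 'cover.gif' NDATA gif>"
      + "]><book/>";

    // Returns the notation and unparsed-entity declarations reported by the parser.
    static List<String> declarations(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(new DTDHandler() {
                @Override
                public void notationDecl(String name, String publicId, String systemId) {
                    seen.add("notation " + name);
                }
                @Override
                public void unparsedEntityDecl(String name, String publicId,
                                               String systemId, String notationName) {
                    seen.add("entity " + name + " -> " + notationName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(declarations(XML));
    }
}
```

The parser only reports the declarations; deciding what a `gif` notation means, or whether `cover.gif` exists, is left to the application.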
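The validation section mentioned locating the DTD. EntityResolver does exactly that. It has a single method, resolveEntity(publicId, systemId), which the parser calls whenever it needs an external file. We can do some preprocessing in this method. The code is as follows: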
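
    import java.io.IOException;
    import java.io.InputStream;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class MyEntityResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            // Serve the DTD from the classpath when it is requested by its PUBLIC ID.
            if ("bookStore.dtd".equals(publicId)) {
                InputStream in = this.getClass().getResourceAsStream("/jaxp/resources/bookStore.dtd");
                InputSource is = new InputSource(in);
                return is;
            }
            // null tells the parser to fall back to its default resolution.
            return null;
        }
    }
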
Note how validation mode is turned on in the code below. XSD validation can be enabled in two ways.

    import java.io.IOException;
    import java.io.InputStream;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    // MyContentHandler, MyDTDHandler and MyErrorHandler are the article's own
    // handler classes; their source is not listed here.
    public class MySAX {
        private SAXParser parser;

        public static void main(String[] args) throws Exception {
            new MySAX();
        }

        public MySAX() throws ParserConfigurationException, SAXException, IOException {
            // Use the "javax.xml.parsers.SAXParserFactory" system property to specify a parser:
            //   java -Djavax.xml.parsers.SAXParserFactory=yourFactoryHere [...]
            // If the property is not specified, the J2SE default parser is used:
            //   com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);

            // Way 1: XSD as defined by JAXP 1.3 (Java 1.5)
            // SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
            // spf.setSchema(sf.newSchema(this.getClass().getResource("/jaxp/resources/bookStore.xsd")));

            // Way 2: the old way defined by JAXP 1.2 (set on the parser after it is created)
            // parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
            // parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", new File("schema.xsd"));

            // XSD disabled here; validate against the DTD instead.
            spf.setValidating(true);
            this.parser = spf.newSAXParser();

            // You can use SAXParser directly to parse XML, or use XMLReader.
            // SAXParser wraps and uses an XMLReader internally.
            // XMLReader is used here.
            // this.parser.parse(InputStream, DefaultHandler);
            XMLReader reader = this.parser.getXMLReader();
            reader.setContentHandler(new MyContentHandler());
            reader.setDTDHandler(new MyDTDHandler());
            reader.setErrorHandler(new MyErrorHandler());
            reader.setEntityResolver(new MyEntityResolver());

            InputStream in = this.getClass().getResourceAsStream("/jaxp/resources/bookStore.xml");
            InputSource is = new InputSource(in);
            is.setEncoding("UTF-8");
            reader.parse(is);
        }
    }

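For tree-based XML processing, Java has DOM, JDOM, and DOM4J. Some people think JDOM and DOM4J are just alternative implementations of DOM; that is wrong, since they are separate APIs with their own object models.
Once you have the DOM data model, you can find elements with the DOM traversal methods, or use XPath to look up specific elements. The key point to watch with XPath is the NamespaceContext. A DOM code example follows.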

    import java.util.Iterator;
    import javax.xml.namespace.NamespaceContext;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    // MyErrorHandler is the article's own class; its source is not listed here.
    public class MyDOM {
        public static void main(String[] args) throws Exception {
            new MyDOM();
        }

        public MyDOM() throws Exception {
            // Use the "javax.xml.parsers.DocumentBuilderFactory" system property to specify a parser:
            //   java -Djavax.xml.parsers.DocumentBuilderFactory=yourFactoryHere [...]
            // If the property is not specified, the J2SE default parser is used:
            //   com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setIgnoringComments(false);
            dbf.setNamespaceAware(true);
            dbf.setIgnoringElementContentWhitespace(true);

            // Way 1: XSD as defined by JAXP 1.3 (Java 1.5)
            // SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
            // dbf.setSchema(sf.newSchema(this.getClass().getResource("/jaxp/resources/bookStore.xsd")));

            // Way 2: the old way defined by JAXP 1.2
            // dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
            // dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource", new File("schema.xsd"));
            // dbf.setSchema(schema);

            // XSD disabled here; validate against the DTD instead.
            dbf.setValidating(true);

            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new MyErrorHandler());
            db.setEntityResolver(new MyEntityResolver());
            Document document = db.parse(this.getClass().getResourceAsStream("/jaxp/resources/bookStore.xml"));

            // Operate on the Document according to the DOM model.
            NodeList list = document.getElementsByTagNameNS("http://joey.org/bookStore", "book");
            System.out.println(list.item(2).getAttributes().item(0).getLocalName());
            // Note that if you do not specify a namespace, you need to use the qualified name.
            System.out.println(document.getElementsByTagName("audlt:age").item(0).getTextContent());

            // Use XPath to query the XML.
            XPathFactory xpf = XPathFactory.newInstance();
            XPath xp = xpf.newXPath();
            // A namespace context is needed so that the prefixes in the expressions resolve.
            NamespaceContext nc = new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if (prefix.equals("b"))
                        return "http://joey.org/bookStore";
                    if (prefix.equals("a"))
                        return "http://japan.org/book/audlt";
                    return null;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    if (namespaceURI.equals("http://joey.org/bookStore"))
                        return "b";
                    if (namespaceURI.equals("http://japan.org/book/audlt"))
                        return "a";
                    return null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null;
                }
            };
            xp.setNamespaceContext(nc);
            System.out.println(xp.evaluate("/b:bookStore/@name", document));
            System.out.println(xp.evaluate("/b:bookStore/b:books/b:book[@id=3]/@a:color", document));
        }
    }

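Compared with SAX, StAX code is simpler, and StAX can also write XML. However, the StAX specification does not make validation during parsing mandatory, so the StAX parser bundled with the JDK does not support parsing validation.
StAX has two APIs: the cursor API (XMLStreamReader, XMLStreamWriter) and the iterator API (XMLEventReader, XMLEventWriter). The cursor API reads or writes like a cursor: we have to keep calling the reader and writer for every piece of the XML, which interleaves the application logic with the XML parsing layer and gets messy. The iterator API separates the two: all data is wrapped in event objects, and the logic layer and the parsing layer communicate only through those events, which gives loose coupling. That, at least, is my understanding.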

    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.xml.namespace.NamespaceContext;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLEventWriter;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLResolver;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;
    import javax.xml.stream.XMLStreamWriter;
    import javax.xml.stream.events.XMLEvent;

    public class MyStAX {
        public static void main(String[] args) throws Exception {
            cursorAPIReadWrite();
            eventAPIReadWrite();
        }

        // Use the cursor API to read and write XML.
        public static void cursorAPIReadWrite() throws Exception {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            // Set properties for validation, namespaces, ...
            // But the StAX parser embedded in the JDK does not support validation.
            // xif.setProperty(XMLInputFactory.IS_VALIDATING, true);
            xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
            // Handle the external entity (the DTD).
            xif.setXMLResolver(new XMLResolver() {
                public Object resolveEntity(String publicID, String systemID,
                        String baseURI, String namespace) throws XMLStreamException {
                    if ("bookStore.dtd".equals(publicID)) {
                        return MyStAX.class.getResourceAsStream("/jaxp/resources/bookStore.dtd");
                    }
                    return null;
                }
            });

            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            // Namespace repairing can introduce bugs of its own. Use it carefully.
            // xof.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, true);

            InputStream sourceIn = MyStAX.class.getResourceAsStream("/jaxp/resources/bookStore.xml");
            OutputStream targetOut = System.out; // new FileOutputStream(new File("target.xml"));

            XMLStreamReader reader = xif.createXMLStreamReader(sourceIn);
            XMLStreamWriter writer = xof.createXMLStreamWriter(targetOut, reader.getEncoding());
            writer.writeStartDocument(reader.getEncoding(), reader.getVersion());
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                case XMLStreamConstants.DTD:
                    out(reader.getText());
                    writer.writeCharacters("\n");
                    writer.writeDTD(reader.getText());
                    writer.writeCharacters("\n");
                    break;
                case XMLStreamConstants.PROCESSING_INSTRUCTION:
                    out(reader.getPITarget());
                    writer.writeCharacters("\n");
                    writer.writeProcessingInstruction(reader.getPITarget(), reader.getPIData());
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    out(reader.getName());
                    NamespaceContext nc = reader.getNamespaceContext();
                    writer.setNamespaceContext(reader.getNamespaceContext());
                    writer.setDefaultNamespace(nc.getNamespaceURI(""));
                    writer.writeStartElement(reader.getPrefix(), reader.getLocalName(), reader.getNamespaceURI());
                    // Copy the attributes of the current element.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        QName qname = reader.getAttributeName(i);
                        String name = qname.getLocalPart();
                        if (qname.getPrefix() != null && !qname.getPrefix().equals("")) {
                            // name = qname.getPrefix() + ":" + name;
                        }
                        writer.writeAttribute(name, reader.getAttributeValue(i));
                    }
                    // Copy the namespace declarations of the current element.
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        writer.writeNamespace(reader.getNamespacePrefix(i), reader.getNamespaceURI(i));
                    }
                    break;
                case XMLStreamConstants.ATTRIBUTE:
                    out(reader.getText());
                    break;
                case XMLStreamConstants.SPACE:
                    out("SPACE");
                    writer.writeCharacters("\n");
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out(reader.getText());
                    writer.writeCharacters(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out(reader.getName());
                    writer.writeEndElement();
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    writer.writeEndDocument();
                    break;
                default:
                    out("other");
                    break;
                }
            }
            writer.close();
            reader.close();
        }

        // Use the iterator (event) API to read and write XML.
        public static void eventAPIReadWrite() throws Exception {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
            // Handle the external entity (the DTD).
            xif.setXMLResolver(new XMLResolver() {
                public Object resolveEntity(String publicID, String systemID,
                        String baseURI, String namespace) throws XMLStreamException {
                    if ("bookStore.dtd".equals(publicID)) {
                        return MyStAX.class.getResourceAsStream("/jaxp/resources/bookStore.dtd");
                    }
                    return null;
                }
            });

            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            InputStream sourceIn = MyStAX.class.getResourceAsStream("/jaxp/resources/bookStore.xml");
            OutputStream targetOut = System.out;

            XMLEventReader reader = xif.createXMLEventReader(sourceIn);
            XMLEventWriter writer = xof.createXMLEventWriter(targetOut);
            // Each event carries all of its data, so copying is a one-line loop body.
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                out(event.getEventType());
                writer.add(event);
            }
            reader.close();
            writer.close();
        }

        public static void out(Object o) {
            System.out.println(o);
        }
    }

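
For the writing side on its own, here is a much smaller sketch that uses the cursor API to build a fragment in memory. The class name and the `writeBook` helper are made up for the illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteDemo {
    // Writes <book id="..."><title>...</title></book> and returns it as a string.
    static String writeBook(String id, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("book");
            w.writeAttribute("id", id);
            w.writeStartElement("title");
            w.writeCharacters(title);   // special characters are escaped by the writer
            w.writeEndElement();        // </title>
            w.writeEndElement();        // </book>
            w.writeEndDocument();
            w.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeBook("1", "XML"));
        System.out.println(writeBook("2", "a<b"));
    }
}
```

Every call emits one piece of markup immediately, which is what "cursor" means on the writing side: nothing is buffered as a tree.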
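So far we have looked at SAX, DOM, and StAX, all of which are ways to parse XML. SAX is only suitable for reading. DOM is the in-memory representation of the XML. StAX can parse and can also write out to the file system.
If you need to write a DOM from memory out to an XML file, or to convert an XML file into HTML or any other format, you need the XSLT support in JAXP.
The XSLT API (javax.xml.transform) contains four sub-packages: javax.xml.transform.dom, javax.xml.transform.sax, javax.xml.transform.stax, and javax.xml.transform.stream.
As this shows, JAXP supports 4*4=16 kinds of conversion, one for each (source, result) pair: (sax, sax), (sax, dom), (sax, stream), and so on.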
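
One of those sixteen pairings, (dom, stream), is the usual way to serialize a DOM. A minimal sketch, assuming a tiny document built in memory; the class and method names are invented for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToStreamDemo {
    // Builds a small DOM and serializes it with an identity transformation.
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            book.setAttribute("id", "1");
            Element title = doc.createElement("title");
            title.setTextContent("XML");
            book.appendChild(title);
            doc.appendChild(book);

            // newTransformer() without a stylesheet copies the source to the result unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```

Swapping `DOMSource` for another Source, or `StreamResult` for another Result, gives the other pairings without changing the `transform` call.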
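Going a step further, by combining the SAXSource -> DOMResult conversion with SAX's ability to simulate XML reading, XSLT can turn a non-XML input into a DOM. The code below includes an example along these lines (str2xml, which sends its result to a StreamResult for printing). It also includes another example, which converts XML into HTML according to an XSL stylesheet.
Note that there is a trick to handling DTDs in XSLT:
In the xml2html conversion, StreamSource would be the simplest to write, so why use SAXSource? Because the XML being transformed references a DTD. The StreamSource could not resolve the external reference, and the Transformer threw a TransformerException saying that the DTD file could not be found. In this situation we have to use a SAXSource and give it an XMLReader that can resolve the external DTD reference. With that, the transformation finally succeeds.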

    import java.io.IOException;
    import java.io.InputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.DTDHandler;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXNotRecognizedException;
    import org.xml.sax.SAXNotSupportedException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.AttributesImpl;

    public class MyXSLT {
        TransformerFactory tff;

        public static void main(String[] args) throws Exception {
            MyXSLT xslt = new MyXSLT();
            xslt.xml2html();
            xslt.str2xml();
        }

        public MyXSLT() {
            tff = TransformerFactory.newInstance();
        }

        public void xml2html() throws Exception {
            Transformer tr = tff.newTransformer(new SAXSource(new InputSource(
                    this.getClass().getResourceAsStream("/jaxp/resources/bookStore.xsl"))));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            // The stylesheet matches namespaced elements, so parse namespace-aware.
            spf.setNamespaceAware(true);
            SAXParser parser = spf.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            // Give the reader a resolver so that the external DTD reference can be found.
            reader.setEntityResolver(new EntityResolver() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId)
                        throws SAXException, IOException {
                    if ("bookStore.dtd".equals(publicId)) {
                        InputStream in = this.getClass().getResourceAsStream("/jaxp/resources/bookStore.dtd");
                        InputSource is = new InputSource(in);
                        return is;
                    }
                    return null;
                }
            });

            Source source = new SAXSource(reader, new InputSource(
                    this.getClass().getResourceAsStream("/jaxp/resources/bookStore.xml")));
            Result target = new StreamResult(System.out);
            tr.transform(source, target);
        }

        // "[joey,bill,cat]" will be transformed to
        // <test><name>joey</name><name>bill</name><name>cat</name></test>
        public void str2xml() throws Exception {
            final String[] names = new String[]{"joey", "bill", "cat"};
            Transformer tr = tff.newTransformer();
            Source source = new SAXSource(new XMLReader() {
                private ContentHandler handler;

                // Instead of reading a file, emit SAX events built from the array.
                @Override
                public void parse(InputSource input) throws IOException, SAXException {
                    handler.startDocument();
                    // An empty attribute list is passed instead of null to avoid NullPointerExceptions.
                    handler.startElement("", "test", "test", new AttributesImpl());
                    for (int i = 0; i < names.length; i++) {
                        handler.startElement("", "name", "name", new AttributesImpl());
                        handler.characters(names[i].toCharArray(), 0, names[i].length());
                        handler.endElement("", "name", "name");
                    }
                    handler.endElement("", "test", "test");
                    handler.endDocument();
                }

                @Override
                public void parse(String systemId) throws IOException, SAXException {
                }

                @Override
                public boolean getFeature(String name)
                        throws SAXNotRecognizedException, SAXNotSupportedException {
                    return false;
                }

                @Override
                public void setFeature(String name, boolean value)
                        throws SAXNotRecognizedException, SAXNotSupportedException {
                }

                @Override
                public Object getProperty(String name)
                        throws SAXNotRecognizedException, SAXNotSupportedException {
                    return null;
                }

                @Override
                public void setProperty(String name, Object value)
                        throws SAXNotRecognizedException, SAXNotSupportedException {
                }

                @Override
                public void setEntityResolver(EntityResolver resolver) {
                }

                @Override
                public EntityResolver getEntityResolver() {
                    return null;
                }

                @Override
                public void setDTDHandler(DTDHandler handler) {
                }

                @Override
                public DTDHandler getDTDHandler() {
                    return null;
                }

                @Override
                public void setContentHandler(ContentHandler handler) {
                    this.handler = handler;
                }

                @Override
                public ContentHandler getContentHandler() {
                    return handler;
                }

                @Override
                public void setErrorHandler(ErrorHandler handler) {
                }

                @Override
                public ErrorHandler getErrorHandler() {
                    return null;
                }
            }, new InputSource());
            Result target = new StreamResult(System.out);
            tr.transform(source, target);
        }
    }
