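This article continues from the previous one on working with XML data through the DOM API. It focuses on how to parse an XML file. The sample file is the one generated in the previous article, and the goal is to check whether it can be read back completely. The file contents are:
XML/HTML code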
    <?xml version="1.0" encoding="utf-8"?>
    <book_store name="new hua" website="http://www.ourunix.org">
      <book>
        <name>Hamlet</name>
        <author>William Shakespeare</author>
        <price>$20</price>
        <grade>good</grade>
      </book>
      <book>
        <name>shuihu</name>
        <author>naian shi</author>
        <price>$200</price>
        <grade>good</grade>
      </book>
    </book_store>
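Main methods
1. Load and read the XML file
Python code
    minidom.parse(filename)
2. Get the root element of the XML document
Python code
    doc.documentElement
3. Get an attribute value of an XML node
Python code
    node.getAttribute(AttributeName)
4. Get the collection of XML element nodes with a given tag name
Python code
    node.getElementsByTagName(TagName)
5. Get the value of an XML node
Python code
    node.childNodes[index].nodeValue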
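The five calls above can be tried together on a small inline document. The following is a minimal sketch, written for Python 3, that uses `minidom.parseString` instead of `minidom.parse` so that it does not depend on a file on disk. The `SAMPLE` string and all variable names are made up for illustration.

```python
import xml.dom.minidom as minidom

# Hypothetical sample data: a cut-down version of book_store.xml.
SAMPLE = (
    '<book_store name="new hua" website="http://www.ourunix.org">'
    '<book><name>Hamlet</name><price>$20</price></book>'
    '<book><name>shuihu</name><price>$200</price></book>'
    '</book_store>'
)

doc = minidom.parseString(SAMPLE)          # 1. load (from a string here)
root = doc.documentElement                 # 2. root element <book_store>
store_name = root.getAttribute("name")     # 3. attribute value
books = root.getElementsByTagName("book")  # 4. list of matching elements
first_name = books[0].getElementsByTagName("name")[0]
first_title = first_name.childNodes[0].nodeValue  # 5. text of the first child

print(root.tagName, store_name, len(books), first_title)
```

Note that `getElementsByTagName` searches all descendants, not only direct children, which is why the sketch calls it on a single `<book>` element to reach that book's `<name>`.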
Code demonstration
As before, we start with a simple version that shows how to parse an XML file with DOM. The code is as follows:
Python code
    import sys
    import xml.dom.minidom as Dom

    if __name__ == "__main__":
        try:
            xml_file = Dom.parse("./book_store.xml")
        except Exception as e:
            # Covers a missing file as well as malformed XML.
            print(e)
            sys.exit(1)

        # The root element <book_store> carries the store attributes.
        node_root = xml_file.documentElement
        name = node_root.getAttribute("name")
        website = node_root.getAttribute("website")
        print("name of book store: %s\nwebsite of book store: %s" % (name, website))

        node_book_list = node_root.getElementsByTagName("book")
        for book_node in node_book_list:
            # Each value is the text node inside the first matching child element.
            book_name_node = book_node.getElementsByTagName("name")[0]
            book_name_value = book_name_node.childNodes[0].data

            book_author_node = book_node.getElementsByTagName("author")[0]
            book_author_value = book_author_node.childNodes[0].data

            book_price_node = book_node.getElementsByTagName("price")[0]
            book_price_value = book_price_node.childNodes[0].data

            book_grade_node = book_node.getElementsByTagName("grade")[0]
            book_grade_value = book_grade_node.childNodes[0].data

            print("book: %s\t author: %s\t price: %s\t grade: %s\t" % (
                book_name_value, book_author_value, book_price_value, book_grade_value))
The output is as follows:
name of book store: new hua
website of book store: http://www.ourunix.org
book: Hamlet author: William Shakespeare price: $20 grade: good
book: shuihu author: naian shi price: $200 grade: good
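One caveat with `childNodes[0].data`: it assumes that every element exists and contains text. An empty element such as `<grade></grade>` has no child nodes, so indexing it raises `IndexError`. A minimal sketch of a more defensive lookup, written for Python 3, follows. The helper name `get_text` and its `default` parameter are hypothetical and not part of minidom, and the inline XML is illustrative.

```python
import xml.dom.minidom as minidom


def get_text(parent, tag_name, default=""):
    """Return the text of the first <tag_name> under parent, or default.

    Hypothetical helper: it joins all text children instead of
    reading only childNodes[0].
    """
    nodes = parent.getElementsByTagName(tag_name)
    if not nodes:
        return default  # the tag is missing entirely
    parts = [child.data for child in nodes[0].childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts) if parts else default


# Illustrative input: the second book has an empty <grade> and no <price>.
doc = minidom.parseString(
    "<book_store>"
    "<book><name>Hamlet</name><price>$20</price><grade>good</grade></book>"
    "<book><name>shuihu</name><grade></grade></book>"
    "</book_store>"
)
rows = [
    (get_text(book, "name"),
     get_text(book, "price", "n/a"),
     get_text(book, "grade", "n/a"))
    for book in doc.documentElement.getElementsByTagName("book")
]
print(rows)
```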
Next, as before, comes a so-called advanced version:
Python code
    '''
    Created on 2012-8-28

    @author: walfred
    @module: domxml.XMLParser
    @description:
    '''

    import sys
    import xml.dom.minidom as Dom


    class XMLParser:
        def __init__(self, xml_file_path):
            try:
                self.xml = Dom.parse(xml_file_path)
            except Exception as e:
                # Report why loading failed instead of exiting silently.
                print(e)
                sys.exit(1)
            self.book_list = list()

        def getNodeName(self, prev_node, node_name):
            # Returns the list of descendant elements with the given tag name.
            return prev_node.getElementsByTagName(node_name)

        def getNodeAttr(self, node, att_name):
            return node.getAttribute(att_name)

        def getNodeValue(self, node):
            # Text of the node's first child; Python 3 strings are already Unicode.
            return node.childNodes[0].data

        def parse(self):
            node_root = self.xml.documentElement
            print("store: %s, website: %s" % (
                self.getNodeAttr(node_root, "name"),
                self.getNodeAttr(node_root, "website")))

            node_book_list = self.getNodeName(node_root, "book")

            # Collect one dict per <book> element.
            for node_book in node_book_list:
                book_info = dict()
                node_book_name = self.getNodeName(node_book, "name")[0]
                book_name_value = self.getNodeValue(node_book_name)
                book_info["name"] = book_name_value

                node_book_author = self.getNodeName(node_book, "author")[0]
                book_author_value = self.getNodeValue(node_book_author)
                book_info["author"] = book_author_value

                node_book_price = self.getNodeName(node_book, "price")[0]
                book_price_value = self.getNodeValue(node_book_price)
                book_info["price"] = book_price_value

                node_book_grade = self.getNodeName(node_book, "grade")[0]
                book_grade_value = self.getNodeValue(node_book_grade)
                book_info["grade"] = book_grade_value

                self.book_list.append(book_info)

        def getBookList(self):
            return self.book_list


    if __name__ == "__main__":
        myXMLParser = XMLParser("book_store.xml")
        myXMLParser.parse()
        print(myXMLParser.getBookList())
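The four near-identical blocks in `parse` differ only in the tag name, so they could be collapsed into a loop. The sketch below, written for Python 3, is one possible restructuring under that assumption. The function name `parse_books`, the `FIELDS` tuple, and the inline sample are illustrative and do not come from the original class.

```python
import xml.dom.minidom as minidom

# Tag names to read from every <book>; illustrative constant.
FIELDS = ("name", "author", "price", "grade")


def parse_books(root):
    """Return one dict per <book> element, keyed by the tags in FIELDS."""
    book_list = []
    for book in root.getElementsByTagName("book"):
        info = {}
        for field in FIELDS:
            node = book.getElementsByTagName(field)[0]
            info[field] = node.childNodes[0].data
        book_list.append(info)
    return book_list


doc = minidom.parseString(
    "<book_store>"
    "<book><name>Hamlet</name><author>William Shakespeare</author>"
    "<price>$20</price><grade>good</grade></book>"
    "</book_store>"
)
book_list = parse_books(doc.documentElement)
print(book_list)
```

Adding a new field then only requires extending `FIELDS`, at the cost of assuming every book contains every listed tag with non-empty text.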
End
Notice: This article is licensed under the BY-NC-SA license. When reposting, please credit the source: Python: Parsing XML Files with DOM (Reading XML)