Parsing XML files with Python (the simple case)

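First of all, SAX parsing is the most intuitive approach. Because it is event-driven, it can also cope with an XML file that contains some errors: everything before the first error has already been delivered to the handler by the time the parser gives up.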
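To make the point about errors concrete, here is a minimal sketch, assuming Python 3 and the standard expat-based SAX parser. The handler name `TitleCollector` and the sample data `BROKEN_XML` are made up for this illustration. The second `<title>` is never closed, so parsing stops with an exception, but the title seen before the error should still be available.

```python
import xml.sax
import xml.sax.handler


class TitleCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.buffer = ""
        self.titles = []

    def startElement(self, name, attributes):
        if name == "title":
            self.in_title = True
            self.buffer = ""

    def characters(self, data):
        # characters() may be called several times for one text node,
        # so the pieces are accumulated rather than assigned.
        if self.in_title:
            self.buffer += data

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.titles.append(self.buffer)


# Made-up input: the second <title> is never closed, so this is not well-formed.
BROKEN_XML = b"""<catalog>
<book><title>Python &amp; XML</title></book>
<book><title>Programming Python</book>
</catalog>"""

handler = TitleCollector()
try:
    xml.sax.parseString(BROKEN_XML, handler)
except xml.sax.SAXParseException as exc:
    print("parsing stopped:", exc.getMessage())

# Events delivered before the error are kept.
print(handler.titles)
```

Note that this is not real error recovery: the parser still raises `SAXParseException` at the first well-formedness error, and nothing after that point is read.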

Start with an XML file, book.xml:

<catalog>
    <book isbn="0-596-00128-2">
        <title>Python &amp; XML</title>
        <author>Jones, Drake</author>
    </book>
    <book isbn="0-596-00085-5">
        <title>Programming Python</title>
        <author>Lutz</author>
    </book>
    <book isbn="0-596-00281-5">
        <title>Learning Python</title>
        <author>Lutz, Ascher</author>
    </book>
    <book isbn="0-596-00797-3">
        <title>Python Cookbook</title>
        <author>Martelli, Ravenscroft, Ascher</author>
    </book>
    <!-- imagine more entries here -->
</catalog>

Write a BookHandler as follows:

# -*- coding: utf-8 -*-

import xml.sax.handler


class BookHandler(xml.sax.handler.ContentHandler):
    """Handle XML parser events with a small state machine."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inTitle = False    # True while inside a <title> element
        self.mapping = {}       # maps ISBN -> title

    def startElement(self, name, attributes):
        if name == "book":                      # on start <book> tag
            self.buffer = ""                    # reset the title text
            self.isbn = attributes["isbn"]      # save ISBN for the dict key
        elif name == "title":                   # on start <title> tag
            self.inTitle = True                 # save title text to follow

    def characters(self, data):
        if self.inTitle:                        # on text within a tag
            self.buffer += data                 # save text if in title

    def endElement(self, name):
        if name == "title":                     # on end </title> tag
            self.inTitle = False
            self.mapping[self.isbn] = self.buffer   # store title text in dict

import xml.sax
import pprint

parser = xml.sax.make_parser()
handler = BookHandler()
parser.setContentHandler(handler)
parser.parse('book.xml')

pprint.pprint(handler.mapping)

The result is as follows:

{u'0-596-00085-5': u'Programming Python',
 u'0-596-00128-2': u'Python & XML',
 u'0-596-00281-5': u'Learning Python',
 u'0-596-00797-3': u'Python Cookbook'}

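This is a fairly simple case, though. Note also that every key and value comes back as a Unicode string: the u'...' prefix is how Python 2 prints them. In Python 3 every str is Unicode, so the prefix is not shown.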
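The Unicode point can be checked without a file on disk. The following is a minimal sketch, assuming Python 3; it parses an in-memory document with `xml.sax.parseString`. The `SAMPLE` data is made up, and its second entry (with the placeholder ISBN `hypothetical-1`) is invented only so that a non-ASCII title appears.

```python
import xml.sax
import xml.sax.handler


class BookHandler(xml.sax.handler.ContentHandler):
    """Same idea as the handler above: map each book's ISBN to its title."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.mapping = {}

    def startElement(self, name, attributes):
        if name == "book":
            self.buffer = ""
            self.isbn = attributes["isbn"]
        elif name == "title":
            self.in_title = True

    def characters(self, data):
        if self.in_title:
            self.buffer += data

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.mapping[self.isbn] = self.buffer


# Made-up sample; the second entry is invented to include a non-ASCII title.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
  <book isbn="0-596-00128-2"><title>Python &amp; XML</title></book>
  <book isbn="hypothetical-1"><title>Caf\u00e9 Python</title></book>
</catalog>""".encode("utf-8")

handler = BookHandler()
xml.sax.parseString(SAMPLE, handler)

# Both keys and values are decoded text (str), not raw bytes.
for isbn, title in sorted(handler.mapping.items()):
    print(isbn, type(title).__name__, title)
```

The parser decodes the UTF-8 input itself, so the handler only ever sees text. That is why no explicit decoding step appears anywhere in the handler.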

