Reading XML Files with Python

Reposted from: https://mediumcn.com/python3/how-to-read-xml

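Python ships with an xml package in its standard library, and XML files can be read through xml.dom. Take the following XML file as an example: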



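<!-- The root tag name "company" is an assumption; the code below does not depend on it. -->
<company>
	<name>mediumcn ltd</name>
	<staff id="1001">
		<nickname>Ben</nickname>
		<salary>30,000</salary>
	</staff>
	<staff id="1002">
		<nickname>Jim</nickname>
		<salary>30,000</salary>
	</staff>
	<staff id="1003">
		<nickname>Alen</nickname>
		<salary>40,000</salary>
	</staff>
</company>

It can be read with minidom as follows:
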
from xml.dom import minidom

# Parse the XML file into a DOM document
doc = minidom.parse("assets/test.xml")

# The first <name> element holds the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, nickname and salary of every <staff> element
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, nickname.firstChild.data, salary.firstChild.data))

Output:

mediumcn ltd
id:1001, nickname:Ben, salary:30,000
id:1002, nickname:Jim, salary:30,000
id:1003, nickname:Alen, salary:40,000
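
To try the example without creating assets/test.xml, the same data can be parsed from a string with minidom.parseString. The following is a minimal sketch: the root tag name "company", the constant XML_TEXT and the helper read_staff are assumptions made for illustration, while the element and attribute names are the ones the code above looks up.

```python
from xml.dom import minidom

# Assumed sample document: the root tag name "company" is hypothetical;
# name, staff, id, nickname and salary match the lookups in the article.
XML_TEXT = """<company>
    <name>mediumcn ltd</name>
    <staff id="1001">
        <nickname>Ben</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1002">
        <nickname>Jim</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1003">
        <nickname>Alen</nickname>
        <salary>40,000</salary>
    </staff>
</company>"""


def read_staff(xml_text):
    """Return (company_name, [(id, nickname, salary), ...]) from XML text."""
    doc = minidom.parseString(xml_text)
    company = doc.getElementsByTagName("name")[0].firstChild.data
    rows = []
    for staff in doc.getElementsByTagName("staff"):
        sid = staff.getAttribute("id")
        nickname = staff.getElementsByTagName("nickname")[0].firstChild.data
        salary = staff.getElementsByTagName("salary")[0].firstChild.data
        rows.append((sid, nickname, salary))
    return company, rows


company, rows = read_staff(XML_TEXT)
print(company)
for sid, nickname, salary in rows:
    print("id:%s, nickname:%s, salary:%s" % (sid, nickname, salary))
```

Returning plain tuples instead of printing inside the loop keeps the parsing step separate from the output, which makes the result easier to check or reuse.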

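This approach is not rigorous: it reads firstChild.data without checking that the first child is actually a text node. If an element is empty (firstChild is None) or its first child is another element, the access raises AttributeError. A more careful approach looks like this: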

def getNodeText(element):
    # Concatenate the data of every direct child that is a text node
    result = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            result.append(node.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Node Name : %s" % name.nodeName)
print("Node Value : %s \n" % getNodeText(name))


staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, getNodeText(nickname), getNodeText(salary)))

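The check if node.nodeType == node.TEXT_NODE: picks out only the text nodes among an element's children, so an empty element or a nested child element no longer causes errors or wrong data.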
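
The difference between the two approaches can be seen on input that is not perfectly regular. The sketch below uses a hypothetical fragment (the amount element and the " CNY" suffix are invented for illustration) with an empty nickname and a salary whose first child is another element. It restates getNodeText so that the block runs on its own.

```python
from xml.dom import minidom


def getNodeText(element):
    """Join the data of all direct text-node children of `element`."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# Hypothetical sample: <nickname> is empty, and the first child of
# <salary> is an <amount> element rather than a text node.
doc = minidom.parseString(
    "<staff><nickname></nickname>"
    "<salary><amount>30,000</amount> CNY</salary></staff>"
)
nickname = doc.getElementsByTagName("nickname")[0]
salary = doc.getElementsByTagName("salary")[0]

for element in (nickname, salary):
    try:
        # Direct access assumes the first child exists and is a text node
        print(repr(element.firstChild.data))
    except AttributeError as exc:
        print("firstChild.data failed for <%s>: %s" % (element.tagName, exc))
    # getNodeText only collects direct text children, so it never raises here
    print("getNodeText for <%s>: %r" % (element.tagName, getNodeText(element)))
```

Note that getNodeText only looks at direct children, so text nested inside amount is not included. Whether that is the desired behaviour depends on the document being read.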

