Reposted from: https://mediumcn.com/python3/how-to-read-xml
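Python's standard library includes the xml package, and xml.dom.minidom can parse an XML file into a DOM tree. For example, suppose assets/test.xml contains: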
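    <?xml version="1.0"?>
    <company>
        <name>mediumcn ltd</name>
        <staff id="1001">
            <nickname>Ben</nickname>
            <salary>30,000</salary>
        </staff>
        <staff id="1002">
            <nickname>Jim</nickname>
            <salary>30,000</salary>
        </staff>
        <staff id="1003">
            <nickname>Alen</nickname>
            <salary>40,000</salary>
        </staff>
    </company>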
    from xml.dom import minidom
    # Parse the file into a DOM Document
    doc = minidom.parse("assets/test.xml")
    # getElementsByTagName returns a NodeList; take the first <name> element
    name = doc.getElementsByTagName("name")[0]
    print(name.firstChild.data)
    # Walk every <staff> element, reading its id attribute and its child elements
    staffs = doc.getElementsByTagName("staff")
    for staff in staffs:
        sid = staff.getAttribute("id")
        nickname = staff.getElementsByTagName("nickname")[0]
        salary = staff.getElementsByTagName("salary")[0]
        print("id:%s, nickname:%s, salary:%s" %
              (sid, nickname.firstChild.data, salary.firstChild.data))
Output:
mediumcn ltd
id:1001, nickname:Ben, salary:30,000
id:1002, nickname:Jim, salary:30,000
id:1003, nickname:Alen, salary:40,000
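To try the example without creating assets/test.xml, the data can also be parsed from a string with minidom.parseString. This is a minimal sketch: the root element name `company` and the helper `read_staff` are assumptions for illustration, and the loop is the same as above except that the values are collected into a list so they can be checked.

```python
from xml.dom import minidom

# Sample data mirroring assets/test.xml; the root element name "company" is an assumption.
XML_TEXT = """<?xml version="1.0"?>
<company>
    <name>mediumcn ltd</name>
    <staff id="1001">
        <nickname>Ben</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1002">
        <nickname>Jim</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1003">
        <nickname>Alen</nickname>
        <salary>40,000</salary>
    </staff>
</company>"""


def read_staff(xml_text):
    """Hypothetical helper: return the company name and (id, nickname, salary) tuples."""
    doc = minidom.parseString(xml_text)
    name = doc.getElementsByTagName("name")[0].firstChild.data
    rows = []
    for staff in doc.getElementsByTagName("staff"):
        nickname = staff.getElementsByTagName("nickname")[0]
        salary = staff.getElementsByTagName("salary")[0]
        rows.append((staff.getAttribute("id"),
                     nickname.firstChild.data,
                     salary.firstChild.data))
    return name, rows


if __name__ == "__main__":
    company, staff_rows = read_staff(XML_TEXT)
    print(company)
    for sid, nickname, salary in staff_rows:
        print("id:%s, nickname:%s, salary:%s" % (sid, nickname, salary))
```

Running it prints the same four lines as the file-based version.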
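This approach is not robust: it reads firstChild.data without checking whether the first child is actually a text (leaf) node. If an element is empty, firstChild is None and the code raises an AttributeError; if the first child is a comment, the comment text is printed instead of the value. A more robust approach is as follows: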
    from xml.dom import minidom
    def getNodeText(node):
        # Concatenate only the direct children that are text nodes
        result = []
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                result.append(child.data)
        return ''.join(result)
    doc = minidom.parse("assets/test.xml")
    name = doc.getElementsByTagName("name")[0]
    print("Node Name : %s" % name.nodeName)
    print("Node Value : %s \n" % getNodeText(name))
    staffs = doc.getElementsByTagName("staff")
    for staff in staffs:
        sid = staff.getAttribute("id")
        nickname = staff.getElementsByTagName("nickname")[0]
        salary = staff.getElementsByTagName("salary")[0]
        print("id:%s, nickname:%s, salary:%s" %
              (sid, getNodeText(nickname), getNodeText(salary)))
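The check `nodeType == TEXT_NODE` keeps only the text (leaf) children of an element, so comments and nested elements are skipped and an empty element yields an empty string instead of an error. This avoids reading the wrong data.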
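The difference can be seen on a couple of edge cases. The sketch below uses a hypothetical `<staff>` fragment (not from the original file) whose nickname starts with a comment and whose salary is empty, and compares firstChild.data with the same getNodeText logic as above, written as a generator expression.

```python
from xml.dom import minidom


def getNodeText(node):
    """Concatenate the direct text children of node, ignoring other node types."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


# Hypothetical edge cases: a nickname preceded by a comment, and an empty salary.
doc = minidom.parseString(
    "<staff id='1004'>"
    "<nickname><!-- legacy --> Tom</nickname>"
    "<salary></salary>"
    "</staff>"
)
nickname = doc.getElementsByTagName("nickname")[0]
salary = doc.getElementsByTagName("salary")[0]

# firstChild is the comment node, so .data returns the comment text, not the name.
print(repr(nickname.firstChild.data))   # ' legacy '
print(repr(getNodeText(nickname)))      # ' Tom'

# An empty element has no children, so firstChild is None and .data would fail.
print(salary.firstChild)                # None
print(repr(getNodeText(salary)))        # ''
```

Note that getNodeText only looks at direct children: text inside a nested element (for example `<nickname><b>Tom</b></nickname>`) is not included.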