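The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. The module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see the "XML vulnerabilities" section of the Python documentation.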
import xml.etree.ElementTree as ET
tree = ET.parse(xml_path)
root = tree.getroot()
root
# result
<Element 'data' at 0x000002248A147A98>
The resulting root element has a tag and a dictionary of attributes:
print(root.tag)
print(root.attrib)
# result
data
{}
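The file behind xml_path is not shown, so here is a minimal sketch that parses an inline string instead. SAMPLE_XML is hypothetical data, shaped only to match the results printed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file behind xml_path is not shown.
SAMPLE_XML = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# fromstring() parses text directly and returns the root Element
# (parse() reads a file and returns an ElementTree instead).
root = ET.fromstring(SAMPLE_XML)

print(root.tag)     # data
print(root.attrib)  # {}
```

Parsing from a string is handy for quick experiments, since no file on disk is required.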
Iterate over root to inspect the tag and attrib of each child node:
for child in root:
    print(child.tag)
    print(child.attrib)
# result
country
{'name': 'Liechtenstein'}
country
{'name': 'Singapore'}
country
{'name': 'Panama'}
Child nodes are nested, so a specific node can be reached by index:
root[1][2].text
# result
'59900'
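To make the chained index easier to read, it can be split into steps. This is a rough sketch on hypothetical sample data; the element names year and gdppc are assumptions, not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names and values are assumptions.
SAMPLE_XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank><year>2008</year><gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank><year>2011</year><gdppc>59900</gdppc>
    </country>
</data>"""
root = ET.fromstring(SAMPLE_XML)

second_country = root[1]      # index 1 -> the second <country>
third_child = second_country[2]  # index 2 -> its third child element

print(len(root))                          # 2
print(third_child.tag, third_child.text)  # gdppc 59900
```

Elements behave like sequences of their direct children, so len() and indexing both work. Index access is brittle if the element order changes, which is why lookups by tag are usually preferred.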
Some Element methods walk the whole subtree below an element recursively, for example Element.iter():
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
# result
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
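The difference between plain iteration (direct children only) and iter() (the whole subtree) can be sketched as follows. The sample data is hypothetical and trimmed down.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample data.
SAMPLE_XML = """<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""
root = ET.fromstring(SAMPLE_XML)

# Iterating an element yields only its direct children.
direct_tags = [child.tag for child in root]

# iter() descends through all levels in document order.
neighbor_names = [n.get("name") for n in root.iter("neighbor")]

# findall() with a bare tag matches direct children only, so nothing is found here.
direct_neighbors = root.findall("neighbor")

print(direct_tags)       # ['country', 'country']
print(neighbor_names)    # ['Austria', 'Switzerland', 'Malaysia']
print(direct_neighbors)  # []
```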
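1. Element.findall(): finds all elements with the given tag that are direct children of the current element.
2. Element.find(): finds the first child element with the given tag.
3. Element.text: accesses the text content of an element.
4. Element.get(): accesses the value of an attribute.
for country in root.findall('country'):
    rank = country.find('rank').text
    name = country.get('name')
    print('rank:', rank)
    print('name:', name)
    print('\n')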
# result
rank: 1
name: Liechtenstein
rank: 4
name: Singapore
rank: 68
name: Panama
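One thing to watch: find() returns None when no child matches, so find(...).text raises AttributeError on incomplete data. A minimal sketch of a more defensive lookup, using findtext() and the default argument of get(); the country "Atlantis" and the attribute "code" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second country has no <rank> child.
root = ET.fromstring("""<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Atlantis"/>
</data>""")

results = []
for country in root.findall("country"):
    # findtext() returns the default instead of failing when <rank> is missing.
    rank_text = country.findtext("rank", default="n/a")
    # get() accepts a fallback for a missing attribute ("code" is hypothetical).
    code = country.get("code", "unknown")
    results.append((country.get("name"), rank_text, code))

print(results)
# [('Liechtenstein', '1', 'unknown'), ('Atlantis', 'n/a', 'unknown')]
```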
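ElementTree provides a simple way to build XML documents and write them to files.
1. ElementTree.write(): creates an XML file or writes data to an existing one.
2. Element.set(): adds or modifies an attribute and its value.
3. Element.append(): adds a child node.
Suppose we want to add 1 to each country's rank and add an updated attribute:
for rank in root.iter('rank'):
    new_rank = int(rank.text) + 1
    rank.text = str(new_rank)
    rank.set('updated', 'yes')
tree.write('output.xml')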
The result is written to output.xml: every rank is incremented by 1 and carries the attribute updated="yes".
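One way to inspect the effect without opening output.xml is to serialize the tree in memory with ET.tostring(). A minimal sketch on a hypothetical one-country document:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-country document, kept on one line so the output is easy to read.
root = ET.fromstring(
    '<data><country name="Panama"><rank>68</rank></country></data>'
)

for rank in root.iter("rank"):
    rank.text = str(int(rank.text) + 1)  # text is a string, so convert both ways
    rank.set("updated", "yes")

# encoding="unicode" returns a str instead of bytes.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
# <data><country name="Panama"><rank updated="yes">69</rank></country></data>
```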
4. Element.remove(): removes an element.
Remove every country whose rank is greater than 50:
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
The result is written to output.xml, which no longer contains countries ranked above 50.
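The loop above iterates over the list returned by findall() rather than over root itself. That matters because modifying an element while iterating over it directly can skip entries, just as with a Python list. A small sketch with made-up country names:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two countries above the threshold, one below.
root = ET.fromstring("""<data>
    <country name="A"><rank>60</rank></country>
    <country name="B"><rank>70</rank></country>
    <country name="C"><rank>5</rank></country>
</data>""")

# findall() returns a separate list, so removing from root while
# looping over that list is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)  # ['C']
```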
5. Creating an XML document
SubElement(): creates a new child node under a given parent.
a = ET.Element('a')
b = ET.SubElement(a, 'b')
c = ET.SubElement(a, 'c')
d = ET.SubElement(c, 'd')
ET.dump(a)
# result
<a><b /><c><d /></c></a>
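The same idea extends to elements that carry attributes and text. A minimal sketch; the tag names and values below are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Build a small tree from scratch; names and values are illustrative.
root = ET.Element("data")

# Keyword arguments to SubElement() become attributes of the new element.
country = ET.SubElement(root, "country", name="Panama")

rank = ET.SubElement(country, "rank")
rank.text = "68"  # element text must be a string

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# <data><country name="Panama"><rank>68</rank></country></data>
```

To write the result to a file instead, the root can be wrapped with ET.ElementTree(root) and saved with its write() method.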
6. Parsing XML with namespaces
If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.
<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
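The expansion can be seen directly by printing the parsed tags. A short sketch on a trimmed copy of the document above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the actors document, with one actor and one character.
xml_text = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""
root = ET.fromstring(xml_text)

# Non-prefixed tags pick up the default namespace URI.
print(root.tag)  # {http://people.example.com}actors

# Prefixed tags get the URI bound to their prefix.
child_tags = [child.tag for child in root[0]]
print(child_tags)
# ['{http://people.example.com}name', '{http://characters.example.com}character']

# A bare tag name no longer matches, because the real tag includes the URI.
print(root.findall("actor"))  # []
```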
One way to search this document is to write the expanded {uri}tag names by hand in every find() and findall() call:
# xml_text holds the XML document above
root = ET.fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)
A better way is to create a dictionary with your own prefixes and use those prefixes in the search functions:
ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)
Both approaches print the same output:
John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement