Learning Python: parsing XML with xml.etree.ElementTree



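ElementTree is a lightweight, DOM-style XML parser. Its advantages include fast parsing and low memory use.


At the core of ElementTree is the Element class, a data structure designed to store hierarchical (nested) tags.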
----------------------------------------------------------------------------------------------------------------------------
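1. First, the structure of what is being parsed. An XML element consists of:
a. tag: the tag name, a string
b. attributes: the tag's attributes, stored as a dictionary
c. text: the text value inside the tag
d. child elements: the nested child tags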
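
The four parts listed above map onto attributes of an Element object. Here is a minimal sketch, assuming a made-up one-element document (the name "Example" and the year are illustrative only):

```python
import xml.etree.ElementTree as ET

# A made-up document used only for illustration.
elem = ET.fromstring('<country name="Example">some text<year>2011</year></country>')

print(elem.tag)                  # a. the tag name, a str
print(elem.attrib)               # b. the attributes, a dict
print(elem.text)                 # c. the text that precedes the first child
print([c.tag for c in elem])     # d. iterating an Element yields its direct children
```

Iterating over an Element, or calling len() on it, only looks at direct children, not deeper descendants.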


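To create Element instances, use the Element constructor or the SubElement function. An ElementTree can contain many Elements; it can be serialized to XML and can also be built by parsing XML.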


ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree.


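Building an XML tree entirely by hand:
    import xml.etree.ElementTree as ET

    a = ET.Element('a')            # root element <a>
    b = ET.SubElement(a, 'b')      # <b> is a child of <a>
    c = ET.SubElement(a, 'c')      # <c> is a child of <a>
    d = ET.SubElement(c, 'd')      # <d> is a child of <c>
    ET.dump(a)                     # prints <a><b /><c><d /></c></a>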
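
A hand-built tree can also be serialized to a string and parsed back. The following is a small sketch of that round trip; the tag names, the id attribute and the text are made up for illustration:

```python
import xml.etree.ElementTree as ET

root = ET.Element('a')
child = ET.SubElement(root, 'b', {'id': '1'})   # attributes can be passed as a dict
child.text = 'hello'

# encoding='unicode' asks for a str instead of bytes
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)

# Parsing the string back gives an equivalent tree
parsed = ET.fromstring(xml_text)
print(parsed.find('b').text, parsed.find('b').get('id'))
```

ET.dump() is meant for debugging output on the console; ET.tostring() is the more convenient choice when the XML is needed as a value in the program.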
		


----------------------------------------------------------------------------------------------------------------------------
2. Steps for parsing XML:
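Take the following country.xml as an example (the sample document from the Python documentation):

    <?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>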


------------------------------------------------------------------------------------------------------------------------
import xml.etree.ElementTree as ET
1. Loading XML data
  Directly from an XML file:
    tree = ET.parse("country.xml")   # ElementTree: the whole XML tree
    root = tree.getroot()            # Element: the root node

  From an XML string, which returns the root node directly:
    root = ET.fromstring(country_as_string)   # country_as_string is a str holding the XML
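
The two loading routes differ in what they return: ET.parse gives an ElementTree that wraps the root, while ET.fromstring gives the root Element itself. A minimal sketch, assuming a tiny made-up document that is written to a temporary file so the example is self-contained:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
xml_text = '<data><country name="A"/><country name="B"/></data>'

# Route 1: from a file. ET.parse returns an ElementTree; getroot() gives the root Element.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(xml_text)
    tree = ET.parse(path)
    root_from_file = tree.getroot()

# Route 2: from a string. ET.fromstring returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__, root_from_file.tag, root_from_string.tag)
```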
------------------------------------------------------------------------------------------------------------------------
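2. Finding data
The lookup methods are Element.iter('tag'), Element.findall('tag') and Element.find('tag'):
iter(): searches recursively. It visits the current node, its children, their children, and so on.
findall(): with a plain tag name, searches only the direct children of the current node and returns every match as a list.
find(): returns only the first matching direct child, or None if there is no match. On the result you can call get('attribute_name') to read an attribute value.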
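
The difference between the three methods shows up as soon as a matching tag is nested more than one level deep. The sketch below uses a made-up document in which one country element sits inside another:

```python
import xml.etree.ElementTree as ET

# Made-up data: "A-inner" is nested two levels below the root.
xml_text = """
<data>
    <country name="A">
        <year>2008</year>
        <region><country name="A-inner"/></region>
    </country>
    <country name="B"><year>2011</year></country>
</data>
"""
root = ET.fromstring(xml_text)

# iter() walks the whole subtree in document order
iter_names = [c.get('name') for c in root.iter('country')]
# findall() with a plain tag name only sees direct children
findall_names = [c.get('name') for c in root.findall('country')]
# find() returns the first direct child that matches, or None
first = root.find('country')
missing = root.find('no-such-tag')

print(iter_names)      # ['A', 'A-inner', 'B']
print(findall_names)   # ['A', 'B']
print(first.get('name'), missing)
```

Because find() can return None, checking the result before calling .text or .get() on it avoids an AttributeError on documents that lack the expected tag.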

Example:
    #!/usr/bin/env python3

    __author__ = 'JackZhous'

    import sys
    import xml.etree.ElementTree as ET


    def script(xml_path, mode):
        """Print each country's name and year; mode '1' uses iter(), anything else findall()."""
        tree = ET.parse(xml_path)
        node_root = tree.getroot()
        iter_mode = '1'
        if mode == iter_mode:
            # iter() walks the whole subtree
            nodes = node_root.iter('country')
        else:
            # findall() looks at direct children only
            nodes = node_root.findall('country')
        for node in nodes:
            name = node.get('name')
            year = node.find('year').text
            print('name =', name, 'year =', year)


    if __name__ == '__main__':
        print("Script name:", sys.argv[0])
        print("Argument 1:", sys.argv[1])
        print("Argument 2:", sys.argv[2])
        script(sys.argv[1], sys.argv[2])





------------------------------------------------------------------------------------------------------------------------
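3. Modifying XML data
After locating the data you are interested in as in the previous step, you can change an element's text (assign to element.text), add or change an attribute (set('attribute', 'value')), or delete a child node (parent.remove(element)). The last step is to write the result to a file with ElementTree.write('country.xml').
    if name == 'Jackzhous':
        node1.remove(node)   # node1 must be the direct parent of node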
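
The three kinds of modification and the final write can be sketched together as follows. The data is made up, and the tree is written to an in-memory buffer instead of country.xml so that the example leaves no file behind:

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
root = ET.fromstring(
    '<data>'
    '<country name="A"><year>2008</year></country>'
    '<country name="B"><year>2011</year></country>'
    '</data>'
)

# findall() returns a list, so removing children inside this loop is safe
for country in root.findall('country'):
    year = country.find('year')
    year.text = str(int(year.text) + 1)   # change the text
    year.set('updated', 'yes')            # add or change an attribute
    if country.get('name') == 'B':
        root.remove(country)              # remove() is called on the parent

# Wrap the root in an ElementTree to use write(); a file name works here too
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer)
result = buffer.getvalue().decode('ascii')
print(result)
```

Removing elements while looping over root.iter(...) or over the element itself can skip items, which is why the loop above iterates over the list returned by findall().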
------------------------------------------------------------------------------------------------------------------------
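4. Handling XML that uses namespaces. For example, an Android manifest declares xmlns:android="http://schemas.android.com/apk/res/android"
A namespace groups a set of tag and attribute names so that they do not clash with identically named ones defined elsewhere.
In code, represent the namespace with either a dictionary or a string, e.g. dictionary = {'android': 'http://schemas.android.com/apk/res/android'}, or android_name = 'http://schemas.android.com/apk/res/android'
When searching, the former is used as find('android:name', dictionary); the latter is written into the name directly, as find('{' + android_name + '}name'). Both forms look up child elements. To read a namespaced attribute, use get('{' + android_name + '}name').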
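
The dictionary form and the string form resolve to the same fully qualified name. Here is a minimal sketch, assuming a made-up namespace URI (http://example.com/x) and a made-up document, since elements in an Android manifest are not themselves namespaced:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and document, for illustration only.
X_NS = 'http://example.com/x'
doc = '<root xmlns:x="http://example.com/x"><x:item x:id="7">v</x:item></root>'
root = ET.fromstring(doc)

# Form 1: prefix plus a prefix-to-URI dictionary
ns = {'x': X_NS}
by_prefix = root.find('x:item', ns)

# Form 2: the URI written into the name in {uri}tag notation
by_uri = root.find('{' + X_NS + '}item')

# Namespaced attributes are read with the {uri}name notation
item_id = by_uri.get('{' + X_NS + '}id')

print(by_prefix is by_uri, by_uri.tag, item_id)
```

The prefix used in the dictionary does not have to match the prefix used in the document; only the URI matters.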

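Searching on a namespaced attribute needs a special notation, as follows:
android_name = 'http://schemas.android.com/apk/res/android'
To find the activity whose name attribute in this namespace is "a.b.activity", use:
tree.find("./application/activity[@{" + android_name + "}name='a.b.activity']")
  . stands for the current node; application/activity descends through those two elements in turn; the [] holds the condition the element must match.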
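
The expression above can be tried without a real project. This sketch assumes a stripped-down, made-up manifest held in a string; a real AndroidManifest.xml has many more elements and attributes:

```python
import xml.etree.ElementTree as ET

android_name = 'http://schemas.android.com/apk/res/android'

# A minimal, made-up manifest used only for illustration.
manifest = (
    '<manifest xmlns:android="http://schemas.android.com/apk/res/android">'
    '<application>'
    '<activity android:name="a.b.activity"/>'
    '<activity android:name="a.b.other"/>'
    '</application>'
    '</manifest>'
)
root = ET.fromstring(manifest)

# [@{uri}name='value'] matches on a namespaced attribute
path = "./application/activity[@{%s}name='a.b.activity']" % android_name
hit = root.find(path)
miss = root.find("./application/activity[@{%s}name='not.there']" % android_name)

print(hit.get('{%s}name' % android_name), miss)
```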
  
  If the expression above is unclear, here is the supported path syntax:
  Syntax             Meaning
  tag                Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.
  *                  Selects all child elements. For example, */egg selects all grandchildren named egg.
  .                  Selects the current node. This is mostly useful at the beginning of the path, to indicate that it’s a relative path.
  //                 Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.
  ..                 Selects the parent element.
  [@attrib]          Selects all elements that have the given attribute.
  [@attrib='value']  Selects all elements for which the given attribute has the given value. The value cannot contain quotes.
  [tag]              Selects all elements that have a child named tag. Only immediate children are supported.
  [tag='text']       Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.
  [position]         Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).
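
Several of these forms are easier to read next to a concrete result. The sketch below runs a few of them against a small made-up document; the names A, B, C, X and Y are placeholders:

```python
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
root = ET.fromstring(
    '<data>'
    '<country name="A"><year>2008</year><neighbor name="X"/></country>'
    '<country name="B"><year>2011</year></country>'
    '<country name="C"><year>2011</year><neighbor name="Y"/></country>'
    '</data>'
)

def names(elements):
    """Collect the name attribute of each element in a list."""
    return [e.get('name') for e in elements]

by_attr = names(root.findall("country[@name='B']"))     # [@attrib='value']
with_child = names(root.findall("country[neighbor]"))   # [tag]
by_text = names(root.findall("country[year='2011']"))   # [tag='text']
second = names(root.findall("country[2]"))              # [position], 1-based
last = names(root.findall("country[last()]"))           # [position] with last()
deep = names(root.findall(".//neighbor"))               # // on all levels
parents = names(root.findall(".//neighbor/.."))         # .. selects the parent

print(by_attr, with_child, by_text, second, last, deep, parents)
```

These paths are only a subset of XPath; anything outside the forms listed above is not guaranteed to be accepted by ElementTree.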


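A for-loop example, using an Android manifest file:

Finding the name of the main activity

    import xml.etree.ElementTree as ET

    android = 'http://schemas.android.com/apk/res/android'
    path = 'AndroidManifest.xml'   # assumed location of the manifest

    ET.register_namespace('android', android)   # keeps the android: prefix if the tree is written back out
    tree = ET.parse(path)
    root = tree.find('application')
    for activity in root.findall('activity'):
        target = activity.find("./intent-filter/action[@{" + android + "}name='android.intent.action.MAIN']")
        if target is None:
            print('activity has no MAIN intent-filter action')
            continue
        main_activity = activity.get("{%s}name" % android)
        print('got the main activity ' + main_activity)
        break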


Note: for details, see https://docs.python.org/2/library/xml.etree.elementtree.html?highlight=elementtree
