Parsing XML Files in Python: The ElementTree Node Attribute Ordering Problem

The Node Attribute Ordering Problem in Python ElementTree

    • Problem summary
    • Software environment
    • Custom ordering of node attributes
    • Closing notes

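Problem summary

When you process an XML file with Python's xml.etree.ElementTree library, the attributes of each node are reordered by default when the file is written out. Attribute order carries no meaning in XML, but losing it can hurt readability considerably. Take the RF calibration configuration file of a Qualcomm-platform small cell as an example: the author wants the XML file laid out as shown below, so that the calibration settings for the Tx, Rx, NL and other RF chains are easy to locate.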


<CalConfigDefinitions>
	<BoardConfigs>
		<BoardDef name="F02001-2" configs="F02001-2_LTE20" />
	</BoardConfigs>
	<CalConfigs>
		<CalConfig name="F02001-2_LTE5" topology="407">
			<TxConfig txPathIdList="TX1,TX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" enableFb="false" txRefFreq="2590" listOfTxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" firstGainState="40" firstGainStatePowerLimitHigh="10" firstGainStatePowerLimitLow="-45" highestAllowedGainState="0" targetMaxPower="17" txDcLeakageLimit="-40" txIqImageLimit="-40" maxFreqSweepPeakToPeakDelta="10" rxFbDcLeakageLimitDbfs="-34" rxFbIqImageLimitDbc="-40" rxfbSignalDbfsMin="-20" rxfbSignalDbfsMax="0" txFreqSweepTargetReferencePowerDbm="17" txDigitalGainMaxDb="-3" txHighDcLeakageLimit="-43" txLowCutoffGainState="30" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" txMonotonicCheckThresholdDb="-3" />
			<RxFb1AclrNoiseCal rxfbPathIdList="FB1,FB2" bwConst="LTE5" band="B41" fg="2" freq="2590" rxfbMaxGainState="16" />
			<RxConfig rxPathIdList="RX1,RX2" antennaNum="1,2" bwConst="LTE5" band="B41" fg="2" rxRefFreqMhz="2590" listOfRxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" rxGainStateList="0,1,2,3,4,5,6,7" rxSigGenPowersForGainState="-66,-59,-56,-51,-46,-41,-31,-21" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" />
			<RxConfig rxPathIdList="NL1" antennaNum="3" bwConst="LTE20" band="B25" fg="3" rxRefFreqMhz="1960" listOfRxSweepFreqMhz="1930,1940,1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995" rxGainStateList="0,1,2,3,4,5" rxSigGenPowersForGainState="-65,-57,-51,-40,-29,-18" rxDcLeakageLimitDbfs="-40" rxIqImageLimitDbc="-40" maxFreqSweepPeakToPeakDelta="10" rxSignalDbfsMin="-40" rxSignalDbfsMax="-20" />
		</CalConfig>
	</CalConfigs>
</CalConfigDefinitions>

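The author wanted to generate this XML file automatically, but when it was written with Python's xml.etree.ElementTree library, the node attributes came out in the order shown below (by default they are re-sorted alphabetically), which looks cluttered and makes entries hard to find. The rest of this post fixes the attribute reordering. Readers need not worry about what this XML file actually means; it serves only as an example.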


<CalConfigDefinitions>
	<BoardConfigs>
		<BoardDef configs="F02001-2_LTE20" name="F02001-2" />
	</BoardConfigs>
	<CalConfigs>
		<CalConfig name="F02001-2_LTE5" topology="407">
			<TxConfig antennaNum="1,2" band="B41" bwConst="LTE5" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" enableFb="false" fg="2" firstGainState="40" firstGainStatePowerLimitHigh="10" firstGainStatePowerLimitLow="-45" highestAllowedGainState="0" listOfTxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" maxFreqSweepPeakToPeakDelta="10" rxFbDcLeakageLimitDbfs="-34" rxFbIqImageLimitDbc="-40" rxfbSignalDbfsMax="0" rxfbSignalDbfsMin="-20" targetMaxPower="17" txDcLeakageLimit="-40" txDigitalGainMaxDb="-3" txFreqSweepTargetReferencePowerDbm="17" txHighDcLeakageLimit="-43" txIqImageLimit="-40" txLowCutoffGainState="30" txMonotonicCheckThresholdDb="-3" txPathIdList="TX1,TX2" txRefFreq="2590" />
			<RxFb1AclrNoiseCal band="B41" bwConst="LTE5" fg="2" freq="2590" rxfbMaxGainState="16" rxfbPathIdList="FB1,FB2" />
			<RxConfig antennaNum="1,2" band="B41" bwConst="LTE5" calDataAlsoAppliesTo="LTE10,LTE15,LTE20" fg="2" listOfRxSweepFreqMhz="2498.5,2510.5,2520.5,2530.5,2540.5,2560.5,2580.5,2600.5,2620.5,2640.5,2660.5,2670.5,2680.5,2687.5" maxFreqSweepPeakToPeakDelta="10" rxDcLeakageLimitDbfs="-40" rxGainStateList="0,1,2,3,4,5,6,7" rxIqImageLimitDbc="-40" rxPathIdList="RX1,RX2" rxRefFreqMhz="2590" rxSigGenPowersForGainState="-66,-59,-56,-51,-46,-41,-31,-21" rxSignalDbfsMax="-20" rxSignalDbfsMin="-40" />
			<RxConfig antennaNum="3" band="B25" bwConst="LTE20" fg="3" listOfRxSweepFreqMhz="1930,1940,1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995" maxFreqSweepPeakToPeakDelta="10" rxDcLeakageLimitDbfs="-40" rxGainStateList="0,1,2,3,4,5" rxIqImageLimitDbc="-40" rxPathIdList="NL1" rxRefFreqMhz="1960" rxSigGenPowersForGainState="-65,-57,-51,-40,-29,-18" rxSignalDbfsMax="-20" rxSignalDbfsMin="-40" />
		</CalConfig>
	</CalConfigs>
</CalConfigDefinitions>

Software environment

Python version: Python 3.7
XML library: xml.etree.ElementTree
IDE: PyCharm

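Custom ordering of node attributes

The author generates the XML file with the write method of the ElementTree class. The attributes of the nodes in the generated tree are in the custom order, yet the order changes once the tree is written to a file, which suggested that attributes are sorted by default during writing.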
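To see where the order is lost, the following minimal sketch builds one node from the example file with its attributes in a chosen order, then compares the order held in memory with the order that comes back after write. The names in_memory_order and written_order are illustrative only. On Python 3.7, the version used in this post, the two lists are expected to differ. From Python 3.8 onward the standard library's write and tostring preserve the attribute order specified by the user, so on newer interpreters the two lists should match and no patch is needed.

```python
import io
import xml.etree.ElementTree as ET

# Build a small tree; the attribute dict is given in the desired order.
root = ET.Element("CalConfigs")
cal = ET.SubElement(root, "CalConfig", {"name": "F02001-2_LTE5", "topology": "407"})
rx_fb = ET.SubElement(cal, "RxFb1AclrNoiseCal", {
    "rxfbPathIdList": "FB1,FB2",
    "bwConst": "LTE5",
    "band": "B41",
    "fg": "2",
    "freq": "2590",
    "rxfbMaxGainState": "16",
})

# Order as stored in the tree (a plain dict, so insertion order).
in_memory_order = list(rx_fb.attrib)

# Serialize with ElementTree.write, then read the result back to see
# the order that actually ended up in the written XML.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8")
reparsed = ET.fromstring(buffer.getvalue())
written_order = list(reparsed.find("CalConfig/RxFb1AclrNoiseCal").attrib)

print("in memory:", in_memory_order)
print("written:  ", written_order)
print("order preserved by write:", in_memory_order == written_order)
```

If the last line prints False, the reordering happens during serialization rather than while the tree is being built, which is what the rest of this section addresses.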

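A look at the ElementTree.py source shows that the _serialize_xml function does indeed sort the attributes, as in the code block below. (The code below is not the verbatim _serialize_xml source: it adds some indentation handling for the XML output on top of the original. Readers can ignore that part; those interested can find the author's other blog post on XML indentation in the closing notes.) At the commented line, sorted puts the node attributes in lexical order, which destroys the custom order.

Only this one line needs to change. Replace
for k, v in sorted(items):
with
for k, v in items:
and the node attributes keep their custom order.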

def _serialize_xml(write, elem, qnames, namespaces,
                   short_empty_elements, level=1, **kwargs):
    tag = elem.tag
    text = elem.text
    if tag is Comment:
        write("<!--%s-->" % text)
    elif tag is ProcessingInstruction:
        write("<?%s?>" % text)
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(_escape_cdata(text))
            for e in elem:
                _serialize_xml(write, e, qnames, None,
                               short_empty_elements=short_empty_elements)
        else:
            write("<" + tag)
            items = list(elem.items())
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  		
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k,
                            _escape_attrib(v)
                            ))
                for k, v in sorted(items):  # ##### replace this line with:  for k, v in items:
                    if isinstance(k, QName):
                        k = k.text
                    if isinstance(v, QName):
                        v = qnames[v.text]
                    else:
                        v = _escape_attrib(v)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem) or not short_empty_elements:
                write(">\n" + level * "\t")
                if text:
                    write(_escape_cdata(text))
                for e in elem:
                    _serialize_xml(write, e, qnames, None,
                                   short_empty_elements=short_empty_elements,
                                   level=level + 1)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(_escape_cdata(elem.tail))
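Editing the installed ElementTree.py changes behavior for every program that uses that interpreter, and the change is lost when Python is reinstalled or upgraded. As a possible alternative, here is a minimal sketch of a standalone serializer that walks the tree and writes attributes in their stored order. The function name serialize_in_order is hypothetical and not part of ElementTree. The sketch assumes a simple document like the one in this post: it ignores namespaces, comments, processing instructions and tail text, and it strips surrounding whitespace from element text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def serialize_in_order(elem, level=0, indent="\t"):
    """Return elem as indented XML text, keeping attribute insertion order.

    Hypothetical helper for illustration; it does not handle namespaces,
    comments, processing instructions or tail text.
    """
    pad = indent * level
    # elem.attrib is a dict, so iterating it yields attributes in the
    # order they were inserted; quoteattr escapes and quotes each value.
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value)) for name, value in elem.attrib.items()
    )
    children = list(elem)
    text = (elem.text or "").strip()

    if not children and not text:
        return "%s<%s%s />\n" % (pad, elem.tag, attrs)
    if not children:
        return "%s<%s%s>%s</%s>\n" % (pad, elem.tag, attrs, escape(text), elem.tag)

    parts = ["%s<%s%s>\n" % (pad, elem.tag, attrs)]
    if text:
        parts.append("%s%s\n" % (pad + indent, escape(text)))
    for child in children:
        parts.append(serialize_in_order(child, level + 1, indent))
    parts.append("%s</%s>\n" % (pad, elem.tag))
    return "".join(parts)


# Usage: build part of the example file and print it.
root = ET.Element("CalConfigDefinitions")
boards = ET.SubElement(root, "BoardConfigs")
ET.SubElement(boards, "BoardDef", {"name": "F02001-2", "configs": "F02001-2_LTE20"})
print(serialize_in_order(root), end="")
```

The returned string can be written to a file with an ordinary open(...).write(...) call. The trade-off is that this helper covers far fewer XML features than the library's own serializer, so it suits only simple configuration files of this kind.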

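Closing notes

  1. The ElementTree.py code quoted in this post is not the original source; it is the code after some indentation handling was added. The XML produced by the original source has no indentation or line breaks at all, so readers need not dwell on this difference. Those interested can read the author's other blog post:
    Parsing XML Files in Python: Line Breaks and Indentation with ElementTree

  2. This document is only a record of learning Python and a basis for discussion; the author would be honored if it helps readers.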
