Sample data:
<Result>
<weibo id="1">
<sentence id="1" opinionated="N">I am a sentence</sentence>
<sentence id="2" opinionated="N">I am a sentence</sentence>
<sentence id="3" opinionated="Y" polarity="NEG">I am also a sentence</sentence>
</weibo>
<weibo id="5">
<sentence id="1" opinionated="Y" polarity="NEG">Sentence sentence</sentence>
<sentence id="2" opinionated="N">Still a sentence</sentence>
<sentence id="3" opinionated="Y" polarity="POS">The last sentence</sentence>
</weibo>
</Result>
Python code:
import xml.etree.ElementTree as et  # XML parser (cElementTree was removed in Python 3.9)
import pandas as pd

##### Read the XML file into the DataFrame df_xml
xml_tree = et.ElementTree(file='xxx.xml')  # path to the XML file
dfcols = ['sentence', 'opinionated', 'polarity']
rows = []  # one [sentence, opinionated, polarity] entry per <sentence>
root = xml_tree.getroot()
for sub_node in root:  # each <weibo> element
    for node in sub_node:  # each <sentence> element
        # print(node, node.tag, node.attrib, node.text)
        sentence = node.text
        opinionated = node.attrib.get('opinionated')
        polarity = node.attrib.get('polarity')  # None when the attribute is absent
        rows.append([sentence, opinionated, polarity])
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
df_xml = pd.DataFrame(rows, columns=dfcols)
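
The same parsing logic can be tried without a file on disk. The following is a minimal sketch, not the original author's code: the helper name `sentences_to_frame`, the `SAMPLE_XML` string, and the extra `weibo_id` / `sentence_id` columns are assumptions added for illustration. It parses the XML from a string with `et.fromstring` and collects one dict per `<sentence>` before building the DataFrame.

```python
import xml.etree.ElementTree as et
import pandas as pd

# Shortened, hypothetical sample in the same shape as the data above.
SAMPLE_XML = """<Result>
<weibo id="1">
<sentence id="1" opinionated="N">I am a sentence</sentence>
<sentence id="2" opinionated="Y" polarity="NEG">I am also a sentence</sentence>
</weibo>
<weibo id="5">
<sentence id="1" opinionated="Y" polarity="POS">The last sentence</sentence>
</weibo>
</Result>"""

COLUMNS = ["weibo_id", "sentence_id", "sentence", "opinionated", "polarity"]


def sentences_to_frame(xml_text):
    """Hypothetical helper: flatten every <sentence> into one DataFrame row."""
    root = et.fromstring(xml_text)
    rows = [
        {
            "weibo_id": weibo.get("id"),
            "sentence_id": node.get("id"),
            "sentence": node.text,
            "opinionated": node.get("opinionated"),
            "polarity": node.get("polarity"),  # None when the attribute is absent
        }
        for weibo in root.iter("weibo")
        for node in weibo.iter("sentence")
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


df = sentences_to_frame(SAMPLE_XML)

# Keep only the sentences marked as opinionated.
opinionated_only = df[df["opinionated"] == "Y"]
print(opinionated_only)
```

Keeping the weibo and sentence ids makes it possible to trace each row back to its position in the XML, which the three-column version above does not allow.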