XML stands for eXtensible Markup Language. It was designed to store and transport small to medium amounts of data and is widely used for sharing structured information.
Python enables you to parse and modify XML documents. In order to parse XML document, you need to have the entire XML document in memory. In this tutorial, we will see how we can use XML minidom class in Python to load and parse XML files.
We have created a sample XML file that we are going to parse.
Inside the file, we can see the first name, last name, home, and the area of expertise (SQL, Python, Testing and Business)
Once we have parsed the document, we will print out the “node name” of the root of the document and the “firstchild tagname”. Tagname and nodename are the standard properties of the XML file.
Note: Nodename and child tagname are the standard names or properties of an XML dom.
Next, We can also call the list of XML tags from the XML document and printed out. Here we printed out the set of skills like SQL, Python, Testing and Business.
We can create a new attribute by using the “createElement” function and then append this new attribute or tag to the existing XML tags. We added a new tag “BigData” in our XML file.
import xml.dom.minidom
def main():
# use the parse() function to load and parse an XML file
doc = xml.dom.minidom.parse("Myxml.xml");
# print out the document node and the name of the first child tag
print doc.nodeName
print doc.firstChild.tagName
# get a list of XML tags from the document and print each one
expertise = doc.getElementsByTagName("expertise")
print "%d expertise:" % expertise.length
for skill in expertise:
print skill.getAttribute("name")
#Write a new XML tag and add it into the document
newexpertise = doc.createElement("expertise")
newexpertise.setAttribute("name", "BigData")
doc.firstChild.appendChild(newexpertise)
print " "
expertise = doc.getElementsByTagName("expertise")
print "%d expertise:" % expertise.length
for skill in expertise:
print skill.getAttribute("name")
if name == "__main__":
main();
import xml.dom.minidom
def main():
# use the parse() function to load and parse an XML file
doc = xml.dom.minidom.parse("Myxml.xml");
# print out the document node and the name of the first child tag
print (doc.nodeName)
print (doc.firstChild.tagName)
# get a list of XML tags from the document and print each one
expertise = doc.getElementsByTagName("expertise")
print ("%d expertise:" % expertise.length)
for skill in expertise:
print (skill.getAttribute("name"))
# Write a new XML tag and add it into the document
newexpertise = doc.createElement("expertise")
newexpertise.setAttribute("name", "BigData")
doc.firstChild.appendChild(newexpertise)
print (" ")
expertise = doc.getElementsByTagName("expertise")
print ("%d expertise:" % expertise.length)
for skill in expertise:
print (skill.getAttribute("name"))
if __name__ == "__main__":
main();
ElementTree is an API for manipulating XML. ElementTree is the easy way to process XML files.
We are using the following XML document as the sample data:
<data>
<items>
<item name="expertise1">SQLitem>
<item name="expertise2">Pythonitem>
items>
data>
we must first import the xml.etree.ElementTree module.
import xml.etree.ElementTree as ET
Now let’s fetch the root element:
root = tree.getroot()
Following is the complete code for reading above xml data
import xml.etree.ElementTree as ET
tree = ET.parse('items.xml')
root = tree.getroot()
# all items data
print('Expertise Data:')
for elem in root:
for subelem in elem:
print(subelem.text)
output:
Expertise Data:
SQL
Python
Python enables you to parse the entire XML document at one go and not just one line at a time. In order to parse XML document you need to have the entire document in memory.