Introducing AXIOM: The Axis Object Model

Introduction

XML has become one of the major technologies used today for business integration software evolution. Many object models are in use today for manipulating XML. AXIOM aims to improve XML manipulation by providing a new lightweight object model built around pull parsing, enabling efficient and easy manipulation of XML. AXIOM is the object model for Apache Axis2, the next generation of the Apache web services engine. It differs from existing XML object models in various ways, the major one being that it incrementally builds the in-memory model of an incoming XML source.

AXIOM itself does not contain a parser and it depends on StAX for input and output.

This tutorial will first show you how to obtain AXIOM and it will then go through the fundamental features of the AXIOM architecture. You will learn how to create XML documents from scratch, using elements, attributes, element content ("texts"), and namespaces. You will see how to read and write XML files from and to disk.

Installing AXIOM

AXIOM comes bundled with the Axis2 M1 release. The lib directory contains the axis-om-m1.jar file. However, more adventurous users can download the latest source, via Subversion, from the Apache Axis2 project and build the sources using Maven. AXIOM is maintained under the xml module of Apache Axis2. One can find more information at the Axis2 Subversion site.

AXIOM Architecture

AXIOM uses StAX reader and writer interfaces to interact with the external world, as shown in Figure 1. However, you can still use SAX and DOM to interact with AXIOM. Use of the standard StAX interfaces will enable AXIOM to interact with any kind of input source, be it an input stream, file, standard data binding tool, etc.


Figure 1. AXIOM interaction

Now let's take a deeper look at AXIOM architecture.

AXIOM uses a "builder" that builds the XML object model in memory from the events pulled from the underlying StAX parser, but it does not create the entire object model at once. Instead, it builds a part of the model only when the relevant information is actually required. This builder concept is the key to AXIOM's most promising feature: deferred building. The builder comes into the picture when you are building an object model from an existing resource. If you build the object model programmatically, you don't need a builder.

The builder can optionally hand the events generated by the StAX parser directly to the user, either building the object model along the way or not. Whether the model is built while the events are delivered is called caching in AXIOM. This enables one to work at the event level, minimizing the memory requirement, or to work with the object model, improving performance.

If one opts to set the cache on (i.e., to build the object model while pulling events), one can later retrieve the infoset through the AXIOM API. At any particular time, the XML object model will be either partially built or fully built. This concept is new to the XML processing world. The AXIOM builder builds the object model only to the extent required by the user, rather than building the whole model at once. For example, take the following XML fragment.

<Employees>
  <Employee>
    <Name>Eran Chinthaka</Name>
    <Project>Axis2</Project>
    <WorkPlace>Ambalangoda, Sri Lanka</WorkPlace>
  </Employee>
  <Employee>
    <Name>Ajith Harshana</Name>
    <Project>Axis2</Project>
    <WorkPlace>Kuliyapitiya, Sri Lanka</WorkPlace>
  </Employee>
</Employees>

Say the user wants to get the project of the first employee. This makes the builder build the object structure only up to the fourth line of the XML fragment; the rest is left untouched in the stream. Then, if the user wants to know the project of the second employee, the builder builds only up to line 9. All of this happens transparently to the user, who simply gets better performance.
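The same pull-on-demand behavior can be observed with a plain StAX reader, without AXIOM. The following is a minimal sketch, not AXIOM's builder: the class name DeferredReadSketch and the helper firstProject() are made up for illustration, and the employee fragment is inlined as a string. It pulls events only until the first Project element has been read, and then shows that the reader is still positioned inside the first Employee.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; only the JDK's StAX API is used.
class DeferredReadSketch {

    static final String EMPLOYEES =
        "<Employees>"
        + "<Employee><Name>Eran Chinthaka</Name><Project>Axis2</Project>"
        + "<WorkPlace>Ambalangoda, Sri Lanka</WorkPlace></Employee>"
        + "<Employee><Name>Ajith Harshana</Name><Project>Axis2</Project>"
        + "<WorkPlace>Kuliyapitiya, Sri Lanka</WorkPlace></Employee>"
        + "</Employees>";

    /** Pulls events only until the first Project element has been read. */
    static String firstProject(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "Project".equals(reader.getLocalName())) {
                // Reads the text content and stops at the matching end tag.
                return reader.getElementText();
            }
        }
        return null; // no Project element found
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(EMPLOYEES));
        System.out.println(firstProject(reader));

        // No event beyond the first </Project> has been delivered yet,
        // so the next tag reported is the WorkPlace of the same employee.
        reader.nextTag();
        System.out.println(reader.getLocalName());
    }
}
```

Running main prints Axis2 and then WorkPlace. Note that a parser may buffer input internally, so "untouched" here refers to the events that have been delivered, not necessarily to the bytes read from the source.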

The relationship of the builder to the XML data and the object model is shown in Figure 2.


Figure 2. AXIOM architecture

One of the most interesting things about AXIOM is that the model discussed so far does not depend on any particular programming language. The AXIOM architecture can therefore be implemented in any programming language that has an implementation of StAX. Moreover, the concept does not prescribe the memory representation of the object model. The Axis2 project contains an implementation of the concept with a linked list object model, which has proven to be lightweight, fast, and efficient compared to other object models. A parallel effort implemented the concept using a table model; it is now in the scratch area of the Apache Axis2 project. Even though the current major implementation of AXIOM uses a linked list model, one can implement the same concept using any other suitable memory model.

AXIOM comes bundled with several builders:

  • StAXOMBuilder: This will build a generic memory model from any XML input source, such as a file, string, stream, etc.
  • StAXSOAPModelBuilder: This will build an object structure of SOAP XML in memory, which can be accessed using a "SOAPish" API. For example, when using it, you get a SOAPEnvelope class, on which you can call methods like getHeaders() and getBody(). This API is still an extension of the generic AXIOM API. This is the model mainly used within the Axis2 project.
  • MTOMBuilder: This can be regarded as the first implementation of MTOM (Message Transmission Optimization Mechanism), the W3C specification for sending binary attachments in an optimized form. The latest AXIOM sources have full support for MTOM.

Please note that the current AXIOM implementation lacks support for the processing instruction and DTD information items of the XML infoset. There is an ongoing effort within the Axis2 team to provide these features as well.

Using AXIOM

Creating AXIOM from Scratch

You can create an AXIOM object model in different ways. Let's try to do it programmatically this time.

import org.apache.axis.om.OMElement;
import org.apache.axis.om.OMFactory;

public class FirstExample {
    public static void main(String[] args) {
        OMElement documentElement =
            OMFactory.newInstance().createOMElement(
                               "MyDocumentElement",
                               "http://chinthaka.org",
                               "myPrefix");
        documentElement.setValue("Sample Text");
    }
}

The first statement obtains an OMFactory (remember, the "OM" in AXIOM stands for "object model"). The OMFactory makes it possible to switch between different Java implementations of AXIOM. For example, I mentioned earlier that the current implementation is based on a linked list model. But if someone needed to use her own implementation of the AXIOM API, she could do that without touching a single line that uses those classes. The OMFactory.newInstance() method picks up the first implementation of AXIOM it finds on the classpath. For this reason, it is highly recommended that you create new OM objects using the OMFactory.

Note that we have passed three parameters to create an OMElement. AXIOM is very much aware of namespaces and encourages the use of them. So the method signature is createOMElement(String localName, String namespaceURI, String namespacePrefix). If this namespace is already defined in the scope, AXIOM will assign that to this element, without declaring a new one.

Text is also treated as a node in AXIOM. You can either create an OMText and add it to an OMElement, or you can simply use the element.setValue() method.

Adding Elements, Attributes, and Namespaces

First, let's create a namespace that can be used later, and then we'll use it to create the documentElement. Since we are using the factory for object creation, let's assign that to a new variable, omFactory, as well.

OMFactory omFactory = OMFactory.newInstance();
OMNamespace ns =
    omFactory.createOMNamespace("http://chinthaka.org",
                                "myPrefix");

OMElement documentElement =
    omFactory.createOMElement("MyDocumentElement", ns);

OMElement secondEle =
    omFactory.createOMElement("SecondElement", ns);
secondEle.setValue("Sample Text");
secondEle.insertAttribute("myAttr", "attrValue", ns);
documentElement.addChild(secondEle);

documentElement.declareNamespace("http://something.com",
                                 "somePrefix");

Adding an attribute to an element is as easy as calling element.insertAttribute(String attrName, String attrValue, OMNamespace ns). You have the option of passing null for the namespace. The addChild() method allows you to add either an OMElement or an OMText.

You can use the declareNamespace() method to add a new namespace declaration to the element.

Serializing

As I mentioned in the first part of this article, AXIOM depends on the StAX interfaces to interact with the external world. For serializing, AXIOM uses the StAX writer interface.

try {
    XMLStreamWriter writer =
        XMLOutputFactory.newInstance().createXMLStreamWriter(
                                        System.out);
    documentElement.serialize(writer, false);
    writer.flush();
} catch (XMLStreamException e) {
    e.printStackTrace();
}

Create a writer for any output stream and call the serialize() method of an OMElement. Notice the Boolean flag in the serialize() method. It has no effect in this instance, but it matters when you are building the object model from an existing resource, like a file, using a builder. At any given point, you may not have built the whole XML representation, yet you may want to serialize the whole thing. Serializing goes through the whole input stream and prints it to an output stream. Once consumed, the input stream cannot be read again, so you need the option of building the object model while the incoming stream is being read. Passing true asks the builder to build the object model while serializing; passing false just passes the XML from the incoming stream to the outgoing stream. This is the "caching" concept introduced earlier.

This is what you will get as the output:

<myPrefix:MyDocumentElement 
    xmlns:myPrefix="http://chinthaka.org">
     <myPrefix:SecondElement
        myPrefix:myAttr="attrValue">Sample Text
     </myPrefix:SecondElement>
</myPrefix:MyDocumentElement>

Editor's note: Line breaks and indentation have been added to the XML to suit the java.net page layout.
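To see what the StAX writer interface does on its own, here is a minimal sketch that writes an equivalent document directly with XMLStreamWriter, using only the JDK and no AXIOM classes. The class name StaxWriterSketch and the method write() are made up for illustration; this is not how AXIOM's serialize() is implemented, only the kind of calls a serializer would issue against the writer.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class; only the JDK's StAX API is used.
class StaxWriterSketch {

    static final String NS = "http://chinthaka.org";

    /** Writes the sample document to a string through the StAX writer. */
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        // Root element; the namespace declaration is written explicitly.
        writer.writeStartElement("myPrefix", "MyDocumentElement", NS);
        writer.writeNamespace("myPrefix", NS);

        // Child element with a namespace-qualified attribute and text.
        writer.writeStartElement("myPrefix", "SecondElement", NS);
        writer.writeAttribute("myPrefix", NS, "myAttr", "attrValue");
        writer.writeCharacters("Sample Text");
        writer.writeEndElement();

        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

Because the writer is an interface, the same sequence of calls works whether the destination is System.out, a file, or a network stream.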

Building from an Existing Source

You can build AXIOM from any input stream that carries XML. The advantage here is that you can start building as soon as you receive the first bit, without waiting for the whole stream to arrive.

FileReader soapFileReader = new FileReader(fileName); 
XMLStreamReader parser =
    XMLInputFactory.newInstance().createXMLStreamReader(
                                soapFileReader); 
StAXOMBuilder builder =
    new StAXOMBuilder(OMFactory.newInstance(), parser); 
OMElement documentElement = builder.getDocumentElement();

You have to create a StAX reader from the XML file and then pass that to the StAXOMBuilder with a reference to the preferred OMFactory. Then you can get the document element from that.

The best thing here in AXIOM is that you can mix elements that are partially built with programmatically built elements. AXIOM will take care of both types of elements.

Accessing the XML Infoset

Let's see how we can retrieve the children of an element by providing specific information like a QName. You can use an OMElement method to retrieve its children, given a QName. This will provide you with an iterator.

For example, let's say you want to find the children with the local name "project" and the namespace URI http://myproject.org.

// javax.xml.namespace.QName takes the namespace URI first,
// then the local part.
QName elementQName = new QName("http://myproject.org", "project");
Iterator infoIter =
    documentElement.getChildrenWithName(elementQName);
while (infoIter.hasNext()) {
    OMElement element = (OMElement) infoIter.next();
    System.out.println("Matching Element Name = " +
        element.getFirstElement().getText());
}

Note here that AXIOM is very much concerned about namespaces, so one has to provide a QName to retrieve a child. getChildWithName(QName) will return the first matching node, while getChildrenWithName(QName) will return an iterator.
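When constructing a QName, the argument order is easy to get wrong. Assuming the QName in question is the standard javax.xml.namespace.QName, its two-argument constructor takes the namespace URI first and the local part second. The small sketch below (the class name QNameOrderSketch is made up for illustration) prints both parts so the order can be checked.

```java
import javax.xml.namespace.QName;

// Hypothetical illustration class; only the JDK's QName is used.
class QNameOrderSketch {

    /** Builds the QName for the "project" element used in the text. */
    static QName projectName() {
        // Signature: QName(String namespaceURI, String localPart)
        return new QName("http://myproject.org", "project");
    }

    public static void main(String[] args) {
        QName name = projectName();
        System.out.println("namespace URI: " + name.getNamespaceURI());
        System.out.println("local part:    " + name.getLocalPart());
        // toString() uses the {namespaceURI}localPart notation.
        System.out.println(name);
    }
}
```

Swapping the two arguments still compiles, since both are strings, but the resulting QName will not match any element in the document.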

The beauty here is that the returned iterator does not hold any information until it is asked for it. The iterator asks the builder to build if and only if it needs more information. There are lots of enhancements like this within AXIOM, to make it as lightweight as possible without compromising performance.

One more thing to note here is that we called element.getFirstElement() to get the first child element. The method element.getFirstChild(), in contrast, may return a text node if there is leading whitespace before the child elements of the element. The getText() method returns all of the text nodes that are direct children of an element, irrespective of location. These two behaviors were purposely introduced to preserve the full infoset, as is required by most security implementations.
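The difference between "first child" and "first child element" comes straight from the underlying event stream, and can be seen with StAX alone. The sketch below is an illustration only, with made-up names (LeadingWhitespaceSketch, firstChildIsWhitespace(), firstChildElement()); it does not use the AXIOM API. It checks whether the first node inside the root element is whitespace-only text, and then shows how skipping that text leads to the first child element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; only the JDK's StAX API is used.
class LeadingWhitespaceSketch {

    private static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
    }

    /** True if the first node inside the root is whitespace-only text. */
    static boolean firstChildIsWhitespace(String xml) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        reader.nextTag();          // move to the root start tag
        int event = reader.next(); // first node inside the root
        boolean isText = event == XMLStreamConstants.CHARACTERS
                      || event == XMLStreamConstants.SPACE;
        return isText && reader.getText().trim().isEmpty();
    }

    /** Local name of the first child element, skipping leading whitespace. */
    static String firstChildElement(String xml) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        reader.nextTag(); // root start tag
        reader.nextTag(); // skips whitespace, stops at the first child tag
        return reader.getLocalName();
    }

    public static void main(String[] args) throws XMLStreamException {
        String indented = "<Employee>\n  <Name>Eran Chinthaka</Name>\n</Employee>";
        String compact = "<Employee><Name>Eran Chinthaka</Name></Employee>";
        System.out.println(firstChildIsWhitespace(indented));
        System.out.println(firstChildIsWhitespace(compact));
        System.out.println(firstChildElement(indented));
    }
}
```

With the indented input the first child is a whitespace text node, while the compact input has none; in both cases the first child element is Name.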

Getting StAX Events from an Element

Let's say that you want to work at the event level and want to get the events of a particular element.

XMLStreamReader streamReader = 
    documentElement.getPullParser(true);

You will be provided with an instance of the StAX stream reader, which is internally implemented in AXIOM. The Boolean flag is used to set the cache on or off.

Let's look at how AXIOM handles a more complex scenario. If documentElement is only half-built, AXIOM will generate the StAX events from the in-memory object model for the parts already built. For the rest, it will get the events directly from the builder and pass them to the user. In this process, if the user wants the cache on, the builder will build the object structure while handing over the events to the user.

If one needs to get SAX events from AXIOM, it's just a matter of writing a converter from StAX events to SAX events, which is straightforward.
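A converter of that kind might look like the following minimal sketch. It is not part of AXIOM; the class name StaxToSaxSketch and the method replay() are made up for illustration, and it works against any XMLStreamReader, including one created by the JDK as shown here. To stay short it forwards only element and text events; prefix mappings, comments, and processing instructions are not forwarded.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; only JDK StAX and SAX APIs are used.
class StaxToSaxSketch {

    /** Replays the reader's remaining events on a SAX ContentHandler. */
    static void replay(XMLStreamReader reader, ContentHandler handler)
            throws XMLStreamException, SAXException {
        handler.startDocument();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl attrs = new AttributesImpl();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String local = reader.getAttributeLocalName(i);
                        attrs.addAttribute(
                            uri(reader.getAttributeNamespace(i)), local,
                            qName(reader.getAttributePrefix(i), local),
                            reader.getAttributeType(i),
                            reader.getAttributeValue(i));
                    }
                    handler.startElement(uri(reader.getNamespaceURI()),
                        reader.getLocalName(),
                        qName(reader.getPrefix(), reader.getLocalName()), attrs);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    handler.endElement(uri(reader.getNamespaceURI()),
                        reader.getLocalName(),
                        qName(reader.getPrefix(), reader.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    handler.characters(reader.getTextCharacters(),
                        reader.getTextStart(), reader.getTextLength());
                    break;
                default:
                    break; // other event types are ignored in this sketch
            }
        }
        handler.endDocument();
    }

    // SAX expects an empty string, not null, for "no namespace".
    private static String uri(String namespaceUri) {
        return namespaceUri == null ? "" : namespaceUri;
    }

    // Builds the prefixed name SAX calls the qName.
    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    public static void main(String[] args) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader("<a x=\"1\"><b>hi</b></a>"));
        replay(reader, new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                System.out.println("start " + q + " (" + a.getLength() + " attributes)");
            }
            @Override
            public void endElement(String u, String l, String q) {
                System.out.println("end " + q);
            }
        });
    }
}
```

A production converter would also report namespace prefix mappings through startPrefixMapping() and endPrefixMapping(), and would forward comments through a LexicalHandler if the consumer needs them.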

Conclusion

This article introduced you to the AXIOM concept for XML handling and explained its implementation in the Apache Axis2 project. The AXIOM API was designed with convenience and developer-friendliness in mind. I introduced only some of the methods in AXIOM, and AXIOM is continuously being improved. I strongly recommend that curious users take a peek at the current sources found under the Apache Axis2 project. That said, note that the current AXIOM implementation does not provide full infoset support, though our community has made progress toward making AXIOM a full infoset-supported object model.

Resources

S. W. Eran Chinthaka is a pioneering member of the Apache Axis2, AXIOM, and Synapse projects, working full-time with WSO2 Inc.
