By Alex Gusev
class CNodeFactory : public IXMLNodeFactory
{
public:
    CFile m_file;

    CNodeFactory();
    ~CNodeFactory();

    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void __RPC_FAR *__RPC_FAR *ppvObject)
    {
        *ppvObject = this;
        return S_OK;
    }
    virtual ULONG STDMETHODCALLTYPE AddRef( void) { return 2; }
    virtual ULONG STDMETHODCALLTYPE Release( void) { return 2; }

    HRESULT STDMETHODCALLTYPE NotifyEvent( IXMLNodeSource* pSource, XML_NODEFACTORY_EVENT iEvt);
    HRESULT STDMETHODCALLTYPE BeginChildren( IXMLNodeSource * pSource, XML_NODE_INFO * pNodeInfo);
    HRESULT STDMETHODCALLTYPE EndChildren( IXMLNodeSource * pSource, BOOL fEmpty, XML_NODE_INFO * pNodeInfo);
    HRESULT STDMETHODCALLTYPE Error( IXMLNodeSource * pSource, HRESULT hrErrorCode, USHORT cNumRecs, XML_NODE_INFO ** apNodeInfo);
    HRESULT STDMETHODCALLTYPE CreateNode( IXMLNodeSource * pSource, PVOID pNodeParent, USHORT cNumRecs, XML_NODE_INFO ** apNodeInfo);

private:
    void PrintNode(const XML_NODE_INFO* pNode);
    void Print(const wchar_t* pText);
};
For simplicity, our node factory will just print the trace info into a file during parsing. You may add desired functionality as needed. The sample code follows:
#include "NodeFactory.h"

// Map a node type to its enumerator name for tracing
LPCTSTR NodeTypeToString(XML_NODE_TYPE type)
{
#define CASE(X) case X: return _T(#X)
    switch(type)
    {
        CASE(XML_ELEMENT);
        CASE(XML_ATTRIBUTE);
        CASE(XML_PI);
        CASE(XML_XMLDECL);
        CASE(XML_DOCTYPE);
        CASE(XML_DTDATTRIBUTE);
        CASE(XML_ENTITYDECL);
        CASE(XML_ELEMENTDECL);
        CASE(XML_ATTLISTDECL);
        CASE(XML_NOTATION);
        CASE(XML_GROUP);
        CASE(XML_INCLUDESECT);
        CASE(XML_PCDATA);
        CASE(XML_CDATA);
        CASE(XML_IGNORESECT);
        CASE(XML_COMMENT);
        CASE(XML_ENTITYREF);
        CASE(XML_WHITESPACE);
        CASE(XML_NAME);
        CASE(XML_NMTOKEN);
        CASE(XML_STRING);
        CASE(XML_PEREF);
        CASE(XML_MODEL);
        CASE(XML_ATTDEF);
        CASE(XML_ATTTYPE);
        CASE(XML_ATTPRESENCE);
        CASE(XML_DTDSUBSET);
        CASE(XML_LASTNODETYPE);
    }
#undef CASE
    return _T("Unknown type");
}

void CNodeFactory::PrintNode(const XML_NODE_INFO* pNode)
{
    wchar_t buf[4096];
    wchar_t nullTerm[4096];
    // pwcText is not null-terminated: copy it, clamped to the buffer size
    ULONG len = pNode->ulLen;
    if (len > 4095)
        len = 4095;
    memcpy(nullTerm, pNode->pwcText, len*sizeof(wchar_t));
    nullTerm[len] = 0;
    wsprintf(buf,L"%s: [%s]\r\n", NodeTypeToString((XML_NODE_TYPE)pNode->dwType), nullTerm);
    Print(buf);
}

void CNodeFactory::Print(const wchar_t* pText)
{
    m_file.Write(pText,wcslen(pText)*sizeof(wchar_t));
}

CNodeFactory::CNodeFactory()
{
    if(!m_file.Open(L"\\Test.dat",CFile::modeWrite | CFile::modeCreate))
        PRINT(L"file could not be opened\n");
}

CNodeFactory::~CNodeFactory()
{
    m_file.Close();
}

HRESULT CNodeFactory::NotifyEvent( IXMLNodeSource* pSource, XML_NODEFACTORY_EVENT iEvt)
{
    PRINT(L"NotifyEvent");
    return S_OK;
}

HRESULT CNodeFactory::BeginChildren(IXMLNodeSource * pSource, XML_NODE_INFO * pNodeInfo)
{
    Print(L"BeginChildren\r\n");
    return S_OK;
}

HRESULT CNodeFactory::EndChildren( IXMLNodeSource * pSource, BOOL fEmpty, XML_NODE_INFO * pNodeInfo)
{
    Print(L"EndChildren\r\n");
    return S_OK;
}

HRESULT CNodeFactory::Error( IXMLNodeSource * pSource, HRESULT hrErrorCode, USHORT cNumRecs, XML_NODE_INFO ** apNodeInfo)
{
    wchar_t msg[256];
    wsprintf(msg,L"\n Error %s\r\n",(*apNodeInfo)->pwcText);
    Print(msg);
    return S_OK;
}

HRESULT CNodeFactory::CreateNode( IXMLNodeSource * pSource, PVOID pNodeParent, USHORT cNumRecs, XML_NODE_INFO ** apNodeInfo)
{
    Print(L"CreateNode\r\n");
    // Trace every node record passed in by the parser
    for(int i=0; i < cNumRecs; ++i)
        PrintNode(apNodeInfo[i]);
    Print(L"CreateNode End\r\n");
    return S_OK;
}
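The callback pattern behind the node factory can also be shown in isolation. The following is only a minimal sketch in portable C++, not the WinCE parser: INodeHandler, ParseToy, and TraceHandler are hypothetical names invented for this illustration, and the toy parser understands nothing beyond plain start tags, end tags, and text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback interface, loosely modeled on the node factory idea.
struct INodeHandler
{
    virtual ~INodeHandler() {}
    virtual void OnStartElement(const std::string& name) = 0;
    virtual void OnEndElement(const std::string& name) = 0;
    virtual void OnText(const std::string& text) = 0;
};

// Toy push parser: handles only <tag>, </tag> and plain text.
// No attributes, comments, entities, or error reporting.
void ParseToy(const std::string& xml, INodeHandler& handler)
{
    std::size_t pos = 0;
    while (pos < xml.size())
    {
        if (xml[pos] == '<')
        {
            std::size_t end = xml.find('>', pos);
            if (end == std::string::npos)
                return;                     // unterminated tag: stop parsing
            if (pos + 1 < end && xml[pos + 1] == '/')
                handler.OnEndElement(xml.substr(pos + 2, end - pos - 2));
            else
                handler.OnStartElement(xml.substr(pos + 1, end - pos - 1));
            pos = end + 1;
        }
        else
        {
            std::size_t end = xml.find('<', pos);
            if (end == std::string::npos)
                end = xml.size();
            handler.OnText(xml.substr(pos, end - pos));
            pos = end;
        }
    }
}

// Example handler that records a trace instead of writing to a file.
struct TraceHandler : INodeHandler
{
    std::vector<std::string> trace;
    void OnStartElement(const std::string& name) override { trace.push_back("start:" + name); }
    void OnEndElement(const std::string& name) override   { trace.push_back("end:" + name); }
    void OnText(const std::string& text) override         { trace.push_back("text:" + text); }
};
```

Passing a TraceHandler and a small fragment such as <NAME>John Doe</NAME> to ParseToy leaves one trace entry per callback, in document order. The point is the same as with the node factory: the parser drives, and your code only reacts.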
As a matter of fact, that's about all the important technical stuff about this type of parser. Additional details about the interfaces involved can be found in WinCE help. The pros and cons of this parser are similar to those of SAX versus DOM parsers. Obviously, it's faster; you may stop parsing at any time you want to; memory usage is lower; and so forth. On the other hand, DOM has a lot of nice features too.
As a practical programmer, I tend to think of DOM as being based on three main interfaces: IXMLDOMNode, IXMLDOMNodeList, and IXMLDOMNamedNodeMap. There are, of course, several important interfaces derived from them; IXMLDOMDocument and IXMLDOMElement are among the most useful. Thus, typical loading of an XML document is a pretty simple thing:
#include <msxml.h>
#include <atlbase.h>

// CComPtr takes a single template argument
typedef CComPtr<IXMLDOMDocument> IXMLDOMDocumentPtr;

void LoadXML(CString sFilePath)
{
    IXMLDOMDocumentPtr pXMLDoc;
    COleVariant vXmlFile(sFilePath);
    VARIANT_BOOL vSuccess = VARIANT_FALSE;
    HRESULT hr = pXMLDoc.CoCreateInstance(__uuidof (DOMDocument));
    if ( FAILED(hr) )
        return;     // the parser could not be created
    // Disable additional processing to speed up loading
    pXMLDoc->put_validateOnParse(VARIANT_FALSE);
    pXMLDoc->put_resolveExternals(VARIANT_FALSE);
    pXMLDoc->put_preserveWhiteSpace(VARIANT_FALSE);
    hr = pXMLDoc->load(vXmlFile,&vSuccess);
}
As you see, all you need to do is call the load method. Well, and use ATL's smart pointers to make your life easier. I'd like to mention several things here.
First of all, pay close attention to the three 'put' calls, which disable additional processing. This is the fastest way to load an XML document.
Another side effect of such an approach is that you may save on memory usage. It's well known that a DOM parser is expensive from a memory perspective. You should always keep this constraint in mind when working under Windows CE, even though recent devices have enough available memory to satisfy almost all possible needs. For large XML files, however, it may be a significant issue. If you're in such a situation, you may consider using SAX-like parsers. Another option is to balance the data/tags ratio, which may immediately reduce the file size to an acceptable value. Sometimes, using UCS-2 instead of UTF-8 may help too, when your app works with languages that need 3 bytes for some characters in UTF-8.
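To make the encoding trade-off a bit more tangible, here is a small sketch in portable C++ that compares how many bytes the same text takes in UTF-8 and in UCS-2. The function names Utf8Bytes, Utf8Size, and Ucs2Size are made up for this illustration; nothing here is part of the WinCE API, and the UCS-2 figure assumes all characters lie in the Basic Multilingual Plane.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t Utf8Bytes(char32_t cp)
{
    if (cp < 0x80)    return 1;   // ASCII
    if (cp < 0x800)   return 2;   // e.g. Hebrew, Arabic, Cyrillic
    if (cp < 0x10000) return 3;   // rest of the BMP, e.g. CJK
    return 4;                     // supplementary planes
}

// Total UTF-8 size of a string of code points.
std::size_t Utf8Size(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += Utf8Bytes(cp);
    return total;
}

// Total UCS-2 size: two bytes per code point (BMP characters only).
std::size_t Ucs2Size(const std::u32string& text)
{
    return text.size() * 2;
}
```

For plain English text UTF-8 wins two to one, while for text made of 3-byte characters UCS-2 comes out a third smaller. Markup is usually ASCII, so whether the switch pays off for a whole file depends on the data/tags ratio mentioned above.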
Next, speaking frankly, you should use smart pointers with XML 'smartly' because they don't provide casting between related interfaces, so you won't be able to use, for example, IXMLDOMNodePtr and IXMLDOMElementPtr (inherited from IXMLDOMNode) interchangeably. Second, sometimes the XML parser behaves weirdly and just removes the XML file it is going to load. A workaround is to read the file manually into a buffer, and then use the IXMLDOMDocument::loadXML(BSTR bstrXML, VARIANT_BOOL *isSuccessful) method instead. Keeping this in mind, the rest is a piece of cake. Suppose we have the following simple XML:
<?xml version="1.0" encoding = "utf-8"?>
<CONTACTS>
  <CONTACT category="business">
    <NAME>John Doe</NAME>
    <BIRTHDATE>1971-11-19</BIRTHDATE>
    <EMAIL>[email protected]</EMAIL>
    <PHONE>(425) 111-1111</PHONE>
  </CONTACT>
  <CONTACT category="private">
    <NAME>Jonny Walker</NAME>
    <BIRTHDATE>1968-09-17</BIRTHDATE>
    <EMAIL>[email protected]</EMAIL>
    <PHONE>(425) 222-2222</PHONE>
  </CONTACT>
</CONTACTS>
To obtain some data from a loaded XML document, you may use either XPath queries or 'direct' walking through the document tree using IXMLDOMNode::get_firstChild and IXMLDOMNode::get_nextSibling (or their 'last' and 'previous' analogues for the opposite direction). Tree walking is the fastest method of surfing through the whole document. Nevertheless, if you need to find some data in a document tree (I guess it's the most common case), XPath works much better. IXMLDOMDocument has two methods to use with XPath: selectSingleNode and selectNodes, to get one or several nodes, respectively. So, to get all contacts in the "business" category, you run the following query:
typedef CComPtr<IXMLDOMNodeList> IXMLDOMNodeListPtr;
typedef CComPtr<IXMLDOMNode> IXMLDOMNodePtr;

CComBSTR bstrQuery(L"/CONTACTS/CONTACT[@category=\"business\"]");
IXMLDOMNodeListPtr pNodeList;
HRESULT hr = pXMLDoc->selectNodes(bstrQuery,&pNodeList);
if ( SUCCEEDED(hr) )
{
    long lLen = 0;
    hr = pNodeList->get_length(&lLen);
    if ( SUCCEEDED(hr) )
    {
        for (long i = 0; i < lLen; i++)
        {
            IXMLDOMNodePtr pNode;
            hr = pNodeList->get_item(i,&pNode);
            if ( FAILED(hr) || pNode == NULL )
                continue;
            // Get some attribute as an example
            COleVariant vNodeValue;
            IXMLDOMElement *pNodeElm = NULL;
            hr = pNode->QueryInterface(IID_IXMLDOMElement, (void**)&pNodeElm);
            if ( SUCCEEDED(hr) )
            {
                pNodeElm->getAttribute(CComBSTR(L"attr_name"),&vNodeValue);
                pNodeElm->Release();
            }
            // Handle all rest as you need to
            ...
        }
    }
}
You'll find other examples of XPath queries in WinCE Help. By the way, many programmers are lazy and use the "//" operator instead of the full path in queries. Well, you should know that it's not free; it'll cost you up to 15% of your performance. In reality, it's hard to compare XPath and tree walking precisely. get_firstChild and the others lead to a huge number of COM calls. selectSingleNode does the job in one shot and does much less walking because it may skip a lot of text nodes. On the other hand, selectSingleNode and the like need to compare each node against a matching pattern. So, the actual performance depends on the XML document's structure. But, in general, XPath queries give you better performance.
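To see why tree walking touches so many nodes, consider this minimal sketch of the firstChild/nextSibling pattern in portable C++. The Node struct is a hypothetical stand-in for a DOM node, not the MSXML interface; with the real parser, every step of the loop would be a COM call such as get_firstChild or get_nextSibling.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; not the MSXML type.
struct Node
{
    std::string name;
    Node* firstChild;
    Node* nextSibling;
};

// Depth-first walk that uses only the firstChild / nextSibling links
// and collects node names in document order.
void Walk(const Node* node, std::vector<std::string>& out)
{
    for (; node != nullptr; node = node->nextSibling)
    {
        out.push_back(node->name);      // one visit per node, wanted or not
        Walk(node->firstChild, out);
    }
}
```

The walk visits every node once, including the ones you do not care about. That is fine when you need the whole document, and wasteful when you are after a couple of elements.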
The next factor that influences parsing performance is validation. The bottom line here is that, by skipping validation, you may cut the time needed to load an XML document by a factor of two or three.
If you're developing some kind of Web application or an application that needs to create different reports based on XML data, you have one more option to think about: XSL. In some situations, it works 5-7 times faster than the trivial sequential XPath approach to building output. We will not dive into too many details here, just give a simple example to illustrate the idea:
// XPath usage way
CString sOutput;
CComBSTR bstrQuery(L"/CONTACTS/CONTACT[@category=\"business\"]");
IXMLDOMNodeListPtr pNodeList;
HRESULT hr = pXMLDoc->selectNodes(bstrQuery,&pNodeList);
if ( SUCCEEDED(hr) )
{
    IXMLDOMNodePtr pNode;
    pNodeList->nextNode(&pNode);
    while ( pNode != NULL )
    {
        IXMLDOMNodePtr pNestedNode;
        // selectSingleNode returns S_FALSE (and no node) when nothing matches
        if ( pNode->selectSingleNode(CComBSTR(L"PHONE"), &pNestedNode) == S_OK )
        {
            // Do something...
            CComBSTR bstrText;
            pNestedNode->get_text(&bstrText);
            sOutput += bstrText;
            sOutput += L"<br>";
        }
        // Release through the smart pointer before reusing it
        pNode.Release();
        pNodeList->nextNode(&pNode);
    }
}

// XSL template driven formatting
<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/CONTACTS/CONTACT[@category='business']">
    <xsl:for-each select="PHONE">
      <xsl:value-of/><br/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

// transform document with XSL template
IXMLDOMDocumentPtr pXSL;
VARIANT_BOOL vSuccess = VARIANT_FALSE;
hr = pXSL.CoCreateInstance(__uuidof (DOMDocument));
if ( SUCCEEDED(hr) )
    hr = pXSL->load(COleVariant(L"\\test.xsl"), &vSuccess);
if ( SUCCEEDED(hr) && vSuccess == VARIANT_TRUE )
{
    CComBSTR bstrOutput;
    pXMLDoc->transformNode(pXSL, &bstrOutput);
    // Process output as needed
    ...
}
XSL itself is a topic for many books, so here let me just note that it gives you a nice opportunity to modify the look and feel of the output without any changes in the application logic. So once again, separating data from logic works just fine.
Windows CE 4.x is when C# comes to the mobile world. It still has some performance troubles, but that's the only way to use managed code under WinCE for now. C# has powerful support for XML, so the preceding DOM examples may be easily rewritten in C# because nothing has changed in terms of XML. Following is an example of simple XML parsing. This sample does not pretend to be an example of good programming; it just illustrates the technique. The code calls a Web service and then fills in the list box with the retrieved data.
using System;
using System.Drawing;
using System.Collections;
using System.Windows.Forms;
using System.Data;
using System.Xml;
using System.IO;

public class Form1 : System.Windows.Forms.Form
{
    .......
    private System.Windows.Forms.TextBox SessionID;
    .......
    private void START_Click(object sender, System.EventArgs e)
    {
        long lStartTicks;
        long lEndTicks;
        long lSecEndTicks;
        String sResult;
        String sSessionKey;
        String sLogLine;
        StringReader sReaderResult;
        XmlTextReader xmlTextReaderResult;

        Cursor.Current = Cursors.WaitCursor;
        lStartTicks = Environment.TickCount;

        // create/open the log file
        FileStream LogFile = new FileStream ("\\WSLog.txt", FileMode.OpenOrCreate);
        StreamWriter LogWriter = new StreamWriter (LogFile, System.Text.Encoding.UTF8);

        // create the obj
        SomeWebService.WSInventorty obj = new SomeWebService.WSInventorty ();
        obj.Url = URL.Text;

        // call the login method
        sResult = obj.DoLogin (UID.Text, PW.Text);
        lEndTicks = Environment.TickCount;
        String sClean = (lEndTicks - lStartTicks).ToString ();
        lStartTicks = lEndTicks;

        sReaderResult = new StringReader (sResult);
        xmlTextReaderResult = new XmlTextReader (sReaderResult);
        xmlTextReaderResult.WhitespaceHandling = WhitespaceHandling.None;
        xmlTextReaderResult.MoveToContent ();
        if (xmlTextReaderResult.Read ())
        {
            lEndTicks = Environment.TickCount;
            SessionID.Text = xmlTextReaderResult.Value;
            sSessionKey = xmlTextReaderResult.Value;
            Time.Text = sClean;
        }
        else
        {
            SessionID.Text = "Error :-(";
            // clean up before leaving
            xmlTextReaderResult.Close ();
            LogWriter.Close ();
            Cursor.Current = Cursors.Default;
            return;
        }
        sReaderResult.Close ();
        xmlTextReaderResult.Close ();
        Cursor.Current = Cursors.Default;

        // call the getinfo method
        // first, init the arrays with the data - we need vendors
        // and items ids.
        String [] sVendors = new String [100];
        String [] sItems = new String [100];
        InitData (sVendors, sItems);
        String sItemInfo;
        for (int i = 0; i < 100; i ++)
        {
            StringReader sReaderInfo;
            XmlTextReader xmlTextReaderInfo;

            lStartTicks = Environment.TickCount;
            sResult = obj.GetData(sSessionKey, sItems [i], sVendors [i]);
            lEndTicks = Environment.TickCount;
            sLogLine = i.ToString () + "," + (lEndTicks - lStartTicks).ToString () + "," + sItems [i] + ",";

            sReaderInfo = new StringReader (sResult);
            xmlTextReaderInfo = new XmlTextReader (sReaderInfo);
            xmlTextReaderInfo.WhitespaceHandling = WhitespaceHandling.None;
            xmlTextReaderInfo.MoveToContent ();
            sItemInfo = "";
            while (xmlTextReaderInfo.Read ())
            {
                // collect all attribute values of the current node
                int nXMLAttributesCount = xmlTextReaderInfo.AttributeCount;
                for (int n = 0; n < nXMLAttributesCount; n ++)
                {
                    xmlTextReaderInfo.MoveToAttribute (n);
                    if ("FullName" == xmlTextReaderInfo.Name)
                        sLogLine += xmlTextReaderInfo.Value;
                    sItemInfo += xmlTextReaderInfo.Value;
                    sItemInfo += " ";
                }
            }
            LogWriter.Write (sLogLine);
            xmlTextReaderInfo.Close ();
            sReaderInfo.Close ();
            lSecEndTicks = Environment.TickCount;
            RTList.Items.Add (i.ToString () + " " + (lEndTicks - lStartTicks).ToString () + "|" + (lSecEndTicks - lEndTicks).ToString () + " " + sItemInfo);
            RTList.SelectedIndex = RTList.Items.Count - 1;
        }
        // close the writer
        LogWriter.Close ();
    }
}
So, as you see, all of this is pretty similar to the C++ code.
It does not matter which version of WinCE your application is targeting. XML is becoming a common technique everywhere. So, there are fewer and fewer reasons not to use it.
Alex Gusev started to play with mainframes at the end of the 1980s, using Pascal and REXX, but soon switched to C/C++ and Java on different platforms. When mobile PDAs seriously raised their heads in the IT market, Alex did too. Now, he works at an international retail software company as a team leader of the Mobile R department, making programmers' lives in the mobile jungles a little bit simpler.
To tell you the truth, I'm not an expert in XML at all. And before Pocket PC 2002, I did not need to be. But those days are gone, and now Microsoft delivers a powerful XML parser as part of its mobile platforms. It is still not as state-of-the-art as the desktop one, but it has become useful enough. So now you, as a programmer, may consider using XML as a storage layer for your application. XML today is a wide area, hardly coverable in one article, so here we will discuss only some basic aspects of XML usage in mobile applications.
XML may become a player in the mobile game for several reasons. The Windows CE world is built on Unicode. This trivial fact often leads to an unpleasant issue: you have to convert data from ASCII to Unicode and vice versa. Not a big deal really, but the standard API functions MultiByteToWideChar/WideCharToMultiByte don't work so well with languages other than English. Thus, you must either implement your own converter to be sure all is okay, or use Unicode. If you have to support different languages in your application, it may turn into a real headache. Data maintenance issues hardly need mentioning... You will surely discover more reasons in real projects.
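As an illustration of what "your own converter" might look like, here is a rough sketch in portable C++ that decodes UTF-8 bytes into UTF-16 code units. It assumes the input is UTF-8 (an assumption made for this example, not something the API discussion above requires), the name Utf8ToUtf16 is hypothetical, and the validation is deliberately minimal: malformed sequences become U+FFFD, while overlong forms and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// Malformed or truncated sequences are replaced with U+FFFD.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;              // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        ++i;
        for (; extra > 0; --extra, ++i)
        {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD;            // truncated or bad sequence
                break;                  // the offending byte is re-read as a lead byte
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;                // beyond the Unicode range
        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000;              // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Even this toy version shows why a hand-written converter becomes a maintenance burden: every encoding you support needs its own decoding rules and its own error handling.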
XML gives us a nice opportunity to use ASCII files almost anywhere and anytime, even the same files on both desktop and WinCE. All you need to do is use UTF-8 encoding and probably additional fonts for those code pages that are not included in the predefined set; for example, Hebrew or Arabic. If you port your application to Windows CE, XML will give you a consistent solution. Data size may be a painful point because XML can't be called a 'lightweight' technology, but a balanced XML structure can usually be found. We will discuss this issue later in the article.
On early Pocket PC devices, there was (and still is) an XML parser as a part of the HTML control. More recent Windows CE versions come with a DOM XML parser of at least version 2.0. A SAX parser is supported only in Windows CE 4.0 and later. As with many other APIs, these parsers are not as rich as their desktop counterparts, but give us enough nice features. XPath queries are also supported, even though the documentation often states the opposite.
First of all, let's consider the worst case, PocketPC 2000, where the only option is a simple SAX-like XML parser. It was deprecated in XML 3.0 (under WinCE 4.0 and later), but you may find it on Pocket PC 2000/2002. I was surprised that I failed to find any understandable examples of how to use it. So, let's take a quick look at the general flow.
Actually, all seems to be pretty simple. The short theory may be formulated as follows:
The key trick here is that you need to implement IXMLNodeFactory to be able to proceed with parsing. The parser will then call IXMLNodeFactory methods for each document node, just like SAX or expat. The next sample illustrates all of the above:
#include <xmlparser.h>
#define PRINT(X) ::MessageBox(NULL,X,L"XmlParser",MB_OK)

HRESULT ParseXml(const TCHAR* wszURL, IXMLNodeFactory* pNodeFactory)
{
    wchar_t msg[256];
    IXMLParser* xp = NULL;
    HRESULT hr = CoCreateInstance(CLSID_XMLParser,NULL,CLSCTX_INPROC_SERVER, IID_IXMLParser,(void**)&xp);
    if(FAILED(hr))
    {
        PRINT(L"CoCreateInstance failed!");
        return hr;
    }
    // Assign the input...
    hr = xp->SetURL(L"file:////OurURL//", wszURL, FALSE);
    if(FAILED(hr))
    {
        wsprintf(msg,L"SetURL(%s) failed with hr=0x%x",wszURL,hr);
        PRINT(msg);
        xp->Release();
        return hr;
    }
    // ...and the node factory that will receive the callbacks
    hr = xp->SetFactory(pNodeFactory);
    if(FAILED(hr))
    {
        PRINT(L"SetFactory failed!");
        xp->Release();
        return hr;
    }
    hr = xp->Run(-1);
    if(FAILED(hr))
    {
        wsprintf(msg,L"Run failed with hr=0x%x",hr);
        PRINT(msg);
    }
    xp->Release();
    return hr;
}
As you may see, the parser has several sources of incoming data: URL, file (via IStream interface), and memory buffer. Please refer to WinCE help for additional details. After the parser's input and node factory are assigned, the Run method does its job. Now, let's focus on the node factory implementation. The header file is listed below:
#include "xmlparser.h"
#define PRINT(X) ::MessageBox(NULL,X,L"XmlParser",MB_OK)