CMarkup: fast simple C++ XML parser (study notes)

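CMarkup is one of the handiest open-source tools for creating and parsing XML in C++, and it covers most everyday development needs.
Official site: www.firstobject.com. The CMarkup source code can be downloaded there, and the download includes a demo that is useful for learning.
Note: for some reason the site is slow and sometimes cannot be reached at all, so the CMarkup source code and a parsing tool are offered for download here.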

The following is excerpted from www.firstobject.com (note the text in red on a yellow background):
Create new XML documents, parse and modify existing XML documents from the methods of one simple C++ XML parser class.
Quick Start
  • Open the zip file and copy Markup.cpp and Markup.h into your C++ project folder
  • Add Markup.cpp and Markup.h to your project (makefile or IDE)
  • #include "Markup.h" where you use the CMarkup class

    Visual C++ specific:

  • In Visual C++ projects that use precompiled headers you will need to turn them off for Markup.cpp (see Pre-compiled Header Issue)
  • In Visual C++ to use STL string instead of MFC CString add MARKUP_STL to your C++ Preprocessor Definitions

CMarkup Methods

This is the master list of CMarkup class methods. The CMarkup methods are based on the original EDOM design. The shaded methods are only available in the Developer Version of CMarkup.

Initialization
Load Populates the CMarkup object from a file and parses it
SetDoc Populates the CMarkup object from a string and parses it
Output
Save Writes the document to file
GetDoc Returns the whole document as a markup string
GetDocFormatted Returns the formatted markup string of the whole document
File mode
Open Opens file, initiating file mode for read or write (and append is a special case of write mode)
Close Closes file and ends file mode
Flush For file write mode, this flushes any partial document in memory (up to the closing tags) and the file stream itself
Changing the current position
FindElem Locates next element, optionally matching tag name or path
FindChildElem Locates next child element matching tag name or path
FindPrevElem Locates previous element, optionally matching tag name
FindPrevChildElem Locates previous child element, optionally matching tag name
FindNode Locates next node, optionally matching node type(s)
IntoElem Go "into" current main position element such that it becomes the current parent position
OutOfElem Makes the current parent position into the current main position
ResetPos Resets the current position to the start of the document
ResetMainPos Resets the current main position to before the first sibling
ResetChildPos Resets the current child position to before the first child
Adding to the Document
AddElem Adds an element after the current main position element or last sibling
InsertElem Inserts an element before the current main position element or first sibling
AddChildElem Adds an element after the current child position element or last child
InsertChildElem Inserts an element before the current child position element or first child
AddSubDoc Adds a subdocument after the current main position element or last sibling
InsertSubDoc Inserts a subdocument before the current main position element or first sibling
AddChildSubDoc Adds a subdocument after the current child position element or last child
InsertChildSubDoc Inserts a subdocument before the current child position element or first child
AddNode Adds a node after the current node or at the end of the parent element content
InsertNode Inserts a node before the current node or at the beginning of the parent element content
Removing from the Document
RemoveElem Removes the current main position element including child elements
RemoveChildElem Removes the current child position element including its child elements
RemoveNode Removes the current node
RemoveAttrib Removes the specified attribute from the current main position element
RemoveChildAttrib Removes the specified attribute from the current child position element
Getting Values
GetData Returns the string value of the current main position element or node
GetChildData Returns the string value of the current child position element
GetElemContent Returns the string markup content of the current main position element including child elements
GetSubDoc Returns the subdocument markup string of the current main position element including child elements
GetChildSubDoc Returns the subdocument markup string of the current child position element including child elements
GetAttrib Returns the string value of the specified attribute of the main position element (or processing instruction)
GetChildAttrib Returns the string value of the specified attribute of the child position element
HasAttrib Returns true if the specified attribute exists in the main position element (or processing instruction)
HasChildAttrib Returns true if the specified attribute exists in the child position element
GetTagName Returns the tag name of the main position element (or processing instruction)
GetChildTagName Returns the tag name of the child position element
FindGetData Locates the next element matching the specified path and returns the string value
Setting Values
SetData Sets the value of the current main position element or node
SetChildData Sets the value of the current child position element
SetElemContent Sets the markup content of the current main position element
SetAttrib Sets the value of the specified attribute of the current main position element (or processing instruction)
SetChildAttrib Sets the value of the specified attribute of the current child position element
FindSetData Locates the next element matching the specified path and sets the value
Other Info
GetNthAttrib Returns the name and value of attribute specified by number for the current main position element
GetAttribName Returns the name of attribute specified by number for the current main position element
GetNodeType Returns the node type of the current node
GetElemLevel Returns the level of the current main position
GetElemFlags Returns the current main position element's flags
SetElemFlags Sets the current main position element's flags
GetOffsets Obtains the document text offsets of the current main position
GetAttribOffsets Obtains the document text offsets of the specified attribute in the current main position
Remembering positions
SavePos Saves the current position with an optional string name using a hash map
RestorePos Goes to the position saved with SavePos
SetMapSize Sets the size of a map for use with the SavePos and RestorePos methods
GetElemIndex Returns the integer index of the current main position element
GotoElemIndex Sets the current main position element to that of the given integer index
GetChildElemIndex Returns the integer index of the current child position element
GotoChildElemIndex Sets the current child position element to that of the given integer index
GetParentElemIndex Returns the integer index of the current parent position element
GotoParentElemIndex Sets the current parent position element to that of the given integer index
GetElemPath Returns a string representing the absolute path of the main position element
GetChildElemPath Returns a string representing the absolute path of the child position element
GetParentElemPath Returns a string representing the absolute path of the parent position element
Document Status
IsWellFormed Determines if document has a single root element and properly contained elements
GetResult Returns result markup from last parse or file operation
GetError Returns English error/result synopsis string from last parse or file operation
GetDocFlags Returns the document flags
SetDocFlags Sets the document flags
GetDocElemCount Returns the number of elements in the document
Static Utility Functions
ReadTextFile Reads a text file into a string
WriteTextFile Writes a string to a text file
GetDeclaredEncoding Returns the encoding name as a string from the XML declaration
EscapeText Returns the string with special characters encoded for markup
UnescapeText Returns the string with special characters unencoded for a string value
UTF8ToA Converts a UTF-8 string to a non-Unicode ("ANSI") string
AToUTF8 Converts a non-Unicode ("ANSI") string to UTF-8
UTF16To8 Converts a UTF-16 string to UTF-8
UTF8To16 Converts a UTF-8 string to UTF-16
EncodeBase64 Encodes a binary data buffer to a Base64 string
DecodeBase64 Decodes a Base64 string to a binary data buffer

Fast start to XML in C++

Enough bull. You want to create XML or read and find things in XML. All you need to know about CMarkup is that it is just one object per XML document (for the API design concept see EDOM). And by the way the free firstobject XML Editor generates C++ source code for creating and navigating your own XML documents with CMarkup.

Creating an XML Document

To create an XML document, instantiate a CMarkup object and call AddElem to create the root element. At this point your document would simply contain the empty root element, e.g. <ORDER/>. Then call IntoElem to go "inside" the ORDER element so that you can create child elements under the root element (i.e. the root element will be the "container" of the child elements).

The following example code creates an XML document.

CMarkup xml;
xml.AddElem( "ORDER" );
xml.IntoElem();
xml.AddElem( "ITEM" );
xml.IntoElem();
xml.AddElem( "SN", "132487A-J" );
xml.AddElem( "NAME", "crank casing" );
xml.AddElem( "QTY", "1" );

This code generates the following XML. The root is the ORDER element; notice that its start tag <ORDER> is at the beginning and its end tag </ORDER> is at the bottom. When an element is under (i.e. inside or contained by) a parent element, the parent's start tag is before it and the parent's end tag is after it. The ORDER element contains one ITEM element. That ITEM element contains 3 child elements: SN, NAME, and QTY.

<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
</ORDER>

As shown in the example, you create elements under an element by calling IntoElem to make your current main position (or "place holder") into your current parent position so you can begin adding child elements. CMarkup maintains a current position in order to keep your source code shorter and simpler. This same position logic is used when navigating a document.

You can write the above document to file with Save:

xml.Save( "C:\\Sample.xml" );

And you can retrieve the XML into a string with GetDoc:

MCD_STR strXML = xml.GetDoc();

Markup.h defines MCD_STR to the string type you compile CMarkup for, so we use MCD_STR in these examples, but you can use your own string type explicitly (e.g. std::string or CString).

Navigating an XML Document

You can navigate the data right inside the same CMarkup object you created in the example above; just call ResetPos if you want to go back to the beginning of the document. Or you can populate a new CMarkup object:

CMarkup xml;

From a file with Load:

xml.Load( "C:\\Sample.xml" );

Or from an XML string with SetDoc:

xml.SetDoc( strXML );

In the following example, we go inside the root ORDER element and loop through all ITEM elements with FindElem to get the serial number and quantity of each with GetData. The serial number is treated as a string and the quantity is converted to an integer using atoi (MCD_2PCSZ is defined in Markup.h to return the string's const pointer).

xml.FindElem(); // root ORDER element
xml.IntoElem(); // inside ORDER
while ( xml.FindElem("ITEM") )
{
    xml.IntoElem();
    xml.FindElem( "SN" );
    MCD_STR strSN = xml.GetData();
    xml.FindElem( "QTY" );
    int nQty = atoi( MCD_2PCSZ(xml.GetData()) );
    xml.OutOfElem();
}

For each item we find, we call IntoElem to interrogate its child elements, and then OutOfElem afterwards. As you get accustomed to this type of navigation you will know to check in your loops to make sure there is a corresponding OutOfElem call for every IntoElem call.

Adding Elements and Attributes

The above example for creating a document only created one ITEM element. Here is an example that creates multiple items loaded from a previously populated data source, plus a SHIPMENT information element in which one of the elements has an attribute we set with SetAttrib.

CMarkup xml;
xml.AddElem( "ORDER" );
xml.IntoElem(); // inside ORDER
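for ( int nItem=0; nItem<nItemCount; ++nItem ) // nItemCount: assumed name for the number of entries in aItems
{
    xml.AddElem( "ITEM" );
    xml.IntoElem(); // inside ITEM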
    xml.AddElem( "SN", aItems[nItem].strSN );
    xml.AddElem( "NAME", aItems[nItem].strName );
    xml.AddElem( "QTY", aItems[nItem].nQty );
    xml.OutOfElem(); // back out to ITEM level
}
xml.AddElem( "SHIPMENT" );
xml.IntoElem(); // inside SHIPMENT
xml.AddElem( "POC" );
xml.SetAttrib( "type", strPOCType );
xml.IntoElem(); // inside POC
xml.AddElem( "NAME", strPOCName );
xml.AddElem( "TEL", strPOCTel );

This code generates the following XML. The root ORDER element contains 2 ITEM elements and a SHIPMENT element. The ITEM elements both contain SN, NAME and QTY elements. The SHIPMENT element contains a POC element which has a type attribute, and NAME and TEL child elements.

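<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
<ITEM>
<SN>4238764-A</SN>
<NAME>bearing</NAME>
<QTY>15</QTY>
</ITEM>
<SHIPMENT>
<POC type="non-emergency">
<NAME>John Smith</NAME>
<TEL>555-1234</TEL>
</POC>
</SHIPMENT>
</ORDER>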

Finding Elements

The FindElem method goes to the next sibling element. If the optional tag name argument is specified, then it goes to the next element with a matching tag name. The element that is found becomes the current element, and the next call to FindElem will go to the next sibling or matching sibling after that current position.

When you cannot assume the order of the elements, you must move the position back before the first sibling with ResetMainPos in between your calls to the FindElem method. Looking at the ITEM element in the above example, if someone else is creating the XML and you cannot assume the SN element is before the QTY element, then call ResetMainPos before finding the QTY element.

while ( xml.FindElem("ITEM") )
{
    xml.IntoElem();
    xml.FindElem( "SN" );
    MCD_STR strSN = xml.GetData();
    xml.ResetMainPos();
    xml.FindElem( "QTY" );
    int nQty = atoi( MCD_2PCSZ(xml.GetData()) );
    xml.OutOfElem();
}

To find the item with a particular serial number, you can loop through the ITEM elements and compare the SN element data to the serial number you are searching for. By specifying the "ITEM" element tag name in the FindElem method we ignore all other sibling elements such as the SHIPMENT element. Also, instead of going into and out of the ITEM element to look for the SN child element, we use the FindChildElem and GetChildData methods for convenience.

xml.ResetPos(); // top of document
xml.FindElem(); // ORDER element is root
xml.IntoElem(); // inside ORDER
while ( xml.FindElem("ITEM") )
{
    xml.FindChildElem( "SN" );
    if ( xml.GetChildData() == strFindSN )
        break; // found
}

You are NOT on your own

This site has all kinds of examples of doing various XML operations. CMarkup has been widely used for many years. Of course it doesn't do everything, but almost every purpose has at least been discussed. Don't hesitate to ask if you have questions. A good place to go next is the CMarkup Methods.

