CMarkup, a Fast Class for Parsing XML in Visual C++

Reposted from http://www.firstobject.com/dn_markup.htm

CMarkup

Terms: CMarkup Evaluation License Agreement
Download: Markup90.zip 449k release notes

Create new XML documents; parse and modify existing XML documents. All from the methods of one simple class utilizing MFC or STL strings. With CMarkup you can run the example right out of the zip file, look through its source code if you like, and compile XML capability into your own application in minutes.

Quick Start

The zip download includes a Visual C++ project and Windows executable that demonstrate CMarkup. But to add CMarkup to your project, regardless of platform or compiler, all you need from the zip are the Markup.cpp and Markup.h files. See the following steps.

  • Add Markup.cpp and Markup.h to your C++ project.
  • Add #include "Markup.h" where you use the CMarkup class.
  • In Visual C++ projects that use precompiled headers you will need to turn off precompiled headers for Markup.cpp (see Pre-compiled Header Issue).
  • To use STL string instead of MFC CString define MARKUP_STL in your C++ Preprocessor Definitions.

    Features

    The concise CMarkup Methods let you easily manipulate elements, attributes and data in the document.

    If you wish to leverage Microsoft's MSXML, the CMarkup project comes with the fully demonstrated CMarkupMSXML class so that you can use MSXML via CMarkup methods and get a head start with the C++ COM syntax. See MSXML Wrapper CMarkupMSXML.

    The download (see link to zip file above) contains source code for the Test Dialog project, which tests and demonstrates all of the CMarkup classes and build options. See notes on Licensing at the bottom of this article for details about commercial use.

    Under one tenth of a second to load and parse a 3MB file! (2GHz CPU; fast on slower machines too, see CMarkup Performance Tests).

    Features of CMarkup are as follows:

  • Light: one small class that compiles into your program and maintains only a string for the document and an index array usually amounting to less than the memory size of the string.
  • Fast: the parser builds the index array in one quick pass.
  • Simple: CMarkup Methods make it ridiculously easy to create, navigate and modify XML.
  • Independent: does not require any external XML component.
  • STL: use STL string or wstring instead of MFC CString (define MARKUP_STL).
  • UNICODE: can be compiled for UNICODE (including Windows CE); the XML document is persisted in a UTF-8 file but processed internally in Wide-Char.
  • UTF-8: works with UTF-8 documents; it accepts and returns UTF-8 strings (make sure _MBCS is not defined).
  • MBCS: optionally works with Visual C++ MBCS (double-byte) strings (define _MBCS), which is not compatible with UTF-8.
  • MSXML: the MSXML Wrapper CMarkupMSXML demonstrates Microsoft's XML service with easy CMarkup methods (requires Visual Studio and MFC). The MSXML build options demonstrate this class.
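
    The "string plus index array" design in the Light and Fast items can be pictured with a small sketch. The code below is not CMarkup's implementation; TagSpan and BuildTagIndex are hypothetical names, and the scan deliberately ignores comments, CDATA sections and other details a real parser must handle. It only shows how a single left-to-right pass can record where each tag sits in the document string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry per tag: where it starts in the document and how long it is.
struct TagSpan {
    std::size_t start;   // offset of '<'
    std::size_t length;  // characters up to and including '>'
    bool isEndTag;       // true for </NAME>-style tags
};

// Walks the text once, left to right, and records every <...> span.
// Simplification: comments, CDATA, processing instructions and '>'
// inside attribute values are not handled.
std::vector<TagSpan> BuildTagIndex(const std::string& doc) {
    std::vector<TagSpan> index;
    std::size_t pos = 0;
    while ((pos = doc.find('<', pos)) != std::string::npos) {
        std::size_t close = doc.find('>', pos);
        if (close == std::string::npos)
            break;  // unterminated tag: stop indexing
        bool isEnd = (pos + 1 < doc.size() && doc[pos + 1] == '/');
        index.push_back({pos, close - pos + 1, isEnd});
        pos = close + 1;
    }
    return index;
}
```

    The document text itself is never copied here; the index holds only offsets into it, which is the general idea behind keeping the extra memory small.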

    XML for Everyday Data

    We often need to store and/or pass information in a file, or send a block of information from computer A to computer B. And the issue is always the same: How shall I format this data? Before XML, you might have considered "env" style e.g. PATH=C:/WIN95; "ini" style (grouped in sections); comma-delimited or otherwise delimited; or fixed character lengths. XML is now the established answer to that question except that programmers are sometimes discouraged by the size and complexity of XML solutions when all they need is something convenient to help parse and format angle brackets. For a quick read on the syntax rules of XML, see Introduction to XML.

    XML is better because of its flexible and hierarchical nature, plus its wide acceptance. Although XML uses more characters than delimited formats, it compresses down well if needed. The flexibility of XML becomes apparent when you want to expand the types of information your document can contain without requiring every consumer of the information to rewrite processing logic. You can keep the old information identified and ordered the same way it was while adding new attributes and elements (XML Versioning).

    Using CMarkup

    CMarkup is based on EDOM, the "Encapsulated" Document Object Model, the key to simple XML processing. It's an approach to XML processing with the same general purpose as DOM (Document Object Model). But while DOM has numerous types of objects, EDOM defines only one object, the XML document. EDOM harks back to the original attraction of XML, which was its simplicity.

    The CMarkup class encapsulates the XML document text, structure, and current positions. It has methods both to add elements and to navigate and get element attributes and data. The locations in the document where operations are performed are governed by the current position and the current child position. This current positioning allows you to work with the XML document without instantiating additional objects that point into the document. At all times, the object maintains a string representing the text of the document which can be retrieved using GetDoc.

    Check out the free firstobject XML Editor which generates C++ source code for creating and navigating your own XML documents with CMarkup.

    Creating an XML Document

    To create an XML document, instantiate a CMarkup object and call AddElem to create the root element. At this point, if you called AddElem("ORDER") your document would simply contain the empty ORDER element <ORDER/>. Then call AddChildElem to create elements under the root element (i.e. "inside" the root element, hierarchically speaking). The following example code creates an XML document and retrieves it into a CString:

    CMarkup xml;
    xml.AddElem( "ORDER" );
    xml.AddChildElem( "ITEM" );
    xml.IntoElem();
    xml.AddChildElem( "SN", "132487A-J" );
    xml.AddChildElem( "NAME", "crank casing" );
    xml.AddChildElem( "QTY", "1" );
    CString csXML = xml.GetDoc();

    This code generates the following XML. The root is the ORDER element; notice that its start tag <ORDER> is at the beginning and end tag </ORDER> is at the bottom. When an element is under (i.e. inside or contained by) a parent element, the parent's start tag is before it and the parent's end tag is after it. The ORDER element contains one ITEM element. That ITEM element contains 3 child elements: SN, NAME, and QTY.

    <ORDER>
    <ITEM>
    <SN>132487A-J</SN>
    <NAME>crank casing</NAME>
    <QTY>1</QTY>
    </ITEM>
    </ORDER>

    As shown in the example, you can create elements under a child element by calling IntoElem to move your current main position to where the current child position is so you can begin adding under that one. CMarkup maintains a current position in order to keep your source code shorter and simpler. This same position logic is used when navigating a document.

    Navigating an XML Document

    The XML string created in the above example can be parsed into a CMarkup object with the SetDoc method. You can also navigate it right inside the same CMarkup object where it was created; just call ResetPos if you want to go back to the beginning of the document.

    In the following example, after populating the CMarkup object from the csXML string, we loop through all ITEM elements under the ORDER element and get the serial number and quantity of each item:

    CMarkup xml;
    xml.SetDoc( csXML );
    while ( xml.FindChildElem("ITEM") )
    {
        xml.IntoElem();
        xml.FindChildElem( "SN" );
        CString csSN = xml.GetChildData();
        xml.FindChildElem( "QTY" );
        int nQty = atoi( xml.GetChildData() );
        xml.OutOfElem();
    }

    For each item we find, we call IntoElem before interrogating its child elements, and then OutOfElem afterwards. As you get accustomed to this type of navigation you will know to check in your loops to make sure there is a corresponding OutOfElem call for every IntoElem call.
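
    One way to keep IntoElem and OutOfElem paired is to tie them to a C++ scope. The following is a minimal sketch, not part of CMarkup: ElemScope is a hypothetical helper, written as a template so that it compiles without Markup.h, and DepthCounter is a stand-in that only counts nesting depth. It assumes the document type exposes IntoElem and OutOfElem as they are called in the example above.

```cpp
#include <cassert>

// Calls IntoElem() on construction and OutOfElem() on destruction, so the
// two calls stay paired even when a loop body exits early.
template <typename Doc>
class ElemScope {
public:
    explicit ElemScope(Doc& doc) : doc_(doc) { doc_.IntoElem(); }
    ~ElemScope() { doc_.OutOfElem(); }
    ElemScope(const ElemScope&) = delete;
    ElemScope& operator=(const ElemScope&) = delete;

private:
    Doc& doc_;
};

// Stand-in used only to exercise the guard without Markup.h;
// it tracks nesting depth and nothing else.
struct DepthCounter {
    int depth = 0;
    bool IntoElem()  { ++depth; return true; }
    bool OutOfElem() { --depth; return true; }
};
```

    With the real class, declaring ElemScope<CMarkup> scope( xml ); at the top of the loop body would take the place of the explicit IntoElem and OutOfElem calls. The trade-off is that the position change becomes less visible in the source.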

    Adding Elements and Attributes

    The above example for creating a document only created one ITEM element. Here is an example that creates multiple items loaded from a previously populated data source, plus a SHIPMENT information element in which one of the elements has an attribute. This code also demonstrates that instead of calling AddChildElem, you can call IntoElem and AddElem. It means more calls, but some people find this more intuitive.

    CMarkup xml;
    xml.AddElem( "ORDER" );
    xml.IntoElem(); // inside ORDER
    for ( int nItem=0; nItem<aItems.GetSize(); ++nItem )
    {
        xml.AddElem( "ITEM" );
        xml.IntoElem(); // inside ITEM
        xml.AddElem( "SN", aItems[nItem].csSN );
        xml.AddElem( "NAME", aItems[nItem].csName );
        xml.AddElem( "QTY", aItems[nItem].nQty );
        xml.OutOfElem(); // back out to ITEM level
    }
    xml.AddElem( "SHIPMENT" );
    xml.IntoElem(); // inside SHIPMENT
    xml.AddElem( "POC" );
    xml.SetAttrib( "type", csPOCType );
    xml.IntoElem(); // inside POC
    xml.AddElem( "NAME", csPOCName );
    xml.AddElem( "TEL", csPOCTel );

    This code generates the following XML. The root ORDER element contains 2 ITEM elements and a SHIPMENT element. The ITEM elements both contain SN, NAME and QTY elements. The SHIPMENT element contains a POC element which has a type attribute, and NAME and TEL child elements.

    <ORDER>
    <ITEM>
    <SN>132487A-J</SN>
    <NAME>crank casing</NAME>
    <QTY>1</QTY>
    </ITEM>
    <ITEM>
    <SN>4238764-A</SN>
    <NAME>bearing</NAME>
    <QTY>15</QTY>
    </ITEM>
    <SHIPMENT>
    <POC type="non-emergency">
    <NAME>John Smith</NAME>
    <TEL>555-1234</TEL>
    </POC>
    </SHIPMENT>
    </ORDER>

    Finding Elements

    The FindElem and FindChildElem methods go to the next sibling element. If the optional tag name argument is specified, then they go to the next element with a matching tag name. The element that is found becomes the current element, and the next call to Find will go to the next sibling or matching sibling after that current position.

    When you cannot assume the order of the elements, you must reset the position in between calling the Find method. Looking at the ITEM element in the above example, if someone else is creating the XML and you cannot assume the SN element is before the QTY element, then call ResetChildPos before finding the QTY element.
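
    The forward-only behaviour of the Find methods, and the reason a reset is needed when element order is unknown, can be illustrated with a toy model. SiblingCursor below is a hypothetical class over a plain list of sibling names; it is not CMarkup code, and its Find and Reset only mimic the behaviour described above for FindChildElem and ResetChildPos.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of forward-only sibling search; not CMarkup code.
class SiblingCursor {
public:
    explicit SiblingCursor(std::vector<std::string> names)
        : names_(std::move(names)), next_(0) {}

    // Moves to the next sibling called `name` after the current position.
    bool Find(const std::string& name) {
        for (std::size_t i = next_; i < names_.size(); ++i) {
            if (names_[i] == name) {
                next_ = i + 1;
                return true;
            }
        }
        return false;  // this toy leaves the position unchanged on failure
    }

    // Goes back to before the first sibling.
    void Reset() { next_ = 0; }

private:
    std::vector<std::string> names_;
    std::size_t next_;  // index where the next search starts
};
```

    If the siblings arrive in the order QTY, NAME, SN, then Find("SN") followed by Find("QTY") fails in this model because QTY lies before the current position; calling Reset() between the two searches makes the second one succeed.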

    To find the item with a particular serial number, you can loop through the ITEM elements and compare the SN element data to the serial number you are searching for. This example differs from the original navigation example by calling IntoElem to go into the ORDER element and use FindElem("ITEM") instead of FindChildElem("ITEM"); either way is fine. And notice that by specifying the "ITEM" element tag name in the Find method we ignore all other sibling elements such as the SHIPMENT element.

    CMarkup xml;
    xml.SetDoc( csXML );
    xml.FindElem(); // ORDER element is root
    xml.IntoElem(); // inside ORDER
    while ( xml.FindElem("ITEM") )
    {
        xml.FindChildElem( "SN" );
        if ( xml.GetChildData() == csFindSN )
            break; // found
    }

    The Test Dialog

    The Markup.exe testbed for CMarkup is a Visual Studio 6.0 MFC project. When it starts, it performs diagnostics in the OnTest function to test CMarkup in the context of the particular build options that have been selected. You can step through the OnTest function to see a lot of examples of how to use CMarkup. The dialog displays the Build Version and RunTest results. For example:

    CMarkup 9.0 STL Debug Unicode
    RunTest complete 62 tests and 2702 checks for this build

    "CMarkup 9.0 STL Debug Unicode" means that it is the debug build with MARKUP_STL and UNICODE defined. The RunTest completed successfully, and the dialog lists the number of tests and checks performed; these counts differ from build to build. It will say "File I/O Tests Skipped" if the CMarkupRunTest.xml file is not found. When you run Markup.exe right out of the zip file you don't want it writing the sample XML file to disk; only if you unzip to a folder will it read and write the CMarkupReadWriteTest.xml file.

    Use the Open and Parse buttons to test a file. After parsing a file the dialog displays the results. Here is an example:

    Load or parse error (after 0 milliseconds)
    1500 bytes to 1033 wide chars, No start tag for end tag 'charset' at offset 335

    A parse error was encountered. The file was loaded though; it was 1500 bytes which were converted to 1033 Unicode wide characters (i.e. 2066 bytes).
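
    The difference between the byte count and the wide-character count comes from the encoding conversion. As a rough illustration, assuming the file is well-formed UTF-8 and that wide characters are 16-bit UTF-16 code units as on Windows, the hypothetical helper below counts how many wide characters a UTF-8 string needs. It is a sketch and not the conversion code used by CMarkup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts the UTF-16 code units needed for a well-formed UTF-8 string.
// Malformed input is not detected.
std::size_t CountUtf16Units(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80)
            continue;  // continuation byte: belongs to the previous character
        // Lead bytes 0xF0 and above start 4-byte sequences, which need a
        // surrogate pair (two units); every other character needs one unit.
        units += (byte >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

    A file made only of ASCII would yield as many wide characters as bytes, so a lower wide-character count, as in the sample output, indicates multi-byte sequences in the input.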

    The Test Dialog keeps track of the last file parsed and the dialog screen position for convenience. These settings are stored in Documents and Settings/User/Application Data/firstobject/CMarkup/settings.xml.

    Licensing

    The evaluation source code is made available so you can easily integrate and evaluate CMarkup in your applications, and look further into how it works. Simply put, if the source code was not so accessible it would be a less valuable product. Compared to compiled components, source code allows much better and lighter integration in C++ projects, and avoids having the customer depend on us for time-critical modifications.

    If you use the source code in a commercial application that ends up shipping, selling, or otherwise being delivered, you must purchase a Developer License (see CMarkup Evaluation License Agreement). A CMarkup Developer License entitles you to royalty-free use of CMarkup software technologies in your commercial applications. Since the evaluation version presented here does not include some of the methods and features of the developer version (CMarkup Developer), this is another reason to purchase a Developer License.

    For details on purchasing see Products and click on the Purchase link. CMarkup delivers rapid and painless integration of XML into your projects.
