SAX2 Programming Guide

from http://xerces.apache.org/xerces-c/program-sax2-3.html

 

 

Using the SAX2 API
 

The SAX2 API for XML parsers was originally developed for Java. Please be aware that there is no standard SAX2 API for C++, and that use of the Xerces-C++ SAX2 API does not guarantee client code compatibility with other C++ XML parsers.

The SAX2 API presents a callback-based interface to the parser. An application that uses SAX2 provides an instance of a handler class to the parser. When the parser detects XML constructs, it calls the methods of the handler class, passing them information about the construct that was detected. The most commonly used handler classes are ContentHandler, which is called when XML constructs are recognized, and ErrorHandler, which is called when an error occurs. The header files for the various SAX2 handler classes are in the xercesc/sax2/ directory.

As a convenience, Xerces-C++ provides DefaultHandler, a single class which is publicly derived from all the Handler classes. DefaultHandler's default implementation of the handler callback methods is to do nothing. A convenient way to get started with Xerces-C++ is to derive your own handler class from DefaultHandler and override just those methods in DefaultHandler which you are interested in customizing. This simple example shows how to create a handler which prints element names and fatal error messages. The source code for the sample applications shows additional examples of how to write handler classes.

This is the header file MySAX2Handler.hpp:

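#include <xercesc/sax2/DefaultHandler.hpp>

class MySAX2Handler : public xercesc::DefaultHandler {
public:
    // Declared here because MySAX2Handler.cpp defines it.
    MySAX2Handler();

    // Called by the parser for every start tag.
    void startElement(
        const XMLCh* const uri,
        const XMLCh* const localname,
        const XMLCh* const qname,
        const xercesc::Attributes& attrs
    );

    // Called by the parser when it reports a fatal error.
    void fatalError(const xercesc::SAXParseException&);
};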

This is the implementation file MySAX2Handler.cpp:

#include "MySAX2Handler.hpp"
#include <xercesc/util/XMLString.hpp>
#include <iostream>

using namespace std;
using namespace xercesc;

MySAX2Handler::MySAX2Handler()
{
}

void MySAX2Handler::startElement(const XMLCh* const uri,
                                 const XMLCh* const localname,
                                 const XMLCh* const qname,
                                 const Attributes& attrs)
{
    // Transcode the XMLCh string to the local code page for printing.
    char* message = XMLString::transcode(localname);
    cout << "I saw element: " << message << endl;
    // Strings returned by transcode() must be released by the caller.
    XMLString::release(&message);
}

void MySAX2Handler::fatalError(const SAXParseException& exception)
{
    char* message = XMLString::transcode(exception.getMessage());
    cout << "Fatal Error: " << message
         << " at line: " << exception.getLineNumber()
         << endl;
    XMLString::release(&message);
}

The XMLCh and Attributes types are supplied by Xerces-C++ and are documented in the API Reference. Examples of their usage appear in the source code to the sample applications.


SAX2XMLReader
 
Constructing an XML Reader
 

In order to use Xerces-C++ SAX2 to parse XML files, you will need to create an instance of the SAX2XMLReader class. The example below shows the code you need in order to create an instance of SAX2XMLReader. The ContentHandler and ErrorHandler instances required by the SAX2 API are provided using the DefaultHandler class supplied with Xerces-C++.

#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/XMLUni.hpp>

#include <iostream>

using namespace std;
using namespace xercesc;

int main (int argc, char* args[]) {

    // Xerces-C++ must be initialized before any other call into the library.
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Error during initialization! :\n";
        cout << "Exception message is: \n"
             << message << "\n";
        XMLString::release(&message);
        return 1;
    }

    const char* xmlFile = "x1.xml";
    SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
    parser->setFeature(XMLUni::fgSAX2CoreValidation, true);
    parser->setFeature(XMLUni::fgSAX2CoreNameSpaces, true); // optional

    // DefaultHandler serves as both the ContentHandler and the ErrorHandler.
    DefaultHandler* defaultHandler = new DefaultHandler();
    parser->setContentHandler(defaultHandler);
    parser->setErrorHandler(defaultHandler);

    try {
        parser->parse(xmlFile);
    }
    catch (const XMLException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n"
             << message << "\n";
        XMLString::release(&message);
        return -1;
    }
    catch (const SAXParseException& toCatch) {
        char* message = XMLString::transcode(toCatch.getMessage());
        cout << "Exception message is: \n"
             << message << "\n";
        XMLString::release(&message);
        return -1;
    }
    catch (...) {
        cout << "Unexpected Exception \n";
        return -1;
    }

    // Delete all Xerces-C++ objects before terminating the library.
    delete parser;
    delete defaultHandler;
    XMLPlatformUtils::Terminate();
    return 0;
}

Supported Features in SAX2XMLReader
 

The behavior of the SAX2XMLReader is dependent on the values of the following features. All of the features below can be set using the function SAX2XMLReader::setFeature(const XMLCh* const, const bool) and queried using the function bool SAX2XMLReader::getFeature(const XMLCh* const).

None of these features can be modified in the middle of a parse, or an exception will be thrown.

 

SAX2 Features
 
http://xml.org/sax/features/namespaces 
true:  Perform Namespace processing.  
false:  Do not perform Namespace processing.  
default:  true  
XMLUni Predefined Constant:  fgSAX2CoreNameSpaces  
note:  If the validation feature is set to true, then the document must contain a grammar that supports the use of namespaces.  
see:  http://xml.org/sax/features/namespace-prefixes  
see:  http://xml.org/sax/features/validation  
http://xml.org/sax/features/namespace-prefixes 
true:  Report the original prefixed names and attributes used for Namespace declarations.  
false:  Do not report attributes used for Namespace declarations, and optionally do not report original prefixed names.  
default:  false  
XMLUni Predefined Constant:  fgSAX2CoreNameSpacePrefixes  
http://xml.org/sax/features/validation 
true:  Report all validation errors.  
false:  Do not report validation errors.  
default:  false  
XMLUni Predefined Constant:  fgSAX2CoreValidation  
note:  If this feature is set to true, the document must specify a grammar. If this feature is set to false and document specifies a grammar, that grammar might be parsed but no validation of the document contents will be performed.  
see:  http://apache.org/xml/features/validation/dynamic  
see:  http://apache.org/xml/features/nonvalidating/load-external-dtd  

Xerces Features
 
http://apache.org/xml/features/validation/dynamic 
true:  The parser will validate the document only if a grammar is specified. (http://xml.org/sax/features/validation must be true).  
false:  Validation is determined by the state of the http://xml.org/sax/features/validation feature.  
default:  false  
XMLUni Predefined Constant:  fgXercesDynamic  
see:  http://xml.org/sax/features/validation  
http://apache.org/xml/features/validation/schema 
true:  Enable the parser's schema support.  
false:  Disable the parser's schema support.  
default:  true  
XMLUni Predefined Constant:  fgXercesSchema  
note:  If set to true, namespace processing must also be turned on.  
see:  http://xml.org/sax/features/namespaces  
http://apache.org/xml/features/validation/schema-full-checking 
true:  Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.  
false:  Disable full schema constraint checking.  
default:  false  
XMLUni Predefined Constant:  fgXercesSchemaFullChecking  
note:  This feature checks the schema grammar itself for additional errors that are time-consuming or memory intensive. It does not affect the level of checking performed on document instances that use schema grammars.  
see:  http://apache.org/xml/features/validation/schema  
http://apache.org/xml/features/validating/load-schema 
true:  Load the schema.  
false:  Don't load the schema if it wasn't found in the grammar pool.  
default:  true  
XMLUni Predefined Constant:  fgXercesLoadSchema  
note:  This feature is ignored and no schemas are loaded if schema processing is disabled.  
see:  http://apache.org/xml/features/validation/schema  
http://apache.org/xml/features/nonvalidating/load-external-dtd 
true:  Load the external DTD.  
false:  Ignore the external DTD completely.  
default:  true  
XMLUni Predefined Constant:  fgXercesLoadExternalDTD  
note:  This feature is ignored and the DTD is always loaded when validation is on.  
see:  http://xml.org/sax/features/validation  
http://apache.org/xml/features/continue-after-fatal-error 
true:  Attempt to continue parsing after a fatal error.  
false:  Stops parse on first fatal error.  
default:  false  
XMLUni Predefined Constant:  fgXercesContinueAfterFatalError  
note:  The behavior of the parser when this feature is set to true is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse.  
http://apache.org/xml/features/validation-error-as-fatal 
true:  The parser will treat validation errors as fatal and will exit, depending on the state of http://apache.org/xml/features/continue-after-fatal-error.  
false:  The parser will report the error and continue processing.  
default:  false  
XMLUni Predefined Constant:  fgXercesValidationErrorAsFatal  
note:  Setting this to true does not mean the validation error will be printed with the words "Fatal Error". It is still printed as "Error", but the parser will exit if http://apache.org/xml/features/continue-after-fatal-error is set to false.  
see:  http://apache.org/xml/features/continue-after-fatal-error  
http://apache.org/xml/features/validation/use-cachedGrammarInParse 
true:  Use cached grammar if it exists in the pool. 
false:  Parse the schema grammar. 
default:  false  
XMLUni Predefined Constant:  fgXercesUseCachedGrammarInParse  
note:  If http://apache.org/xml/features/validation/cache-grammarFromParse is enabled, this feature is set to true automatically and any setting to this feature by the user is a no-op. 
see:  http://apache.org/xml/features/validation/cache-grammarFromParse  
http://apache.org/xml/features/validation/cache-grammarFromParse 
true:  Cache the grammar in the pool for re-use in subsequent parses. 
false:  Do not cache the grammar in the pool 
default:  false  
XMLUni Predefined Constant:  fgXercesCacheGrammarFromParse  
note:  If set to true, the http://apache.org/xml/features/validation/use-cachedGrammarInParse is also set to true automatically. 
see:  http://apache.org/xml/features/validation/use-cachedGrammarInParse  
http://apache.org/xml/features/standard-uri-conformant 
true:  Force standard uri conformance.  
false:  Do not force standard uri conformance.  
default:  false  
XMLUni Predefined Constant:  fgXercesStandardUriConformant  
note:  If set to true, a malformed URI will be rejected and a fatal error will be issued.  
http://apache.org/xml/features/calculate-src-ofs 
true:  Enable src offset calculation.  
false:  Disable src offset calculation.  
default:  false  
XMLUni Predefined Constant:  fgXercesCalculateSrcOfs  
note:  If set to true, the user can inquire about the current src offset within the input source. Setting it to false (default) improves the performance. 
http://apache.org/xml/features/validation/identity-constraint-checking 
true:  Enable identity constraint checking.  
false:  Disable identity constraint checking.  
default:  true  
XMLUni Predefined Constant:  fgXercesIdentityConstraintChecking  
http://apache.org/xml/features/generate-synthetic-annotations 
true:  Enable generation of synthetic annotations. A synthetic annotation will be generated when a schema component has non-schema attributes but no child annotation.  
false:  Disable generation of synthetic annotations.  
default:  false  
XMLUni Predefined Constant:  fgXercesGenerateSyntheticAnnotations  
http://apache.org/xml/features/validate-annotations 
true:  Enable validation of annotations.  
false:  Disable validation of annotations.  
default:  false  
XMLUni Predefined Constant:  fgXercesValidateAnnotations  
note:  Each annotation is validated independently.  
http://apache.org/xml/features/schema/ignore-annotations 
true:  Do not generate XSAnnotations when traversing a schema. 
false:  Generate XSAnnotations when traversing a schema. 
default:  false  
XMLUni Predefined Constant:  fgXercesIgnoreAnnotations  
http://apache.org/xml/features/disable-default-entity-resolution 
true:  The parser will not attempt to resolve the entity when the resolveEntity method returns NULL. 
false:  The parser will attempt to resolve the entity when the resolveEntity method returns NULL. 
default:  false  
XMLUni Predefined Constant:  fgXercesDisableDefaultEntityResolution  
http://apache.org/xml/features/validation/schema/skip-dtd-validation 
true:  When schema validation is on the parser will ignore the DTD, except for entities. 
false:  The parser will not ignore DTDs when validating. 
default:  false  
XMLUni Predefined Constant:  fgXercesSkipDTDValidation  
see:  Schema Validation 
http://apache.org/xml/features/validation/ignoreCachedDTD 
true:  Ignore a cached DTD when an XML document contains both an internal and external DTD, and the use cached grammar from parse option is enabled. Currently, we do not allow using cached DTD grammar when an internal subset is present in the document. This option will only affect the behavior of the parser when an internal and external DTD both exist in a document (i.e. no effect if document has no internal subset). 
false:  Don't ignore cached DTD.  
default:  false  
XMLUni Predefined Constant:  fgXercesIgnoreCachedDTD  
see:  http://apache.org/xml/features/validation/use-cachedGrammarInParse  
http://apache.org/xml/features/validation/schema/handle-multiple-imports 
true:  During schema validation allow multiple schemas with the same namespace to be imported. 
false:  Don't import multiple schemas with the same namespace.  
default:  false  
XMLUni Predefined Constant:  fgXercesHandleMultipleImports  
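
Several of the features above interact: dynamic validation only matters when the validation feature is on, and validation-error-as-fatal only stops the parse when continue-after-fatal-error is off. The following is a minimal sketch of those documented rules as plain C++ predicates. The ValidationSettings struct and both function names are hypothetical and exist only for illustration; this models the rules as written in the lists above and is not how the parser is implemented.

```cpp
// Hypothetical plain-data model of four features described above.
// The member names are illustrative; they are not Xerces-C++ identifiers.
struct ValidationSettings {
    bool validation = false;              // http://xml.org/sax/features/validation
    bool dynamic = false;                 // http://apache.org/xml/features/validation/dynamic
    bool validationErrorAsFatal = false;  // http://apache.org/xml/features/validation-error-as-fatal
    bool continueAfterFatalError = false; // http://apache.org/xml/features/continue-after-fatal-error
};

// Returns true if, according to the documented rules, the document contents
// would be validated. With dynamic validation on, validation happens only
// when the document specifies a grammar.
inline bool wouldValidate(const ValidationSettings& s, bool documentHasGrammar) {
    if (!s.validation) {
        return false;  // validation feature off: no validation errors reported
    }
    return s.dynamic ? documentHasGrammar : true;
}

// Returns true if, according to the documented rules, the parse would stop
// at the first validation error.
inline bool stopsOnValidationError(const ValidationSettings& s) {
    return s.validationErrorAsFatal && !s.continueAfterFatalError;
}
```

In a real application each flag would correspond to one setFeature call with the matching XMLUni constant, made before the parse starts.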


Supported Properties in SAX2XMLReader
 

The behavior of the SAX2XMLReader is dependent on the values of the following properties. All of the properties below can be set using the function SAX2XMLReader::setProperty(const XMLCh* const, void*). It takes a void pointer as the property value. The application is required to initialize this void pointer to the correct type. Check the "Value Type" entry below to learn exactly what type of property value each property expects. Passing a void pointer that was initialized with the wrong type will lead to unexpected results. If the same property is set more than once, the last setting takes effect.

Property values can be queried using the function void* SAX2XMLReader::getProperty(const XMLCh* const). The parser owns the returned pointer, and the memory allocated for the returned pointer will be destroyed when the parser is deleted. To keep the returned information accessible after the parser is deleted, callers need to copy it and store the copy elsewhere. Since the returned pointer is a generic void pointer, check the "Value Type" entry below to learn exactly what type of object each property returns before copying it.

None of these properties can be modified in the middle of a parse, or an exception will be thrown.
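
Because XMLCh* property values are passed as untyped pointers, it helps to build the string carefully before handing it to the parser. The following is a minimal sketch that assembles the value for the external-schemaLocation property described below, which uses the same syntax as a schemaLocation attribute: whitespace-separated pairs of namespace URI and schema location. The helper name buildSchemaLocation is hypothetical and is not part of Xerces-C++; it uses only the C++ standard library.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Xerces-C++ function): joins
// (namespace URI, schema location) pairs into a single string of the form
// "ns1 loc1 ns2 loc2", the syntax used by schemaLocation attributes.
inline std::string buildSchemaLocation(
    const std::vector<std::pair<std::string, std::string>>& pairs)
{
    const char* const whitespace = " \t\r\n";
    std::string result;
    for (const auto& p : pairs) {
        // Empty entries or embedded whitespace would shift the pairing of
        // every entry that follows, so reject them up front.
        if (p.first.empty() || p.second.empty() ||
            p.first.find_first_of(whitespace) != std::string::npos ||
            p.second.find_first_of(whitespace) != std::string::npos) {
            throw std::invalid_argument(
                "namespace and location must be non-empty and contain no whitespace");
        }
        if (!result.empty()) {
            result += ' ';
        }
        result += p.first;
        result += ' ';
        result += p.second;
    }
    return result;
}
```

The resulting string would then be converted to XMLCh* (for example with XMLString::transcode, as in the handler example earlier) and passed to setProperty together with XMLUni::fgXercesSchemaExternalSchemaLocation; the transcoded buffer still has to be released by the application afterwards.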

Xerces Properties
 
http://apache.org/xml/properties/schema/external-schemaLocation 
Description  The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. The same applies to the <import> element in schema documents. This property allows the user to specify a list of schemas to use. If the targetNamespace of a schema specified using this method matches the targetNamespace of a schema occurring in the instance document in a schemaLocation attribute, or if the targetNamespace matches the namespace attribute of an <import> element, the schema specified by the user using this property will be used (i.e., the schemaLocation attribute in the instance document or on the <import> element will be effectively ignored).  
Value  The syntax is the same as for schemaLocation attributes in instance documents: e.g., "http://www.example.com file_name.xsd". The user can specify more than one XML Schema in the list.  
Value Type  XMLCh*  
XMLUni Predefined Constant:  fgXercesSchemaExternalSchemaLocation  
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation 
Description  The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/ noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. This property allows the user to specify the no target namespace XML Schema Location externally. If specified, the instance document's noNamespaceSchemaLocation attribute will be effectively ignored.  
Value  The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an instance document: e.g., "file_name.xsd".  
Value Type  XMLCh*  
XMLUni Predefined Constant:  fgXercesSchemaExternalNoNameSpaceSchemaLocation  
http://apache.org/xml/properties/scannerName 
Description  This property allows the user to specify the name of the XMLScanner to use for scanning XML documents. If not specified, the default scanner "IGXMLScanner" is used. 
Value  The recognized scanner names are:
1. "WFXMLScanner" - scanner that performs well-formedness checking only.
2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information.
3. "SGXMLScanner" - scanner that handles XML documents with XML schema grammar information.
4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML schema grammar information.
Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of one of those constants. 
Value Type  XMLCh*  
XMLUni Predefined Constant:  fgXercesScannerName  
note:   See Use Specific Scanner for more programming details.  
http://apache.org/xml/properties/security-manager 
Description  Certain valid XML and XML Schema constructs can force a processor to consume more system resources than an application may wish. In fact, certain features could be exploited by malicious document writers to produce a denial-of-service attack. This property allows applications to impose limits on the amount of resources the processor will consume while processing these constructs.  
Value  An instance of the SecurityManager class (see xercesc/util/SecurityManager.hpp). This class's documentation describes the particular limits that may be set. Note that, when instantiated, default values for limits that should be appropriate in most settings are provided. The default implementation is not thread-safe; if thread-safety is required, the application should extend this class, overriding methods appropriately. The parser will not adopt the SecurityManager instance; the application is responsible for deleting it when it is finished with it. If no SecurityManager instance has been provided to the parser (the default), then processing strictly conforming to the relevant specifications will be performed.  
Value Type  SecurityManager*  
XMLUni Predefined Constant:  fgXercesSecurityManager  
setInputBufferSize(const size_t bufferSize) 
Description  Set the maximum input buffer size. This method allows users to limit the size of buffers used in parsing XML character data. The effect of setting this size is to limit the size of a ContentHandler::characters() call. The parser's default input buffer size is 1 megabyte.  
Value  The maximum input buffer size  
Value Type  size_t  
note:  Unlike the entries above, this is a method of the reader rather than a property, so it is called directly instead of through setProperty.  
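
The four scanner names listed for the scannerName property above differ only in which grammar types they handle, so the choice can be reduced to two questions: does the application need DTD support, and does it need XML Schema support? The following is a minimal sketch of that decision. The function chooseScannerName is hypothetical and not part of Xerces-C++; the returned strings are the scanner names given in the list above.

```cpp
#include <string>

// Hypothetical helper (not a Xerces-C++ function): picks the narrowest
// scanner from the documented list that still covers the grammar types
// the application needs.
inline std::string chooseScannerName(bool needsDtd, bool needsSchema) {
    if (needsDtd && needsSchema) {
        return "IGXMLScanner";  // DTD and/or XML schema (the default scanner)
    }
    if (needsDtd) {
        return "DGXMLScanner";  // DTD grammar information only
    }
    if (needsSchema) {
        return "SGXMLScanner";  // XML schema grammar information only
    }
    return "WFXMLScanner";      // well-formedness checking only
}
```

In practice the XMLUni constants (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, fgIGXMLScanner) can be passed directly as the property value, which avoids transcoding a string.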
