XML:XML Schema Part 0: Primer Second Edition

W3C

XML Schema Part 0: Primer Second EditionW3C Recommendation http://www.w3.org/TR/xmlschema-0/

<!--* <h2><a id="w3c-doctype" name="w3c-doctype" />W3C Recommendation 2 May 2001, Second Edition 28 October 2004</h2> *-->
Latest version:
Editors:
David C. Fallside, IBM <[email protected]>
Priscilla Walmsley <[email protected]> - Second Edition

AbstractXML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema language. This primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.

XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities, and is oriented towards quickly understanding how to create schemas using the XML Schema language.

Status of this DocumentW3C technical reports index at http://www.w3.org/TR/.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the

This is a W3C Recommendation, the first part of the Second Edition of XML Schema. This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The goals of the XML Schema language are discussed in the XML Schema Requirements document. The authors of this document are the members of the XML Schema Working Group. Different parts of this specification have different editors.

This document was produced under the 24 January 2002 Current Patent Practice (CPP) as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

The English version of this specification is the only normative version. Information about translations of this document is available at http://www.w3.org/2001/05/xmlschema-translations.

This second edition is not a new version, it merely incorporates the changes dictated by the corrections to errors found in the first edition as agreed by the XML Schema Working Group, as a convenience to readers. A separate list of all such corrections is available at http://www.w3.org/2001/05/xmlschema-errata.

The errata list for this second edition is available at http://www.w3.org/2004/03/xmlschema-errata.

Please report errors in this document to [email protected] (archive).


1 Introduction1 and 2 of the XML Schema specification. The intended audience of this document includes application developers whose programs read and write schema documents, and schema authors who need to know about the features of the language, especially features that provide functionality above and beyond what is provided by DTDs. The text assumes that you have a basic understanding of XML 1.0 and Namespaces in XML. Each major section of the primer introduces new features of the language, and describes those features in the context of concrete examples.

This document, XML Schema Part 0: Primer, provides an easily approachable description of the XML Schema definition language, and should be used alongside the formal descriptions of the language contained in Parts

Basic Concepts: The Purchase Order (§2) covers the basic mechanisms of XML Schema. It describes how to declare the elements and attributes that appear in XML documents, the distinctions between simple and complex types, defining complex types, the use of simple types for element and attribute values, schema annotation, a simple mechanism for re-using element and attribute definitions, and nil values.

Advanced Concepts I: Namespaces, Schemas & Qualification (§3), the first advanced section in the primer, explains the basics of how namespaces are used in XML and schema documents. This section is important for understanding many of the topics that appear in the other advanced sections.

Advanced Concepts II: The International Purchase Order (§4), the second advanced section in the primer, describes mechanisms for deriving types from existing types, and for controlling these derivations. The section also describes mechanisms for merging together fragments of a schema from multiple sources, and for element substitution.

Advanced Concepts III: The Quarterly Report (§5) covers more advanced features, including a mechanism for specifying uniqueness among attributes and elements, a mechanism for using types across namespaces, a mechanism for extending types based on namespaces, and a description of how documents are checked for conformance.

In addition to the sections just described, the primer contains a number of appendices that provide detailed reference information on simple types and a regular expression language.

The primer is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of the XML Schema language. The examples and other explanatory material in this document are provided to help you understand XML Schema, but they may not always provide definitive answers. In such cases, you will need to refer to the XML Schema specification, and to help you do this, we provide many links pointing to the relevant parts of the specification. More specifically, XML Schema items mentioned in the primer text are linked to an index [Index (§E)] of element names and attributes, and a summary table of datatypes, both in the primer. The table and the index contain links to the relevant sections of XML Schema parts 1 and 2.

2 Basic Concepts: The Purchase Orderpo.xml

The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items" -- but to simplify the primer, we have chosen to always refer to instances and schemas as if they are documents and files.

Let us start by considering an instance document in a file called . It describes a purchase order generated by a home products ordering and billing application:

The purchase order consists of a main element, purchaseOrder, and the subelements shipTo, billTo, comment, and items. These subelements (except comment) in turn contain other subelements, and so on, until a subelement such as USPrice contains a number rather than any subelements. Elements that contain subelements or carry attributes are said to have complex types, whereas elements that contain numbers (and strings, and dates, etc.) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types.

The complex types in the instance document, and some of the simple types, are defined in the schema for purchase orders. The other simple types are defined as part of XML Schema's repertoire of built-in simple types.

Before going on to examine the purchase order schema, we digress briefly to mention the association between the instance document and the purchase order schema. As you can see by inspecting the instance document, the purchase order schema is not mentioned. An instance is not actually required to reference a schema, and although many will, we have chosen to keep this first section simple, and to assume that any processor of the instance document can obtain the purchase order schema without any information from the instance document. In later sections, we will introduce explicit mechanisms for associating instances and schemas.

2.1 The Purchase Order Schemapo.xsd

The purchase order schema is contained in the file :

Example
The Purchase Order Schema, po.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
     Purchase order schema for Example.com.
     Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
                   fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice"  type="xsd:decimal"/>
            <xsd:element ref="comment"   minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
schema

The purchase order schema consists of a element and a variety of subelements, most notably element, complexType, and simpleType which determine the appearance of elements and their content in instance documents.

Each of the elements in the schema has a prefix xsd: which is associated with the XML Schema namespace through the declaration, xmlns:xsd="http://www.w3.org/2001/XMLSchema", that appears in the schema element. The prefix xsd: is used by convention to denote the XML Schema namespace, although any prefix can be used. The same prefix, and hence the same association, also appears on the names of built-in simple types, e.g. xsd:string. The purpose of the association is to identify the elements and simple types as belonging to the vocabulary of the XML Schema language rather than the vocabulary of the schema author. For the sake of clarity in the text, we just mention the names of elements and simple types (e.g. simpleType), and omit the prefix.

previous sub-section next sub-section2.2 Complex Type Definitions, Element & Attribute DeclarationsOccurrence Constraints
2.2.2 Global Elements & Attributes
2.2.3 Naming Conflicts

2.2.1

In XML Schema, there is a basic difference between complex types which allow elements in their content and may carry attributes, and simple types which cannot have element content and cannot carry attributes. There is also a major distinction between definitions which create new types (both simple and complex), and declarations which enable elements and attributes with specific names and types (both simple and complex) to appear in document instances. In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.

New complex types are defined using the complexType element and such definitions typically contain a set of element declarations, element references, and attribute declarations. The declarations are not themselves types, but rather an association between a name and the constraints which govern the appearance of that name in documents governed by the associated schema. Elements are declared using the element element, and attributes are declared using the attribute element. For example, USAddress is defined as a complex type, and within the definition of USAddress we see five element declarations and one attribute declaration:

The consequence of this definition is that any element appearing in an instance whose type is declared to be USAddress (e.g. shipTo in ) must consist of five elements and one attribute. These elements must be called name, street, city, state and zip as specified by the values of the declarations' name attributes, and the elements must appear in the same sequence (order) in which they are declared. The first four of these elements will each contain a string, and the fifth will contain a number. The element whose type is declared to be USAddress may appear with an attribute called country which must contain the string US.

The USAddress definition contains only declarations involving the simple types: string, decimal and NMTOKEN. In contrast, the PurchaseOrderType definition contains element declarations involving complex types, e.g. USAddress, although note that both declarations use the same type attribute to identify the type, regardless of whether the type is simple or complex.

In defining PurchaseOrderType, two of the element declarations, for shipTo and billTo, associate different element names with the same complex type, namely USAddress. The consequence of this definition is that any element appearing in an instance document (e.g. ) whose type is declared to be PurchaseOrderType must consist of elements named shipTo and billTo, each containing the five subelements (name, street, city, state and zip) that were declared as part of USAddress. The shipTo and billTo elements may also carry the country attribute that was declared as part of USAddress.

The PurchaseOrderType definition contains an orderDate attribute declaration which, like the country attribute declaration, identifies a simple type. In fact, all attribute declarations must reference simple types because, unlike element declarations, attributes cannot contain other elements or other attributes.

The element declarations we have described so far have each associated a name with an existing type definition. Sometimes it is preferable to use an existing element rather than declare a new element, for example:

Example
<xsd:element ref="comment" minOccurs="0"/>

This declaration references an existing element, comment, that was declared elsewhere in the purchase order schema. In general, the value of the ref attribute must reference a global element, i.e. one that has been declared under schema rather than as part of a complex type definition. The consequence of this declaration is that an element called comment may appear in an instance document, and its content must be consistent with that element's type, in this case, string.

2.2.1 Occurrence ConstraintsminOccurs

The comment element is optional within PurchaseOrderType because the value of the attribute in its declaration is 0. In general, an element is required to appear when the value of minOccurs is 1 or more. The maximum number of times an element may appear is determined by the value of a maxOccurs attribute in its declaration. This value may be a positive integer such as 41, or the term unbounded to indicate there is no maximum number of occurrences. The default value for both the minOccurs and the maxOccurs attributes is 1. Thus, when an element such as comment is declared without a maxOccurs attribute, the element may not occur more than once. Be sure that if you specify a value for only the minOccurs attribute, it is less than or equal to the default value of maxOccurs, i.e. it is 0 or 1. Similarly, if you specify a value for only the maxOccurs attribute, it must be greater than or equal to the default value of minOccurs, i.e. 1 or more. If both attributes are omitted, the element must appear exactly once.

Attributes may appear once or not at all, but no other number of times, and so the syntax for specifying occurrences of attributes is different than the syntax for elements. In particular, attributes can be declared with a use attribute to indicate whether the attribute is required (see for example, the partNum attribute declaration in po.xsd), optional, or even prohibited.

Default values of both attributes and elements are declared using the default attribute, although this attribute has a slightly different consequence in each case. When an attribute is declared with a default value, the value of the attribute is whatever value appears as the attribute's value in an instance document; if the attribute does not appear in the instance document, the schema processor provides the attribute with a value equal to that of the default attribute. Note that default values for attributes only make sense if the attributes themselves are optional, and so it is an error to specify both a default value and anything other than a value of optional for use.

The schema processor treats defaulted elements slightly differently. When an element is declared with a default value, the value of the element is whatever value appears as the element's content in the instance document; if the element appears without any content, the schema processor provides the element with a value equal to that of the default attribute. However, if the element does not appear in the instance document, the schema processor does not provide the element at all. In summary, the differences between element and attribute defaults can be stated as: Default attribute values apply when attributes are missing, and default element values apply when elements are empty.

The fixed attribute is used in both attribute and element declarations to ensure that the attributes and elements are set to particular values. For example, po.xsd contains a declaration for the country attribute, which is declared with a fixed value US. This declaration means that the appearance of a country attribute in an instance document is optional (the default value of use is optional), although if the attribute does appear, its value must be US, and if the attribute does not appear, the schema processor will provide a country attribute with the value US. Note that the concepts of a fixed value and a default value are mutually exclusive, and so it is an error for a declaration to contain both fixed and default attributes.

The values of the attributes used in element and attribute declarations to constrain their occurrences are summarized in Table 1.

Table 1. Occurrence Constraints for Elements and Attributes
minOccurs maxOccurs fixed default Elements (, ) ,
use fixed default Attributes , ,
Notes (1, 1) -, - required, -, - element/attribute must appear once, it may have any value (1, 1) 37, - required, 37, - element/attribute must appear once, its value must be 37 (2, unbounded) 37, - n/a element must appear twice or more, its value must be 37; in general, minOccurs and maxOccurs values may be positive integers, and maxOccurs value may also be "unbounded" (0, 1) -, - optional, -, - element/attribute may appear once, it may have any value (0, 1) 37, - n/a element may appear once, if it does not appear it is not provided; if it does appear and it is empty, its value is 37; if it does appear and it is not empty, its value must be 37 n/a optional, 37, - attribute may appear once, if it does appear its value must be 37, if it does not appear its value is 37 (0, 1) -, 37 n/a element may appear once; if it does not appear it is not provided; if it does appear and it is empty, its value is 37; otherwise its value is that given n/a optional, -, 37 attribute may appear once; if it does not appear its value is 37, otherwise its value is that given (0, 2) -, 37 n/a element may appear once, twice, or not at all; if the element does not appear it is not provided; if it does appear and it is empty, its value is 37; otherwise its value is that given; in general, minOccurs and maxOccurs values may be positive integers, and maxOccurs value may also be "unbounded" (0, 0) -, - prohibited, -, - element/attribute must not appear Note that neither minOccurs, maxOccurs, nor use may appear in the declarations of global elements and attributes. 2.2.2 Global Elements & Attributes schema

Global elements, and global attributes, are created by declarations that appear as the children of the element. Once declared, a global element or a global attribute can be referenced in one or more declarations using the ref attribute as described above. A declaration that references a global element enables the referenced element to appear in the instance document in the context of the referencing declaration. So, for example, the comment element appears in po.xml at the same level as the shipTo, billTo and items elements because the declaration that references comment appears in the complex type definition at the same level as the declarations of the other three elements.

The declaration of a global element also enables the element to appear at the top-level of an instance document. Hence purchaseOrder, which is declared as a global element in po.xsd, can appear as the top-level element in po.xml. Note that this rationale will also allow a comment element to appear as the top-level element in a document like po.xml.

There are a number of caveats concerning the use of global elements and attributes. One caveat is that global declarations cannot contain references; global declarations must identify simple and complex types directly. Put concretely, global declarations cannot contain the ref attribute, they must use the type attribute (or, as we describe shortly, be followed by an anonymous type definition). A second caveat is that cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations. In other words, global declarations cannot contain the attributes minOccurs, maxOccurs, or use.

2.2.3 Naming Conflictsprevious sub-section next sub-section

We have now described how to define new complex types (e.g. PurchaseOrderType), declare elements (e.g. purchaseOrder) and declare attributes (e.g. orderDate). These activities generally involve naming, and so the question naturally arises: What happens if we give two things the same name? The answer depends upon the two things in question, although in general the more similar are the two things, the more likely there will be a conflict.

Here are some examples to illustrate when same names cause problems. If the two things are both types, say we define a complex type called USStates and a simple type called USStates, there is a conflict. If the two things are a type and an element or attribute, say we define a complex type called USAddress and we declare an element called USAddress, there is no conflict. If the two things are elements within different types (i.e. not global elements), say we declare one element called name as part of the USAddress type and a second element called name as part of the Item type, there is no conflict. (Such elements are sometimes called local element declarations.) Finally, if the two things are both types and you define one and XML Schema has defined the other, say you define a simple type called decimal, there is no conflict. The reason for the apparent contradiction in the last example is that the two types belong to different namespaces. We explore the use of namespaces in schema in a later section.

2.3 Simple TypesList Types
2.3.2 Union Types

2.3.1

The purchase order schema declares several elements and attributes that have simple types. Some of these simple types, such as string and decimal, are built in to XML Schema, while others are derived from the built-in's. For example, the partNum attribute has a type called SKU (Stock Keeping Unit) that is derived from string. Both built-in simple types and their derivations can be used in all element and attribute declarations. Table 2 lists all the simple types built in to XML Schema, along with examples of the different types.

Table 2. Simple Types Built In to XML Schema Simple Type Examples (delimited by commas) Notes
string Confirm this is electric
normalizedString Confirm this is electric see (3)
token Confirm this is electric see (4)
base64Binary GpM7
hexBinary 0FB7
integer ...-1, 0, 1, ... see (2)
positiveInteger 1, 2, ... see (2)
negativeInteger ... -2, -1 see (2)
nonNegativeInteger 0, 1, 2, ... see (2)
nonPositiveInteger ... -2, -1, 0 see (2)
long -9223372036854775808, ... -1, 0, 1, ... 9223372036854775807 see (2)
unsignedLong 0, 1, ... 18446744073709551615 see (2)
int -2147483648, ... -1, 0, 1, ... 2147483647 see (2)
unsignedInt 0, 1, ...4294967295 see (2)
short -32768, ... -1, 0, 1, ... 32767 see (2)
unsignedShort 0, 1, ... 65535 see (2)
byte -128, ...-1, 0, 1, ... 127 see (2)
unsignedByte 0, 1, ... 255 see (2)
decimal -1.23, 0, 123.4, 1000.00 see (2)
float -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN equivalent to single-precision 32-bit floating point, NaN is "not a number", see (2)
double -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN equivalent to double-precision 64-bit floating point, see (2)
boolean true, false, 1, 0
duration P1Y2M3DT10H30M12.3S 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds
dateTime 1999-05-31T13:20:00.000-05:00 May 31st 1999 at 1.20pm Eastern Standard Time which is 5 hours behind Co-Ordinated Universal Time, see (2)
date 1999-05-31 see (2)
time 13:20:00.000, 13:20:00.000-05:00 see (2)
gYear 1999 1999, see (2) (5)
gYearMonth 1999-02 the month of February 1999, regardless of the number of days, see (2) (5)
gMonth --05 May, see (2) (5)
gMonthDay --05-31 every May 31st, see (2) (5)
gDay ---31 the 31st day, see (2) (5)
Name shipTo XML 1.0 Name type
QName po:USAddress XML Namespace QName
NCName USAddress XML Namespace NCName, i.e. a QName without the prefix and colon
anyURI
http://www.example.com/,
http://www.example.com/doc.html#ID5
language en-GB, en-US, fr valid values for xml:lang as defined in XML 1.0
ID XML 1.0 ID attribute type, see (1)
IDREF XML 1.0 IDREF attribute type, see (1)
IDREFS XML 1.0 IDREFS attribute type, see (1)
ENTITY XML 1.0 ENTITY attribute type, see (1)
ENTITIES XML 1.0 ENTITIES attribute type, see (1)
NOTATION XML 1.0 NOTATION attribute type, see (1)
NMTOKEN
US,
Brésil
XML 1.0 NMTOKEN attribute type, see (1)
NMTOKENS
US UK,
Brésil Canada Mexique
XML 1.0 NMTOKENS attribute type, i.e. a whitespace separated list of NMTOKEN's, see (1)
Notes: (1) To retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes. (2) A value of this type can be represented by more than one lexical format, e.g. 100 and 1.0E2 are both valid float formats representing "one hundred". However, rules have been established for this type that define a canonical lexical format, see XML Schema Part 2. (3) Newline, tab and carriage-return characters in a normalizedString type are converted to space characters before schema processing. (4) As normalizedString, and adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed. (5) The "g" prefix signals time periods in the Gregorian calendar.

New simple types are defined by deriving them from existing simple types (built-in's and derived). In particular, we can derive a new simple type by restricting an existing simple type, in other words, the legal range of values for the new type are a subset of the existing type's range of values. We use the element to define and name the new simple type. We use the

restriction element to indicate the existing (base) type, and to identify the "facets" that constrain the range of values. A complete list of facets is provided in Appendix B.

Suppose we wish to create a new type of integer called myInteger whose range of values is between 10000 and 99999 (inclusive). We base our definition on the built-in simple type integer, whose range of values also includes integers less than 10000 and greater than 99999. To define myInteger, we restrict the range of the integer base type by employing two facets called minInclusive and maxInclusive:

The example shows one particular combination of a base type and two facets used to define myInteger, but a look at the list of built-in simple types and their facets (

The purchase order schema contains another, more elaborate, example of a simple type definition. A new simple type called SKU is derived (by restriction) from the simple type <a hr

分享到:
评论
wapysun
  • 浏览: 4869231 次
  • 性别: Icon_minigender_1
  • 来自: 杭州
社区版块
存档分类
最新评论

你可能感兴趣的:(xml,IBM)