XML Study Notes - Chapter 1: XML Basics

XML stands for Extensible Markup Language.

XML, like HTML, is based on the granddaddy of all markup
languages, Standard Generalized Markup Language (SGML).

SGML was created in 1974 as part of an IBM
document-sharing project, and officially became an International Organization for Standardization (ISO) standard in 1986, long before the Internet or anything like it was operational.

In 1998 the World Wide Web Consortium (W3C) combined the basic SGML features
that separate data from format with an extension of the HTML tag formats that
had been adapted for the Web, and published the first Extensible Markup
Language (XML) Recommendation. The three pillars of XML are
Extensibility, Structure, and Validity.

The original XML data validation
standard is called Document Type Definition (DTD), and the more recent evolution of
XML data validation is the XML Schema standard.

XML is not data integration. It’s simply the
glue that holds data integration solutions together with a multi-platform “lowest
common denominator” for data transportation.

XML is not HTML. While HTML is designed to describe display characteristics
of data on a Web page to browsers, XML is designed to represent data structures.
XML data can be transformed into HTML using Extensible Stylesheet Language
Transformations (XSLT).

XML documents that meet W3C XML document formatting recommendations
are described as being well-formed XML documents.

Element names can contain letters, numbers, hyphens, underscores, periods, and colons when namespaces are used (more on namespaces later). Element names cannot contain spaces.

Element names can start with a letter, underscore, or colon, but cannot start with a number, any other non-alphabetic character, or the letters xml. The basic rules and guidelines for elements apply to attributes as well.
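The naming rules above can be sketched as a small checker. This is a simplified, ASCII-only approximation (the real XML 1.0 grammar also allows many non-ASCII letters), and the function name is_plausible_xml_name is made up for illustration.

```python
import re

# ASCII-only approximation of the naming rules in these notes:
# first character is a letter, underscore, or colon; the rest may also
# include digits, hyphens, and periods.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_plausible_xml_name(name: str) -> bool:
    """Return True if `name` passes the simplified checks described above."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names beginning with "xml" (in any letter case) are reserved.
    return not name.lower().startswith("xml")


for candidate in ["book", "_id", "first-name", "1st", "first name", "xmlData"]:
    print(candidate, is_plausible_xml_name(candidate))
```

The first three candidates pass; the last three fail because of a leading digit, a space, and the reserved xml prefix respectively.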

The <?xml ... ?> line at the top of a document is called the XML declaration. It is optional, but it is
useful for determining the version of XML and the encoding type of the source data. The W3C XML 1.0 specification does not require it for a document to be well formed.

UTF stands for Universal Character Set Transformation Format, and the number 8 or
16 refers to the size in bits of each code unit; a single character may occupy more than one unit. In fact, an XML document that does not specify an encoding type must be encoded in either UTF-8 or UTF-16 to be considered a well-formed XML 1.0 document.

Aside from the UTF encodings, any character set name registered with the
Internet Assigned Numbers Authority (IANA), such as ISO-8859-1, can be declared
as the document encoding, provided the parser supports it.
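To see the declaration and encoding rules in action, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element name and sample text are invented for illustration; the same text is parsed once as undeclared UTF-8 bytes and once as ISO-8859-1 bytes with a matching declaration.

```python
import xml.etree.ElementTree as ET

# "caf\u00e9" is the word "cafe" with an accented final letter.
word = "caf\u00e9"

# No declaration: the parser assumes UTF-8 for this byte string.
utf8_doc = f"<name>{word}</name>".encode("utf-8")

# Declared encoding: the bytes are Latin-1 and the declaration says so.
latin1_doc = (
    f'<?xml version="1.0" encoding="ISO-8859-1"?><name>{word}</name>'
).encode("iso-8859-1")

utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

# Both byte strings decode to the same character data.
print(utf8_text == latin1_text == word)
```

The two byte strings differ on disk, but the parser yields the same text because the declaration tells it how to decode the second one.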

The root element must come first and must be unique in the document. Quotes must be used on all attribute values. Comments should always follow the SGML comment tag format (<!-- comment -->).
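A quick way to try these rules is to hand small documents to a parser and see which ones it rejects. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "single root, quoted attribute": '<note id="1"><!-- a comment --><to>Ann</to></note>',
    "unquoted attribute value": "<note id=1><to>Ann</to></note>",
    "two root elements": "<note/><note/>",
    "unclosed element": "<note><to>Ann</note>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; the others each break one of the rules listed above.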

Namespaces are a method for separating and identifying duplicate XML element
names in an XML document. The reserved xmlns: prefix is used as an attribute when declaring a namespace
name and value. The value of the attribute provides the unique identifier for the namespace. Once the namespace is declared, the namespace name can be used as a prefix in element names.

Namespace declarations are recommended if your XML documents have any current or
future potential of being shared with other XML documents that may share the
same element names.
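As a small illustration of namespaces keeping duplicate element names apart, the sketch below parses a document with two different name elements using Python's standard xml.etree.ElementTree. The example.com URIs, prefixes, and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<order xmlns:cust="http://example.com/customer"
       xmlns:prod="http://example.com/product">
  <cust:name>Ann Lee</cust:name>
  <prod:name>Widget</prod:name>
</order>"""

root = ET.fromstring(doc)

# Prefix-to-URI map used only for lookups; the prefixes here happen to
# match the document, but it is the URIs that identify each namespace.
ns = {
    "cust": "http://example.com/customer",
    "prod": "http://example.com/product",
}

customer = root.find("cust:name", ns).text
product = root.find("prod:name", ns).text
tags = [child.tag for child in root]

print(customer, "|", product)
print(tags)
```

ElementTree stores each tag as {namespace-URI}local-name, so the two name elements remain distinct even though their local names are identical.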

A well-formed XML document that also meets all of the requirements of one or more validation specifications
(such as a DTD or an XML Schema) is called a valid XML document.

DTDs are not themselves well-formed XML documents; they use their own syntax. In a DTD, the element and attribute declarations do not have to be in the same order as the elements and attributes that they
represent.

W3C Schemas follow the rules of well-formed XML documents. As with DTDs, the element and attribute declarations in a W3C Schema do not have to be in the same order as the elements and attributes that they represent
in an XML document.

Special characters in a well-formed XML document can be referenced via a declared entity, a decimal (Unicode) character reference, or a hexadecimal character reference. Entity references start with an ampersand (&), decimal character references start with an ampersand and a pound sign (&#), and hexadecimal character references start with an ampersand,
pound sign, and an x (&#x). All entity, decimal, and hexadecimal references end with a semicolon (;).
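The three reference styles can be compared with a short sketch using Python's standard xml.etree.ElementTree. The element name t is arbitrary, and the copyright sign (code point 169, hex A9) is just a convenient sample character.

```python
import xml.etree.ElementTree as ET

# Decimal reference, hexadecimal reference, and two predefined entities.
doc = "<t>&#169; &#xA9; &amp; &lt;</t>"

text = ET.fromstring(doc).text

# The decimal and hex references resolve to the same character (U+00A9).
print(text == "\u00a9 \u00a9 & <")
```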

A DTD is necessary for any entity reference other than the five reserved ones listed below: the value of each such entity must be declared in a DTD (internal or external) rather than in the body of the XML document. Entity references can also be used as variables and combined with other entity references in a DTD.
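Here is a minimal sketch of entities declared in a DTD, including one entity that reuses another. It relies on Python's standard xml.etree.ElementTree expanding internal entities from an internal DTD subset; the entity names company and signature and their values are invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY company "Example Corp">
  <!ENTITY signature "Regards, &company;">
]>
<note>&signature;</note>"""

# The parser expands &signature;, which in turn expands &company;.
text = ET.fromstring(doc).text
print(text)
```

Entity expansion is also a known attack surface for untrusted input, so this pattern is best kept to documents from trusted sources.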

Reserved Character Entities and References

Entity Reference    Special Character
&amp;               ampersand (&)
&apos;              apostrophe or single quote (')
&gt;                greater-than (>)
&lt;                less-than (<)
&quot;              double quote (")
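When generating XML text by hand, the reserved characters in the table above need to be replaced with their entity references. A minimal sketch using Python's standard xml.sax.saxutils module follows; escape() handles &, <, and > by default, and the extra mapping for quotes is supplied here explicitly. The sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; quotes need an explicit mapping.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {"&quot;": '"', "&apos;": "'"}

raw = 'Tom & "Jerry" <1940>'

escaped = escape(raw, QUOTE_ENTITIES)
restored = unescape(escaped, QUOTE_CHARS)

print(escaped)
print(restored == raw)
```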

Accommodating new character sets as the Unicode specification evolves forms the basis of the new features in XML 1.1.

Some new Unicode characters that XML 1.1 processors recognize as part of well-formed element, attribute, and namespace names are not accepted by XML 1.0 document syntax rules. These characters could already be used in XML 1.0 text and attribute values.

Instead of listing every permitted character, XML 1.1 defines which characters specifically cannot be included in well-formed XML documents and treats any character not excluded as acceptable. This makes it easier to accommodate developing Unicode specifications.

Another feature of XML 1.1 is the capability to handle line-end characters generated
in IBM mainframe file formats, which has been a long-standing issue between XML
documents generated and shared across ASCII and EBCDIC-based platforms. XML
1.1 parsers are required to recognize and accept the EBCDIC line-end character (#x85)
and the Unicode line separator (#x2028). On input these are normalized to the
linefeed character (decimal 10, #xA), just as a carriage
return (decimal 13, #xD) is normalized in XML 1.0.
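The normalization step can be sketched in a few lines. This is an illustration of the XML 1.1 end-of-line rule as I understand it (each recognized line-end sequence becomes a single linefeed), not code taken from any parser; the function name normalize_line_ends is hypothetical.

```python
import re

# Two-character sequences are listed first so that each one is replaced
# by a single linefeed rather than two.
_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text: str) -> str:
    """Replace every recognized line-end sequence with a linefeed (#xA)."""
    return _LINE_END.sub("\n", text)


mixed = "a\r\nb\x85c\u2028d\re\r\x85f"
print(normalize_line_ends(mixed).split("\n"))
```

After normalization the sample splits cleanly into the six single-letter lines, regardless of which platform convention produced each break.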

Namespaces for XML 1.1: The essential difference between the XML Namespaces 1.0
and 1.1 recommendations is the ability to "undeclare" a previously defined namespace
declaration and its associated prefix.
