How to Read the HTML DTD (from the HTML 4.01 Specification)

How to read the HTML DTD
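  (tenfyguo's note: important! The DTD defines which elements may contain which others, and
    what content may and may not be nested inside each element. HTML that conforms to the
    DTD is called valid HTML.)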

   Each element and attribute declaration in this specification is
   accompanied by its document type definition fragment. We have chosen
   to include the DTD fragments in the specification rather than seek a
   more approachable, but longer and less precise means of describing an
   element's properties. The following tutorial should allow readers
   unfamiliar with SGML to read the DTD and understand the technical
   details of the HTML specification.
  
  3.3.1 DTD Comments
 
   In DTDs, comments may spread over one or more lines. In the DTD,
   comments are delimited by a pair of "--" marks, e.g.
  
<!ELEMENT PARAM - O EMPTY       -- named property value -->

   Here, the comment "named property value" explains the use of the PARAM
   element type. Comments in the DTD are informative only.
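
   Because comments are informative only, a tool that reads declarations
   can discard them first. The following is a minimal Python sketch, not
   part of the specification. The helper name strip_dtd_comments is
   hypothetical, and the sketch assumes that "--" never appears inside a
   quoted literal.

```python
import re

def strip_dtd_comments(declaration: str) -> str:
    """Remove `-- ... --` comments from a single markup declaration."""
    # Non-greedy match, so each comment in the declaration is removed separately.
    without_comments = re.sub(r"--.*?--", "", declaration, flags=re.DOTALL)
    # Collapse the whitespace left behind and tidy the closing delimiter.
    collapsed = re.sub(r"\s+", " ", without_comments).strip()
    return collapsed.replace(" >", ">")

decl = "<!ELEMENT PARAM - O EMPTY       -- named property value -->"
print(strip_dtd_comments(decl))  # <!ELEMENT PARAM - O EMPTY>
```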
  
  3.3.2 Parameter entity definitions
 
   The HTML DTD begins with a series of parameter entity definitions. A
   parameter entity definition defines a kind of macro that may be
   referenced and expanded elsewhere in the DTD. These macros may not
   appear in HTML documents, only in the DTD. Other types of macros,
   called character references, may be used in the text of an HTML
   document or within attribute values.
  
   When the parameter entity is referred to by name in the DTD, it is
   expanded into a string.
  
   A parameter entity definition begins with the keyword <!ENTITY %
   followed by the entity name, the quoted string the entity expands to,
   and finally a closing >. Instances of parameter entities in a DTD
   begin with "%", followed by the parameter entity name, and are
   terminated by an optional ";".
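   (tenfyguo: this explains how parameter entity references are expanded. The underlying
   idea is the macro and, beneath that, software reuse. Reuse shows up everywhere in the
   way these protocols and specifications are defined.)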
  
   The following example defines the string that the "%fontstyle;" entity
   will expand to.
<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">

   The string the parameter entity expands to may contain other parameter
   entity names. These names are expanded recursively. In the following
   example, the "%inline;" parameter entity is defined to include the
   "%fontstyle;", "%phrase;", "%special;" and "%formctrl;" parameter
   entities.
  
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">

   You will encounter two DTD entities frequently in the HTML DTD:
   "%block;" and "%inline;". They are used when the content model includes
   block-level and inline elements, respectively (defined in the section
   on the global structure of an HTML document).
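
   The recursive expansion described above can be sketched in a few lines
   of Python. This is an illustration under stated assumptions, not how
   an SGML parser is implemented: the function names are hypothetical,
   only double-quoted entity values are handled, references to entities
   that were not declared are left untouched, and there is no protection
   against an entity that refers to itself.

```python
import re

# Matches: <!ENTITY % name "replacement text"
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"')
# Matches a reference such as %inline; (the trailing ";" is optional).
ENTITY_REF = re.compile(r"%([\w.-]+);?")

def parse_entities(dtd_text):
    """Collect parameter entity declarations into a name -> string mapping."""
    return {name: value for name, value in ENTITY_DECL.findall(dtd_text)}

def expand(text, entities):
    """Replace every known parameter entity reference, recursively."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # unknown entity: keep the reference as written
        return expand(entities[name], entities)
    return ENTITY_REF.sub(replace, text)

dtd = '''
<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
'''
entities = parse_entities(dtd)
print(expand("%inline;", entities))
# #PCDATA | TT | I | B | BIG | SMALL | %phrase; | %special; | %formctrl;
```

   Only "%fontstyle;" is expanded here because the other three entities
   are not declared in this small sample.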
  
  3.3.3 Element declarations
 
   The bulk of the HTML DTD consists of the declarations of element types
   and their attributes. The <!ELEMENT keyword begins a declaration and
   the > character ends it. Between these are specified:
    1. The element's name.
    2. Whether the element's tags are optional. Two hyphens that appear
       after the element name mean that the start and end tags are
       mandatory. One hyphen followed by the letter "O" indicates that
       the end tag can be omitted. A pair of letter "O"s indicate that
       both the start and end tags can be omitted.
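       (tenfyguo: this defines how an element is declared. "- -" after the element name
       means that both the start and end tags are required, "- O" means that the end tag
       may be omitted, and "O O" means that both tags may be omitted.)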
    3. The element's content, if any. The allowed content for an element
       is called its content model. Element types that are designed to
       have no content are called empty elements. The content model for
       such element types is declared using the keyword "EMPTY".
      
   In this example:
    <!ELEMENT UL - - (LI)+>
     * The element type being declared is UL.
     * The two hyphens indicate that both the start tag <UL> and the end
       tag </UL> for this element type are required.
     * The content model for this element type is declared to be "at
       least one LI element". Below, we explain how to specify content
       models.
      
   This example illustrates the declaration of an empty element type:
    <!ELEMENT IMG - O EMPTY>
     * The element type being declared is IMG.
     * The hyphen and the following "O" indicate that the end tag can be
       omitted, but together with the content model "EMPTY", this is
       strengthened to the rule that the end tag must be omitted.
     * The "EMPTY" keyword means that instances of this type must not
       have content.
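
   As a rough illustration of the three parts of an element declaration,
   the Python sketch below splits a declaration into the element name,
   the two tag-omission flags, and the content model. The names
   ElementDecl and parse_element are hypothetical, and the sketch
   assumes that comments have already been removed and that the
   declaration has the simple four-field form shown above.

```python
import re
from typing import NamedTuple

class ElementDecl(NamedTuple):
    name: str
    start_tag_optional: bool
    end_tag_optional: bool
    content_model: str

# <!ELEMENT name start-flag end-flag content-model>
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+(.*?)\s*>", re.DOTALL
)

def parse_element(declaration: str) -> ElementDecl:
    """Split a comment-free element declaration into its parts."""
    match = ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("not a recognised element declaration: %r" % declaration)
    name, start_flag, end_flag, content = match.groups()
    # "O" means the tag may be omitted; "-" means it is required.
    return ElementDecl(name, start_flag == "O", end_flag == "O", content)

print(parse_element("<!ELEMENT UL - - (LI)+>"))
print(parse_element("<!ELEMENT IMG - O EMPTY>"))
```

   For IMG the result has end_tag_optional set to True and a content
   model of "EMPTY", which is the combination that forbids the end tag.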
      
    Content model definitions
   
   The content model describes what may be contained by an instance of an
   element type. Content model definitions may include:
     * The names of allowed or forbidden element types (e.g., the UL
       element contains instances of the LI element type, and the P
       element type may not contain other P elements).
     * DTD entities (e.g., the LABEL element contains instances of the
       "%inline;" parameter entity).
     * Document text (indicated by the SGML construct "#PCDATA"). Text
       may contain character references. Recall that these begin with &
       and end with a semicolon (e.g., "Herg&eacute;'s adventures of
       Tintin" contains the character entity reference for the "e acute"
       character).
      
   The content model of an element is specified with the following
   syntax. Please note that the list below is a simplification of the
   full SGML syntax rules and does not address, e.g., precedences.
  
   ( ... )
          Delimits a group.
         
   A
          A must occur, one time only.
         
   A+
          A must occur one or more times.
         
   A?
          A must occur zero or one time.
         
   A*
          A may occur zero or more times.
         
   +(A)
          A may occur.
         
   -(A)
          A must not occur.
         
   A | B
          Either A or B must occur, but not both.
         
   A , B
          Both A and B must occur, in that order.
         
   A & B
          Both A and B must occur, in any order.
      (tenfyguo: this resembles the quantifier syntax of regular expressions.)

   Here are some examples from the HTML DTD:
   <!ELEMENT UL - - (LI)+>

   The UL element must contain one or more LI elements.
   <!ELEMENT DL    - - (DT|DD)+>

   The DL element must contain one or more DT or DD elements in any
   order.
   <!ELEMENT OPTION - O (#PCDATA)>

   The OPTION element may only contain text and entities, such as &amp;
   -- this is indicated by the SGML data type #PCDATA.
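
   Since the occurrence indicators behave much like regular expression
   quantifiers, a simple content model can be translated into a regular
   expression over the sequence of child names. The Python sketch below
   is only an approximation written for illustration: the function names
   are hypothetical, it supports groups, "|", "," and the "+", "?" and
   "*" indicators, and it deliberately rejects "&" groups and unexpanded
   parameter entities. Inclusions and exclusions are not handled.

```python
import re

TOKEN = re.compile(r"#?[A-Za-z][A-Za-z0-9]*|[()|,+?*]")

def content_model_to_regex(model: str):
    """Translate a simplified content model into a compiled regex."""
    if "&" in model or "%" in model:
        raise ValueError("'&' groups and unexpanded entities are not handled here")
    parts = []
    for token in TOKEN.findall(model):
        if token == ",":
            continue  # "A , B" is a sequence: plain regex concatenation
        if token == "(":
            parts.append("(?:")
        elif token in ")|+?*":
            parts.append(token)  # same meaning in regex syntax
        else:
            # Each child name is matched together with a trailing space.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model: str, children) -> bool:
    """Check a list of child names against the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

print(children_match("(LI)+", ["LI", "LI"]))           # True
print(children_match("(LI)+", []))                     # False
print(children_match("(DT|DD)+", ["DD", "DT", "DT"]))  # True
```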
  
   A few HTML element types use an additional SGML feature to exclude
   elements from their content model. Excluded elements are preceded by a
   hyphen. Explicit exclusions override permitted elements.
  
   In this example, the -(A) signifies that the element A cannot appear
   in another A element (i.e., anchors may not be nested).
   <!ELEMENT A - - (%inline;)* -(A)>

   Note that the A element type is part of the DTD parameter entity
   "%inline;", but is excluded explicitly because of -(A).
  
   Similarly, the following element type declaration for FORM prohibits
   nested forms:
   <!ELEMENT FORM - - (%block;|SCRIPT)+ -(FORM)>
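   (tenfyguo: the DTD states explicitly that forms cannot be nested. I was bitten by
   exactly this while building the front end of 爱情小镇 (Love Town). Remember it!)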
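
   Nesting mistakes of this kind are easy to detect mechanically. The
   Python sketch below uses the standard html.parser module to report an
   A or FORM start tag that opens while an element of the same type is
   still open. It is a simple counter written for illustration, not a
   validator: the class and function names are hypothetical, and it
   assumes that every A and FORM element has an explicit end tag.

```python
from html.parser import HTMLParser

class ExclusionChecker(HTMLParser):
    """Record start tags that open inside an element of the same excluded type."""

    def __init__(self, excluded=("a", "form")):
        super().__init__()
        # HTMLParser reports tag names in lower case.
        self.open_counts = {name: 0 for name in excluded}
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag in self.open_counts:
            if self.open_counts[tag] > 0:
                self.violations.append(tag)  # opened inside its own type
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.open_counts and self.open_counts[tag] > 0:
            self.open_counts[tag] -= 1

def find_nested(html_text: str):
    """Return the names of excluded elements found nested inside themselves."""
    checker = ExclusionChecker()
    checker.feed(html_text)
    checker.close()
    return checker.violations

print(find_nested("<FORM><P><FORM></FORM></P></FORM>"))  # ['form']
print(find_nested("<FORM></FORM><FORM></FORM>"))         # []
```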

  3.3.4 Attribute declarations
 
   The <!ATTLIST keyword begins the declaration of attributes that an
   element may take. It is followed by the name of the element in
   question, a list of attribute definitions, and a closing >. Each
   attribute definition is a triplet that defines:
     * The name of an attribute.
     * The type of the attribute's value or an explicit set of possible
       values. Values defined explicitly by the DTD are case-insensitive.
       Please consult the section on basic HTML data types for more
       information about attribute value types.
     * Whether the default value of the attribute is implicit (keyword
       "#IMPLIED"), in which case the default value must be supplied by
       the user agent (in some cases via inheritance from parent
       elements); always required (keyword "#REQUIRED"); or fixed to the
       given value (keyword "#FIXED"). Some attribute definitions
       explicitly specify a default value for the attribute.
      
   In this example, the name attribute is defined for the MAP element.
   The attribute is optional for this element.
<!ATTLIST MAP
  name        CDATA     #IMPLIED
  >

   The type of values permitted for the attribute is given as CDATA, an
   SGML data type. CDATA is text that may contain character references.
  
   For more information about "CDATA", "NAME", "ID", and other data
   types, please consult the section on HTML data types.
  
   The following examples illustrate several attribute definitions:
rowspan     NUMBER     1         -- number of rows spanned by cell --
http-equiv  NAME       #IMPLIED  -- HTTP response header name  --
id          ID         #IMPLIED  -- document-wide unique id --
valign      (top|middle|bottom|baseline) #IMPLIED

   The rowspan attribute requires values of type NUMBER. The default
   value is given explicitly as "1". The optional http-equiv attribute
   requires values of type NAME. The optional id attribute requires
   values of type ID. The optional valign attribute is constrained to
   take values from the set {top, middle, bottom, baseline}.
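
   To make the triplet structure concrete, the Python sketch below reads
   the four definitions above into dictionaries. The pattern and the
   function name are hypothetical and cover only the forms shown here:
   a name, a declared type or a parenthesised value list, a default, and
   an optional trailing comment.

```python
import re

ATTR_DEF = re.compile(
    r"""
    (?P<name>[\w.-]+)\s+                # attribute name
    (?P<type>\([^)]*\)|\S+)\s+          # declared type or enumerated value group
    (?P<default>\#FIXED\s+\S+|\S+)      # default: keyword or literal value
    (?:\s+--\s*(?P<comment>.*?)\s*--)?  # optional trailing comment
    """,
    re.VERBOSE,
)

def parse_attribute_definitions(body: str):
    """Turn the body of an attribute definition list into dictionaries."""
    definitions = []
    for match in ATTR_DEF.finditer(body):
        declared = match["type"]
        values = None
        if declared.startswith("("):
            # An explicit set of allowed values, e.g. (top|middle|bottom|baseline)
            values = [v.strip() for v in declared.strip("()").split("|")]
        definitions.append({
            "name": match["name"],
            "type": declared,
            "values": values,
            "default": match["default"],
            "required": match["default"] == "#REQUIRED",
            "comment": match["comment"],
        })
    return definitions

body = """
rowspan     NUMBER     1         -- number of rows spanned by cell --
http-equiv  NAME       #IMPLIED  -- HTTP response header name  --
id          ID         #IMPLIED  -- document-wide unique id --
valign      (top|middle|bottom|baseline) #IMPLIED
"""
for definition in parse_attribute_definitions(body):
    print(definition["name"], definition["type"], definition["default"])
```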
  
    DTD entities in attribute definitions
   
   Attribute definitions may also contain parameter entity references.
  
   In this example, we see that the attribute definition list for the
   LINK element begins with the "%attrs;" parameter entity.
<!ELEMENT LINK - O EMPTY               -- a media-independent link -->
<!ATTLIST LINK
  %attrs;                              -- %coreattrs, %i18n, %events --
  charset     %Charset;      #IMPLIED  -- char encoding of linked resource --
  href        %URI;          #IMPLIED  -- URI for linked resource --
  hreflang    %LanguageCode; #IMPLIED  -- language code --
  type        %ContentType;  #IMPLIED  -- advisory content type --
  rel         %LinkTypes;    #IMPLIED  -- forward link types --
  rev         %LinkTypes;    #IMPLIED  -- reverse link types --
  media       %MediaDesc;    #IMPLIED  -- for rendering on these media --
  >

   Start tag: required, End tag: forbidden
  
   The "%attrs;" parameter entity is defined as follows:
<!ENTITY % attrs "%coreattrs; %i18n; %events;">

   The "%coreattrs;" parameter entity in the "%attrs;" definition expands
   as follows:
<!ENTITY % coreattrs
 "id          ID             #IMPLIED  -- document-wide unique id --
  class       CDATA          #IMPLIED  -- space-separated list of classes --
  style       %StyleSheet;   #IMPLIED  -- associated style info --
  title       %Text;         #IMPLIED  -- advisory title --"
  >
 (tenfyguo: most HTML elements accept the attributes above, so they are defined once
 as a separate parameter entity. The macro idea is applied very thoroughly here!)
   The "%attrs;" parameter entity has been defined for convenience since
   these attributes are defined for most HTML element types.
  
   Similarly, the DTD defines the "%URI;" parameter entity as expanding
   into the string "CDATA".
<!ENTITY % URI "CDATA"
    -- a Uniform Resource Identifier,
       see [URI]
    -->

   As this example illustrates, the parameter entity "%URI;" provides
   readers of the DTD with more information as to the type of data
   expected for an attribute. Similar entities have been defined for
   "%Color;", "%Charset;", "%Length;", "%Pixels;", etc.
  
    Boolean attributes
   
   Some attributes play the role of boolean variables (e.g., the selected
   attribute for the OPTION element). Their appearance in the start tag
   of an element implies that the value of the attribute is "true". Their
   absence implies a value of "false".
  
   Boolean attributes may legally take a single value: the name of the
   attribute itself (e.g., selected="selected").
  
   This example defines the selected attribute to be a boolean attribute.
selected     (selected)  #IMPLIED  -- option is pre-selected --

   The attribute is set to "true" by appearing in the element's start
   tag:
<OPTION selected="selected">
...contents...
</OPTION>

   In HTML, boolean attributes may appear in minimized form -- the
   attribute's value appears alone in the element's start tag. Thus,
   selected may be set by writing:
<OPTION selected>

   instead of:
<OPTION selected="selected">

   Authors should be aware that many user agents only recognize the
   minimized form of boolean attributes and not the full form.
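
   A program that reads HTML should therefore treat the mere presence of
   a boolean attribute as "true", whichever form was used. The Python
   sketch below shows this with the standard html.parser module, which
   reports a minimized attribute with the value None. The class name
   AttributeCollector and the helper is_set are hypothetical.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when minimized.
        self.seen.append((tag, dict(attrs)))

def is_set(attrs, name):
    """A boolean attribute is true whenever it is present, whatever its value."""
    return name in attrs

collector = AttributeCollector()
collector.feed(
    "<OPTION selected>a</OPTION>"
    '<OPTION selected="selected">b</OPTION>'
    "<OPTION>c</OPTION>"
)
collector.close()

for tag, attrs in collector.seen:
    print(tag, attrs, is_set(attrs, "selected"))
# option {'selected': None} True
# option {'selected': 'selected'} True
# option {} False
```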
