Elements
An element is the basic building block of an XML document. XHTML tags are all elements: <html>, <b>, <br />, etc. There are three primary types of elements:
- simple elements
These are elements that contain text or "parsed character data" (represented as #PCDATA in your DTD). In XHTML, the <b> tag is an example of a simple element.
- compound elements
These elements contain other elements, and sometimes a mixture of PCDATA and other elements. In XHTML, the <html> tag is an example of a compound element.
- standalone elements
These elements are often called "singleton" tags. They do not contain any PCDATA or other elements. In XHTML, <br /> is an example of a standalone element.
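As a rough sketch of how the three element types above might be declared in a DTD, here is a small self-contained document. The element names (note, heading, body, divider) are made up for illustration and do not come from the family example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!-- compound element: contains other elements, in this order -->
  <!ELEMENT note (heading, body, divider)>
  <!-- simple elements: contain only parsed character data -->
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!-- standalone element: declared EMPTY, so it has no content -->
  <!ELEMENT divider EMPTY>
]>
<note>
  <heading>Reminder</heading>
  <body>Call home on Sunday.</body>
  <divider />
</note>
```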
Entities
An XML document is made up of data. XML documents can get data and declarations from many different sources, including CGI scripts, databases, and other XML files. Each of these items is an entity.
The file you use to write your XML declaration, document type declaration, and root element is called the document entity. In my example on the previous page, these would be:
- XML declaration
<?xml version="1.0" standalone="yes"?>
- document type declaration (in this case the entire DTD)
<!DOCTYPE family [
<!ELEMENT parent (#PCDATA)>
<!ELEMENT child (#PCDATA)>
]>
- root element
<family> ... </family>
Entities can be external to your XML document, like a stylesheet or XSL document, or internal like something you define. The most common internal entity is a general entity. This is used as an abbreviation for commonly used text, or text that is difficult to type.
To define an internal general entity, you use the <!ENTITY> tag in your DTD:
<!ENTITY name "text to be replaced">
For example, writing out "Jennifer Kyrnin, About Web Design Guide" is a bit long, but I can create an entity that adds it to my XML documents with just 5 characters:
<!ENTITY jkk "Jennifer Kyrnin, About Web Design Guide">
Every time I type &jkk; in an XML document with that DTD, it will expand to read "Jennifer Kyrnin, About Web Design Guide".
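To see the jkk entity in context, here is a minimal sketch of a complete document that declares it and then uses it. The byline element is a hypothetical name chosen only for this illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE byline [
  <!ELEMENT byline (#PCDATA)>
  <!-- internal general entity: the parser replaces &jkk; with the quoted text -->
  <!ENTITY jkk "Jennifer Kyrnin, About Web Design Guide">
]>
<byline>Written by &jkk;</byline>
```

A parser that reads the internal DTD subset should report the content of byline as "Written by Jennifer Kyrnin, About Web Design Guide".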
To define an external entity, you create a very similar tag in your DTD, but you include the word "SYSTEM" so that the parser knows that this is an external entity. DTD keywords are case-sensitive, so SYSTEM must be written in uppercase. You also include the URI or location of the entity:
<!ENTITY name SYSTEM "URI">
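The following is a small sketch of an external entity in use. The file name footer.txt and the page element are assumptions made for illustration; the file is presumed to sit next to the document and to contain plain text.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (#PCDATA)>
  <!-- external general entity: its replacement text lives in another file -->
  <!-- "footer.txt" is a hypothetical file name used only for this sketch -->
  <!ENTITY footer SYSTEM "footer.txt">
]>
<page>Thanks for reading. &footer;</page>
```

Keep in mind that parsers are not all required to fetch external entities, and some leave that feature switched off by default, so whether &footer; is actually expanded depends on the parser and its settings.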
Attributes
If you are familiar with HTML, then you are familiar with attributes. These are the settings within the tags that give additional information or instructions to those elements. For example, in XHTML, you can use the tag <hr /> without any attributes, but if you want to change the line, you use attributes, such as <hr size="40" width="80%" />
When you create elements with attributes, you need to declare the possible attributes in your DTD. To do this, you use the <!ATTLIST> tag:
<!ATTLIST element_name attribute_name type default_value>
For example, my <parent> element might have the attribute of "role" with two options "father" or "mother". This would be defined as:
<!ATTLIST parent role (father | mother) #REQUIRED>
In this example, I've assigned the element <parent> with the required attribute of "role". The attribute can be either "father" or "mother".
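Put into a complete document, the role attribute might look like the sketch below. The content model given for family here is a simplification assumed for this example; note that the keyword #REQUIRED has to be uppercase.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE family [
  <!ELEMENT family (parent+)>
  <!ELEMENT parent (#PCDATA)>
  <!-- role must be present and must be exactly "father" or "mother" -->
  <!ATTLIST parent role (father | mother) #REQUIRED>
]>
<family>
  <parent role="mother">Judy</parent>
  <parent role="father">Layard</parent>
</family>
```

A validating parser should reject a <parent> with no role attribute, or with a value such as role="uncle" that is not in the list.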
There are ten types that can be assigned to attributes:
- CDATA - text
- enumerated - an exact list of options, like in my example above
- ID - a unique name for the element
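- IDREF - a reference to the value of an ID type attribute elsewhere in the document
- IDREFS - multiple ID references, separated by whitespace
- ENTITY - the name of an unparsed entity declared in the DTD
- ENTITIES - multiple entity names, separated by whitespace
- NMTOKEN - a name token (like an XML name, but it may start with any name character, such as a digit)
- NMTOKENS - multiple name tokens, separated by whitespace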
- NOTATION - the name of a notation declared in the DTD
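As an illustration of a few of these types working together, the sketch below uses ID, IDREFS, and CDATA. The attribute names pid, parents, and nickname are hypothetical and are not part of the family example in this article.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE family [
  <!ELEMENT family (parent+, child+)>
  <!ELEMENT parent (#PCDATA)>
  <!ELEMENT child (#PCDATA)>
  <!-- ID: each value must be unique within the document -->
  <!ATTLIST parent pid ID #REQUIRED>
  <!-- IDREFS: whitespace-separated references to existing ID values -->
  <!-- CDATA: free text; #IMPLIED makes the attribute optional -->
  <!ATTLIST child
    parents  IDREFS #REQUIRED
    nickname CDATA  #IMPLIED>
]>
<family>
  <parent pid="p1">Judy</parent>
  <parent pid="p2">Layard</parent>
  <child parents="p1 p2" nickname="Jenn">Jennifer</child>
</family>
```

A validating parser should complain if two parents shared the same pid, or if parents pointed at an ID that does not exist in the document.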
Notations
In XML, you may come across data that you would like to include in your documents that is not XML. Notations allow you to include that data in your documents by describing its format and allowing your application to recognize and handle it.
The format for a notation is:
<!NOTATION name SYSTEM "external_ID">
The name identifies the format used in the document, and the external_ID identifies the notation, usually with a MIME type. For example, to include a GIF image in your XML document:
<!NOTATION GIF SYSTEM "image/gif">
You can also use a "PUBLIC" identifier instead of "SYSTEM". To do this you include a public ID, normally followed by a URI. Using the GIF example:
<!NOTATION GIF PUBLIC
"-//IETF//NONSGML Media Type image/gif//EN"
"http://www.isi.edu/in-notes/iana/assignments/media-types/image/gif">
On the final page of this article, you will see what a simple DTD would look like within an XML document.
Part 3: A Sample DTD
By Jennifer Kyrnin, About.com
DTDs are not difficult to write, but it is often easier to write an XML document first, and then define the DTD based upon what you wrote. For this example, I wrote an XML document based on a portion of a family tree. Once the document was finished, I wrote my DTD to match.
The XML Document
<?xml version="1.0" standalone="yes"?>
<family>
<title>My Family</title>
<parent role="mother">Judy</parent>
<parent role="father">Layard</parent>
<child role="daughter">Jennifer</child>
<image source="JENN" />
<child role="son">Brendan</child>
&footer;
</family>
The DTD
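<!DOCTYPE family [
<!-- the root element mixes text (the footer) with child elements -->
<!ELEMENT family (#PCDATA | title | parent | child | image)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT parent (#PCDATA)>
<!ATTLIST parent role (mother | father) #REQUIRED>
<!ELEMENT child (#PCDATA)>
<!ATTLIST child role (daughter | son) #REQUIRED>
<!-- notation and unparsed entity for the GIF image -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY JENN SYSTEM
"http://images.about.com/sites/guidepics/html.gif"
NDATA gif>
<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>
<!-- internal general entity used as &footer; in the document -->
<!ENTITY footer "Brought to you by Jennifer Kyrnin">
]>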
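When the two pieces are combined into one file, the document type declaration goes between the XML declaration and the root element. The sketch below shows one way the assembled file might look. It writes the DTD keywords in uppercase, which XML requires, and it assumes a mixed-content declaration for family so that the root element is declared as well.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE family [
  <!-- assumed declaration for the root element: text mixed with children -->
  <!ELEMENT family (#PCDATA | title | parent | child | image)*>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT parent (#PCDATA)>
  <!ATTLIST parent role (mother | father) #REQUIRED>
  <!ELEMENT child (#PCDATA)>
  <!ATTLIST child role (daughter | son) #REQUIRED>
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY JENN SYSTEM
    "http://images.about.com/sites/guidepics/html.gif"
    NDATA gif>
  <!ELEMENT image EMPTY>
  <!ATTLIST image source ENTITY #REQUIRED>
  <!ENTITY footer "Brought to you by Jennifer Kyrnin">
]>
<family>
  <title>My Family</title>
  <parent role="mother">Judy</parent>
  <parent role="father">Layard</parent>
  <child role="daughter">Jennifer</child>
  <image source="JENN" />
  <child role="son">Brendan</child>
  &footer;
</family>
```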