XSD: Structure definitions are not DTDs

Paul Prescod papresco at technologist.com
Mon May 25 18:28:30 BST 1998


I wanted to summarize and emphasize my arguments for throwing away the
"DTD model" in creating structure definitions. I've rather dribbled them
out over several days, so I thought that a coherent position paper would
be useful.

The DTD concept is more than 10 years old now. The world has changed a lot
since then, and we know of many flaws in DTDs. The most *obvious*
complaint is that they are in a notation that is different from XML
instance syntax. Personally, I think that the more subtle flaws are much
more serious and that it would be a tragedy to correct this minor (and
debatable!) flaw without correcting the major (and perhaps less debatable)
flaws. There are also purely technical reasons that structure definition
documents cannot be used as a "drop-in" replacement for DTDs.

#1. Entities have nothing to do with "document types" or "document
structure." In fact, I can't think of any name that could unify entities,
elements and attributes other than "bag of definitions." Moreover, entities
are *declarations* in that they must be declared before they can be used,
whereas element type definitions are *definitions* in that you can use
elements without declaring them. Entities also relate to the physical
structure of an XML document (the mapping from various small text strings
to a single combined text string). Elements relate to the logical
structure (the mapping from a text string to a logical tree).

Let's also look at it practically. Schemata do not typically have to be
extended on a per-document basis, but DTDs *do*, because people need to
declare the entities (e.g. abbreviations, graphics) required for their
document.

#2. Documents could have multiple schemata, but XML and SGML allow only a
single DTD. For instance, a single document could conform to HTML 1.0,
HTML 2.0, CML and many other HTML-like DTDs. I think that the application
should be able to validate a document against as many of them as the user
feels it should be validated against. An author should also be able to
claim conformance to as many schemata as they want.
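
As a rough illustration of validating one document against several
schemata, here is a minimal Python sketch. The "schemata" are toy
dictionaries mapping an element name to its permitted children, and the
names schema-a, schema-b, conforms and conforming_schemata are all
hypothetical, invented for this example. They do not correspond to any
real HTML version or to any proposed SDD syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schemata: element name -> set of permitted child names.
SCHEMATA = {
    "schema-a": {"html": {"head", "body"}, "head": {"title"},
                 "title": set(), "body": {"p"}, "p": set()},
    "schema-b": {"html": {"head", "body"}, "head": {"title"},
                 "title": set(), "body": {"p", "h1"}, "h1": set(),
                 "p": {"em"}, "em": set()},
}

def conforms(root, schema):
    """True if every element is defined and appears only where allowed."""
    for parent in root.iter():
        if parent.tag not in schema:
            return False
        for child in parent:
            if child.tag not in schema[parent.tag]:
                return False
    return True

def conforming_schemata(text, schemata):
    """Return the names of all schemata the document conforms to."""
    root = ET.fromstring(text)
    return [name for name, schema in schemata.items() if conforms(root, schema)]

plain = "<html><head><title>t</title></head><body><p>hi</p></body></html>"
fancy = "<html><head><title>t</title></head><body><p><em>hi</em></p></body></html>"

print(conforming_schemata(plain, SCHEMATA))
print(conforming_schemata(fancy, SCHEMATA))
```

The point is only that nothing in the document itself has to pick one
schema: the application decides which ones to check, and the same
instance can pass several of them.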

#3. DTDs cannot be reused. In the precise sense defined by XML and SGML, a
DTD is a part of a document. It is not something that can be defined
standalone. HTML 4.0 is not a DTD. It is merely a set of markup
declarations. This can be easily verified in two ways. First, you can
consult either the XML specification or the SGML standard and you will see
that DTDs are only defined in the context of a document. Second, you can
note that every version of HTML, like any other major DTD, is built with a
bunch of parameter entities which can be turned on and off in the document
instance. Thus even HTML 4.0 is a set of DTDs (loose, strict, etc.).

#4. DTDs are already defined. Any document with a <!DOCTYPE ...> that
points to a Structure Definition Document (SDD) cannot be valid or even
well-formed. In other words, XSDs cannot be "drop-in" replacements for XML
DTDs. Period.

#5. SDDs cannot define entities "in time". There is no way that an SDD can
make the following XML document well-formed:

<?SDD href="http://www.my.structure.definition.document"?>
<FOO>&bar;</FOO>

Of course an appropriate DTD *can* make that document well-formed (and
valid). So XSDs cannot do everything that DTDs do.
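
This is easy to check with an off-the-shelf parser. The following is a
small sketch using Python's standard xml.etree.ElementTree (which sits on
top of expat). The helper name try_parse is made up for this example, and
the internal DTD subset in the second document is just one assumed way of
writing "an appropriate DTD".

```python
import xml.etree.ElementTree as ET

# The document from the text: a processing instruction followed by a
# reference to an entity that nothing the parser sees has declared.
WITHOUT_DTD = (
    '<?SDD href="http://www.my.structure.definition.document"?>\n'
    '<FOO>&bar;</FOO>'
)

# The same instance, but with an internal DTD subset declaring the entity.
WITH_DTD = (
    '<!DOCTYPE FOO [\n'
    '  <!ELEMENT FOO (#PCDATA)>\n'
    '  <!ENTITY bar "Hello World!">\n'
    ']>\n'
    '<FOO>&bar;</FOO>'
)

def try_parse(text):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None

print(try_parse(WITHOUT_DTD))  # rejected: the entity is undefined
print(try_parse(WITH_DTD))     # accepted: the entity is expanded by the parser
```

The parser never looks inside the SDD processing instruction, so whatever
the SDD declares arrives too late to affect well-formedness.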

#6. Handling external entities is *hard*. XML says that non-validating
processors (well-formedness checkers) do not have to read external
entities. I think this option is such a black hole of headaches that I
hope that nobody ever uses it. I especially hope that XSD does not depend
upon it. There is no parser that I know of that can be directed to
"re-parse" a document with an entity expanded. So how would you implement
"entity declaration" on top of these parsers? Should XSD depend on
features that have not been implemented yet? Consider, if this example
*were* legal XML:

<?SDD href="http://www.my.structure.definition.document"?>
<FOO>&bar;</FOO>

Now I have an entity declaration in my SDD:

<entity name="bar" value="Hello World!"/>

How would you redirect a typical parser to reparse the document with the
entity replaced by the value?
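
About the only option available at the application level is to rewrite
the text *before* the parser ever sees it. Here is a minimal Python sketch
of that workaround. Everything in it is an assumption made for
illustration: the sdd_entities dictionary stands in for whatever an
application might have read out of the SDD, expand_entities is a made-up
helper, and the regular expression only handles ASCII entity names.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Entities that XML itself predefines; these are left for the parser.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Simplified pattern for a general entity reference (ASCII names only).
# Character references such as &#65; start with '#' and do not match.
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_.-]*);")

def expand_entities(text, entities):
    """Textually replace &name; with its escaped value before parsing."""
    def substitute(match):
        name = match.group(1)
        if name in PREDEFINED or name not in entities:
            return match.group(0)  # leave it for the parser to accept or reject
        return escape(entities[name])
    return ENTITY_REF.sub(substitute, text)

# Hypothetical: declarations an application might have read from the SDD.
sdd_entities = {"bar": "Hello World!"}

raw = ('<?SDD href="http://www.my.structure.definition.document"?>\n'
       '<FOO>&bar;</FOO>')
root = ET.fromstring(expand_entities(raw, sdd_entities))
print(root.text)
```

Note what this costs: the substitution is blind to CDATA sections and
comments, it shifts the positions the parser reports in error messages,
and the document has to be read twice (once to find the SDD, once to
parse). It is a pre-processor bolted onto the front of the parser, not
something the parser was directed to do.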


In conclusion:

 * It is not technically possible to replace DTDs with XSDs at the XML
language level

 * It is not easy (or in all cases possible) to handle entities in XSDs
defined at the "application level".

I feel that SDDs should be substantially different from DTDs in at least
four respects:

 * they should not be (cannot be!) invoked through the DOCTYPE
declaration,

 * they should use XML element syntax,

 * an application built on SAX should be able to verify a document's
conformance to its SDD,

 * they should not try to handle entity declarations and references.
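
To give a feel for the SAX point, here is a minimal Python sketch of a
structure check layered on top of a stock SAX parser. The schema format
(a dictionary from element name to permitted child names) and the names
StructureChecker and check are hypothetical, invented for this example.
A real SDD would obviously need content models, attributes and so on.

```python
import xml.sax

class StructureChecker(xml.sax.ContentHandler):
    """Checks element nesting against a toy schema while SAX events stream by."""

    def __init__(self, allowed_children):
        super().__init__()
        self.allowed_children = allowed_children  # element -> set of child names
        self.open_elements = []                   # stack of currently open elements
        self.errors = []

    def startElement(self, name, attrs):
        if name not in self.allowed_children:
            self.errors.append(f"undefined element <{name}>")
        elif self.open_elements:
            parent = self.open_elements[-1]
            if name not in self.allowed_children.get(parent, set()):
                self.errors.append(f"<{name}> not allowed inside <{parent}>")
        self.open_elements.append(name)

    def endElement(self, name):
        self.open_elements.pop()

def check(text, allowed_children):
    """Parse the document once and return a list of structure errors."""
    handler = StructureChecker(allowed_children)
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.errors

# Hypothetical structure definition for the FOO examples.
SCHEMA = {"FOO": {"BAR"}, "BAR": set()}

print(check("<FOO><BAR/></FOO>", SCHEMA))
print(check("<FOO><BAZ/></FOO>", SCHEMA))
```

The checker needs nothing from the parser beyond ordinary start and end
events, which is exactly why entity handling does not fit: by the time
SAX reports anything, entity references have already been resolved or
rejected.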

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"You have the wrong number."
"Eh? Isn't that the Odeon?"
"No, this is the Great Theater of Life. Admission is free, but the 
taxation is mortal. You come when you can, and leave when you must. The 
show is continuous. Good-night." -- Robertson Davies, "The Cunning Man"

