XSD: Structure definitions are not DTDs

Peter Murray-Rust peter at ursus.demon.co.uk
Mon May 25 20:03:18 BST 1998


First - many thanks for the help you are giving us.

At 12:28 25/05/98 -0400, Paul Prescod wrote:
>I wanted to summarize and emphasize my arguments for throwing away the
>"DTD model" in creating structure definitions. I've rather dribbled them
>out over several days, so I thought that a coherent position paper would
>be useful.
>
>The DTD concept is more than 10 years old now. The world has changed a lot
>since then, and we know of many flaws in DTDs. The most *obvious*
>complaint is that they are in a notation that is different from XML
>instance syntax. Personally, I think that the more subtle flaws are much
>more serious and that it would be a tragedy to correct this minor (and
>debatable!) flaw without correcting the major (and perhaps less debatable)
>flaws. There are also purely technical reasons that structure definition
>documents cannot be used as a "drop-in" replacement for DTDs.

I suspect that many of us (including me) do not intend them to be
replacements. If we did, we would certainly have to address many of the
points you mention. My own model is very limited and consists of analysing
the DTD 'after the event'. 
>
>#1. Entities have nothing to do with "document types" or "document
>structure." In fact, I can't think of any name that could unify entities,
>elements and attributes other than "bag of definitions." Moreover, entities
>are *declarations* in that they must be declared before they can be used,
>and element type definitions are *definitions* in that you can use them
>without declaring them. Entities also relate to the physical structure of
>an XML document (the mapping from various small text strings to a single
>combined text string). Elements relate to the logical structure (the
>mapping from a text string to a logical tree).

I am happy to have such a well-put argument against entities as I don't
think they should form part of XSD.
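A minimal sketch of the physical/logical distinction, using Python's standard xml.etree.ElementTree. The same logical content is written once through an internal entity and once literally; after parsing, the two trees are indistinguishable, because the entity belongs to the physical layer and has disappeared before the application sees the element tree.

```python
import xml.etree.ElementTree as ET

# The same logical content, written once with an internal entity and once
# with the text typed out literally.
WITH_ENTITY = (
    '<!DOCTYPE FOO [<!ENTITY bar "Hello World!">]>'
    '<FOO>&bar;</FOO>'
)
LITERAL = '<FOO>Hello World!</FOO>'


def logical_view(xml_text):
    """Return (tag, text) of the root element, i.e. what an application sees."""
    root = ET.fromstring(xml_text)
    return root.tag, root.text


# The entity is resolved while the physical text is assembled; the resulting
# element tree carries no trace of it.
print(logical_view(WITH_ENTITY))                          # → ('FOO', 'Hello World!')
print(logical_view(WITH_ENTITY) == logical_view(LITERAL))  # → True
```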

>
>Let's also look at it practically. Schemata do not typically have to be
>extended on a per-document basis, but DTDs *do*, because people need to
>declare the entities (e.g. abbreviations, graphics) required for their
>document.


Agreed.
>
>#2. Documents could have multiple schemata, but XML and SGML allow only a
>single DTD. For instance, a single document could conform to HTML 1.0, HTML 2.0,
>CML and many other HTML-like DTDs. I think that the application should be
>able to validate a document against as many of them as the user feels
>it should be validated against. An author should also be able to claim
>conformance to as many schemata as they want.

I understand and support this concept, but suspect that trying to address
it at present would widen the discussion too much.
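A toy sketch of checking one document against several structure definitions at once, in Python with xml.etree.ElementTree. The "schemata" here are hypothetical, heavily simplified tables of allowed children (not the real HTML definitions), and the check ignores ordering and the choice of root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, drastically simplified "schemata": each maps an element type
# to the set of child element types it may contain. These are NOT the real
# HTML definitions; they only illustrate the idea of multiple schemata.
SCHEMATA = {
    "toy-html-a": {"html": {"head", "body"}, "head": {"title"}, "title": set(),
                   "body": {"p"}, "p": set()},
    "toy-html-b": {"html": {"head", "body"}, "head": {"title"}, "title": set(),
                   "body": {"p", "table"}, "p": set(), "table": set()},
    "toy-memo": {"memo": {"to", "body"}, "to": set(), "body": set()},
}


def conforms(root, schema):
    """True if every element is declared and has only declared children."""
    for elem in root.iter():  # iter() includes the root itself
        allowed = schema.get(elem.tag)
        if allowed is None:
            return False
        if any(child.tag not in allowed for child in elem):
            return False
    return True


def conforming_schemata(xml_text, schemata):
    """Return the names of all schemata the document conforms to."""
    root = ET.fromstring(xml_text)
    return sorted(name for name, schema in schemata.items() if conforms(root, schema))


doc = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
print(conforming_schemata(doc, SCHEMATA))  # → ['toy-html-a', 'toy-html-b']
```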
>
>#3. DTDs cannot be reused. In the precise sense defined by XML and SGML, a
>DTD is a part of a document. It is not something that can be defined
>standalone. HTML 4.0 is not a DTD. It is merely a set of markup
>declarations. This can be easily verified in two ways. First, you can
>look in either the XML specification or the SGML standard and you will see
>that DTDs are only defined in the context of a document. Second, you can
>note that every version of HTML, like every other major DTD, is built with a
>bunch of parameter entities which can be turned on and off in the document
>instance. Thus even HTML 4.0 is a set of DTDs (loose, strict, etc.).

My own model is simply a bunch of markup declarations - after expansion of
PEs, etc.
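A toy sketch of that flattening step in Python: it collects parameter-entity declarations, substitutes `%name;` references, and resolves INCLUDE/IGNORE conditional sections, in the spirit of the loose/strict switches mentioned above. It assumes double-quoted internal PEs whose values contain no further PE references, and non-nested conditional sections; a real DTD processor does considerably more. The `overrides` argument stands in for the document's internal subset, which takes precedence because it is read first and the first declaration of an entity is binding.

```python
import re

# Matches a parameter-entity declaration such as <!ENTITY % strict "IGNORE">
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
# Matches a parameter-entity reference such as %strict;
PE_REF = re.compile(r'%([\w.-]+);')
# Matches a (non-nested) conditional section: <![ INCLUDE [ ... ]]>
COND = re.compile(r'<!\[\s*(INCLUDE|IGNORE)\s*\[(.*?)\]\]>', re.S)


def expand_pes(dtd_text, overrides=None):
    """Flatten a DTD into plain markup declarations (toy version)."""
    # Overrides play the role of the internal subset: declared first, they win.
    values = dict(overrides or {})
    for name, value in PE_DECL.findall(dtd_text):
        values.setdefault(name, value)
    text = PE_DECL.sub("", dtd_text)
    text = PE_REF.sub(lambda m: values[m.group(1)], text)
    # Keep the body of INCLUDE sections, drop IGNORE sections entirely.
    text = COND.sub(lambda m: m.group(2) if m.group(1) == "INCLUDE" else "", text)
    # Drop blank lines left behind by removed declarations and sections.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


DTD = """
<!ENTITY % loose "INCLUDE">
<!ENTITY % strict "IGNORE">
<![ %loose; [
<!ELEMENT body (p | font)*>
]]>
<![ %strict; [
<!ELEMENT body (p)*>
]]>
<!ELEMENT p (#PCDATA)>
"""

print(expand_pes(DTD))
print(expand_pes(DTD, {"loose": "IGNORE", "strict": "INCLUDE"}))
```

With the defaults only the loose declaration of `body` survives; flipping the two switches yields the strict one.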

>
>#4. DTDs are already defined. Any document with a <!DOCTYPE ...> that
>points to a Structure Definition Document (SDD) cannot be valid or even
>well-formed. In other words, XSDs cannot be "drop-in" replacements for XML
>DTDs. Period.

If you are suggesting that we can't write:
<!DOCTYPE foo SYSTEM "foo.xsd" [] >

I'd agree. I am addressing the much more limited:
<!DOCTYPE foo SYSTEM "foo.dtd" [] >

and translating the result into XML.
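A minimal sketch of that "after the event" translation in Python: it reads ELEMENT declarations from an already-flattened DTD and writes them out in XML element syntax. The output vocabulary (`structureDefinition`, `elementType`, `model`) is invented for illustration, content models are carried over as uninterpreted strings, and ATTLIST and other declarations are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Matches <!ELEMENT name contentmodel>. Toy: assumes the DTD has already been
# flattened (no parameter entities, no comments).
ELEMENT_DECL = re.compile(r'<!ELEMENT\s+([\w.:-]+)\s+(.*?)\s*>', re.S)


def dtd_to_xml(dtd_text):
    """Translate ELEMENT declarations into a hypothetical XML vocabulary."""
    root = ET.Element("structureDefinition")       # hypothetical element name
    for name, model in ELEMENT_DECL.findall(dtd_text):
        decl = ET.SubElement(root, "elementType")  # hypothetical element name
        decl.set("name", name)
        decl.set("model", model)                   # content model kept as text
    return ET.tostring(root, encoding="unicode")


DTD = """
<!ELEMENT foo (bar, baz*)>
<!ELEMENT bar (#PCDATA)>
<!ELEMENT baz EMPTY>
"""
print(dtd_to_xml(DTD))
```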

>
>#5. SDDs cannot define entities "in time". There is no way that an SDD can
>make the following XML document well-formed:
>
><?SDD href="http://www.my.structure.definition.document"?>
><FOO>&bar;</FOO>

Agreed. We are not trying to change the syntax of XML 1.0.

>
>Of course an appropriate DTD *can* make that document well-formed (and
>valid). So XSDs cannot do everything that DTDs do.
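A minimal sketch of that asymmetry with Python's xml.etree.ElementTree: with no declaration in sight the parser rejects the document as not well-formed, while an internal DTD subset declaring the entity makes it parse. (The sketch writes the reference as `&bar;`, since XML requires the trailing semicolon.)

```python
import xml.etree.ElementTree as ET

NO_DTD = "<FOO>&bar;</FOO>"
WITH_DTD = '<!DOCTYPE FOO [<!ENTITY bar "Hello World!">]><FOO>&bar;</FOO>'


def try_parse(xml_text):
    """Return the root's text, or the parser's error message."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return "ParseError: " + str(err)


print(try_parse(NO_DTD))    # undefined entity: rejected as not well-formed
print(try_parse(WITH_DTD))  # → Hello World!
```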
>
>#6. Handling external entities is *hard*. XML says that well-formedness

I agree with this. External text entities are "simply" macro replacements
and disappear before they reach the parser's output. The other entities
(NDATA) are too complicated for me and most others.

>checkers do not have to download external entities. I think this is such a
>black hole of headaches that I hope that nobody ever uses it. I especially
>hope that XSD does not depend upon it. There is no parser that I know of

Agreed. I have argued against NOTATION (primarily on the basis that (a) I
don't understand it and (b) the bits I do understand are much better
catered for by a MIME attribute). If we can have xml:lang, why can't we have
xml:mime???
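A small sketch of why a MIME attribute is easier to consume than NOTATION: the `xml:` prefix is predefined, so an ordinary namespace-aware parser resolves `xml:mime` with no DTD at all (Python's ElementTree reports it as `{http://www.w3.org/XML/1998/namespace}mime`). Note that `xml:mime` is the proposal above, not a standard attribute, and the element names and files are hypothetical. NOTATION, by contrast, needs NOTATION and NDATA entity declarations in the DTD before an application can learn anything.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by definition; no declaration
# is needed for the parser to resolve it.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical document using the proposed (non-standard) xml:mime attribute.
DOC = """<doc xml:lang="en">
  <figure href="benzene.png" xml:mime="image/png"/>
  <figure href="benzene.mol" xml:mime="chemical/x-mdl-molfile"/>
</doc>"""


def media_types(xml_text):
    """Map each figure's href to its xml:mime value."""
    root = ET.fromstring(xml_text)
    return {fig.get("href"): fig.get(XML_NS + "mime") for fig in root.iter("figure")}


print(media_types(DOC))
```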


>that can be directed to "re-parse" a document with an entity expanded. So
>how would you implement "entity declaration" on top of these parsers?
>Should XSD depend on features that have not been implemented yet?
>Consider, if this example *was* legal XML:
>
><?SDD href="http://www.my.structure.definition.document"?>
><FOO>&bar;</FOO>
>
>Now I have an entity declaration in my SDD:
>
><entity name="bar" value="Hello World!"/>
>
>How would you redirect a typical parser to reparse the document with the
>entity replaced by the value?

I wouldn't :-)
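For illustration only: about the closest one can get with a typical parser is to rewrite the document text before parsing, splicing the SDD's declarations in as an internal DTD subset. The Python sketch below assumes a hypothetical SDD vocabulary with `<entity name=... value=...>` elements, as in the example above, and a document without its own XML declaration or DOCTYPE. The parser is never redirected; the entity is resolved by textual preprocessing, i.e. strictly before parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical SDD: entity declarations expressed in element syntax.
SDD = '<sdd><entity name="bar" value="Hello World!"/></sdd>'
DOC = "<FOO>&bar;</FOO>"


def entity_decls(sdd_text):
    """Turn the SDD's <entity> elements into DTD entity declarations."""
    decls = []
    for ent in ET.fromstring(sdd_text).iter("entity"):
        name, value = ent.get("name"), ent.get("value")
        # Toy restriction: refuse characters that would be read as markup or
        # references, rather than trying to escape them.
        if any(c in value for c in '"&%<'):
            raise ValueError("this sketch only handles plain-text values")
        decls.append(f'<!ENTITY {name} "{value}">')
    return "".join(decls)


def parse_with_sdd(doc_text, sdd_text, root_name):
    """Prepend the declarations as an internal DTD subset, then parse.

    The parser is never redirected: the document text is rewritten before
    parsing starts, so doc_text must not carry its own DOCTYPE.
    """
    doctype = f"<!DOCTYPE {root_name} [{entity_decls(sdd_text)}]>"
    return ET.fromstring(doctype + doc_text)


print(parse_with_sdd(DOC, SDD, "FOO").text)  # → Hello World!
```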

>
>
>In conclusion:
>
> * It is not technically possible to replace DTDs with XSDs at the XML
>language level

If we are trying to replace them before parsing...
>
> * It is not easy (or in all cases possible) to handle entities in XSDs
>defined at the "application level".

Agreed as above.
>
>I feel that SDDs should be substantially different from DTDs in at least
>four respects:
>
> * they should not be (cannot be!) invoked through the DOCTYPE statement, 

Agreed.
>
> * they should use XML element syntax,

Agreed.

>
> * an application built on SAX should be able to verify a document's
>conformance to its SDD

Yes, but this will take some working out if they are more than translations
of DTDs.
>
> * they should not try to handle entity declarations and references.

Agreed.
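A sketch of the SAX point, using Python's xml.sax: a ContentHandler that checks each element against a hypothetical SDD (reduced here to a table of allowed children) as events stream past, without building a tree. It ignores ordering, occurrence counts, attributes and character data, which is where the "working out" would come in.

```python
import xml.sax

# Hypothetical SDD content, already loaded: for each element type, the
# child element types it may contain.
ALLOWED_CHILDREN = {
    "memo": {"to", "body"},
    "to": set(),
    "body": {"p"},
    "p": set(),
}


class ConformanceHandler(xml.sax.ContentHandler):
    """Checks parent/child relationships as SAX events arrive."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.stack = []   # names of currently open elements, innermost last
        self.errors = []

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        if name not in self.allowed:
            self.errors.append(f"undeclared element <{name}>")
        elif parent in self.allowed and name not in self.allowed[parent]:
            self.errors.append(f"<{name}> not allowed inside <{parent}>")
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def check(xml_text, allowed):
    """Stream the document through the handler and return any violations."""
    handler = ConformanceHandler(allowed)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors


print(check("<memo><to>Paul</to><body><p>Agreed.</p></body></memo>", ALLOWED_CHILDREN))  # → []
print(check("<memo><body><to>Paul</to></body></memo>", ALLOWED_CHILDREN))
# → ['<to> not allowed inside <body>']
```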

The main conclusion is that if we try to be too ambitious, we shall
flounder. This point has been repeatedly made. We should therefore only
attempt the simpler aspects which I think we can manage.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



