DOCTYPE (was Re: Announcement: SAX 1998-01-12 Draft)

Peter Murray-Rust peter at
Tue Jan 13 16:11:02 GMT 1998

At 09:26 13/01/98 +0000, Sean Mc Grath wrote:
[... contribs from James Clark and Tyler Baker deleted ...]

>Isn't there a big issue looming here? How will software agents determine the
>of an XML document. I am aware of at least one example of a company planning
>to use the <!DOCTYPE declaration to do just that.
>I was one of those who voted for this but it is clear from what James is
>that it is plain wrong and misleading to single it out.
>The problem of typing though, remains doesn't it?


I have replied at some length, because this is a difficult issue and has
been debated many times before.  My message is meant to describe *what the
present position is*, NOT *what it would be nice if it were*. I do NOT think
debate on the latter is appropriate on XML-DEV.

We are back to Lewis Carroll: what is the type of the document and what is
the name of the type of the document, and what is the reference to the type
of the document... etc.  My current reading is:

<!DOCTYPE FOO PUBLIC "-//FOO-BAR//DTD V1.23//" "foo.dtd">

The FOO simply means that the root of the document is a single FOO element.
The only things it can be used for are:
	- telling you what is in the document (e.g. you might want to keep on
reading if the document root was POEM).
	- telling the parser that if the document does NOT have a root element of
type FOO it can throw a Draconian error and not do any more work.

IMO I can live without this :-)
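
To make the second use concrete, here is a minimal sketch. It assumes
Python and its standard xml.dom.minidom module, neither of which is
discussed in this thread, and the helper name root_matches_doctype is
hypothetical. It only compares the name in the DOCTYPE declaration with
the actual root element; it does not read foo.dtd or validate anything.

```python
from xml.dom import minidom


def root_matches_doctype(xml_text):
    """Compare the DOCTYPE name with the name of the root element.

    Returns True or False, or None if there is no DOCTYPE declaration.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return None
    return doc.doctype.name == doc.documentElement.tagName


# Same declaration, two different roots (illustrative documents only).
GOOD = '<!DOCTYPE FOO PUBLIC "-//FOO-BAR//DTD V1.23//" "foo.dtd"><FOO/>'
BAD = '<!DOCTYPE FOO PUBLIC "-//FOO-BAR//DTD V1.23//" "foo.dtd"><POEM/>'
```

A non-validating parser accepts both documents, so the mismatch in BAD
is only noticed if something makes this comparison explicitly.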

The pubID says that the FOO organisation has produced a DTD identified by
this string (presumably this is V1.23 of a DTD, but it doesn't have to be.)
This is useful to me if:
	- I have heard of the FOO organisation
	- know where to find them
	- they provide a document whose identifier (NOT reference or address) is
the pubId string
	- I know how to find this document.
This is used in certain domains (e.g. publishing, where FPIs are known and
used.) However AFAIK there is no mechanism for locating them on the WWW, no
simple means of registering, no one paid to maintain a registry. Without
this their use in XML may be minimal.
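
In the absence of a registry, the lookup has to be local. The following
Python sketch shows roughly what that amounts to: a private table from
pubID strings to wherever I happen to keep a copy. The table, the path in
it and both function names are hypothetical, invented for illustration.

```python
def split_fpi(public_id):
    """Split a public identifier on its '//' separators."""
    return public_id.split("//")


# Hypothetical local catalog: public identifier -> location of my copy.
LOCAL_CATALOG = {
    "-//FOO-BAR//DTD V1.23//": "/usr/local/dtds/foo-1.23.dtd",
}


def locate(public_id, system_id):
    """Prefer a catalog entry for the pubID; fall back to the system ID."""
    return LOCAL_CATALOG.get(public_id, system_id)
```

Note that the pubID is only useful to a reader who already has an entry
for it; anyone else falls straight through to the system identifier.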

The "foo.dtd" says that the external subset of the DTD can be found in a
named file (more generally a URL). This URL may be absolute or relative to
the current document. The NAME of the URL (i.e. the address) is a very poor
way of *identifying* the TYPE of the document, since it is the contents of
the URL that matter. 
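
A small Python sketch of why the address is a poor identifier, using the
standard urllib.parse.urljoin to resolve the system identifier against
the document's own location. The function name and the example.org and
example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin


def external_subset_address(document_url, system_id):
    """Resolve a system identifier relative to the document's own URL."""
    return urljoin(document_url, system_id)


# The same "foo.dtd" in two documents resolves to two different places.
first = external_subset_address("http://example.org/a/doc.xml", "foo.dtd")
second = external_subset_address("http://example.com/b/c/doc.xml", "foo.dtd")
```

Here first and second differ although both documents say "foo.dtd", and
nothing guarantees that the two files have the same contents.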

It is probably no secret that this debate has exercised the WG at length.

In conclusion (IMO) the DOCTYPE declaration really only serves to identify
the address of the external subset. It is equivalent to:

<!ENTITY % foo SYSTEM "foo.dtd">

How do we determine the TYPE of a document?  There is no good mechanism.
The following could be developed:
	- convince the world to use FPIs and fund a registry (a la domain name
	- create and register MIME types and attach them to documents
	- develop an XML-specific mechanism to be located *inside* XML schema files

I suspect that offering the DOCTYPE in SAX is of limited value and more
trouble than it is worth. [The entity of the external subset can be
obtained by other means if required.]
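
As one illustration of "other means", and not a statement about the SAX
draft under discussion: Python's standard xml.sax module reports the
DOCTYPE through an optional lexical-handler property rather than through
the core content handler. A minimal sketch, in which the class and
function names are hypothetical:

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             property_lexical_handler)


class DoctypeRecorder:
    """Lexical handler that remembers the DOCTYPE and ignores the rest."""

    def __init__(self):
        self.doctype = None

    def startDTD(self, name, public_id, system_id):
        self.doctype = (name, public_id, system_id)

    def endDTD(self):
        pass

    def comment(self, text):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def sniff_doctype(xml_bytes):
    """Return (name, public ID, system ID), or None if there is no DOCTYPE."""
    recorder = DoctypeRecorder()
    parser = xml.sax.make_parser()
    # Make sure the parser does not try to fetch foo.dtd itself.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(ContentHandler())
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.doctype
```

An application that never sets the property never sees the DOCTYPE at
all, which is roughly the arrangement I am arguing for.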


Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS, Virtual Hyperglossary

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
