Weak DTDs

Peter Newcomb peter at techno.com
Sat Oct 18 15:51:03 BST 1997

W. Eliot Kimber wrote:
> Peter has run head-on into one of the fundamental problems with DTDs as
> currently defined by SGML (and XML): we want them to describe *classes* of
> documents when they actually describe *individual* documents (and are
> incapable of defining classes of documents except in very weak ways).

Eliot's argument is that, as SGML (and XML) are defined today, given
an external declaration subset (the entity identified by the external
identifier of a doctype declaration), there is no (easy) way to
guarantee that documents that reference it actually conform to it,
unless those documents' doctype declarations have no internal
subset.

This is because entities, notations, elements, and attributes declared
in a document's internal subset can radically alter the document's
type: general entities may be redefined, completely unknown notations,
element types, and attributes may be added, and parameter entities can
be redefined such that notations, element types, and attributes
declared in the external subset have completely different definitions.
All of these modifications can be made completely without constraint.
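
As a minimal sketch of the kind of modification described above,
assume a hypothetical external subset, memo.dtd, and a document that
references it.  All of the names here (memo.dtd, memo.xml,
body.content, company, blink, colour) are invented for illustration,
and the syntax shown is XML's rather than full SGML's.

```xml
<!-- memo.dtd: hypothetical external subset -->
<!ENTITY % body.content "para+">
<!ENTITY company "Example Corp.">
<!ELEMENT memo (title, body)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (%body.content;)>
<!ELEMENT para (#PCDATA)>

<!-- memo.xml: hypothetical document that references memo.dtd -->
<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!-- The internal subset is read before memo.dtd, and the first
       declaration of an entity is the binding one, so these two
       declarations displace the ones the DTD designer wrote. -->
  <!ENTITY % body.content "(para | blink)*">
  <!ENTITY company "Somebody Else, Ltd.">
  <!-- An element type and an attribute that memo.dtd never mentions. -->
  <!ELEMENT blink (#PCDATA)>
  <!ATTLIST para colour CDATA #IMPLIED>
]>
<memo>
  <title>Status</title>
  <body><blink>Not a para at all, from &company;</blink></body>
</memo>
```

A validating parser should accept this document, even though its body
would not be valid against memo.dtd taken on its own.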

The only defenses available to DTD designers all require making the
DTD even more rigid, since any opportunity for flexibility is also an
opportunity for abuse.  Moreover, even these defenses may not be

Disallowing the internal subset is not the answer, because it is still
needed in order to describe document-level (as opposed to document
type-level) characteristics, at least things like document-specific
general entities, and configuration control parameter entities (that
configure the DTD in predefined ways, through the use of marked

Architectures, IMO, are a step in the right direction, since they are
immune to the kinds of haphazard modifications that make it difficult
to recognize and process a class of documents, while still allowing
the document-level flexibility needed by document authors.

[Sean Mc Grath <digitome at iol.ie> on Sat, 18 Oct 1997 09:48:39 +0100]
> <Statement InvititationForTrouble=TRUE">
> HyTime allows parsing w.r.t. a meta-DTD via HyTime aware parsers. However,
> I think there are many occasions when there is nothing "meta" involved. Just
> a desire to parse w.r.t to an alternative schema. Not a meta-schema - just
> a different schema.
> </Statement>

It is true that there is nothing "meta" about meta-DTDs.  They should
be called architectural DTDs instead, where "architectural" means
"used via the SGML architecture mechanism defined in Annex A.3 of
ISO/IEC 10744:1997", or "designed to be used architecturally", as in
the case of the HyTime architecture's DTD.  Architectural DTDs are
just DTDs being used in a different way.

And yes, architectural processing _is_ tantamount to parsing with
respect to an alternative schema, only the architectural schema is
better protected from the individual needs of documents, and
individual documents are better protected from the generalized needs
of the architectural schema.


Peter Newcomb                           TechnoTeacher, Inc.
peter at petes-house.rochester.ny.us       peter at techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com
