documenting schemas/DTDs

W. Eliot Kimber eliot at
Thu Nov 18 03:42:03 GMT 1999

Peter Saint-Andre wrote:
> Chapter 5 of the XML Schema working draft is a placeholder for
> recommendations regarding the documenting of schemas, and the available
> information on DDML and XSchema for DTDs seems somewhat out of date. What is
> the status of documentation efforts for XML? I'd sure like to have at my
> disposal an "XMLdoc" technology similar to Javadoc....

My personal feeling, based on many years of painful experience
developing and maintaining DTDs of varying scale and complexity and
documenting same (including the original version of IBM's IBM ID Doc
application and the second edition of the HyTime architecture, both
massive documentation projects) is that the only practical way to
develop and manage non-trivial document types is by making the
documentation the primary definition, with the working declarations
extracted from it using some sort of make process.  This is what we did
for both IBM ID Doc (we developed it by writing the reference manual,
which included the working declarations by reference) and HyTime 2nd
edition (the declarations for the meta-DTD are embedded in the source
for the normative text; we wrote a script to extract them and integrate
them with the ungodly tangle of marked sections needed to manage the
run-time configurability of the HyTime DTD, which do not appear in the
main body of the normative text).
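
As a rough illustration of the "make" idea (not the actual IBM ID Doc or
HyTime tooling, which is not shown in this message), here is a minimal
Python sketch that pulls working declarations out of a documentation
source. The element names (reference, section, title, p, declaration)
and the sample content are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation source: prose with embedded declarations.
# All element names here are invented for illustration.
DOC_SOURCE = """\
<reference>
  <section id="para">
    <title>The para element</title>
    <p>A paragraph of running text.</p>
    <declaration><![CDATA[<!ELEMENT para (#PCDATA | emph)*>]]></declaration>
  </section>
  <section id="emph">
    <title>The emph element</title>
    <p>Emphasized text within a paragraph.</p>
    <declaration><![CDATA[<!ELEMENT emph (#PCDATA)>]]></declaration>
  </section>
</reference>
"""


def extract_declarations(doc_xml: str) -> str:
    """Collect every embedded declaration, in document order."""
    root = ET.fromstring(doc_xml)
    lines = []
    for section in root.iter("section"):
        title = section.findtext("title", default="").strip()
        for decl in section.iter("declaration"):
            # Label each declaration with the title of its section.
            lines.append(f"<!-- {title} -->")
            lines.append((decl.text or "").strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(extract_declarations(DOC_SOURCE), end="")
```

The documentation stays the single source; the generated DTD file is a
build product that can be thrown away and regenerated.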

If you are creating DTD-syntax DTDs, the syntax of DTDs is simply
not up to the task of maintaining and managing documentation of any
useful sophistication. This is true for SGML, where you can embed
comments within declarations (essential for documenting attribute lists)
and absolutely true for XML where all comments must be separate
declarations (and you therefore have no hope of reliably binding them to
the declarations they should go with). 

SGML and XML's marked section facility is not up to the task of managing
complexly configured DTDs (even authoring vs. production distinctions
become unworkable very quickly). If you are trying to manage shared
declarations across several derivative DTDs, it is simply impossible.

Given this, the only realistic solution is to use XML documents to
describe the document type and define the details of the declarations
(thus, something that looks a lot like XML Schemas). In an XML context
you can do sophisticated configuration control, your documentation can
be as sophisticated as you need, you can do real modularization, and so
on.
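
A small sketch of what such a description might look like, assuming a
made-up markup (elementtype, purpose, content, attribute) that is not
taken from XML Schema or any other standard. The point it tries to show
is that each attribute carries its own documentation, and that both the
DTD-syntax declarations and a reference entry come from the same source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every name below is invented for illustration.
SPEC = """\
<elementtype name="note">
  <purpose>A short annotation attached to a paragraph.</purpose>
  <content>(#PCDATA)</content>
  <attribute name="type" datatype="(warning | tip)" default='"tip"'>
    <purpose>Selects the rendering style for the note.</purpose>
  </attribute>
  <attribute name="id" datatype="ID" default="#IMPLIED">
    <purpose>Unique identifier used as a link target.</purpose>
  </attribute>
</elementtype>
"""


def to_dtd(spec_xml: str) -> str:
    """Generate DTD-syntax declarations from the structured description."""
    spec = ET.fromstring(spec_xml)
    name = spec.get("name")
    lines = [f"<!ELEMENT {name} {spec.findtext('content').strip()}>"]
    attributes = spec.findall("attribute")
    if attributes:
        lines.append(f"<!ATTLIST {name}")
        for attr in attributes:
            lines.append(
                f"  {attr.get('name')} {attr.get('datatype')} {attr.get('default')}"
            )
        lines.append(">")
    return "\n".join(lines)


def to_reference(spec_xml: str) -> str:
    """Generate a plain-text reference entry from the same description."""
    spec = ET.fromstring(spec_xml)
    lines = [f"{spec.get('name')}: {spec.findtext('purpose').strip()}"]
    for attr in spec.findall("attribute"):
        lines.append(f"  @{attr.get('name')}: {attr.findtext('purpose').strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dtd(SPEC))
    print(to_reference(SPEC))
```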

But, no standard markup for representing DTDs will be directly usable
for this task, simply because no matter what the standard says, it will
not meet some significant number of your requirements for a DTD creation
and authoring DTD. This is not a failing of the XML Schema designers, it
is simply a fact, the same fact that guarantees that no industry
standard DTD will be useful for authoring by any given member of that
industry.

This means that *regardless of whether you prefer DTD syntax or XML
Schema*, you will define your own document type and supporting
infrastructure for managing your DTD documentation (by "you" I mean
people doing non-trivial production-quality stuff, not individuals
playing about with XML, for whom simpler solutions are satisfactory). Or
you will soon wish you had.

Note that the focus must be on the *documentation*, not the
declarations. That's because it's the documentation that counts in the
long run. The declarations are so much syntax. Without the documentation
the declarations are useless, but with the documentation, the
declarations can be recreated at will. 

It also means that it doesn't really matter whether you generate DTD
syntax or XML Schema, because both will be equally easy to create (more
or less--schemas will actually be somewhat harder because they give you
more to say and more syntax you have to worry about, but that's another
matter).

The problem, of course, is that building this sort of system is
non-trivial--it's certainly more complex than your typical SGML publishing
application. To be complete it requires at least the following:

1. A well-thought-out document type that provides rich documentation
structures that meet your specific requirements as well as complete
ways to represent declarations at the appropriate level of abstraction
(this may seem easy, but it's actually quite a subtle problem, even when
you are a committee of one).

2. Infrastructure for doing use-by-reference of documentation and DTD
modules so you can manage systems of related DTDs. The current XML
family of specifications offers little help here because it does not
provide a clear or general use-by-ref semantic (although I've seen some
encouraging developments in this area).

3. Extensive editor customization to make creating what will be quite
verbose and complex documents easy and manageable.

4. Sophisticated and complex online and print rendering systems to turn
this documentation into something readable.

5. The DTD "make" process itself that generates the working
declarations, whatever form they may take.
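
Items 2 and 5 can be sketched together in a few lines of Python. This is
only a toy under stated assumptions: the module names, the two DTD
configurations ("authoring" and "production"), and the make_dtd function
are all hypothetical, and a real system would resolve references to
documentation modules rather than to strings in a dictionary.

```python
# Shared declaration modules, each defined exactly once.
# Names and contents are invented for illustration.
MODULES = {
    "inline": "<!ELEMENT emph (#PCDATA)>",
    "blocks": "<!ELEMENT para (#PCDATA | emph)*>",
    "review": "<!ELEMENT comment (#PCDATA)>",
}

# Each derivative DTD uses modules by reference (by name), not by copy.
DTDS = {
    "authoring": ["inline", "blocks", "review"],
    "production": ["inline", "blocks"],
}


def make_dtd(name: str, dtds=DTDS, modules=MODULES) -> str:
    """Assemble the working declarations for one configured DTD."""
    wanted = dtds[name]
    missing = [m for m in wanted if m not in modules]
    if missing:
        # Fail the build rather than emit an incomplete DTD.
        raise KeyError(f"{name}: unresolved module reference(s) {missing}")
    return "\n".join(modules[m] for m in wanted) + "\n"


if __name__ == "__main__":
    for dtd_name in DTDS:
        print(f"--- {dtd_name} ---")
        print(make_dtd(dtd_name), end="")
```

Because each derivative DTD only names the modules it uses, a change to a
shared module shows up in every DTD on the next build, with no marked
sections involved.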

My experience suggests that there cannot be a one-size-fits-all solution
for this problem. In particular, no XML Schema proposal can solve it
because anything it does with respect to documentation will either be
too little or too general or too restrictive to meet requirements. Even
an architecture-based approach would probably be too restrictive,
although it could get closer to being generally useable.

It is probably possible to develop a good bit of the necessary
infrastructure in a generic way that could be used by many different DTD
documentation document types.  Unfortunately, it's not easy to see how
it could be a money-making proposition and I doubt it fires the
enthusiasm of anyone with open source time on their hands. I would
probably take it on if I had independent wealth because it's actually
one of the more challenging and interesting XML application problem
spaces and the lack of such systems is a significant limiting factor to
the sophistication of the XML solutions that can be built simply because
the DTDs and their documentation become unmanageable without them.

If anyone has a couple million dollars to throw at this problem over the
next year or two, feel free to drop me a line....


