DTD versioning

Tue Apr 7 18:23:26 BST 1998

John Wilson <tug at wilson.co.uk> asked about DTD versioning.

There was some work done at OCLC on software to manipulate DTDs;
try the Research section of www.oclc.org.

Where I foresee this being important, I generally take one of two
different tactics:
[1] include a DTDrevision attribute on the outermost "doctype" element, and
    make it #REQUIRED,
    <!ATTLIST document DTDrevision CDATA #REQUIRED>

    Now every document must contain this to be valid, and I can therefore
    find documents that may need changing if the DTD's major revision
    changes.
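
    As a rough sketch of how tactic [1] lets you find such documents,
    the Python below reads the attribute from the outermost element and
    compares it with the DTD's current major revision.  The revision
    value "3", the function names and the sample documents are all
    assumptions made up for illustration; the parser here only checks
    well-formedness and does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the major revision of the DTD currently in use.
CURRENT_REVISION = "3"


def dtd_revision(xml_text):
    """Return the DTDrevision attribute of the outermost element, or None."""
    root = ET.fromstring(xml_text)
    return root.get("DTDrevision")


def needs_review(xml_text, current=CURRENT_REVISION):
    """True if the document was written against a different DTD revision."""
    return dtd_revision(xml_text) != current


# Made-up sample documents, one per revision.
old_doc = '<document DTDrevision="2"><title>Old</title></document>'
new_doc = '<document DTDrevision="3"><title>New</title></document>'

for name, text in (("old_doc", old_doc), ("new_doc", new_doc)):
    print(name, "needs review:", needs_review(text))
```

    A document with no DTDrevision attribute at all is also reported,
    since None never equals the current revision.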

[2] include the DTD version in the filename, SYSTEM and/or PUBLIC
    identifier of the DTD.  It's required in practice in a PUBLIC
    identifier for SGML, since resolving (fetching) the same PUBLIC
    identifier twice must always result in the same information.
    In XML, support for PUBLIC identifiers is optional, so it's best
    to avoid them and use only SYSTEM identifiers, but you can still
    include a version number there.
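
    A minimal sketch of tactic [2], assuming a made-up naming
    convention in which the DTD file name ends in "-v<number>.dtd":
    the Python below pulls the SYSTEM identifier out of a document type
    declaration with a regular expression and reads the version from
    it.  The URL, the naming convention and the function names are
    hypothetical, and the pattern only handles the simple SYSTEM-only
    form of the declaration.

```python
import re

# Matches <!DOCTYPE name SYSTEM "..."> (or single-quoted) and captures the identifier.
DOCTYPE_SYSTEM = re.compile(r'''<!DOCTYPE\s+\S+\s+SYSTEM\s+(["'])(.*?)\1''')

# Hypothetical convention: the DTD file name ends in -v<digits>.dtd
VERSION_IN_NAME = re.compile(r"-v(\d+)\.dtd$")


def system_identifier(text):
    """Return the SYSTEM identifier from the DOCTYPE declaration, or None."""
    match = DOCTYPE_SYSTEM.search(text)
    return match.group(2) if match else None


def dtd_version(text):
    """Return the DTD version encoded in the SYSTEM identifier, or None."""
    sysid = system_identifier(text)
    if sysid is None:
        return None
    match = VERSION_IN_NAME.search(sysid)
    return int(match.group(1)) if match else None


# Made-up sample document.
doc = (
    '<!DOCTYPE document SYSTEM "http://example.org/dtds/document-v2.dtd">\n'
    "<document/>"
)

print(system_identifier(doc))
print(dtd_version(doc))
```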

For SGML, it's actually a fairly difficult problem to compute compatible
DTD changes, especially in the face of the rules about inclusions and
how they affect the interpretation of whitespace.

XML does not have these complexities.

None the less, a mechanical change is often the smallest part of the effort,
because SGML has in the past been used mostly for representing documents
rather than for database interchange.  In documents, especially in the
_descriptive_ or transcriptional use of SGML, well, an example:
    if I previously had
    <placename>
    <keyword>
    <italic>
in my DTD and I now decide that some italic words are really people, so I
add
    <people>
I must now inspect every <placename>, <keyword> or <italic> to see
if perhaps a person's name has been put there, and change it.  If
I could do that automatically, I probably would have tagged it like
that in the first place :)
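
The inspection itself cannot be automated, but collecting the
candidates can.  As a sketch only, the Python below lists every
occurrence of the three older elements, with its text, so that a person
can go through them one by one.  The sample document and the function
name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The elements that may be hiding a person's name.
SUSPECT_TAGS = ("placename", "keyword", "italic")


def review_candidates(xml_text, tags=SUSPECT_TAGS):
    """Return (tag, text) pairs, in document order, for a human to inspect."""
    root = ET.fromstring(xml_text)
    wanted = set(tags)
    return [
        (element.tag, "".join(element.itertext()).strip())
        for element in root.iter()
        if element.tag in wanted
    ]


# Made-up sample document.
sample = (
    "<document><p>Met <italic>Jane Doe</italic> in "
    "<placename>Toronto</placename>; see <keyword>SGML</keyword>.</p></document>"
)

for tag, text in review_candidates(sample):
    print(f"<{tag}>: {text}")
```

Deciding which of these are really people is still a job for a reader,
which is the point made above.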

Not making existing documents syntactically invalid is one thing; not
making them semantically invalid is another.

For database interchange, the semantic issues can be controlled more
tightly, I think, and this is an area that could be researched more.

Lee

-- 
Liam Quin --  the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval
IRC: discuss XML/SGML/XSL/XLL/DSSSL Mondays irc.technonet.net in #XML
email address: l i a m q u i n, at host: i n t e r l o g  dot  c o m

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)