Proposal Critique - XML DTDs to XML docs

Fri May 22 02:27:18 BST 1998

>There is a reason that we usually choose not to have circular
>specifications. First, reading and writing them is often a pain. Second,
>the two become interdependent.

Yes, there is the 'ingrown toenail' metaphor for standards that rely on each 
other too closely and turn into a mess.

> [re: using href declarations for references]
>I thought that you wanted to use XLink and XPointer?

Of course I want to use XLink and XPointer.  The href declaration is the 
tiniest piece of the XLink standard, and seems fairly well established, if not 
indeed set in stone.  I'd be happy to use the full XLink spec, but realize 
that not everyone needs it.  Fine.  Make href a part of the 'Level 1' spec and 
pray that XLink doesn't migrate to entirely different terminology.  It's no 
worse than SYSTEM and PUBLIC are now, certainly.

>What would the rules be? What would extensions be allowed to do and not
>do?

For now, because this is simply a 'representation', I expected the same rules 
to hold for these DTDs with regard to document syntax as apply now.  Maybe I 
should have written a complete section on behavior; maybe I will.

>I guess I don't understand the difference between adding things and
>changing the fundamental rules of the "level 1" parse. DTDs DO change the
>fundamental rules of the fundamental parse. What could be more fundamental
>than this:

Here we begin to see where the communications breakdown has set in, and maybe 
we can unravel it. You see entities as modifying the rules of the 'fundamental 
parse'.  I see entities as riding along on the rules of the 'fundamental 
parse' to make their changes.  To me, the basic rules for parsing establish a 
syntax for documents, including a set of rules for including entities.  Using 
an entity is just taking advantage of those rules, _not_ modifying them in any 
way.  I see the distinction between expanding an entity and including (or 
transcluding) information from a link as a minor technical skirmish that 
should have been settled long ago, not a major battle over the fundamental 
shape of documents.
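
To make the distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The document, the entity name "who", and the helper functions are all made up for illustration. The point is only that the parser's rules stay the same in both cases: a declared entity gets expanded under those rules, and an undeclared one is rejected under those same rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal subset declares one entity,
# and the document body then refers to it.
DOC = """<!DOCTYPE note [
  <!ENTITY who "Simon">
]>
<note>Hello, &who;!</note>"""


def parse_text(xml_source):
    """Parse a document and return the text of its root element."""
    return ET.fromstring(xml_source).text


def is_well_formed(xml_source):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# The declared entity is expanded by the ordinary parse.
expanded = parse_text(DOC)

# The same reference without a declaration is rejected by the ordinary parse.
undeclared_ok = is_well_formed("<note>Hello, &who;!</note>")

print(expanded)
print(undeclared_ok)
```

Nothing about the parser changed between the two calls; only the declarations available to it did.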

Maybe that's what I get for working in HyperCard and HTML all these years...

>We could restrict DTD extension to data typing, but that strikes me as a
>step backwards. Verification is going to be (and should be) increasingly
>the job of non-DTD schemata.
>...
>Verification should be handled at a different level and by a different
>piece of software than the parser.

I think this philosophy reflects SGML's heritage in document management.  
Developers who'd like to apply XML to other tasks may find this heritage 
distracting or indeed disturbing, given the DTD's current lack of 
extensibility.  It's not hard to imagine database developers who need to use 
XML coming up with a really simple schema like:

<Element Name="FirstName" Type="Text" Size="50" />
<Element Name="LastName" Type="Text" Size="50" />
<Element Name="BirthDate" Type="Date" />

Then they could just use a PI to tell their application to check their 
well-formed document ("Who the hell needs a DTD anyway? Like who came up with 
_that_?") against this schema.  Something like:

<?WhoNeedsDTDs simpleschema="http://www.simonstl.com/schema.jnk" ?>

This doesn't really do any harm; part of the joy of well-formed documents is 
that you can chuck all the rest of the goodies in XML and build it yourself.
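
As a rough sketch of how little code such a home-grown check would take, here is one possible version in Python. Everything beyond the three Element lines is an assumption made for illustration: the <Schema> wrapper (needed so the schema is itself well-formed), the <Person> document, the reading of Size as a maximum character count, and the choice of ISO dates (YYYY-MM-DD) for the Date type. It also skips fetching the schema named in the PI and just uses a local string.

```python
import datetime
import xml.etree.ElementTree as ET

# The three Element lines from the message, wrapped in a hypothetical
# <Schema> root so that the schema is a well-formed document.
SCHEMA = """<Schema>
<Element Name="FirstName" Type="Text" Size="50" />
<Element Name="LastName" Type="Text" Size="50" />
<Element Name="BirthDate" Type="Date" />
</Schema>"""


def load_rules(schema_source):
    """Map each element name to a (type, max size or None) pair."""
    rules = {}
    for el in ET.fromstring(schema_source).iter("Element"):
        size = el.get("Size")
        rules[el.get("Name")] = (el.get("Type"), int(size) if size else None)
    return rules


def check(document_source, rules):
    """Return a list of complaints; an empty list means the document passes."""
    problems = []
    for child in ET.fromstring(document_source):
        if child.tag not in rules:
            problems.append(f"{child.tag}: not in schema")
            continue
        kind, size = rules[child.tag]
        text = child.text or ""
        if kind == "Text" and size is not None and len(text) > size:
            problems.append(f"{child.tag}: longer than {size} characters")
        elif kind == "Date":
            try:
                # Assumption: dates are written as YYYY-MM-DD.
                datetime.date.fromisoformat(text)
            except ValueError:
                problems.append(f"{child.tag}: not a date")
    return problems


RULES = load_rules(SCHEMA)
```

A document such as <Person><FirstName>Simon</FirstName><BirthDate>1998-05-22</BirthDate></Person> would pass check(), while one with a BirthDate of "last Tuesday" would come back with a single complaint.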

Still, to me, this loses a lot.  I'd like to see developers use DTDs, and I 
think that describing the structure of these documents is important for many 
reasons: easier use with editors, easier-built storage systems, and, of 
course, error-checking.

Making DTDs extensible in clearly defined ways (and not your <!MY-OWN-ENTITY > 
critter) seems like a good way to bring these folks in.  By providing a 
structure that developers can use to ensure interoperability of their 
documents, as well as extend to include data-type verification, I think we'd be 
able to keep more developers in the habit of using DTDs.

Which brings us to the core of the issue:
>In other words, I think that we should be reducing the responsibilities of
>the DTD, rather than expanding them.  A whole new syntax for a core part
>of the language would make XML much more complicated than it is now.

Right now, the options for including verification on top of the DTD structure 
look pretty ugly.  Namespaces, schemas, and PIs pile on top of each other to 
drive documents into the ground.  These sorts of extensions are going to 
sprout.  I'd like to give them a good place to grow, a single document that 
provides a complete picture of a document model's content.  Do you really want 
stacks of schemas floating around as well as the style sheets, scripts, link 
group documents, and the DTD?  I don't feel the need to put _everything_ in 
one place - style sheets, scripts, and link information seem better managed 
outside this framework and don't cause endless repetition of the document 
structure.

Does it really make sense to define the DTD once for XML 1.0 validation and 
define an entirely separate but redundant structure for data type validation?  
If SGML compatibility is your highest aspiration, it certainly may.  To me, 
it doesn't make sense. 

Maybe the XML-Data crew will get their ubercombination to work.  I'd rather 
start by getting DTDs made extensible and more easily managed first, and then 
add the schemas later, without requiring redundant structures.  This doesn't 
seem like that bizarre a goal.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer / Cookies

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/