Proposal Critique - XML DTDs to XML docs

Paul Prescod papresco at technologist.com
Fri May 22 20:43:50 BST 1998


Simon St.Laurent wrote:
> 
> >Is there any good reason that the ability to change the parse tree should
> >be conflated with the responsibility for verifying schema-compliance as
> >they are in DTDs. Is there any good reason to perpetuate this conflation
> >in your proposed replacement for DTDs?
> 
> I'd like to see a structure that's:
> a) easily interpreted, edited, and stored, without the need for multiple
> toolsets
> b) capable of containing a complete set of information about a document,
> including structure and data

The word "structure" is too vague for me to argue for or against. Are you
talking about a single *language* (or specification) that incorporates all
of the following?

 a) instance syntax
 b) textual replacement
 c) external text embedding
 d) extensible validation

XML 1.0 incorporates all of them. I think that made sense for XML 1.0,
in order to be SGML compatible, but for future versions I would rather see
the first three completely separate from the fourth. The reason I feel
that validation should be separated is that the types of validation (or
"verification") that people have to do can be quite varied. XML made the
DTD optional for this reason. I don't see how making the XML specification
substantially larger, by adding an alternative encoding for DTDs, can
make that specification simpler.
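
A minimal sketch of how textual replacement and validation already come
apart in practice, using Python's standard library. The document, the
entity name "co", and the helper parse_note are made up for illustration.
The non-validating parser expands an entity declared in the internal DTD
subset, yet it never checks the document against any element declarations
(there are none here, so the document could not be valid in the DTD sense):

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares one entity
# and no element types at all.
DOC = """<!DOCTYPE note [
  <!ENTITY co "Example Corp">
]>
<note>Sent by &co;</note>"""

def parse_note(text):
    """Parse with ElementTree's non-validating parser and return the root."""
    return ET.fromstring(text)

root = parse_note(DOC)
# The entity reference has been replaced in the parse tree,
# although no validation of any kind took place.
print(root.tag, "->", root.text)
```

So the DTD here is acting purely as a substitution mechanism, which is one
of the roles the message argues should live apart from validation.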

> Why on earth would I
> want to keep multiple sets of document descriptions (schemas, whatever) around
> that share the task of defining the same document set?  It seems like a
> management mess, a processing mess, a waste of bandwidth and storage because
> of redundant information, and just generally a nuisance.
> 
> Making DTDs extensible is a good way, in my view, to address this issue, and
> several others.

That sounds attractive, and I encourage you to try to make it work. If
you succeed, I will be happy to use it. But, to be honest, I don't think
it will succeed. It is like the early days of computer programming, when
people thought it was possible to invent a single, "extensible"
programming language (or "meta programming language") that would serve all
needs. No attempt quite did everything that everybody needed, and
the harder people worked to make languages "extensible", the more complex
(C++) or merely unpopular (Lisp) the languages became.

I personally don't believe that one extensible schema/DTD language can
serve all of our diverse validation needs. The set of "extensions" will be
unlimited and approach the complexity of a full programming language. Look
at RDF schemata. They are miles and miles away from DTDs. I've had
document types where I was modeling OO systems and wanted to verify things
like "base class is not inherited more than once." Some OO-modeling schema
language would handle that, but DTDs (even extensible ones) could never do
so.
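
As a rough illustration of that kind of rule, here is a Python sketch that
checks it procedurally. The vocabulary (<model>, <class name="...">,
<base ref="...">) and the function repeated_bases are assumptions invented
for this example, and "inherited more than once" is read here as "reached
by more than one inheritance path", which is only one possible
interpretation:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical OO model: Diamond reaches Base through both Left and Right.
MODEL = """<model>
  <class name="Base"/>
  <class name="Left"><base ref="Base"/></class>
  <class name="Right"><base ref="Base"/></class>
  <class name="Diamond"><base ref="Left"/><base ref="Right"/></class>
</model>"""

def repeated_bases(xml_text):
    """Map each class name to the ancestors it reaches by more than one path."""
    root = ET.fromstring(xml_text)
    bases = {c.get("name"): [b.get("ref") for b in c.findall("base")]
             for c in root.findall("class")}

    def ancestors(name, seen):
        # Count every inheritance path to each ancestor.
        # 'seen' holds the classes on the current path, to guard against cycles.
        counts = Counter()
        for parent in bases.get(name, []):
            if parent in seen:
                continue
            counts[parent] += 1
            counts += ancestors(parent, seen | {parent})
        return counts

    problems = {}
    for name in bases:
        dupes = sorted(a for a, n in ancestors(name, {name}).items() if n > 1)
        if dupes:
            problems[name] = dupes
    return problems

print(repeated_bases(MODEL))
```

The check needs to follow references across the whole document and count
paths, which is a graph computation rather than a statement about which
element may appear where.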

I tend to think that a strategy more likely to succeed is one that layers
schema languages. At the bottom level you have something like XML DTDs,
written in XML element notation and without all of the stuff related to
entities and notations. That layer might include data type validation.
In the levels above that you have RDF and other schemata that are more
interested in relationships than in positional occurrence.
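
The layering idea can be sketched in Python as a list of independent
checkers run over the same parsed document. Everything named here (the
<order>/<item> vocabulary, the "qty" and "replaces" attributes, the
ALLOWED_CHILDREN table, and the function names) is hypothetical, and the
"content model" is far cruder than a real DTD's:

```python
import xml.etree.ElementTree as ET

# Hypothetical bottom-layer content model: allowed child names per element.
ALLOWED_CHILDREN = {"order": {"item"}, "item": set()}

def check_structure(root):
    """Bottom layer: element occurrence plus a crude data type check."""
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    for item in root.iter("item"):
        qty = item.get("qty")
        if qty is not None and not qty.isdigit():
            errors.append(f"item {item.get('id')}: qty {qty!r} is not an integer")
    return errors

def check_relationships(root):
    """Upper layer: a rule about relationships, not positions."""
    return [f"item {item.get('id')} replaces itself"
            for item in root.iter("item")
            if item.get("replaces") is not None
            and item.get("replaces") == item.get("id")]

# Each layer is a separate function; more can be appended without touching the others.
LAYERS = [check_structure, check_relationships]

def validate(xml_text):
    """Run every layer over the parsed document and collect all messages."""
    root = ET.fromstring(xml_text)
    errors = []
    for layer in LAYERS:
        errors.extend(layer(root))
    return errors

DOC_OK = '<order><item id="a" qty="2"/><item id="b" qty="1" replaces="a"/></order>'
DOC_BAD = '<order><note/><item id="a" qty="two" replaces="a"/></order>'
print(validate(DOC_OK))
print(validate(DOC_BAD))
```

Neither layer needs to know about the other, and neither has anything to do
with entities or notations.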

It seems like you are interested in that bottom-layer schema. I think
it would be good to formalize an XML element notation for the bottom
layer. But if you try to make it a replacement for DTDs, then it must do
everything that DTDs do, and it will inherit all of the problems that the
conflation of features in DTDs causes.

> What's so difficult about that?  I can't think of any good reason (besides
> SGML compatibility) to oppose either of those goals.  

It is quite likely that SGML will soon be changed to allow you to use
whatever notation you want for XML DTDs. SGML compatibility is not a
problem. The question is what the right design is. You can make a slightly
better version of a bad design, or you can try to start again with a good
design.

Let me ask this plainly:

Does it make sense

a) that textual substitutions should be specified in a part of a document
called a "document type definition",

b) that the "document type definition" should also be responsible for
declaring media types and attaching them to non-XML entities, and

c) that the language for verifying element and attribute occurrence must
be in the same specification (XML 1.x) as that for creating elements and
attributes themselves?

I don't think that those three things (among others) make sense anymore.
Hence, I don't think that inventing a new notation for this inappropriate
concept is a good idea. If we are to replace DTDs, let us replace them
with something simpler and more specific to the task of validation,
instead of transliterating them into another syntax, warts and all.

Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"A writer is also a citizen, a political animal, whether he likes it or 
not. But I do not accept that a writer has a greater obligation 
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment."  - Wole Soyinka, Africa's first Nobel Laureate




