Proposal Critique - XML DTDs to XML docs

Paul Prescod papresco at technologist.com
Sat May 23 07:01:20 BST 1998


Simon St.Laurent wrote:
> 
> Actually, I'd like to see all four of these separated, except for b and c,
> which I feel should at least use common syntax.  

Well, I'm not that much of a radical yet. I am not yet convinced that
structural inclusion (e.g. get that tree from that document and place it
*here*) can completely and conveniently replace XML/SGML's text
substitution model. I'm ready to be convinced, however. Once XLink is done
and widely implemented, we will have an opportunity to pit the two methods
head to head, and the market will decide. It's because
the jury is still out on that issue that I argue(d) for entities etc. in
XML. I wasn't willing to bet XML's usability on the faith that XLink would
come along and would do everything that entities do. Presumably the people
who could actually vote felt that way also.
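
To make the contrast concrete, here is a minimal sketch of the two models,
assuming Python's standard xml.etree.ElementTree. The sample documents, the
entity name "company", and the function names are all invented for
illustration, and the second function is only a stand-in for what an
XLink-style inclusion might do, not an XLink implementation.

```python
import copy
import xml.etree.ElementTree as ET

# Text substitution: an internal entity declared in the DTD subset is
# expanded by the parser before the application ever sees the tree.
ENTITY_DOC = """<!DOCTYPE memo [
  <!ENTITY company "Example Corp">
]>
<memo><from>&company;</from></memo>"""

# Structural inclusion: the content lives as a subtree in another document.
# Both documents below are hypothetical.
SOURCE_DOC = "<library><address id='hq'><city>Waterloo</city></address></library>"
TARGET_DOC = "<memo><from/></memo>"


def expand_by_entity():
    """Return the text of <from> after the parser has expanded the entity."""
    root = ET.fromstring(ENTITY_DOC)
    return root.find("from").text


def include_subtree(source_xml, target_xml, source_id, target_tag):
    """Copy the element with the given id from the source document and
    graft it under the first <target_tag> child of the target document."""
    source = ET.fromstring(source_xml)
    target = ET.fromstring(target_xml)
    node = source.find(f".//*[@id='{source_id}']")
    if node is None:
        raise LookupError(f"no element with id {source_id!r}")
    target.find(target_tag).append(copy.deepcopy(node))
    return target


if __name__ == "__main__":
    print(expand_by_entity())
    merged = include_subtree(SOURCE_DOC, TARGET_DOC, "hq", "from")
    print(ET.tostring(merged, encoding="unicode"))
```

In the first case the substitution happens at the character level inside the
parser; in the second it happens on already-parsed trees, so the included
piece is always a well-formed subtree.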

> I'm very cautious right now about adding too many layers of schemas, mostly
> because they seem to require very high levels of redundancy.  (XML-Data, a
> DTD, and an RDF representation of a single document?  And they all have to
> stay consistent?  Whoa...)  

Well, XML allows you to skip the DTD. XML-Data is supposed to verify
everything the DTD does, so you can skip the verification (schema) parts
of the DTD if you use XML-Data. You may still need the DTD for entities
and notations. (XML-Data does allow you to declare entities, which makes
no sense to me, so I'll just ignore it and hope it goes away.)

I believe that by the time RDF and XML-Data are done, there should be
little overlap between them. RDF schemata constrain relationships between
elements of particular types. XML-Data constrains where they can occur. So
there is still a question about managing multiple files and layers, but it
should NOT be a question of duplication of services.
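
A rough sketch of how the two kinds of constraint differ, again assuming
Python's xml.etree.ElementTree. The rule tables below are hypothetical
stand-ins and do not follow the actual XML-Data or RDF schema syntax; they
only show that a positional check and a relationship check look at different
things and need not duplicate each other.

```python
import xml.etree.ElementTree as ET

# Hypothetical positional rules: which child tags each parent may contain.
ALLOWED_CHILDREN = {
    "report": {"author", "section"},
    "section": {"para", "cite"},
    "author": set(),
    "para": set(),
    "cite": set(),
}

# Hypothetical relationship rule: the "who" attribute of <cite> must name
# the id of an <author> element.
LINK_RULES = {("cite", "who"): "author"}

DOC = """<report>
  <author id="a1"/>
  <section id="s1"><para/><cite who="s1"/></section>
</report>"""


def positional_errors(root):
    """Check only where elements occur, not what they refer to."""
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    return errors


def relationship_errors(root):
    """Check only what links point at, not where elements occur."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    errors = []
    for el in root.iter():
        for (tag, attr), target_tag in LINK_RULES.items():
            if el.tag != tag or attr not in el.attrib:
                continue
            target = by_id.get(el.get(attr))
            if target is None:
                errors.append(f"<{tag}> {attr}={el.get(attr)!r} points at nothing")
            elif target.tag != target_tag:
                errors.append(
                    f"<{tag}> {attr}={el.get(attr)!r} points at "
                    f"<{target.tag}>, expected <{target_tag}>"
                )
    return errors


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(positional_errors(root))
    print(relationship_errors(root))
```

The sample document passes the positional check but fails the relationship
check, because its <cite> points at a <section> rather than an <author>.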

I try to think of the situation as analogous to stylesheets. You probably
wouldn't have one RDF schema for every XML-Data schema you have, and one
XML-Data schema for every document you have. (You can't help but have one
DTD per document; that's the way XML is defined.) Instead, you would use a
single XML-Data schema for dozens of documents, and perhaps a single RDF
schema for dozens of document *types*. There is some conceptual overlap
here with architectural forms, as they are also meant to be layered in the
way I describe. But archforms allow multiple layers of purely positional
verification. You need some other kind of schema language to do
link/relationship verification.

> I fear this is a symptom of the conservatism of the proposal, which began
> merely as an effort to map DTD syntax to an XML document syntax.  It inherits
> the warts along with the beauty.  A more radical solution would excise many of
> the warts, but didn't seem like a wise idea given the conservatism of many in
> the XML community.  Smaller steps on the way to a radical change seemed more
> appropriate in this environment, but that may not be the best way to go.

Well, there are no conservatives anymore. ISO seems willing to go far
beyond what either of us is proposing. As I understand it, they are moving to
a situation where DTDs can be in *any notation* whatsoever. I could be
understanding it wrong, because I was not at the meeting, but as I understand
it, you could invent a new binary notation for DTDs, and it could turn on
tag omission and shortref features that would make your SGML documents
unreadable to anyone without a parser for your binary DTD notation.

If you approach it the way I am suggesting, then that isn't a problem. If
schemata are separated from syntax, then Microsoft may not be able to
validate documents created with Netscape's editor, but at least they will
be able to *read* them. Having a proprietary schema language/engine would
be just like having a proprietary style language/engine -- not ideal, but
sometimes necessary.

I should mention that I have mostly cribbed this "ignore DTDs and work
directly on schemata" approach from various people. The only one whose
name I can think of right now is Eliot Kimber, who wants to move away from
DTDs for different reasons than you do (syntax) or I do (conflation of
features). He dislikes the fact that they are controlled by, and can be
overridden by, the document. Although this also bothers me, it bothers him
a lot more.

See: http://www.sil.org/sgml/n1957Note.html

Of course the non-DTD schemata that Eliot is most interested in are
architectures, which look a lot like DTDs.

Also, http://www.sil.org/sgml/thompsonSchemata.html is useful.

Henry uses the phrase "document structure definition" to avoid getting
stuck in the "DTD" rut.

> >c) that the language for verifying element and attribute occurrence must
> >be in the same specification (XML 1.x) as that for creating elements and
> >attributes themselves?
> 
> Must be?  No.  Could be, and would offer significant advantages?  Yes.  I
> still think XML document syntax has significant advantages over current
> syntax.  Could there be a better syntax?  Of course.

I wasn't asking about syntax, but about actual specifications. Should the
validation language be specified in the same standards document as the
language syntax? I think that we agree that it should not.

> >If we are to replace DTDs, let us replace them
> >with something simpler and more specific to the task of validation,
> >instead of transliterating them into another syntax, warts and all.
> 
> Good idea.  Who, where, when?  I'll buy beer...

Well, I think that this is what XML-Data is about, but it is only a rough
sketch. I also think that the W3C is supposed to create a working group
that will address these sorts of issues. We could work out a concrete
proposal in this mailing list, or offline, but I'm not sure if it would
move us beyond all of the other DTDs for DTDs. In other words, I think
that your basic idea will eventually get implemented, hopefully as
modified according to my comments. I don't know how to expedite that,
however.
 
Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"A writer is also a citizen, a political animal, whether he likes it or 
not. But I do not accept that a writer has a greater obligation 
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment."  - Wole Soyinka, Africa's first Nobel Laureate

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



