Proposal Critique - XML DTDs to XML docs

Paul Prescod papresco at technologist.com
Fri May 22 16:40:05 BST 1998


Simon St.Laurent wrote:
> 
> For now, because this is simply a 'representation', I expected the same rules
> to hold for these DTDs with regard to document syntax as apply now.  Maybe I
> should have written a complete section on behavior; maybe I will.

If I understand this correctly, then you are saying that at first you
allow no extensions, just as DTDs allow no extensions.

> Here we begin to see where the communications breakdown has set in, and maybe 
> we can unravel it. You see entities as modifying the rules of the 'fundamental 
> parse'.  I see entities as riding along on the rules of the 'fundamental 
> parse' to make their changes.  To me, the basic rules for parsing establish a 
> syntax for documents, including a set of rules for including entities.

I misspoke. I meant that DTDs change the fundamental parse. This is not a
very interesting observation: given the same document instance and two
different DTDs, you can get two radically different document parse trees.
Is there any good reason that the ability to change the parse tree should
be conflated with the responsibility for verifying schema compliance, as
it is in DTDs? Is there any good reason to perpetuate this conflation
in your proposed replacement for DTDs?
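
To make the parse-tree point concrete, here is a minimal Python sketch. The
element and entity names (doc, body, para, em) and the helper shape() are
invented for illustration. It assumes the standard library's expat-based
parser, which expands entities declared in an internal DTD subset but does
not validate anything.

```python
import xml.etree.ElementTree as ET

# One document instance, shared by both parses (hypothetical names).
INSTANCE = "<doc>&body;</doc>"

# Two DTDs that declare the same entity with different replacement text.
DTD_A = '<!DOCTYPE doc [<!ENTITY body "plain text">]>'
DTD_B = '<!DOCTYPE doc [<!ENTITY body "<para>plain <em>text</em></para>">]>'


def shape(elem):
    """Return the element structure as nested (tag, children) tuples."""
    return (elem.tag, [shape(child) for child in elem])


tree_a = ET.fromstring(DTD_A + INSTANCE)
tree_b = ET.fromstring(DTD_B + INSTANCE)

print(shape(tree_a))  # doc with character data only
print(shape(tree_b))  # doc containing a para that contains an em
```

Neither parse checks the document against any content model. The DTD is
acting purely as a parse-time macro facility here, which is the role I am
arguing should be separable from verification.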
 
> Using
> an entity is just taking advantage of those rules, _not_ modifying them in any
> way.  I see the distinction between expanding an entity and including (or
> transcluding) information from a link as a minor technical skirmish that
> should have been settled long ago, not a major battle over the fundamental
> shape of documents.

It may or may not be a major battle over the fundamental shape of
documents, but it *is* a major battle over the fundamental shape of XML.
The differences between textual inclusion and structural inclusion are
quite deep and subtle. They affect hyperlinking, well-formedness,
validity, character set issues and almost everything else in the XML
specification. At some level of abstraction the distinction may be minor,
but in the details of the specification, it is humungous.
 
> Maybe that's what I get for working in HyperCard and HTML all these years...

HTML doesn't really support either. I don't know HyperCard.
 
> >Verification should be handled at a different level and by a different
> >piece of software than the parser.
> 
> I think this philosophy reflects SGML's heritage in document management.

I'm not sure if you understand that I am suggesting a model that is
fundamentally different from SGML's.

> Then they could just use a PI to tell their application to check their
> well-formed document ("Who the hell needs a DTD anyway? Like who came up with
> _that_?") against this schema.  Something like:
> 
> <? WhoNeedsDTDs simpleschema="http://www.simonstl.com/schema.jnk" ?>
> 
> This doesn't really do any harm; part of the joy of well-formed documents is
> that you can chuck all the rest of the goodies in XML and build it yourself.
> 
> Still, to me, this loses a lot.  I'd like to see developers use DTDs, and I
> think that describing the structure of these documents is important for many
> reasons: easier use with editors, easier-built storage systems, and, of
> course, error-checking.

Like anything else, I think that people should use the "standard
mechanisms" when that makes sense, and avoid them otherwise. I believe
that DTDs are inappropriate for some types of data, and would rather not
see them used in those cases. Some data types are not supposed to be
edited in editors. Sometimes the storage system and error-checking are
better driven by non-DTD schema languages.
 
> Making DTDs extensible in clearly defined ways (and not your <!MY-OWN-ENTITY >
> critter) seems like a good way to bring these folks in.  By providing a
> structure that developers can use to ensure interoperability of their
> documents, as well as extend to include data-type verification, I think we'd be
> able to keep more developers in the habit of using DTDs.

Data type verification is only one of the ways in which DTDs fall short for
some types of applications. Also, I don't believe that data type
verification requires "extensible DTDs".

> Does it really make sense to define the DTD once for XML 1.0 validation and
> define an entirely separate but redundant structure for data type validation?
> If SGML compatibility is your highest aspiration, it certainly may.  To me,
> it doesn't make sense.

The element and attribute type verification provided by DTDs *is* data
type validation. I am not arguing that data type validation should be
separate from it. I am arguing that both element type validation and all
other types of verification should be completely separate from issues of
parsing, entity management and so forth.
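
A rough sketch of what that separation could look like, in Python. The
SCHEMA table, the verify() function and the order/item vocabulary are all
hypothetical and not any existing schema language. The parser's only job is
to produce a tree from a well-formed document; a second, independent pass
checks element structure and attribute data types together.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element name -> (allowed child names, attribute checks).
SCHEMA = {
    "order": ({"item"}, {"id": str.isdigit}),
    "item": (set(), {"qty": str.isdigit}),
}


def verify(elem, schema):
    """Check an already-parsed tree against the schema; return error strings."""
    if elem.tag not in schema:
        return [f"unknown element <{elem.tag}>"]
    allowed_children, attr_checks = schema[elem.tag]
    errors = []
    for name, value in elem.attrib.items():
        check = attr_checks.get(name)
        if check is None:
            errors.append(f"<{elem.tag}> has undeclared attribute {name!r}")
        elif not check(value):
            errors.append(f"<{elem.tag}> attribute {name!r} has bad value {value!r}")
    for child in elem:
        if child.tag not in allowed_children:
            errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        else:
            errors.extend(verify(child, schema))
    return errors


# Step 1: parsing only (well-formedness). Step 2: verification, done separately.
doc = ET.fromstring('<order id="42"><item qty="3"/><item qty="many"/><note/></order>')
for problem in verify(doc, SCHEMA):
    print(problem)
```

Because verify() only ever sees the tree, it could be replaced by a
different schema language without touching the parser, and the parser could
be run with no verification at all.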
 
> Maybe the XML-Data crew will get their ubercombination to work.  I'd rather
> start by getting DTD's made extensible and more easily managed first, and then
> add the schemas later, without requiring redundant structures.  This doesn't
> seem like that bizarre a goal.

I don't know what you mean by "add schemas later." DTDs are schemata. That
their verification responsibilities are mixed up with their parsing
responsibilities is unfortunate, but over the long term, correctable. But
only if we recognize that it is done in the wrong place and choose not to
perpetuate it in new schema languages like the one you propose.

Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"A writer is also a citizen, a political animal, whether he likes it or 
not. But I do not accept that a writer has a greater obligation 
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment."  - Wole Soyinka, Africa's first Nobel Laureate

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



