Proposal Announcement - XML DTDs to XML docs

Paul Prescod papresco at technologist.com
Thu May 21 12:48:54 BST 1998


Simon St.Laurent wrote:
> 
> At what level do you get defaulted attributes now?  Do you get defaulted
> attributes in a well-formed document without a DTD?  

No. But if you are talking about replacing the DTD, then I don't see how a
comparison to documents without a DTD is relevant.

> Internal
> entities could be defined much as they are now, at the start of a document,
> within a structure set aside for that purpose using <ENTITY> (or whatever
> develops) instead of <!ENTITY>.  This would indeed need to be covered in level
> 1, unless you could live without internal entities.  

Okay. So <ENTITY> is in level 1 -- the document level, just like with
DTDs. But presumably <ENTITY> can use XLink. So now you've dragged XLink
into level 1. Now we are back to specification circularity.

Am I missing something here?

> In the past you seemed
> quite happy about forcing scripts to be external to a document, so I can't see
> why it would be so terrible to exile entities - and DTDs as well - to separate
> documents either.  I don't think it would be necessary, though, any more than
> it's necessary now.

I have never argued in favour of forcing scripts to be external to a
document. I argued that everything I know about text processing says that
putting scripts in textual documents is a bad idea -- and is in fact a
regression to the technique that SGML was invented to replace. But not
everybody uses XML or SGML for text processing, so I do not believe that
the *language* should restrict them from embedding scripts. XSL is a
perfect example of an appropriate mix of scripts and markup....but you'll
notice that there is essentially no text in an XSL stylesheet.

Anyhow, even if we exile entities, you still have the circularity problem.
How can a level 1 parser process entities (as they do now) if the syntax
for declaring entities depends on XLink, XPointer, and other
specifications that are supposed to be separate from XML itself?

Let me make this concrete (using a random DTD in XML notation, with
old-syntax comments for clarity):

foo.xdtd:

<ELEMENT-TYPE NAME="TEST"><MODEL>TEST2</MODEL></ELEMENT-TYPE>
<!--ELEMENT TEST (TEST2)-->

<ELEMENT-TYPE NAME="TEST2"><EMPTY/></ELEMENT-TYPE>
<!--ELEMENT TEST2 EMPTY-->

<ENTITY NAME="foo"><CONTENT HREF="entity.com?ID(FOO)"/></ENTITY>
<!--ENTITY foo SYSTEM "..."-->


foo.xml:
<!DOCTYPE TEST SYSTEM "foo.xdtd">
<TEST>
&foo;
</TEST>

Does the processor have to go and fetch foo.xdtd, read it and understand
it before it can know the contents of this document?
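
A minimal sketch of why the question matters, assuming Python's standard
xml.etree.ElementTree module as a stand-in for a non-validating processor
(that choice is an illustration only, not something this thread refers to).
It parses the body of foo.xml with the DOCTYPE line dropped, so nothing the
processor has read declares "foo":

```python
import xml.etree.ElementTree as ET

# Body of foo.xml with the DOCTYPE line removed: "foo" is declared nowhere.
DOC = "<TEST>\n&foo;\n</TEST>"


def try_parse(text):
    """Return the root element's text, or a string describing the parse error."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        return f"error: {err}"


result = try_parse(DOC)
print(result)
```

The parse fails rather than yielding content, which is the point: the
declaration has to be read and understood somewhere before the reference can
be replaced.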

> As for requiring levels, level 1 would serve a similar purpose to well-formed
> documents today.  2 would be a prerequisite for 3, of course.

Well-formed documents can have entities. In fact, all XML documents that
have entities are well-formed.
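
As a hedged illustration, again assuming Python's xml.etree.ElementTree as
the processor (my choice for the example, not part of the proposal), here is
a document that is well-formed, is never validated, and still uses an entity
declared in its internal subset. The entity name and replacement text are
made up:

```python
import xml.etree.ElementTree as ET

# A well-formed document whose internal subset declares one internal entity.
# The parser does not validate; it only reads the declaration and expands it.
DOC = """<!DOCTYPE TEST [
  <!ENTITY foo "replacement text">
]>
<TEST>&foo;</TEST>"""

root = ET.fromstring(DOC)
print(root.text)
```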
 
> >For example: One company's DTD extension could add in SGML tag omission.
> >The start- and end-tag of an element could be implied, without violating
> >well-formedness. So then you could use that company's parser through SAX
> >and get a completely different set of events than if you used someone
> >else's parser. After all, changing the parse is one of the
> >responsibilities of the DTD.
> 
> I think this is overstating your case rather dramatically.  I could do
> something similarly brutal by creating a <? shorttags ?> PI at the start of a
> regular XML document and using the implied tags.  

No you could not. The semantics of XML DTDs are *fixed*, not extensible.
Any parser that interpreted processing instructions as commands to change
the parse would be *wrong*. But you propose that DTDs should become
extensible. Since DTDs can change the parse (radically, in some cases),
your proposal would allow DTD extensions to make documents specific to
particular processors, unless an amended proposal explicitly disallows
that.

> No one else could read my
> documents, but I sure could.  Not only that, but I already proposed separating
> the document syntax - which includes full start- and end-tags - from the DTD.
> There's no reason this proposal would allow the DTD to modify the basic
> document syntax and markup, period.

DTDs don't modify the document syntax and markup, but they do modify the
parse tree created by the document. In other words, they modify its
semantics. If you replace DTDs with something "extensible", you must
expect them to be able to modify the parse tree in extensible ways, unless
you explicitly disallow this in your proposal. Just as today's DTDs can
have "implied attributes", Microsoft could invent one with "implied
elements". Netscape could go in the opposite direction and give us
"transparent elements" that do not show up in the parse tree at all. You
could use the two parsers through SAX and get completely different parse
trees.
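
A small sketch of the existing, fixed case (defaulted attributes), assuming
Python's xml.etree.ElementTree and an attribute name and value invented for
the example. The element markup is byte-for-byte identical in both documents;
only the declarations differ, and the parser reports different attributes:

```python
import xml.etree.ElementTree as ET

BODY = "<TEST/>"

# Same element markup, with and without an ATTLIST declaring a default.
PLAIN = BODY
WITH_DEFAULT = '<!DOCTYPE TEST [<!ATTLIST TEST KIND CDATA "draft">]>' + BODY

plain_attrs = dict(ET.fromstring(PLAIN).attrib)
default_attrs = dict(ET.fromstring(WITH_DEFAULT).attrib)

print(plain_attrs)
print(default_attrs)
```

An extensible replacement for DTDs would let each vendor add further
differences of this kind, which is the divergence described above.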
 
> My suggestion is that DTDs present a significant problem in their current
> format, and that they could be improved significantly.  I would enjoy being
> able to focus on elements and attributes, the core of XML (and SGML) document
> syntax, and worry less about the rest.  This project already is an attempt to
> be smaller, but to provide a place for new things to grow.

If all you are interested in is elements and attributes then you are
proposing a new schema language for XML, not a replacement for DTDs (which
do more). It is clear that you associate the word schema with complexity,
and I can't force you to use it. Schemata constrain the structure of data
models (databases, documents, etc.) 

DTDs are schemata. They also do more. They can change the parse. That's
what makes them complex and part of what makes *XML* complex. If you try
to do everything that DTDs do, then your new language will also be
needlessly complex. If you do not, then you are not replacing DTDs but
rather inventing something new. It sounds like a new schema language to
me.

Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"A writer is also a citizen, a political animal, whether he likes it or 
not. But I do not accept that a writer has a greater obligation 
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment."  - Wole Soyinka, Africa's first Nobel Laureate




