Proposal Announcement - XML DTDs to XML docs

Simon St.Laurent SimonStL at classic.msn.com
Thu May 21 15:27:07 BST 1998


>Okay. So <ENTITY> is in level 1 -- the document level, just like with
>DTDs. But presumably <ENTITY> can use XLink. So now you've dragged XLink
>into level 1. Now we are back to specification circularity.
>
>Am I missing something here?

There are several possible answers.  You could allow XML Level 1 parsers to 
ignore external entities if they choose - something similar is already in the 
spec right now (in a more limited case, section 4.1) for well-formed 
documents.  You could hard-wire the href attribute's interpretation in a DTD - 
parsers are already dealing with references in the context of DTDs, and it 
doesn't seem that hard to make sense of href.
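Here is a rough Python sketch of those two options. It assumes the declarations are wrapped in a single hypothetical <DTD> root element and use the <ENTITY NAME=...><CONTENT HREF=.../></ENTITY> form from the example under discussion. The function name entity_table and the fetch callback are my own inventions, not part of any spec. With no fetch callback, the processor declines to read external entities. With one, href is treated as a plain URI string and nothing from XLink is consulted.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Optional


def entity_table(xdtd: str, fetch: Optional[Callable[[str], str]] = None) -> dict[str, Optional[str]]:
    """Map entity names to replacement text; None means 'declared but not read'."""
    table: dict[str, Optional[str]] = {}
    for ent in ET.fromstring(xdtd).iter("ENTITY"):
        name = ent.get("NAME")
        content = ent.find("CONTENT")
        href = content.get("HREF") if content is not None else None
        if href is None:
            # Internal entity (assumed syntax): replacement text is CONTENT's text.
            table[name] = (content.text or "") if content is not None else ""
        elif fetch is None:
            # Option 1: skip external entities, much as section 4.1 lets
            # non-validating processors skip external declarations.
            table[name] = None
        else:
            # Option 2: href is just a URI string handed to a resolver; no XLink.
            table[name] = fetch(href)
    return table


XDTD = """<DTD>
  <ENTITY NAME="foo"><CONTENT HREF="entity.com?ID(FOO)"/></ENTITY>
</DTD>"""

print(entity_table(XDTD))                            # {'foo': None}
print(entity_table(XDTD, fetch=lambda h: f"<{h}>"))  # {'foo': '<entity.com?ID(FOO)>'}
```

Either way, the Level 1 processor never has to understand XLink itself.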

Another option is to allow the circularity.  This message was brought to you 
(at least partway) by the Internet Protocol, IP, defined in RFC 791.  IP 
includes, and indeed requires, the services of ICMP (defined in RFC 792).  
ICMP uses IP to get from one place to another.  Circular?  Yep.  Workable? 
Certainly.  IP isn't allowed to generate extra ICMP messages about the 
delivery of an ICMP message. There is no circle in practice.  Nor would there 
be a circle in _practice_ by allowing the level 1 spec to refer to the hrefs 
described in XLink, or to simply use href without further consideration.
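The rule that breaks the IP/ICMP circle is small enough to write down. This is a sketch of the check in Python, following the later wording in RFC 1122 (section 3.2.2), which narrows RFC 792's "no ICMP messages about ICMP messages" to ICMP error messages only. The function name is hypothetical, and a real stack checks more conditions (broadcast destinations, non-initial fragments, and so on).

```python
from typing import Optional

IPPROTO_ICMP = 1  # IP protocol number assigned to ICMP

# ICMP types that are error reports: Destination Unreachable (3), Source Quench (4),
# Redirect (5), Time Exceeded (11), Parameter Problem (12).
ICMP_ERROR_TYPES = frozenset({3, 4, 5, 11, 12})


def may_send_icmp_error(protocol: int, icmp_type: Optional[int] = None) -> bool:
    """Return False when the offending datagram is itself an ICMP error message."""
    if protocol == IPPROTO_ICMP and icmp_type in ICMP_ERROR_TYPES:
        return False  # no errors about errors: this is what breaks the circle
    return True


print(may_send_icmp_error(17))     # UDP datagram -> True
print(may_send_icmp_error(1, 8))   # ICMP echo request -> True
print(may_send_icmp_error(1, 11))  # ICMP Time Exceeded -> False
```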

>foo.xdtd:
>
><ELEMENT-TYPE NAME="TEST"><MODEL>TEST2</MODEL></ELEMENT-TYPE>
><!--ELEMENT TEST (TEST2)-->
>
><ELEMENT-TYPE NAME="TEST2"><EMPTY/></ELEMENT-TYPE>
><!--ELEMENT TEST2 EMPTY-->
>
><ENTITY NAME="foo"><CONTENT HREF="entity.com?ID(FOO)"/></ENTITY>
><!--ENTITY foo SYSTEM "..."-->
>
>
>foo.xml:
><!DOCTYPE TEST SYSTEM "foo.xdtd">
><TEST>
>&foo;
></TEST>
>
>Does the processor have to go and fetch foo.xdtd, read it and understand
>it before it can know the contents of this document?

No more than it needs to in the current system, as stated in section 4.1:
X>Note that if entities are declared in the external subset 
X>or in external parameter entities, a non-validating processor 
X>is _not_ _obligated_ _to_ read and process their declarations; 
X>for such documents, the rule that an entity must be declared 
X>is a well-formedness constraint only if _standalone='yes'_.
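Put as code, the rule reads roughly like this. The sketch follows the "Entity Declared" well-formedness constraint in the XML 1.0 Recommendation, which applies to documents with no DTD, with only an internal subset containing no parameter entity references, or with standalone='yes'. The class, field, and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

PREDEFINED = frozenset({"amp", "lt", "gt", "apos", "quot"})


@dataclass
class Prolog:
    external_subset: bool   # DOCTYPE names an external DTD, e.g. SYSTEM "foo.xdtd"
    internal_pe_refs: bool  # internal subset contains parameter entity references
    standalone: bool        # standalone='yes' in the XML declaration


def undeclared_ref_is_fatal(prolog: Prolog, name: str, declared: set[str]) -> bool:
    """Is a reference to an undeclared entity a well-formedness error?"""
    if name in declared or name in PREDEFINED:
        return False
    # The WFC applies only when every declaration is guaranteed to be visible to a
    # non-validating processor, or when the author promised standalone='yes'.
    # Otherwise an undeclared entity is only a validity error.
    all_decls_visible = not prolog.external_subset and not prolog.internal_pe_refs
    return all_decls_visible or prolog.standalone


foo_xml = Prolog(external_subset=True, internal_pe_refs=False, standalone=False)
print(undeclared_ref_is_fatal(foo_xml, "foo", set()))  # False: processor may skip &foo;
```

For foo.xml, then, a non-validating processor that never opens foo.xdtd is still within its rights.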

>Well-formed documents can have entities. In fact, all XML documents that
>have entities are well-formed.

In fact, technically, 
X>A data object is an XML document if it is _well-formed_, 
X>as defined in this specification
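A quick way to apply that definition is to hand the data to a non-validating parser and see whether it complains. This sketch uses Python's bundled expat binding; the helper name is mine. Expat checks well-formedness only, which by the definition quoted above is all a data object needs to count as an XML document.

```python
import xml.parsers.expat


def is_xml_document(data: str) -> bool:
    """True if expat (a non-validating parser) accepts data as well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final (and only) chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


print(is_xml_document("<TEST><TEST2/></TEST>"))  # True
print(is_xml_document("<TEST><TEST2></TEST>"))   # False: mismatched tag
print(is_xml_document("<TEST>&foo;</TEST>"))     # False: undeclared entity, no DTD
```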

>The semantics of XML DTDs are *fixed*, not extensible.
>Any parser that interpreted processing instructions as commands to change
>the parse would be *wrong*. But you propose that DTDs should become
>extensible. Since DTDs can change the parse (radically, in some cases),
>your proposal would allow DTD extensions to make documents specific to
>particular processors, unless an amended proposal explicitly disallows
>that.

I think you're dramatically misreading my argument, deliberately making this a 
bogeyman when it isn't.  I see no reason why malicious DTDs would be allowed 
to 'change the parse' any more than current DTDs would be.  Extensible DTDs do 
_not_ mean that anything goes.  Behavior can be proscribed, rules can be set.  
A DTD in this proposal would be allowed to add things to the the parse, not 
change the fundamental rules set in level 1.  Perhaps I should make this more 
explicit in the proposal - since the proposal is to 'map' XML DTD syntax to 
XML document syntax, it seemed reasonable to me that the same strictures 
demanded for processing an XML DTD would apply here.
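Here is roughly what I mean by "map", as a Python sketch. Each declaration in XML document syntax turns back into exactly one classic DTD declaration, and anything the converter doesn't recognize is refused rather than allowed to alter the parse. The <DTD> wrapper element and the set of accepted declaration kinds are assumptions for illustration; the element names come from the foo.xdtd example above.

```python
import xml.etree.ElementTree as ET


def to_classic_dtd(xdtd: str) -> list:
    """Translate XML-syntax declarations into ordinary DTD declarations."""
    decls = []
    for decl in ET.fromstring(xdtd):  # children of the hypothetical <DTD> wrapper
        name = decl.get("NAME")
        content = decl.find("CONTENT")
        if decl.tag == "ELEMENT-TYPE" and decl.find("EMPTY") is not None:
            decls.append(f"<!ELEMENT {name} EMPTY>")
        elif decl.tag == "ELEMENT-TYPE" and decl.findtext("MODEL"):
            model = decl.findtext("MODEL").strip()
            if not model.startswith("("):
                model = f"({model})"  # a bare name becomes a one-item sequence
            decls.append(f"<!ELEMENT {name} {model}>")
        elif decl.tag == "ENTITY" and content is not None and content.get("HREF"):
            decls.append(f'<!ENTITY {name} SYSTEM "{content.get("HREF")}">')
        else:
            # Anything outside the agreed set is refused, not guessed at.
            raise ValueError(f"unsupported declaration <{decl.tag}>")
    return decls


XDTD = """<DTD>
  <ELEMENT-TYPE NAME="TEST"><MODEL>TEST2</MODEL></ELEMENT-TYPE>
  <ELEMENT-TYPE NAME="TEST2"><EMPTY/></ELEMENT-TYPE>
  <ENTITY NAME="foo"><CONTENT HREF="entity.com?ID(FOO)"/></ENTITY>
</DTD>"""

for line in to_classic_dtd(XDTD):
    print(line)
```

Because every accepted declaration has an exact counterpart in today's DTD syntax, the same strictures that govern processing an XML DTD carry over unchanged.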

>DTD's don't modify the document syntax and markup, but they do modify the
>parse tree created by the document. In other words, they modify its
>semantics. If you replace DTDs with something "extensible", you must
>expect them to be able to modify the parse tree in extensible ways, unless
>you explicitly disallow this in your proposal. 

I don't think this is difficult; the types of extensions allowed can be 
limited to a reasonable set (data types, for instance) and expanded through 
the standards process when it appears necessary.  Not everyone may want to 
wait, of course, but they'd find a way to get around DTDs anyway.  I think 
you're going to see plenty of ersatz XML in practice - one of the great 
things about SAX is that people can put _any_ kind of parser underneath it and 
watch it spit out nice-looking XML on top.  As Chris Maden pointed out on 
another topic,

CM>Insight: XML != SAX.
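Chris's point is easy to demonstrate. This sketch uses Python's xml.sax.saxutils.XMLGenerator as the SAX ContentHandler and drives it from a plain list of dictionaries: no XML parser ever runs, yet what comes out the other end is tidy XML. The rows_to_sax helper and its element names are hypothetical.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def rows_to_sax(rows, handler):
    """Emit SAX events for a list of dicts; no XML parser is involved."""
    handler.startDocument()
    handler.startElement("table", AttributesImpl({}))
    for row in rows:
        handler.startElement("row", AttributesImpl({}))
        for key, value in row.items():
            handler.startElement(key, AttributesImpl({}))
            handler.characters(str(value))
            handler.endElement(key)
        handler.endElement("row")
    handler.endElement("table")
    handler.endDocument()


out = io.StringIO()
rows_to_sax([{"name": "TEST", "model": "TEST2"}], XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```

Anything downstream of the handler sees the same events it would get from a real parser, which is exactly why SAX output alone says nothing about whether the source was XML.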

>DTDs are schemata. They also do more. They can change the parse. That's
>what makes them complex and part of what makes *XML* complex. If you try
>to do everything that DTDs do, then your new language will also be
>needlessly complex. If you do not, then you are not replacing DTDs but
>rather inventing something new. It sounds like a new schema language to
>me.

Well, we'll see what happens.  This proposal is only starting out, and 
complexities always look simpler at the beginning.  Anyone who would like to 
help me figure out ways of expressing DTDs in XML document syntax is welcome 
to join this project - and suggestions for making sure that these DTDs don't 
change the parse in violent ways are also welcome.  Even if this solution 
isn't perfect, it opens up a lot of questions that are worth asking about the 
current way of doing things.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer / Cookies


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



