Between raw and cooked II: Are? DTDs are just for validation

Fri Apr 2 02:35:21 BST 1999

Perhaps DTDs are being used for too many purposes: validation on the
one hand, and defaulting attributes and defining entities on the other.

The argument is made that once a document has been validated, there is
no need to validate it again in a parser. Hence the concept of a
conforming rather than a validating parser. This is a good idea, but
the details of attribute defaults and entity definitions get in the
way: they live in the DTD, so a parser that skips the DTD can hand the
application different information than one that reads it.

So, let's divorce the idea of validity from parsing. Instead of using
a DTD, use a URI that identifies the structure the document conforms
to. A DTD cannot describe all of the restrictions on the structure of
elements in a document; the pattern syntax is too limited. It may take
the combination of validating against a DTD and then having an
application examine the resulting tree to truly define validity.
There's no way to specify the set of valid zip codes or Visa card
numbers in a DTD, but an application could verify them.
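As a rough illustration of the kind of check that sits beyond a DTD, here is a minimal Python sketch of application code run over an already-parsed tree. The element names `zip` and `card`, the use of the Luhn checksum for card numbers, and the accepted Visa lengths are my assumptions for illustration. The regular expression only checks the format of a ZIP code, not membership in the set of codes actually in use, which would need a lookup table.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: <order>, <zip> and <card> are illustrative names,
# not part of any real schema.
DOC = """<order>
  <zip>02139</zip>
  <card>4111111111111111</card>
</order>"""

# US ZIP or ZIP+4 *format* only (an assumption); real membership checks
# would need a table of codes in use.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def luhn_ok(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def check_order(root: ET.Element) -> list:
    """Application-level checks that a DTD content model cannot express."""
    errors = []
    zip_code = root.findtext("zip", default="")
    if not ZIP_RE.fullmatch(zip_code):
        errors.append(f"bad zip code: {zip_code!r}")
    card = root.findtext("card", default="")
    # Prefix "4" and these lengths are commonly cited for Visa; an assumption.
    if not (card.startswith("4") and len(card) in (13, 16, 19) and luhn_ok(card)):
        errors.append(f"bad Visa card number: {card!r}")
    return errors


print(check_order(ET.fromstring(DOC)))  # prints []
```

The DTD (if any) would still check that `<zip>` and `<card>` appear in the right places; only the application can say whether their contents make sense.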

A document may still reference a DTD, but that DTD contains only
default attribute values and entity definitions, not element
structure. The document doesn't declare how it is to be parsed
(validating or conforming).

The processing application that receives the document controls
parsing. It may just request conformance parsing, and its own code may
default attributes and expand entities. Or it may instruct the parser
to parse according to an application-specified DTD that the
application knows corresponds to the URI in the document.
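A minimal Python sketch of the first option, assuming the application keeps its attribute defaults in a hypothetical table in its own code rather than in ATTLIST declarations: the parser only checks well-formedness, and the application then fills in any attribute the document left unspecified. The element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults held in application code instead of a DTD;
# element and attribute names are invented for illustration.
ATTRIBUTE_DEFAULTS = {
    "price": {"currency": "USD"},
    "item": {"taxable": "yes"},
}


def parse_and_default(xml_text: str) -> ET.Element:
    """Parse for well-formedness only, then apply the application's defaults."""
    root = ET.fromstring(xml_text)  # ElementTree performs no DTD validation
    for elem in root.iter():
        for name, value in ATTRIBUTE_DEFAULTS.get(elem.tag, {}).items():
            # Fill in only attributes the document did not specify.
            elem.attrib.setdefault(name, value)
    return root


doc = parse_and_default('<item><price currency="EUR">10</price></item>')
print(doc.attrib)                # {'taxable': 'yes'}
print(doc.find("price").attrib)  # {'currency': 'EUR'}
```

Because the defaults are in the application, every parser configuration yields the same information content, which is the property the DTD route cannot guarantee when validation is optional.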

A URI identifying element structure does not have to have a
corresponding DTD. It may instead identify an application that has
been coded to process that structure, such as \\IRS\1998\ScheduleD.

Under this model:
1.	Conforming parsers can additionally read a DTD, but process only
its attribute and entity declarations.
2.	A document can certify that it conforms to a structure identified
by a URI (a certificate of authenticity). An application may be able
to associate the URI with a DTD, or the URI may select an application
that understands the structure.
3.	A validating parser can have a DTD specified to it by the
application using the parser, and will use the element structure
definitions in that DTD to validate the document.
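One way an application might act on points 2 and 3 is a registry keyed by structure URI, sketched below in Python. The `structure` attribute on the root element, the `urn:example:` URIs, `po.dtd`, and the handler are all hypothetical; the post does not say how a document would carry its URI.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical: the document names its structure with a 'structure'
# attribute on the root element. All URIs below are invented examples.
Handler = Callable[[ET.Element], str]


def process_schedule_d(root: ET.Element) -> str:
    # Stand-in for an application coded to understand this structure.
    return f"Schedule D with {len(root)} entries"


# Registry the application controls: a structure URI maps either to a DTD
# it would hand to a validating parser, or directly to code that
# understands the structure.
STRUCTURE_DTDS: Dict[str, str] = {"urn:example:purchase-order:1": "po.dtd"}
STRUCTURE_HANDLERS: Dict[str, Handler] = {
    "urn:example:irs:1998:schedule-d": process_schedule_d,
}


def dispatch(xml_text: str) -> str:
    root = ET.fromstring(xml_text)  # conformance (well-formedness) parse
    uri = root.get("structure")
    if uri in STRUCTURE_HANDLERS:
        return STRUCTURE_HANDLERS[uri](root)
    if uri in STRUCTURE_DTDS:
        # A real application would now validate against this DTD with a
        # validating parser of its choice; that step is not shown here.
        return f"validate against {STRUCTURE_DTDS[uri]}"
    raise ValueError(f"unknown structure URI: {uri!r}")


print(dispatch('<return structure="urn:example:irs:1998:schedule-d">'
               '<gain/><gain/></return>'))  # prints Schedule D with 2 entries
```

The document only names its structure; whether that leads to DTD validation or to dedicated code is decided entirely by the receiving application.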

A little food for thought,

Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com

----------
From:  Didier PH Martin [SMTP:martind at netfolder.com]
Sent:  Thursday, April 01, 1999 7:53 AM
To:  'XML Dev'
Subject:  RE: Between raw and cooked II: Are? DTDs are just for validation

Hi Jonathan,

<YourComment>
    If DTDs *were* only for validation there would be no issue here.
However, DTDs provide additional functionality beyond validation, namely
default attributes and entities. The problem is that XML parsers can
*choose* whether or not to validate, and in so doing the *information
content* of the XML document is altered.

    Validation is optional; the spec says so. Given this, the question
becomes: ought parsers be allowed to expand entities and default
attributes with validation turned off? What problem does this create?

    Perhaps the XML spec should properly specify that:

         *if* a DOCTYPE declaration is present which specifies a DTD,
then the document must be validated, else the parser must generate an
error. (DOCTYPE declarations would remain optional.)

    In this way document authors would be able to properly specify
information content.
</YourComment>
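The rule proposed above could be enforced roughly as follows. This is a Python sketch using the standard `xml.parsers.expat` module to detect a DOCTYPE declaration; the `can_validate` flag is a hypothetical stand-in for whether a validating parser is available (Python's standard library does not ship one), and treating an internal subset as "specifying a DTD" is my reading of the proposal.

```python
import xml.parsers.expat


class DTDPresentError(Exception):
    """Raised when a document specifies a DTD this processor cannot validate against."""


def check_doctype_policy(xml_text: str, can_validate: bool) -> None:
    """Sketch of the proposed rule: a DOCTYPE that specifies a DTD obliges validation."""
    parser = xml.parsers.expat.ParserCreate()
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Assumption: an external identifier or an internal subset both
        # count as "specifying a DTD".
        found["dtd"] = bool(system_id or has_internal_subset)

    parser.StartDoctypeDeclHandler = on_doctype
    # Expat does not fetch the external DTD here; we only note that it exists.
    parser.Parse(xml_text, True)
    if found.get("dtd") and not can_validate:
        raise DTDPresentError("document specifies a DTD but validation is unavailable")


check_doctype_policy("<order/>", can_validate=False)  # no DOCTYPE: accepted
```

A document without a DOCTYPE passes either way, which keeps DOCTYPE declarations optional as the proposal requires.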

<Reply>
Thanks for bringing the issue back to its source: the spec. The spec
says nothing about how to interpret a document. It just says how a
document is to be formatted, not how it is to be interpreted. Now that
real systems are being deployed, we see that there are holes in the
architecture. The hole is: what do we do with this? The answer depends
on the type of interpreter, such as:
a) browsers
b) ERP front ends and back ends
c) repositories
d) anything else I am not thinking of right now

There are no specs on how to interpret or parse a document in the
context of a browser. Your suggestion is a constructive one. You propose
that the next version of the spec reduce the ambiguity in the parsing
stage by including the parsing rules in the spec. The spec should also
reduce the ambiguity around external references, so to speak, by
explicitly stating whether a parser should treat the presence of a DTD
as a signal to validate the document. Currently this is left to the
discretion of the implementer, and no specification dictates the rules
of conduct.

Thanks, Jonathan, for a constructive comment. Any other constructive
opinions? I mean, any suggestions concerning the rules or, more
specifically, the spec?
</Reply>

Regards
Didier PH Martin
mailto:martind at netfolder.com
http://www.netfolder.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)