Between raw and cooked II: Are? DTDs are just for validation
Marc.McDonald at Design-Intelligence.com
Fri Apr 2 02:35:21 BST 1999
Perhaps DTDs are being used for too many purposes - validation and
defaulting attributes/defining entities.
The argument is made that once a document has been validated, there is
no need to validate it again in a parser. Hence the concept of a
conforming rather than validating parser. This is a good idea, but the
details of attribute defaults and entity definitions get in the way.
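To make the problem concrete, here is a minimal sketch using Python's expat binding. The document, the element name price and the attribute name currency are invented for illustration. The same well-formed document yields different attributes depending on whether the application lets the parser apply ATTLIST defaults.

```python
import xml.parsers.expat

# Hypothetical document: the internal DTD subset supplies a default for
# the "currency" attribute, which the <price> element itself omits.
DOC = """<?xml version="1.0"?>
<!DOCTYPE price [
  <!ATTLIST price currency CDATA "USD">
]>
<price>42</price>"""


def attributes_seen(doc, apply_defaults):
    """Return the attributes the parser reports for the root element."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # When specified_attributes is non-zero, expat reports only the
    # attributes written in the instance, not those derived from
    # attribute declarations.
    parser.specified_attributes = 0 if apply_defaults else 1

    def start(name, attrs):
        seen.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return seen[0]


with_defaults = attributes_seen(DOC, apply_defaults=True)
without_defaults = attributes_seen(DOC, apply_defaults=False)
```

The two results differ only in the defaulted attribute, which is exactly the part of the information content that depends on how the document was parsed.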
So, let's divorce the idea of validity from parsing. Instead of using
a DTD, use a URI that identifies the structure that the document
conforms to. A DTD cannot describe all of the restrictions on the
structure of elements in a document; the pattern syntax is too
limiting. It may take the combination of validating against a DTD and
then an application examining the resultant tree to truly define
validity. There's no way to specify the set of valid zip codes or Visa
card numbers in a DTD, but an application could verify them.
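As a rough sketch of that second, application-level pass: the code below parses for well-formedness only and then examines the resulting tree. The element names order, zip and card, and the tiny KNOWN_ZIPS set, are assumptions made up for the example. The card check uses the Luhn checksum plus the leading 4 of Visa numbers.

```python
import xml.etree.ElementTree as ET

# Stand-in for a real postal database; the values are illustrative only.
KNOWN_ZIPS = {"10001", "98052"}


def luhn_ok(number):
    """Luhn checksum, the check-digit scheme used by payment card numbers."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def check_order(xml_text):
    """Return the problems that a DTD content model could not express."""
    root = ET.fromstring(xml_text)  # well-formedness check only
    problems = []
    zip_code = root.findtext("zip", default="")
    if zip_code not in KNOWN_ZIPS:
        problems.append("unknown zip code: " + zip_code)
    card = root.findtext("card", default="")
    if not (card.startswith("4") and luhn_ok(card)):
        problems.append("invalid Visa card number")
    return problems
```

An empty list means the tree passed the checks the application cares about, whatever a DTD might or might not have said about element structure.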
A document may still reference a DTD, but that DTD contains only
default attribute values and entity definitions, not element structure.
The document doesn't declare how it is parsed (valid or conforming).
The processing application that receives the document controls
parsing. It may just request conformance parsing, and its own code may
default attributes and expand entities. Or it may instruct the parser
to parse according to an application-specified DTD that the
application knows corresponds to the URI in the document.
A URI identifying element structure does not have to have a
corresponding DTD. It may describe an application that has been coded
to process it, such as \\IRS\1998\ScheduleD.
Under this model:
1. Conforming parsers can additionally parse a DTD, but only its
attribute and entity declarations.
2. A document can certify it conforms to a structure identified by a
URI (certificate of authenticity). An application may be able to
associate the URI with a DTD, or the URI may select an application
that understands the structure.
3. A validating parser can have a DTD specified to it by the
application using the parser and will use the element structure
definitions in the DTD to validate the document.
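One way an application might act on points 2 and 3 is sketched below. How a document would carry its structure URI is not settled here, so the root attribute named structure is purely an assumption, as are the registry, the handler, the example.org URI and the file name invoice.dtd.

```python
import xml.etree.ElementTree as ET


def process_schedule_d(root):
    """Hypothetical handler coded to understand one structure."""
    return "handled by code: %d child elements" % len(root)


# Application-side registry. Each URI maps either to a DTD that the
# application would hand to a validating parser, or to code that
# understands the structure directly.
REGISTRY = {
    r"\\IRS\1998\ScheduleD": ("handler", process_schedule_d),
    "http://example.org/structures/invoice": ("dtd", "invoice.dtd"),
}


def dispatch(xml_text):
    """Decide how to treat a document based on the URI it claims."""
    root = ET.fromstring(xml_text)  # conformance parse only
    uri = root.get("structure")     # assumed location of the URI
    kind, target = REGISTRY.get(uri, ("unknown", None))
    if kind == "handler":
        return target(root)
    if kind == "dtd":
        # The validating step itself is left to whichever validating
        # parser the application uses; only the choice is shown here.
        return "validate against " + target
    return "unknown structure: conformance parse only"
```

The point of the design is that the document only names a structure; the receiving application decides whether that name leads to a DTD, to code, or to nothing at all.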
A little food for thought,
Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com
----------
From: Didier PH Martin [SMTP:martind at netfolder.com]
Sent: Thursday, April 01, 1999 7:53 AM
To: 'XML Dev'
Subject: RE: Between raw and cooked II: Are? DTDs are just for validation
Hi Jonathan,
<YourComment>
If DTDs *were* only for validation there would be no issue here.
However DTDs provide additional functionality beyond validation,
namely default attributes and entities. The problem exists in that
XML parsers can *choose* whether or not to validate and in so doing
the <em>information content</em> of the XML document is altered.
Validation is optional. Says so. Given this, the question becomes:
ought parsers be allowed to expand entities and default attributes
with validation turned off? What problem does this create?
Perhaps the XML spec should properly specify that:
*if* a DOCTYPE declaration is present which specifies a DTD then
the document must be validated else the parser must generate an
error. (DOCTYPE declarations would remain optional).
In this way document authors would be able to properly specify
information content.
</YourComment>
<Reply>
Thanks for bringing the issue back to its source: the spec. The spec
says nothing about how to interpret a document. It just says how a
document is to be formatted, not how it is to be interpreted. Now that
real stuff is going out, we see that there are holes in the
architecture. The holes being: what do we do with this? The answer
depends on the type of interpreter, such as:
a) browsers
b) ERP front ends and back ends
c) repositories
d) any other stuff I am not thinking of right now
There are no specs on how to interpret or parse a document in the
context of a browser. Your suggestion is a constructive one. You
propose that the next spec version reduce the ambiguity at the parsing
stage by including the parsing rule in the spec. The spec should also
reduce the ambiguity with external references, that is, explicitly
state whether a parser should consider the presence of a DTD as a
signal to validate the document. Currently this is left to the mercy
of the implementer, and no specifications are available to dictate
the rules of conduct.
Thanks, Jonathan, for a constructive comment. Any other constructive
opinions? I mean here, any suggestions concerning the rules or, more
specifically, the specs?
</Reply>
Regards
Didier PH Martin
mailto:martind at netfolder.com
http://www.netfolder.com
xml-dev: A list for W3C XML Developers. To post,
mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on
CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)