Re: WF, V, and MSXML
Peter Murray-Rust
Peter at ursus.demon.co.uk
Tue Jun 10 23:36:06 BST 1997
In message <011290D45A8ACF119B8B00805FD471D6033DA955 at RED-24-MSG.dns.microsoft.com> David Schach writes:
[...]
> [David Schach] The XML spec seems to address this issue in
> section 2.20 Required Markup Declaration.
My problem is whether or not the words 'parse', 'process' and 'validate'
are equivalent. I hope this isn't being seen as mindless pickiness.
>
> In an RMD, the value NONE indicates that an XML
> processor can parse the document correctly without first reading any
                ^^^^^
If RMD=NONE then the document cannot be validated. Therefore "parse"!="validate"
> part of the DTD. The value INTERNAL indicates that the XML processor
> must read and process the internal subset of the DTD, if provided, to
                ^^^^^^^
Presumably this means extracting the structure of the DTD in order to 'process' the document.
> parse the containing document correctly. The value ALL indicates that
> the XML processor must read and process the declarations in both the
                                  ^^^^^^^
i.e. interpret the DTD subset(s)
> subsets of the DTD, if provided, to parse the containing document
                                      ^^^^^
> correctly.
>
> ...
>
> If no RMD is provided, an XML processor must behave as
> though an RMD had been provided with the value ALL. [David Schach]
> (emphasis added)
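The RMD rule quoted above amounts to a lookup from the declared value to the DTD subsets a processor must read before it can parse the document. A minimal Python sketch, assuming the three values and the ALL default exactly as quoted; `required_subsets` is a hypothetical name, not part of any spec or library:

```python
from typing import Optional

# DTD subsets an XML processor must read and process for each RMD value,
# following the draft wording quoted above (hypothetical helper, not a real API).
_REQUIRED = {
    "NONE": frozenset(),                         # parse without reading any of the DTD
    "INTERNAL": frozenset({"internal"}),         # read the internal subset, if provided
    "ALL": frozenset({"internal", "external"}),  # read both subsets, if provided
}


def required_subsets(rmd: Optional[str]) -> frozenset:
    """Return the DTD subsets that must be read for a given RMD value.

    A missing RMD (None) is treated as though RMD="ALL" had been given.
    """
    if rmd is None:
        rmd = "ALL"
    try:
        return _REQUIRED[rmd]
    except KeyError:
        raise ValueError(f"unknown RMD value: {rmd!r}") from None


if __name__ == "__main__":
    for value in ("NONE", "INTERNAL", "ALL", None):
        print(value, sorted(required_subsets(value)))
```

Nothing in the table says anything about validation: it only states what must be read in order to *parse*, which is why RMD="NONE" and "validate" sit uneasily together.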
Here is a possible document:
<?XML VERSION="1.0" RMD="INTERNAL"?> <!-- Parser, you have to parse me -->
<!DOCTYPE FOO [
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO XYZZY CDATA #FIXED "Y2">
]> <!-- my internal subset is for adding Attvals -->
<FOO BAR="PLUGH"/>
Now, on the argument above (the document is in control), the processor parses
the document. It cannot be valid, because BAR is not declared for FOO, but does
the processor try to validate it? If so, it fails, and the result is either a
null document *or* error recovery to WF parsing.
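Why the document "cannot be valid" can be made concrete: the internal subset declares only XYZZY for FOO, so BAR="PLUGH" is an undeclared attribute. A toy Python sketch of that attribute check, assuming the declarations have already been extracted into a dict (no real DTD parsing is done; `ATTLISTS` and `attribute_errors` are hypothetical names):

```python
from typing import Dict, List, Optional, Tuple

# Hand-coded declarations from the example's internal subset:
#   <!ELEMENT FOO EMPTY>
#   <!ATTLIST FOO XYZZY CDATA #FIXED "Y2">
# element -> {attribute -> (type, default keyword, default value)}
ATTLISTS: Dict[str, Dict[str, Tuple[str, str, Optional[str]]]] = {
    "FOO": {"XYZZY": ("CDATA", "#FIXED", "Y2")},
}


def attribute_errors(element: str, attrs: Dict[str, str]) -> List[str]:
    """Check one start tag's attributes against the declarations (validity only)."""
    declared = ATTLISTS.get(element, {})
    errors: List[str] = []
    for name, value in attrs.items():
        if name not in declared:
            errors.append(f"{element}: attribute {name} is not declared")
            continue
        _type, keyword, default = declared[name]
        if keyword == "#FIXED" and value != default:
            errors.append(f"{element}: {name} must be {default!r}, got {value!r}")
    for name, (_type, keyword, _default) in declared.items():
        if keyword == "#REQUIRED" and name not in attrs:
            errors.append(f"{element}: required attribute {name} is missing")
    return errors


if __name__ == "__main__":
    print(attribute_errors("FOO", {"BAR": "PLUGH"}))
    # A tag that uses only the declared attribute passes this check.
    print(attribute_errors("FOO", {"XYZZY": "Y2"}))
```

A validating processor that gets a non-empty list back is then left with the two choices above: give up, or recover to well-formed parsing.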
If the parser does not try to validate, the result is:
<FOO XYZZY="Y2" BAR="PLUGH"/>
However, although the spec [5] distinguishes validating and non-validating
processors, in other places (e.g. [2.8]) it uses the phrase 'reads the
DTD'. This implies that there are (possibly) three classes of processor:
- a validator (which must always read the DTD)
- a busy non-validator (which reads the DTD not for validation, but for
extracting DTD-based markup)
- a lazy non-validator (which does not read the DTD).
The lazy non-validator will produce different output from the busy
non-validator, namely:
<FOO BAR="PLUGH"/>
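The output difference between the two non-validators can be reproduced with a toy model. This is a sketch only: the declarations are hand-coded rather than parsed, and `ATTLISTS`, `NonValidator` and `report_empty_tag` are hypothetical names. The validator is left out because, for this document, it rejects the input as discussed above.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

# Hand-coded from the example's internal subset:
#   <!ATTLIST FOO XYZZY CDATA #FIXED "Y2">
# element -> {attribute -> (default keyword, default value or None)}
ATTLISTS: Dict[str, Dict[str, Tuple[str, Optional[str]]]] = {
    "FOO": {"XYZZY": ("#FIXED", "Y2")},
}


class NonValidator(Enum):
    BUSY = "busy"  # reads the DTD subset(s), but only to pick up attribute defaults
    LAZY = "lazy"  # never reads the DTD


def report_empty_tag(kind: NonValidator, element: str, attrs: Dict[str, str]) -> str:
    """Return the empty-element tag as this kind of non-validator would report it."""
    # A lazy processor has no declarations to consult; a busy one uses the ATTLISTs.
    declared = {} if kind is NonValidator.LAZY else ATTLISTS.get(element, {})
    merged: Dict[str, str] = {}
    for name, (_keyword, default) in declared.items():
        # #FIXED and plain defaults carry a value; #IMPLIED/#REQUIRED carry None.
        if default is not None and name not in attrs:
            merged[name] = default
    merged.update(attrs)  # attributes written in the tag are kept as given
    parts = [element] + [f'{name}="{value}"' for name, value in merged.items()]
    return "<" + " ".join(parts) + "/>"


if __name__ == "__main__":
    tag_attrs = {"BAR": "PLUGH"}  # from <FOO BAR="PLUGH"/>
    for kind in NonValidator:
        print(kind.value, report_empty_tag(kind, "FOO", tag_attrs))
```

The two branches differ only in whether the declarations are consulted at all, which is the 'does not read' versus 'reads and forgets' question.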
The lazy non-validator could be in violation of the spec if the RMD requires
it to parse the DTD subset(s). Maybe it parses them but throws them away
(i.e. 'does not read' == 'reads and forgets').
P.
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)