Re WF, V, and MSXML

Peter Murray-Rust Peter at ursus.demon.co.uk
Tue Jun 10 23:36:06 BST 1997


In message <011290D45A8ACF119B8B00805FD471D6033DA955 at RED-24-MSG.dns.microsoft.com> David Schach writes:
[...]
> 	[David Schach]  The XML spec seems to address this issue in
> section 2.20 Required Markup Declaration. 

	My problem is whether the words 'parse', 'process' and 'validate'
are equivalent or not.  I hope this isn't being seen as mindless pickiness.

> 
> 		In an RMD, the value NONE indicates that an XML
> processor can parse the document correctly without first reading  any
                ^^^^^
If RMD=NONE then the document cannot be validated.  Therefore "parse"!="validate"

> part of the DTD.  The value INTERNAL indicates that the XML processor
> must read and process the internal subset of the DTD, if provided, to
                ^^^^^^^
Presumably means extract the structure of the DTD for 'processing' the document.

> parse the containing document correctly.  The value ALL indicates that
> the XML processor must read and process the declarations in both the
                                  ^^^^^^^
i.e. interpret the DTD subset(s)

> subsets of the DTD, if provided, to parse the containing document
                                      ^^^^^
> correctly.
> 
> 		...
> 
> 		If no RMD is provided, an XML processor must behave as
> though an RMD had been provided with the value ALL.    [David Schach]
> (emphasis added) 
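
As a rough sketch of how I read the quoted wording, the RMD value can be
treated as a lookup from value to "which DTD subsets must be read before
parsing".  The function name and the "internal"/"external" labels are my own
illustrative choices, not anything defined by the spec.

```python
# Hypothetical helper: models the quoted RMD wording only.
# The name and the "internal"/"external" labels are assumptions.

def required_dtd_subsets(rmd=None):
    """Return the DTD subsets a processor must read for a given RMD value."""
    if rmd is None:
        # "If no RMD is provided ... behave as though ... the value ALL."
        rmd = "ALL"
    table = {
        "NONE": frozenset(),                          # no part of the DTD needed
        "INTERNAL": frozenset({"internal"}),          # internal subset only
        "ALL": frozenset({"internal", "external"}),   # both subsets
    }
    if rmd not in table:
        raise ValueError("unknown RMD value: %r" % (rmd,))
    return table[rmd]
```

Note that this says nothing about validation; it only records what must be
read in order to 'parse', which is exactly the distinction in question.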

Here is a possible document:

<?XML VERSION="1.0" RMD="INTERNAL"?> <!-- Parser, you have to parse me -->
<!DOCTYPE FOO [                      
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO XYZZY CDATA #FIXED "Y2"> 
]>                  <!-- my internal subset is for adding Attvals -->
<FOO BAR="PLUGH"/>

Now, on the argument above (document is in control) the processor parses the 
document.  It cannot be valid (BAR is not declared for FOO), but does the
processor try to validate it?  If yes, validation fails, and the result is
either a null document *or* error recovery to WF parsing.
If the parser does not try to validate, the result is

<FOO XYZZY="Y2" BAR="PLUGH"/>

However, although the spec [5] mentions validating and non-validating
processors, in other places (e.g. [2.8]) it uses the phrase 'reads the 
DTD'.  This implies that there are (possibly) three classes of processor:

- a validator (which must always read the DTD)
- a busy non-validator (which reads the DTD not for validation, but for 
	extracting DTD-based markup)
- a lazy non-validator (which does not read the DTD).

The lazy non-validator will produce a different output from the busy 
non-validator, i.e.:

<FOO BAR="PLUGH"/>

The lazy non-validator could be in violation of the spec if the RMD requires
it to parse the DTD subset(s).  Maybe it parses them but throws them away
(i.e. 'does not read' == 'reads and forgets').
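
To make the three classes concrete, here is a toy model of the FOO document
above.  It is not a parser: the ATTLIST is hand-copied into two dictionaries,
and every name in it is hypothetical.  It only illustrates how the three
behaviours could diverge on the same start-tag.

```python
# Toy model of the sample document; all names here are hypothetical.
# The two tables are hand-copied from <!ATTLIST FOO XYZZY CDATA #FIXED "Y2">.
DECLARED_ATTRS = {"FOO": {"XYZZY"}}
FIXED_VALUES = {"FOO": {"XYZZY": "Y2"}}

def lazy_non_validator(name, attrs):
    """Does not read the DTD: attributes pass through unchanged."""
    return dict(attrs)

def busy_non_validator(name, attrs):
    """Reads the DTD only to add DTD-based markup (here, the fixed attribute)."""
    merged = dict(FIXED_VALUES.get(name, {}))
    merged.update(attrs)  # attributes given in the document are kept as-is
    return merged

def validator(name, attrs):
    """Reads the DTD and rejects attributes that were never declared."""
    undeclared = set(attrs) - DECLARED_ATTRS.get(name, set())
    if undeclared:
        raise ValueError("undeclared attribute(s): %s" % sorted(undeclared))
    return busy_non_validator(name, attrs)

def serialize(name, attrs):
    """Write an empty-element tag with attributes in dictionary order."""
    pairs = "".join(' %s="%s"' % (k, v) for k, v in attrs.items())
    return "<%s%s/>" % (name, pairs)
```

With attrs = {"BAR": "PLUGH"}, serialize("FOO", busy_non_validator("FOO", attrs))
gives the busy output shown earlier, the lazy variant gives the bare start-tag,
and validator("FOO", attrs) raises because BAR is not declared.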

	P.



-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



