MSXML, WF, and Validity

Peter Murray-Rust Peter at
Sun Jun 8 20:33:14 BST 1997

In message <199706071847.LAA31370 at> Terry Allen writes:
> Jean Paoli wrote:
> | The Microsoft XML Parser is a validating XML parser written in Java. 
> | Once parsed, the XML document is exposed as a tree through a simple set
> | of Java methods. 
> After playing with it for awhile this morning I found myself wondering
> about WF and validity; I don't know if the following counts as a bug,
> but it would be useful to hear what others think.

I have worried about this as well - I may have mentioned it on the XML-WG.
I don't think it's a bug, but rather that the spec does not give a clear 
guideline on *when* validation is expected.  I am sure some ERB members will
see this discussion.

> My input is:
> <?XML version="1.0" encoding="UTF-8" ?>
> <!doctype book [
> <!element book (title, chapter+)>
> <!entity foo "bar">
> ]>
> <book><title>Palmy Days</title>
> <chapter><title>One Frond at a Time</title>
> <para>It was a dark and stormy night.  The crows clattered
> amongst the fronds.  
> </para>
> <para>&foo;</para>
> </chapter>
> </book>

IMO this is a WF document, but not a valid one.

> I stuck the DTD in the internal subset because I couldn't get the
> parser to find an external DTD.  The output of 
>   jview msxml -d palmy
> is
[... normalised expanded prettyprinted output deleted...]

> Now the declarations in the internal subset have been read (and munged),
> and the foo:bar entity expansion has been performed.  Yet the instance
> does not conform to the "DTD" in the internal subset, although taken
> on its own it is well formed.  Is the input file "palmy" a valid
> XML document?  The VC comment following [36] indicates not.  Is it
> WF?  I can't find a WF comment indicating that the document must
It's certainly WF as far as I see it.

> conform to the DTD (which is reasonable, although perhaps this point
> should be covered explicitly).  Is MSXML only parsing "palmy" as WF?
> If not, is this error recovery?
> These (real, not rhetorical) questions are of interest whether or
> not this is the intended behavior of MSXML.
My view is based on Norbert's NXP, which has a command-line switch -v
(i.e. require validation).  This is run client-side.  IOW, had the document
above been run through NXP, it would have been passed as WF, and failed
IFF the -v flag was set.
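
The effect of such a switch can be sketched with a present-day JAXP parser.
This is only an illustration under stated assumptions: it uses neither NXP
nor MSXML, the class and method names are hypothetical, and Terry's input is
respelled in the syntax of the final XML 1.0 Recommendation (lowercase
"xml" in the declaration, uppercase DOCTYPE, ELEMENT and ENTITY), with the
first paragraph shortened.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: parses a document with validation switched on or off.
final class WfVersusValid {

    // The "palmy" input, respelled in XML 1.0 Recommendation syntax (assumption).
    static final String PALMY =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE book [\n"
        + "<!ELEMENT book (title, chapter+)>\n"
        + "<!ENTITY foo \"bar\">\n"
        + "]>\n"
        + "<book><title>Palmy Days</title>\n"
        + "<chapter><title>One Frond at a Time</title>\n"
        + "<para>It was a dark and stormy night.</para>\n"
        + "<para>&foo;</para>\n"
        + "</chapter>\n"
        + "</book>\n";

    // Returns the validity errors reported by the parser.
    // Throws IllegalStateException if the document is not well formed.
    static List<String> validityErrors(String xml, boolean validate) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate); // plays the role of a -v style switch
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                // Validity violations arrive here as recoverable errors.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness violations are fatal.
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("not well formed or not parseable", e);
        }
        return errors;
    }
}
```

Calling validityErrors(PALMY, false) should return an empty list, since the
document is WF; with true, the list should contain complaints about the
undeclared title, chapter and para elements.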

There are three possible places to request validation:
	- at author level (i.e. some instruction in the document stating that
		the document is validatable).  The ERB may wish to include this
		as a component in the XMLDecl or RMDecl (or elsewhere).
	- at human client level (e.g. -v in NXP)
	- at software/application level (i.e. this software will ONLY work
		with valid documents)

Note that an internal subset may be present for reasons other than validation
(adding attribute values and types, as required for XML-LINK, for example).
Therefore I do not think the author's intentions can be deduced from the
presence of an internal subset.  Presumably a pointer (SYSTEM) to an
external DTD is likely to refer to a DTD which can be used for validation, but
I'm not sure whether this is explicit.
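
To illustrate an internal subset that serves a purpose other than
validation, here is a minimal sketch using a present-day JAXP parser (not
MSXML or NXP).  The element name "ref", the attribute name "kind" and the
class name are hypothetical and are not taken from XML-LINK.  The internal
subset declares only an attribute default, so the document could never be
validated against it, yet a non-validating parse still supplies the default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: reads an attribute default from the internal subset.
final class InternalSubsetDefaults {

    // No element declarations at all: the subset only adds an attribute default.
    static final String DOC =
          "<!DOCTYPE note [\n"
        + "<!ATTLIST ref kind CDATA \"simple\">\n"
        + "]>\n"
        + "<note><ref>see chapter one</ref></note>";

    // Parses without validation and returns the "kind" attribute of the first ref.
    static String defaultedKind(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // well-formedness checking only
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            Element ref = (Element) doc.getElementsByTagName("ref").item(0);
            return ref.getAttribute("kind");
        } catch (Exception e) {
            throw new IllegalStateException("not well formed or not parseable", e);
        }
    }
}
```

Here defaultedKind(DOC) should return the declared default even though the
instance never writes the attribute.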

In summary, I think that MSXML is capable of validation - I'm not clear whether
it *always* tries to validate and, if it can't, simply falls back to checking
for WF.
I think we need guidance on this.


Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences

xml-dev: A list for W3C XML Developers