Re WF, V, and MSXML

Peter Murray-Rust Peter at ursus.demon.co.uk
Mon Jun 9 11:30:07 BST 1997


In message <199706082339.QAA08654 at bolt.sonic.net> Terry Allen writes:
> 
> Peter Murray-Rust replying to me to him etc.
[... and hoping the WG/ERB are reading this ...]
> [Terry:]
[...]
> 
> Right.  That's why the IETF assigns such importance to running code.

Good point.  That is why XML-DEV is important and why we need people to
create prototypes at this stage.  [Most XML-related software and documents
come into this category because the problems we are encountering may have 
implications for the language.]

[...]
> | 
> | I think this is more a question of terminology.  NXP (Norbert Mikula) is a
> | 'validating parser', but the validation can be switched off.  This is a
> | client-side decision.  So with NXP 'palmy' could be either invalid or WF
> | according to the reader's wishes
> 
> Agreed, but from the viewpoint of the document preparer, it is both.  MSXML
> needs the switch NXP has.  I think the behavior is unintentional, but
> I would be alarmed at a processor/parser (they mean the same to me in
> this context) that attempted to parse for validity, and if it found
> an error, silently switched to WF-parse mode.

I'd agree with this analysis, and haven't been silent on the issue.  IMO it 
is more important for the WG/ERB to address *this* problem than some of the 
proposed extensions.  The concept of WFness is NEW!!  It is more subtle than
people realise.  A fundamental problem is that there is no clear internal
flag in the document stating what the validity/WFness of the current document
is, is meant to be, was, etc.  As Terry says, it's particularly likely that
a WF document could (possibly erroneously) mutate into a valid one.  I am
sure that any confusion about MSXML is not intentional and is due to the issue
not being prominent in the spec.  

<PROPOSAL>
All parsers (i.e. tools that take XML documents and apply the criteria in 
XML-LANG only) should state their attitude and behaviour to WFness and validity.
</PROPOSAL>

The possible options include at least:
	- nsgmls-like.  Full validation is the only option.  Any non-valid
		document is flagged and appropriate error messages or error
		actions are initiated.  
	- Lark-like (at least V0.88 - I think there is another coming).  No
		validation can be attempted.  Any 'output' can only be WF or
		in error.  NOTE: what does Lark do with the internal subset?
	- NXP-like.  Validation can be switched on or off by the 'client'.
		How this is transmitted to the application is application
		dependent at present.
	- MSXML-like.  Undocumented at present.  Possibly [though Terry and I
		hope not] validating by default, and changing to WF if this
		fails.
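
To make the proposal concrete, here is a minimal sketch of a processor
front end that always states which mode it ran in and never downgrades a
failed validation to a WF result.  Everything in it is an assumption made
for illustration: the names Mode, Report and process are hypothetical, the
WF check is delegated to Python's xml.etree.ElementTree (a non-validating
parser), and the 'validator' is a stand-in callable supplied by the client,
not a real DTD validator.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    VALIDATING = "validating"   # nsgmls-like, or NXP with validation on
    WF_ONLY = "wf-only"         # Lark-like, or NXP with validation off


@dataclass
class Report:
    mode: Mode                  # what the client asked for, always stated
    well_formed: bool
    valid: Optional[bool]       # None means validity was never checked
    message: str = ""


def process(text: str, mode: Mode,
            validator: Optional[Callable[[ET.Element], bool]] = None) -> Report:
    """Check a document in exactly the requested mode (hypothetical API)."""
    if mode is Mode.VALIDATING and validator is None:
        raise ValueError("validating mode needs a validator")
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # Not well-formed: in validating mode it is therefore also not valid.
        valid = False if mode is Mode.VALIDATING else None
        return Report(mode, False, valid, str(exc))
    if mode is Mode.WF_ONLY:
        return Report(mode, True, None)
    if validator(root):
        return Report(mode, True, True)
    # The crucial case: report 'not valid' and keep the mode unchanged,
    # rather than silently falling back to a WF-only result.
    return Report(mode, True, False, "well-formed but not valid")


def no_href(root: ET.Element) -> bool:
    """Toy stand-in for validation: reject an undeclared HREF attribute."""
    return "HREF" not in root.attrib


report = process('<FOO HREF="bar"/>', Mode.VALIDATING, no_href)
print(report.mode.value, report.well_formed, report.valid, report.message)
```

The point of the design is that the application can always tell which of
the options above it got: the mode is carried in the result, and 'valid'
is None only when validation was never attempted.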
> 
[...]
> Point taken; but the spec is not entirely clean on this point.  If the
> application requests the processor to process, the processor must
> inform the application of certain things.  And it is hard to get
> around
> 
> "*An XML processor which does not read the DTD must always pass all 
> characters in a document that are not markup through to the application.* 

Ah!  I had assumed the internal subset counted as 'markup' - you see it as part
of the document.  We need a ruling on this :-).  Obviously if the DTD appears
***in the processed document***, then it could be interpreted as having been
read and used for validation.

[...]
> 
> | what is the implied structure of the document in:
> | 
> | <!DOCTYPE FOO [
> | <!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
> | ]>
> | <FOO HREF="bar"/>
> | 
> | Can we assume that FOO (which has no Element declaration) has an ATTLIST as
> | given, and that therefore it inherits the SHOW and ACTUATE attributes?
> | IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
> | internal subset?
> 
> No, not per XMLlang alone.  FOO's only declared attribute has as its name

My mistake.  I shouldn't have brought the others in.

> the unreserved string "XML-LINK" although it uses an undeclared attribute
> name "HREF".  So it is WF but not valid.

Agreed.
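
For anyone who wants to see what one non-validating processor actually
hands to the application for the FOO example, here is a small sketch.  It
assumes Python's xml.etree.ElementTree, which sits on top of expat; as far
as I know expat reads the internal subset and supplies declared attribute
defaults, but that is the behaviour of one tool, not a ruling on what
XML-LANG requires.  SHOW and ACTUATE are not declared in this subset, so
nothing here bears on them.

```python
import xml.etree.ElementTree as ET

# The document from the quoted example, unchanged.
DOC = """<!DOCTYPE FOO [
<!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
]>
<FOO HREF="bar"/>"""

# ElementTree only checks well-formedness; it does not validate,
# so the undeclared HREF attribute is accepted without complaint.
root = ET.fromstring(DOC)

# Copy the attributes the parser reported for FOO, both those written
# in the instance and any supplied from the ATTLIST declaration.
attributes = dict(root.attrib)
print(root.tag, attributes)
```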

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



