Conformance in XML processors

Sat Jan 17 16:26:53 GMT 1998

I apologize in advance for being somewhat acerbic.  I think there
are areas where the PR (the XML 1.0 Proposed Recommendation) could be
clearer, in particular what gets passed to the app, but what is required
by way of DTD handling is crystal clear, and blatantly incorrect statements
being presented as facts at this point in history are dangerous and rather
irritating. -Tim

At 11:10 AM 17/01/98, Peter Murray-Rust wrote:

>One design goal (4 in spec) is that it should be "easy to write programs
>which process XML documents".  If that is interpreted that it is "easy to
>write software that processes *all* XML documents, throwing errors wherever
>one is required", then that goal is already lost. For example, James Clark
>has come up with about 140 carefully incorrect XML documents 

... and both James' processor and Lark detect all 164 errors, modulo
to-be-fixed ambiguities on weird boundary conditions.  I will
be astounded if, in the not-too-distant future, due to input from
Microsoft and Netscape, every desktop doesn't come with a couple of fully
conformant XML processors built-in.  Yes, I agree that we didn't do
as well on that design goal as I would have liked; but the empirical
fact is that the software is already there.

>However, I think there will be domains where the full functionality (or at
>least the full syntax) of XML will not be used. In that case there will be
>"simple tools" that process XML documents. Not *all* XML documents, but a
>lot.

If there are widely-available fully-conformant processors which are
already there in the browser and OS, why would you want to use a 
"simple tool" which will fail to accept conformant documents?  Seems
like a way to lose customers, to me.

> It seems to me reasonable that these tools can tell the user if they
>can't process a document.

It seems highly unreasonable to me; if I create a legal XML document
in my nice Frame or Arbortext or SoftQuad software, and send it to you,
and you say "oooh icky, that's too complicated for poor little me" you
can expect vehement and sincere complaints.

>But I suspect there will be a number of tools which don't support the whole
>spec 

I doubt it.  Oops, clarification: there will be tons of tools which
don't validate.  But when it is the case that both major browsers
accept all conformant documents and turf non-WF docs, then there will
be de facto a culture that will be intolerant of broken tools.
Thank goodness.
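
A minimal sketch of that line between WF and non-WF, using Python's bundled expat bindings (a non-validating processor) as a stand-in for the browser processors mentioned above.  The helper name `is_well_formed` is hypothetical.

```python
import xml.parsers.expat

def is_well_formed(text):
    """Return True if expat parses `text` without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
        return True
    except xml.parsers.expat.ExpatError:
        return False

print(is_well_formed("<doc><p>fine</p></doc>"))    # True
print(is_well_formed("<doc><p>broken</doc></p>"))  # False: mismatched end tags
```

No DTD and no validation are involved: any conformant processor has to reject the second document.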

> We have frequently talked about the Desperate Perl Hacker
>writing tools which are sufficient to process a class of XML documents, but
>not all. 

Yes, but they don't claim to be XML processors.  And that's just fine.

>A Document + DTD + request to validate document. Requires a validating parser.

Right.

>B Document + full DTD but no request to validate. 

Right.  We assume this document is WF, right?

>C Document + parts of a DTD (e.g. a few ELEMENTs and ATTLISTs, maybe an
>external subset which covers some of the ELEMENTs in the document).

If there is no request to validate, missing <!ELEMENT declarations
are not required to have any effect, and applications must not depend
on any behavior contingent on the processing of an <!ELEMENT or
<!NOTATION declaration.
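
To illustrate, here is a sketch, again assuming Python's expat bindings as the non-validating processor.  The internal subset declares a content model that the document then breaks.  A validating processor would have to report the breakage, but a non-validating one reads the declaration and carries on.  `element_names` is a hypothetical helper.

```python
import xml.parsers.expat

# <a> is declared to contain exactly one <b>, but the instance puts a <c> inside it.
DOC = """<!DOCTYPE a [
  <!ELEMENT a (b)>
  <!ELEMENT b EMPTY>
]>
<a><c/></a>"""

def element_names(text):
    """Return element names in document order, as a non-validating parse sees them."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(text, True)  # no error: content models are not checked
    return names

print(element_names(DOC))  # ['a', 'c']
```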

>D Document with no internal or external subset. Can only be well-formed.

Right.

>What the difference between A and B is is not clear to me.  

Only the request to validate.  Lots of WF docs will in fact be valid,
but be called WF simply because some app has no need to validate.

>Note that Lark and AElfred both throw errors for 
><!DOCTYPE FOO SYSTEM "bar.dtd">
>if bar.dtd cannot be found. 

No.  If you do lark.processExternalEntities(false) then it won't
try to fetch the DTD.  (Since "file:" URLs are in general a pool of
blood on Microsoft operating systems, I recommend doing this most
of the time.)
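
Lark's own API isn't reproduced here.  As an analogous sketch, Python's expat bindings can be told never to read the external subset, so a document whose "bar.dtd" does not exist still parses.  Expat never opens files itself; with parameter-entity parsing set to NEVER it does not even ask the application for the external subset.  `parse_without_dtd` is a hypothetical helper.

```python
import xml.parsers.expat

# "bar.dtd" does not exist anywhere; the document is still well-formed.
DOC = '<!DOCTYPE FOO SYSTEM "bar.dtd"><FOO/>'

def parse_without_dtd(text):
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # Make the default explicit: never read the external DTD subset or
    # external parameter entities.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    parser.Parse(text, True)
    return seen

print(parse_without_dtd(DOC))  # ['FOO']
```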

>C is similar to B, but validation is not possible. It is *essential* that
>if ATTLISTs and ENTITYs (and NOTATION) exist, then the information in them
>MUST be applied to the document. 

No.  The spec is clear; a non-validating processor is required to
expand internal entities and supply default attribute values.  Nobody
should expect one to do anything with notations or unparsed entities or
anything else.  If you want that, get a validating processor.
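
Both required behaviors can be seen in a small sketch, assuming Python's expat bindings as the non-validating processor.  The entity `co`, its value "ACME", the `status` attribute, and the helper `parse_memo` are all hypothetical.  The internal entity is expanded into the character data, and the declared default shows up as an attribute even though the document never specifies it.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE memo [
  <!ENTITY co "ACME">
  <!ATTLIST memo status CDATA "draft">
]>
<memo>From &co;</memo>"""

def parse_memo(text):
    attrs_seen = {}
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes defaults to False, so defaulted attributes are reported too.
    parser.StartElementHandler = lambda name, attrs: attrs_seen.update(attrs)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(text, True)
    return attrs_seen, "".join(chunks)

print(parse_memo(DOC))  # ({'status': 'draft'}, 'From ACME')
```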

>*IFF* an ENTITY is declared (case C), the parser MUST process it.

If it's a non-validating processor, this is only true for *internal*
entities.
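
A sketch of the internal/external split, again with expat standing in for a non-validating processor.  The entity name `chap`, the file "chap1.xml" (which need not exist), and the helper `parse_book` are hypothetical.  The handler only records the reference and returns 1 so parsing continues.  The replacement text of &chap; never reaches the application, and that is still conformant.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE book [
  <!ENTITY chap SYSTEM "chap1.xml">
]>
<book>A&chap;B</book>"""

def parse_book(text):
    chunks, refs = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def note_external(context, base, system_id, public_id):
        # Record the reference but do not fetch it; returning 1 tells expat to continue.
        refs.append(system_id)
        return 1

    parser.ExternalEntityRefHandler = note_external
    parser.Parse(text, True)
    return "".join(chunks), refs

print(parse_book(DOC))  # ('AB', ['chap1.xml'])
```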

 -Tim

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/