Conformance in XML processors

Sun Jan 18 01:58:56 GMT 1998

At 08:25 17/01/98 -0800, Tim Bray wrote:
>I apologize in advance for being somewhat acerbic.  I think that 
>there are areas that the PR could be more clear, in particular what
>gets passed to the app, but what is required by way of DTD handling
>is pretty crystal clear, and blatantly incorrect statements being presented
>as facts at this point in history is dangerous and rather irritating. -Tim

I apologise if I have made incorrect statements. I do read the spec, I
tried to choose my words carefully, and I don't like upsetting people.

>
>At 11:10 AM 17/01/98, Peter Murray-Rust wrote:
>
>>One design goal (4 in spec) is that it should be "easy to write programs
>>which process XML documents".  If that is interpreted that it is "easy to

I suspect that the word 'process' has caused some confusion. My reading is
'software that does something useful for some subset of users', whereas
others have taken it to mean that any software which processes XML
documents is a 'processor' in the words of the spec. My contribution was
intended for people who are building processing software (NOT
processors) that is deliberately more limited in function than a full
processor.

>
>If there are widely-available fully-conformant processors which are
>already there in the browser and OS, why would you want to use a 
>"simple tool" which will fail to accept conformant documents?  Seems
>like a way to lose customers, to me.

Not all XML applications will wish to use browsers; they may wish to call
parsing functionality from C programs, UNIX shells and other places. I
agree wholeheartedly that if XML libraries are universally available then
there shouldn't be a problem. That is one reason why I'm keen to see SAX
available in languages other than Java. However, I have many colleagues
who still use FORTRAN and other languages where I suspect it will be some
time before a set of XML libraries becomes available.

>
>It seems highly unreasonable to me; if I create a legal XML document
>in my nice Frame or Arbortext or SoftQuad software, and send it to you,
>and you say "oooh icky, that's too complicated for poor little me" you
>can expect vehement and sincere complaints.

Perhaps my experience has been clouded by early exposure to C++, where it
was extremely common to find that different compilers supported different
functionality. If this is a non-problem for XML, I rejoice.

[...]
>
>> We have frequently talked about the Desperate Perl Hacker
>>writing tools which are sufficient to process a class of XML documents, but
>>not all. 
>
>Yes, but they don't claim to be XML processors.  And that's just fine.

I did not claim that any of the software I was talking about was an
"XML processor". I talked about software that "processed XML documents".
[I think the term "processor" is confusing, as I believe it is possible
to process XML documents without using a "processor".] If goal 4
actually means "all software that acts upon XML documents must be a bona
fide XML processor", I would take issue. So I shall have to use a phrase
like "act upon" if "process" has a specific meaning.

	P.

>
>>A Document + DTD + request to validate document. Requires a validating
>>parser.
>
>Right.
>
>>B Document + full DTD but no request to validate. 
>
>Right.  We assume this document is WF, right?
>
>>C Document + parts of a DTD (e.g. a few ELEMENTs and ATTLISTs, maybe an
>>external subset which covers some of the ELEMENTs in the document).
>
>If no request to validate, the fact of missing <!ELEMENT declarations
>is not required to have any effect, and applications must not depend
>on any behavior contingent on the processing of an <!ELEMENT or
><!NOTATION declaration.
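To make the point about <!ELEMENT declarations concrete, here is a minimal
sketch using Python's standard-library expat binding (xml.parsers.expat),
chosen only as a readily available non-validating parser; it is not Lark or
AElfred. The document declares a content model that its only element
violates, yet the parse succeeds, because a non-validating parser checks
well-formedness only. The names DOC and parse_without_validation are
hypothetical.

```python
import xml.parsers.expat

# The internal subset says <memo> must contain exactly one <to> child,
# but the instance has only text. A validating parser would reject this;
# a non-validating parser has no obligation to check it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to)>
  <!ELEMENT to (#PCDATA)>
]>
<memo>no recipient here</memo>"""


def parse_without_validation(text):
    """Parse text with expat (non-validating); return the element names seen."""
    parser = xml.parsers.expat.ParserCreate()
    seen = []
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    # Raises ExpatError only for well-formedness problems, never for
    # content-model violations.
    parser.Parse(text, True)
    return seen


print(parse_without_validation(DOC))  # ['memo']
```

A well-formedness error, such as an unclosed tag, would still raise
xml.parsers.expat.ExpatError.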
>
>>D Document with no internal or external subset. Can only be well-formed.
>
>Right.
>
>>What the difference between A and B is is not clear to me.  
>
>Only the request to validate.  Lots of WF docs will in fact be valid,
>but be called WF simply because some app has no need to validate.
>
>>Note that Lark and AElfred both throw errors for 
>><!DOCTYPE FOO SYSTEM "bar.dtd">
>>if bar.dtd cannot be found. 
>
>No.  If you do lark.processExternalEntities(false) then it won't
>try to fetch the DTD.  (Since "file:" URLs are in general a pool of
>blood on Microsoft operating systems, I recommend doing this most
>of the time).
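The same separation between parsing and fetching can be seen in Python's
expat binding, used here as an assumed stand-in rather than Lark's own API.
By default expat never reads the external DTD subset, so a missing bar.dtd
causes no error; only when parameter-entity parsing is switched on does it
ask the application to fetch the subset through ExternalEntityRefHandler.
The helper parse and its read_external_dtd flag are hypothetical.

```python
import xml.parsers.expat

DOC = '<!DOCTYPE FOO SYSTEM "bar.dtd"><FOO/>'


def parse(text, read_external_dtd=False):
    """Parse text with expat; optionally request the external DTD subset.

    Returns the list of system IDs the parser asked the application to fetch.
    """
    parser = xml.parsers.expat.ParserCreate()
    requested = []

    def on_external_ref(context, base, system_id, public_id):
        # A real application would open system_id here and feed it to
        # parser.ExternalEntityParserCreate(context). This sketch only
        # records the request and reports success without reading anything.
        requested.append(system_id)
        return 1

    parser.ExternalEntityRefHandler = on_external_ref
    if read_external_dtd:
        # Without this call expat skips the external subset entirely.
        parser.SetParamEntityParsing(
            xml.parsers.expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    parser.Parse(text, True)
    return requested


print(parse(DOC))                          # []
print(parse(DOC, read_external_dtd=True))  # ['bar.dtd']
```

In neither case is any file opened; the second call merely shows where an
application would hook in if it did want the DTD.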
>
>>C is similar to B, but validation is not possible. It is *essential* that
>>if ATTLISTs and ENTITYs (and NOTATION) exist, then the information in them
>>MUST be applied to the document. 
>
>No.  The spec is clear; a non-validating processor is required to
>do internal entities and default attribute values.  Nobody should
>expect one to do anything with notations or unparsed entities or
>anything else.  You want that, get a validating processor.  
>
>>*IFF* an ENTITY is declared (case C), the parser MUST process it.
>
>If it's a non-validating processor, this is only true for *internal*
>entities.
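As a minimal sketch of the two things a non-validating processor is required
to do with the internal subset, again using Python's expat binding as an
assumed example parser: the internal entity &org; is expanded in the text,
and the ATTLIST default supplies status="draft" even though the instance
never specifies it. The names DOC and parse_memo are hypothetical.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY org "Virtual School of Molecular Sciences">
  <!ATTLIST memo status CDATA "draft">
]>
<memo>From the &org;</memo>"""


def parse_memo(text):
    """Return (attributes, text content) of the root element."""
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so expat also reports
    # attributes that were filled in from ATTLIST defaults.
    result = {"attrs": None, "text": []}

    def on_start(name, attrs):
        if result["attrs"] is None:  # root element only
            result["attrs"] = attrs

    parser.StartElementHandler = on_start
    # Internal entity references are expanded before text reaches this
    # handler; the text may arrive in several pieces, so collect them.
    parser.CharacterDataHandler = result["text"].append
    parser.Parse(text, True)
    return result["attrs"], "".join(result["text"])


attrs, text = parse_memo(DOC)
print(attrs)  # {'status': 'draft'}
print(text)   # From the Virtual School of Molecular Sciences
```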
>
> -Tim
>
>
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
>
Peter Murray-Rust, Director, Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
