Conformance in XML processors

Peter Murray-Rust peter at
Tue Jan 20 22:27:05 GMT 1998

Thanks Paul - you have put it very clearly and it sounds like exactly what I was

At 14:19 18/01/98 -0500, Paul Prescod wrote:
>Peter Murray-Rust wrote:
>> Exactly. And I hope that the community is able to develop them. [I am sure
>> all the functionality is present already in SP, but I confess that as a
>> novice to SGML I didn't find it easy to find my way around when I first
>> looked at it. Treat that as a reflection on me.]
>I believe that James Clark has already done most of this work in his XML
>tokenizer (which is distinct from SP).

Better and better.

>I think that we have different ideas about what normalized XML will look
>like. This is what you are thinking of:
>> <?xml version="1.0"?> <!-- magic incantation -->
>> <MOL NAME="water" xml:lang="EN">
>> <ATOM ID="O1">O</ATOM>
>> <ATOM ID="H2">H</ATOM>
>> <ATOM ID="H3">H</ATOM>
>> <BOND>O1 H2</BOND>
>> <BOND>O1 H3</BOND>
>> <DC:author>Doe</DC:author>
>> </MOL>
>This is what I am thinking of:
><MOL NAME="water" xml:lang="EN">
><ATOM ID="O1">
><ATOM ID="H2">
><ATOM ID="H3">
>O1 H2
>O1 H3

I am happier with yours :-) [You seem to have newlines in some tags and not
others; is this intended?]

>In other words, I am thinking about a subset of XML so simple that it is
>trivial to parse and so annoying that no human being would ever want to
>type it directly except for testing out their "reader". I would

Exactly. Most of the stuff I am concerned about will be generated by tools.

>explicitly disallow the magical incantation to discourage people from
>piping in ordinary XML documents (and thus from thinking that this
>reader is making any attempt to be an XML processor).
>> Essentially such a file is a subset of the ESIS information (no attribute
>> typing, no entities, no notation) and uses no CDATA or entity references.
>> It is my contention that there will be many people (some will be DPHs) who
>> will be quite happy to create XML files no more sophisticated than this and
>> will want *tools* to *operate on* them. 
>Right, I don't think that these tools should be constructed except as a
>stopgap. There is no good reason that these tools should not support all
>of XML. When people write these simple XML documents and find that their
>tools will not support more, they will inevitably get confused (just as
>most people do with C++) about exactly what XML *is*.

The only reason - and it's probably not "good" - is that the effort to
create or install a solution is too great for the problem at hand. And it
costs money and time.
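A minimal sketch of the kind of "reader" Paul describes, written in Python (the thread names no language for it). It assumes the restricted subset outlined in the exchange above: start and end tags only (no empty-element tags), attribute values in double quotes, character data with no entity or character references, and no XML declaration, comments, processing instructions, CDATA sections or DOCTYPE. `read_simple` and `Element` are hypothetical names. The code deliberately refuses anything outside the subset rather than trying to behave like an XML processor.

```python
import re

# Hypothetical reader for the restricted subset discussed in this thread.
# It is deliberately NOT an XML processor: anything outside the subset
# (XML declaration, comments, PIs, CDATA, DOCTYPE, references, empty-element
# tags) raises ValueError instead of being handled.
NAME = r"[A-Za-z_][\w.:-]*"
TAG = re.compile(rf'<(/?)({NAME})((?:\s+{NAME}="[^"<&]*")*)\s*>')
ATTR = re.compile(rf'({NAME})="([^"<&]*)"')


class Element:
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []  # child Elements and text strings, in document order


def read_simple(text):
    """Parse the restricted subset into an Element tree, or raise ValueError."""
    if text.lstrip().startswith("<?xml"):
        raise ValueError("XML declaration not allowed; run the document through a normalizer first")
    if "&" in text:
        raise ValueError("entity and character references are not part of the subset")
    stack = [Element(None, {})]  # sentinel that will hold the root element
    pos = 0
    while pos < len(text):
        lt = text.find("<", pos)
        if lt == -1:
            lt = len(text)
        if lt > pos:  # character data between pos and the next '<'
            chunk = text[pos:lt]
            if len(stack) > 1:
                stack[-1].children.append(chunk)
            elif chunk.strip():
                raise ValueError("text outside the root element")
        if lt == len(text):
            break
        m = TAG.match(text, lt)
        if m is None:
            raise ValueError(f"unsupported markup at offset {lt}")
        closing, name, attr_text = m.groups()
        if closing:
            # The sentinel's name is None, so a stray end tag also fails here.
            if attr_text or stack[-1].name != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
        else:
            element = Element(name, dict(ATTR.findall(attr_text)))
            stack[-1].children.append(element)
            stack.append(element)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed element <{stack[-1].name}>")
    roots = [c for c in stack[0].children if isinstance(c, Element)]
    if len(roots) != 1:
        raise ValueError("expected exactly one root element")
    return roots[0]


if __name__ == "__main__":
    water = """<MOL NAME="water" xml:lang="EN">
<ATOM ID="O1">O</ATOM>
<ATOM ID="H2">H</ATOM>
<ATOM ID="H3">H</ATOM>
<BOND>O1 H2</BOND>
<BOND>O1 H3</BOND>
<DC:author>Doe</DC:author>
</MOL>"""
    mol = read_simple(water)
    atoms = [c for c in mol.children if isinstance(c, Element) and c.name == "ATOM"]
    print([a.attrs["ID"] for a in atoms])  # ['O1', 'H2', 'H3']
```

Putting the `<?xml version="1.0"?>` line back in front of the water example makes `read_simple` raise ValueError, which is the effect Paul wants from disallowing the incantation: ordinary XML is turned away at the door instead of being half-understood.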
>I proposed a processor in Fortran that only accepts the output of a
>normalizer, but I do not think that it should be billed as an XML
>processor, any more than a Fortran program that accepts ESIS would be
>called an SGML parser. The documentation should say: "This Fortran
>program accepts the output of xmlnorm" and leave it at that. In other
>words, xmlnorm becomes an implicit component in the system.

Yes - I like this. Is your use of 'xmlnorm' fictitious, or is such a beast
emerging from the current tools?
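Whether or not an `xmlnorm` exists, the shape of such a front end is easy to sketch. The following is a hedged illustration only, not the tool Paul mentions: it uses Python's standard `xml.sax` module, so a full XML parser handles the declaration, comments, CDATA and references, and a small handler writes out the restricted subset. `SubsetWriter` and `xmlnorm` are hypothetical names. Refusing any text or attribute value that would need escaping is an assumption that follows from the subset having no entity references.

```python
import xml.sax


class SubsetWriter(xml.sax.ContentHandler):
    """Re-emits SAX events as the restricted subset (hypothetical 'xmlnorm')."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        parts = [name]
        for key, value in attrs.items():
            # The subset has no references, so these characters cannot be written.
            if any(ch in value for ch in '"<&'):
                raise ValueError(f"attribute {key}={value!r} is not expressible in the subset")
            parts.append(f'{key}="{value}"')
        self.out.append("<" + " ".join(parts) + ">")

    def endElement(self, name):
        self.out.append(f"</{name}>")

    def characters(self, content):
        if "<" in content or "&" in content:
            raise ValueError(f"text {content!r} is not expressible in the subset")
        self.out.append(content)


def xmlnorm(document):
    """Parse full XML with the stdlib SAX parser and emit the restricted subset.

    The XML declaration, comments, processing instructions and DOCTYPE are
    dropped; CDATA sections and character references arrive as plain text.
    Namespace processing is left off (the xml.sax default), so prefixed
    names such as DC:author and xml:lang pass through unchanged.
    """
    handler = SubsetWriter()
    xml.sax.parseString(document, handler)
    return "".join(handler.out)


if __name__ == "__main__":
    source = ('<?xml version="1.0"?> <!-- magic incantation -->\n'
              '<MOL NAME="water"><ATOM ID="O1"><![CDATA[O]]></ATOM></MOL>')
    print(xmlnorm(source))  # <MOL NAME="water"><ATOM ID="O1">O</ATOM></MOL>
```

In Paul's arrangement the downstream program (Fortran in his example) would only ever see output like this, so the normalizer is the one component that has to understand all of XML.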


Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS, Virtual Hyperglossary

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
List coordinator, Henry Rzepa (mailto:rzepa at
