Conformance in XML processors

Paul Prescod papresco at
Sun Jan 18 19:20:11 GMT 1998

Peter Murray-Rust wrote:
> Exactly. And I hope that the community is able to develop them. [I am sure
> all the functionality is present already in SP, but I confess that as a
> novice to SGML I didn't find it easy to find my way around when I first
> looked at it. Treat that as a reflection on me.]

I believe that James Clark has already done most of this work in his XML
tokenizer (which is distinct from SP).

I think that we have different ideas about what normalized XML will
look like. This is what you are thinking of:

> <?xml version="1.0"?> <!-- magic incantation -->
> <MOL NAME="water" xml:lang="EN">
> <ATOM ID="O1">O</ATOM>
> <ATOM ID="H2">H</ATOM>
> <ATOM ID="H3">H</ATOM>
> <BOND>O1 H2</BOND>
> <BOND>O1 H3</BOND>
> <DC:author>Doe</DC:author>
> </MOL>

This is what I am thinking of:

<MOL NAME="water" xml:lang="EN">
<ATOM ID="O1">
<ATOM ID="H2">
<ATOM ID="H3">
O1 H2
O1 H3

In other words, I am thinking about a subset of XML so simple that it is
trivial to parse and so annoying that no human being would ever want to
type it directly except for testing out their "reader". I would
explicitly disallow the magical incantation to discourage people from
piping in ordinary XML documents (and thus from thinking that this
reader is making any attempt to be an XML processor).
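
A minimal sketch of such a "reader", in Python. The post does not spell
out the grammar of the subset, so the one used here is an assumption made
for illustration: one token per line, which is either a start tag with
double-quoted attributes, an end tag, or a run of character data. The
name read_simple and the event tuples are hypothetical. As the paragraph
above suggests, the XML declaration (and anything else starting with
"<?" or "<!") is rejected outright rather than skipped.

```python
import re

# Assumed grammar (not from the post): one token per line, no entity
# references, no comments, no declarations, no CDATA sections.
START_TAG = re.compile(r'<([A-Za-z_][\w:.-]*)((?: [\w:.-]+="[^"<&]*")*)>')
END_TAG = re.compile(r'</([A-Za-z_][\w:.-]*)>')
ATTRIBUTE = re.compile(r' ([\w:.-]+)="([^"<&]*)"')


def read_simple(lines):
    """Yield ("start", name, attrs), ("end", name) or ("text", data)."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # blank lines carry no information in this sketch
        if line.startswith("<?") or line.startswith("<!"):
            # Refuse ordinary XML so nobody mistakes this for an XML processor.
            raise ValueError(f"line {number}: declarations are not accepted")
        match = START_TAG.fullmatch(line)
        if match:
            attrs = dict(ATTRIBUTE.findall(match.group(2)))
            yield ("start", match.group(1), attrs)
            continue
        match = END_TAG.fullmatch(line)
        if match:
            yield ("end", match.group(1))
            continue
        if "<" in line or "&" in line:
            raise ValueError(f"line {number}: markup outside the subset")
        yield ("text", line)
```

The reader does no well-formedness checking beyond the shape of each
line; in this design that job belongs to whatever produced the input.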

> Essentially such a file is a subset of the ESIS information (no attribute
> typing, no entities, no notation) and uses no CDATA or entity references.
> It is my contention that there will be many people (some will be DPHs) who
> will be quite happy to create XML files no more sophisticated than this and
> will want *tools* to *operate on* them. 

Right, I don't think that these tools should be constructed except as a
stopgap. There is no good reason that these tools should not support all
of XML. When people write these simple XML documents and find that their
tools will not support more, they will inevitably get confused (just as
most people do with C++) about exactly what XML *is*.

I proposed a processor in Fortran that only accepts the output of a
normalizer, but I do not think that it should be billed as an XML
processor, any more than a Fortran program that accepts ESIS would be
called an SGML parser. The documentation should say: "This Fortran
program accepts the output of xmlnorm" and leave it at that. In other
words, xmlnorm becomes an implicit component in the system.
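
A rough sketch of what a normalizer in the role of xmlnorm might do, in
Python. Nothing here is the actual xmlnorm: the function name normalize
and the line-oriented output format are assumptions made for
illustration. The point is the division of labour: a real XML parser
(Python's bundled expat binding in this sketch) deals with the
declaration, comments and entity references, and the downstream program
only ever sees the simple form.

```python
import xml.parsers.expat


def normalize(document):
    """Return the document as a list of lines: one tag or text run each."""
    out = []

    def check(value):
        # The assumed output format has no entity references, so these
        # characters cannot be represented; refuse rather than guess.
        if any(ch in value for ch in '<&"'):
            raise ValueError(f"cannot represent {value!r} in the simple form")
        return value

    def start(name, attrs):
        parts = [name] + [f'{key}="{check(val)}"' for key, val in attrs.items()]
        out.append("<" + " ".join(parts) + ">")

    def end(name):
        out.append(f"</{name}>")

    def text(data):
        # Emit each non-blank line of character data on a line of its own.
        for piece in data.splitlines():
            if piece.strip():
                out.append(check(piece.strip()))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data together
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(document, True)  # raises ExpatError if not well-formed
    return out
```

A program that consumes these lines can then be documented as accepting
"the output of the normalizer" and nothing more.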

Given these options, I'm not sure why users should accept any tools that
claim partial support for XML. To put it another way, human beings
should never have to worry about the limitations of their tools when
they are typing XML.

 Paul Prescod
