Error Reporting: XML vs ISO 8879

David Megginson ak117 at freenet.carleton.ca
Sat Dec 13 18:46:16 GMT 1997


This has been a fascinating discussion on what XML conformance means
in an XML processor -- I think that it has helped people like me (who
are not in the SIG or in the WG) to understand more of the WG's
reasoning on the very strict rules for XML conformance.


SGML PARALLELS
--------------

I recognise James's concern that explicitly allowing
non-error-reporting XML processors could cause non-conforming variants
of XML to become common -- given the unfortunate history of HTML, I am
not prepared to dismiss that concern lightly.  It is surprising,
however, that although some proponents (not James) describe XML as "a
simplified form of SGML," XML is actually much more rigid than full
SGML on this point.  Let me quote from a (non-normative) note to the
SGML standard, ISO 8879:1986, clause 15.4:

  NOTE -- A conforming SGML system need not have a validating SGML
  parser.  Implementors can therefore decide whether to incur the
  overhead of validation in a given system.  A user whose text editing
  system allowed the validation and correction of SGML documents, for
  example, would not require the validation process to be repeated
  when the documents are processed by a formatting system.

In other words, if I have read the standard correctly (something that
all of us fail to do at times), full SGML allows parsers that do not
report errors, but XML does not.

It is ironic that we can call PSGML a "conforming, non-validating"
SGML editor, but that we must call it a "non-conforming" XML editor
(even with my XML patches).


CODE SIZE AND THE INTERNET
--------------------------

This inflexibility on XML's part is especially surprising given that
XML is designed for the Internet, where code size (whether for Java
applets or ActiveX controls) is _much_ more critical than it is in a
closed system.

Imagine a Java programmer who has just written a 100K applet, and is
considering adding XML support as an extra feature.  I am concerned
that we could not convince that programmer to add even a 24K XML
parser like Ælfred (especially after she's spent three weeks
optimising for size); we certainly will not convince her to add 50K or
100K of class files for a full error-reporting XML parser, increasing
the size of the applet by half or even doubling it.  As it stands,
however, her applet will be non-conforming unless it uses a conforming
parser, so strictly speaking, the programmer will not be able to claim
XML support if she uses a smaller XML parser like Ælfred.

Ideally, I'd like to get Ælfred to under 10K to help with acceptance
in the Java community; practically, I'll be thrilled if I can get it
down to under 20K.  I cannot justify bloating it to 40K or 50K.


PRAGMATISM AND DEVIANT BEHAVIOUR
--------------------------------

The strongest argument, however, comes from pragmatism.  A W3C
recommendation has relatively little moral force compared even to an
IETF RFC, much less an International Standard, so if conformance is
too difficult, most people just won't bother conforming (look at some
of the widely-ignored HTML drafts that have come out).

It makes sense, then, for XML to try to channel and regulate deviant
behaviour rather than simply looking away and denying its existence.
Instead of declaring every simple, non-error-reporting processor
"non-conforming" (and thus, not regulating it at all), why not define
a standard behaviour for those parsers as well, and create standard
terms for labelling them?  At least then, people will know what
they're getting.


GUARDING THE GRAIL
------------------

Like a former rebel who has just found a job, bought a house, or
become a new parent, the XML WG now has something to protect, and they
are naturally adopting precisely the conservatism that a vocal
minority of XML supporters used to attack in the SGML establishment
(and sometimes, as in the case of error-reporting, they have outdone
the SGML community in their conservatism).

This is a normal and expected development, but I suspect that,
privately at least, some of the original XML evangelists are
starting to look more sympathetically at what they used to consider
unnecessary rigidity and purism in the SGML community.



All the best,


David

--
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
