XML Information Set is seeding impenetrable language; reconsider?

Nils Klarlund klarlund at research.att.com
Wed Jan 19 17:26:58 GMT 2000

A so-called "last call" working draft of the XML Information Set has been
issued by the W3C. Apparently, this draft has not provoked much
discussion. The draft
solves the problem: what is the mathematical object that an XML document
represents? The answer is a "tree" defined according to some rather natural rules.

Although the document is cleanly written, it introduces some very cumbersome
terminology, which will almost guarantee that future XML specifications become
unnecessarily hard to read (just look at the current XML Schema document,
part 1).

Would it not be in the interest of the XML community to encourage the
authors to use language that mortals understand (nodes, trees, etc.)? I don't
believe the Information Set in its current formulation should be accepted.
Am I alone in this belief?
If not, then perhaps we can convince the authors, who have done a fine
job, to tidy up the language and the connection to DOM.


PS: Below I have enclosed the e-mail that I sent to the official discussion
list yesterday.

Subject: XML infoset: please don't

Dear working group members:

XML Information Set terminology unfortunately seems to be having
adverse effects.  I just started rereading the XML Schema draft and
choked right away on the sentence:

  "An element information item is the component of an infoset which
   corresponds to an element."

No one should be forced to write like that! Another example:

   "XML Schema: an XML element information item which, along with its
   descendants, satisfies all the Constraints on Schemas in this
   specification."

This should have been:

   "XML Schema: an element node which satisfies all the Constraints on
   Schemas in this specification."

These and many more examples are solid roadblocks to the advancement
of XML; personally, they don't make my blood boil, but some members of
the public are enraged (see recent postings to comp.text.xml).

I then tried to comprehend what an element information item is by
reading the XML Information Set note.  Nothing really deep, it turns
out: it's a node in a tree representation of an XML document.  My
objection is that there are now (at least) two different tree models
of XML: DOM and XML information sets.  Both are justified, but I
believe they should be unified in what is (or should be) an obvious
way:
* DOM, being the finer model, is the starting point; its tree model is
  something any programmer can understand, and it is the most detailed one.

* DOM-I trees are obtained from DOM trees by a mapping that converts
  CDATA sections to text and concatenates adjacent text nodes (by
  using normalize()), plus a couple of other tricks; it shouldn't be
  more complicated than that.
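The DOM to DOM-I mapping in the second bullet can be sketched in a few
lines. This is a minimal sketch, assuming Python's standard xml.dom.minidom
as the DOM implementation; the function name to_dom_i is hypothetical, and
only the two steps named above (CDATA to text, then normalize()) are
modelled, not the unspecified "other tricks".

```python
from xml.dom import Node, minidom


def to_dom_i(doc: minidom.Document) -> minidom.Document:
    """Hypothetical DOM -> DOM-I mapping; not a W3C or minidom API.

    Only the two steps named in the post are modelled: CDATA sections
    become plain text, then adjacent text nodes are merged by normalize().
    """
    # Collect CDATA sections first, so the tree is not mutated mid-walk.
    cdata_nodes = []
    stack = [doc]
    while stack:
        node = stack.pop()
        for child in node.childNodes:
            if child.nodeType == Node.CDATA_SECTION_NODE:
                cdata_nodes.append(child)
            else:
                stack.append(child)

    # Swap each CDATA section for an ordinary Text node with the same data.
    for cdata in cdata_nodes:
        cdata.parentNode.replaceChild(doc.createTextNode(cdata.data), cdata)

    # DOM's normalize() concatenates adjacent Text nodes (and drops empty
    # ones) throughout the subtree; minidom does not merge CDATA sections
    # with text, which is why the conversion above comes first.
    doc.normalize()
    return doc


if __name__ == "__main__":
    doc = minidom.Document()
    p = doc.appendChild(doc.createElement("p"))
    p.appendChild(doc.createTextNode("a < b is written "))
    p.appendChild(doc.createCDATASection("a < b"))
    p.appendChild(doc.createTextNode(" in CDATA"))
    to_dom_i(doc)
    print(len(p.childNodes), repr(p.firstChild.data))
    # → 1 'a < b is written a < b in CDATA'
```

The order matters: calling normalize() alone would leave the CDATA section
as a separate node between two text nodes, so the DOM-I view of the example
would still have three children instead of one.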

Canonical XML can now be explained by a simple transformation from
DOM-I trees.

I would urge that the XML Information Set be substantially
simplified.  Please put stakes through verbiage like "XML element
information item."  The XML Information Set should be explainable in
one paragraph, starting from DOM.  Then make this paragraph part of
DOM2 (along with canonical XML, perhaps).



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1