Call for unifying and clarifying XML 1.0, DOM, XPATH, and XML Infoset

Lars Marius Garshol larsga at
Fri Jan 28 15:38:31 GMT 2000

* Lars Marius Garshol
| So I see several reasons for the general failure to understand what
| groves are all about: [...]

* Nils Klarlund
| Or maybe that groves are just too abstract? 

I don't really think so. The concept of objects (nodes) with
attributes (properties) and a defining set of declarations (property
set) should be familiar to most programmers.
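
To put that in programmer's terms, here is a minimal Python sketch of
the idea. It is only an illustration under my own assumptions: the
class names and property names in PROPERTY_SET are simplified
stand-ins, not the actual SGML property set, and Node is a
hypothetical class, not part of any grove implementation.

```python
from dataclasses import dataclass, field

# Hypothetical, much-simplified "property set": for each node class,
# the names of the properties a node of that class may carry.
PROPERTY_SET = {
    "element": {"gi", "attributes", "content"},
    "attribute": {"name", "value"},
    "text": {"data"},
}


@dataclass
class Node:
    """A grove-like node: a class name plus a dict of named properties."""
    node_class: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject any property the property set does not declare
        # for this node class.
        declared = PROPERTY_SET[self.node_class]
        undeclared = set(self.properties) - declared
        if undeclared:
            raise ValueError(
                f"{self.node_class} node has undeclared properties: "
                f"{sorted(undeclared)}"
            )


# An element node whose properties hold further nodes.
para = Node("element", {
    "gi": "para",
    "attributes": [Node("attribute", {"name": "id", "value": "p1"})],
    "content": [Node("text", {"data": "Hello"})],
})
```

The property set plays the role of the declarations: a node may only
carry the properties declared for its class.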

| In fact, even the XML quintessence, trees, is not a clear sell:
| recursion and trees are a standard part of a computer science
| curriculum, but these concepts are not easily swallowed by all.

I think you have to elaborate on this; at least I have no idea what
you are referring to here.

| I am not qualified to comment on SGML itself, but even XML 1.0 does
| appear to be suffering from over-conceptualization (too many
| concepts that don't fit together too precisely).  As a simple
| example, look at content models:
| - a content model is not a model for content in general, but only two
|   kinds of content, namely elements and character data, not processing
|   instructions and not comments (incidentally, it could have been
|   termed "markup model" as well I think, since markup is a more
|   general concept than content)
| - the content model concept is further split into two concepts:
|   (1) element content, which allows only elements in content
|   (2) mixed content, which allows character data interspersed
|       with elements

This criticism is undeserved, I think. Content models describe the
allowed _structural_ content of an element. Comments are not affected
since they are not considered part of the document at all, which
definitely makes sense. PIs are not affected by content models because
they are not considered part of the structure described by a DTD,
being extensions added orthogonally across applications.

As for mixed and element content I don't really see how that qualifies
as two concepts. It's quite simply a means of saying where text is
allowed. Admittedly the allowed forms of mixed content models are
something of a special case, but that has its reasons and doesn't have
anything to do with over-conceptualization.
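
To make the special case concrete: as far as I read the XML 1.0
grammar (production [51], Mixed), a model containing #PCDATA must be
either (#PCDATA) or a starred choice that begins with #PCDATA. A rough
Python sketch of that check follows. classify(), NAME and MIXED are
names I made up for the illustration, the name pattern is much
narrower than the real Name production, and anything without #PCDATA
is simply labelled element content without being checked (EMPTY and
ANY are ignored).

```python
import re

# Simplified stand-in for the XML 1.0 Name production (assumption:
# ASCII letters, digits, '_', ':', '.', '-' only).
NAME = r"[A-Za-z_:][\w.:\-]*"

# The two forms XML 1.0 allows for mixed content.
MIXED = re.compile(
    rf"\(\s*#PCDATA(\s*\|\s*{NAME})*\s*\)\*"  # (#PCDATA | a | b)*
    rf"|\(\s*#PCDATA\s*\)"                    # (#PCDATA)
)


def classify(model: str) -> str:
    """Roughly classify a content model string from an element declaration."""
    model = model.strip()
    if "#PCDATA" not in model:
        # Not validated any further in this sketch.
        return "element content"
    if MIXED.fullmatch(model):
        return "mixed content"
    return "not allowed in XML 1.0"
```

With this, classify("(#PCDATA | em | strong)*") gives "mixed content",
while a model that puts #PCDATA inside a sequence is rejected.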

| An alternative approach would have declared "content" to simply
| consist of just element nodes and text nodes ("text nodes" as in
| XPATH) representing character data.  Then there would be no need for
| (2), since a content model now describes a regular language over the
| alphabet consisting of what you would expect: element names and the
| token text() (or #PCDATA).  And, you'd be able to describe, say,
| HTML with Appendix elements that must appear at the end:
|   ((#PCDATA | H1 | H2 |...)*, Appendix*)

SGML allows this (#PCDATA may appear anywhere in a model group), but
it was cut from XML because of all the problems with it. That decision
might be contested on various grounds, but I don't think it has
anything to do with over-conceptualization.
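
For what it's worth, the "regular language over element names and
#PCDATA" reading of the quoted model is easy to sketch in Python. This
is only an illustration: MODEL is my hand translation of
((#PCDATA | H1 | H2)*, Appendix*) with the elided "..." alternatives
left out, matches() is a made-up helper, and the sketch deliberately
ignores how whitespace between elements would be treated.

```python
import re

# Hand-translated from ((#PCDATA | H1 | H2)*, Appendix*).
# Each child is written as one space-terminated token so that
# names cannot run together.
MODEL = re.compile(r"((#PCDATA|H1|H2) )*(Appendix )*")


def matches(children):
    """Check a child sequence against MODEL.

    children is a list of element names, with '#PCDATA' standing
    for a text node.
    """
    tokens = "".join(child + " " for child in children)
    return MODEL.fullmatch(tokens) is not None
```

A sequence such as text, H1, text, Appendix is accepted; one with text
after an Appendix is not.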

--Lars M.

