Arrgh! - FW: Call for unifying and clarifying XML 1.0,
DOM, XPATH, and XML Infoset
Simon St.Laurent
simonstl at simonstl.com
Tue Jan 25 20:16:01 GMT 2000
[Forwarded for Kevin Williams <Kevin.Williams at ultraprise.com>]
From: Kevin Williams <Kevin.Williams at ultraprise.com>
To: "'simonstl at simonstl.com'" <simonstl at simonstl.com>
Subject: RE: Arrgh! - FW: Call for unifying and clarifying XML 1.0, DOM,
XPATH, and XML Infoset
Date: Tue, 25 Jan 2000 15:12:40 -0500
Am I missing something on this thread? Here's my understanding:
The Infoset is intended to describe all of the various entities that may
together comprise an XML document. If I may expand on your quote from the
Infoset last call WD:
"The XML information set does not require or favor a specific interface or
class of interfaces. This specification presents the information set as a
tree for the sake of clarity and simplicity, but there is no requirement
that the XML information set be made available through a tree structure;
other types of interfaces, including (but not limited to) event-based and
query-based interfaces are also capable of providing information conforming
to the information set."
The intent here seems to be to allow non-tree-based processors, such as SAX,
to abide by the Infoset specification. In other words, "Here's a pile of
things that are in an XML document. You have to have some of them, and you
can choose to leave others out. They also have to point to one another
somehow. The exact mechanism to be used is not specified in this document."
Otherwise, an event-driven parser like SAX could never conform to the
specification. I don't think the W3C is attempting to describe a content
model in the Infoset specification, but to open the door for non-tree
processors.
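To make the point about interfaces concrete, here is a minimal Python sketch (not taken from the Infoset draft). It reads the same document once through an event-based interface (the standard library's xml.sax) and once through a tree-based one (xml.etree.ElementTree), and collects the same element and attribute information from each. The sample document and the names Collector, items_from_events, and items_from_tree are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = "<loan id='L1'><borrower>Pat</borrower><amount>100</amount></loan>"


class Collector(xml.sax.ContentHandler):
    """Records (element name, attributes) pairs as start-element events arrive."""

    def __init__(self):
        super().__init__()
        self.items = []

    def startElement(self, name, attrs):
        # attrs is a SAX attributes object; copy it into a plain dict.
        self.items.append((name, dict(attrs.items())))


def items_from_events(text):
    """Event-based view: the parser pushes the information at us piece by piece."""
    handler = Collector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items


def items_from_tree(text):
    """Tree-based view: build the whole tree, then walk it in document order."""
    root = ET.fromstring(text)
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


# Both interfaces expose the same elements and attributes.
print(items_from_events(DOC) == items_from_tree(DOC))
```

Neither function needs the other's data structure; each just reports the same items in its own way, which is roughly the room the quoted paragraph seems to leave.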
As an aside, in the data universe XML documents often are not structured as
trees. For example, in a project I've been working on for the mortgage
industry, a <Property> element may play several roles in a loan - it may be
the subject property, it may be a current address, it may be a piece of real
estate owned by a borrower, and so forth. To avoid repeating the same piece
of information more than once in a document, then, we use IDs and IDREFs to
point to the property, expressing it only once. In this case, a simple tree
structure falls short, and we need to "hop" from branch to branch.
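A rough Python sketch of that "hop" follows. The element and attribute names (Loan, SubjectProperty, CurrentAddress, RealEstateOwned, id, ref) are invented for the example and are not the actual mortgage vocabulary. ElementTree is a non-validating parser and knows nothing about ID/IDREF attribute types declared in a DTD, so the sketch simply assumes that "id" and "ref" play those roles.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one Property, referenced from three places.
DOC = """
<Loan>
  <Property id="p1"><Street>12 Elm St</Street></Property>
  <SubjectProperty ref="p1"/>
  <Borrower>
    <CurrentAddress ref="p1"/>
    <RealEstateOwned ref="p1"/>
  </Borrower>
</Loan>
"""


def resolve_refs(root, id_attr="id", ref_attr="ref"):
    """Return (referring tag, target element) pairs, in document order."""
    # Index every element that carries an ID-like attribute.
    by_id = {
        el.get(id_attr): el
        for el in root.iter()
        if el.get(id_attr) is not None
    }
    # Follow each reference across branches to its target.
    return [
        (el.tag, by_id[el.get(ref_attr)])
        for el in root.iter()
        if el.get(ref_attr) is not None
    ]


root = ET.fromstring(DOC)
links = resolve_refs(root)
for tag, target in links:
    print(tag, "->", target.find("Street").text)
```

All three references land on the same Property element, so the data is really a graph laid over the tree rather than a tree alone.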
The subject of attributes is a trickier one. I think the chief problem is
that attributes are not ordered; in a tree, then, an attribute might have a
parent, and siblings, but not next and previous siblings. Perhaps it might
have been better to approach the problem as you state, with ordered and
unordered children in the model - however, I think that this is precisely
the model that the Infoset describes (while eschewing the terms "tree" and
"node" to avoid alienating the non-tree processors).
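As a sketch of what "ordered and unordered children" could look like in code, here is a small hypothetical Python model (the class and function names are mine, not from any specification). Child elements live in a list, where position matters and next-sibling is meaningful; attributes live in a dictionary keyed by name, where it is not.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    name: str
    # Unordered: attributes are looked up by name and compare without order.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Ordered: the position of each child is significant.
    children: List["Element"] = field(default_factory=list)
    # Back-pointer to the parent, left out of repr and comparison to avoid cycles.
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def next_sibling(el):
    """Next child of the same parent, or None. Only defined for ordered children."""
    if el.parent is None:
        return None
    siblings = el.parent.children
    for i, candidate in enumerate(siblings):
        if candidate is el:  # identity, not equality
            return siblings[i + 1] if i + 1 < len(siblings) else None
    return None


# Attribute order does not affect equality.
a = Element("Property", {"id": "p1", "type": "house"})
b = Element("Property", {"type": "house", "id": "p1"})
print(a == b)

# Child order does.
loan = Element("Loan")
first = loan.append(Element("Borrower"))
second = loan.append(Element("Property"))
print(next_sibling(first) is second)
```

An attribute here has an owner (the element whose dictionary holds it) and, loosely, siblings (the other keys), but there is no next_sibling for it, which matches the distinction drawn above.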
I think that the issue with attributes is really at the core of the problem
here - the fact that neither the DOM nor XPath treats attributes as "real"
nodes. In an application, however, I would think that the role played by
attributes and text elements would be clear and unambiguous, making a
construct such as "@* | node()" only necessary in more esoteric situations.
While I agree that the language in the W3C specifications is ambiguous, even
obtuse, at times, I still feel strongly that imposing the tree structure on
every application that uses XML would be the wrong way to go. If we take
Infoset as a basis, and then assume that the tree-model processor mechanisms
(DOM) and the event-model processor mechanisms (SAX) inherit from it, I
don't think there's a problem. Certainly, if the DOM and XPath disagree,
the discrepancies should be resolved, as I would imagine XPath inherits
from the DOM model.
Here's a thought - perhaps there's a missing specification or two needed to
fill in the gaps. From Infoset (which is a descriptive model), we need to
derive two physical models - the tree-based model (very close to what you
described in your first post) and the event-based model (for SAX et al.).
Indeed, perhaps there's room for a query-based model as well. Then, we would
have tree technologies such as the DOM and XPath using the tree-based model,
while SAX et al. could use the event-based model. That way, the model would
be consistent throughout all of the tree-based technologies. (I'm assuming
here that the DOM should be treated as an API rather than a content model -
its name notwithstanding!)
Any thoughts?
- Kevin
Kevin Williams
Ultraprise Corporation (www.ultraprise.com)
Co-author, _Professional XML_ (Wrox Press)