SAX and DOM
Clark C. Evans
clark.evans at manhattanproject.com
Tue Dec 14 21:40:20 GMT 1999
Just some thoughts....
As both of these standards become more and more
solid (i.e. lots of code depending upon them),
perhaps it is now high time to look closely
at their integration to make it as painless
as possible for the application developer...
before _more_ code is developed.
Item #0
~~~~~~~
SAX is fundamentally event-driven,
sequential-access, read-only. DOM
is fundamentally object-based,
random-access, read-write.
No problem here, the stark differences
make for a joyful interaction!
Item #1
~~~~~~~
AttributeList and NamedNodeMap both do
essentially the same thing, except that
NamedNodeMap allows for mutation and
AttributeList is more specific.
A. In DOM, drop all of the mutators from
NamedNodeMap. The replaceChild,
appendChild, and removeChild methods
could just as easily be used for
attributes as for elements.
In DOM Level 2, the NamedNodeMap
mutators could be deprecated, giving
applications a chance to catch up.
B. In SAX, replace AttributeList with
NamedNodeMap.
In SAX2, AttributeList could be
deprecated, making it an interface
supported by the particular
NamedNodeMap used.
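To make Item #1 concrete, here is a rough Java sketch of what a shared read-only attribute view might look like. The names ReadOnlyAttributes and SimpleAttributes are made up for illustration; they are not part of SAX or DOM, and the method set is only loosely modeled on AttributeList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-only view that both a SAX event and a DOM node could expose.
interface ReadOnlyAttributes {
    int getLength();
    String getName(int index);
    String getValue(int index);
    String getValue(String name); // null when the attribute is absent
}

// Hypothetical implementation: mutation lives here, not on the shared interface.
final class SimpleAttributes implements ReadOnlyAttributes {
    private final List<String> names = new ArrayList<>();
    private final List<String> values = new ArrayList<>();

    // Adds the attribute, or overwrites its value if the name already exists.
    void put(String name, String value) {
        int i = names.indexOf(name);
        if (i >= 0) {
            values.set(i, value);
        } else {
            names.add(name);
            values.add(value);
        }
    }

    public int getLength() { return names.size(); }
    public String getName(int index) { return names.get(index); }
    public String getValue(int index) { return values.get(index); }

    public String getValue(String name) {
        int i = names.indexOf(name);
        return i >= 0 ? values.get(i) : null;
    }
}
```

A handler that only reads would be handed the ReadOnlyAttributes type, so it never sees put().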
Item #2
~~~~~~~
This one is a bit more brutal. A DOM
node is heavyweight, whereas a SAX startElement
event carries only the element name and attribute list.
A few things to note here:
(a) often a read/write DOM
node is not needed, so it is
pure overhead.
(b) with a query on a DOM tree, a list
of nodes fitting the criteria must
be returned. It will be tempting to
return them as an array; however,
if relational database design has
anything to say, it will be realized
that a stream of Nodes is a far
better candidate. And what is SAX
but a stream of nodes?
(c) When using SAX, access to the ancestor
element stack would be extremely
valuable.
Thus,
A. Introduce a BaseNode interface that
includes the node's name and value,
parent node, attribute list, and
(possibly) a child list...
B. In DOM Level 2, make Node inherit
from BaseNode
C. For SAX 2, introduce an alternative
DocumentHandler interface, called
NodeHandler, with beginElement and
handleNode(BaseNode node) methods.
Notes: I'm not sure how to handle
the characters() method; perhaps the
light-weight Node interface needs
quite a bit more modification...
Perhaps, instead of returning a String
(for Java), it returns another object,
something like this:
class CharBuff {
    char[] array;
    int begin;
    int length;
}
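A minimal Java sketch of the BaseNode / NodeHandler idea follows. Everything beyond the names BaseNode, NodeHandler, beginElement and handleNode is an assumption for illustration: LightNode and PathRecorder are hypothetical, attributes are simplified to a string map, and the optional child list is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Read-only view that a SAX event and a DOM Node could both expose.
interface BaseNode {
    String getName();
    String getValue();                   // null for elements in this sketch
    BaseNode getParent();                // null at the root
    Map<String, String> getAttributes(); // unmodifiable
}

// Alternative to DocumentHandler, as proposed in item C.
interface NodeHandler {
    void beginElement(BaseNode node);
    void handleNode(BaseNode node);
}

// Hypothetical lightweight node a parser could create for each element.
final class LightNode implements BaseNode {
    private final String name;
    private final BaseNode parent;
    private final Map<String, String> attributes;

    LightNode(String name, BaseNode parent, Map<String, String> attributes) {
        this.name = name;
        this.parent = parent;
        // Defensive copy, then wrap so handlers cannot mutate it.
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }

    public String getName() { return name; }
    public String getValue() { return null; }
    public BaseNode getParent() { return parent; }
    public Map<String, String> getAttributes() { return attributes; }
}

// Example handler: walks the parent links to record each node's ancestor path.
final class PathRecorder implements NodeHandler {
    final List<String> paths = new ArrayList<>();

    public void beginElement(BaseNode node) {
        // Nothing to do when an element opens in this example.
    }

    public void handleNode(BaseNode node) {
        StringBuilder sb = new StringBuilder();
        for (BaseNode n = node; n != null; n = n.getParent()) {
            sb.insert(0, "/" + n.getName());
        }
        paths.add(sb.toString());
    }
}
```

The parent link is what gives a handler the ancestor stack from point (c) without building a full read/write tree.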
Item #3
~~~~~~~
SAX is a stream interface, but unfortunately,
an event/listener pattern was not used. So,
perhaps for SAX2, an XPath-based dispatch
system could be used to pick a particular
NodeHandler based on particular criteria.
This would also work *wonderfully* for a
DOM query handler. A system like this, BTW,
really drives the need for the ancestor
stack (at a minimum) to be made available
through a SAX2 interface.
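As a rough illustration of why the dispatcher needs the ancestor stack, here is a small Java sketch. It is not XPath: the hypothetical PathDispatcher class only matches a slash-separated suffix of the current ancestor path, and handlers are plain callbacks rather than NodeHandler objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher: tracks the ancestor stack and routes each
// element-start event to the first handler whose pattern matches.
final class PathDispatcher {
    private final Deque<String> ancestors = new ArrayDeque<>();
    private final Map<String, Consumer<String>> routes = new LinkedHashMap<>();

    // pattern is a suffix of the ancestor path, e.g. "order/item".
    void register(String pattern, Consumer<String> handler) {
        routes.put(pattern, handler);
    }

    void startElement(String name) {
        ancestors.addLast(name);
        String path = currentPath();
        for (Map.Entry<String, Consumer<String>> e : routes.entrySet()) {
            if (path.endsWith("/" + e.getKey())) {
                e.getValue().accept(path);
                break; // first registered match wins
            }
        }
    }

    void endElement() {
        ancestors.removeLast();
    }

    // Root-to-current path, e.g. "/doc/order/item".
    String currentPath() {
        return "/" + String.join("/", ancestors);
    }
}
```

A real XPath-based version would need more than element names on the stack (attributes and position, at least), which is one more argument for something like a lightweight node.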
Item #4
~~~~~~~
DOM is a random-access interface, but unfortunately,
it does not currently allow user-defined containers
for subsets of children. This would, IMHO, be
a great boon for a modular grammar... in some
cases a linked list might be perfect. In other
cases, a balanced red/black tree might be
the ticket, etc. By delegating this to the implementation,
a huge amount of choice is stripped from the
application developer.
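One way this could look in Java, purely as a sketch: the application passes in a factory for the child container when it creates a node. ContainerNode is a hypothetical class, not a DOM interface, and children are reduced to strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical node whose child container is chosen by the application.
final class ContainerNode {
    private final String name;
    private final Collection<String> children;

    ContainerNode(String name, Supplier<Collection<String>> containerFactory) {
        this.name = name;
        this.children = containerFactory.get();
    }

    String getName() { return name; }

    void addChild(String child) { children.add(child); }

    // Snapshot of the children in whatever order the container iterates them.
    List<String> childrenInContainerOrder() {
        return new ArrayList<>(children);
    }
}
```

For example, `new ContainerNode("list", java.util.LinkedList::new)` keeps insertion order, while `new ContainerNode("index", java.util.TreeSet::new)` keeps children sorted (java.util.TreeSet is backed by a red-black tree).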
Anyway... just thinking out loud here.
...
BTW, the experimental YML syntax has really
cleared up my thinking with regard to
sequential vs. random access.
On the SML list is a possible starting proposal
for a better SAX/DOM integration based on
the endEvent/handleNode returning a boolean,
"true" if the node is to be added to its parent's
child list, or "false" if it and all of its
children are to be garbage collected. The result
is surprising... if the answer to this question
is "no" recursively, then the unified interface
is logically equivalent to SAX. If the answer
is "yes" recursively, then the unified interface
is logically equivalent to a DOM builder with
SAX calls. If the answer is "no" for many
top level nodes, but "yes" for an entire sub-tree,
then the result is similar to Pyxie's hybrid
approach. However, what this interface allows
is far more granularity of choice than either
of these models... thus with a small amount of
added complexity (a boolean decision), great
flexibility is granted.
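Here is a small Java sketch of that boolean decision, under my own assumptions: TreeNode and KeepOrDropBuilder are hypothetical names, the decision is a Predicate rather than a handler interface, and only element names are tracked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical tree node holding only a name and the children that were kept.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    // Number of nodes in this subtree, including this one.
    int size() {
        int n = 1;
        for (TreeNode c : children) {
            n += c.size();
        }
        return n;
    }
}

// Hypothetical builder: when an element ends, the predicate answers
// "true" to attach it to its parent or "false" to drop it and its children.
final class KeepOrDropBuilder {
    private final Deque<TreeNode> open = new ArrayDeque<>();
    private final Predicate<TreeNode> keep;
    private TreeNode root;

    KeepOrDropBuilder(Predicate<TreeNode> keep) {
        this.keep = keep;
    }

    void startElement(String name) {
        open.push(new TreeNode(name));
    }

    void endElement() {
        TreeNode done = open.pop();
        if (!keep.test(done)) {
            return; // unreachable from now on, so it can be garbage collected
        }
        TreeNode parent = open.peek();
        if (parent != null) {
            parent.children.add(done);
        } else {
            root = done;
        }
    }

    // The kept root, or null if the root itself was dropped.
    TreeNode root() { return root; }
}
```

With `n -> false` nothing is ever retained and the predicate just sees each element go by, as in SAX; with `n -> true` the whole tree is built, as in a DOM builder driven by SAX calls.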
It is in my attempt to unify DOM/SAX using this
type of "SAX->DOM" binary-recursive builder
that the above concerns popped up. It would
be cool to have a debate about them, or
perhaps better, pointers as to where the
debates on these points were carried out.
Best Wishes,
Clark
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)