SAX and DOM

Tue Dec 14 21:40:20 GMT 1999

Just some thoughts....

As both of these standards become more and more 
solid (i.e. lots of code depending upon them),
perhaps it is now high time to look closely
at their integration to make it as painless
as possible for the application developer...
before _more_ code is developed.

Item #0
~~~~~~~
SAX is fundamentally event-driven,
sequential-access, read-only.  DOM 
is fundamentally object-based, 
random-access, read-write. 

No problem here, the stark differences 
make for a joyful interaction!

Item #1
~~~~~~~
AttributeList and NamedNodeMap both do 
essentially the same thing, except that
NamedNodeMap allows for mutation, and
AttributeList is more specific.  

A. In DOM, drop all of the mutators from 
   NamedNodeMap.  The replaceChild,
   appendChild, and removeChild methods
   could just as easily be used for 
   attributes as well as elements.

   In DOM level 2, the NamedNodeMap
   mutators could be deprecated, giving
   applications a chance to catch up.

B. In SAX, replace AttributeList with
   NamedNodeMap.

   In SAX2, AttributeList could be
   deprecated, making it an interface 
   supported by the particular 
   NamedNodeMap used.  
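
As a rough illustration of item B, here is a
minimal sketch of a NamedNodeMap being exposed
through the read-only AttributeList interface.
The adapter class NamedNodeMapAttributes and
its rootAttributes helper are hypothetical names,
not part of SAX or DOM, and reporting every
attribute type as "CDATA" is an assumption made
because DOM Level 1 carries no type information.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.AttributeList;
import org.xml.sax.InputSource;

// Hypothetical adapter: a read-only SAX 1 AttributeList view of a DOM NamedNodeMap.
@SuppressWarnings("deprecation") // AttributeList is the SAX 1 interface discussed here
class NamedNodeMapAttributes implements AttributeList {
    private final NamedNodeMap map;

    NamedNodeMapAttributes(NamedNodeMap map) { this.map = map; }

    public int getLength() { return map.getLength(); }

    public String getName(int i) {
        Node n = map.item(i);
        return n == null ? null : n.getNodeName();
    }

    public String getValue(int i) {
        Node n = map.item(i);
        return n == null ? null : n.getNodeValue();
    }

    public String getValue(String name) {
        Node n = map.getNamedItem(name);
        return n == null ? null : n.getNodeValue();
    }

    // Assumption: DOM Level 1 has no attribute type, so always report CDATA.
    public String getType(int i) { return map.item(i) == null ? null : "CDATA"; }

    public String getType(String name) {
        return map.getNamedItem(name) == null ? null : "CDATA";
    }

    // Demo helper: parse a string and wrap the root element's attributes.
    static AttributeList rootAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new NamedNodeMapAttributes(doc.getDocumentElement().getAttributes());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        AttributeList atts = rootAttributes("<item id='42' lang='en'/>");
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(atts.getName(i) + " = " + atts.getValue(i));
        }
    }
}
```

Code written against AttributeList keeps working
unchanged, while the mutators stay out of sight.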

Item #2
~~~~~~~
This one is a bit more brutal.  A DOM
node is heavyweight, whereas a SAX startElement 
event carries only the element name and
attribute list.
A few things to note here:  

 (a) often a read/write DOM node is
     not needed, so it is pure
     overhead.

 (b) with a query on a DOM tree, a list 
     of nodes fitting the criteria must
     be returned.  It will be tempting to 
     return them as an array; however,
     if relational database design has 
     anything to say, it will be realized 
     that a stream of Nodes will be a far
     better candidate.  And what is SAX
     but a stream of nodes?

 (c) When using SAX, access to the ancestor
     element stack would be extremely
     valuable.

Thus,

A.  Introduce a BaseNode interface that
    includes the node's name and value,
    parent node, attribute list, and 
    (possibly) a child list...

B.  In DOM Level 2, make Node inherit
    from BaseNode

C.  For SAX 2, introduce an alternative
    DocumentHandler interface, called
    NodeHandler, with beginElement and
    handleNode(BaseNode node) methods.

    Notes:  I'm not sure how to handle the
    characters() method, perhaps the 
    light-weight Node interface needs 
    quite a bit more modification...

    Perhaps, instead of returning a String
    (for Java), it returns another object,
    something like this:
      class CharBuff {
          char[] array;   // buffer holding the characters
          int    begin;   // index of the first character
          int    length;  // number of characters
      }
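
To make items A and C a little more concrete,
here is a minimal sketch layered on top of a
standard SAX parser.  BaseNode and NodeHandler
follow the names proposed above, but their exact
methods (getName, getParent, getAttribute) are
assumptions, as are the LightNode and Adapter
helper classes.  The parent link doubles as the
ancestor stack asked for in (c).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class BaseNodeDemo {
    // Hypothetical lightweight node: name, parent and attributes only.
    interface BaseNode {
        String getName();
        BaseNode getParent();              // null for the root element
        String getAttribute(String name);
    }

    // Hypothetical handler from item C.
    interface NodeHandler {
        void beginElement(BaseNode node);  // element has just been opened
        void handleNode(BaseNode node);    // element is complete
    }

    static final class LightNode implements BaseNode {
        private final String name;
        private final LightNode parent;
        private final Attributes atts;

        LightNode(String name, LightNode parent, Attributes atts) {
            this.name = name;
            this.parent = parent;
            this.atts = new AttributesImpl(atts); // copy: the parser may reuse its object
        }
        public String getName() { return name; }
        public BaseNode getParent() { return parent; }
        public String getAttribute(String n) { return atts.getValue(n); }
    }

    // Turns ordinary SAX callbacks into NodeHandler calls.
    static final class Adapter extends DefaultHandler {
        private final NodeHandler handler;
        private LightNode current;         // innermost open element

        Adapter(NodeHandler handler) { this.handler = handler; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            current = new LightNode(qName, current, atts);
            handler.beginElement(current);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            handler.handleNode(current);
            current = current.parent;      // pop back to the parent
        }
    }

    // Walks the parent links to rebuild the ancestor path.
    static String path(BaseNode n) {
        return n == null ? "" : path(n.getParent()) + "/" + n.getName();
    }

    // Demo helper: the path of every element, in the order the elements complete.
    static List<String> completedPaths(String xml) {
        final List<String> out = new ArrayList<>();
        NodeHandler h = new NodeHandler() {
            public void beginElement(BaseNode node) { }
            public void handleNode(BaseNode node) { out.add(path(node)); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new Adapter(h));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [/a/b, /a/c/d, /a/c, /a]
        System.out.println(completedPaths("<a><b/><c><d/></c></a>"));
    }
}
```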

Item #3
~~~~~~~
SAX is a stream interface, but unfortunately,
an event/listener pattern was not used.  So,
perhaps for SAX2, an XPath-based dispatch
system could be used, to pick a particular
NodeHandler based on particular criteria.
This would also work *wonderfully* for a
DOM query handler.  A system like this, BTW,
really drives the need for the ancestor
stack (at a minimum) to be made available
through a SAX2 interface.
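
A minimal sketch of such a dispatcher follows.
It is not real XPath: it only matches exact
absolute paths such as "/order/item", which is
a deliberate simplification.  The Dispatcher
class, its on() method and the ElementListener
interface are hypothetical names.  Note that the
dispatcher cannot work without keeping the
ancestor stack itself.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathDispatchDemo {
    // Hypothetical listener, called only for elements on a registered path.
    interface ElementListener {
        void onElement(String path, Attributes atts);
    }

    static final class Dispatcher extends DefaultHandler {
        private final Map<String, ElementListener> routes = new HashMap<>();
        private final Deque<String> ancestors = new ArrayDeque<>(); // root first

        // Register a listener for one exact absolute path (no wildcards).
        void on(String path, ElementListener listener) {
            routes.put(path, listener);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            ancestors.addLast(qName);
            String path = "/" + String.join("/", ancestors);
            ElementListener listener = routes.get(path);
            if (listener != null) {
                listener.onElement(path, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            ancestors.removeLast();
        }
    }

    // Demo helper: collect the id attribute of every /order/item element.
    static List<String> itemIds(String xml) {
        final List<String> ids = new ArrayList<>();
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.on("/order/item", (path, atts) -> ids.add(atts.getValue("id")));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), dispatcher);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<order><item id='1'/><note><item id='x'/></note><item id='2'/></order>";
        System.out.println(itemIds(xml)); // prints [1, 2]
    }
}
```

The nested item under note is skipped because
its path is /order/note/item.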

Item #4
~~~~~~~
DOM is a random-access interface, but unfortunately,
it does not currently allow user-defined containers
for sub-sets of children.  This would, IMHO, be
a great boon for a modular grammar.... in some
cases a linked list might be perfect.  In other
cases, a balanced red/black tree might be 
the ticket, etc.   By delegating this to the
implementation, a huge amount of choice is
stripped from the application developer.   
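
One way this could look, sketched outside of
DOM entirely: a node that takes a factory for
its child container.  TreeNode and namesWith are
hypothetical names, and nothing here is an
existing DOM API.  java.util.TreeSet is used for
the second case because it is backed by a
red-black tree.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

class ChildContainerDemo {
    // Hypothetical tree node whose child container is chosen by the application.
    static final class TreeNode {
        final String name;
        final Collection<TreeNode> children;

        TreeNode(String name, Supplier<? extends Collection<TreeNode>> containerFactory) {
            this.name = name;
            this.children = containerFactory.get();
        }

        TreeNode add(TreeNode child) {
            children.add(child);
            return this;
        }
    }

    // Document order, cheap appends.
    static final Supplier<Collection<TreeNode>> LINKED = LinkedList::new;

    // Sorted by name in a red-black tree; note that a set drops duplicate names.
    static final Supplier<Collection<TreeNode>> SORTED =
            () -> new TreeSet<TreeNode>(Comparator.comparing((TreeNode t) -> t.name));

    // Demo helper: add children c, a, b and report the container's iteration order.
    static List<String> namesWith(Supplier<? extends Collection<TreeNode>> factory) {
        TreeNode parent = new TreeNode("parent", factory);
        for (String n : new String[] {"c", "a", "b"}) {
            parent.add(new TreeNode(n, factory));
        }
        List<String> names = new ArrayList<>();
        for (TreeNode child : parent.children) {
            names.add(child.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesWith(LINKED)); // prints [c, a, b]
        System.out.println(namesWith(SORTED)); // prints [a, b, c]
    }
}
```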

Anyway... just thinking out loud here.

...

BTW, the experimental YML syntax has really
cleared up my thinking with regard to 
sequential vs. random access.   

On the SML list is a possible starting proposal 
for a better SAX/DOM integration based on
the endEvent/handleNode returning a boolean, 
"true" if the node is to be added to its parent's 
child list, or "false" if it and all of its 
children are to be garbage collected.  The result
is surprising...  if the answer to this question
is "no" recursively, then the unified interface
is logically equivalent to SAX.  If the answer
is "yes" recursively, then the unified interface
is logically equivalent to a DOM builder with
SAX calls.  If the answer is "no" for many
top level nodes, but "yes" for an entire sub-tree,
then the result is similar to Pyxie's hybrid
approach.  However, what this interface allows
is far more granularity of choice than either 
of these models... thus with a small amount of
added complexity (a boolean decision), great
flexibility is granted.
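
A minimal sketch of the boolean decision, built
on a standard SAX parser.  The Node, NodeFilter
and Builder classes and the build/show helpers
are hypothetical names chosen for illustration;
this is not the code from the SML list proposal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class KeepFilterDemo {
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Hypothetical unified handler: true attaches the finished node to its
    // parent's child list, false drops it together with all of its children.
    interface NodeFilter {
        boolean handleNode(Node node);
    }

    static final class Builder extends DefaultHandler {
        private final NodeFilter filter;
        private Node current;   // innermost open element
        Node root;              // stays null if the root element was dropped

        Builder(NodeFilter filter) { this.filter = filter; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            current = new Node(qName, current);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            Node done = current;
            current = done.parent;
            if (filter.handleNode(done)) {
                if (current != null) {
                    current.children.add(done);
                } else {
                    root = done;
                }
            }
            // Otherwise nothing references 'done', so its subtree can be collected.
        }
    }

    // Renders a retained tree as name(child,child(...)).
    static String show(Node n) {
        if (n.children.isEmpty()) {
            return n.name;
        }
        StringBuilder sb = new StringBuilder(n.name).append('(');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(show(n.children.get(i)));
        }
        return sb.append(')').toString();
    }

    // Demo helper: parse, apply the filter, and render whatever was retained.
    static String build(String xml, NodeFilter filter) {
        Builder builder = new Builder(filter);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return builder.root == null ? "" : show(builder.root);
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c><d/></c></a>";
        System.out.println(build(xml, n -> true));                // a(b,c(d))  DOM-like
        System.out.println(build(xml, n -> false));               // empty line, SAX-like
        System.out.println(build(xml, n -> !n.name.equals("c"))); // a(b)       hybrid
    }
}
```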

It is in my attempt to unify DOM/SAX using this
type of "SAX->DOM" binary-recursive builder
that the above concerns popped up.  It would
be cool to have a debate about them, or 
perhaps better, pointers as to where the
debates on these points were carried out.

Best Wishes,

Clark

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1