events vs callbacks (was Re: SAX2 (was Re: DOM vs. SAX??? Nah. ))

Tue Feb 23 03:19:17 GMT 1999

Bill la Forge wrote:

> Overhead is an issue. Event objects really do simplify a lot of things, especially filters.
> Interfaces are faster.

<pedantic>

SAX is described as an event-based API. IMO, it is a callback-based API. I am guessing
that many will find this either a debatable distinction or one not worth dwelling on. I feel
it is worth distinguishing. This is separate from the issue of whether it is worthwhile to
either replace the callback API with an event API or layer an event API on top of the
callback API.

</pedantic>
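
To make the distinction concrete, here is a minimal sketch in Java. All of the names (ElementCallbacks, ParseEvent, ParseEventListener, EventGenerator, CallbackVsEventDemo) are hypothetical and are not part of SAX. In the callback style the parser calls one method per kind of occurrence; in the event style each occurrence is an object that can be stored or forwarded. The adapter shows one way an event API could be layered on top of a callback API.

```java
import java.util.ArrayList;
import java.util.List;

// Callback style: one method per kind of occurrence, arguments passed directly.
interface ElementCallbacks {
    void startElement(String name);
    void endElement(String name);
}

// Event style: each occurrence is reified as an object (hypothetical type).
final class ParseEvent {
    enum Kind { START_ELEMENT, END_ELEMENT }

    final Kind kind;
    final String name;

    ParseEvent(Kind kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    @Override
    public String toString() {
        return kind + "(" + name + ")";
    }
}

// A single entry point receives every event object.
interface ParseEventListener {
    void handle(ParseEvent event);
}

// Adapter that layers the event style on top of the callback style.
final class EventGenerator implements ElementCallbacks {
    private final ParseEventListener listener;

    EventGenerator(ParseEventListener listener) {
        this.listener = listener;
    }

    @Override
    public void startElement(String name) {
        listener.handle(new ParseEvent(ParseEvent.Kind.START_ELEMENT, name));
    }

    @Override
    public void endElement(String name) {
        listener.handle(new ParseEvent(ParseEvent.Kind.END_ELEMENT, name));
    }
}

final class CallbackVsEventDemo {
    // Stand-in for a parser: drives the callbacks for <a><b/></a>.
    static void fakeParse(ElementCallbacks callbacks) {
        callbacks.startElement("a");
        callbacks.startElement("b");
        callbacks.endElement("b");
        callbacks.endElement("a");
    }

    // Collects the textual form of every event the adapter produces.
    static List<String> collect() {
        List<String> seen = new ArrayList<>();
        fakeParse(new EventGenerator(event -> seen.add(event.toString())));
        return seen;
    }
}
```

The adapter allocates one object per callback, which is the overhead under discussion in this thread.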

>
> Worse, if the parser pulls the same tricks with Event objects as are currently done with
> AttributeList (i.e. reusing the same object over and over), you must then clone the
> event before adding it to the queue.

The issue of memory management and ownership rears its ugly head again :-(. This seems
to argue for an eventgen filter for SAX.

Given the memory management and efficiency issues, event queuing would need to be layered
on top of, rather than instead of, the callback API. If you assumed that an event API replaced
the callback API due to your extensibility argument, then I wonder if you couldn't provide a
configuration parameter to the SAX driver on whether to clone or reuse the event objects.
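
A rough sketch of the clone-or-reuse choice, in Java. It uses the SAX 1 types org.xml.sax.AttributeList and org.xml.sax.helpers.AttributeListImpl (whose copy constructor makes a persistent copy); StartElementEvent, QueueingFilter, ReuseDemo and the cloneAttributes flag are hypothetical names of mine, and the "parser" is simulated by reusing one attribute list for two elements.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import org.xml.sax.AttributeList;
import org.xml.sax.helpers.AttributeListImpl;

// Hypothetical event type; SAX itself defines no event objects.
@SuppressWarnings("deprecation")
final class StartElementEvent {
    final String name;
    final AttributeList attributes;

    StartElementEvent(String name, AttributeList attributes) {
        this.name = name;
        this.attributes = attributes;
    }
}

// Hypothetical filter that queues start-element callbacks as event objects.
@SuppressWarnings("deprecation")
final class QueueingFilter {
    private final boolean cloneAttributes;
    final Queue<StartElementEvent> queue = new ArrayDeque<>();

    QueueingFilter(boolean cloneAttributes) {
        this.cloneAttributes = cloneAttributes;
    }

    // Mirrors the argument list of DocumentHandler.startElement.
    void startElement(String name, AttributeList atts) {
        // Copy only when asked to: the copy costs an allocation per element,
        // but it stays valid after the parser overwrites its own list.
        AttributeList kept = cloneAttributes ? new AttributeListImpl(atts) : atts;
        queue.add(new StartElementEvent(name, kept));
    }
}

@SuppressWarnings("deprecation")
final class ReuseDemo {
    // Simulates a parser that reuses a single attribute list for every element,
    // then reports the "id" attribute of the first queued event.
    static String firstQueuedId(boolean cloneAttributes) {
        QueueingFilter filter = new QueueingFilter(cloneAttributes);
        AttributeListImpl shared = new AttributeListImpl();

        shared.addAttribute("id", "CDATA", "first");
        filter.startElement("a", shared);

        shared.clear();
        shared.addAttribute("id", "CDATA", "second");
        filter.startElement("b", shared);

        return filter.queue.peek().attributes.getValue("id");
    }
}
```

With cloning on, the first queued event still reports "first"; with cloning off it reports "second", because both queued events point at the list the simulated parser overwrote.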

> There are lots of things we could do if we had event objects, especially with control flow.
> (And there's a lot of mess in MDSAX because we do not use event objects!)
> But parser speed is the key feature. For now.

I can see that the speed of the parser subsystem argues for the current approach, especially
since the existing base of parsers uses this model. It's not clear to me that the speed of a
system which integrates a SAX-based parser is necessarily enhanced by the current model.

I have two issues with the current approach. One is the stated one of an event-based vs. a
callback-based API. The other is more related to parser architecture and single-threaded
runtime environments.

AFAIK, the current crop of parsers and SAX all assume that they are passed a thread of control
and in turn pass this thread to the callbacks registered by the application. In single-threaded
environments this means that the parser is the center of the universe until the document is
completely processed. It would be nice if there were also a "fragment" sequence interface like
that used by the HTML parser in Perl, i.e. each call to "parse" provides the next chunk of input
forming the document. This is also useful in a multi-threaded runtime, since the application can
control the chunking directly rather than indirectly through thread synchronization mechanisms.
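
A sketch of what such a chunk-at-a-time interface might look like, in Java. This is a toy tag scanner, not an XML parser, and every name in it (TokenCallbacks, ChunkedScanner, ChunkedScannerDemo, parse, eof) is hypothetical. The point is only the control flow: each parse call consumes one chunk, emits callbacks for whatever is complete, keeps any unfinished tag buffered, and then returns to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callbacks for the toy scanner.
interface TokenCallbacks {
    void tag(String rawTag);
    void text(String chars);
}

final class ChunkedScanner {
    private final TokenCallbacks callbacks;
    private final StringBuilder pending = new StringBuilder();

    ChunkedScanner(TokenCallbacks callbacks) {
        this.callbacks = callbacks;
    }

    // Accepts the next chunk and returns once every complete token is reported.
    // Text may be reported in several pieces when it spans chunk boundaries.
    void parse(String chunk) {
        pending.append(chunk);
        int pos = 0;
        while (pos < pending.length()) {
            int lt = pending.indexOf("<", pos);
            if (lt < 0) {
                // No tag start left: everything remaining is text.
                callbacks.text(pending.substring(pos));
                pos = pending.length();
            } else if (lt > pos) {
                // Text up to the next tag start.
                callbacks.text(pending.substring(pos, lt));
                pos = lt;
            } else {
                int gt = pending.indexOf(">", lt);
                if (gt < 0) {
                    break; // Incomplete tag: wait for the next chunk.
                }
                callbacks.tag(pending.substring(lt, gt + 1));
                pos = gt + 1;
            }
        }
        pending.delete(0, pos); // Keep only the unfinished tail.
    }

    // Signals end of input; anything still buffered is an unterminated tag.
    void eof() {
        if (pending.length() > 0) {
            throw new IllegalStateException("unterminated tag: " + pending);
        }
    }
}

final class ChunkedScannerDemo {
    // Feeds the chunks one call at a time and records the callbacks in order.
    static List<String> run(String... chunks) {
        final List<String> tokens = new ArrayList<>();
        ChunkedScanner scanner = new ChunkedScanner(new TokenCallbacks() {
            @Override public void tag(String rawTag) { tokens.add("tag:" + rawTag); }
            @Override public void text(String chars) { tokens.add("text:" + chars); }
        });
        for (String chunk : chunks) {
            scanner.parse(chunk);
        }
        scanner.eof();
        return tokens;
    }
}
```

Because control returns after every chunk, the application decides when to read more input and can interleave other work between calls without a second thread.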

> Though if we go with Simon's layered architecture, we might actually get a speed gain.
> But there's no question that the code would be a whole lot smaller and easier to
> understand. And that may be justification enough.

I looked at MDSAX over the weekend, and it is certainly a powerful platform for SAX-based
processing. On the other hand, trying to fit everything into a single filter network with no
lookahead capability (which would require queueing) and only cumbersome lookbehind capability
(such as in the flatten example) seems problematic. Given the constraints of making use of the
existing infrastructure (SAX, XML), you have created a very flexible framework. It's not clear
whether those constraints are the right ones for supporting composition of processing as you
envision it.
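
To illustrate why lookahead implies queueing, here is a small Java sketch of a filter that holds back one event so the downstream handler can see the following one. It is not taken from MDSAX; LookaheadHandler, OneEventLookahead and LookaheadDemo are hypothetical names, and events are plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical downstream handler that sees each event with its successor.
interface LookaheadHandler {
    // next is null when current is the last event of the stream.
    void handle(String current, String next);
}

// Holds back a single event: a queue of depth one.
final class OneEventLookahead {
    private final LookaheadHandler downstream;
    private String held;

    OneEventLookahead(LookaheadHandler downstream) {
        this.downstream = downstream;
    }

    // Called for each incoming event; releases the previously held one.
    void event(String incoming) {
        if (held != null) {
            downstream.handle(held, incoming);
        }
        held = incoming;
    }

    // Called at end of stream; flushes the last held event.
    void end() {
        if (held != null) {
            downstream.handle(held, null);
            held = null;
        }
    }
}

final class LookaheadDemo {
    // Pushes the events through the filter and records each (current, next) pair.
    static List<String> run(String... events) {
        List<String> pairs = new ArrayList<>();
        OneEventLookahead filter =
            new OneEventLookahead((current, next) -> pairs.add(current + ">" + next));
        for (String e : events) {
            filter.event(e);
        }
        filter.end();
        return pairs;
    }
}
```

Even this one-slot buffer has to own the held event, so if the parser reuses its objects the held event would need to be copied first.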

Gabe Beged-Dov
www.jfinity.com


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)