ModSAX: Proposed Core Features
Ronald Bourret
rbourret at ito.tu-darmstadt.de
Thu Mar 11 13:11:07 GMT 1999
Oren Ben-Kiki wrote:
> Has anything similar [assembling processors based on feature requests]
> been done in a different field, so we could reuse the
> design lessons there? It seems like a pretty generic "stream processing"
> problem.
I think there is an inherent assumption in this question that we are
defining individual features that can be implemented by different parties
and then randomly assembled to get a useful processor. While this is
potentially a useful thing to do -- UNIX pipes are a good example -- it is
not necessarily an easy thing to do, nor is it clear that this is a goal of
ExModE-XSAX.
We tried to do a similar thing in OLE DB, where database functionality
would be broken down into individual services which could be assembled at
will on top of a database driver. (Generally, this would be meaningful
only for drivers for non-database sources, as drivers for existing
databases already exposed most/all functionality.) The idea never really
worked out, but here are some of the issues:
* Are there enough useful features/components to make this worthwhile? For
OLE DB, the answer was "probably not". We implemented a scrollable cursor
(basically just a result set cache), but other ideas (transactions,
security) were not easily implementable as separate layers and were not
really meaningful -- anybody could get around them by excluding the layer.
* What are the interfaces between components and how hard are they to
implement? If you want to be able to assemble components from different
vendors at will, these need to be defined. The success of SAX filters is a
red herring here -- it leads one to believe that SAX can function as a
useful interface for all XML-related processing features. In fact, this is
not the case -- for example, whether or not to retrieve external entities
has nothing to do with SAX. Thus, other interfaces would need to be
defined to be able to assemble processors from third-party components. (I
think this is one thing that led us astray in OLE DB. The usefulness of a
scrollable cursor engine that spoke OLE DB at both ends led us to believe
that the same could be done with other database features. In fact, OLE DB
was less well suited or completely unsuited for other operations. In
addition, it was expensive to implement.)
* How independent are the features? Is it meaningful to ask for one thing
but not another, such as validation without namespaces (maybe) or
validation without parsing external entities (no)? Again, I think the
orthogonality of some features is a red herring leading one to believe all
features are orthogonal.
* Are performance penalties too high to separate features into separate
components? For example, suppose several features need to process XML
documents as trees. While it might make sense to write a single processor
for these features and toggle them within the processor, the performance
hit of implementing them as separate, chained processors would be too high:
each would have to build a tree, process it, and then stream it back out as
SAX.
* Are there order dependencies between components? For example, if you
want validation and namespace processing as separate components, you had
better do namespace processing first. An open question is who knows about
the order and how it is advertised.
* Who assembles the components -- the application, the processor, or a
third party? The advantage of a processor or third party (such as a
factory) assembling components is that you need the assembly logic in only
a few places. The disadvantage is that applications that know about a new
feature cannot use that feature until the assembly logic in the
processor/factory is updated. It is probably best to have a mechanism that
allows both processors and applications to assemble components.
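To make the ordering and assembly questions concrete, here is a minimal
sketch in Python of one possible answer: each component advertises which
features must run before it, and a factory-style assemble() function pulls in
and orders whatever the application requested. Everything in it is an
assumption made for illustration (the REGISTRY table, the assemble() name,
the (kind, name) event tuples, the fixed "x" prefix binding); none of it comes
from SAX or from the ModSAX proposal.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components. Each one consumes and produces a stream of
# (kind, name) events, standing in for a SAX-like interface.

def namespaces(events):
    """Turn a prefixed element name into a (uri, local name) pair."""
    bindings = {"x": "urn:example"}  # assumed fixed prefix binding
    for kind, name in events:
        prefix, _, local = name.rpartition(":")
        yield kind, (bindings.get(prefix, ""), local)

def validation(events):
    """Reject any element not in a tiny, namespace-aware allow list."""
    allowed = {("urn:example", "doc"), ("urn:example", "para")}
    for kind, name in events:
        if name not in allowed:
            raise ValueError(f"invalid element: {name!r}")
        yield kind, name

# Hypothetical registry: feature -> (component, features that must run first).
REGISTRY = {
    "namespaces": (namespaces, set()),
    "validation": (validation, {"namespaces"}),  # advertised order dependency
}

def assemble(requested):
    """Build a pipeline for the requested features, adding and ordering
    any features they depend on."""
    needed, todo = set(), list(requested)
    while todo:
        feature = todo.pop()
        if feature not in needed:
            needed.add(feature)
            todo.extend(REGISTRY[feature][1])
    graph = {feature: REGISTRY[feature][1] for feature in needed}
    order = list(TopologicalSorter(graph).static_order())

    def pipeline(events):
        # Chain the components in dependency order.
        for feature in order:
            events = REGISTRY[feature][0](events)
        return list(events)

    pipeline.order = order
    return pipeline

# The application asks only for validation; namespace processing is
# pulled in and placed first because validation advertises the dependency.
pipeline = assemble(["validation"])
result = pipeline([("start", "x:doc"), ("end", "x:doc")])
```

Because the registry is plain data, either a processor or an application
could add an entry to it, which is one way of letting both sides assemble
components. The sketch says nothing about the performance question raised
above: every stage here still touches every event.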
My personal feeling is that assembling XML processors completely on the fly
is a pipe (if you will excuse the pun) dream. The world is simply not
orthogonal enough to make this possible. Furthermore, there are too many
performance gains to be had by tight integration of functionality to ever
convince people to build things entirely as components with public
interfaces.
-- Ron Bourret