ModSAX: Proposed Core Features

Thu Mar 11 13:11:07 GMT 1999

Oren Ben-Kiki wrote:

> Has anything similar [assembling processors based on feature requests]
> been done in a different field, so we could reuse the
> design lessons there? It seems like a pretty generic "stream processing"
> problem.

I think there is an inherent assumption in this question that we are 
defining individual features that can be implemented by different parties 
and then randomly assembled to get a useful processor.  While this is 
potentially a useful thing to do -- UNIX pipes are a good example -- it is 
not necessarily an easy thing to do, nor is it clear that this is a goal of 
ModSAX.

We tried to do a similar thing in OLE DB, where database functionality 
would be broken down into individual services which could be assembled at 
will on top of a database driver.  (Generally, this would be meaningful 
only for drivers for non-database sources, as drivers for existing 
databases already exposed most/all functionality.)  The idea never really 
worked out, but here are some of the issues:

* Are there enough useful features/components to make this worthwhile?  For 
OLE DB, the answer was "probably not".  We implemented a scrollable cursor 
(basically just a result set cache), but other ideas (transactions, 
security) were not easily implementable as separate layers and were not 
really meaningful -- anybody could get around them by excluding the layer.

* What are the interfaces between components and how hard are they to 
implement?  If you want to be able to assemble components from different 
vendors at will, these need to be defined.  The success of SAX filters is a 
red herring here -- it leads one to believe that SAX can function as a 
useful interface for all XML-related processing features.  In fact, this is 
not the case -- for example, whether or not to retrieve external entities 
has nothing to do with SAX.  Thus, other interfaces would need to be 
defined to be able to assemble processors from third-party components.  (I 
think this is one thing that led us astray in OLE DB.  The usefulness of a 
scrollable cursor engine that spoke OLE DB at both ends led us to believe 
that the same could be done with other database features.  In fact, OLE DB 
was less well suited or completely unsuited for other operations.  In 
addition, it was expensive to implement.)

* How independent are the features?  Is it meaningful to ask for one thing 
but not another, such as validation without namespaces (maybe) or 
validation without parsing external entities (no)?  Again, I think the 
orthogonality of some features is a red herring leading one to believe all 
features are orthogonal.

* Are performance penalties too high to separate features into separate 
components?  For example, suppose several features need to process XML 
documents as trees.  While it might make sense to write a single processor 
for these features and toggle them within the processor, the performance 
hit of implementing them as separate, chained processors would be too high: 
each would have to build a tree, process it, and then stream it back out as 
SAX.

* Are there order dependencies between components?  For example, if you 
want validation and namespace processing as separate components, you had 
better do namespace processing first.  An open question is who knows about 
the order and how it is advertised.

* Who assembles the components -- the application, the processor, or a 
third party?  The advantage of a processor or third party (such as a 
factory) assembling components is that you need the assembly logic in only 
a few places.  The disadvantage is that applications that know about a new 
feature cannot use that feature until the assembly logic in the 
processor/factory is updated.  It is probably best to have a mechanism that 
allows both processors and applications to assemble components.
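
To make the dependency and ordering questions above concrete, here is a 
rough sketch in Python (3.9 or later, for graphlib) of what a third-party 
assembler might have to know.  Everything in it is hypothetical: the 
FEATURES table, the feature names, and the assemble() function are 
illustrations only and are not part of SAX or ModSAX.  It assumes each 
feature advertises two things: what it cannot work without ("requires") and 
what it must run after when both are present ("after").  I have also 
assumed, for the sake of the example, that namespace processing runs after 
external entities are retrieved.

```python
from graphlib import TopologicalSorter

# Hypothetical feature registry (illustrative names, not a real API).
#   requires: features that are pulled in automatically
#   after:    features that must run earlier *if* they are in the pipeline
FEATURES = {
    "external-entities": {"requires": set(), "after": set()},
    "namespaces": {"requires": set(), "after": {"external-entities"}},
    "validation": {
        "requires": {"external-entities"},
        "after": {"external-entities", "namespaces"},
    },
}


def assemble(requested):
    """Return the requested features plus prerequisites, in pipeline order."""
    needed = set()
    stack = list(requested)
    while stack:
        feature = stack.pop()
        if feature not in FEATURES:
            raise KeyError(f"unknown feature: {feature}")
        if feature not in needed:
            needed.add(feature)
            stack.extend(FEATURES[feature]["requires"])

    # Keep only ordering constraints between features actually present.
    graph = {f: FEATURES[f]["after"] & needed for f in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(assemble(["validation"]))
    print(assemble(["validation", "namespaces"]))
```

Note that even this toy version needs a central table that knows about 
every feature.  An application that learns about a new feature still has to 
get it into that table somehow, which is exactly the assembly problem 
described in the last point above.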

My personal feeling is that assembling XML processors completely on the fly 
is a pipe (if you will excuse the pun) dream.  The world is simply not 
orthogonal enough to make this possible.  Furthermore, there are too many 
performance gains to be had by tight integration of functionality to ever 
convince people to build things entirely as components with public 
interfaces.

-- Ron Bourret
