Fw: ModSAX: Proposed Core Features

Thu Mar 11 15:56:53 GMT 1999

I asked:
>> Has anything similar [assembling processors based on feature requests]
>> been done in a different field, so we could reuse the
>> design lessons there? It seems like a pretty generic "stream processing"
>> problem.

Ronald Bourret <rbourret at ito.tu-darmstadt.de> wrote:

>I think there is an inherent assumption in this question that we are
>defining individual features that can be implemented by different parties
>and then randomly assembled to get a useful processor.  While this is
>potentially a useful thing to do -- UNIX pipes are a good example -- it is
>not necessarily an easy thing to do, nor is it clear that this is a goal of
>ExModE-XSAX.

Well, at least the idea warrants some serious thought.

>We tried to do a similar thing in OLE DB, where database functionality
>would be broken down into individual services which could be assembled at
>will on top of a database driver.  (Generally, this would be meaningful
>only for drivers for non-database sources, as drivers for existing
>databases already exposed most/all functionality.)  The idea never really
>worked out, but here are some of the issues:
>
>* Are there enough useful features/components to make this worthwhile?

Good question. For SAX I'd say "probably yes". Here's a list of features
(courtesy of David Megginson):

> http://xml.org/sax/features/validation
>  Validate (true) or don't validate (false).
> http://xml.org/sax/features/external-general-entities
>  Expand external general entities (true) or don't expand (false).
> http://xml.org/sax/features/external-parameter-entities
>  Expand external parameter entities (true) or don't expand (false).
> http://xml.org/sax/features/namespaces
>  Preprocess namespaces (true) or don't preprocess (false).  See also
>  the http://xml.org/sax/properties/namespace-sep property.
> http://xml.org/sax/features/normalize-text
>  Ensure that all consecutive text is returned in a single callback to
>  DocumentHandler.characters or DocumentHandler.ignorableWhitespace
>  (true) or explicitly do not require it (false).

I'd like to see "http://xml.org/sax/features/xsl-transformation" as well.
Anyway, all of the above seem to fall nicely into the pipeline framework.
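
To make the "feature request" idea concrete, here is a rough sketch. It assumes Python's standard xml.sax module, which is not the API under discussion here but does expose some of the URIs above as constants. The helper name request_features is hypothetical, and a given parser may reject features it cannot provide.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_external_ges, feature_namespaces


def request_features(parser, wanted):
    """Try to set each requested feature; return the ones the parser accepted."""
    accepted = {}
    for uri, state in wanted.items():
        try:
            parser.setFeature(uri, state)
            accepted[uri] = state
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # This parser cannot honour the request; the application
            # would have to supply the feature some other way.
            pass
    return accepted


parser = xml.sax.make_parser()
accepted = request_features(parser, {
    feature_namespaces: True,
    feature_external_ges: False,
    # Taken from the list above; a parser may well not recognise it.
    "http://xml.org/sax/features/normalize-text": True,
})
```

Whatever ends up missing from "accepted" is exactly what a pipeline of filters would have to provide.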

>* What are the interfaces between components and how hard are they to
>implement?

Basically the SAX callbacks, probably extended so that the full document
data is available (comments and so on). This seems pretty much a done deal.
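
A minimal sketch of what such an interface could look like, in Python for brevity. All class and method names here are hypothetical; the "comment" callback stands in for the assumed extensions that expose the full document data.

```python
class Handler:
    """Hypothetical callback interface, loosely modelled on the SAX callbacks."""

    def start_element(self, name, attrs):
        pass

    def end_element(self, name):
        pass

    def characters(self, text):
        pass

    def comment(self, text):
        # Assumed extension: comments are reported too.
        pass


class Filter(Handler):
    """A pipeline stage that forwards every callback to the next handler."""

    def __init__(self, downstream):
        self.downstream = downstream

    def start_element(self, name, attrs):
        self.downstream.start_element(name, attrs)

    def end_element(self, name):
        self.downstream.end_element(name)

    def characters(self, text):
        self.downstream.characters(text)

    def comment(self, text):
        self.downstream.comment(text)


class UpperCaseNames(Filter):
    """Example stage: rewrites element names and leaves the rest alone."""

    def start_element(self, name, attrs):
        self.downstream.start_element(name.upper(), attrs)

    def end_element(self, name):
        self.downstream.end_element(name.upper())


class Recorder(Handler):
    """End of the pipeline: records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append(("start", name, attrs))

    def end_element(self, name):
        self.events.append(("end", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def comment(self, text):
        self.events.append(("comment", text))
```

Since every stage both consumes and produces the same callbacks, any stage can sit in front of any other, which is the whole point.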

>* How independent are the features?
>* Are there order dependencies between components?

This is a problem, as I've already pointed out. Take "normalize-text", for
example. The effects of such a filter might be lost if it is followed by any
of the entity expansion filters (say), not to mention an XSL one. However,
most of the other features seem relatively independent. I'd say this isn't
a fatal problem. It definitely doesn't affect the API I suggested.
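
The order dependency is easy to demonstrate with a toy sketch. Everything here is hypothetical: the class names, the "entity_ref" callback, and the idea that entity expansion is a separate filter rather than part of the parser.

```python
class Recorder:
    """End of the pipeline: records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, name):
        self.events.append(("start", name))

    def end_element(self, name):
        self.events.append(("end", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def entity_ref(self, name):
        self.events.append(("entity", name))


class Filter:
    """Default stage: forwards every callback unchanged."""

    def __init__(self, downstream):
        self.downstream = downstream

    def start_element(self, name):
        self.downstream.start_element(name)

    def end_element(self, name):
        self.downstream.end_element(name)

    def characters(self, text):
        self.downstream.characters(text)

    def entity_ref(self, name):
        self.downstream.entity_ref(name)


class NormalizeText(Filter):
    """Buffers consecutive text and emits it as a single characters call."""

    def __init__(self, downstream):
        super().__init__(downstream)
        self._buffer = []

    def _flush(self):
        if self._buffer:
            self.downstream.characters("".join(self._buffer))
            self._buffer = []

    def characters(self, text):
        self._buffer.append(text)

    def start_element(self, name):
        self._flush()
        self.downstream.start_element(name)

    def end_element(self, name):
        self._flush()
        self.downstream.end_element(name)

    def entity_ref(self, name):
        self._flush()
        self.downstream.entity_ref(name)


class ExpandEntities(Filter):
    """Replaces each entity reference with its replacement text."""

    def __init__(self, downstream, entities):
        super().__init__(downstream)
        self.entities = entities

    def entity_ref(self, name):
        self.downstream.characters(self.entities[name])


def feed(head):
    """Simulates a parser reporting <p>AT&amp;T</p> with the entity unexpanded."""
    head.start_element("p")
    head.characters("AT")
    head.entity_ref("amp")
    head.characters("T")
    head.end_element("p")


entities = {"amp": "&"}

normalize_first = Recorder()
feed(NormalizeText(ExpandEntities(normalize_first, entities)))

expand_first = Recorder()
feed(ExpandEntities(NormalizeText(expand_first), entities))
```

With normalization first, the handler still gets three separate characters calls ("AT", "&", "T"); with expansion first it gets a single "AT&T". So the order matters, but it is something the assembler of the pipeline can get right.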

>* Are performance penalties too high to separate features into separate
>components?

Unknown; I guess this depends on the feature and the implementation. But
then, allowing one to build a system by combining filters doesn't mean one
has to do so. Even inefficient pipelines are still very useful for ad-hoc
processing, for prototyping systems, and so on. From the list of features
above, I'd say that most won't suffer a serious penalty.

>* Who assembles the components -- the application, the processor, or a
>third party?

What I'm suggesting is that we answer "for now, the application", and
provide a simple, lightweight, low-level API which allows it to do so. More
complex solutions could evolve later on. This seems to be in the SAX spirit.
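
To give an idea of how little is needed, here is a sketch of application-side assembly. It is only an illustration, not the API I proposed: the names are hypothetical, and the callbacks are collapsed into a single "event" method for brevity.

```python
class Sink:
    """The application's own handler: records what reaches it."""

    def __init__(self):
        self.events = []

    def event(self, kind, data):
        self.events.append((kind, data))


class DropComments:
    """Stage that swallows comment events and forwards the rest."""

    def __init__(self, downstream):
        self.downstream = downstream

    def event(self, kind, data):
        if kind != "comment":
            self.downstream.event(kind, data)


class LowerCaseNames:
    """Stage that lower-cases element names in start and end events."""

    def __init__(self, downstream):
        self.downstream = downstream

    def event(self, kind, data):
        if kind in ("start", "end"):
            data = data.lower()
        self.downstream.event(kind, data)


def assemble(sink, stage_factories):
    """Chains the stages in front of sink; the first factory is nearest the parser."""
    head = sink
    for factory in reversed(stage_factories):
        head = factory(head)
    return head


sink = Sink()
head = assemble(sink, [DropComments, LowerCaseNames])
for kind, data in [("start", "P"), ("comment", "x"), ("chars", "Hi"), ("end", "P")]:
    head.event(kind, data)
```

The application chooses both the stages and their order; anything smarter (a processor or a third party picking stages from a feature list) could be layered on top of "assemble" later.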

>My personal feeling is that assembling XML processors completely on the fly
>is a pipe (if you will excuse the pun) dream.  The world is simply not o
>rthogonal enough to make this possible.  Furthermore, there are too many
>performance gains to be had by tight integration of functionality to ever
>convince people to build things entirely as components with public
>interfaces.

Simon St.Laurent has made a good case for layering XML functionality - see
http://www.simonstl.com/articles/layering/layered.htm. The list of features
above seems to validate his claims.

My feeling is that pipelining is a valid approach. This is because there are
quite a few features which fit this model, and each application needs its
own special subset of them. If this weren't the case, we'd be designing
SAX2.0 with a fixed set of features instead of ModSAX.

Have fun,

    Oren Ben-Kiki

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)