Fw: ModSAX: Proposed Core Features

Fri Mar 12 09:19:58 GMT 1999

Ronald Bourret <rbourret at ito.tu-darmstadt.de> wrote:
>If the application assembles the components and the interface between them
>is SAX, what do we need that SAX filters don't already give us?  In other
>words, does anything need to be done to OpenSAX (best name so far) to
>support this besides adding the ParserFilter interface?

Yes. One needs to _locate_ the necessary filters. Hence the registry, the
query-for-a-feature, etc.
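
To make the "locate" step concrete, here is a minimal sketch of such a
registry in Java. Everything in it is hypothetical: the FeatureRegistry
class, its register/locate methods, the com.example class names and the
"example.validation" feature name are illustrations only and not part of
SAX or of any ModSAX proposal. Only "org.xml.sax.parser" is a feature name
actually used in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical registry mapping component class names to the features they claim. */
class FeatureRegistry {

    /** One registered component and the feature names it says it supports. */
    private static final class Entry {
        final String className;
        final Set<String> features;

        Entry(String className, Set<String> features) {
            this.className = className;
            this.features = features;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Records that the named class provides the given features. */
    void register(String className, String... features) {
        entries.add(new Entry(className, new HashSet<>(Arrays.asList(features))));
    }

    /**
     * Returns the class name of the first registered component that
     * supports every requested feature, or empty if none does.
     */
    Optional<String> locate(String... required) {
        List<String> wanted = Arrays.asList(required);
        for (Entry e : entries) {
            if (e.features.containsAll(wanted)) {
                return Optional.of(e.className);
            }
        }
        return Optional.empty();
    }
}
```

An application would call locate("org.xml.sax.parser", ...) with whatever
features it needs and instantiate the returned class by name, falling back
to another configuration when nothing matches.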

>The other question that occurs to me is how useful/common it is to
>dynamically assemble a processor at run time. That is, are there really
>applications (outside of test environments) that allow the user to
>designate their parser at run time (or even installation time) and
>therefore need to cover any possible deficiencies in the chosen parser?
> What is gained by allowing the user to choose the parser?

If there aren't, why bother with ModSAX at all? If I know exactly which
class is used, I also know exactly which features it provides, right? The
whole point of ModSAX is that this isn't the case.

Think of it like this: XML support is not the same on all platforms.
Sometimes there's a built-in SAX parser. It may or may not support some
features. Sometimes there's an XSL processor. And so on. I'm talking about
platforms existing today, or "real soon now" - IE5, server packages, etc.

I want to write code which is _reasonably_ portable to such platforms. I
accept the remark that a full-scale solution is beyond the scope of ModSAX.
What I suggested is an interface in the spirit of SAX (I hope) -
lightweight, simple, low-level, which allows future layering of higher-level
solutions.

>Note that this is a very different situation from, say, using different
>ODBC drivers.  In the case of ODBC drivers, you are choosing a different
>source of data (type of database) and application writers have a strong
>incentive to support multiple databases through ODBC.  In the case of XML,
>the source of data is always the same XML document and the choice of parser
>becomes a trade-off between speed, reliability, feature-set, etc.

On the contrary, I see it as being very similar to using ODBC drivers. ODBC
drivers vary in their capabilities, and therefore have a mechanism for
querying for particular features. So do XML components. There might be any
number of ODBC drivers available in a particular system. Same for XML
components. And you typically have a pretty good idea of which ODBC driver
you are going to use. Same for XML components. The last point doesn't
invalidate the first two.

BTW, have you ever tried to write a non-trivial program which would work
with any ODBC driver? I have. You have to at least negotiate its
capabilities, find a match for your needs, and then the problems start - it
doesn't like this join syntax, it can't do this particular form of query...
You end up writing an adapter class which knows the particular nastiness of
the particular driver. Of course this is due to SQL being such a weak
standard; XML should be better in this regard - if we insist on
well-defining features, that is.

>Since the application writer knows the feature set ahead of time, why not
>just hard-code the required parser and SAX filters and be done with it?
> (Yes, I know that "hard-code" is a bad word and I shudder as a write it,
>but I really am curious if anybody out there has a real-world application
>that allows users to change parsers and what the benefits of this are
>besides the ability to say, "Oh, look. I'm using a different parser.")

Mine. I run on both IE5 ("hey, look, there's a built in XSL processor") and
IE4 ("oh well, let's use XT"), not to mention some server platforms I'm
considering. I'm also tentatively considering other XML features -
namespaces and embedding. I doubt I'm unique in this regard. And as XML
support starts crawling into popular platforms (examples abound), this will
become more and more common.

At least we hope so :-)

>In this view, the utility of SAX is not the ability to change parsers at
>run time, but to change them over time as reliability, speed, size, etc. of
>the parsers change.  It also means that application writers can learn a
>single interface (SAX) and then choose parsers as they are appropriate to
>the application without having to learn different interfaces for different
>parsers.

That's one view and a valid one. It shouldn't prevent the other one.

>The ability to request features in OpenSAX allows the application to
>request processor behavior, which is slightly different from assembling a
>suitable parser.  For example, if I have an application that doesn't need
>validation, but I the parser I want to use does validation by default, I
>would like to be able to turn that off.

Right. I didn't suggest that the original question ("which features are
supported") isn't important. What I suggested is that the second question
("how do I find a filter/parser which does X") is also important.

If it weren't, why do we have a ParserFactory class in SAX?

BTW, I'm not happy with this "parser" fixation. SAX is an interface which
allows processing an XML tree. I don't see why the special case ("input:
text; output: SAX events") is any different than "input: DOM; output: SAX
events", for example. That's why "org.xml.sax.parser" is just another
"feature" in the API I suggested. "org.xml.sax.visitor" and
"org.xml.sax.builder" would be on an equal footing. IMVHO, converting DOM to
SAX and back is something which we will have to deal with.
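
As an illustration of the "input: DOM; output: SAX events" case, the sketch
below walks a DOM tree and fires the corresponding SAX callbacks. It is only
a sketch under stated assumptions: the DomToSax class name is made up, it
uses the SAX2 ContentHandler interface (which postdates this discussion)
rather than the SAX 1.0 DocumentHandler, and it ignores namespaces,
comments and processing instructions.

```java
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical "visitor": replays a DOM tree as a stream of SAX events. */
class DomToSax {

    /** Emits SAX events for the given node and everything below it. */
    static void walk(Node node, ContentHandler handler) throws SAXException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                handler.startDocument();
                walkChildren(node, handler);
                handler.endDocument();
                break;
            }
            case Node.ELEMENT_NODE: {
                // Copy DOM attributes into a SAX attribute list (no namespaces).
                AttributesImpl attrs = new AttributesImpl();
                NamedNodeMap map = node.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    Node a = map.item(i);
                    attrs.addAttribute("", "", a.getNodeName(), "CDATA", a.getNodeValue());
                }
                handler.startElement("", "", node.getNodeName(), attrs);
                walkChildren(node, handler);
                handler.endElement("", "", node.getNodeName());
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                char[] chars = node.getNodeValue().toCharArray();
                handler.characters(chars, 0, chars.length);
                break;
            }
            default:
                // Comments, processing instructions, etc. are skipped in this sketch.
                break;
        }
    }

    private static void walkChildren(Node parent, ContentHandler handler) throws SAXException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, handler);
        }
    }
}
```

Any code written against the SAX handler interface can consume the output
of walk() without knowing whether the events came from parsing text or from
an in-memory tree, which is the point of treating "parser" as just one
feature among several.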

>Just to be clear, I'm not necessarily against assembling processors based
>on a feature set.  I just believe that it is far more complex than it
>appears at first glance and am not convinced that it's worth the trouble.

I think I've answered the complexity issue - the API I've suggested is
anything but. It merely provides the basic building blocks. The application
may be as complex or as simple as you want.

Have fun,

    Oren Ben-Kiki
