SAX Future

Lars Marius Garshol larsga at ifi.uio.no
Fri Nov 13 17:26:34 GMT 1998


I think the main reasons SAX has become as popular as it has (it has
even been implemented by Oracle) are that it is very simple to add SAX
support to a parser, that it arrived early, and that it has remained
stable.

Any attempt at making a SAX 2.0 (or 1.0.1) puts the last two of these
reasons in immediate jeopardy, and also creates the potential for
endless compatibility problems. On reflection, I think we should be
very careful in developing a new version of SAX, and should attempt to
break as little as possible when/if doing so.

Some of the things that have been suggested can, I think, be safely
added without jeopardizing compatibility, since they are additions
above the driver level:

 - Parser filters
 - Driver management
 - More library functionality

I support all these and think they would be very useful. Some ideas
for how these could be done might be taken from the Python SAX
extensions, which are described at:

<URL:http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxexts-spec.html>
<URL:http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxutils-doco.html>


However, some of the other suggestions are a bit more troublesome:

 - A handler for basic lexical information
 - A handler for DTD information
 - The option for parsers not to break up PCDATA
 - Depending on the implementation, driver querying might also end up
   on this list

I think the lexical handler would be very useful, if only because it
provides information about the DOCTYPE declaration. I'm not so sure
about the handler for DTD information, which would most likely be much
harder to develop and also rather difficult to use in any sensible way.

As for the PCDATA option I agree with David in preferring a
ParserFilter for this purpose.
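
To illustrate the filter approach, here is a minimal sketch in
present-day Python. It assumes the standard library's xml.sax module
(a later SAX 2 style API, not the saxexts package discussed in this
post), and CoalescingFilter, TextCollector and collect_text are names
made up for the example. The filter sits between the driver and the
application and merges adjacent character events into one call.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase


class CoalescingFilter(XMLFilterBase):
    """Hypothetical filter: buffer characters() events and forward them
    as a single call before the next element or PI event."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._buffer = []

    def _flush(self):
        # Forward any buffered text downstream as one characters() event.
        if self._buffer:
            super().characters("".join(self._buffer))
            self._buffer = []

    def characters(self, content):
        # Hold the text back instead of forwarding it immediately.
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class TextCollector(xml.sax.ContentHandler):
    """Application handler that records each characters() call it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def collect_text(xml_text):
    """Parse xml_text through the filter and return the text chunks."""
    collector = TextCollector()
    reader = CoalescingFilter(xml.sax.make_parser())
    reader.setContentHandler(collector)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return collector.chunks
```

The driver itself is untouched, which is the point: an application
that wants unbroken PCDATA wraps the parser, and everyone else pays
nothing. The sketch only handles the non-namespace element events;
a fuller filter would flush in the other callbacks as well.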

In spite of the potential problems I do support the driver querying
and management part. It's been very useful to us in Python, and now
that I'm doing some work in Java as well I wish I had it there too.

Being able to say

  saxexts.XMLValParserFactory.make_parser()

and knowing that you'll either get a validating parser, if one is
installed, or else an exception, is really convenient.
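
A rough sketch of how such a factory can work, under some assumptions:
this is not the saxexts implementation, ParserFactory and
NoParserAvailable are hypothetical names, and each driver is assumed
to be a module exposing a create_parser() function (as the standard
library's xml.sax.expatreader does today). The factory tries each
candidate driver in order and raises if none can be loaded.

```python
import importlib


class NoParserAvailable(Exception):
    """Hypothetical error: none of the candidate drivers could be loaded."""


class ParserFactory:
    """Hypothetical factory that returns the first installed driver."""

    def __init__(self, driver_modules):
        # Module names of candidate drivers, in order of preference.
        self.driver_modules = list(driver_modules)

    def make_parser(self):
        for module_name in self.driver_modules:
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                continue  # driver not installed; try the next one
            create = getattr(module, "create_parser", None)
            if create is not None:
                return create()
        raise NoParserAvailable(
            "no driver available among: " + ", ".join(self.driver_modules)
        )
```

A factory for validating parsers would simply be an instance whose
list names only validating drivers, so the caller gets one of those
or an exception, never a silent fallback to a non-validating parser.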


Finally, there is the question of backwards compatibility. As far as I
can see, the extensions I labelled 'safe' can be done with no harm to
compatibility. The two new handlers are slightly trickier, but might
be incorporated in the same way that SAX for Python handled its
extensions: as a separate sub-interface of Parser called ExtendedParser.

One could then add an implementation of ExtendedParser that wrapped
SAX 1.0 parsers and threw SAXUnsupportedExceptions when attempts were
made to call the new methods. This would avoid the need for a new
package name entirely.
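
The wrapping idea might look roughly like the sketch below. It is
written in Python for brevity and everything in it is an assumption
made for illustration: Parser is a two-method stand-in for the SAX 1.0
interface, and setLexicalHandler, setDTDInfoHandler, Sax1Adapter and
the SAXUnsupportedException class are hypothetical names, not part of
any released SAX.

```python
class SAXUnsupportedException(Exception):
    """Hypothetical exception for features a driver cannot provide."""


class Parser:
    """Stand-in for the SAX 1.0 Parser interface (two methods shown)."""

    def setDocumentHandler(self, handler):
        raise NotImplementedError

    def parse(self, system_id):
        raise NotImplementedError


class ExtendedParser(Parser):
    """Hypothetical sub-interface that adds the two new handlers."""

    def setLexicalHandler(self, handler):
        raise NotImplementedError

    def setDTDInfoHandler(self, handler):
        raise NotImplementedError


class Sax1Adapter(ExtendedParser):
    """Present an unmodified SAX 1.0 parser as an ExtendedParser."""

    def __init__(self, parser):
        self._parser = parser

    # SAX 1.0 methods are delegated to the wrapped parser unchanged.
    def setDocumentHandler(self, handler):
        self._parser.setDocumentHandler(handler)

    def parse(self, system_id):
        self._parser.parse(system_id)

    # The new methods cannot be honoured by a SAX 1.0 driver.
    def setLexicalHandler(self, handler):
        raise SAXUnsupportedException("lexical events are not available")

    def setDTDInfoHandler(self, handler):
        raise SAXUnsupportedException("DTD events are not available")
```

Applications written against ExtendedParser then run on old drivers
and find out at the point of the call, rather than at link time,
that a feature is missing.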

The driver querying methods might get a separate interface, or they
might be added to ExtendedParser, as in Python.


These are my 0.02 NOK,
--Lars M.

