A Proposal for Refactoring SAX

andyclar at us.ibm.com
Tue May 25 19:11:18 BST 1999



With the release of the SAX 1.0 API, and while watching the
development of SAX 2.0, we have seen a need to refactor
the existing API so that other XML parsers can share its
useful functionality. Parsers that produce other output,
such as a DOM tree or JavaBeans, would benefit from a
standard way of parsing documents, resolving entities, and
handling errors. In addition, refactoring the APIs can make
it easier to accommodate the new 2.0 APIs.

The SAX design is simple and practical, which makes it the
most sensible starting point for bringing the multiple
parser worlds together. SAX contains pieces that are
useful to all parsers, whether or not they provide a
streaming API.

This proposal details how we would refactor SAX but does
not include the new SAX 2.0 APIs. We assume that any new
API can be added once the interfaces and classes are
reorganized. The full set of existing APIs is also omitted
because this message is already long.

SECTION 1: Refactoring SAX 1.0

We have separated the SAX 1.0 interfaces and classes into
two groups: those that are useful to all parsers, and
those that are specific to stream-based parsers.

  General Purpose Interfaces and Classes (before)
  ======================================

  org.xml.sax.EntityResolver
  org.xml.sax.ErrorHandler
  org.xml.sax.InputSource
  org.xml.sax.Locator
  org.xml.sax.Parser
  org.xml.sax.SAXException
  org.xml.sax.SAXParseException
  org.xml.sax.helpers.LocatorImpl

  Stream Specific Interfaces and Classes (before)
  ======================================

  org.xml.sax.AttributeList
  org.xml.sax.DocumentHandler
  org.xml.sax.DTDHandler
  org.xml.sax.HandlerBase
  org.xml.sax.helpers.AttributeListImpl
  org.xml.sax.helpers.ParserFactory

Refactoring these classes could lead to the following
arrangement. These are the interfaces and classes that we
found most useful to the various parser communities while
writing the IBM XML4J parser.

  General Purpose Interfaces and Classes (after)
  ======================================

  org.xml.EntityResolver
  org.xml.ErrorHandler
  org.xml.InputSource
  org.xml.Locator
  org.xml.Parser
 *org.xml.XMLException
 *org.xml.XMLParseException
  org.xml.helpers.LocatorImpl
 +org.xml.helpers.ParserFactory

  Stream Specific Interfaces and Classes (after)
  ======================================

  org.xml.sax.AttributeList
  org.xml.sax.DocumentHandler
  org.xml.sax.DTDHandler
  org.xml.sax.HandlerBase
 *org.xml.sax.SAXParser
  org.xml.sax.helpers.AttributeListImpl
 *org.xml.sax.helpers.SAXParserFactory

Refactoring in this way also allows DOM-based parsers to
share much of the same API as SAX parsers. The following
list details the additional interfaces and classes specific
to DOM.

  DOM Specific Interfaces and Classes (after)
  ===================================

 +org.xml.dom.DOMParser
 +org.xml.dom.helpers.DOMParserFactory

The interfaces and classes marked with a plus (+) are new,
and those marked with an asterisk (*) are renamed to make
them more general purpose or to remove ambiguity in the name.
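One way the general-purpose org.xml.helpers.ParserFactory might relate to the typed SAXParserFactory and DOMParserFactory is sketched below in Java. This is an illustration under stated assumptions, not part of the proposal: the nested Parser, SAXParser, and DOMParser interfaces stand in for the proposed packages, and the generic makeParser(String, Class) signature (whose name echoes the SAX 1.0 ParserFactory), makeSAXParser, makeDOMParser, and DemoSAXParser are hypothetical. The general factory instantiates any class by name through reflection and checks that the result implements the requested interface; the typed factories only fix which interface that is.

```java
// Hypothetical sketch: nested types stand in for the proposed packages
// (org.xml, org.xml.sax, org.xml.dom); none of this is a published API.
class ParserFactorySketch {

    // Stand-ins for org.xml.Parser, org.xml.sax.SAXParser, org.xml.dom.DOMParser.
    interface Parser {}
    interface SAXParser extends Parser {}
    interface DOMParser extends Parser {}

    // A trivial SAX-style parser used only to exercise the factories.
    public static class DemoSAXParser implements SAXParser {
        public DemoSAXParser() {}
    }

    // Stand-in for org.xml.helpers.ParserFactory: creates any parser by class
    // name and verifies that it implements the requested interface.
    static <T extends Parser> T makeParser(String className, Class<T> type)
            throws ReflectiveOperationException {
        Object instance = Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        if (!type.isInstance(instance)) {
            throw new ClassCastException(
                    className + " does not implement " + type.getName());
        }
        return type.cast(instance);
    }

    // Stand-ins for the typed SAXParserFactory and DOMParserFactory helpers:
    // they delegate to the general factory and only fix the expected interface.
    static SAXParser makeSAXParser(String className)
            throws ReflectiveOperationException {
        return makeParser(className, SAXParser.class);
    }

    static DOMParser makeDOMParser(String className)
            throws ReflectiveOperationException {
        return makeParser(className, DOMParser.class);
    }

    public static void main(String[] args) throws Exception {
        String name = DemoSAXParser.class.getName();
        SAXParser sax = makeSAXParser(name);
        System.out.println(sax.getClass().getSimpleName()); // DemoSAXParser
        try {
            makeDOMParser(name); // a SAX parser is not a DOMParser
        } catch (ClassCastException e) {
            System.out.println("rejected: not a DOMParser");
        }
    }
}
```

Keeping the reflection in one general-purpose helper means each parser family only adds a thin, typed wrapper rather than its own class-loading logic.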

Modifications would have to be made to account for the
movement of interfaces and classes from one package to
another. The most obvious change would be that the general
purpose methods of org.xml.sax.Parser would move to
org.xml.Parser, with the remaining methods retained in the
org.xml.sax.SAXParser interface.
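The method split described above might look like the following Java sketch. It assumes the SAX 1.0 org.xml.sax.Parser methods divide as the proposal suggests: setLocale, setEntityResolver, setErrorHandler, and both parse methods become general purpose, while setDocumentHandler and setDTDHandler stay with the SAX parser. Nested interfaces stand in for org.xml.Parser, org.xml.sax.SAXParser, and org.xml.dom.DOMParser; the existing SAXException stands in for the proposed XMLException; and DOMParser.getDocument, BaseParser, the demo classes, and configure are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.Locale;
import org.w3c.dom.Document;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of splitting the SAX 1.0 Parser interface.
// SAXException stands in for the proposed org.xml.XMLException.
@SuppressWarnings("deprecation") // DocumentHandler is deprecated in today's JDK
class ParserSplitSketch {

    // General-purpose methods (stand-in for org.xml.Parser).
    interface Parser {
        void setLocale(Locale locale) throws SAXException;
        void setEntityResolver(EntityResolver resolver);
        void setErrorHandler(ErrorHandler handler);
        void parse(InputSource source) throws SAXException, IOException;
        void parse(String systemId) throws SAXException, IOException;
    }

    // Stream-specific callbacks stay with the SAX parser.
    interface SAXParser extends Parser {
        void setDocumentHandler(DocumentHandler handler);
        void setDTDHandler(DTDHandler handler);
    }

    // Hypothetical DOM counterpart: the result is a tree, not callbacks.
    interface DOMParser extends Parser {
        Document getDocument();
    }

    // Shared bookkeeping for the general-purpose settings.
    static abstract class BaseParser implements Parser {
        Locale locale = Locale.getDefault();
        EntityResolver entityResolver;
        ErrorHandler errorHandler;

        public void setLocale(Locale locale) { this.locale = locale; }
        public void setEntityResolver(EntityResolver r) { entityResolver = r; }
        public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
        public void parse(String systemId) throws SAXException, IOException {
            parse(new InputSource(systemId));
        }
    }

    // Demo parsers that record their configuration but parse nothing.
    static class DemoSAXParser extends BaseParser implements SAXParser {
        DocumentHandler documentHandler;
        DTDHandler dtdHandler;
        public void setDocumentHandler(DocumentHandler h) { documentHandler = h; }
        public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
        public void parse(InputSource source) { /* no-op in this sketch */ }
    }

    static class DemoDOMParser extends BaseParser implements DOMParser {
        public Document getDocument() { return null; } // no tree is built here
        public void parse(InputSource source) { /* no-op in this sketch */ }
    }

    // Written only against the general-purpose interface.
    static void configure(Parser parser, EntityResolver resolver, ErrorHandler handler) {
        parser.setEntityResolver(resolver);
        parser.setErrorHandler(handler);
    }

    public static void main(String[] args) {
        DefaultHandler handler = new DefaultHandler(); // an EntityResolver and an ErrorHandler
        DemoSAXParser sax = new DemoSAXParser();
        DemoDOMParser dom = new DemoDOMParser();
        configure(sax, handler, handler);
        configure(dom, handler, handler);
        System.out.println(sax.errorHandler == handler && dom.errorHandler == handler); // true
    }
}
```

Because configure depends only on the general-purpose interface, error handling and entity resolution can be set up identically for SAX and DOM parsers.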

SECTION 2: How to Not Break SAX 1.0 Compatibility

Unfortunately, keeping the org.xml.sax package name for the
refactored interfaces and classes, as detailed in Section 1,
will not work well: a very large number of parsers, and
applications using those parsers, are already coded against
the existing SAX 1.0 interfaces and classes. I would vote to
replace the old classes with the new ones, but legacy is a
powerful motivational force.

A simple solution for not breaking SAX 1.0 compatibility
would be to move the new org.xml.sax interfaces and classes
into a package named org.xml.sax2. We discussed making the
new structure work on top of the old interfaces and classes
without modification, but we decided against it: simple is
better in most cases.

SECTION 3: Incorporating SAX2 Work

Incorporating the ongoing SAX2 work into the refactored SAX
interfaces and classes would be simple, because backward
compatibility would be handled by the separate package name
described in Section 2. Code written against the old SAX
parser and classes would remain unchanged, and users who want
to upgrade and use the new SAX functionality would make the
same changes they would have had to make anyway.

--
Andy Clark * IBM, JTC - Silicon Valley * andyclar at us.ibm.com



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1