SAX: Parser Interface -- Summary of Change Requests

Tyler Baker tyler at
Sun Feb 1 22:36:19 GMT 1998

Tyler Baker wrote:

> >    I can see the convenience of this method, and I plan to add
> >    something like this to AElfred when I have a chance.  For SAX,
> >    however -- which is meant to end up as a language- and
> >    system-independent API -- I am reluctant to hardcode assumptions
> >    about storage (and I don't know enough about IDL to know if there
> >    is a general representation for streams).  Paul Pazandak has also
> >    suggested allowing strings and buffers -- in this case, they would
> >    already be decoded into characters.
> Another idea (as far as implementation goes) is to have the parser simply be an
> extension of which takes one or more Handler
> interfaces as arguments (to delegate to), so that you can handle very large
> streams of data.  In addition to overriding the necessary
> methods, you can also have methods like readDocument(),
> readElement(), etc.  This would give people a lot more control over reading in
> XML.  This approach of course is similar to how URL Content in the
> package handles content.  But where I see this approach being most useful is in
> transactions where you might only want to read in a limited amount of data
> anyway and process only that, or else in the case where XML content is always at
> a fixed length (like in databases where you get null padding for string fields
> which do not take up the assigned length).  With the current SAX implementation,
> you have no real control at the IO level where it would help to skip content if
> the application feels it is necessary.
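
To make the fixed-length case concrete, here is a minimal sketch in Java. FixedLengthXmlReader and readRecord are hypothetical names used only for illustration and are not part of SAX. It assumes each record occupies exactly fieldLength bytes, is UTF-8 encoded, and is padded on the right with null bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: pulls one null-padded, fixed-length XML record off a stream. */
class FixedLengthXmlReader {
    static String readRecord(InputStream in, int fieldLength) throws IOException {
        byte[] field = new byte[fieldLength];
        // readFully blocks until the whole field is read, or throws EOFException.
        new DataInputStream(in).readFully(field);
        int end = field.length;
        // Drop the trailing null padding so only the XML text remains.
        while (end > 0 && field[end - 1] == 0) {
            end--;
        }
        return new String(field, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16]; // Java zero-fills the array, which gives the padding
        byte[] xml = "<a/>".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(xml, 0, data, 0, xml.length);
        System.out.println(readRecord(new ByteArrayInputStream(data), 16));
    }
}
```

Because the caller decides how many bytes to consume, the rest of the stream stays untouched and the application can skip or process the next record as it sees fit.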

One last thing I wanted to add to this which would be nice is if you had the Parser
be an extension of or, would be for
being able to simply take a compressed XML file and unpack it all in one line of code.

For example, you could create it all like this:

XMLInputStream xis = new XMLInputStream(new CompressedInputStream(in), handler);

where in is any input stream (file, URL, etc.) and handler is one or more
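
A rough sketch of how such a class might look, under some loud assumptions: XMLInputStream here is the hypothetical class proposed in this message, not an existing API; java.util.zip.GZIPInputStream stands in for CompressedInputStream; and the actual parsing is delegated to the JAXP SAXParser with a DefaultHandler, which is only one possible way to implement readDocument().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stream-style parser: an InputStream that pushes events to a handler. */
class XMLInputStream extends FilterInputStream {
    private final DefaultHandler handler;

    XMLInputStream(InputStream in, DefaultHandler handler) {
        super(in);
        this.handler = handler;
    }

    /** Parses the wrapped stream as one document, delegating events to the handler. */
    void readDocument() throws IOException {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException("XML parse failed", e);
        }
    }
}

/** Illustrative handler that records the element names it sees. */
class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }
}

class CompressedXmlDemo {
    /** Builds gzip-compressed test input in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    static List<String> parseCompressed(byte[] gz) throws IOException {
        ElementNameCollector handler = new ElementNameCollector();
        // The "one line" from the message: decompression is just another stream layer.
        try (XMLInputStream xis =
                 new XMLInputStream(new GZIPInputStream(new ByteArrayInputStream(gz)), handler)) {
            xis.readDocument();
        }
        return handler.names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseCompressed(gzip("<doc><title/><body/></doc>")));
    }
}
```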

I feel this is much more flexible, since SAX currently accepts only content that
comes from a resolved URL, and if you are going to have an InputStream argument, you
will need control over how it is handled.  In addition, you might want to register
the handler right before actually handling the content.  For example, if you get a
systemID or publicID of some type (this would currently occur with a doctype event in
SAX), you would then want to register a particular document handler for that type
(which could be done nicely with a dynamic class loading mechanism).  In this case,
you might have a static method in the XMLInputStream class which acts as a registry
for handlers of various document types.  The registry could be nothing more complex
than a hashtable of class names indexed by systemID or publicID.  It could be just for
documents, or it could be more complex, with a federated namespace of handlers for
Personally I would much rather write code that looks like this:

// Done when I initialize the program
java.util.Properties handlers = new java.util.Properties();
try {
  handlers.load(new FileInputStream("foo.txt"));
} catch (IOException e) {
  // Could not read the handler registry file
}

// Then later do this
URL fooURL = new URL("");
XMLInputStream xis = new XMLInputStream(fooURL.openStream());
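
How the handler-less form might choose its handler at the doctype event can be sketched with the LexicalHandler extension of later SAX versions, whose startDTD callback reports the public and system identifiers. This is an assumption about one possible implementation, not something the SAX draft under discussion provides. DoctypeDispatcher, RecordingBarHandler and the public identifier are made up for the example, and the entity resolver returns an empty stream so that the external DTD is never fetched.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.DefaultHandler;

/** Picks a delegate when the DOCTYPE is reported, then forwards element events to it. */
class DoctypeDispatcher extends DefaultHandler2 {
    private final Map<String, Supplier<ContentHandler>> registry;
    private ContentHandler delegate = new DefaultHandler(); // default handling

    DoctypeDispatcher(Map<String, Supplier<ContentHandler>> registry) {
        this.registry = registry;
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        Supplier<ContentHandler> factory = (publicId == null) ? null : registry.get(publicId);
        if (factory != null) {
            delegate = factory.get(); // registered just before the content arrives
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        delegate.startElement(uri, localName, qName, atts);
    }

    ContentHandler delegate() {
        return delegate;
    }
}

/** Illustrative handler for the made-up "bar" document type. */
class RecordingBarHandler extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.add(qName);
    }
}

class DoctypeDispatchDemo {
    static ContentHandler parse(String xml, Map<String, Supplier<ContentHandler>> registry)
            throws IOException, SAXException, ParserConfigurationException {
        DoctypeDispatcher dispatcher = new DoctypeDispatcher(registry);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(dispatcher);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", dispatcher);
        // Supply an empty external subset instead of fetching the DTD.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        reader.parse(new InputSource(new StringReader(xml)));
        return dispatcher.delegate();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Supplier<ContentHandler>> registry = new HashMap<>();
        registry.put("-//Example//DTD Bar//EN", RecordingBarHandler::new);
        String xml = "<!DOCTYPE bar PUBLIC \"-//Example//DTD Bar//EN\" \"bar.dtd\"><bar><item/></bar>";
        System.out.println(parse(xml, registry).getClass().getName());
    }
}
```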

Or, if you don't use any registry for document handlers, you could simply do something
like this:

DocumentHandler bdh = new BarDocumentHandler();
// Assumes bar.xml is a document type "bdh" can handle
URL fooURL = new URL("");
XMLInputStream xis = new XMLInputStream(fooURL.openStream(), bdh);

Once you have the "xis" reference, then just call methods like "readDocument(Document
document)" which would read the document data into a Document object (Document would
be an interface).

Document document = new MSWord90Document();
try {
  xis.readDocument(document);
} catch (IOException e) {
  // Handle the read failure
}
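
For illustration only, readDocument(Document) could be built as a thin adapter from parser events to the Document interface. Everything named here is hypothetical: the two-method Document interface, OutlineDocument (a stand-in for something like MSWord90Document) and DocumentReader. Parsing is again delegated to the JAXP SAXParser as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical target interface; a real one would likely be richer. */
interface Document {
    void element(String name);
    void text(String text);
}

/** Simple Document that keeps a flat outline of what was read. */
class OutlineDocument implements Document {
    final List<String> entries = new ArrayList<>();

    public void element(String name) { entries.add("<" + name + ">"); }
    public void text(String text) { entries.add(text); }
}

class DocumentReader {
    /** Reads the stream as XML and fills the given Document. */
    static void readDocument(InputStream in, Document document) throws IOException {
        StringBuilder pending = new StringBuilder();
        DefaultHandler adapter = new DefaultHandler() {
            // Text can arrive in several chunks, so buffer it until a tag boundary.
            private void flush() {
                if (pending.length() > 0) {
                    document.text(pending.toString());
                    pending.setLength(0);
                }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flush();
                document.element(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                pending.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, adapter);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException("XML parse failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        OutlineDocument doc = new OutlineDocument();
        byte[] xml = "<memo><to>Ann</to></memo>".getBytes(StandardCharsets.UTF_8);
        readDocument(new ByteArrayInputStream(xml), doc);
        System.out.println(doc.entries);
    }
}
```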

Personally I prefer the registry idea, so the application would know ahead of time
what to do for any XML file (handle it or else do some default handling).

Just some ideas before v1.0 of SAX is set in stone...

