SAX Level 2 (was 1998-04-20 Pre-Release...)

Ray Cromwell ray at
Thu Apr 23 16:24:15 BST 1998

Hi, I've been a lurker on this list for a while, but I thought
I'd add my two cents.

I understand the need to keep SAX simple, but only because simplicity
speeds early adoption and so helps make it a de facto standard (it is
easy for parser writers to implement). A huge interface like JDBC would
demand an effort that many freeware authors wouldn't embark on. So I
think David has done Java/XML programmers a wonderful service by
organizing the effort.

However, I think there is a need for a ubiquitous, parser-independent
API that gives programmers complete control over their data structures
without any (or with negligible) information loss. Larry Wall gave a
convincing presentation at XML98 on how Perl will support XML, and it
made my mouth water compared with the level of information I'm getting
now. In fact, I built my application around Lark instead of SAX because
I needed access to location offset information.

There is a whole class of applications that cannot be written with SAX:
authoring tools, or any tool that needs two-way manipulation. Another
class of applications needs access to DTD information.
Right now, it seems only IBM's XML for Java supports access to the DTD,
and it does not give location offset information. Parser features look
increasingly likely to diverge, which leaves SAX with two possible
futures:

1) All SAX features must be implemented, à la OpenGL, etc.
2) Some SAX features may be left unimplemented, but an interface is
available to query whether the functionality exists (DirectX, JDBC, etc.)

The first scenario makes it easier to write applications, but is a
disincentive to parser implementors. The second scenario makes it easier
for parser implementors to comply, but forces complicated choices on the
application programmer. (Query the driver to see whether feature X is
enabled; disable every application feature that depends on X; filter out
driver choices that don't support Y; etc.)
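
To make scenario 2 concrete, here is a minimal Java sketch of the
application-side bookkeeping it implies. ParserDriver, hasFeature,
SimpleDriver and DriverFilter are hypothetical names invented for
illustration; nothing like them exists in SAX 1.0, and a real query
interface could look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical capability-query interface (scenario 2); NOT part of SAX 1.0.
interface ParserDriver {
    String name();

    // Returns true if this driver implements the named optional feature.
    boolean hasFeature(String feature);
}

// Toy driver whose supported features are fixed when it is constructed.
class SimpleDriver implements ParserDriver {
    private final String name;
    private final Set<String> features;

    SimpleDriver(String name, Set<String> features) {
        this.name = name;
        this.features = features;
    }

    public String name() { return name; }

    public boolean hasFeature(String feature) { return features.contains(feature); }
}

class DriverFilter {
    // The extra work pushed onto applications: keep only the drivers
    // that support every feature the application depends on.
    static List<ParserDriver> supportingAll(List<ParserDriver> drivers, List<String> required) {
        List<ParserDriver> result = new ArrayList<>();
        for (ParserDriver d : drivers) {
            boolean ok = true;
            for (String f : required) {
                if (!d.hasFeature(f)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                result.add(d);
            }
        }
        return result;
    }
}
```

Every optional feature an application relies on adds another check of
this kind, which is exactly the per-feature logic scenario 1 would make
unnecessary.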

My gut feeling, I have to say, is that scenario 1 is best, for
several reasons.

First, there are always more application authors than parser
implementors. Most of the work should be shifted to the place where it
provides the greatest benefit to the greatest number of people, just as
it is better for the operating system or foundation classes to do work
for programmers and free their time for other things.

Second, parsers are commodity items. They will quickly be built into
operating systems, browsers, and frameworks. I don't believe programmers
will "shop around" for a parser. They will use the one that comes with
their environment, so it is better that it be full-featured and support
a general API.

Third, APIs with undefined or optional behavior lead to wasted
programming logic, since the differences are eventually eradicated
anyway. The implementations that support 100% of the functionality end
up winning, and the partial implementations either become full ones or
disappear. You can see this happening in the 2D and 3D video card
markets right now.

Thus, I think it makes sense to define a powerful API that allows a
whole spectrum of applications to be written. I know some critics are
going to respond, "well, that's not the point of SAX." My gut feeling is
that if XML is going to be foundational, a single, common API will be
critical to enabling a market for XML applications.

Now, either proprietary parsers themselves will become this de facto API
(e.g. Microsoft's, or Sun's if it were to include a parser in the next
JDK), or there is going to be a standard API that everyone adheres to.
Perhaps the question is whether the W3C/IETF should be doing this, or
whether it should be informal.

Since I need this API *now* (actually, yesterday), I'd rather not wait
for the W3C/IETF to define it; I prefer Dave's model of reaching quick
consensus and shipping a beta implementation. Otherwise, I'm either
going to hack the source of someone else's parser or write my own.

Ok, now that I've started a flame war and gotten that off my chest :),
I'd like to nominate the three features I most want in SAX Level 2
(or SAX 2.0), in order of importance.

1) Access to DTD information.
2) Comments, CDATA, and location information for attributes.
3) sax.util classes that take an ElementFactory (which returns DOM
interfaces) and build a tree (maybe Don Park would like to contribute
this). IBM's XML for Java is a starting point, but it has a fatal flaw:
its ElementFactory returns not DOM interfaces (such as Element or PI)
but IBM base classes, like TXElement or TXPI, which means you are forced
to inherit from TXElement instead of just implementing Element.
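
A rough Java sketch of item 3, under stated assumptions: the factory's
methods return DOM interface types, so the tree builder never names a
vendor base class and any class implementing Element could be plugged
in. The method names, DomElementFactory and TreeBuilder are
hypothetical; the TreeBuilder methods are simplified stand-ins for the
SAX DocumentHandler callbacks (attributes omitted), and the default
factory assumes a DOM implementation reachable through
javax.xml.parsers.

```java
import java.util.ArrayDeque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical factory: every method returns a DOM *interface*, not a vendor class.
interface ElementFactory {
    Element createElement(String tagName);

    Text createText(String data);
}

// One possible implementation, delegating to a standard DOM Document (an assumption).
class DomElementFactory implements ElementFactory {
    private final Document doc;

    DomElementFactory() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public Element createElement(String tagName) { return doc.createElement(tagName); }

    public Text createText(String data) { return doc.createTextNode(data); }
}

// Hypothetical tree builder; it only ever sees the ElementFactory interface.
// Assumes events arrive in well-formed order (no text outside the root).
class TreeBuilder {
    private final ElementFactory factory;
    private final ArrayDeque<Element> stack = new ArrayDeque<>();
    private Element root;

    TreeBuilder(ElementFactory factory) { this.factory = factory; }

    void startElement(String name) {
        Element e = factory.createElement(name);
        if (stack.isEmpty()) {
            root = e;                     // first element becomes the root
        } else {
            stack.peek().appendChild(e);  // attach to the current parent
        }
        stack.push(e);
    }

    void characters(String text) {
        stack.peek().appendChild(factory.createText(text));
    }

    void endElement() { stack.pop(); }

    Element root() { return root; }
}
```

An application wanting its own node classes would supply a different
ElementFactory; TreeBuilder itself would not change.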

Ok, flame away!


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
List coordinator, Henry Rzepa (mailto:rzepa at