Announcement: SAX Java Implementation (pre-release)

James Clark jjc at
Sat Apr 11 12:51:20 BST 1998

David Megginson wrote:

> I have put together a new, beta version of SAX with quite a few
> changes

This looks good.  I have some nits:

1. Why has a SAX prefix been added to all classes?

2. For consistency with SAXException, in SAXLocator getSystemId should
return null if no system id is available, and getLineNumber,
getColumnNumber should similarly return -1 if no line or column
number is available.
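
A minimal sketch of the convention proposed in point 2, assuming a simplified locator. DemoLocator, LocatorDemo, of and describe are hypothetical names for illustration; only the three accessor names come from the message above.

```java
// Hypothetical interface; only the accessor names come from the message.
interface DemoLocator {
    String getSystemId();   // null when no system id is available
    int getLineNumber();    // -1 when no line number is available
    int getColumnNumber();  // -1 when no column number is available
}

final class LocatorDemo {
    // Builds a fixed locator for the demonstration.
    static DemoLocator of(final String systemId, final int line, final int column) {
        return new DemoLocator() {
            public String getSystemId() { return systemId; }
            public int getLineNumber() { return line; }
            public int getColumnNumber() { return column; }
        };
    }

    // Formats a location, leaving out whatever the parser could not supply.
    static String describe(DemoLocator loc) {
        StringBuilder sb = new StringBuilder();
        sb.append(loc.getSystemId() != null ? loc.getSystemId() : "(unknown)");
        if (loc.getLineNumber() != -1) {
            sb.append(':').append(loc.getLineNumber());
            if (loc.getColumnNumber() != -1) {
                sb.append(':').append(loc.getColumnNumber());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(of("doc.xml", 12, -1)));
        System.out.println(describe(of(null, -1, -1)));
    }
}
```

With one sentinel per accessor, a caller can test each piece of location information independently.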

3. The interface for reading character streams needs more
specification if it is to be interoperable.

a) There's a critical ambiguity in the concept of a character stream:
a Java concept of a char does not correspond to the XML concept of a
character. A character outside the BMP is a single XML character but
is represented by a pair of Java chars.  If you want to use the Java
Reader interface, then a character stream must be a stream not of
characters in the XML sense but in the Java sense.  I don't have the
Unicode standard handy, but it has precisely defined terms for these
two different things; I suggest referencing the Unicode standard and
using the appropriate term.
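
The mismatch in 3a can be seen in a small sketch. It assumes a JDK newer than the one current when this message was written, since String.codePointCount was added later; CharCountDemo and its methods are hypothetical names. In later Unicode terminology the two notions are usually called code units and code points.

```java
// Sketch: Java chars (UTF-16 code units) versus XML characters (code points).
final class CharCountDemo {
    // Counts UTF-16 code units, which is what a java.io.Reader delivers.
    static int codeUnits(String s) {
        return s.length();
    }

    // Counts Unicode code points, which is what XML calls characters.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10000 is the first character outside the BMP; in UTF-16 it is
        // written as the surrogate pair D800 DC00.
        String s = "a\uD800\uDC00";
        System.out.println(codeUnits(s) + " Java chars, "
                + codePoints(s) + " XML characters");
    }
}
```

The string holds two XML characters but three Java chars, so a Reader-based interface necessarily deals in the latter.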

b) Is it legal for a byte order mark character to be present at the
start of the character stream? The right answer is that it should not
be legal: this should be stripped out in the byte to character
conversion process.
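
One way the stripping suggested in 3b might look, as a sketch only: the byte-to-character layer wraps the Reader and discards a leading U+FEFF before the parser sees the stream. BomStripper and its methods are hypothetical names and not part of SAX.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class BomStripper {
    // Returns a Reader positioned after a leading U+FEFF, if one is present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        if (first != -1 && first != 0xFEFF) {
            pr.unread(first); // not a byte order mark, so put it back
        }
        return pr;
    }

    // Convenience for the demonstration: strips the mark from a string.
    static String strip(String text) {
        try {
            Reader r = skipBom(new StringReader(text));
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("\uFEFF<doc/>"));
    }
}
```

Only the first char is examined, so a U+FEFF that appears later in the stream is passed through unchanged.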

c) How does this interact with the encoding declaration in the XML
document?  The docs should say that it's legal for the character
stream to include an encoding declaration and it doesn't matter what
encoding it specifies.

4. The doc for SAXDTDHandler should say that the order in which DTD
events are fired is unspecified except that they will all be fired
after startDocument and before startElement.

5. Maybe the name of SAXDTDHandler should be changed to reflect the
fact that it is not attempting to be a complete DTD interface.  Some
future version of SAX might provide optional support for full DTDs and
it would be nice to be able to use the name SAXDTDHandler as the name
for that.

6. I strongly object to including the name argument in
SAXEntityResolver.resolveEntity.  There's nothing in XML that says
that the name should be used in resolving an entity and so there's no
reason to suppose a parser will make it available.  I also think it's
wrong in principle to make use of it.  This business with "[document]"
and "[dtd]" is gross. At the very least the spec should say that name
may be null if this information is not available.

7. Is the first character on the line at column 0 or column 1? (GNU
Emacs says column 0, but others say column 1.)  The docs need to make
this clear.

8. I don't think SAXException.getLocalizedMessage is the right
approach to internationalization.  Although the JDK does have
Throwable.getLocalizedMessage, as far as I can tell nothing uses it
and it's not at all convenient.  It would be better to have a
setLocale(Locale locale) method on SAXParser that specified the locale
in which messages should be returned.  This is the approach that is
used in AWT.  In any case SAXException.getLocalizedMessage is entirely
redundant since SAXException has Throwable as an indirect superclass,
and Throwable includes an identical definition of getLocalizedMessage.
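
The redundancy noted at the end of point 8 can be checked with a short sketch. DemoSaxException is a hypothetical stand-in for SAXException that declares no getLocalizedMessage of its own.

```java
// Hypothetical stand-in for SAXException; it declares no getLocalizedMessage.
class DemoSaxException extends Exception {
    DemoSaxException(String message) {
        super(message);
    }
}

final class LocalizedMessageDemo {
    // Calls the method inherited from Throwable, whose default
    // implementation returns the same string as getMessage().
    static String localized(String message) {
        return new DemoSaxException(message).getLocalizedMessage();
    }

    public static void main(String[] args) {
        System.out.println(localized("unexpected end of file"));
    }
}
```

The subclass gets the method from Throwable, so redeclaring it with the same behaviour adds nothing.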

9. I think SAXHandlerBase.error and SAXHandlerBase.warning should be
no-ops like almost all the other methods.  Having the default be to
print messages on System.err introduces a command-line bias that seems
inappropriate to me.  In addition using a PrintStream (which
System.err is) is irretrievably broken from an internationalization
perspective, as is made clear in the PrintStream docs.
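
A sketch of the no-op default argued for in point 9, with simplified signatures that take a String rather than the exception objects a real handler would receive. DemoHandlerBase, CollectingHandler and HandlerDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every callback is a no-op by default.
class DemoHandlerBase {
    public void warning(String message) { }
    public void error(String message) { }
}

// An application that wants reports overrides the callbacks and routes
// them wherever suits it, for example a list shown later in a dialog.
final class CollectingHandler extends DemoHandlerBase {
    final List<String> reports = new ArrayList<String>();

    @Override
    public void warning(String message) {
        reports.add("warning: " + message);
    }

    @Override
    public void error(String message) {
        reports.add("error: " + message);
    }
}

final class HandlerDemo {
    public static void main(String[] args) {
        CollectingHandler handler = new CollectingHandler();
        handler.warning("attribute declared twice");
        handler.error("element not declared");
        System.out.println(handler.reports);
    }
}
```

A command-line tool can still print the reports itself; the base class simply does not assume a console is there.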

