Announcement: SAX 1.0gamma

Sun May 3 15:13:49 BST 1998

James Clark writes:

 > I did a first pass at implementing a SAX 1.0gamma driver for XP today.
 > 
 > Some nits:
 > 
 > It should be specified whether a byte order mark at the beginning
 > of an XML byte stream is included as part of the character stream.
 > I don't think it should be since the byte-order mark isn't included
 > in the XML document production, and the XML spec says explicitly that
 > the byte order mark "is an encoding signature, not part of either
 > the markup or the character data of the XML document".

My first hunch is the opposite: the XML productions deal with
characters, not bytes.  When I provide a raw byte stream
(java.io.InputStream), I'm requiring the XML parser to take on two
logical tasks:

1) convert the bytes to characters

2) apply the XML productions to the characters

You have already mentioned that, unlike many XML parsers (including
AElfred), XP does not perform these as independent, serial steps;
conceptually, however, the tasks are still distinct.  The BOM is part
of the raw byte stream, but not part of the character stream.

I think that it also simplifies the Java implementation if the parser
can behave the same way with an InputStream from a URLConnection and
an InputStream supplied explicitly by an application.
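
To make the two-step view concrete, here is a minimal Java sketch of
step 1 alone: it turns a raw InputStream into a Reader and consumes a
leading byte order mark, so the BOM never reaches the character
stream.  The class name BomSkipper and the method open() are
hypothetical and not part of SAX, XP or AElfred.  The sketch only
recognises the UTF-8 and UTF-16 signatures and assumes UTF-8 when no
BOM is found; it does not attempt the full encoding detection
described in the XML spec.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SAX or of any parser named above.
final class BomSkipper {

    /** Step 1 only: bytes to characters, consuming a byte order mark if present. */
    static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before three bytes arrived
            }
            n += r;
        }

        Charset charset = StandardCharsets.UTF_8; // assumed default when no BOM is found
        int bomLength = 0;
        if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
            bomLength = 3; // UTF-8 signature
        } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back everything after the BOM so that step 2 still sees it.
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(in, charset);
    }

    /** Reads a byte as an unsigned value so it can be compared with 0xEF etc. */
    private static int u(byte b) {
        return b & 0xFF;
    }
}
```

Step 2 would then apply the XML productions to the returned Reader,
and the same code path would serve a stream from a URLConnection and
one handed over by the application.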

 > How are relative system identifiers supposed to be handled in
 > DTDHandler?  Suppose I have a DTD with a system id of dir/foo.dtd,
 > which declares an unparsed entity with a system id of foo.eps
 > (which refers to dir/foo.eps). If the systemId argument to
 > DTDHandler.unparsedEntityDecl is foo.eps, then the application is
 > going to have problems.  There's a similar issue with
 > EntityResolver.resolveEntity.

This does seem to be a serious problem.  One solution is to require
the parser to fully resolve system identifiers before reporting them
(as AElfred already does).  This approach will work well with URLs,
but may break for other URI schemes.
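
As an illustration of what "fully resolve" could mean in practice,
here is a small Java sketch using java.net.URI.  The class name
SystemIdResolver and the example.com addresses are hypothetical; the
sketch assumes that both identifiers are syntactically valid URIs,
which is exactly the assumption that may not hold for other schemes.

```java
import java.net.URI;

// Hypothetical helper: resolves a system identifier before it is reported.
final class SystemIdResolver {

    /**
     * Resolves systemId against the system identifier of the entity in
     * which it was declared.  Absolute identifiers pass through unchanged.
     */
    static String resolve(String baseSystemId, String systemId) {
        return URI.create(baseSystemId).resolve(systemId).toString();
    }
}
```

With a DTD at http://example.com/dir/foo.dtd, the call
SystemIdResolver.resolve("http://example.com/dir/foo.dtd", "foo.eps")
yields http://example.com/dir/foo.eps, which is what the application
would then receive in unparsedEntityDecl.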

Any other solutions?

 > Parser.parse should be allowed to throw IOException in addition to
 > SAXException.  Since InputSource includes a Reader and InputStream, and 
 > methods on Reader and InputStream throw IOException, parse needs to
 > throw IOException.  It's ridiculous to require the parser to wrap an
 > IOException in a SAXException when you know that the parser needs to
 > throw IOException.

This suggestion sounds quite reasonable.  Any objections?
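
The difference between the two signatures can be sketched as follows.
Both methods are hypothetical stand-ins for Parser.parse that merely
drain the input; only org.xml.sax.SAXException is the real SAX class.
Variant A shows the wrapping that the current draft forces on the
parser, and variant B shows the proposed signature.

```java
import java.io.IOException;
import java.io.InputStream;
import org.xml.sax.SAXException;

// Hypothetical stand-ins for Parser.parse; they read the input and do nothing else.
final class ParseSignatures {

    /** Variant A: only SAXException is declared, so I/O failures must be wrapped. */
    static void parseWrapping(InputStream in) throws SAXException {
        try {
            while (in.read() != -1) {
                // a real parser would tokenize here
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    /** Variant B: IOException is declared too and reaches the caller untouched. */
    static void parseDirect(InputStream in) throws SAXException, IOException {
        while (in.read() != -1) {
            // a real parser would tokenize here
        }
    }
}
```

With variant A the caller has to call getException() on the
SAXException to find out that the real cause was an I/O failure; with
variant B an ordinary catch (IOException e) clause is enough.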

 > There's nothing in the XML spec that says parsers have to make
 > attribute types available. So I think the doc for
 > AttributeList.getType should say that CDATA may be returned not
 > only if the parser has not read the declaration, but also if the
 > parser does not make this information available (alternatively it
 > could return null in this case).

I don't recall anything in the spec that requires a parser to report
the start and end of elements, or even character data other than
whitespace -- this area needs attention from the XML WG.

That said, I think that this documentation change would be useful.
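
From the application's side, the proposed wording means that "CDATA"
can stand for "declared as CDATA", "declaration not read" or "parser
does not report types".  A minimal Java sketch of how an application
might cope is shown here; the class AttributeTypes and its methods are
hypothetical, and the null case covers the alternative convention
mentioned in the quoted text rather than anything SAX specifies.

```java
// Hypothetical application-side helper for values returned by AttributeList.getType.
final class AttributeTypes {

    /** Folds the "no information" cases into CDATA. */
    static String effectiveType(String reportedType) {
        // null is the alternative convention floated above, not current SAX behaviour
        return reportedType == null ? "CDATA" : reportedType;
    }

    /** True only when the parser positively reported something other than CDATA. */
    static boolean carriesTypeInfo(String reportedType) {
        return !"CDATA".equals(effectiveType(reportedType));
    }
}
```

An application would then rely on a type such as ID or NMTOKEN only
when carriesTypeInfo() returns true, and treat every other attribute
as plain character data.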

 > Supporting changing of Locale in the middle of a parse would
 > require me to redesign my native interface in a way I consider very
 > undesirable, and I don't see any need for this, so I don't plan to
 > support this.  The basic issue is that my counterpart of Parser is
 > reentrant unlike SAX's and I want to keep it that way. I think
 > changing of Locale mid-parse is going to be difficult for anyone
 > with a similar style of interface.

I doubt that anyone else is supporting Locale at all right now, so
this change should not cause any trouble.  I have no objection to
making it.

 > It's probably too late for this, but I'm having problems seeing the
 > logic in the exception handling design.  The design seems to make
 > things inconvenient for both users and implementors: implementors
 > have to wrap SAXException in order to pass it up through their
 > parsers, and in handler methods users have to wrap their exceptions
 > in SAXException.

The former problem exists only when SAX support comes through a
separate driver (as, admittedly, will usually be the case).  A new
parser, written from scratch, could include SAXException in its throws
clauses without using a wrapper.

The latter problem is very annoying, but there seems to be no
obviously correct solution.  I received a very, very large number of
objections to my use of java.lang.Exception.  I don't want to
vacillate any more, and have settled on a SAXException wrapper simply
as the (slightly) lesser of two evils.
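
For readers who have not met the wrapper yet, the pattern on the user
side looks roughly like the following Java sketch.  SAXException and
its getException() method are the real SAX API; the class
HandlerExceptions, the InventoryException type and the handlerCallback
method are hypothetical stand-ins for an application's own exception
and for a handler method such as startElement.

```java
import org.xml.sax.SAXException;

// Hypothetical illustration of wrapping and unwrapping with SAXException.
final class HandlerExceptions {

    /** Hypothetical application-specific exception. */
    static final class InventoryException extends Exception {
        InventoryException(String message) {
            super(message);
        }
    }

    /** Stand-in for a handler method, which may only throw SAXException. */
    static void handlerCallback(String elementName) throws SAXException {
        try {
            if (!"item".equals(elementName)) {
                throw new InventoryException("unexpected element: " + elementName);
            }
        } catch (InventoryException e) {
            // The application's exception has to travel inside a SAXException.
            throw new SAXException(e);
        }
    }

    /** Caller side: recovers the embedded exception, or the SAXException itself if none. */
    static Exception unwrap(SAXException e) {
        Exception inner = e.getException();
        return inner != null ? inner : e;
    }
}
```

The code that started the parse would catch SAXException, call
unwrap(), and test the result with instanceof to get its own exception
type back.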

All the best,

David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)