Announcement: SAX Java Implementation (pre-release)

Tim Bray tbray at textuality.com
Mon Apr 13 00:41:27 BST 1998


At 05:31 PM 12/04/98 -0400, David Megginson wrote:
> > 1. Why has a SAX prefix been added to all classes?
>
>There are a few benefits to this decision:

Kind of unconvincing, I'd have to say.  If someone doesn't have it
together enough to figure out how to use Java packages they're
not going to have much luck with SAX anyhow.  And we really shouldn't
be worrying about legacy SAX implementations at this stage; we're
all bleeding-edge types around here.  And if somebody
wants a C binding, that's going to be different enough from 
Java SAX anyhow that we shouldn't do the SAX prefix just because
they're going to have to.

> > 3. The interface for reading character streams needs more
> > specification if it is to be interoperable.
>
> > a) There's a critical ambiguity in the concept of a character stream:
> > a Java concept of a char does not correspond to the XML concept of a
> > character.
>What does everyone else think about this point?  Is this a good case
>for pragmatism over logical consistency, or am I introducing an ugly
>kludge that will come back to haunt us all?

Is it maybe the right thing to be brutally clear and just have a UTF-16
character stream?  I haven't looked at Java chars as closely as James
has, but his description sounds exactly like UTF-16.  A 16-bit UTF-16
quantity is not precisely a character, but the places where it isn't
(non-BMP chars) exhibit graceful degradation; if the app knows about
UTF-16 it does the right thing, otherwise it looks like two unknown
characters and nothing breaks.
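
To make the UTF-16 point concrete, here is a small sketch of how a single
non-BMP character shows up as two Java chars. The class and method names
are made up for illustration, and the Character/String helpers used
(isHighSurrogate, isLowSurrogate, codePointCount) are from later JDKs than
the one under discussion, so treat this as an illustration, not as
anything SAX itself would provide.

```java
// Hypothetical helper class, for illustration only.
class SurrogateDemo {
    // Number of Java chars, i.e. 16-bit UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the same string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Marks each Java char: 'H' = high surrogate, 'L' = low surrogate,
    // '.' = an ordinary BMP character.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                sb.append('H');
            } else if (Character.isLowSurrogate(c)) {
                sb.append('L');
            } else {
                sb.append('.');
            }
        }
        return sb.toString();
    }
}
```

An app that knows about UTF-16 can pair up the H and L units; one that
does not simply sees two chars it does not recognise.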

> > b) Is it legal for a byte order mark character to be present at the
> > start of the character stream? The right answer is that it should not
> > be legal: this should be stripped out in the byte to character
> > conversion process.
>
>This is a tricky point.  I had planned to leave it in -- what is the
>default behaviour for java.io.Reader (and for other languages with
>character streams)?

No; if there's a BOM, that should be eaten by the underlying char stream
machinery, which should read it and thereafter transparently swap bytes
or not to produce Java chars without the app having to work at it.
The spec is clear on this point, and at one with sensible implementation
practice.
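
A rough sketch of what "eaten by the underlying char stream machinery"
could look like: peek at the first bytes, pick the byte order from the
BOM, and hand the app a Reader that never shows the BOM. The class name
and the UTF-8 fallback when no BOM is present are my assumptions here,
not anything SAX specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BomStrippingReader {
    // Opens a Reader over the byte stream, consuming any leading BOM.
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        Charset cs = StandardCharsets.UTF_8; // assumed default when no BOM
        int skip = 0;
        if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            skip = 2;
        } else if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB
                && u(head[2]) == 0xBF) {
            skip = 3; // UTF-8 signature
        }
        // Push back everything after the BOM so the decoder sees it.
        if (n > skip) {
            pin.unread(head, skip, n - skip);
        }
        return new InputStreamReader(pin, cs);
    }

    // Convenience wrapper: decode a whole byte array to a String.
    static String decode(byte[] bytes) {
        try (Reader r = open(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int u(byte b) {
        return b & 0xFF;
    }
}
```

The app only ever sees decoded Java chars; whether bytes were swapped is
invisible to it.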

> > c) How does this interact with the encoding declaration in the XML
> > document?  The docs should say that it's legal for the character
> > stream to include an encoding declaration and it doesn't matter what
> > encoding it specifies.
>
>I'd think that it should be ignored under these circumstances, since
>the characters are already decoded (though again, in an underspecified
>way -- are we dealing with UCS-2, UCS-4, or UTF-16?).

Once again, I think the spec can be followed straightforwardly
on this one.  If you know, by any combination of BOM, encoding 
decl, and external header, what the encoding is, just use it.
I think SAX implementations should compete on their ability to
Do The Right Thing.
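
For what it's worth, "any combination of BOM, encoding decl, and external
header" might be sketched like this. The class name, the order in which
the three sources are consulted, and the UTF-8 default are all
assumptions of mine for illustration; nothing in this thread settles
them. The sketch also only sniffs declarations in ASCII-compatible
encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EncodingGuess {
    // Matches encoding="..." inside a leading XML declaration.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // externalCharset: charset from a transport header, or null if none.
    // head: the first bytes of the entity.
    static String pick(String externalCharset, byte[] head) {
        // 1. Assumed priority: an external header wins if present.
        if (externalCharset != null) {
            return externalCharset;
        }
        // 2. Otherwise a byte order mark decides.
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
            if (head.length >= 3 && b0 == 0xEF && b1 == 0xBB
                    && (head[2] & 0xFF) == 0xBF) {
                return "UTF-8";
            }
        }
        // 3. Otherwise look for an encoding declaration, reading the
        //    bytes as Latin-1 so every byte maps to exactly one char.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(text);
        if (m.find()) {
            return m.group(1);
        }
        // 4. Assumed default when nothing else is known.
        return "UTF-8";
    }
}
```

When the input is already a character stream, a parser would skip all of
this and ignore whatever the declaration says, as David suggests above.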

> > 6. I strongly object to including the name argument in
> > SAXEntityResolver.resolveEntity.  There's nothing in XML that says
> > that the name should be used in resolving an entity and so there's no
> > reason to suppose a parser will make it available.  I also think it's
> > wrong in principle to make use of it.  This business with "[document]"
> > and "[dtd]" is gross. At the very least the spec should say that name
> > may be null if this information is not available.
>
>I'm neutral on this point, though I do agree that "[document]" and
>"[dtd]" are ugly.  Does anyone object to the removal of the name
>argument?

I'm with James - any use of the entity name by an application is
potentially actively harmful, nuke it. -Tim
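
As a sketch of what resolution without the name argument looks like, the
example below resolves purely on the system identifier. It uses the
org.xml.sax.EntityResolver interface as it ships in the JDK today, which
takes only a public ID and a system ID; that is not the pre-release
SAXEntityResolver under discussion here. The LocalResolver class and its
in-memory map are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves known system IDs from memory.
class LocalResolver implements EntityResolver {
    private final Map<String, String> bySystemId;

    LocalResolver(Map<String, String> bySystemId) {
        this.bySystemId = new HashMap<>(bySystemId);
    }

    // No entity name anywhere: only the identifiers XML itself defines.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = bySystemId.get(systemId);
        if (text == null) {
            // Returning null asks the parser to open the system ID itself.
            return null;
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

An application would register this with its parser and never need to
know whether the request was for the document, the DTD, or a named
entity.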


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



