Announcement: SAX Java Implementation (pre-release)
ak117 at freenet.carleton.ca
Mon Apr 13 16:20:31 BST 1998
Tim Bray writes:
> At 05:31 PM 12/04/98 -0400, David Megginson wrote:
> > > 1. Why has a SAX prefix been added to all classes?
> >There are a few benefits to this decision:
> Kind of unconvincing, I'd have to say. If someone doesn't have it
> together enough to figure out how to use Java packages they're
> not going to have much luck with SAX anyhow. And we really shouldn't
> be worrying about legacy SAX implementations at this stage; we're
> all bleeding-edge types around here. And if somebody
> wants a C binding, that's going to be different enough from
> Java SAX anyhow that we shouldn't do the SAX prefix just because
> they're going to have to.
If no one wants this, I will happily remove it. If you do want the
"SAX..." prefix on all interfaces, please speak up now.
> > > 3. The interface for reading character streams needs more
> > > specification if it is to be interoperable.
> > > a) There's a critical ambiguity in the concept of a character stream:
> > > a Java concept of a char does not correspond to the XML concept of a
> > > character.
> >What does everyone else think about this point? Is this a good case
> >for pragmatism over logical consistency, or am I introducing an ugly
> >kludge that will come back to haunt us all?
> Is it maybe the right thing to be brutally clear and just have a UTF-16
> character stream? I haven't looked at Java chars as closely as James
> has, but his description sounds exactly like UTF-16. A 16-bit UTF-16
> quantity is not precisely a character, but the places where it isn't
> (non BMP chars) exhibit graceful degradation; if the app knows about
> UTF-16 it does the right thing, otherwise it looks like two unknown
> characters, nothing breaks.
Fair enough -- we could specify that SAXCharacterStream is a UTF-16
stream, or we could even name it SAXUTF16Stream. How will this
interact with Larry Wall's decision to use UTF-8 as the internal
encoding for the next Perl?
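
To make the char-versus-character distinction concrete, here is a minimal sketch, assuming a Java release much later than this thread (codePointCount and Character.toChars did not exist in 1998). The class name Utf16Demo is made up for illustration. It shows one non-BMP character occupying two Java chars, which is the UTF-16 behaviour Tim describes.

```java
final class Utf16Demo {
    /** Counts Unicode code points (XML characters) in a string of Java chars. */
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10000 lies outside the BMP, so UTF-16 encodes it as a surrogate pair.
        String s = "a" + new String(Character.toChars(0x10000));

        System.out.println(s.length());          // 3 Java chars (UTF-16 code units)
        System.out.println(countCodePoints(s));  // 2 characters
        // The second char is only half of a character.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

An application that ignores surrogates sees three chars here and passes them through unchanged, which is the graceful degradation argued for above.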
> > > b) Is it legal for a byte order mark character to be present at the
> > > start of the character stream? The right answer is that it should not
> > > be legal: this should be stripped out in the byte to character
> > > conversion process.
> >This is a tricky point. I had planned to leave it in -- what is the
> >default behaviour for java.io.Reader (and for other languages with
> >character streams)?
> No; if there's a BOM, that should be eaten by the underlying char stream
> machinery, which should read it and thereafter transparently swap bytes
> or not to produce Java chars without the app having to work at it.
> The spec is clear on this point, and at one with sensible implementation
Should we require all versions to use the Java byte order, or only the
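
As an illustration of the BOM handling Tim describes, the following sketch assumes a modern Java runtime and its standard "UTF-16" charset, whose decoder reads a leading byte order mark, picks the byte order from it, and does not pass the mark on to the application. The class and method names (BomDemo, decodeUtf16) are hypothetical, and other encodings or other languages may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BomDemo {
    /** Decodes UTF-16 bytes through a Reader; the decoder consumes any BOM. */
    static String decodeUtf16(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_16)) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "hi" with a little-endian BOM (FF FE) and with a big-endian BOM (FE FF).
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00, 0x69, 0x00};
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x68, 0x00, 0x69};

        System.out.println(decodeUtf16(littleEndian)); // hi
        System.out.println(decodeUtf16(bigEndian));    // hi
    }
}
```

Both inputs yield the same two chars, so the application never sees the mark or the original byte order.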
> > > 6. I strongly object to including the name argument in
> > > SAXEntityResolver.resolveEntity. There's nothing in XML that says
> > > that the name should be used in resolving an entity and so there's no
> > > reason to suppose a parser will make it available. I also think it's
> > > wrong in principle to make use of it. This business with "[document]"
> > > and "[dtd]" is gross. At the very least the spec should say that name
> > > may be null if this information is not available.
> >I'm neutral on this point, though I do agree that "[document]" and
> >"[dtd]" are ugly. Does anyone object to the removal of the name
> I'm with James - any use of the entity name by an application is
> potentially actively harmful, nuke it. -Tim
That's two 'nays' and one abstention (mine). If anyone wants to keep
the entity name argument, please put your case forward quickly.
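
For anyone weighing the proposal, here is a rough sketch of what a resolver could look like without the name argument. Everything in it is hypothetical: SketchEntityResolver, CatalogResolver, the String return type, and the sample identifiers are assumptions for illustration, not the pre-release SAX interface. It resolves on the public and system identifiers alone.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver shape: identifiers only, no entity name. */
interface SketchEntityResolver {
    /** Returns a replacement system identifier, or null to keep the original. */
    String resolveEntity(String publicId, String systemId);
}

/** Hypothetical catalog that redirects known public identifiers. */
final class CatalogResolver implements SketchEntityResolver {
    private final Map<String, String> byPublicId = new HashMap<>();

    void map(String publicId, String replacementSystemId) {
        byPublicId.put(publicId, replacementSystemId);
    }

    @Override
    public String resolveEntity(String publicId, String systemId) {
        if (publicId == null) {
            return null; // nothing to look up; caller keeps systemId
        }
        return byPublicId.get(publicId); // null when there is no mapping
    }

    public static void main(String[] args) {
        CatalogResolver resolver = new CatalogResolver();
        // Made-up identifiers, used only to exercise the lookup.
        resolver.map("-//Example//DTD Sample//EN", "file:///local/sample.dtd");

        System.out.println(resolver.resolveEntity(
                "-//Example//DTD Sample//EN", "http://example.org/sample.dtd"));
        System.out.println(resolver.resolveEntity(
                null, "http://example.org/other.dtd"));
    }
}
```

Nothing in the lookup depends on whether the entity is the document, the DTD, or a named entity, which is the point James and Tim are making.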
Thanks, and all the best,
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)