Announcement: SAX Java Implementation (pre-release)
David Megginson
ak117 at freenet.carleton.ca
Mon Apr 13 16:20:31 BST 1998
Tim Bray writes:
> At 05:31 PM 12/04/98 -0400, David Megginson wrote:
> > > 1. Why has a SAX prefix been added to all classes?
> >
> >There are a few benefits to this decision:
>
> Kind of unconvincing, I'd have to say. If someone doesn't have it
> together enough to figure out how to use java packages they're
> not going to have much luck with SAX anyhow. And we really shouldn't
> be worrying about legacy SAX implementations at this stage; we're
> all bleeding-edge types around here. And if somebody
> wants a C binding, that's going to be different enough from
> Java SAX anyhow that we shouldn't do the SAX prefix just because
> they're going to have to.
If no one wants this, I will happily remove it. If you do want the
"SAX..." prefix on all interfaces, please speak up now.
> > > 3. The interface for reading character streams needs more
> > > specification if it is to be interoperable.
> >
> > > a) There's a critical ambiguity in the concept of a character stream:
> > > a Java concept of a char does not correspond to the XML concept of a
> > > character.
> >What does everyone else think about this point? Is this a good case
> >for pragmatism over logical consistency, or am I introducing an ugly
> >kludge that will come back to haunt us all?
>
> Is it maybe the right thing to be brutally clear and just have a UTF-16
> character stream? I haven't looked at Java chars as closely as James
> has, but his description sounds exactly like UTF-16. A 16-bit UTF-16
> quantity is not precisely a character, but the places where it isn't
> (non BMP chars) exhibit graceful degradation; if the app knows about
> UTF-16 it does the right thing, otherwise it looks like two unknown
> characters, nothing breaks.
Fair enough -- we could specify that SAXCharacterStream is a UTF-16
stream, or we could even name it SAXUTF16Stream. How will this
interact with Larry Wall's decision to use UTF-8 as the internal
encoding for the next Perl?
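
To make the char-versus-character distinction concrete, here is a minimal
sketch in Java. The class and method names are illustrative only and are not
part of SAX. It counts XML characters in a buffer of Java chars by treating a
high surrogate followed by a low surrogate as one character; an unpaired
surrogate is simply counted as one unknown character, which is the graceful
degradation Tim describes.

```java
// Sketch only: Utf16Demo and countCodePoints are hypothetical names.
final class Utf16Demo {
    /** Counts XML characters (Unicode code points) in a run of Java chars. */
    static int countCodePoints(char[] buf, int start, int length) {
        int count = 0;
        int end = start + length;
        for (int i = start; i < end; i++) {
            // A high surrogate followed by a low surrogate is one character.
            if (Character.isHighSurrogate(buf[i]) && i + 1 < end
                    && Character.isLowSurrogate(buf[i + 1])) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+10400 lies outside the BMP, so Java stores it as two chars.
        char[] buf = "a\uD801\uDC00b".toCharArray();
        System.out.println(buf.length);                          // prints 4
        System.out.println(countCodePoints(buf, 0, buf.length)); // prints 3
    }
}
```

An application that does not know about surrogates would see four chars here
and nothing would break; one that does can recover the three characters.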
> > > b) Is it legal for a byte order mark character to be present at the
> > > start of the character stream? The right answer is that it should not
> > > be legal: this should be stripped out in the byte to character
> > > conversion process.
> >
> >This is a tricky point. I had planned to leave it in -- what is the
> >default behaviour for java.io.Reader (and for other languages with
> >character streams)?
>
> No; if there's a BOM, that should be eaten by the underlying char stream
> machinery, which should read it and thereafter transparently swap bytes
> or not to produce Java chars without the app having to work at it.
> The spec is clear on this point, and at one with sensible implementation
> practice.
Should we require all versions to use the Java byte order, or only the
Java version?
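
The behaviour Tim describes, where the byte-to-character layer eats the BOM
and picks the byte order, might be sketched as follows. This is an
illustration rather than anything SAX specifies: the class and method names
are hypothetical, and falling back to big-endian when no BOM is present is an
assumption made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: BomStripper and decodeUtf16 are hypothetical names.
final class BomStripper {
    /** Decodes UTF-16 bytes, consuming a leading byte order mark if present. */
    static String decodeUtf16(byte[] bytes) {
        Charset order = StandardCharsets.UTF_16BE; // assumed default without a BOM
        int offset = 0;
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                offset = 2;                          // big-endian BOM, strip it
            } else if (b0 == 0xFF && b1 == 0xFE) {
                order = StandardCharsets.UTF_16LE;   // little-endian BOM, strip it
                offset = 2;
            }
        }
        // The application only ever sees chars, never the BOM itself.
        return new String(bytes, offset, bytes.length - offset, order);
    }

    public static void main(String[] args) {
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, '<', 0, 'a', 0};
        byte[] bigEndian    = {(byte) 0xFE, (byte) 0xFF, 0, '<', 0, 'a'};
        System.out.println(decodeUtf16(littleEndian)); // prints <a
        System.out.println(decodeUtf16(bigEndian));    // prints <a
    }
}
```

Either byte order yields the same chars, so the application never has to
swap bytes or skip a leading U+FEFF itself.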
> > > 6. I strongly object to including the name argument in
> > > SAXEntityResolver.resolveEntity. There's nothing in XML that says
> > > that the name should be used in resolving an entity and so there's no
> > > reason to suppose a parser will make it available. I also think it's
> > > wrong in principle to make use of it. This business with "[document]"
> > > and "[dtd]" is gross. At the very least the spec should say that name
> > > may be null if this information is not available.
> >
> >I'm neutral on this point, though I do agree that "[document]" and
> >"[dtd]" are ugly. Does anyone object to the removal of the name
> >argument?
>
> I'm with James - any use of the entity name by an application is
> potentially actively harmful, nuke it. -Tim
That's two 'nays' and one abstention (mine). If anyone wants to keep
the entity name argument, please put your case forward quickly.
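
For anyone weighing the two options, here is a rough sketch of a resolver
that never consults the entity name. The interface below is a hypothetical
stand-in, not the pre-release SAX signature: the systemId parameter and the
String return type are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the interface under discussion; the parameter
// list and return type are assumptions, not the actual SAX signature.
interface HypotheticalEntityResolver {
    /** name may be null when the parser cannot supply it. */
    String resolveEntity(String name, String systemId);
}

// Sketch only: maps system identifiers to local replacements.
final class CatalogResolver implements HypotheticalEntityResolver {
    private final Map<String, String> catalog = new HashMap<>();

    void map(String systemId, String replacement) {
        catalog.put(systemId, replacement);
    }

    @Override
    public String resolveEntity(String name, String systemId) {
        // The name is deliberately ignored, so a null name changes nothing.
        String local = catalog.get(systemId);
        return local != null ? local : systemId;
    }
}
```

A resolver written this way behaves the same whether the parser passes
"[dtd]", a real entity name, or null, which suggests the argument could be
dropped without loss for this kind of use.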
Thanks, and all the best,
David
--
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)