Announcement: SAX Java Implementation (pre-release)

David Megginson ak117 at freenet.carleton.ca
Mon Apr 13 16:20:31 BST 1998


Tim Bray writes:

 > At 05:31 PM 12/04/98 -0400, David Megginson wrote:
 > > > 1. Why has a SAX prefix been added to all classes?
 > >
 > >There are a few benefits to this decision:
 > 
 > Kind of unconvincing, I'd have to say.  If someone doesn't have it
 > together enough to figure out how to use java packages they're
 > not going to have much luck with SAX anyhow.  And we really shouldn't
 > be worrying about legacy SAX implementations at this stage; we're
 > all bleeding-edge types around here.  And if somebody
 > wants a C binding, that's going to be different enough from 
 > Java SAX anyhow that we shouldn't do the SAX prefix just because
 > they're going to have to.

If no one wants this, I will happily remove it.  If you do want the
"SAX..." prefix on all interfaces, please speak up now.

 > > > 3. The interface for reading character streams needs more
 > > > specification if it is to be interoperable.
 > >
 > > > a) There's a critical ambiguity in the concept of a character stream:
 > > > a Java concept of a char does not correspond to the XML concept of a
 > > > character.
 > >What does everyone else think about this point?  Is this a good case
 > >for pragmatism over logical consistency, or am I introducing an ugly
 > >kludge that will come back to haunt us all?
 > 
 > Is it maybe the right thing to be brutally clear and just have a UTF-16
 > character stream?  I haven't looked at Java chars as closely as James
 > has, but his description sounds exactly like UTF-16.  A 16-bit UTF-16
 > quantity is not precisely a character, but the places where it isn't
 > (non-BMP chars) exhibit graceful degradation; if the app knows about
 > UTF-16 it does the right thing, otherwise it looks like two unknown
 > characters, nothing breaks.

Fair enough -- we could specify that SAXCharacterStream is a UTF-16
stream, or we could even name it SAXUTF16Stream.  How will this
interact with Larry Wall's decision to use UTF-8 as the internal
encoding for the next Perl?
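The sketch below illustrates Tim's point that a Java char is a UTF-16 code unit rather than an XML character. It uses String.codePointCount, String.codePointAt and Character.charCount (added in Java 5) and Character.isSurrogate (added in Java 7), so it relies on APIs that did not exist when this thread was written. The class and method names are made up for the example.

```java
public class Utf16Units {

    /** Number of Java chars (UTF-16 code units) in s. */
    static int javaChars(String s) {
        return s.length();
    }

    /** Number of Unicode code points, i.e. XML characters, in s. */
    static int xmlChars(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10000, the first code point outside the BMP, is stored in a
        // Java String as the surrogate pair D800 DC00.
        String s = "a\uD800\uDC00b";
        System.out.println(javaChars(s)); // 4
        System.out.println(xmlChars(s));  // 3

        // A UTF-16-aware loop steps over whole code points...
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.printf("U+%04X%n", s.codePointAt(i));
        }

        // ...while a char-by-char loop sees two surrogates it cannot
        // interpret on their own, but nothing breaks.
        for (char c : s.toCharArray()) {
            System.out.printf("%04X surrogate=%b%n", (int) c, Character.isSurrogate(c));
        }
    }
}
```

An application that ignores surrogates simply treats D800 and DC00 as two unknown characters, which is the graceful degradation described above.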

 > > > b) Is it legal for a byte order mark character to be present at the
 > > > start of the character stream? The right answer is that it should not
 > > > be legal: this should be stripped out in the byte to character
 > > > conversion process.
 > >
 > >This is a tricky point.  I had planned to leave it in -- what is the
 > >default behaviour for java.io.Reader (and for other languages with
 > >character streams)?
 > 
 > No; if there's a BOM, that should be eaten by the underlying char stream
 > machinery, which should read it and thereafter transparently swap bytes
 > or not to produce Java chars without the app having to work at it.
 > The spec is clear on this point, and at one with sensible implementation
 > practice.

Should we require all versions to use the Java byte order, or only the
Java version?
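As a rough illustration of "the BOM is eaten by the char stream machinery", here is a minimal sketch of a byte-to-character layer that sniffs a byte order mark, consumes it, and picks the matching byte order, so the resulting Reader never hands U+FEFF to the application. The helper name and the UTF-8 fallback when no BOM is present are assumptions for the example; full XML encoding autodetection also looks at the encoding declaration, which this sketch does not do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomStrippingReader {

    /**
     * Hypothetical helper: detect a byte order mark, consume it, and return
     * a Reader for the remaining bytes in the matching encoding and byte
     * order. Assumes UTF-8 when no BOM is present.
     */
    public static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF
        while (n < 3) {
            int got = in.read(head, n, 3 - n);
            if (got < 0) break;
            n += got;
        }
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;

        if (b0 == 0xFE && b1 == 0xFF) {               // UTF-16, big-endian
            if (n > 2) in.unread(head, 2, n - 2);     // keep the byte after the BOM
            return new InputStreamReader(in, StandardCharsets.UTF_16BE);
        }
        if (b0 == 0xFF && b1 == 0xFE) {               // UTF-16, little-endian: bytes swapped
            if (n > 2) in.unread(head, 2, n - 2);
            return new InputStreamReader(in, StandardCharsets.UTF_16LE);
        }
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) { // UTF-8 signature, fully consumed
            return new InputStreamReader(in, StandardCharsets.UTF_8);
        }
        if (n > 0) in.unread(head, 0, n);             // no BOM: give every byte back
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "<a/>" encoded as UTF-16LE and preceded by the BOM bytes FF FE
        byte[] doc = {(byte) 0xFF, (byte) 0xFE, '<', 0, 'a', 0, '/', 0, '>', 0};
        Reader r = open(new ByteArrayInputStream(doc));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        System.out.println(sb); // <a/>, with no leading U+FEFF
    }
}
```

The application only ever sees Java chars in the platform's native order; whether bytes were swapped is decided once, from the BOM, below the character-stream interface.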

 > > > 6. I strongly object to including the name argument in
 > > > SAXEntityResolver.resolveEntity.  There's nothing in XML that says
 > > > that the name should be used in resolving an entity and so there's no
 > > > reason to suppose a parser will make it available.  I also think it's
 > > > wrong in principle to make use of it.  This business with "[document]"
 > > > and "[dtd]" is gross. At the very least the spec should say that name
 > > > may be null if this information is not available.
 > >
 > >I'm neutral on this point, though I do agree that "[document]" and
 > >"[dtd]" are ugly.  Does anyone object to the removal of the name
 > >argument?
 > 
 > I'm with James - any use of the entity name by an application is
 > potentially actively harmful, nuke it. -Tim

That's two 'nays' and one abstention (mine).  If anyone wants to keep
the entity name argument, please put your case forward quickly.
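To make the proposal concrete, here is a minimal sketch of entity resolution keyed only on the identifiers from the declaration. It is written against the org.xml.sax.EntityResolver interface that ships with modern JDKs, whose resolveEntity(publicId, systemId) method takes no name argument; the pre-release SAXEntityResolver under discussion may differ. PublicIdResolver and its catalog entry are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class PublicIdResolver implements EntityResolver {

    // Hypothetical local catalog: public identifier -> replacement DTD text.
    private final Map<String, String> catalog = new HashMap<>();

    public PublicIdResolver() {
        catalog.put("-//Example//DTD Note//EN", "<!ELEMENT note (#PCDATA)>");
    }

    // Resolution uses only the public and system identifiers; the entity's
    // name never enters into it.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = (publicId == null) ? null : catalog.get(publicId);
        if (text == null) {
            return null; // null asks the parser to open systemId itself
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        PublicIdResolver resolver = new PublicIdResolver();
        InputSource hit = resolver.resolveEntity("-//Example//DTD Note//EN", "note.dtd");
        System.out.println(hit != null);                               // true
        System.out.println(resolver.resolveEntity(null, "other.dtd")); // null
    }
}
```

Nothing in this resolver needs to know whether it is resolving the document entity, the external DTD subset, or a named entity, so placeholders such as "[document]" and "[dtd]" never arise.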


Thanks, and all the best,


David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)