SAX: Byte Stream Needed?

David Megginson ak117 at freenet.carleton.ca
Wed Apr 15 13:29:14 BST 1998


James Clark writes:

 [...]

 > InputStreamReader, however, leaves something to be desired because
 > it doesn't allow users to supply their own character-to-byte
 > conversion routines. But if you have an InputStream you should be
 > using the interface to the parser that takes an InputStream.  In
 > any case it's not practical to use an InputStreamReader for XML
 > because that won't deal with XML's rules for detecting encodings.

I have actually been toying with omitting the byte-stream parse()
method altogether, so that there would be only two parse methods:

  public abstract void parse (String publicId, String systemId)
    throws java.lang.Exception;

  public abstract void parse (String publicId, String systemId,
                              SAXCharacterStream input)
    throws java.lang.Exception;

I've defined SAXCharacterStream as follows:

  public interface SAXCharacterStream {
    public abstract int read () 
      throws SAXException;
    public abstract int read (char ch[], int start, int count) 
      throws SAXException;
  }

(Where SAXException is, in the Java version, a direct and unmodified
subclass of java.io.IOException).  Either method returns -1 if there
are no characters left to read; otherwise, the first returns a single
UTF-16 character value and the second returns the number of
characters read.
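
To make the contract concrete, here is a rough sketch of how a
java.io.Reader might be exposed through SAXCharacterStream.  The
SAXException class below is only a stand-in built from the description
above (a subclass of java.io.IOException), and ReaderCharacterStream
and ReaderCharacterStreamDemo are hypothetical names used for
illustration; none of this is part of the draft itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in for the draft SAXException described in the message:
// a direct subclass of java.io.IOException (assumption for this sketch).
class SAXException extends IOException {
  SAXException(String message, Throwable cause) {
    super(message, cause);
  }
}

// The interface as proposed in the message.
interface SAXCharacterStream {
  int read() throws SAXException;
  int read(char ch[], int start, int count) throws SAXException;
}

// Hypothetical adapter: exposes any java.io.Reader as a SAXCharacterStream.
class ReaderCharacterStream implements SAXCharacterStream {
  private final Reader reader;

  ReaderCharacterStream(Reader reader) {
    this.reader = reader;
  }

  // Returns one UTF-16 character value, or -1 at end of input.
  public int read() throws SAXException {
    try {
      return reader.read();
    } catch (IOException e) {
      throw new SAXException("single-character read failed", e);
    }
  }

  // Returns the number of characters stored in ch, or -1 at end of input.
  public int read(char ch[], int start, int count) throws SAXException {
    try {
      return reader.read(ch, start, count);
    } catch (IOException e) {
      throw new SAXException("block read failed", e);
    }
  }
}

class ReaderCharacterStreamDemo {
  public static void main(String[] args) throws SAXException {
    SAXCharacterStream in =
        new ReaderCharacterStream(new StringReader("<doc/>"));
    StringBuilder text = new StringBuilder();
    int c;
    // Loop until the stream signals end of input with -1.
    while ((c = in.read()) != -1) {
      text.append((char) c);
    }
    System.out.println(text);
  }
}
```

Because Reader already uses the same -1 convention, the adapter only
has to translate exceptions; a CORBA or DCOM binding would need its
own implementation of the two methods.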

The advantage of using SAXCharacterStream is that behaviour over CORBA
(or, I suppose, DCOM) is now well-defined.  The disadvantage is
another bloody interface.

I had also written a SAXByteStream, but then I started wondering why
we really need it -- information coming from a database, for example,
or from a buffer should already be in characters, not in raw bytes
(and in Java, at least, it is simple to wrap a Reader around any
InputStream when necessary -- I expect that other languages will have
good internationalisation support soon).
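
As a minimal illustration of the wrapping mentioned above, the sketch
below puts an InputStreamReader around an InputStream and reads
characters from it.  It assumes the caller already knows the encoding
(UTF-8 here); as James points out, InputStreamReader does not apply
XML's own encoding-detection rules.  The class name WrapStreamDemo and
the helper readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class WrapStreamDemo {
  // Decodes the whole byte stream using the named encoding.
  // The encoding must be supplied by the caller; nothing is detected.
  static String readAll(InputStream bytes, String encoding)
      throws IOException {
    Reader chars = new InputStreamReader(bytes, encoding);
    StringBuilder text = new StringBuilder();
    char[] buf = new char[1024];
    int n;
    // read() returns the number of characters stored, or -1 at the end.
    while ((n = chars.read(buf, 0, buf.length)) != -1) {
      text.append(buf, 0, n);
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    // 16 bytes of UTF-8 that decode to 15 characters.
    byte[] raw = "<doc>caf\u00e9</doc>".getBytes("UTF-8");
    System.out.println(readAll(new ByteArrayInputStream(raw), "UTF-8"));
  }
}
```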

Can anyone put forward a convincing case for having a standard SAX
method parsing from a raw byte stream (remembering that
implementations can always extend the SAXParser interface themselves
for special requirements)?


Thanks, and all the best,


David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/




