SAX: New Idea for Entity Resolution

David Megginson ak117 at
Sat Apr 18 16:19:42 BST 1998

James Clark writes:

 > You could just have a class that encapsulates a structure with three
 > members:
 > - a CharacterStream
 > - a ByteStream
 > - a String
 > At least one of the CharacterStream and ByteStream must be non-null. If
 > the ByteStream is non-null the String can specify the encoding.

[Read on to the bottom for a large-ish design change.]

This implies, then, the following three interfaces:

  public interface ByteStream {
    public abstract int read ()
      throws SAXException;
    public abstract int read (byte b[], int start, int count)
      throws SAXException;
  }

  public interface CharacterStream {
    public abstract int read ()
      throws SAXException;
    public abstract int read (char ch[], int start, int count)
      throws SAXException;
  }

  public class InputSource {
    // For each variable, imagine a get/set pair instead...
    public ByteStream byteStream;
    public CharacterStream characterStream;
    public String encoding;
  }
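
To make the "imagine a get/set pair" comment concrete, here is a rough
sketch of the three-member version with accessors. Everything beyond
the names used above is my assumption: the stand-in SAXException, the
hasStream() helper that checks James's "at least one stream must be
non-null" rule, the StringCharacterStream class, and the convention
that read() returns -1 at end of input (borrowed from java.io, not
stated anywhere in this message).

```java
// Stand-in for the real exception class; assumed for this sketch only.
class SAXException extends Exception {
  SAXException(String message) { super(message); }
}

interface ByteStream {
  int read() throws SAXException;
  int read(byte[] b, int start, int count) throws SAXException;
}

interface CharacterStream {
  int read() throws SAXException;
  int read(char[] ch, int start, int count) throws SAXException;
}

// The three-member InputSource with get/set pairs instead of public fields.
class InputSource {
  private ByteStream byteStream;
  private CharacterStream characterStream;
  private String encoding;

  ByteStream getByteStream() { return byteStream; }
  void setByteStream(ByteStream byteStream) { this.byteStream = byteStream; }

  CharacterStream getCharacterStream() { return characterStream; }
  void setCharacterStream(CharacterStream characterStream) {
    this.characterStream = characterStream;
  }

  String getEncoding() { return encoding; }
  void setEncoding(String encoding) { this.encoding = encoding; }

  // Hypothetical helper: true when at least one of the two streams is set.
  boolean hasStream() {
    return byteStream != null || characterStream != null;
  }
}

// Hypothetical CharacterStream backed by an in-memory string.
// Assumes -1 signals end of input, as in java.io.
class StringCharacterStream implements CharacterStream {
  private final String text;
  private int pos = 0;

  StringCharacterStream(String text) { this.text = text; }

  public int read() {
    return pos < text.length() ? text.charAt(pos++) : -1;
  }

  public int read(char[] ch, int start, int count) {
    if (pos >= text.length()) return -1;
    int n = Math.min(count, text.length() - pos);
    text.getChars(pos, pos + n, ch, start);
    pos += n;
    return n;
  }
}
```

A caller would build an InputSource, hand it a StringCharacterStream,
and the parser could check hasStream() before reading anything.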

The nice thing here is that all of these can live on separate systems
in a distributed environment: the InputSource can be a C-program on a
VAX, the CharacterStream can come from a Python program running under
alpha Linux, and the parser can be running in Java on a Windows box.  There
is no dependency on language- or system-specific features (except for
java.lang.String, which should be able to map predictably to other
languages).

Now, why not take this a step further?

  public class InputSource {
    // For each variable, imagine a get/set pair instead...
    public String publicId;
    public String systemId;
    public ByteStream byteStream;
    public CharacterStream characterStream;
    public String encoding;
  }

We'd have to define rules of precedence:

1) if there is a character stream, use it;

2) if there is no character stream but there is a byte stream, use the
   byte stream;

3) if there is neither a character stream nor a byte stream but there
   is a system identifier, open a connection to the system identifier;

4) if there is no character stream, byte stream, or system identifier,
   throw an exception (or invoke the ErrorHandler).
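
The four rules above could be sketched roughly as follows. This only
decides which origin the parser should read from; it does not open
anything. The Origin enum, the Precedence class, the choose() method,
the empty stand-in stream interfaces, and the stand-in SAXException
are all names I made up for the illustration, and I have picked the
"throw an exception" branch of rule 4 rather than the ErrorHandler one.

```java
// Stand-in for the real exception class; assumed for this sketch only.
class SAXException extends Exception {
  SAXException(String message) { super(message); }
}

// Empty stand-ins: only null/non-null matters for the precedence rules.
interface ByteStream {}
interface CharacterStream {}

class InputSource {
  public String publicId;
  public String systemId;
  public ByteStream byteStream;
  public CharacterStream characterStream;
  public String encoding;
}

// Hypothetical label for where the parser should read from.
enum Origin { CHARACTER_STREAM, BYTE_STREAM, SYSTEM_ID }

final class Precedence {
  private Precedence() {}

  // Applies rules 1 to 4 in order and reports which origin wins.
  static Origin choose(InputSource source) throws SAXException {
    if (source.characterStream != null) {
      return Origin.CHARACTER_STREAM;            // rule 1
    }
    if (source.byteStream != null) {
      return Origin.BYTE_STREAM;                 // rule 2
    }
    if (source.systemId != null) {
      return Origin.SYSTEM_ID;                   // rule 3
    }
    // rule 4: nothing usable was supplied
    throw new SAXException(
      "InputSource has no character stream, byte stream, or system identifier");
  }
}
```

Note that a source with both a byte stream and a system identifier
still reads from the byte stream; the system identifier is then only
informational.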

Now, we can get away with only one parse() method:

  public abstract void parse (InputSource source)
    throws Exception;

It might still be useful to keep two separate methods in
EntityResolver, though:

  public interface EntityResolver {
    public String resolveSystemId (String publicId, String systemId)
      throws SAXException;
    public InputSource openEntity (String systemId)
      throws Exception;
  }
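
As an illustration only, here is how an application might implement
the two methods with a small in-memory catalog. The message does not
pin down the semantics, so I am assuming that resolveSystemId() maps a
public identifier to a preferred system identifier (falling back to
the one supplied), and that openEntity() may return an InputSource
carrying just a system identifier, leaving the parser to open it. The
CatalogResolver class, its add() method, the trimmed-down InputSource,
and the stand-in SAXException are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the real exception class; assumed for this sketch only.
class SAXException extends Exception {
  SAXException(String message) { super(message); }
}

// Trimmed to the identifier fields; the stream members are omitted here.
class InputSource {
  public String publicId;
  public String systemId;
}

interface EntityResolver {
  String resolveSystemId(String publicId, String systemId)
    throws SAXException;
  InputSource openEntity(String systemId)
    throws Exception;
}

// Hypothetical resolver that redirects known public IDs to local copies.
class CatalogResolver implements EntityResolver {
  private final Map<String, String> catalog = new HashMap<>();

  // Registers a public identifier and the system identifier to use for it.
  void add(String publicId, String systemId) {
    catalog.put(publicId, systemId);
  }

  // Returns the catalog entry if there is one, else the original system ID.
  public String resolveSystemId(String publicId, String systemId) {
    String mapped = (publicId == null) ? null : catalog.get(publicId);
    return (mapped != null) ? mapped : systemId;
  }

  // Sets no streams, so the parser would fall back to opening the
  // system identifier itself.
  public InputSource openEntity(String systemId) {
    InputSource source = new InputSource();
    source.systemId = systemId;
    return source;
  }
}
```

Keeping the two methods separate lets an application remap
identifiers without having to take over the I/O as well.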


All the best,


David Megginson                 ak117 at
Microstar Software Ltd.         dmeggins at

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
