SAX: New Idea for Entity Resolution

James Clark jjc at jclark.com
Fri Apr 17 05:05:15 BST 1998


David Megginson wrote:
> 
> Here's a different idea for SAXEntityResolver, that would add the
> ability for an application to return a character stream for _any_ URI
> (rather than just the document root):
> 
> public interface SAXEntityResolver {
>   public abstract String filterSystemId (String publicId, String systemId);
>   public abstract SAXCharacterStream openCharacterStream (String systemId);
> }

This is fine except that it should use byte streams, not character
streams.  What you get if you are reading from the net or from an
archive or a database or whatever is bytes, not characters, and it is part
of the function of an XML processor to manage the conversion into
characters using the encoding declaration and the XML-specified
mechanisms for encoding auto-detection.  You could provide both, but the
fundamental one is for a stream of bytes.  Also, the EntityResolver needs
to be able to indicate an externally specified encoding (as with the
additional argument for parse with a SAXByteStream).  In other words,
SAXEntityResolver needs to return an object with two members: a
SAXByteStream and a (possibly null) String.
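
A minimal sketch of that two-member return value, under stated
assumptions: java.io.InputStream stands in for SAXByteStream, and the
names ResolvedEntity, ByteEntityResolver, openByteStream and
InMemoryResolver are hypothetical, not part of any SAX draft.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical result type: the entity's bytes plus an optional
// externally specified encoding.
final class ResolvedEntity {
  final InputStream byteStream;
  final String encoding; // null: the parser should auto-detect

  ResolvedEntity(InputStream byteStream, String encoding) {
    this.byteStream = byteStream;
    this.encoding = encoding;
  }
}

// Hypothetical byte-oriented variant of the proposed interface.
interface ByteEntityResolver {
  String filterSystemId(String publicId, String systemId);
  ResolvedEntity openByteStream(String systemId) throws IOException;
}

// Illustrative resolver that serves one entity from memory.
final class InMemoryResolver implements ByteEntityResolver {
  private final byte[] content;
  private final String encoding;

  InMemoryResolver(byte[] content, String encoding) {
    this.content = content;
    this.encoding = encoding;
  }

  public String filterSystemId(String publicId, String systemId) {
    return systemId; // no remapping in this sketch
  }

  public ResolvedEntity openByteStream(String systemId) {
    return new ResolvedEntity(new ByteArrayInputStream(content), encoding);
  }
}
```

A resolver that knows nothing about the encoding would pass null as the
second member and leave detection to the parser.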

Note that given this I can trivially implement openCharacterStream by
implementing a SAXByteStream that encodes UTF-16 characters as UTF-16
bytes and specifies UTF-16 as the externally specified encoding.  The
converse is absolutely not the case: in order to do the converse I would
have to provide machinery for parsing the XML declaration and for
managing encoding conversions.
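
The character-to-byte direction can be sketched as follows.  This is an
illustration only: it adapts a java.io.Reader to a java.io.InputStream
rather than to SAXByteStream, the class name Utf16ByteStream is
hypothetical, and the choice of big-endian order with a leading byte
order mark is an assumption made so that a decoder told "UTF-16" reads
the bytes unambiguously.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

// Hypothetical adapter: exposes a character stream as UTF-16 bytes.
final class Utf16ByteStream extends InputStream {
  // Encoding name to report as the externally specified encoding.
  static final String ENCODING = "UTF-16";

  private final Reader reader;
  private int bomLeft = 2;     // bytes of the byte order mark still to emit
  private int pendingLow = -1; // low byte of the current code unit, or -1

  Utf16ByteStream(Reader reader) {
    this.reader = reader;
  }

  @Override
  public int read() throws IOException {
    if (bomLeft > 0) {
      // Emit FE FF first, marking the stream as big-endian.
      return (bomLeft-- == 2) ? 0xFE : 0xFF;
    }
    if (pendingLow >= 0) {
      int b = pendingLow;
      pendingLow = -1;
      return b;
    }
    int c = reader.read(); // one UTF-16 code unit, or -1 at end
    if (c < 0) {
      return -1;
    }
    pendingLow = c & 0xFF;
    return (c >> 8) & 0xFF; // high byte first
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }
}
```

No knowledge of XML is needed here, since a Java char is already a
UTF-16 code unit; the opposite adapter would need the declaration parsing
and conversion machinery described above.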

This is basically what I do in XP:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

/**
 * This interface is used by the parser to access external entities.
 * @see Parser
 * @version $Revision: 1.4 $ $Date: 1998/02/17 04:20:32 $
 */
public interface EntityManager {
  /**
   * Opens an external entity.
   * @param systemId the system identifier specified in the entity
   * declaration
   * @param baseURL the base URL relative to which the system identifier
   * should be resolved; null if no base URL is available
   * @param publicId the public identifier specified in the entity
   * declaration; null if no public identifier was specified
   */
  OpenEntity open(String systemId, URL baseURL, String publicId)
      throws IOException;
}

/**
 * Information about an open external entity.
 * This is used by <code>EntityManager</code> to return
 * information about an external entity that it has opened.
 * @see EntityManager
 * @version $Revision: 1.4 $ $Date: 1998/02/17 04:20:47 $
 */
public class OpenEntity {
  private InputStream inputStream;
  private String encoding;
  private URL base;
  private String location;

  /**
   * Creates and initializes an <code>OpenEntity</code> which uses
   * an externally specified encoding.
   */
  public OpenEntity(InputStream inputStream, String location, URL base,
                    String encoding) {
    this.inputStream = inputStream;
    this.location = location;
    this.base = base;
    this.encoding = encoding;
  }

  /**
   * Creates and initializes an <code>OpenEntity</code> which uses
   * the encoding specified in the entity.
   */
  public OpenEntity(InputStream inputStream, String location, URL base) {
    this(inputStream, location, base, null);
  }

  /**
   * Returns an InputStream containing the entity's bytes.
   * If this is called more than once on the same
   * OpenEntity, it will return the same InputStream.
   */
  public final InputStream getInputStream() {
    return inputStream;
  }

  /**
   * Returns the name of the encoding to be used to convert the entity's
   * bytes into characters, or null if this should be determined from
   * the entity itself using XML's rules.
   */
  public final String getEncoding() {
    return encoding;
  }

  /**
   * Returns the URL to use as the base URL for resolving relative URLs
   * contained in the entity.
   */
  public final URL getBase() {
    return base;
  }

  /**
   * Returns a string representation of the location of the entity
   * suitable for use in error messages.
   */
  public final String getLocation() {
    return location;
  }

}
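
For illustration, one way an implementation of this interface might
look.  This is a sketch, not XP's own code: UrlEntityManager is a
hypothetical name, the two types above are restated in compact form only
so that the example compiles on its own, and treating the resolved URL as
both the location string and the new base is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Compact restatements of the types above, so the sketch stands alone.
interface EntityManager {
  OpenEntity open(String systemId, URL baseURL, String publicId)
      throws IOException;
}

class OpenEntity {
  private final InputStream inputStream;
  private final String location;
  private final URL base;
  private final String encoding;

  OpenEntity(InputStream inputStream, String location, URL base,
             String encoding) {
    this.inputStream = inputStream;
    this.location = location;
    this.base = base;
    this.encoding = encoding;
  }

  InputStream getInputStream() { return inputStream; }
  String getEncoding() { return encoding; }
  URL getBase() { return base; }
  String getLocation() { return location; }
}

// Hypothetical implementation that treats system identifiers as URLs.
class UrlEntityManager implements EntityManager {
  public OpenEntity open(String systemId, URL baseURL, String publicId)
      throws IOException {
    // Resolve against the base when there is one; publicId is ignored here.
    URL url = (baseURL == null) ? new URL(systemId)
                                : new URL(baseURL, systemId);
    // No external encoding is known, so pass null and let the parser
    // apply XML's own detection rules to the bytes.
    return new OpenEntity(url.openStream(), url.toString(), url, null);
  }
}
```

A manager that does learn an encoding from its transport (for example
from a protocol header) would pass that name instead of null.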

James






