SAX: Status Report

Alex Milowski lex at www.copsol.com
Wed Jan 7 22:45:43 GMT 1998


> CORE EVENTS
> -----------
> 
> So far, there seems to be general agreement on the following event
> callbacks for the XmlApplication interface:
> 
>  public void startDocument ();
>  public void endDocument ();
>  public void endElement (String name);
>  public void processingInstruction (String name, String remainder);
>  
> There is general agreement that the following two should be present,
> but still discussion over their exact form (I'm still tweaking the
> names a bit):
> 
>  public void characters (char ch[], int start, int length, ...?);
>  public void startElement (String name, ...?); 
> 
> (For the first, there is the question of a flag for ignorable
> whitespace, and for the second, the question of how to report
> attributes).

By "core events", do you mean that only a subset of the APIs would
be available?

> ENTITIES
> --------
> 
> There has been a lively and well-informed discussion on entity
> handling.  Many participants are comfortable with something like the
> following for external entities (including the external DTD subset,
> which may contain processing instructions):
> 
>   public void startEntity (String ename, String publicID, String systemID);
>   public void endEntity (String ename);
> 
> (There is also a question about whether public IDs should be
> provided).  Some others suggest that SAX should provide no information
> about external entities, while others suggest that the XmlParser
> interface should have a getLocation() method instead.  The main
> motivation for providing external-entity information (aside from error
> reporting) is to resolve relative URIs in attribute values.

IMHO, parsers should know *nothing* about resolving entities.  Resolving
entities is an orthogonal problem.

> 
> On the issue of entity resolution, there has been less feedback,
> probably because the topic is a little confusing.  I have suggested
> something like this
> 
>   public String resolveEntity (String ename, String publicID, String systemID);
> 
> which would allow simple URI substitution and resolution of public
> identifiers, if desired (in most cases, you could simply return the
> systemID argument unmodified).  Another suggestion is a separate
> EntityManager interface which would allow much more functionality.

The separate entity manager interface is how both SP and my dsssl.parser
APIs in the DSSSL Developer's Toolkit work.  I'd highly recommend this.
You can create "reference" entity managers that look up a URI and use them
by default.
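
To make the idea concrete, here is a rough sketch of what such a separation
might look like in Java.  The names (EntityManager, open, UriEntityManager,
CatalogEntityManager) are made up for illustration and are not taken from SP
or the DSSSL Developer's Toolkit; the only point is that the parser asks a
separate object for an entity's content and never resolves identifiers itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: turns entity identifiers into an input stream. */
interface EntityManager {
    InputStream open(String ename, String publicID, String systemID) throws IOException;
}

/** "Reference" manager: treats the system ID as an absolute URI and opens it. */
class UriEntityManager implements EntityManager {
    public InputStream open(String ename, String publicID, String systemID) throws IOException {
        return URI.create(systemID).toURL().openStream();
    }
}

/** Catalog-style manager: serves known public IDs from memory, else delegates. */
class CatalogEntityManager implements EntityManager {
    private final Map<String, String> catalog = new HashMap<>();
    private final EntityManager fallback;

    CatalogEntityManager(EntityManager fallback) { this.fallback = fallback; }

    void register(String publicID, String content) { catalog.put(publicID, content); }

    public InputStream open(String ename, String publicID, String systemID) throws IOException {
        String content = (publicID == null) ? null : catalog.get(publicID);
        if (content != null) {
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }
        // Unknown public ID: fall back to the default lookup.
        return fallback.open(ename, publicID, systemID);
    }
}
```

A parser written against EntityManager alone would work unchanged with
either implementation, which is the orthogonality I am arguing for.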

> ERROR REPORTING
> ---------------
> 
> A majority of participants seem to support using callbacks for error
> reporting, partially to simplify cross-language support:
> 
>   public void warning (String message, int line, int column);
>   public void fatal (String message, int line, int column);
> 
> Note the addition of the 'column' argument -- it has rightly been
> pointed out that XML documents can consist of a single, long line, so
> the line number itself may be useless.  If we do not have some general
> way to determine the current entity (i.e. startEntity and endEntity),
> we will also have to supply the URI of the current entity here.

In both SP and the DSSSL Developer's Toolkit, there is some abstract
object or interface that is the "Message Reporter".  This object is
given to the parser when the parse happens or when the parser is created.

Again, this is an issue of orthogonality.
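
As an illustration only, a reporter of this kind could be sketched as below.
MessageReporter, CollectingReporter and SketchParser are hypothetical names,
not the actual SP or DSSSL Developer's Toolkit classes, and the "parser" is
just a stand-in that reports one warning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reporter interface, handed to the parser instead of living in the event API. */
interface MessageReporter {
    void report(String severity, String message, String entityURI, int line, int column);
}

/** One possible reporter: collects formatted messages for later inspection. */
class CollectingReporter implements MessageReporter {
    final List<String> messages = new ArrayList<>();

    public void report(String severity, String message, String entityURI, int line, int column) {
        messages.add(severity + ": " + entityURI + ":" + line + ":" + column + ": " + message);
    }
}

/** Stand-in for a parser; it only knows that it has somewhere to send messages. */
class SketchParser {
    private final MessageReporter reporter;

    SketchParser(MessageReporter reporter) { this.reporter = reporter; }

    /** Pretends to parse; reports a warning when the input is empty. */
    void parse(String entityURI, String text) {
        if (text.isEmpty()) {
            reporter.report("warning", "empty document entity", entityURI, 1, 1);
        }
    }
}
```

The component receiving document events never sees these calls, so the same
event handler can be combined with a logging reporter, a GUI reporter, or
one that throws on the first fatal error.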

> PROLOG
> ------
> 
> No one sees a need for startProlog and endProlog events, but several
> people would like to see an event for the DOCTYPE, if present:
> 
>   public void doctype (String name, String publicID, String systemID);
> 
> where publicID and systemID refer to the external DTD subset, if any.
> This would help with autodetection of different document types.

I see a need for this.  In a proper interpretation of the prolog of the document
you have the following sequence:

(stuff)
start-doctype
internal-subset
end-doctype
potentially process external-subset
(stuff)
document-element

How do you delimit the internal subset from the external subset without
marking the end of the document type declaration?  Remember:

 document type declaration != document type

The document type is defined by the combination of the internal and external
subsets.
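
One way to picture the sequence above as events is the sketch below.  All of
the names (PrologHandler, startDoctype, endDoctype, startExternalSubset,
endExternalSubset) are my own invention for the sake of the example and are
not part of the SAX proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prolog events; none of these names come from the SAX proposal. */
interface PrologHandler {
    void startDoctype(String name, String publicID, String systemID);
    void endDoctype();                          // closes the declaration and the internal subset
    void startExternalSubset(String systemID);
    void endExternalSubset();
}

/** Records the order of events so the two subsets can be told apart. */
class PrologLog implements PrologHandler {
    final List<String> events = new ArrayList<>();

    public void startDoctype(String name, String publicID, String systemID) {
        events.add("start-doctype " + name);
    }

    public void endDoctype() {
        events.add("end-doctype");
    }

    public void startExternalSubset(String systemID) {
        events.add("start-external-subset " + systemID);
    }

    public void endExternalSubset() {
        events.add("end-external-subset");
    }
}
```

Anything reported between startDoctype and endDoctype would belong to the
internal subset, and anything between the external-subset pair to the
external subset; a single doctype event cannot draw that line.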

> COMMENTS
> --------
> 
> Most people agree that there is no need for SAX to report comments.

Yes, there is a need!  If comments are not reported, how might one
actually edit or process them?  An event API should encompass in
some way *all* the information in the document.

We have a finite number of constructs in XML.  Define an interface for all
of these constructs and be done with it.  If you don't do it now, it may
never get done.

Also, by saying "we don't need that information" and potentially *never* getting
access to such information, you raise the question of why such a construct is in
XML at all.  Hence, if you see the necessity for comments to be in XML, the same
necessity dictates that they must be in your API.
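
For what it is worth, the cost of covering comments is small.  A minimal
sketch, assuming a callback shaped like the proposed characters() event
(CommentEvents, comment and CommentCollector are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical addition to the event set: one callback for comments. */
interface CommentEvents {
    void comment(char ch[], int start, int length);
}

/** A component that needs comments, e.g. to preserve them when rewriting a document. */
class CommentCollector implements CommentEvents {
    final List<String> comments = new ArrayList<>();

    public void comment(char ch[], int start, int length) {
        // Copy the slice: the parser may reuse its buffer after the call returns.
        comments.add(new String(ch, start, length));
    }
}
```

A component that does not care about comments simply leaves the method
body empty.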

> PARSER
> ------
> 
> Everyone seems to like the idea of a common parser interface.

Yes! ...I thought XAPI-J was about this.

> ARTIST'S RENDITION
> ------------------
> 
> Things are still up in the air, but here is some indication of what
> SAX's central XmlApplication interface might look like in Java:
> 
> 
> /* Beginning of XmlApplication.java */
> 
> public interface XmlApplication {
> 
>   //
>   // Entities
>   //
>   public String resolveEntity (String ename, String publicID, String systemID);
>   public void startEntity (String ename, String publicID, String systemID);
>   public void endEntity (String ename);
> 
>   //
>   // Document structure
>   //
>   public void startDocument ();
>   public void endDocument ();
>   public void doctype (String name, String publicID, String systemID);
>   public void startElement (String name /* and attributes, somehow */);
>   public void endElement (String name);
>   public void characters (char ch[], int start, int length, boolean ignorable);
>   public void processingInstruction (String name, String remainder);
> 
>   //
>   // Error reporting
>   //
>   public void warning (String message, int line, int column);
>   public void fatal (String message, int line, int column);
> 
> }
> 
> /* end of XmlApplication.java */

Why think of this as an XML Application?  What we are talking about here are
components--ones which might be part of a larger system.  Hence, "application"
is a misnomer.
 
...on a similar note, having been too busy lately to keep up.

What is the difference between SAX and XAPI-J?  Where did SAX come from?
What are the requirements, design patterns, etc.?

...a URL for the above, maybe?

Obviously, I'm confused!  ;-)

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex at copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



