SAX2 RFD: LexicalHandler draft v.1.1

David Brownell db at eng.sun.com
Fri Apr 2 21:10:13 BST 1999


I'd have responded sooner, but this discussion started on the
day I left for vacation ... :-)

Note that some of this feedback comes from having implemented
versions of this functionality, and from user feedback on those
implementations (which were based on earlier discussions, some
of them on xml-dev).

That code is in the latest parser from Sun (TR1); some folk might
care to play with it a bit.  (There's also a version of the
DTDHandler extensions -- essential! :-)

Short summary:  the basic idea is still right, though I think
the DTD-related stuff should be done a bit differently.

- Dave


David Megginson wrote:
> 
> // LexicalHandler.java
> // $Id: LexicalHandler.java,v 1.1 1999/03/21 02:49:41 david Exp $
> // SAX2 handlerID: http://xml.org/sax/handlers/lexical
> 
> package org.xml.sax;
> 
> public interface LexicalHandler
> {
>     public abstract void xmlDecl (String version,
>                                   String encoding,
>                                   String standalone)
>         throws SAXException;

I'd far prefer to drop XML declarations; if they're to
be provided, I'd rather see a general text declaration
facility (version and encoding) applying to all parsed
entities.

Then standalone would look like the special case it is,
perhaps with a callback just for that boolean value, made
only when the declaration actually specifies it.  (Standalone
is effectively three-valued:  yes, no, and unspecified.)
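To make the split above concrete, here is a minimal sketch. It assumes a hypothetical TextDeclHandler interface with a textDecl() callback for every parsed entity and a separate standalone() callback. Neither name comes from the SAX2 draft, and the recorder class is illustrative only. The three-valued standalone status is kept as a Boolean, where null means "unspecified".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callbacks (not from the SAX2 draft): one text-declaration
// callback for every parsed entity, plus a separate standalone callback.
interface TextDeclHandler {
    // entityName is null for the document entity. Either version or
    // encoding may be null: XML 1.0 makes version optional in a text
    // declaration and encoding optional in the XML declaration.
    void textDecl(String entityName, String version, String encoding);

    // Made only when the XML declaration says standalone="yes" or "no";
    // if it is never called, standalone is unspecified.
    void standalone(boolean value);
}

// Records the callbacks it receives; a null standaloneValue means "unspecified".
class TextDeclRecorder implements TextDeclHandler {
    final List<String> decls = new ArrayList<>();
    private Boolean standaloneValue = null;

    @Override
    public void textDecl(String entityName, String version, String encoding) {
        String who = (entityName == null) ? "(document)" : entityName;
        decls.add(who + " version=" + version + " encoding=" + encoding);
    }

    @Override
    public void standalone(boolean value) {
        standaloneValue = value;
    }

    // Maps the three possible states to "yes", "no" or "unspecified".
    String standaloneStatus() {
        if (standaloneValue == null) return "unspecified";
        return standaloneValue ? "yes" : "no";
    }
}
```

With this shape, standalone no longer has to ride along on a callback that every external entity would also trigger.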


>     public abstract void startDTD (String doctype,
>                                    String publicID,
>                                    String systemID)
>         throws SAXException;
> 
>     public abstract void endDTD ()
>         throws SAXException;

These IMHO belong in the DTDHandler2 interface!

Also, we've found it essential to see the internal subset,
and it's most practical to report it as a single string.  If
one can't see that subset, one can't round-trip the data in
a document, and the ability to do that sort of round-trip is
critically important.  (That holds even though some folk want
more data to pass through than others; e.g. many don't care
about CDATA boundaries, comments, etc.)

In fact, what Sun did for this functionality was to
partition it into three things (in DTD callbacks):

	startDtd (String rootName)
	endDtd ()
	    ... "start" has the declared root name
	externalDtdDecl (String publicID, String systemID)
	    ... just for the unnamed [dtd] PE
	internalDtdDecl (String internalSubset)
	    ... the literal internal subset

This permits "safe" and complete recreation of the doctype
declaration.
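The following sketch shows how those pieces could be reassembled into a doctype declaration. DtdCallbacks is a local stand-in that borrows the method names listed above; it is not the actual TR1 API. The sketch assumes the parser calls startDtd, then externalDtdDecl and/or internalDtdDecl, then endDtd.

```java
// Hypothetical stand-in for the Sun-style DTD callbacks listed above;
// the method names follow the message, but this is not the TR1 API.
interface DtdCallbacks {
    void startDtd(String rootName);
    void externalDtdDecl(String publicId, String systemId);
    void internalDtdDecl(String internalSubset);
    void endDtd();
}

// Rebuilds the <!DOCTYPE ...> declaration from the reported pieces.
class DoctypeRebuilder implements DtdCallbacks {
    private String root;
    private String publicId;
    private String systemId;
    private String internalSubset;
    private String result;

    public void startDtd(String rootName) {
        root = rootName;
        publicId = systemId = internalSubset = null;
    }

    public void externalDtdDecl(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
    }

    public void internalDtdDecl(String internalSubset) {
        this.internalSubset = internalSubset; // the literal subset text
    }

    public void endDtd() {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(root);
        if (publicId != null) {
            // XML 1.0 requires a system literal after a public ID,
            // so systemId is assumed non-null here.
            sb.append(" PUBLIC ").append(quote(publicId))
              .append(' ').append(quote(systemId));
        } else if (systemId != null) {
            sb.append(" SYSTEM ").append(quote(systemId));
        }
        if (internalSubset != null) {
            sb.append(" [").append(internalSubset).append(']');
        }
        result = sb.append('>').toString();
    }

    String doctype() { return result; }

    // A literal may contain one kind of quote but not both, so use
    // whichever delimiter the value does not contain.
    private static String quote(String s) {
        return s.indexOf('"') < 0 ? "\"" + s + "\"" : "'" + s + "'";
    }
}
```

Because the internal subset arrives as one literal string, the rebuilder never has to re-serialize individual markup declarations.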


>     public abstract void startEntity (String name)
>         throws SAXException;
> 
>     public abstract void endEntity (String name)
>         throws SAXException;

Right ... except that we pass a boolean "included" flag with
the startEntity() call, to meet the XML 1.0 requirement that
processors report entities they don't include (e.g. external
entities that some nonvalidating parsers don't read).  To "pass
through" a document one needs to be able to reproduce all entity
references, and the flag is needed to distinguish entities with
no content from ones that just weren't read.
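The "included" flag could serve a pass-through writer roughly as follows. This is a minimal sketch with a hypothetical EntityContentHandler interface and a String-based characters() for brevity (SAX itself passes char arrays). Each top-level entity reference is written back as "&name;", and the expanded text is suppressed so it is not duplicated. The flag lets the writer record which references were never read, as distinct from entities that were read but happened to be empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical content callbacks mirroring startEntity(name, included);
// these names are illustrative, not a published API.
interface EntityContentHandler {
    void characters(String text);
    void startEntity(String name, boolean included);
    void endEntity(String name);
}

class PassThroughWriter implements EntityContentHandler {
    private final StringBuilder out = new StringBuilder();
    private final List<String> skipped = new ArrayList<>();
    private int depth = 0; // nesting depth of entity expansions

    public void characters(String text) {
        // Only text outside any entity expansion is written directly.
        // Escaping of '&' and '<' is omitted to keep the sketch short.
        if (depth == 0) out.append(text);
    }

    public void startEntity(String name, boolean included) {
        // Write the reference itself, once, for the outermost entity only.
        if (depth == 0) out.append('&').append(name).append(';');
        if (!included) skipped.add(name); // reported but never read
        depth++;
    }

    public void endEntity(String name) {
        depth--;
    }

    String output() { return out.toString(); }

    List<String> skipped() { return skipped; }
}
```

Without the flag, the writer would produce the same "&name;" output but could not tell its caller which entities were empty and which were simply not read.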

As I noted earlier, and as James noted more recently, this can't
apply to entities in attribute values.  It needs to be specified
and documented accordingly: these callbacks apply only to
content.  (I'll look at the proposal for attribute handling
later.)

There was also the issue of whether this is a general or a
parameter entity ... we took the position that, for sanity,
we'd only present _general_ entities this way.  Reporting PEs
inside markup declarations, for example, would be pretty useless.

PE/DTD parsing can be a separate ("SAX3"? :-) set of features.
With any luck the popular tools will move to XML-syntax schemas
rather than PEs, and that "SAX3" module will never need to
happen; it would have to be messy.


>     public abstract void comment (String text)
>         throws SAXException;
>
>     public abstract void startCDATA ()
>         throws SAXException;
> 
>     public abstract void endCDATA ()
>         throws SAXException;

Right, all this is basically needed in that form.

> }
> 
> // end of LexicalHandler.java
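As a rough illustration of why comment() and the CDATA boundary callbacks are needed in that form, here is a minimal echo sketch. CommentCdataHandler is a local stand-in covering only part of the draft interface; SAXException and the other methods are left out so it compiles on its own, and characters() takes a String rather than SAX's char array. Text inside a CDATA section is written literally, and text elsewhere is escaped.

```java
// Local stand-in for the comment/CDATA part of the draft LexicalHandler.
interface CommentCdataHandler {
    void characters(String text);
    void comment(String text);
    void startCDATA();
    void endCDATA();
}

// Re-serializes comments and CDATA sections as they appeared.
class LexicalEcho implements CommentCdataHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inCdata = false;

    public void characters(String text) {
        if (inCdata) { // CDATA content is written verbatim
            out.append(text);
            return;
        }
        for (char c : text.toCharArray()) { // escape markup characters
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
    }

    public void comment(String text) {
        out.append("<!--").append(text).append("-->");
    }

    public void startCDATA() {
        inCdata = true;
        out.append("<![CDATA[");
    }

    public void endCDATA() {
        out.append("]]>");
        inCdata = false;
    }

    String output() { return out.toString(); }
}
```

Without the CDATA boundaries, the same characters would come back escaped as "&lt;" and "&amp;". That is equivalent XML, but not the original text.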

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



