SAX: Next Round (Lexical Event Handler)

David Brownell db at Eng.Sun.COM
Fri Jan 29 20:32:14 GMT 1999

Glad to see this SAX discussion!

James Clark wrote:
> david at wrote:
> >   public interface LexicalHandler
> >   {
> >     public void startDTD (String name, String pubid, String sysid)
> >       throws SAXException;
> >     public void endDTD (String name);
> >     public void startExternalEntity (String name, String pubid, String sysid)
> >       throws SAXException;
> >     public void endExternalEntity (String name) throws SAXException;
> >     public void startCDATA () throws SAXException;
> >     public void endCDATA () throws SAXException;
> >     public void comment (String data) throws SAXException;
> >   }
> >
> > I haven't checked, but I think that this gives us everything we need
> > for DOM level one.

It doesn't quite ... there's some more DTD information needed to:

	*  ensure that PIs within the DTD (e.g. internal subset)
	   don't show up anywhere in the DOM tree (ugh);
	*  see declarations of external general entities;
	*  expose values of defaults so that the DOM can ensure
	   that defaulted attributes always have values;
	*  distinguish attributes which were defaulted from those
	   that were explicitly in the document.

See "com.sun.xml.parser.AttributeListEx" and, in the same package,
"DtdEventListener" for that additional info.  (An upcoming version
supports full recreation of the <!DOCTYPE ...> declaration, which
a number of users have needed.)
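To make the list above concrete, here is a minimal Java sketch of the kind of extra DTD callbacks it calls for. The interface DtdInfoHandler, its methods, and the DomFeeder and Attr classes are hypothetical names chosen for illustration. They are not the signatures of Sun's AttributeListEx or DtdEventListener, nor of any SAX release. The sketch keeps PIs from the DTD out of the tree, records external general entity declarations, and merges attribute defaults while flagging which attributes were defaulted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DtdInfoSketch {

    // Hypothetical DTD callbacks covering the needs listed above (not a real SAX or Sun API).
    public interface DtdInfoHandler {
        void startDTD(String name);
        void endDTD();
        void processingInstruction(String target, String data);
        void externalEntityDecl(String name, String publicId, String systemId);
        void attributeDefault(String element, String attribute, String defaultValue);
    }

    // An attribute value plus whether it was written in the document or supplied by a DTD default.
    public static final class Attr {
        public final String value;
        public final boolean specified;
        Attr(String value, boolean specified) { this.value = value; this.specified = specified; }
        @Override public String toString() { return specified ? value : value + " (defaulted)"; }
    }

    // Collects what a DOM builder would need from the DTD events.
    public static class DomFeeder implements DtdInfoHandler {
        private boolean inDtd = false;
        public final List<String> treePIs = new ArrayList<>();                 // targets of PIs that belong in the tree
        public final Map<String, String[]> externalEntities = new HashMap<>(); // name -> {publicId, systemId}
        private final Map<String, Map<String, String>> defaults = new HashMap<>();

        public void startDTD(String name) { inDtd = true; }
        public void endDTD() { inDtd = false; }

        public void processingInstruction(String target, String data) {
            // PIs inside the DTD (e.g. in the internal subset) must not reach the DOM tree.
            if (!inDtd) treePIs.add(target);
        }

        public void externalEntityDecl(String name, String publicId, String systemId) {
            externalEntities.put(name, new String[] { publicId, systemId });
        }

        public void attributeDefault(String element, String attribute, String defaultValue) {
            defaults.computeIfAbsent(element, k -> new LinkedHashMap<>()).put(attribute, defaultValue);
        }

        // Explicit attributes first, then any DTD defaults the document left out, flagged as unspecified.
        public Map<String, Attr> attributesFor(String element, Map<String, String> explicit) {
            Map<String, Attr> result = new LinkedHashMap<>();
            explicit.forEach((k, v) -> result.put(k, new Attr(v, true)));
            defaults.getOrDefault(element, Map.of())
                    .forEach((k, v) -> result.putIfAbsent(k, new Attr(v, false)));
            return result;
        }
    }

    public static void main(String[] args) {
        DomFeeder f = new DomFeeder();
        f.startDTD("doc");
        f.processingInstruction("in-subset", "");
        f.externalEntityDecl("chap1", null, "chap1.xml");
        f.attributeDefault("para", "align", "left");
        f.endDTD();
        f.processingInstruction("in-body", "");
        System.out.println(f.treePIs);                                   // prints [in-body]
        System.out.println(f.attributesFor("para", Map.of("id", "p1"))); // prints {id=p1, align=left (defaulted)}
    }
}
```

The defaulted/specified flag is what lets a DOM both give every defaulted attribute a value and still tell it apart from one the author actually wrote.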

(In addition to the above, if XML namespaces are to be layered over
a normal XML 1.0 parser, the declarations of all other entities need
to be exposed so they can be checked for conformance:  their names
must not contain colons!)
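A minimal sketch of the colon check such a namespace layer would run once every entity declaration is visible to it. The method colonViolations is a hypothetical helper, and it assumes the layer has already collected the declared entity names (general and parameter alike) from some declaration callback the parser would have to provide.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityNameCheck {

    // Returns the declared entity names that contain a colon. Namespaces in XML
    // forbids colons in entity names, but a layer over an XML 1.0 parser can only
    // enforce that if the parser reports every entity declaration to it.
    public static List<String> colonViolations(List<String> declaredEntityNames) {
        List<String> bad = new ArrayList<>();
        for (String name : declaredEntityNames) {
            if (name.indexOf(':') >= 0) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> declared = List.of("chap1", "copy", "my:ent");
        System.out.println(colonViolations(declared)); // prints [my:ent]
    }
}
```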

> I wonder whether LexicalHandler ought to extend DocumentHandler.  The
> events it reports are synchronous with the events reported by
> DocumentHandler.  It seems to me that applications are always going to
> want to implement either DocumentHandler or both DocumentHandler and
> LexicalHandler.

That's my logic too; I've done exactly that for the extended DTD event
reporting provided in Sun's parser (to support DOM and a few other
features that folk have persuaded me are important).
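The following Java sketch shows what that looks like for the quoted LexicalHandler. The interface is declared locally exactly as proposed (it is not imported from any library), and DocumentHandler here is a hypothetical, heavily trimmed stand-in for the SAX 1.0 interface of the same name. A parser handed a single DocumentHandler can test it with instanceof to decide whether to also deliver lexical events, and because the two sets of events are synchronous, the recorder sees them interleaved in document order.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.SAXException;

public class LexicalHandlerSketch {

    // Hypothetical, trimmed stand-in for SAX 1.0's DocumentHandler (most callbacks omitted).
    public interface DocumentHandler {
        void startElement(String name) throws SAXException;
        void endElement(String name) throws SAXException;
        void characters(String text) throws SAXException;
    }

    // The proposed LexicalHandler, made a subinterface of DocumentHandler as suggested.
    public interface LexicalHandler extends DocumentHandler {
        void startDTD(String name, String pubid, String sysid) throws SAXException;
        void endDTD(String name);
        void startExternalEntity(String name, String pubid, String sysid) throws SAXException;
        void endExternalEntity(String name) throws SAXException;
        void startCDATA() throws SAXException;
        void endCDATA() throws SAXException;
        void comment(String data) throws SAXException;
    }

    // Toy "parser" replaying a fixed event sequence for <doc><!--hi--><![CDATA[x]]></doc>.
    // The single registered handler gets lexical events only if it implements LexicalHandler.
    public static void parse(DocumentHandler h) throws SAXException {
        LexicalHandler lex = (h instanceof LexicalHandler) ? (LexicalHandler) h : null;
        h.startElement("doc");
        if (lex != null) lex.comment("hi");
        if (lex != null) lex.startCDATA();
        h.characters("x");
        if (lex != null) lex.endCDATA();
        h.endElement("doc");
    }

    // Records every callback in order.
    public static class Recorder implements LexicalHandler {
        public final List<String> log = new ArrayList<>();
        public void startElement(String name) { log.add("start " + name); }
        public void endElement(String name) { log.add("end " + name); }
        public void characters(String text) { log.add("chars " + text); }
        public void startDTD(String name, String pubid, String sysid) { log.add("startDTD " + name); }
        public void endDTD(String name) { log.add("endDTD " + name); }
        public void startExternalEntity(String name, String pubid, String sysid) { log.add("startEntity " + name); }
        public void endExternalEntity(String name) { log.add("endEntity " + name); }
        public void startCDATA() { log.add("startCDATA"); }
        public void endCDATA() { log.add("endCDATA"); }
        public void comment(String data) { log.add("comment " + data); }
    }

    public static void main(String[] args) throws SAXException {
        Recorder r = new Recorder();
        parse(r);
        System.out.println(r.log); // prints [start doc, comment hi, startCDATA, chars x, endCDATA, end doc]
    }
}
```

Applications that only care about content can keep implementing DocumentHandler alone; they simply never see the lexical events.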

> I would prefer different callbacks for external general and external
> parameter entities, or at least a parameter to say whether it's general
> or parameter. This information can be inferred from start/endDTD, but
> that seems unnecessarily obtuse. I think users will be surprised to
> find both general and parameter entities getting reported by
> start/endExternalEntity with no distinction.

Actually, I would rather not expose parameter entities at all,
though I'm certainly open to arguments that they are useful.

I'd rather just see a single callback for the general entities,
passing only the name (the public and system IDs were already
provided through the DTD handler, so they don't need repeating).
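A minimal sketch of that single-callback design, assuming a hypothetical Tracker class: the DTD-handler callback externalEntityDecl is the only place the public and system IDs arrive, and the entity-boundary callbacks startEntity and endEntity carry only the name, with the IDs looked up when needed. None of these names come from Sun's parser or from SAX.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class EntityBoundarySketch {

    // Public and system identifiers, reported once at declaration time by the DTD handler.
    public static final class ExternalId {
        public final String publicId;
        public final String systemId;
        public ExternalId(String publicId, String systemId) {
            this.publicId = publicId;
            this.systemId = systemId;
        }
    }

    public static class Tracker {
        private final Map<String, ExternalId> declared = new HashMap<>();
        private final Deque<String> open = new ArrayDeque<>();

        // Hypothetical DTD-handler callback: the only place the IDs are reported.
        public void externalEntityDecl(String name, String publicId, String systemId) {
            declared.put(name, new ExternalId(publicId, systemId));
        }

        // Hypothetical boundary callbacks for external general entities: name only.
        public void startEntity(String name) { open.push(name); }

        public void endEntity(String name) {
            String top = open.pop();
            if (!top.equals(name)) {
                throw new IllegalStateException("expected end of " + top + ", got " + name);
            }
        }

        // System ID of the entity currently being read, or null at the document level.
        public String currentSystemId() {
            String name = open.peek();
            if (name == null) return null;
            ExternalId id = declared.get(name);
            return id == null ? null : id.systemId;
        }
    }

    public static void main(String[] args) {
        Tracker t = new Tracker();
        t.externalEntityDecl("chap1", null, "chap1.xml");
        t.startEntity("chap1");
        System.out.println(t.currentSystemId()); // prints chap1.xml
        t.endEntity("chap1");
        System.out.println(t.currentSystemId()); // prints null
    }
}
```

One side effect of passing the name to the end callback is that a consumer can check proper nesting, as endEntity does here.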

> Doesn't the DOM allow access to internal entities as well?  This would
> be tough to support because internal entities can be referenced in
> attribute values.  What's the point of reporting just external entities?

The DOM allows, but does not require, entity information of this kind (for both internal and external entities).

- Dave

> What is the thinking behind giving endExternalEntity() a single name
> argument (as opposed to, say, no arguments, three arguments, or a single
> sysid argument)?  Similarly for endDTD()?
> What's the point of the pubid and sysid arguments to startDTD()?  This
> information will be provided via the call to startExternalEntity() for
> the reference to the external subset.
> James

xml-dev: A list for W3C XML Developers.