SAX2: LexicalHandler draft v.1.1
David Megginson
david at megginson.com
Tue Mar 23 22:23:07 GMT 1999
roddey at us.ibm.com writes:
> >public interface LexicalHandler
> >{
> >  public abstract void xmlDecl (String version,
> >                                String encoding,
> >                                String standalone)
> >    throws SAXException;
> >
> 1) The xmlDecl() needs another parameter. In addition to the encoding
> string, which is the exact text of the string in the document, some
> customers need to know what the actual encoding is (which might have been
> auto-sensed.) They need this in some cases to get the document back to the
> original encoding. So there should be an 'actualEncoding' parameter which
> is either the same as encoding (if there was an encoding string in the
> document) or the actual encoding used if not (probably in some canonical
> format, since there are only about 6 auto-sensed encodings right?)
With the new SAX2 modular setup, it will be possible for people to
create handlers that provide this level of detail if they want. I'm
still wavering about including the XML Declaration at all.
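As a rough illustration of what such an add-on handler might look like, here is a minimal sketch. The interface name EncodingAwareHandler, the four-argument xmlDecl() and the recorder class are all hypothetical and belong to no SAX release; the "parser" is simulated by a direct call.

```java
import org.xml.sax.SAXException;

class EncodingAwareDemo {
    /** Hypothetical extension handler; an assumption, not a SAX API. */
    interface EncodingAwareHandler {
        void xmlDecl(String version, String encoding,
                     String standalone, String actualEncoding)
            throws SAXException;
    }

    /** Remembers which encoding to use when writing the document back out. */
    static class EncodingRecorder implements EncodingAwareHandler {
        String declared;  // exact text from the XML declaration, or null
        String actual;    // declared value, or the auto-sensed encoding

        @Override
        public void xmlDecl(String version, String encoding,
                            String standalone, String actualEncoding) {
            declared = encoding;
            actual = actualEncoding;
        }
    }

    /** Simulates a parser that auto-sensed the encoding of an undeclared document. */
    static EncodingRecorder simulate(String declared, String sensed) {
        EncodingRecorder recorder = new EncodingRecorder();
        // Rule from the quoted proposal: actualEncoding equals the declared
        // string when there is one, otherwise the auto-sensed encoding.
        recorder.xmlDecl("1.0", declared, null, declared != null ? declared : sensed);
        return recorder;
    }

    public static void main(String[] args) {
        EncodingRecorder r = simulate(null, "UTF-16");
        System.out.println("declared=" + r.declared + " actual=" + r.actual);
    }
}
```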
> 2) I gave the comment, PI, and whitespace callbacks on the DTD
> handler different names from the ones on the document handler.
> This is somewhat safer in C++, since it means not having a single
> method override two pure virtuals from a mixin. It also allows the
> handler to be less stateful when the same object implements the
> handler for both document and DTD: it then knows that a callback is
> for one or the other without having to keep flags for that stuff.
> That's not really a biggie, but I thought it was worth it.
That's an interesting suggestion -- I don't think that the state
information is too much of a burden, but we can watch closely.
There's also an interop problem, since SAX 1.0 parsers already use
DocumentHandler.processingInstruction() to report PIs in the DTD as
well.
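To give a feel for how little state is involved, here is a minimal sketch of a handler that tells DTD PIs from document PIs with a single flag. It assumes the org.xml.sax.ext.LexicalHandler interface as it later shipped with the JDK (via DefaultHandler2), which may differ from the draft under discussion, and it assumes the parser reports DTD PIs through processingInstruction(). The class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class PiLocationDemo {
    /** Labels each PI by whether it arrived between startDTD() and endDTD(). */
    static class PiTracker extends DefaultHandler2 {
        private boolean inDtd = false;          // the only state we keep
        final List<String> seen = new ArrayList<>();

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            inDtd = true;
        }

        @Override
        public void endDTD() {
            inDtd = false;
        }

        @Override
        public void processingInstruction(String target, String data) {
            seen.add((inDtd ? "dtd:" : "doc:") + target);
        }
    }

    static List<String> collect(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PiTracker tracker = new PiTracker();
            reader.setContentHandler(tracker);
            // Standard SAX2 property ID for registering a lexical handler.
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", tracker);
            reader.parse(new InputSource(new StringReader(xml)));
            return tracker.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<?in-dtd x?>]><doc><?in-body y?></doc>";
        System.out.println(collect(xml));
    }
}
```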
> 3) I report whitespace in the DTD, so that it can also be pretty
> much exactly recreated. I only report this if I'm asked to (by an
> 'advanced callbacks' flag, which also controls comments and PIs
> being reported from the DTD.)
This goes too far for the SAX core, but I'd encourage others to
develop handlers like this (a crowded market is a healthy market).
> 4) I have events for the begin/end of the internal subset.
This information is available in the current lexical handler in a
slightly different form: the start/endDTD() handler gives the overall
boundaries, and the start/endEntity() call for "[dtd]" will delimit
the external subset (if any); everything inside the DTD but outside
the external subset (or other external parameter entities) is in the
internal subset by default.
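The inference described above can be sketched as follows. This is only an illustration: it uses the LexicalHandler callbacks as they later shipped with the JDK (which may differ from this draft), it uses DTD comments as probes, and it ignores other external parameter entities for brevity. The class name, the demo.dtd system identifier and the resolver that serves the external subset from a string are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class SubsetDemo {
    /** Classifies DTD comments as internal-subset or external-subset. */
    static class SubsetTracker extends DefaultHandler2 {
        private boolean inDtd = false;
        private boolean inExternalSubset = false;
        final List<String> comments = new ArrayList<>();

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            inDtd = true;
        }

        @Override
        public void endDTD() {
            inDtd = false;
        }

        @Override
        public void startEntity(String name) {
            if ("[dtd]".equals(name)) inExternalSubset = true;
        }

        @Override
        public void endEntity(String name) {
            if ("[dtd]".equals(name)) inExternalSubset = false;
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            if (!inDtd) return;  // comment in the document body
            // Inside the DTD but outside "[dtd]" means internal subset.
            String where = inExternalSubset ? "external" : "internal";
            comments.add(where + ":" + new String(ch, start, length).trim());
        }
    }

    static List<String> classify(String xml, String externalDtd) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SubsetTracker tracker = new SubsetTracker();
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", tracker);
            // Serve the external subset from memory instead of the file system.
            reader.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader(externalDtd)));
            reader.parse(new InputSource(new StringReader(xml)));
            return tracker.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc SYSTEM \"demo.dtd\" [<!-- a -->]><doc/>";
        String dtd = "<!-- b --><!ELEMENT doc EMPTY>";
        System.out.println(classify(xml, dtd));
    }
}
```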
> 5) I have a callback for notation decl, attlist decls, and attdefs,
> which are important.
Notations are already in SAX 1.0 (as required by the XML REC). The
remainder will appear in DTDDeclHandler as soon as I have a chance to
draft a proposal for it.
> 6) I have a flag on each entity, element, etc... decl callback
> called 'isIgnored'. This lets the caller know that this one was
> ignored because it was a subsequent instance of a previously
> declared decl. So they don't need to keep it if they just care
> about actual content, but they do if they want to recreate the
> original document (which is extremely important to some folks.)
Yes, this is still an open question for DTDDeclHandler.
> 7) I haven't done this yet, but some customers are insisting that
> any event callback that reports a quoted string indicate whether
> single or double quotes were used (again for recreation of the
> original document.) This seems a bit over the top to me, since they
> are equivalent, but I guess the customer is always right even when
> he's wrong.
That's precisely why SAX2 (I almost typed "ModSAX" -- sniff, sniff) is
designed for easy extensibility and feature discovery. Business
requirements will demand different types of support for different
situations, and SAX2 provides a clean way to do that. I don't imagine
that we'd put this kind of thing in the core, though.
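For illustration, feature discovery might look roughly like this from the application side. The sketch assumes the XMLReader.getFeature() call and its two exceptions as they later shipped with the JDK, which may not match the draft exactly. The quote-style feature URI is hypothetical, invented for this example; no parser is known to define it.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    /** Asks the parser about a feature without assuming it exists. */
    static String probe(XMLReader reader, String featureUri) {
        try {
            return reader.getFeature(featureUri) ? "on" : "off";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";   // the parser has never heard of it
        } catch (SAXNotSupportedException e) {
            return "not supported";    // known, but not queryable right now
        }
    }

    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        // A standard SAX2 feature ID.
        System.out.println(probe(reader, "http://xml.org/sax/features/namespaces"));
        // Hypothetical vendor extension, used here only as an example.
        System.out.println(
            probe(reader, "http://example.org/hypothetical/report-quote-style"));
    }
}
```

An application that needs quote-style reporting could probe for it at startup and fall back, or refuse to run, when the parser does not recognize the feature.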
> That's all I can think of right now. It would really be nice if we
> could map all of the information that we go through the trouble
> (and overhead) of parsing to public APIs. Otherwise, customers end
> up using our internal event API in order to get the information
> that they require. This locks down our internal API more than we'd
> like, but there is little we can do about it if they *have* to have
> this extra info to do what they do.
See my comments above on extensibility.
All the best,
David
--
David Megginson david at megginson.com
http://www.megginson.com/