SAX2: LexicalHandler draft v.1.1

roddey at us.ibm.com
Mon Mar 22 20:57:01 GMT 1999




>public interface LexicalHandler
>{
>    public abstract void xmlDecl (String version,
>                     String encoding,
>                     String standalone)
>    throws SAXException;
>

I've already dealt with some of this stuff in the internal event APIs of
the new IBM parser, so I'd like to throw in a few points here. Hopefully
none of this is off topic, since I've been too busy to follow this thread
as closely as I should have. If some of it really belongs in another
thread, then assume I wrote it there :-)


1) xmlDecl() needs another parameter. In addition to the encoding string,
which is the exact text of the string in the document, some customers need
to know what the actual encoding is (which might have been auto-sensed).
They need this in some cases to get the document back to the original
encoding. So there should be an 'actualEncoding' parameter, which is either
the same as encoding (if there was an encoding string in the document) or
the encoding actually used if not (probably in some canonical format, since
there are only about 6 auto-sensed encodings, right?)

2) I gave the comment, PI, and whitespace callbacks on the DTD handler
different names from the ones on the document handler. This is somewhat
safer in C++, since it means a single method never has to override two pure
virtuals from a mixin. It also allows the handler to be less stateful when
the same object implements the handler for both document and DTD (since it
then knows that a callback is for one or the other without having to keep
flags for that stuff, which is not really a biggie, but I thought it was
worth it).

3) I report whitespace in the DTD, so that it can also be pretty much
exactly recreated. I only report this if I'm asked to (by an 'advanced
callbacks' flag, which also controls comments and PIs being reported from
the DTD.)

4) I have events for the begin/end of the internal subset.

5) I have callbacks for notation decls, attlist decls, and attdefs, which
are important.

6) I have a flag on each entity, element, etc. decl callback called
'isIgnored'. This lets the caller know that this one was ignored because it
was a subsequent instance of a previously declared decl. So they don't need
to keep it if they just care about actual content, but they do if they want
to recreate the original document (which is extremely important to some
folks).

7) I haven't done this yet, but some customers are insisting that any event
callback that reports a quoted string indicate whether single or double
quotes were used (again for recreation of the original document.) This
seems a bit over the top to me, since they are equivalent, but I guess the
customer is always right even when he's wrong.
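
To make points 1, 2, 4 and 6 a bit more concrete, here is a minimal Java
sketch. All of the names in it (ExtendedLexicalHandler, actualEncoding,
docComment, dtdComment, RoundTripRecorder, and so on) are hypothetical: they
are not part of the SAX2 draft or of the IBM parser's internal API, just one
way such callbacks might look. It only covers entity decls and comments, and
it does not attempt the quote-style reporting from point 7.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: these names are illustrative only and are not
// taken from the SAX2 draft or from any real parser.
interface ExtendedLexicalHandler {
    // Point 1: 'encoding' is the literal text from the document (null if the
    // declaration had none); 'actualEncoding' is what the parser really used.
    void xmlDecl(String version, String encoding, String standalone,
                 String actualEncoding);

    // Point 2: separate names, so one object can serve document and DTD
    // without tracking which of the two it is currently inside.
    void docComment(String text);
    void dtdComment(String text);

    // Point 4: bracket the internal subset.
    void startInternalSubset();
    void endInternalSubset();

    // Point 6: isIgnored is true for a redeclaration of an existing entity.
    void entityDecl(String name, String value, boolean isIgnored);
}

// Records enough text to write the prolog back out, and separately tracks
// the entity names whose declarations are actually in effect.
final class RoundTripRecorder implements ExtendedLexicalHandler {
    final StringBuilder doc = new StringBuilder();
    final StringBuilder dtd = new StringBuilder();
    final List<String> effectiveEntities = new ArrayList<>();
    String actualEncoding;

    public void xmlDecl(String version, String encoding, String standalone,
                        String actualEncoding) {
        // Remember the real encoding even when the document did not name one.
        this.actualEncoding = actualEncoding;
        doc.append("<?xml version=\"").append(version).append('"');
        if (encoding != null) {
            doc.append(" encoding=\"").append(encoding).append('"');
        }
        if (standalone != null) {
            doc.append(" standalone=\"").append(standalone).append('"');
        }
        doc.append("?>\n");
    }

    public void docComment(String text) {
        doc.append("<!--").append(text).append("-->\n");
    }

    public void dtdComment(String text) {
        // No "am I in the DTD?" flag needed: the method name says so.
        dtd.append("<!--").append(text).append("-->\n");
    }

    public void startInternalSubset() {
        dtd.append("[\n");
    }

    public void endInternalSubset() {
        dtd.append("]\n");
    }

    public void entityDecl(String name, String value, boolean isIgnored) {
        // Always written with double quotes; the original quote style is
        // lost, and values containing '"' are not escaped in this sketch.
        dtd.append("<!ENTITY ").append(name)
           .append(" \"").append(value).append("\">\n");
        // Only the first (non-ignored) declaration counts as real content.
        if (!isIgnored) {
            effectiveEntities.add(name);
        }
    }
}
```

A caller that only cares about content would look at effectiveEntities,
while one that wants to recreate the document would write out the doc and
dtd buffers, re-encoded using actualEncoding. Declaring the same entity
twice, with isIgnored set on the second call, leaves both declarations in
the dtd buffer but only one name in effectiveEntities.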



That's all I can think of right now. It would really be nice if we could
map all of the information that we go through the trouble (and overhead) of
parsing to public APIs. Otherwise, customers end up using our internal
event API in order to get the information that they require. This locks
down our internal API more than we'd like, but there is little we can do
about it if they *have* to have this extra info to do what they do.



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



