Request for Discussion: SAX 1.0 in C++

roddey at us.ibm.com
Tue Dec 7 20:42:31 GMT 1999




Here's my take on it... Note that what I'm saying here reflects the
necessities of supporting really bad C++ implementations, not my personal
feelings. If it were up to me, I'd say use every modern service of C++, and
those who don't have a compliant C++ implementation would have a good reason
to get one. But, by an unfortunate decision, I was not made the ruler of the
world... Go figure!

1) I don't mind that we just start off with SAX2, I guess. It makes sense
this late in the game, perhaps, to just concentrate on SAX2.

2) We would prefer that all data come out of the SAX interfaces as raw
wchar_t strings. This is the most flexible mechanism and does not lock
people into using any particular implementation of a string object. It also
has the highest potential performance for those folks who never need to put
it into anything more formal than a raw array.

3) We agree with the basic desire to avoid object ownership issues, but
wouldn't worry about them if they are well documented. Object ownership is
just a fundamental issue in C++, and if you don't understand it you are
probably going to blow your own foot off no matter what.

4) We would be concerned about some of the SAX2 stuff wrt setting features
(I think it's features) via an abstracted object interface, because it's a
little bit sticky. It can be done, but the question still arises: where does
the desirability of being the same as the Java interface end and the
desirability of having a very natural interface for your own language
begin? I.e., just don't make it so Java-esque that it requires a lot of
trickery to make it work in C++. Don't require some common base class.

5) If you wanted to templatize the interface over the character type, we
wouldn't mind particularly. But, considering that any implementation of the
interface would *always* use the same instantiation, why bother? Just
typedef the character type and let each implementation drive it. It's not
likely that a particular build of a particular implementation would need to
change this on the fly, right?

6) The issue of handler ownership is something we punted on. As far as we
are concerned, handlers installed on the SAXParser belong to the caller
because in most cases one object implements a number of handlers.

7) The names of methods of the handlers need to be unambiguous to avoid
problems. So DocType handlers should use DocTypeCharacters() or
DTDCharacters() or whatever, and Document handlers should use
DocCharacters() or some such thing. It's just not worth the paranoia of how
implementations would deal with multiple mixed in interfaces having the
same named methods. If the processing should be common, the class
implementing both handlers can delegate to a private method.

8) I disagree with the contention that unsigned shouldn't be used in
interfaces. If the thing being modeled is unsigned, use unsigned because
you are modelling the type desired. I would personally typedef (by logical
usage) all of the fundamental types used by the interfaces and let the
implementation drive them.

9) APIs such as getType() or getValue() should return a "const wchar_t*" so
that the caller uses the returned value directly. The overhead of copying
the return (and having to clean it up) would probably be unacceptable.
(Actually, the wchar_t would be some defined type that is driven by the
implementation.) Yes, this involves ownership issues, but as I said, this is
fundamental to C++, so people should probably just 'get over it' :-)

10) I believe that it's better to have the interfaces remain pure virtual
and provide a HandlerBase. This lets people who want to be sure that
they've overridden everything be told so by the compiler, and it allows
selective overriding by using HandlerBase where desired.

11) The class names (since we can't afford to use C++ namespaces) should be
expanded to include a SAX prefix to avoid clashes. So SAXParser and
SAXLocator and SAXAttributeList and so on.

12) We added reset() methods to all the handlers. The reason being that, on
the start of a new parse operation, each handler might need to reset its
internal state. We assume that the handlers might be completely unknown to
the code that kicks off the parse event and we didn't want them to have to
assume that the order of events wouldn't change over time (i.e. we didn't
want them to just pick what they think will be the first event and reset
from that.)
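
To make points 2, 5, 8 and 9 a bit more concrete, here is a minimal sketch,
not a proposal. The names SAXChar and SAXIndex and the helper valueLength()
are made up for illustration, the SAXAttributeList shown is a cut-down
stand-in rather than the real SAX interface, and it uses the standard
library internally only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical implementation-driven typedefs (points 5 and 8). Each
// implementation picks the underlying types once, at build time.
typedef wchar_t      SAXChar;   // character type for all string data
typedef unsigned int SAXIndex;  // counts and indexes are never negative

// Hypothetical attribute list. The accessors hand back pointers into storage
// owned by the list (point 9): nothing is copied, and the caller must not
// delete the result or use it after the list is destroyed or modified.
class SAXAttributeList
{
public:
    void add(const SAXChar* name, const SAXChar* value)
    {
        fNames.push_back(name);
        fValues.push_back(value);
    }

    SAXIndex getLength() const
    {
        return static_cast<SAXIndex>(fNames.size());
    }

    const SAXChar* getName(SAXIndex index) const
    {
        assert(index < getLength());
        return fNames[index].c_str();
    }

    const SAXChar* getValue(SAXIndex index) const
    {
        assert(index < getLength());
        return fValues[index].c_str();
    }

private:
    std::vector<std::basic_string<SAXChar> > fNames;
    std::vector<std::basic_string<SAXChar> > fValues;
};

// A caller that only needs the raw array (point 2) never builds a string
// object of its own. This helper is written for the wchar_t choice above.
inline std::size_t valueLength(const SAXAttributeList& attrs, SAXIndex index)
{
    return std::wcslen(attrs.getValue(index));
}
```

Changing the two typedefs is the only edit an implementation would need in
order to drive the types differently, although valueLength() as written
assumes that SAXChar stays wchar_t.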
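
Points 6, 7, 10 and 12 might fit together roughly as follows. Again, this is
only a sketch under assumptions: SAXDocTypeHandler, SAXHandlerBase, the
method names, CharCounter and the canned parse() are all hypothetical, and
the reset methods are given distinct names so that they follow the naming
rule in point 7.

```cpp
#include <cstddef>

typedef wchar_t SAXChar;  // hypothetical implementation-driven character type

// Pure virtual interfaces (point 10). Method names are distinct across the
// interfaces (point 7), and each one has a reset hook (point 12).
class SAXDocumentHandler
{
public:
    virtual ~SAXDocumentHandler() {}
    virtual void resetDocument() = 0;
    virtual void docCharacters(const SAXChar* chars, std::size_t length) = 0;
};

class SAXDocTypeHandler
{
public:
    virtual ~SAXDocTypeHandler() {}
    virtual void resetDocType() = 0;
    virtual void docTypeCharacters(const SAXChar* chars, std::size_t length) = 0;
};

// Convenience base with do-nothing defaults, for selective overriding.
class SAXHandlerBase : public SAXDocumentHandler, public SAXDocTypeHandler
{
public:
    virtual void resetDocument() {}
    virtual void docCharacters(const SAXChar*, std::size_t) {}
    virtual void resetDocType() {}
    virtual void docTypeCharacters(const SAXChar*, std::size_t) {}
};

// One object implementing both handlers. The shared processing is delegated
// to a private method instead of relying on a common method name.
class CharCounter : public SAXHandlerBase
{
public:
    CharCounter() : fTotal(0) {}
    virtual void resetDocument() { fTotal = 0; }
    virtual void docCharacters(const SAXChar*, std::size_t length) { count(length); }
    virtual void docTypeCharacters(const SAXChar*, std::size_t length) { count(length); }
    std::size_t total() const { return fTotal; }

private:
    void count(std::size_t length) { fTotal += length; }
    std::size_t fTotal;
};

// Stand-in for a parser. It stores non-owning pointers, so the handlers
// belong to the caller (point 6) and must outlive the parse.
class SAXParser
{
public:
    SAXParser() : fDocHandler(0), fDocTypeHandler(0) {}
    void setDocumentHandler(SAXDocumentHandler* handler) { fDocHandler = handler; }
    void setDocTypeHandler(SAXDocTypeHandler* handler) { fDocTypeHandler = handler; }

    // Fake parse: resets every installed handler, then reports canned events.
    void parse()
    {
        if (fDocTypeHandler) fDocTypeHandler->resetDocType();
        if (fDocHandler) fDocHandler->resetDocument();
        if (fDocTypeHandler) fDocTypeHandler->docTypeCharacters(L"dtd", 3);
        if (fDocHandler) fDocHandler->docCharacters(L"hello", 5);
    }

private:
    SAXDocumentHandler* fDocHandler;
    SAXDocTypeHandler*  fDocTypeHandler;
};
```

A caller would create one CharCounter, install it through both setters and
call parse() as often as needed. Because the parser resets the handlers
before the first event, the counter starts from zero on every parse without
having to guess which event comes first. A class that derives directly from
the two pure virtual interfaces instead of SAXHandlerBase would get a
compile error for any method it forgot to override.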


That's all I can think of at the moment. I haven't had enough time to look
at SAX2 closely, so I don't know what might be problematic for us in
the C++ world. But I still think that it's good enough to just pick up at
SAX2, as long as SAX2 can be reconciled with the needs of the C++ world.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey at us.ibm.com



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)

