SAX/C++: Changes for C++
Michael Fuller
msf at mds.rmit.edu.au
Mon Dec 6 04:04:47 GMT 1999
On Thu, Dec 02, 1999 at 04:38:11PM -0500, David Megginson wrote:
> Here are some of the differences between the SAX/Java interfaces and the
> SAX/C++ interfaces:
>
> - lots of const
> - C++ const char * for Java String throughout (and, thus, UTF-8
> instead of UTF-16)
> - InputSource doesn't have an equivalent of Java Reader (no getReader
> method)
I don't mind if the character container is unsigned short or wchar_t
(it doesn't really matter if wchar_t is 32 bits on some platforms as
it's easy enough to convert to/from where required), but put me down
as another vote for UTF-16 rather than UTF-8.
Given that the point of Unicode is to support I18N, why choose as a default
a format that typically has a 50% size overhead for many non-European languages
(most CJK characters take three bytes in UTF-8 but only two in UTF-16)?
Many parsers and applications happily work internally using UTF-16;
why not standardize on it as the default SAX character encoding?
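To make the size argument concrete, here is a small C++ sketch that counts the bytes a sequence of Unicode code points needs in each encoding form. It assumes C++11 (char32_t, std::u32string), and the function names are illustrative only, not part of any SAX API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);        // beyond the Unicode code space
    if (cp < 0x80) return 1;       // ASCII
    if (cp < 0x800) return 2;      // e.g. Latin-1 supplement, Greek, Cyrillic, Hebrew, Arabic
    if (cp < 0x10000) return 3;    // rest of the BMP, including most CJK ideographs
    return 4;                      // supplementary planes
}

// Bytes needed to encode one code point in UTF-16: one or two 16-bit units.
std::size_t utf16_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a code-point string in each form.
std::size_t utf8_size(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf8_bytes(cp);
    return n;
}

std::size_t utf16_size(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf16_bytes(cp);
    return n;
}
```

For the Japanese word U+65E5 U+672C U+8A9E, utf8_size gives 9 bytes and utf16_size gives 6, which is the 50% overhead mentioned above; for plain ASCII text the relationship reverses (1 byte versus 2 per character).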
Suggestion:
Do what the Java SAX interface did: optionally provide *both*
ByteStream and CharacterStream components in an InputSource object.
Applications can treat the ByteStream as a stream of bytes whose encoding
is either auto-detected or explicitly indicated by the Encoding.
A CharacterStream, however, would always be a sequence of UTF-16 characters.
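A minimal C++ sketch of such an InputSource might look like the following. Everything here is hypothetical: the CharacterStream interface, the accessor names (modelled on Java's setByteStream/setCharacterStream/setEncoding), and chooseSource, which follows the Java SAX rule that a character stream, when present, takes precedence over a byte stream. System and public IDs are omitted to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical source of UTF-16 code units. Its encoding is fixed,
// so the InputSource's Encoding property never applies to it.
class CharacterStream {
public:
    virtual ~CharacterStream() = default;
    // Copies up to `max` code units into `buf`; returns 0 at end of input.
    virtual std::size_t read(char16_t* buf, std::size_t max) = 0;
};

// In-memory CharacterStream, handy for tests.
class StringCharacterStream : public CharacterStream {
public:
    explicit StringCharacterStream(std::u16string s) : data_(std::move(s)) {}
    std::size_t read(char16_t* buf, std::size_t max) override {
        std::size_t n = std::min(max, data_.size() - pos_);
        data_.copy(buf, n, pos_);
        pos_ += n;
        return n;
    }
private:
    std::u16string data_;
    std::size_t pos_ = 0;
};

class InputSource {
public:
    void setByteStream(std::istream* in) { byteStream_ = in; }
    std::istream* getByteStream() const { return byteStream_; }
    // Encoding of the byte stream; empty means "auto-detect".
    void setEncoding(std::string enc) { encoding_ = std::move(enc); }
    const std::string& getEncoding() const { return encoding_; }
    void setCharacterStream(CharacterStream* cs) { charStream_ = cs; }
    CharacterStream* getCharacterStream() const { return charStream_; }
private:
    std::istream* byteStream_ = nullptr;     // not owned
    CharacterStream* charStream_ = nullptr;  // not owned
    std::string encoding_;
};

// Which input a parser would read: characters first, then bytes.
enum class SourceKind { Characters, Bytes, None };

SourceKind chooseSource(const InputSource& src) {
    if (src.getCharacterStream()) return SourceKind::Characters;
    if (src.getByteStream()) return SourceKind::Bytes;
    return SourceKind::None;
}
```

Keeping the two streams as separate properties lets the byte path carry an encoding hint while the character path never needs one.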
> - SAXException does not allow an embedded exception, because there's
> no need to tunnel exceptions in C++ (you can always throw any
> exception)
Unless you use throw() lists in function declarations, as the Java spec did.
In that case, you need to be able to embed exceptions...
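Here is a sketch of what embedding could look like if handler methods promise to throw only SAXException. It assumes C++11's std::exception_ptr, std::current_exception and std::rethrow_exception, which postdate this discussion; the getException name mirrors the Java SAXException accessor, and handleEndElement and embeddedMessage are hypothetical helpers.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical SAXException that can carry the exception that caused it.
class SAXException : public std::exception {
public:
    explicit SAXException(std::string message,
                          std::exception_ptr embedded = nullptr)
        : message_(std::move(message)), embedded_(std::move(embedded)) {}

    const char* what() const noexcept override { return message_.c_str(); }

    // The wrapped exception, or a null exception_ptr if there is none.
    std::exception_ptr getException() const { return embedded_; }

private:
    std::string message_;
    std::exception_ptr embedded_;
};

// Handler code: whatever fails inside, only a SAXException escapes,
// with the original exception attached.
void handleEndElement(const char* name) {
    try {
        if (std::string(name) == "bad")
            throw std::runtime_error("database write failed");
    } catch (const std::exception& e) {
        throw SAXException(std::string("in endElement: ") + e.what(),
                           std::current_exception());
    }
}

// Caller side: recover the embedded exception's message, if any.
std::string embeddedMessage(const SAXException& e) {
    if (!e.getException()) return "";
    try {
        std::rethrow_exception(e.getException());
    } catch (const std::exception& inner) {
        return inner.what();
    } catch (...) {
        return "(non-standard exception)";
    }
}
```

The caller can rethrow the original exception or inspect it, so no information is lost by the narrower exception specification.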
> - DocumentHandler::characters and DocumentHandler::ignorableWhitespace
> don't need the 'start' argument, since they can be passed a pointer
> to the start position in an existing array (that's not possible in
> Java)
Yup.
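The point about the 'start' argument can be sketched in C++ as follows. The class names, the std::size_t length type and the reportText helper are assumptions for illustration; only the idea (a const char pointer plus a length) comes from the quoted list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C++ DocumentHandler fragment: no 'start' argument,
// because the parser can pass a pointer into the middle of its buffer.
class DocumentHandler {
public:
    virtual ~DocumentHandler() = default;
    virtual void characters(const char* ch, std::size_t length) = 0;
};

// Collects character data. The pointer is only valid during the call,
// so the handler copies what it needs.
class TextCollector : public DocumentHandler {
public:
    void characters(const char* ch, std::size_t length) override {
        text.append(ch, length);
    }
    std::string text;
};

// Simulates a parser that found text at buf[start, start + length)
// and reports it without copying; in Java this would have been
// characters(buf, start, length).
void reportText(DocumentHandler& h, const char* buf,
                std::size_t start, std::size_t length) {
    h.characters(buf + start, length);
}
```

For example, with the buffer "<p>Hello, SAX</p>", reportText(c, buffer, 3, 10) hands the handler a pointer to "Hello, SAX" directly.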
> - HandlerBase omitted, since the classes can contain their own default
> implementations
I think this has been covered by others; if we define SAX/C++ using
abstract classes, then we need HandlerBase and the Impl classes back
for convenience.
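With abstract classes, every method is pure virtual, so an application would otherwise have to override all of them. A HandlerBase with empty defaults restores the Java convenience. This is a minimal sketch; the method set is simplified (for instance, startElement here takes no attribute list), and ElementCounter is a hypothetical application class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical abstract DocumentHandler: every method is pure virtual.
class DocumentHandler {
public:
    virtual ~DocumentHandler() = default;
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void startElement(const char* name) = 0;
    virtual void endElement(const char* name) = 0;
    virtual void characters(const char* ch, std::size_t length) = 0;
};

// Convenience base with do-nothing defaults, in the spirit of Java's HandlerBase.
class HandlerBase : public DocumentHandler {
public:
    void startDocument() override {}
    void endDocument() override {}
    void startElement(const char*) override {}
    void endElement(const char*) override {}
    void characters(const char*, std::size_t) override {}
};

// An application overrides only the events it cares about.
class ElementCounter : public HandlerBase {
public:
    void startElement(const char*) override { ++count; }
    int count = 0;
};
```

The abstract class keeps the interface pure, while HandlerBase spares each application from writing empty bodies.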
> - I haven't figured out what to do with Parser::setLocale yet
Michael
____________________________________________
http://www.mds.rmit.edu.au/~msf/
Multimedia Databases Group, RMIT, Australia.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1