SAX/C++: Changes for C++

Michael Fuller msf at mds.rmit.edu.au
Mon Dec 6 04:04:47 GMT 1999


On Thu, Dec 02, 1999 at 04:38:11PM -0500, David Megginson wrote:
> Here are some of the differences between the SAX/Java interfaces and the 
> SAX/C++ interfaces:
> 
> - lots of const
> - C++ const char * for Java String throughout (and, thus, UTF-8
>   instead of UTF-16)
> - InputSource doesn't have an equivalent of Java Reader (no getReader
>   method)

I don't mind if the character container is unsigned short or wchar_t
(it doesn't really matter if wchar_t is 32 bits on some platforms as
it's easy enough to convert to/from where required), but put me down
as another vote for UTF-16 rather than UTF-8.

Given that the point of Unicode is to support I18N, why choose as a default
a format that typically has a 50% size overhead for non-European languages?
Many parsers and applications happily work internally using UTF-16;
why not standardize that as the default SAX character encoding?
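
To put a number on that overhead, here is a rough sketch that counts
the bytes each encoding needs for a run of code points. The function
names are made up for illustration and are not part of any SAX
proposal.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8Bytes(char32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // Latin supplements, Greek, Cyrillic, ...
    if (cp < 0x10000) return 3;   // rest of the BMP, including CJK
    return 4;                     // supplementary planes
}

// Bytes needed to encode one Unicode code point in UTF-16.
std::size_t utf16Bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one code unit, or a surrogate pair
}

// Total encoded size of a string of code points in UTF-8.
std::size_t utf8Size(const std::u32string& text) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < text.size(); ++i) total += utf8Bytes(text[i]);
    return total;
}

// Total encoded size of a string of code points in UTF-16.
std::size_t utf16Size(const std::u32string& text) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < text.size(); ++i) total += utf16Bytes(text[i]);
    return total;
}
```

Three CJK characters come to 9 bytes in UTF-8 against 6 in UTF-16,
which is the 50% figure; plain ASCII goes the other way (3 bytes
against 6).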

Suggestion:
    Do what the Java SAX interface did: optionally provide *both*
    ByteStream and CharacterStream components in an InputSource object.

Applications can treat the ByteStream as a stream of bytes whose encoding
is either auto-detected or explicitly indicated by the Encoding.
A CharacterStream, however, would always be a sequence of UTF-16 characters.
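
A minimal sketch of what I have in mind, assuming a 16-bit character
type. SAXChar, CharacterStream and MemoryCharacterStream are
hypothetical names of mine, not anything from David's draft; only the
get/set method names are borrowed from the Java InputSource.

```cpp
#include <cstddef>
#include <istream>
#include <string>

// Hypothetical UTF-16 code unit type; wchar_t would do as well.
typedef unsigned short SAXChar;

// Hypothetical counterpart of Java's Reader: always yields UTF-16 units.
class CharacterStream {
public:
    virtual ~CharacterStream() {}
    // Copies up to max units into buf and returns how many were copied
    // (0 at end of input).
    virtual std::size_t read(SAXChar* buf, std::size_t max) = 0;
};

// Simple in-memory CharacterStream, here only to make the sketch usable.
class MemoryCharacterStream : public CharacterStream {
public:
    MemoryCharacterStream(const SAXChar* data, std::size_t length)
        : data_(data), remaining_(length) {}

    virtual std::size_t read(SAXChar* buf, std::size_t max) {
        std::size_t n = remaining_ < max ? remaining_ : max;
        for (std::size_t i = 0; i < n; ++i) buf[i] = data_[i];
        data_ += n;
        remaining_ -= n;
        return n;
    }

private:
    const SAXChar* data_;
    std::size_t remaining_;
};

// InputSource carrying both kinds of stream; neither pointer is owned.
class InputSource {
public:
    InputSource() : byteStream_(0), characterStream_(0) {}

    void setByteStream(std::istream* s) { byteStream_ = s; }
    void setCharacterStream(CharacterStream* s) { characterStream_ = s; }
    void setEncoding(const std::string& e) { encoding_ = e; }

    std::istream* getByteStream() const { return byteStream_; }
    CharacterStream* getCharacterStream() const { return characterStream_; }
    const std::string& getEncoding() const { return encoding_; }

private:
    std::istream* byteStream_;          // raw bytes, encoding per encoding_
    CharacterStream* characterStream_;  // already-decoded UTF-16
    std::string encoding_;              // empty means "auto-detect"
};
```

As in SAX/Java, a parser would use the character stream when one is
set and fall back to the byte stream (plus Encoding) otherwise.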
    
> - SAXException does not allow an embedded exception, because there's
>   no need to tunnel exceptions in C++ (you can always throw any
>   exception)

Unless you use throw() lists in function declarations, as the Java spec
did. In that case, you need to be able to embed exceptions...
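
A sketch of the embedding, assuming handlers may only let SAXException
escape. It is written with std::throw_with_nested from the later C++11
library rather than with an actual throw() list, since exception
specifications of that form were eventually dropped from the language;
callHandler and embeddedMessage are hypothetical helper names.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

class SAXException : public std::runtime_error {
public:
    explicit SAXException(const std::string& msg) : std::runtime_error(msg) {}
};

// Runs a callback and guarantees that only SAXException can escape.
// Anything else is wrapped, with the original kept as the nested exception.
template <typename Callback>
void callHandler(Callback cb) {
    try {
        cb();
    } catch (const SAXException&) {
        throw;  // already the permitted type
    } catch (...) {
        std::throw_with_nested(SAXException("handler failed"));
    }
}

// Returns the message of the embedded exception, or "" if there is none.
std::string embeddedMessage(const SAXException& e) {
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        return inner.what();
    } catch (...) {
        return "unknown";
    }
    return "";
}
```

A caller catches SAXException as usual and can still get at whatever
the handler really threw.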

> - DocumentHandler::characters and DocumentHandler::ignorableWhitespace 
>   don't need the 'start' argument, since they can be passed a pointer
>   to the start position in an existing array (that's not possible in
>   Java)

Yup.
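
For anyone following along, the point can be sketched like this. The
const char * signature follows David's list; TextCollector and
reportCharacters are hypothetical names used only for the example.

```cpp
#include <cstddef>
#include <string>

class DocumentHandler {
public:
    virtual ~DocumentHandler() {}
    // No 'start' argument: ch already points at the first character.
    virtual void characters(const char* ch, std::size_t length) = 0;
};

// Example handler that accumulates the character data it is given.
class TextCollector : public DocumentHandler {
public:
    virtual void characters(const char* ch, std::size_t length) {
        text.append(ch, length);
    }
    std::string text;
};

// Parser side: where Java would pass (buffer, start, length), C++ can
// pass a pointer into the middle of the parser's existing buffer.
void reportCharacters(DocumentHandler& handler, const char* buffer,
                      std::size_t start, std::size_t length) {
    handler.characters(buffer + start, length);
}
```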

> - HandlerBase omitted, since the classes can contain their own default 
>   implementations

I think this has been covered by others; if we define SAX/C++ using
abstract classes, then we need HandlerBase and the Impl classes back
for convenience.
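
Roughly what that would look like, using only a subset of the
DocumentHandler methods and leaving out the attribute list for
brevity. ElementCounter is a hypothetical application class.

```cpp
#include <cstddef>

// Pure interface: every method must be implemented by someone.
class DocumentHandler {
public:
    virtual ~DocumentHandler() {}
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void startElement(const char* name) = 0;
    virtual void endElement(const char* name) = 0;
    virtual void characters(const char* ch, std::size_t length) = 0;
};

// Convenience base: do-nothing defaults for every callback.
class HandlerBase : public DocumentHandler {
public:
    virtual void startDocument() {}
    virtual void endDocument() {}
    virtual void startElement(const char*) {}
    virtual void endElement(const char*) {}
    virtual void characters(const char*, std::size_t) {}
};

// An application overrides only the callbacks it cares about.
class ElementCounter : public HandlerBase {
public:
    ElementCounter() : count(0) {}
    virtual void startElement(const char*) { ++count; }
    int count;
};
```

Without HandlerBase, ElementCounter would have to spell out all five
methods itself just to count elements.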

> - I haven't figured out what to do with Parser::setLocale yet

Michael
____________________________________________
http://www.mds.rmit.edu.au/~msf/
Multimedia Databases Group, RMIT, Australia.
