SAX and Unicode question

Matthew J. Evans mje at shakha.com
Tue Jan 6 00:43:17 GMT 1998


(Please forgive me if this post is ill-received.)

How is SAX going to handle Unicode, especially when passing 16-bit
characters (UTF-16) to callback functions? Passing void* or char* pointers
in the callbacks leaves the application and/or parser guessing what was
sent. Sending a byte order mark with every string seems rather impractical,
especially since UTF-16 data can contain null bytes, which makes most
null-terminated string handling useless anyway.
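
To make the null-byte concern concrete, here is a minimal C sketch. It is
not part of SAX: the helper names ascii_to_utf16le, utf16le_len, and
naive_byte_len are made up for illustration, and it assumes little-endian
UTF-16 with ASCII-only input and no byte order mark. It shows a
byte-oriented strlen() stopping after the first character of a UTF-16
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode a 7-bit ASCII string as UTF-16LE bytes
 * (no byte order mark), followed by a two-byte zero terminator.
 * Returns the number of bytes written, excluding the terminator.
 * Input that does not fit in out_size is truncated. */
size_t ascii_to_utf16le(const char *ascii, unsigned char *out, size_t out_size)
{
    assert(ascii != NULL && out != NULL && out_size >= 2);

    size_t n = 0;
    for (; *ascii != '\0'; ++ascii) {
        if (n + 4 > out_size) {
            break;                        /* keep room for the terminator */
        }
        out[n++] = (unsigned char)*ascii; /* low byte first (little-endian) */
        out[n++] = 0x00;                  /* high byte is zero for ASCII */
    }
    out[n] = 0x00;                        /* 16-bit zero terminator */
    out[n + 1] = 0x00;
    return n;
}

/* Length in 16-bit code units, scanning for a 16-bit zero terminator. */
size_t utf16le_len(const unsigned char *buf)
{
    size_t units = 0;
    while (buf[2 * units] != 0 || buf[2 * units + 1] != 0) {
        ++units;
    }
    return units;
}

/* What a callback sees if it treats the same buffer as a plain char*:
 * strlen() stops at the first zero byte. */
size_t naive_byte_len(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

For the input "SAX", ascii_to_utf16le writes 6 bytes; naive_byte_len then
reports 1 while utf16le_len reports 3, which is the kind of mismatch a
char*-based callback would run into unless the encoding and length are
passed explicitly.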

(Sorry, my Java is NULL. But from what I can tell, the String and
StringBuffer classes do not support 16- or 32-bit chars; correct me if
I'm wrong.)

As a developer, it would be very nice not to have to re-code Unicode
support into each of my applications. I would like to see an implementation
of Unicode in SAX that is compatible with most systems and is extensible
for when new standards come along. (Wide character and encoding support is
lacking in most programming languages.)

I do have a couple of ideas if you would like them (omitted for brevity).


- Matthew

<<<<<<< | >>>>>>>
Matthew J. Evans
  Professional Hobbyist
  Santa Fe, New Mexico
  mailto:mje at shakha.com


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



