unicode confusion

Tim Bray tbray at textuality.com
Tue Jan 4 19:07:04 GMT 2000


At 01:37 PM 1/4/00 -0500, Fabio Arciniegas A. wrote:
><snip/> 
>> Note that Java uses UTF-16, which isn't quite fixed-width, though no
>> one really notices.
>
>Err... David, I thought Java used UTF-8, actually a version slightly
>different from the "typical" version that expresses:

Java has come with a succession of library classes that advertised
UTF-8 support; the first few iterations were so hopelessly broken that I 
gave up on them, but I've been told that recent versions are verging
on usable.
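
For reference, the "slightly different" UTF-8 mentioned in the quoted
text is most likely what the Java documentation calls modified UTF-8,
the format written by DataOutputStream.writeUTF. The sketch below
compares byte counts for the two encodings. The class and method names
are made up for illustration, and StandardCharsets and
UncheckedIOException are from Java releases later than this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf8Variants {
    // Byte length of the string in standard UTF-8.
    public static int standardLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length of the modified UTF-8 payload produced by writeUTF,
    // excluding the 2-byte length prefix that writeUTF emits first.
    public static int modifiedLength(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            return buf.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String nul = "\0";              // U+0000
        String clef = "\uD834\uDD1E";   // U+1D11E as a surrogate pair
        // Modified UTF-8 writes U+0000 as two bytes (C0 80), not one.
        System.out.println(standardLength(nul) + " vs " + modifiedLength(nul));
        // It also encodes each surrogate separately, three bytes apiece,
        // where standard UTF-8 uses one four-byte sequence.
        System.out.println(standardLength(clef) + " vs " + modifiedLength(clef));
    }
}
```

For plain ASCII text the two encodings produce identical bytes, which
may be why the difference goes unnoticed most of the time.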

What David was saying is that in Java, the basic "char" data type
is 16 bits wide, and thus is naturally used to hold UTF-16 code units.  I
have no idea whether the library classes do the right thing with UTF-16
surrogate pairs in either String or char[] contexts, but my experience
with String processing in Java is that it's often best just to ignore
those libraries anyhow and roll your own. -Tim
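
A minimal sketch of why UTF-16 in a 16-bit char "isn't quite
fixed-width": a character outside the 16-bit range occupies two char
values, a surrogate pair. The class and method names below are
hypothetical, and codePointCount, codePointAt and isHighSurrogate only
exist from Java 5 onward, so this is not something the libraries under
discussion offered at the time.

```java
// Hypothetical demo class, for illustration only.
public class SurrogateDemo {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) does not fit in 16 bits, so UTF-16
        // stores it as the surrogate pair D834 DD1E.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // 2
        System.out.println(codePoints(clef));  // 1
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
    }
}
```

Code that indexes a String or char[] by position sees the two halves
separately, so anything that splits, reverses or truncates text char by
char can cut a pair in half unless it checks for surrogates.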





