unicode confusion
Steve Schafer
pandeng at telepath.com
Tue Jan 4 19:25:43 GMT 2000
On Tue, 4 Jan 2000 13:37:45 -0500 (GMT+5), "Fabio Arciniegas A."
<l-arcini at uniandes.edu.co> wrote:
>Err... David, I thought Java used UTF-8, or rather a version slightly
>different from the "typical" one, which encodes:
>Characters in the range \u0001 to \u007F in one byte: 0[bits 0-6]
>Characters in the range \u0080 to \u07FF, plus \u0000, in two bytes:
>110[bits 6-10] 10[bits 0-5]
>Characters in the range \u0800 to \uFFFF in three bytes: 1110[bits 12-15]
>10[bits 6-11] 10[bits 0-5]
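The quoted scheme translates into a short encoder. The sketch below is in Python rather than Java purely for illustration. `java_modified_utf8` is a hypothetical helper name, and the sketch only covers the \u0000 to \uFFFF range the scheme describes. For that range it should produce the same bytes that Java's `DataOutputStream.writeUTF` writes after its two-byte length prefix.

```python
def java_modified_utf8(text: str) -> bytes:
    """Encode BMP text using the one/two/three-byte scheme described above (illustrative sketch)."""
    out = bytearray()
    for ch in text:
        c = ord(ch)
        if c > 0xFFFF:
            # The scheme only covers \u0000-\uFFFF; anything above is out of scope for this sketch.
            raise ValueError("only characters U+0000..U+FFFF are handled in this sketch")
        if 0x0001 <= c <= 0x007F:
            # One byte: 0[bits 0-6]
            out.append(c)
        elif c <= 0x07FF:
            # Two bytes: 110[bits 6-10] 10[bits 0-5].
            # U+0000 also lands here (as C0 80), so no 0x00 byte ever appears in the output.
            out.append(0xC0 | (c >> 6))
            out.append(0x80 | (c & 0x3F))
        else:
            # Three bytes: 1110[bits 12-15] 10[bits 6-11] 10[bits 0-5]
            out.append(0xE0 | (c >> 12))
            out.append(0x80 | ((c >> 6) & 0x3F))
            out.append(0x80 | (c & 0x3F))
    return bytes(out)


print(java_modified_utf8("a\u0000\u20ac").hex(" "))
```

For BMP text with no U+0000 and no unpaired surrogates, the output is identical to standard UTF-8. The one visible difference in this range is the two-byte `C0 80` form for U+0000, which keeps zero bytes out of the encoded string.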
I think he's talking about the _internal_ character format, which is
indeed UTF-16 (without the surrogates).
-Steve Schafer
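The in-memory format works differently: each Java `char` holds one 16-bit UTF-16 code unit. For text that stays within U+0000 to U+FFFF (the "without the surrogates" case), that means exactly one unit per character, and the unit's value equals the code point. The minimal Python sketch below makes this visible. `utf16_code_units` is a hypothetical helper and does not come from any Java API.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units of text (illustrative sketch)."""
    data = text.encode("utf-16-be")
    # Each code unit is two big-endian bytes.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_code_units("A\u00e9\u20ac")])
```

For BMP text the list has one entry per character. A character above U+FFFF would come back as two entries (a surrogate pair), which is the case set aside above.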
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)