UTF-8 vs UTF-16...?
David Brownell
david-b at pacbell.net
Wed Nov 17 18:47:46 GMT 1999
Kragen Sitaker wrote:
>
> According to the latest Unicode book (is it version 2.0? Or 3.0?)
> UTF-8 does not allow you to encode more than the first 17 planes of ISO
> 10646.
The Unicode book has a bias: it only talks about the Unicode
aspects of UTF-8. I've always felt that to be a disservice,
since they didn't develop or standardize UTF-8 and are thus
spreading misinformation. (They could at least _mention_ the
fact that they're presenting a Unicode subset of full UTF-8!)
Better information is, thankfully, freely accessible. See:
http://www.ietf.org/rfc/rfc2279.txt
which includes the details of the five- and six-byte encodings.
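
As a rough illustration of what the RFC describes, here is a minimal Python sketch of an encoder that follows the RFC 2279 layout, including the five- and six-byte forms. The function name utf8_rfc2279_encode is made up for this example, and the sketch makes no attempt to reject surrogates or other values that a Unicode-only encoder would refuse.

```python
def utf8_rfc2279_encode(cp: int) -> bytes:
    """Encode one UCS-4 value using the RFC 2279 scheme (1 to 6 bytes)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, one byte
    # (sequence length, exclusive upper bound, lead-byte marker)
    for length, limit, marker in (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    ):
        if cp < limit:
            break
    out = bytearray(length)
    # Fill continuation bytes from the right, six payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = marker | cp  # remaining high bits go in the lead byte
    return bytes(out)


# The last code point in the 17 planes still fits in four bytes:
print(utf8_rfc2279_encode(0x10FFFF).hex())  # f48fbfbf
# A value just past 21 bits needs the five-byte form:
print(len(utf8_rfc2279_encode(0x200000)))   # 5
```

For values inside the Unicode range (surrogates aside), the output should agree with Python's built-in "utf-8" codec; the built-in codec simply refuses anything above 0x10FFFF.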
Note that even the four-byte subset of UTF-8 can encode values
that can't be expressed in Unicode: a four-byte sequence carries
21 bits (up to 0x1FFFFF), while the 17 planes end at 0x10FFFF.
A few of the test cases in the OASIS/NIST test suite (these
cases happen to come from James Clark's XMLTEST package) have
such characters, and any conformant XML processor must report a
fatal error when it sees them.
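
To make that concrete, here is a small Python sketch of the range check involved. It assumes the Char production from the XML 1.0 recommendation as the definition of a legal character; the names is_xml_char and FOUR_BYTE_MAX are invented for this example, and a real processor would apply the check while decoding rather than as a separate step.

```python
# Largest value a four-byte RFC 2279 sequence can carry (21 payload bits).
FOUR_BYTE_MAX = 0x1FFFFF


def is_xml_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# 0x110000 fits in a four-byte sequence but is not a legal XML character,
# so a conformant processor has to treat it as a fatal error.
cp = 0x110000
print(cp <= FOUR_BYTE_MAX, is_xml_char(cp))  # True False
```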
- Dave
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)