UTF-8 vs UTF-16...?

David Brownell david-b at pacbell.net
Wed Nov 17 18:47:46 GMT 1999


Kragen Sitaker wrote:
> 
> According to the latest Unicode book (is it version 2.0?  Or 3.0?)
> UTF-8 does not allow you to encode more than the first 17 planes of ISO
> 10646. 

The Unicode book has a bias:  it only talks about the Unicode
aspects of UTF-8.  I've always felt that to be a disservice,
since they didn't develop or standardize UTF-8 and are thus
spreading misinformation.  (They could at least _mention_ the
fact that they're presenting a Unicode subset of full UTF-8!)

Better information is thankfully freely accessible.  See:

    http://www.ietf.org/rfc/rfc2279.txt

which includes the details of the five- and six-byte encodings.
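
For anyone who wants to see the longer forms concretely, here is a
minimal sketch of an encoder that follows the byte-count table in
RFC 2279.  The function name is made up for illustration, and it
deliberately skips surrogate checks and other validation; treat it as
a reading aid for the RFC, not a reference implementation.

```python
def utf8_encode_rfc2279(code_point: int) -> bytes:
    """Encode a UCS-4 value with the RFC 2279 scheme (1 to 6 bytes).

    Hypothetical helper for illustration; no surrogate checks.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("RFC 2279 covers only 0 through 0x7FFFFFFF")
    if code_point < 0x80:
        return bytes([code_point])  # plain ASCII, one byte

    # (largest value for this length, byte count, lead-byte marker)
    for limit, length, lead in (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ):
        if code_point <= limit:
            break

    out = bytearray(length)
    # Fill continuation bytes from the end, six payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (code_point & 0x3F)
        code_point >>= 6
    # Whatever bits remain go into the lead byte.
    out[0] = lead | code_point
    return bytes(out)


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(hex(cp), utf8_encode_rfc2279(cp).hex(" "))
```

Values up to 0x10FFFF come out the same as Python's own
str.encode("utf-8") for non-surrogate characters; the five and six
byte results are the part the Unicode book leaves out.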

Note that even the four-byte subset of UTF-8 can encode values
that can't be expressed in Unicode:  four-byte sequences reach
0x1FFFFF, while Unicode stops at 0x10FFFF.  A few of the test
cases in the OASIS/NIST test suite (these cases happen to come
from James Clark's XMLTEST package) have such characters, and
any conformant XML processor must report a fatal error when it
sees them.
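
To make the four-byte case concrete, here is a rough sketch that
decodes a single four-byte sequence by hand and compares the result
with the Unicode ceiling of 0x10FFFF (the top of the 17 planes).  The
helper names are invented for this example, and the decoder does not
reject overlong forms; it is not taken from any XML processor or from
the test suite.

```python
UNICODE_MAX = 0x10FFFF  # last code point of the 17 Unicode planes


def decode_four_byte(seq: bytes) -> int:
    """Decode one four-byte UTF-8 sequence (11110xxx + 3 continuations).

    Hypothetical helper; overlong forms are not rejected.
    """
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a four-byte UTF-8 lead byte")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    value = seq[0] & 0x07          # three payload bits in the lead byte
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)  # six bits per continuation
    return value


def expressible_in_unicode(seq: bytes) -> bool:
    """True if the decoded value is within the Unicode range."""
    return decode_four_byte(seq) <= UNICODE_MAX


if __name__ == "__main__":
    for raw in (b"\xf4\x8f\xbf\xbf", b"\xf4\x90\x80\x80", b"\xf7\xbf\xbf\xbf"):
        print(raw.hex(" "), hex(decode_four_byte(raw)),
              expressible_in_unicode(raw))
```

The sequence F4 90 80 80 is well formed as far as the bit patterns go,
but it decodes to 0x110000, one past the Unicode limit.  Python's own
UTF-8 decoder refuses it with a UnicodeDecodeError.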

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1