Mix encodings in a document?
John Cowan
cowan at locke.ccil.org
Wed Sep 23 18:26:11 BST 1998
Jerome McDonough wrote:
> Under Unicode version 2.0,
> what I should've said is:
>
> Unicode == ISO-10646-UCS-2 != UTF-16
>
> as Unicode and 10646 in UCS-2 format should be identical, but UTF-16
> differs from both of these in that it allows the use of surrogate
> pairs to enable encoding the BMP and next 16 planes of UCS-4. From
> what I can see at Unicode's home page, it now looks like Unicode is
> dropping UCS-2 character encoding and now only endorses UTF-8 and
> UTF-16, so that the situation now is:
>
> Unicode != ISO-10646-UCS-2
>
> and Unicode sometimes does/sometimes does not equal UTF-16. Is that
> more or less the case at the moment?
"Unicode 2.0" and "Unicode 2.1" always mean UTF-16. UCS-2 proper
(that is, the encoding that does not allow references to what
10646 calls Planes 01 to 10 hex, i.e. 1 to 16 decimal) has never been
Unicode since the distinction between UCS-2 and UTF-16 was invented.
Before that,
there was only UCS-2 and Unicode = UCS-2.
So Unicode = UTF-16 != UCS-2, but the distinction is usually
trivial: UCS-2 per se does not define any meaning for surrogate
code values, so text that contains no surrogates is identical in
both encodings.
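To make the difference concrete, here is a minimal Python sketch of how
UTF-16 turns one code point into 16-bit code units. It is an illustration
only: the function name to_utf16_units is hypothetical and appears in no
standard or library. For anything in the BMP the result is the same single
unit UCS-2 would use; only code points in Planes 1 to 16 need a surrogate
pair.

```python
def to_utf16_units(code_point: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one code point.

    Hypothetical helper for illustration, not a library API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can address")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code values are not characters")
    if code_point < 0x10000:
        # BMP: one unit, identical to the UCS-2 representation.
        return [code_point]
    # Planes 1 to 16: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFD, 0x10000, 0x10FFFF):
        units = to_utf16_units(cp)
        print("U+%04X ->" % cp, " ".join("%04X" % u for u in units))
```

A UCS-2 reader handed the two units of a pair has no defined
interpretation for either of them, which is the whole of the difference.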
--
John Cowan http://www.ccil.org/~cowan cowan at ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)