Mix encodings in a document?

Wed Sep 23 18:26:11 BST 1998

Jerome McDonough wrote:

> Under Unicode version 2.0,
> what I should've said is:
> 
>         Unicode == ISO-10646-UCS-2 != UTF-16
> 
> as Unicode and 10646 in UCS-2 format should be identical, but UTF-16
> differs from both of these in that it allows the use of surrogate
> pairs to enable encoding the BMP and next 16 planes of UCS-4.  From
> what I can see at Unicode's home page, it now looks like Unicode is
> dropping UCS-2 character encoding and now only endorses UTF-8 and
> UTF-16, so that the situation now is:
> 
>         Unicode != ISO-10646-UCS-2
> 
> and Unicode sometimes does/sometimes does not equal UTF-16.  Is that
> more or less the case at the moment?

"Unicode 2.0" and "Unicode 2.1" always mean UTF-16.  UCS-2 proper
(that is, the encoding that does not allow references to what
10646 calls Planes 01 to 10, numbered in hexadecimal: the 16 planes
beyond the BMP) has never been Unicode since the distinction between
UCS-2 and UTF-16 was invented.  Before that, there was only UCS-2
and Unicode = UCS-2.

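So Unicode = UTF-16 != UCS-2, but the distinction is usually
trivial: the two agree on every BMP character, and differ only in
that UCS-2 per se does not define any meaning for surrogate
characters, whereas UTF-16 uses them in pairs to reach the other
planes.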
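
To make the difference concrete, here is a minimal Python sketch of the
surrogate-pair arithmetic.  It is an illustration only, not text from
either standard; the function names utf16_code_units and fits_in_ucs2
are made up for this example.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point.

    Hypothetical helper for illustration; not part of any standard API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the BMP plus the next 16 planes")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code values are not characters")
    if code_point <= 0xFFFF:
        # BMP character: a single code unit, identical in UCS-2 and UTF-16.
        return [code_point]
    # Planes 01 to 10 (hex): split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def fits_in_ucs2(code_point):
    """True if the code point lies in the BMP, the only plane UCS-2 can address."""
    return 0 <= code_point <= 0xFFFF


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFD, 0x10000, 0x10FFFF):
        units = " ".join("%04X" % u for u in utf16_code_units(cp))
        print("U+%04X  UTF-16: %-10s UCS-2: %s" % (cp, units, fits_in_ucs2(cp)))
```

For any BMP character both encodings yield the same single 16-bit
value, which is why the distinction seldom shows up in practice.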

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
