Non-Unicode Character Sets

John Cowan cowan at locke.ccil.org
Sat Jan 29 19:07:42 GMT 2000


Paul Prescod scripsit:

> I am told that conversion of some character sets through Unicode is
> lossy and cannot be round-tripped. But it occurs to me that as long as
> one has the private use area, "unknown" characters can always be
> preserved.

Mappings have to serve various purposes: not just round-trippability,
which could be achieved by any arbitrary 1-1 mapping, but also
usefulness.  Not all character set standards agree on what counts
as a character, as opposed to a mere variant that need not be
represented.  Most of Unicode's compatibility characters were added
in order to satisfy these rather disjoint needs.
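To make the round-trip/usefulness distinction concrete, here is a minimal Python sketch. It assumes a hypothetical 8-bit legacy charset (the table is invented for illustration) and parks every byte that has no standard Unicode equivalent in the Private Use Area. The bytes survive the trip, but the parked characters mean nothing to any other software.

```python
import unicodedata

# Hypothetical 8-bit legacy charset: only three bytes have standard
# Unicode equivalents. The table is illustrative, not a real standard.
LEGACY_TO_UNICODE = {0x41: "A", 0x42: "B", 0xB0: "\u00b0"}
UNICODE_TO_LEGACY = {u: b for b, u in LEGACY_TO_UNICODE.items()}

PUA_BASE = 0xE000  # start of the BMP Private Use Area (U+E000..U+F8FF)


def decode(data: bytes) -> str:
    """Map each byte to Unicode, parking unmapped bytes at PUA_BASE + byte."""
    return "".join(LEGACY_TO_UNICODE.get(b, chr(PUA_BASE + b)) for b in data)


def encode(text: str) -> bytes:
    """Invert decode(): table characters first, then the PUA fallback range."""
    out = bytearray()
    for ch in text:
        if ch in UNICODE_TO_LEGACY:
            out.append(UNICODE_TO_LEGACY[ch])
        elif PUA_BASE <= ord(ch) <= PUA_BASE + 0xFF:
            out.append(ord(ch) - PUA_BASE)
        else:
            raise ValueError(f"no legacy encoding for {ch!r}")
    return bytes(out)


raw = bytes([0x41, 0xB0, 0x99])
text = decode(raw)
assert encode(text) == raw  # lossless: every byte round-trips

# ...but the parked character is anonymous outside this converter:
# general category "Co" (private use) and no character name.
print(unicodedata.category(text[2]), unicodedata.name(text[2], None))
```

Any one-to-one scheme like this is round-trippable by construction; what it cannot supply is agreement with other systems about what U+E099 actually is.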

For example, the Korean standard KSC 5601 provides distinct codepoints
for different "readings" of Chinese characters (hanja) used in Korean writing.
The great majority of Chinese characters have only a single reading in
Korean (unlike in Japanese), but a few have two, three, or more.
Providing distinct codepoints eased conversion between hanja and
native Korean (hangul) writing, as each hanja codepoint could be given
a unique hangul equivalent.

Unicode, however, unified Chinese characters into a single repertoire.
In order to permit round-tripping between KSC 5601 and Unicode,
compatibility characters were added to Unicode for each of the
multi-mapped hanja.
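One practical consequence can be checked with Python's standard unicodedata module. The example assumes U+9F9C (a hanja with more than one Korean reading) and its two compatibility forms U+F907 and U+F908 from the KSC 5601 compatibility block. They are distinct code points, so a converter can keep the duplicates apart, but each has a canonical singleton decomposition to the unified ideograph, so Unicode normalization folds them back together.

```python
import unicodedata

# U+9F9C is the unified ideograph; U+F907 and U+F908 are CJK compatibility
# ideographs for the same character, encoded so that duplicate
# KSC 5601 entries have distinct Unicode targets.
UNIFIED = "\u9f9c"
COMPAT = ["\uf907", "\uf908"]

# Three distinct code points, so each duplicate can map back to its own
# KSC 5601 entry...
assert len({UNIFIED, *COMPAT}) == 3

for ch in COMPAT:
    folded = unicodedata.normalize("NFC", ch)
    # ...but each carries a canonical (singleton) decomposition, so
    # normalizing replaces it with the unified ideograph.
    print(f"U+{ord(ch):04X} decomposition={unicodedata.decomposition(ch)} "
          f"NFC=U+{ord(folded):04X}")
```

So the round-trip holds only as long as nothing along the way normalizes the text.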

The character set CNS 11643 was not given this treatment, however,
and its (few) multiple mappings do not have Unicode equivalents.  Therefore,
round-tripping is not possible.
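The failure is mechanical: when two legacy codes map to the same Unicode character, the reverse mapping has to pick one of them. A small Python sketch can flag such collisions in a conversion table. The function name and the table below are hypothetical; the codes are invented stand-ins, not real CNS 11643 positions.

```python
from collections import defaultdict


def find_collisions(table):
    """Return each Unicode target that more than one legacy code maps to.

    `table` maps legacy code values (ints) to Unicode strings. A target
    hit more than once makes the reverse mapping ambiguous, so the
    legacy codes listed for it cannot all round-trip.
    """
    by_target = defaultdict(list)
    for code, uni in table.items():
        by_target[uni].append(code)
    return {uni: sorted(codes) for uni, codes in by_target.items() if len(codes) > 1}


# Hypothetical excerpt: 0x4A21 and 0x6B33 stand for two entries of one
# character that has no separate compatibility code point in Unicode.
table = {
    0x4A21: "\u4e00",
    0x6B33: "\u4e00",
    0x4A22: "\u4e8c",
}
print(find_collisions(table))
```

An empty result means the table is injective and can be inverted; every entry returned is a place where information is lost.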

> Is there any character set in the world that cannot be considered a
> "subset of Unicode"?

The CCCII standard and its superset EACC (aka ANSI Z39.64) have
many multiple mappings and will not round-trip through Unicode.

-- 
John Cowan                                   cowan at ccil.org
       I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Unsubscribe by posting to majordom at ic.ac.uk the message
unsubscribe xml-dev  (or)
unsubscribe xml-dev your-subscribed-email at your-subscribed-address

Please note: New list subscriptions now closed in preparation for transfer to OASIS.