Non-Unicode Character Sets

Tim Bray tbray at
Fri Jan 28 23:04:38 GMT 2000

At 04:05 PM 1/28/00 -0600, Paul Prescod wrote:
>I am told that conversion of some character sets through Unicode is
>lossy and cannot be round-tripped. 

Errr... deep waters here.  There are a few controversial areas in
Unicode and I'm not sure which one you are referring to.  But in the
general case I believe this to be false.  
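One way to check whether a given legacy encoding survives a trip through Unicode is to decode its bytes to Unicode text, re-encode them, and compare the result with the original. Here is a minimal sketch in Python. It assumes the codec tables that ship with Python's standard library, which may differ from other vendors' tables for the "same" character set. The helper name `round_trips` is hypothetical.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Return True if data decodes to Unicode and re-encodes to identical bytes."""
    try:
        text = data.decode(encoding)          # legacy bytes -> Unicode code points
        return text.encode(encoding) == data  # Unicode code points -> legacy bytes
    except UnicodeError:
        # Bytes the codec cannot map to Unicode at all cannot round-trip.
        return False

# Every ISO 8859-1 byte maps to exactly one Unicode code point and back.
print(round_trips(bytes(range(256)), "latin-1"))
# Shift_JIS kanji also survive the trip.
print(round_trips("漢字".encode("shift_jis"), "shift_jis"))
# Byte 0x81 is undefined in Python's cp1252 table, so this reports False.
print(round_trips(b"\x81", "cp1252"))
```

A False result here means "unmapped" rather than "lossy". A genuinely lossy table would instead decode successfully and re-encode to different bytes.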

The only examples I can think of for which this might be the case are 
character repertoires that grow dynamically.  One example is the CJK 
ideographs; in various places in Asia they sometimes custom-create
characters, usually for use in proper names.  Another is the practice
of mathematics, where they from time to time create new characters, or
attach semantics to particular renditions of old characters (e.g. R in 
a Fraktur font) and it would be nice to treat these as characters.

>But it occurs to me that as long as
>one has the private use area, "unknown" characters can always be

Well, only between co-operating subgroups of users who agree on what's
going to go where.
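The "co-operating subgroups" point can be made concrete. A Private Use Area code point carries no meaning of its own, so the same text means different things under different private agreements. The following is a minimal sketch in Python. Both agreement tables and the `describe` helper are hypothetical and serve only as illustration.

```python
# Hypothetical private agreements: each group assigns its own meaning to U+E000.
GROUP_A = {0xE000: "custom ideograph used in a family name"}
GROUP_B = {0xE000: "newly coined mathematical symbol"}

def describe(text: str, agreement: dict[int, str]) -> list[str]:
    """Interpret each character, consulting the private agreement for mapped code points.

    Characters not covered by the agreement are returned unchanged.
    """
    return [agreement.get(ord(ch), ch) for ch in text]

message = "\ue000"
print(describe(message, GROUP_A))  # meaning under group A's agreement
print(describe(message, GROUP_B))  # same code point, different meaning under group B's
```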

>Are there some
>standardized national character sets with so many non-Unicode characters
>that they cannot fit into the PUA? Even with planes 15 and 16?

I doubt it.  None that are widely known anyhow.
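To put a number on the room available, the following sketch counts the three ranges the Unicode Standard sets aside for private use. These are U+E000..U+F8FF in the BMP and planes 15 and 16, excluding the last two code points of each plane, which are noncharacters. It cross-checks the ranges with Python's standard `unicodedata` module, which reports private-use characters with general category `Co`. The helper names are illustrative.

```python
import unicodedata

# Private Use Area ranges as defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def pua_size() -> int:
    """Total number of private-use code points across all three ranges."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)

def is_private_use(cp: int) -> bool:
    """True if the Unicode database classifies the code point as private use (Co)."""
    return unicodedata.category(chr(cp)) == "Co"

print(pua_size())  # 6400 + 65534 + 65534 = 137468
```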

>Is there any character set in the world that cannot be considered a
>"subset of Unicode"?

Yes.  Those that haven't been added yet.  For examples, go to
and look at the candidates that are queued up for addition to future
versions of Unicode. 

Anyhow, is this a real problem anywhere? -T.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: or CD-ROM/ISBN 981-02-3594-1