XML and special characters: Unicode v3.0?

Tim Bray tbray at textuality.com
Mon Mar 1 18:24:13 GMT 1999


At 12:58 PM 3/1/99 -0500, John Cowan wrote:
>> For instance, the Sinhala character set was not in Unicode 2.0 but will be
>> in 3.0. How do I get one of those characters in an XML document ? 
>
>There is a discrepancy between the prose, which says "legal Unicode/10646
>characters" and references old versions of these standards, and
>the BNF, which says the Char production handles everything except
>known control characters (and even some of those).

John's right.  And it's not the Sinhala that first brought it home, but
the Euro character, which is clearly OK per production [2] but isn't
a "legal yadda yadda yadda" per the particular amendment of 10646/Unicode
that the XML spec references.  The W3C has some I18n heavies trying
to figure out what to do - life is made more complicated by the fact
that the Unicode people and the IETF i18n people don't always point
in the same direction, sigh; did you know the BOM was legal in UTF-8?
And of course by the fact that Unicode/10646 is a moving target.

But the bottom line is (see the public errata to the XML spec)
that production [2] is normative; both in theory and in practice,
XML processors pass through everything in that range.  In practice,
I've never actually seen a character outside of the BMP, but the
experts agree such characters are showing up real soon now.
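
For reference, production [2] (Char) in XML 1.0 can be written down as a
handful of code point ranges. The following is a minimal Python sketch of
that check, not a full XML processor; the names XML_CHAR_RANGES and
is_xml_char are made up for illustration.

```python
# Production [2] of XML 1.0:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# XML_CHAR_RANGES and is_xml_char are hypothetical names used only in this sketch.
XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point falls inside one of the Char ranges."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


# The Euro sign (U+20AC), a code point in the Sinhala block (U+0D85), and a
# character outside the BMP (U+10333) all fall inside the ranges, whichever
# Unicode version first assigned them.
for cp in (0x20AC, 0x0D85, 0x10333):
    print(f"U+{cp:04X}", is_xml_char(cp))
```

Surrogate code points (U+D800 to U+DFFF), U+FFFE, U+FFFF, and most C0
control characters fall outside these ranges, so the function returns False
for them.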

How to get it in? Something like &#x10333; I expect.  As a programmer,
you'll see it either as two UTF-16 surrogates or as a 4-byte UTF-8 sequence,
neither of which will look in the slightest like hex 10333.  -Tim
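
To make the point about hex 10333 concrete, here is a small Python 3 sketch
that parses a numeric character reference for that code point and prints
its UTF-16 and UTF-8 forms. It assumes the standard library's
xml.etree.ElementTree parser; the <doc> element and the variable names are
made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-element document carrying the character reference.
doc = "<doc>&#x10333;</doc>"
text = ET.fromstring(doc).text

# The parser hands back a single character with code point 0x10333.
code_point = ord(text)

# UTF-16 stores it as a surrogate pair (two 16-bit code units).
utf16 = text.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

# UTF-8 stores it as four bytes.
utf8 = text.encode("utf-8")

print(f"code point: U+{code_point:X}")
print(f"UTF-16:     {high:04X} {low:04X}")
print("UTF-8:      " + " ".join(f"{b:02X}" for b in utf8))
```

The surrogate pair comes out as D800 DF33 and the UTF-8 bytes as
F0 90 8C B3; neither resembles the digits 10333.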


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



