XML and special Characters : unicode v3.0 ?
John Cowan
cowan at locke.ccil.org
Tue Mar 2 15:40:27 GMT 1999
MURATA Makoto wrote:
> It is my understanding that Unicode 3.0 will have many ideographic
> characters which are outside of the BMP.
The Unicode Consortium has indicated on its mailing list
that no non-BMP characters will appear in Unicode 3.0.
(Unless Vertical Extension A is being put in Plane 2 after all?)
> >An application receiving data may either use these signatures to
> >identify the coded representation form, or may ignore them and treat
> >FEFF as the ZERO WIDTH NO-BREAK SPACE character.
> How do you interpret this "or"?
I interpret it as "inclusive or", "and/or", "vel".
> One could argue that when EF BB BF
> is recognized as a signature, it is not treated as the ZWNS.
I think that it may or may not be treated as the ZWNBSP. In any event,
the whole annex is informative, and describes "a convention [...]
applied by a certain class of applications". It is reasonable to
suppose that XML is not in that class of applications, at least
so far as UTF-8 recognition is concerned.
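
To make the two readings concrete, here is a minimal sketch in Python. It is only an illustration, not anything taken from the annex or the XML spec: the function name decode_utf8 and the strip_signature flag are my own invention. It decodes the same bytes once treating EF BB BF as a signature and once treating it as an ordinary U+FEFF character.

```python
import codecs

ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE


def decode_utf8(data: bytes, strip_signature: bool) -> str:
    """Decode UTF-8 bytes under one of the two readings of the annex.

    Hypothetical helper: the name and the flag are assumptions made
    for this illustration only.
    """
    if strip_signature and data.startswith(codecs.BOM_UTF8):
        # Reading 1: EF BB BF is a signature identifying the encoding,
        # so it is dropped and does not become part of the text.
        data = data[len(codecs.BOM_UTF8):]
    # Reading 2 (or no signature present): every byte is text, so a
    # leading EF BB BF decodes to U+FEFF.
    return data.decode("utf-8")


sample = b"\xef\xbb\xbf<doc/>"
as_signature = decode_utf8(sample, strip_signature=True)
as_character = decode_utf8(sample, strip_signature=False)

print(repr(as_signature))
print(repr(as_character))
```

Under the first reading the text begins with "<"; under the second it begins with U+FEFF, which is the case that matters to a processor that does not recognize the UTF-8 signature.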
--
John Cowan http://www.ccil.org/~cowan cowan at ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)