XML and special characters: Unicode v3.0?

John Cowan cowan at locke.ccil.org
Tue Mar 2 15:40:27 GMT 1999


MURATA Makoto wrote:

> It is my understanding that Unicode 3.0 will have many ideographic
> characters which are outside of the BMP.

The Unicode Consortium has indicated on its mailing list
that no non-BMP characters will appear in Unicode 3.0.
(Unless Vertical Extension A is being put in Plane 2 after all?)

> >An application receiving data may either use these signatures to
> >identify the coded representation form, or may ignore them and treat
> >FEFF as the ZERO WIDTH NO-BREAK SPACE character.
> How do you interpret this "or"?

I interpret it as "inclusive or", "and/or", "vel".

> One could argue that when EF BB BF
> is recognized as a signature, it is not treated as the ZWNS.

I think that it may or may not be treated as the ZWNBSP.  In any event,
the whole annex is informative, and describes "a convention [...]
applied by a certain class of applications".  It is reasonable to
suppose that XML is not in that class of applications, at least
so far as UTF-8 recognition is concerned.
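
To make the two readings concrete, here is a minimal sketch in Python.
It is an illustration only, not anything taken from the annex or from the
XML spec, and the helper name decode_both_ways is hypothetical. It decodes
the same bytes once treating a leading EF BB BF as a signature (and
dropping it), and once treating it as the character U+FEFF.

```python
import codecs

# The three bytes EF BB BF: U+FEFF encoded in UTF-8.
UTF8_SIGNATURE = codecs.BOM_UTF8


def decode_both_ways(data):
    """Return (signature_reading, character_reading) for UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # Reading 1: a leading EF BB BF is a signature and is not part of the text.
    as_signature = data.decode("utf-8-sig")
    # Reading 2: the same bytes are an ordinary U+FEFF character in the text.
    as_character = data.decode("utf-8")
    return as_signature, as_character


sample = UTF8_SIGNATURE + b"<doc/>"
stripped, kept = decode_both_ways(sample)
```

Under the first reading the text starts at "<"; under the second it starts
with U+FEFF, so the two results differ by exactly one leading character.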

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)




