Extender characters, Production 89 of XML 1.0

John Cowan cowan at locke.ccil.org
Mon Jan 11 19:08:13 GMT 1999

Elliotte Rusty Harold wrote:

> In XML ["extender"]
> characters can be used anywhere a base character or ideographic
> character can be used.

This is not quite true, because extenders are not name-start characters
in either XML or Unicode.

> However I have been unable to find in the Unicode book or Web site any
> definition of what makes a character an extender. Can anyone clue me in on
> why some Unicode characters have the extender property while others don't?
> What's the logic behind this grouping of characters across languages?

Roughly (and unofficially) speaking, an extender is something that isn't
a letter or combining mark but often appears embedded in words.
For example, one may use L plus MIDDLE DOT as a compatibility equivalent
of L WITH MIDDLE DOT in writing Catalan, and we do not want a
Catalan name to break into two names at the MIDDLE DOT.
(The dot is used to distinguish two successive Ls, written with
a dot, from the unitary Catalan letter "ll", written without a dot.)

Extenders are enumerated (but not explained) in Section 5.14 of
the Unicode Standard.
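
As a rough illustration of the points above, here is a minimal Python sketch, not a conforming implementation. The code points are the ones listed in production [89] of XML 1.0. The function names are hypothetical, Letter and Digit are approximated by Python's str.isalpha() and str.isdigit() rather than by the spec's own tables, and CombiningChar is left out entirely.

```python
# Extender code points as listed in production [89] of XML 1.0,
# stored as inclusive (low, high) ranges.
EXTENDER_RANGES = [
    (0x00B7, 0x00B7),  # MIDDLE DOT
    (0x02D0, 0x02D1),
    (0x0387, 0x0387),
    (0x0640, 0x0640),
    (0x0E46, 0x0E46),
    (0x0EC6, 0x0EC6),
    (0x3005, 0x3005),
    (0x3031, 0x3035),
    (0x309D, 0x309E),
    (0x30FC, 0x30FE),
]


def is_extender(ch):
    """True if ch is one of the XML 1.0 Extender characters."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTENDER_RANGES)


def is_name_start(ch):
    """Approximate name-start test: Letter | '_' | ':'.

    Assumption: str.isalpha() stands in for the spec's Letter class.
    Extenders are excluded explicitly, because some of them are
    alphabetic by Python's definition but are not XML Letters.
    """
    return ch in "_:" or (ch.isalpha() and not is_extender(ch))


def is_name_char(ch):
    """Approximate NameChar test (CombiningChar omitted in this sketch)."""
    return (
        is_name_start(ch)
        or ch.isdigit()  # assumption: stands in for the spec's Digit class
        or ch in ".-"
        or is_extender(ch)
    )


def is_name(s):
    """True if s looks like an XML Name under the approximations above."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])
```

Under these assumptions, is_name("col\u00B7lega") holds, so the Catalan word stays a single name, while is_name("\u00B7lega") does not, because an extender cannot start a name.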

John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/