Specifying custom characters in XML (was Re: SDATA or UNICODE)

Rick Jelliffe ricko at allette.com.au
Tue Jan 27 17:44:58 GMT 1998

> From: Tim Bray <tbray at textuality.com>

> Martin, I am dismayed that you of all people are counselling egregious
> non-conformance in this manner on this forum. -Tim

I don't think he was. In his original post he said that the formal
XML reference was &#x20AC; but that on Microsoft systems the code 128
(byte 0x80 in Windows code page 1252) worked. This was in reply to the
question "In other words, how do I specify my small salary in euro's?",
which I took to be a request for a workaround.
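
To make the difference concrete, here is a minimal Python sketch. It
assumes the standard library's ElementTree parser and cp1252 codec; the
element name "salary" is made up for illustration. It parses the
hexadecimal character reference for U+20AC and then shows that byte 128
only means "euro" under one particular legacy encoding.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The conformant form: a hexadecimal character reference to U+20AC.
doc = ET.fromstring("<salary>&#x20AC;5</salary>")  # element name is illustrative
euro = doc.text[0]
print(unicodedata.name(euro))

# The workaround relies on how one legacy encoding assigns byte 128.
as_cp1252 = bytes([128]).decode("cp1252")    # Windows code page 1252
as_latin1 = bytes([128]).decode("latin-1")   # ISO 8859-1

# In ISO 10646 itself, code point 128 is a control character, not the euro.
category_of_128 = unicodedata.category(chr(128))
print(as_cp1252 == euro, as_latin1 == euro, category_of_128)
```

So the workaround depends on the receiving system guessing the encoding
the sender had in mind, which is why it is not portable.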

Maybe we also need to work out when we need to specify characters
and when we just need glyphs (pictures). What a character has that a
glyph may lack is that it matters for searching, indexing and
collation. If you need a "character" that is not in ISO 10646 and it
will not matter for searching, indexing or sorting, then you really
just want a glyph, and an embedded bitmap or a reference to a
particular font may be fine for you, if you can get it looking OK.

If you do need to stick in an actual character, then you can use
one of the private-use code points that ISO 10646 sets aside for
user-defined characters. As a next step, you need to provide a
mechanism in your markup to map that code point to some element,
entity or processing instruction. SGML has a mechanism called short
references that you can use for this, but you will still have to
build it into your software, since it is not part of XML.
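
As a rough sketch of what "build it into your software" could look
like, the Python below replaces a private-use character with an empty
element when text is written out. The mapping table, the element name
"my-char" and both function names are assumptions made for
illustration; nothing here is defined by XML or SGML.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical mapping: private-use code point -> element name to emit.
CUSTOM_CHARS = {"\uE000": "my-char"}

def is_private_use(ch: str) -> bool:
    # General category "Co" marks private-use code points.
    return unicodedata.category(ch) == "Co"

def expand_custom_chars(text: str) -> str:
    """Replace mapped private-use characters with empty elements."""
    out = []
    for ch in text:
        if ch in CUSTOM_CHARS:
            out.append("<%s/>" % CUSTOM_CHARS[ch])
        else:
            out.append(escape(ch))  # escape &, < and > in ordinary text
    return "".join(out)

# Every key in the table should really be a private-use code point.
assert all(is_private_use(ch) for ch in CUSTOM_CHARS)

print(expand_custom_chars("salary: 5 \uE000"))
```

The same table could just as well map to an entity reference or a
processing instruction; an element is used here only to keep the
sketch short.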

You can use PIs instead of markup declarations. Instead of SGML's
	<!SHORTREF blah ...>
in XML you could name ISO 8879 (i.e. the SGML standard) in the PI
target:
	<?IS8879 SHORTREF blah ...?>
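
For anyone wanting to try this, here is a hedged Python sketch of
picking such a PI out of a document with the standard library's expat
bindings. Note that the PI data syntax used below (a map name, a quoted
character, an element name) is invented purely for the example and is
not the real ISO 8879 SHORTREF syntax; the function name collect_pis is
likewise hypothetical.

```python
import xml.parsers.expat

# Hypothetical document: the PI data syntax (map name, quoted character,
# element name) is invented here and is not real ISO 8879 syntax.
DOC = '<?IS8879 SHORTREF mymap "\uE000" my-char?><doc>5 \uE000</doc>'

def collect_pis(xml_text: str, target: str):
    """Return the data of every PI whose target matches."""
    found = []

    def on_pi(pi_target, data):
        if pi_target == target:
            found.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)
    return found

declarations = collect_pis(DOC, "IS8879")

# Split the made-up declaration into its four whitespace-separated parts.
keyword, map_name, literal, element = declarations[0].split()
shortref = {literal.strip('"'): element}
print(keyword, map_name, shortref)
```

An XML parser only hands the PI over; interpreting the declaration and
applying the mapping is still the application's job.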

So ISO 10646 gives us free code points we can use. ISO 8879 gives us
the short reference mechanism to let us map a code point to any kind
of markup we need. XML lets us use ISO 10646 and provides PI targets
so we can use any ISO 8879 declaration (that we care to implement)
in the body of the instance.

So all that is needed is to name the character, give its
characteristics (collation and so on), and point to a glyph. For this
the TEI Writing System Declaration may provide some useful conventions.
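
To illustrate the kind of information involved, here is a small Python
sketch of a record holding a name, a collation hint and a glyph
pointer for one private-use character. This is not the TEI Writing
System Declaration format; the field names, the sample values and the
glyph file name are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomChar:
    code_point: str   # private-use code point standing for the character
    name: str         # human-readable name
    sorts_as: str     # string to use in its place when collating
    glyph: str        # pointer to a picture of it (hypothetical file name)

REGISTRY = {
    "\uE000": CustomChar("\uE000", "my currency sign", "e", "glyphs/e000.png"),
}

def collation_key(text: str) -> str:
    """Substitute each registered character's sort string before comparing."""
    return "".join(REGISTRY[c].sorts_as if c in REGISTRY else c for c in text)

words = ["f", "\uE000", "d"]
print(sorted(words, key=collation_key))
```

With the made-up rule that the character sorts as "e", it lands
between "d" and "f"; searching and indexing could consult the same
record.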

Rick Jelliffe


