How best to represent unrepresentable characters in NAME tokens?

James Clark jjc at jclark.com
Tue Nov 4 12:31:17 GMT 1997


Andrew Greene wrote:
> 
> If you have a Unicode-friendly XML environment, then users can create
> elements whose GIs or attribute names contain "interesting"
> characters. (Yes? A NAME token can contain "BaseChars", which includes
> characters beyond ASCII and even beyond Latin-1.)
> 
> So, if a user requests that the document instance be saved as an ASCII
> file, what is the best way for a Unicode-aware and standards-compliant
> application to represent these characters?

I would use numeric character references wherever XML allows them; if
there are non-ASCII characters in places where numeric character
references aren't allowed I would use UTF-8 and give a warning to the
user.  The ASCII characters will still be there as ASCII, and the
non-ASCII characters won't get lost, although they will look a bit funny
in an 8-bit editor.  An interesting case is when there are non-ASCII
characters in places where numeric character references are not
recognized but do not cause an error (e.g. PIs and comments); one could
adopt an application convention that recognizes numeric character
references in these cases.
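
As a rough illustration of that policy, here is a minimal Python sketch.
The function names are hypothetical, and it assumes the text passed in is
already XML-escaped (so "&" and "<" are not touched here).  Character data
and attribute values get numeric character references; names cannot, so
they fall back to UTF-8 and report that a warning is due.

```python
def encode_char_data(text: str) -> bytes:
    """Encode character data or an attribute value as pure ASCII.

    Each non-ASCII character becomes a hexadecimal numeric character
    reference.  Assumes markup-significant characters are already escaped.
    """
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            pieces.append(f"&#x{ord(ch):X};")
    return "".join(pieces).encode("ascii")


def encode_name(name: str) -> tuple[bytes, bool]:
    """Encode a NAME token (GI or attribute name).

    Numeric character references are not allowed inside names, so the
    name is written as UTF-8.  The second item is True when the name
    contains non-ASCII characters and the user should be warned.
    """
    needs_warning = any(ord(ch) >= 128 for ch in name)
    return name.encode("utf-8"), needs_warning


if __name__ == "__main__":
    print(encode_char_data("Straße"))
    print(encode_name("Straße"))
```

ASCII-only input passes through both functions unchanged, so a document
with no "interesting" characters is still a plain ASCII file.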

> 2. Rename all the offending elements and attributes, and use PIs to
>    ensure that when they're read back in we can patch things up.
>    So, for example, the file could contain:
> 
>    <?GoodCitizen MangledGI Strae1="Stra&#x00DF;e"?>
>    <Strae1>foo bar</Strae1>
> 
>    Advantages: It's fully compliant.

If I were going to do this sort of thing, I think I would use a
variation on URL %-encoding.  I would adopt a convention that an
underscore (say) followed by 4 hex digits represents the Unicode
character with that hex code.
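
A minimal Python sketch of such a convention follows.  The function names
are hypothetical, and two details are assumptions beyond what is stated
above: a literal underscore is itself escaped (as _005F) so that decoding
is unambiguous, and characters above U+FFFF are rejected because they do
not fit in 4 hex digits.

```python
import re


def mangle_name(name: str) -> str:
    """Rewrite a name as ASCII using underscore plus 4 hex digits."""
    out = []
    for ch in name:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:X} does not fit in 4 hex digits")
        if ch == "_" or code > 127:
            # Escape non-ASCII characters, and the underscore itself
            # (assumption) so every "_" in the output starts an escape.
            out.append(f"_{code:04X}")
        else:
            out.append(ch)
    return "".join(out)


# An underscore followed by exactly 4 hex digits.
_ESCAPE = re.compile(r"_([0-9A-Fa-f]{4})")


def unmangle_name(mangled: str) -> str:
    """Reverse mangle_name, turning each escape back into its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), mangled)


if __name__ == "__main__":
    mangled = mangle_name("Straße")
    print(mangled)
    print(unmangle_name(mangled))
```

Since underscore is allowed as the first character of an XML name, the
mangled form stays a legal name even when the first character had to be
escaped.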

James



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



