DOM & Entities

James Tauber jtauber at
Mon Apr 12 17:25:12 BST 1999

> I'm curious as to how this is handled with entities such as those
> used in mathematical equations, or accented characters, or
> other special characters that aren't strictly 'plain text'?

Well, strictly they *are* plain text. That's the whole point of XML
characters being Unicode characters. Accented Latin characters, Japanese,
symbols are just as much plain text as a capital A.
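
A small sketch of that point, assuming Python's standard
xml.etree.ElementTree parser (my choice for illustration, not something
the question mentions): a literal accented character and its numeric
character references all parse to the same text.

```python
import xml.etree.ElementTree as ET

# The same element content written three ways: a literal character,
# a decimal character reference, and a hexadecimal character reference.
literal = ET.fromstring("<p>caf\u00e9</p>")
decimal = ET.fromstring("<p>caf&#233;</p>")
hexadecimal = ET.fromstring("<p>caf&#xE9;</p>")

# After parsing, the application sees identical character data in all
# three cases; how the character was spelled in the source is gone.
texts = [literal.text, decimal.text, hexadecimal.text]
```

That is why the references cannot simply "survive" the parse: by the
time the application sees the text, there is only the character.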

> I'm writing an XML processing application which reads in an
> XML document, performs some processing (based on another
> XML 'rules' document) and then produces a final XML document.
> Ideally I'd like the entities retained from start to finish, so
> that I can be sure that they survive the transformation unchanged.
> But I'm unclear how I can ensure this? Will I have to wrap all
> entity references in CDATA sections before parsing?

A CDATA wrapper wouldn't work because *after* your processing they'd still
be in a CDATA section or would be things like &amp;eacute;
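
To illustrate, here is a minimal sketch, again assuming Python's
xml.etree.ElementTree (other parsers may differ in detail, for example
by preserving CDATA sections on output). The entity name eacute is only
an example.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section "&eacute;" is not an entity reference at all,
# just ordinary characters starting with an ampersand.
doc = ET.fromstring("<p><![CDATA[&eacute;]]></p>")
inside = doc.text

# ElementTree does not remember that the text came from a CDATA section,
# so on output the ampersand is escaped to keep the document well-formed.
serialized = ET.tostring(doc, encoding="unicode")
```

Here `inside` holds the literal string `&eacute;` and `serialized` holds
the escaped form, which is no longer an entity reference either.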

If you absolutely want to have entity references at the end of the day, your
safest bet would be to post process the character data and replace any
characters you don't want literally with an equivalent entity reference.
Character references might be an even better solution and certainly this
would make the post processing easier. Just run over the text replacing
(say) any character > 127 with &#...;
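
A rough sketch of that post-processing step in Python. The function name
to_char_refs is made up for this example, and the cutoff (anything
outside ASCII) is an assumption you can move. Run it over the text
*after* serialization, so the ampersands it introduces are not escaped
again.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


sample = "caf\u00e9 \u2211"  # e-acute and the summation sign

# Hand-rolled version.
converted = to_char_refs(sample)

# Python's built-in "xmlcharrefreplace" error handler does the same job
# when encoding to ASCII.
via_codec = sample.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Both give `caf&#233; &#8721;` for the sample string. Swapping in named
entity references instead would need a lookup table from character to
entity name, plus a DTD that declares those entities.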

James Tauber / jtauber at /
XML Standards and Product Coordinator
HarvestRoad Communications /


xml-dev: A list for W3C XML Developers.