ASCII control characters in XML

Steve Harris sharris at primus.com
Tue Apr 28 18:21:02 BST 1998


Is it possible to transport UTF-8-encoded text that includes some
characters in the byte range x0000-x001F (ASCII control characters)?
These codes are valid within UTF-8 (via RFC2044), but the XML
specification clearly says that these codes do not constitute 'valid
characters'. My application that wraps Clark's "expat" dies upon
encountering codes in this range, citing well-formedness violations. I'm
looking for the proper method for transporting text that occasionally
includes these codes.
I've been RTFM'ing this for a while now, and I've found plenty of
archived discussion regarding raw binary data as PCDATA content, but
this seems closer to common text-processing problem. Any advice or
further interpretation would be greatly appreciated.

Steven E. Harris
Software Engineer
PRIMUS
1601 Fifth Avenue, Suite 1900
Seattle, Washington 98101
(206) 292-1001 x436


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




More information about the Xml-dev mailing list