Characters having an ASCII value > 127

Toby Speight tms at
Fri Sep 18 15:05:10 BST 1998

Steffen> Steffen Rodig <URL:mailto:rodig at>

0> In article <199809181228.OAA16525 at>, Steffen wrote:

Steffen> imagine a plain text file which I want to markup using
Steffen> XML. Now it could be that there are characters in this file
Steffen> whose ASCII value is greater than 127 (in PCDATA sections).

No character has an ASCII value greater than 127: ASCII is a 7-bit
encoding.  Of course, it's possible to use characters beyond ASCII,
since the Document Character Set for XML is Unicode.
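
A minimal sketch of the point above, in Python (the helper name is_ascii
and the sample string are only illustrative): ASCII covers code points
0 to 127, and anything higher has to come from a larger character set
such as Unicode.

```python
def is_ascii(ch: str) -> bool:
    """Return True if the character's code point fits in 7 bits (0..127)."""
    return ord(ch) < 128

text = "caf\u00e9"  # the last character is U+00E9, code point 233

# Collect every character that falls outside ASCII, with its code point.
non_ascii = [(ch, ord(ch)) for ch in text if not is_ascii(ch)]
print(non_ascii)

# Encoding to ASCII fails because ASCII has no code for U+00E9.
try:
    text.encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```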

Steffen> If I try to use expat on the generated XML file, it tells
Steffen> me that it is not wellformed at the position where such a
Steffen> character occurs.

Perhaps your XML declaration doesn't agree with the actual encoding
of the document (you don't say what either of these is for your
document).  See Sections 2.8 and 4.3.3, and Appendix F, of the XML 1.0
Recommendation.
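
As an illustration only, here is how such a mismatch might show up,
using Python's xml.parsers.expat binding as a stand-in for however you
are calling expat.  The sample document and the helper name
is_well_formed are assumptions, not anything from your setup.  With no
encoding declaration the parser has to assume UTF-8, so a lone ISO
8859-1 byte is rejected; with a matching declaration the same bytes are
accepted.

```python
import xml.parsers.expat

def is_well_formed(data: bytes) -> bool:
    """Return True if expat parses the document without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

# The element content holds the single byte 0xE9 (U+00E9 in ISO 8859-1).
body = "<p>caf\u00e9</p>".encode("iso-8859-1")

# No declaration: the parser assumes UTF-8, where 0xE9 followed by '<'
# is not a valid byte sequence.
undeclared = body

# Declaration that matches the actual encoding of the bytes.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body

print(is_well_formed(undeclared))
print(is_well_formed(declared))
```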

Steffen> I guess, to correctly interpret and display those characters
Steffen> I have to know the character set which was used to encode the
Steffen> original text file.

Of course - the parser is unlikely to be able to tell the difference
between the various parts of ISO 8859, for instance.
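
To see why the parser cannot guess, here is a small Python sketch (the
byte value 0xE6 is an arbitrary choice): the same byte is a perfectly
valid character in several parts of ISO 8859, just a different one in
each.

```python
raw = bytes([0xE6])  # one byte, valid in every encoding tried below

# Decode the same byte under three parts of ISO 8859.
decoded = {
    enc: raw.decode(enc)
    for enc in ("iso-8859-1", "iso-8859-2", "iso-8859-5")
}

for enc, ch in decoded.items():
    # Show the resulting Unicode code point for each interpretation.
    print(f"{enc}: U+{ord(ch):04X}")
```

Nothing in the byte itself says which reading was intended, so the
information has to come from outside the data.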

Steffen> How can I communicate this character set to an XML parser?

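In the encoding declaration, which is part of the XML declaration:
<?xml version="1.0" encoding="utf-8"?> (or whatever encoding the file
actually uses).  Note that the version must come before the encoding.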

You may prefer to write the problematic characters as entities or
character references, if they are rare in your source.  This may
allow you to write your documents in a smaller character set.  (As an
example, I find it easiest to author in ISO-8859-1, but I need to
define entities for the Welsh characters, which lie in the Latin-2
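
A rough sketch of the character-reference approach in Python.  The
helper names to_ascii_xml and parse_text and the sample word are
assumptions for illustration.  Python's "xmlcharrefreplace" error
handler writes every character the target encoding lacks as a numeric
character reference, so the file itself stays pure ASCII while the
parser still delivers the full Unicode text.

```python
import xml.parsers.expat

def to_ascii_xml(text: str) -> bytes:
    """Encode text as ASCII, writing other characters as &#N; references.

    Note: this does not escape '&' or '<'; the sample below has neither.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")

def parse_text(data: bytes) -> str:
    """Parse a document with expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)

welsh = "d\u0175r"  # contains U+0175, w with circumflex

doc = b"<p>" + to_ascii_xml(welsh) + b"</p>"
print(doc)

# The parser resolves the reference back to the original character.
print(parse_text(doc) == welsh)
```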


