Characters having an ASCII value > 127
tms at ansa.co.uk
Fri Sep 18 15:05:10 BST 1998
In article <199809181228.OAA16525 at sunfi1.fi.sdm.de>, Steffen Rodig <URL:mailto:rodig at sdm.de> wrote:
Steffen> imagine a plain text file which I want to markup using
Steffen> XML. Now it could be that there are characters in this file
Steffen> whose ASCII value is greater than 127 (in PCDATA sections).
No character has an ASCII value greater than 127: ASCII is a 7-bit
encoding. Of course, it's possible to use characters beyond ASCII,
since the Document Character Set for XML is Unicode.
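To make the distinction concrete, here is a small Python sketch (the sample word and variable names are only illustrative). It shows that a character such as e-acute has a Unicode code point above 127, cannot be encoded as ASCII at all, and ends up as different bytes depending on which encoding is chosen.

```python
# "cafe" with an e-acute (U+00E9), written with an escape so the
# example does not depend on the encoding of this source file.
text = "caf\u00e9"

# Unicode code points of each character; the last one is above 127.
code_points = [ord(ch) for ch in text]

# ASCII is a 7-bit encoding, so it cannot represent U+00E9.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The same character becomes different byte sequences in different encodings.
as_latin1 = text.encode("iso-8859-1")  # one byte for the e-acute
as_utf8 = text.encode("utf-8")         # two bytes for the e-acute

print(code_points, ascii_ok, as_latin1, as_utf8)
```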
Steffen> If I try to use expat on the generated XML file, it tells
Steffen> me that it is not wellformed at the position where such a
Steffen> character occurs.
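Perhaps your XML declaration doesn't agree with the actual encoding
of the document (you don't say what either of these is for your
document). Without an encoding declaration the parser assumes UTF-8
(or UTF-16, if there is a byte order mark), so an ISO 8859 byte above
127 is normally rejected as malformed UTF-8. See Sections 2.8 and
4.3.3, and Appendix F, of the XML 1.0 Recommendation.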
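A likely way to reproduce the symptom is sketched below, using the expat bindings in Python's standard library. The two sample documents and the helper name parse_text are made up for illustration; this is a guess at the situation, not a reconstruction of the original file. The same ISO-8859-1 bytes are rejected when nothing declares the encoding and accepted once the declaration matches.

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse an XML byte string with expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # expat may deliver character data in several pieces, so collect them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# ISO-8859-1 bytes with no encoding declaration: expat assumes UTF-8,
# and the lone byte 0xE9 is not valid UTF-8.
undeclared = b"<doc>caf\xe9</doc>"

# The same bytes, with a declaration that matches the actual encoding.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'

try:
    parse_text(undeclared)
    undeclared_error = None
except xml.parsers.expat.ExpatError as err:
    undeclared_error = err

declared_text = parse_text(declared)

print("undeclared:", undeclared_error)
print("declared:  ", declared_text)
```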
Steffen> I guess, to correctly interpret and display those characters
Steffen> I have to know the character set which was used to encode the
Steffen> original text file.
Of course - the parser is unlikely to be able to tell the difference
between the various parts of ISO 8859, for instance.
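As a rough illustration of why the parser cannot guess, the Python sketch below decodes one and the same byte under three parts of ISO 8859 (the choice of byte and of parts is arbitrary). Every decoding succeeds, and each yields a different character, so nothing in the bytes themselves says which part was intended.

```python
# One byte above 127, as it might appear in a plain text file.
raw = b"\xe6"

# Decode it under three different parts of ISO 8859.
parts = ["iso-8859-1", "iso-8859-2", "iso-8859-5"]
decoded = {name: raw.decode(name) for name in parts}

for name, ch in decoded.items():
    # Show the resulting Unicode code point for each interpretation.
    print(f"{name}: U+{ord(ch):04X}")
```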
Steffen> How can I communicate this character set to an XML parser?
In the encoding declaration, <?xml version="1.0" encoding="utf-8"?>
(or whatever encoding the file actually uses).
You may prefer to write the problematic characters as entities or
character references, if they are rare in your source. This may
allow you to write your documents in a smaller character set. (As an
example, I find it easiest to author in ISO-8859-1, but I need to
define entities for the Welsh characters, which lie in the Latin-2
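The character-reference approach can be sketched in Python as follows. The sample phrase and the element name p are invented for illustration. Characters that exist in ISO-8859-1 are written as ordinary bytes, while the Welsh w-circumflex (U+0175), which ISO-8859-1 lacks, is written as a numeric character reference. A parser then recovers the original text.

```python
import xml.etree.ElementTree as ET

# Welsh "dwr a than" with w-circumflex (U+0175) and a-circumflex (U+00E2).
# U+00E2 exists in ISO-8859-1; U+0175 does not.
text = "d\u0175r a th\u00e2n"

# Encode as ISO-8859-1, replacing anything it cannot represent
# with a numeric character reference such as &#373;.
body = text.encode("iso-8859-1", errors="xmlcharrefreplace")

document = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<p>" + body + b"</p>"
)

# Parsing resolves the character reference back to the original character.
recovered = ET.fromstring(document).text

print(body)
print(recovered == text)
```

Note that this shortcut only suits text with no markup characters in it; a literal & or < in the source would need escaping first.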
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)