expat and encodings

Steve Kearon stevek at fineline-software.co.uk
Tue Dec 15 10:20:45 GMT 1998


Can someone clarify the issue of character encodings for me - I think this
is an expat issue, but it may be a more general thing.

I'm trying to save/load text that might contain accented characters (>127).
Running on Windows95. I realise that when writing XML, I either have to
convert such characters to "&#xxx;" form, or note that the file format
encoding is "iso-8859-1", otherwise the XML parser (expat) objects when
subsequently reading the file.
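
For reference, the first option (escaping on output) can be sketched roughly
as below. This is only an illustration: latin1_to_char_refs is a made-up
helper name, not anything from expat, and it assumes the text in memory is
iso-8859-1, where each byte value is also the Unicode code point. It does not
escape markup characters such as "&" or "<".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of expat): copy ISO-8859-1 text to `out`,
 * replacing every byte above 127 with a decimal character reference.
 * Returns the number of characters written (excluding the terminating NUL),
 * or (size_t)-1 if `out` is too small. Markup characters are left alone. */
size_t latin1_to_char_refs(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        if (c < 128) {
            /* Plain ASCII: copy through, leaving room for the final NUL. */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (char)c;
        } else {
            /* "&#NNN;" is exactly 6 characters for values 128..255. */
            if (n + 6 >= out_size)
                return (size_t)-1;
            n += (size_t)sprintf(out + n, "&#%u;", (unsigned)c);
        }
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Called with the four bytes of "café" in iso-8859-1, this fills the buffer
with "caf&#233;", which is pure ASCII and so is acceptable under either
encoding declaration.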

The snag is that whether the file has utf-8 or iso-8859-1 encoding, the text
the application receives from the parser seems to be always utf-8. I've
tried specifying "iso-8859-1" as the encoding to the XML_ParserCreate()
call, but this seems to have no effect (I guess the parameter actually
overrides the default (utf-8) file encoding, rather than specifying the
encoding the client would like to see).

The questions...
Is my understanding correct - does expat feed UTF-8 text to clients when
parsing?
Can expat be asked to feed clients iso-8859-1?
If the client must convert manually, are there any helper functions in
expat/xmltok?
If I use the unicode build of expat, does it feed utf-8, unicode or utf-16?
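
To make the "convert manually" case concrete, here is a rough sketch of the
kind of conversion I mean, on the assumption that the handlers really do
receive UTF-8. utf8_to_latin1 is a hypothetical name of my own, not an
expat/xmltok function; it substitutes '?' for anything outside iso-8859-1
and does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of expat): convert NUL-terminated UTF-8 to
 * ISO-8859-1. Code points above U+00FF become '?'. Returns the number of
 * bytes written (excluding the terminating NUL), or (size_t)-1 on malformed
 * input or if `out` is too small. Overlong forms are not rejected. */
size_t utf8_to_latin1(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*p != 0) {
        unsigned long cp;
        int extra; /* number of continuation bytes still expected */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else
            return (size_t)-1; /* stray continuation or invalid lead byte */
        ++p;
        while (extra-- > 0) {
            /* A NUL here also fails this check, so truncated input is caught. */
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            ++p;
        }
        if (n + 1 >= out_size)
            return (size_t)-1;
        out[n++] = (cp <= 0xFF) ? (char)(unsigned char)cp : '?';
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For example, the five UTF-8 bytes of "café" (63 61 66 C3 A9) come out as the
four iso-8859-1 bytes 63 61 66 E9. Since character data may arrive in several
pieces, a real client would presumably accumulate the text before converting,
or at least take care that no multi-byte sequence is cut in two.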

Many thanks,
Steve Kearon
FineLine Software


