encoding problem fixed
John Cowan
cowan at locke.ccil.org
Fri Jul 30 19:54:53 BST 1999
David Brownell wrote:
> Actually, that's not correct either. My general advice is to pass a
> URI to the parser -- which is required to do the correct thing! -- and
> in those rare cases that can't be done:
>
> * If the data is externally typed according to character set,
> you MUST use some Reader ... e.g. given a MIME type of
> "application/xml;charset=Big5", then use a reader set
> up to use the "Big5" encoding (a Chinese encoding). There
> isn't much choice of classes; InputStreamReader, or a custom
> reader that understands that encoding.
>
> * If the data is NOT externally typed, then you MUST rely on
> the XML parser's autodetection ... pass an InputStream.
This is all quite sound, and I was wrong to overlook the case of
external charset information.
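The two cases in David's advice map onto the two ways a SAX `InputSource` can be built. Below is a minimal Java sketch using the JDK's built-in JAXP/SAX parser. The class and method names (`EncodingAwareInput`, `inputSourceFor`, `textContent`) are hypothetical, and `externalCharset` stands in for whatever out-of-band label (for example a MIME `charset` parameter) the application happens to have. Per the SAX `InputSource` contract, a parser given a character stream disregards the document's own encoding declaration. That is why the `Reader` path is only appropriate when the external label is authoritative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingAwareInput {

    // externalCharset: charset from out-of-band metadata such as the
    // "charset" parameter of a MIME type, or null when there is none.
    static InputSource inputSourceFor(InputStream bytes, String externalCharset) {
        if (externalCharset != null) {
            // Externally typed: decode with a Reader for that charset.
            // SAX parsers disregard the internal encoding declaration
            // when they are handed a character stream.
            return new InputSource(
                    new InputStreamReader(bytes, Charset.forName(externalCharset)));
        }
        // Not externally typed: pass raw bytes so the parser can
        // autodetect the encoding (byte order mark, XML declaration).
        return new InputSource(bytes);
    }

    // Parses the document and returns all of its character data.
    static String textContent(InputSource source) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Labelled only externally: no XML declaration inside the bytes.
        byte[] labelled = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.ISO_8859_1);
        String a = textContent(
                inputSourceFor(new ByteArrayInputStream(labelled), "ISO-8859-1"));

        // Self-describing: the encoding declaration drives autodetection.
        byte[] declared = "<?xml version='1.0' encoding='ISO-8859-1'?><doc>caf\u00e9</doc>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String b = textContent(inputSourceFor(new ByteArrayInputStream(declared), null));

        System.out.println(a.equals("caf\u00e9") && b.equals("caf\u00e9")); // true
    }
}
```

With a MIME type such as "application/xml;charset=Big5", the caller would pass "Big5" as `externalCharset`. Big5 support depends on the charsets installed in the JVM.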
> > Actually, it's doing what it's expected to: reading the native charset,
> > CP-1252. (Unix JVMs use 8859-1 instead.)
>
> Those are actually system-specific defaults ... many localized versions
> of those environments work differently. For example, UNIX JVMs may well
> use the "EUC-JP" encoding in Japan, or MS-Windows the "Shift_JIS".
Reasonable.
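The reason the platform default matters is that `new InputStreamReader(in)` with no charset argument silently uses it. Below is a small Java sketch with a hypothetical class name, `DefaultCharsetDemo`. It prints this JVM's default, which depends on the JVM version and locale (recent JDKs default to UTF-8). It then decodes the same UTF-8 bytes under two charsets to show how a wrong guess corrupts text without raising any error.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Decodes bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // The charset that new InputStreamReader(in) would use on this JVM.
        System.out.println("Platform default: " + Charset.defaultCharset());

        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 63 61 66 C3 A9
        String right = decode(utf8, StandardCharsets.UTF_8);      // "caf" + U+00E9
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1); // "caf" + U+00C3 + U+00A9

        System.out.println(right.length() + " vs " + wrong.length()); // 4 vs 5
    }
}
```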
> In fact, my own basic guidance is never to pass any sort of I/O stream
> (InputStream -or- Reader!) to the parser; let the parser work from the
> URI, if at all possible. It's normally quite possible, and it's a lot
> less likely to handle the encodings wrong than application code!!
This leads to an interesting question: what do the various XML parsers
do when they fetch an http: URI whose response carries an explicit
charset parameter in its Content-Type header? Someone should try Aelfred,
etc. and see whether the header-level charset is respected and overrides
the document's internal encoding declaration.
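Below is a Java sketch of one way to run that experiment against the JDK's built-in SAX parser. It has not been tried with Aelfred. `HttpCharsetProbe` and its `charsetParam` helper are hypothetical, and the helper is a simplified MIME parameter reader that does not handle quoted values containing ';'. The probe first makes its own request to see what charset the server declares. It then follows the "pass a URI" advice by letting the parser fetch the same URI itself. Finally, it reports the encoding the parser says it used via `org.xml.sax.ext.Locator2.getEncoding()`, but only if the parser's locator implements that SAX2 extension.

```java
import java.net.URI;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

public class HttpCharsetProbe {

    // Returns the charset parameter of a MIME type such as
    // "application/xml;charset=Big5", or null if there is none.
    // Simplified: does not handle quoted values containing ';'.
    static String charsetParam(String contentType) {
        if (contentType == null) return null;
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String v = p.substring(eq + 1).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1); // charset="Big5"
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String uri = args[0]; // an http: URL that serves XML

        // 1. What the server declares, via a separate request.
        URLConnection conn = URI.create(uri).toURL().openConnection();
        System.out.println("Header charset: " + charsetParam(conn.getContentType()));

        // 2. Let the parser fetch the URI itself and report what it used.
        SAXParserFactory.newInstance().newSAXParser().parse(uri, new DefaultHandler() {
            private Locator locator;
            private boolean reported;

            @Override
            public void setDocumentLocator(Locator l) { locator = l; }

            @Override
            public void startElement(String ns, String local, String qName, Attributes atts) {
                if (!reported && locator instanceof Locator2) {
                    System.out.println("Parser encoding: " + ((Locator2) locator).getEncoding());
                    reported = true;
                }
            }
        });
    }
}
```

If the two lines disagree for a document whose header charset conflicts with its internal declaration, that parser is not honouring the header.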
--
John Cowan http://www.ccil.org/~cowan cowan at ccil.org
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
-- Coleridge / Politzer
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)