encoding problem fixed

John Cowan cowan at locke.ccil.org
Fri Jul 30 19:54:53 BST 1999


David Brownell wrote:

> Actually, that's not correct either.  My general advice is to pass a
> URI to the parser -- which is required to do the correct thing! -- and
> in those rare cases that can't be done:
> 
>     * If the data is externally typed according to character set,
>       you MUST use some Reader ... e.g. given a MIME type of
>       "application/xml;charset=Big5", then use a reader set
>       up to use the "Big5" encoding (a Chinese encoding).  There
>       isn't much choice of classes; InputStreamReader, or a custom
>       reader that understands that encoding.
> 
>     * If the data is NOT externally typed, then you MUST rely on
>       the XML parser's autodetection ... pass an InputStream.

This is all quite sound, and I was wrong to overlook the case of
external charset information.
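
For concreteness, here is a minimal Java sketch of the three cases in David's advice, using the SAX InputSource class. The class name InputSourceSketch and its method names are made up for illustration; only InputSource, InputStreamReader, and their constructors come from the standard libraries.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from any parser.
final class InputSourceSketch {

    // Preferred case: give the parser a URI and let it fetch and decode the data.
    static InputSource fromUri(String systemId) {
        return new InputSource(systemId);
    }

    // Externally typed data (e.g. "application/xml;charset=Big5"):
    // decode with the declared charset ourselves and pass a Reader.
    static InputSource fromTypedStream(InputStream in, String charset)
            throws UnsupportedEncodingException {
        Reader reader = new InputStreamReader(in, charset);
        return new InputSource(reader);
    }

    // No external charset information: pass the raw bytes so the
    // parser's own autodetection sees the encoding declaration.
    static InputSource fromUntypedStream(InputStream in) {
        return new InputSource(in);
    }
}
```

The one thing to avoid is the single-argument InputStreamReader constructor, which silently applies the platform default encoding.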
 
> > Actually, it's doing what it's expected to: reading the native charset,
> > CP-1252.  (Unix JVMs use 8859-1 instead.)
> 
> Those are actually system-specific defaults ... many localized versions
> of those environments work differently.  For example UNIX JVMs may well
> use the "EUC-JP" coding in Japan, or MS-Windows the "Shift_JIS".

Reasonable.
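
The practical effect is that the same bytes decode to different text depending on which charset is applied. A small sketch, assuming a reasonably recent JDK (the class and method names are illustrative only):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative.
final class DecodeSketch {

    // Decode bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Name of the charset this JVM applies when none is specified;
    // the result varies by platform, locale, and JVM configuration.
    static String platformDefaultName() {
        return Charset.defaultCharset().name();
    }

    // The two bytes 0xC3 0xA9 are one character (e-acute) in UTF-8
    // but two separate characters in ISO-8859-1.
    static final byte[] SAMPLE = {(byte) 0xC3, (byte) 0xA9};

    static boolean sampleDecodesDifferently() {
        String asUtf8 = decode(SAMPLE, StandardCharsets.UTF_8);
        String asLatin1 = decode(SAMPLE, StandardCharsets.ISO_8859_1);
        return !asUtf8.equals(asLatin1);
    }
}
```

Code that relies on whatever platformDefaultName() happens to report is decoding by accident of locale, which is the problem David describes.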
 
> In fact, my own basic guidance is never to pass any sort of I/O stream
> (InputStream -or- Reader!) to the parser; let the parser work from the
> URI, if at all possible.  It's normally quite possible, and it's a lot
> less likely to handle the encodings wrong than application code!!

This leads to an interesting question: what do various XML parsers do
when fetching an http: URI whose response carries an explicit charset
parameter in its Content-Type header?  Someone should try Aelfred, etc.,
and see whether the header-level charset is respected, overriding the
encoding declaration inside the document.
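
To make the header-level case concrete, here is a rough Java sketch of pulling the charset parameter out of a Content-Type value, such as the string returned by java.net.URLConnection.getContentType(). The class ContentTypeCharset and its method are hypothetical, and the parsing is deliberately simplified: it does not handle a semicolon inside a quoted parameter value. It says nothing about what any particular parser actually does.

```java
import java.util.Locale;

// Hypothetical helper; a simplified parser for illustration only.
final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value,
    // or null when the header is missing or names no charset.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```

A test harness could compare the value returned here with the encoding the parser ends up using for a document whose internal declaration disagrees with the header.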

-- 
	John Cowan	http://www.ccil.org/~cowan	cowan at ccil.org
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
			-- Coleridge / Politzer

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




