multiple encoding specs (Re: IE5.0 does not conform to RFC2376)

Rick Jelliffe ricko at
Tue Apr 6 15:55:57 BST 1999

From: Chris Lilley <chris at>

>An alternative method for achieving the same result is to use a filter
>(this can be done in Apache and in Jigsaw) which automatically emits
>correct charset parameter based on reading the encoding declaration in
>the XML instance.

I think this is the approach that, ultimately, we all are hoping will be

We have a variant of this problem at our site: when serving XHTML, one has to

* make sure the HTML meta tag is correct (for HTML conformance)
* make sure the XML encoding is correct (for XML conformance)
* make sure the MIME charset is correct (for MIME conformance)

Three chances to get something wrong!  (And not forgetting that some
HTML editors push the meta tags or PI around, so you may end up with
duplicated tags relating to character set inside the document.)

Given that an XML processor may transcode the document without knowing
the meanings of the elements (i.e., that the meta tag means something),
the XML encoding has to have priority over the HTML meta tag value. And
given that proxies can transcode text/* files without knowing what
kind of text it is (i.e., that it is XML, and so has a label), the MIME
header has to have priority over the XML header PI. I think that is the
logical order: generic operations must be allowed.
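As a rough illustration of that ordering, here is a minimal Python sketch. The function name and its three arguments are hypothetical, not taken from any spec or server; it simply trusts the most generic layer that supplied a label, and leaves the choice of default to the caller when nothing is labelled.

```python
from typing import Optional


def effective_encoding(mime_charset: Optional[str],
                       xml_encoding: Optional[str],
                       meta_charset: Optional[str]) -> Optional[str]:
    """Return the label to trust, most generic layer first.

    Order: MIME charset parameter, then XML encoding declaration,
    then HTML meta tag. Returns None if no layer supplied a label.
    """
    for label in (mime_charset, xml_encoding, meta_charset):
        if label:
            # Charset names are case-insensitive, so normalise for comparison.
            return label.strip().lower()
    return None
```

With all three labels present the MIME value wins; if a proxy stripped the charset parameter, the XML encoding declaration is used instead.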

However, it is all spoiled if there are systems which corrupt the
labels: for example by rewriting the charset parameter incorrectly. It
is far better to send the XML file without a charset parameter than to
send it with a wrong one.

>And if the charset parameter is present, then it should say the same
>thing as the encoding declaration. The best way to ensure this is to
>treat the XML encoding declaration as the primary metadata resource and
>to programmatically derive the charset parameter from this; greater
>robustness is at once achieved and also harmonisation of the MIME and
>XML labelling.

Yes, with the exception that the XML encoding PI could itself be derived
from internal data (e.g. in XHTML). All of them need to be harmonized.
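To make the idea of deriving the charset parameter concrete, here is a minimal Python sketch of what such a filter might do. It is not how Apache or Jigsaw do it; the function names are hypothetical, and it assumes the start of the file is in an ASCII-compatible encoding (so it will not see a declaration in UTF-16 or EBCDIC). When no declaration is found it emits no charset parameter at all rather than guessing one.

```python
import re
from typing import Optional

# XML declaration at the very start of the entity, with a version and an
# encoding pseudo-attribute. Group 1 is the quote character, group 2 the name.
_XML_DECL = re.compile(
    rb"""^<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*')"""
    rb"""\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(head: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(head)
    if match is None:
        return None
    return match.group(2).decode("ascii")


def content_type_for(head: bytes, media_type: str = "text/xml") -> str:
    """Build a Content-Type value whose charset mirrors the declaration."""
    encoding = declared_encoding(head)
    if encoding is None:
        # No declaration found: send no charset rather than a wrong one.
        return media_type
    return f"{media_type}; charset={encoding}"
```

A server hook would call content_type_for() on the first few hundred bytes of the file before sending the headers.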

The problem isn't really "which should have precedence?" (because all
systems will break somewhere, given the state of webservers and current
awareness) as much as it is "how can we move the Web towards safety and
interoperability, where the markup now available at each stage is made
available to the next?"  I think the current XML media types for MIME
give the appropriate policy for charset preference (transcodability is
one property of text/* which application/* must not have), but, as
Chris is pointing out, mechanisms to set the MIME charset parameter from
XML (and to overwrite it on delivery too) have yet to be put into

>However, I will point out that it is the consensus of the XML 1.0
>Recommendation that I am respecting - and that the RFC does not, by
>altering the meaning of the default encoding.  It could have been
>harmonised with the XML REC; it was not.

I think the XML SIG and WG pretty much all had consensus on the RFC at
the end, in full knowledge of XML 1.0.   But I think many of us came out
of it thinking that it is safer to use application/xml.

In particular, I think that a mismatch between the XML encoding
declaration and the MIME charset (and the XHTML meta tag) should be some
kind of weak Reportable User Error: people who don't want to accept
transcoded text which has been mislabelled should have some kind of user
option to report the error or abort.
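A minimal Python sketch of such a check follows, under stated assumptions: the function name and the source keys ("mime", "xml", "meta") are hypothetical, and labels are compared case-insensitively without resolving aliases, so "latin1" and "iso-8859-1" would be reported as a mismatch. What to do with the result (warn, report, abort) is left to the user option.

```python
from typing import Dict, List, Optional, Tuple


def label_mismatches(labels: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return every pair of sources whose encoding labels disagree.

    Sources with no label (None or empty) are skipped, since an absent
    label cannot contradict anything.
    """
    present = [(source, value.strip().lower())
               for source, value in labels.items() if value]
    return [
        (a, b)
        for i, (a, value_a) in enumerate(present)
        for b, value_b in present[i + 1:]
        if value_a != value_b
    ]
```

For example, label_mismatches({"mime": "utf-8", "xml": "UTF-8", "meta": "iso-8859-1"}) flags the meta tag against both of the other labels, and an empty list means the labels that are present agree.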

application/xml for safety
text/xml for reach

Rick Jelliffe
Academia Sinica Computing Centre
Taipei, Taiwan
