IE5.0 does not conform to RFC2376
Chris Lilley
chris at w3.org
Mon Mar 22 13:12:37 GMT 1999
MURATA Makoto wrote:
>
> I believe that IE 5.0 does not conform to RFC2376 (XML Media Types),
> of which I am a co-author.
>
> As for the XML media type "text/xml", the charset parameter in the
> MIME header is authoritative. Encoding declarations have to be ignored
> so that transcoding is possible.
So, if the file is saved to some local browser cache and then re-read,
it may have no MIME header so the encoding declaration is then
authoritative.
Why can't the transcoding proxy also rewrite the encoding declaration,
since it is rewriting the file anyway? It is trivially easy to find,
process, and change.
I imagine that someone could take some generic charset-converting code
and make an XML-aware transcoding servlet that rewrote the encoding
declaration in about, what, an hour? If someone does this, I will see
about getting it included in the next Jigsaw version.
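
To show how little is involved, here is a rough sketch in Python rather
than as a Jigsaw servlet. The function name transcode_xml is made up for
illustration. It assumes the declaration sits at the very start of the
decoded text with no BOM in front of it, and it does not add a declaration
when none is present.

```python
import re

# Encoding pseudo-attribute inside a leading XML declaration.
# Group 1 is everything up to the quote, group 2 is the quote character.
_ENC_DECL = re.compile(
    r"""^(<\?xml[^>]*?\bencoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2"""
)


def transcode_xml(data: bytes, source: str, target: str) -> bytes:
    """Re-encode an XML document and rewrite its encoding declaration to match."""
    text = data.decode(source)
    # Replace only the encoding name, keeping the original quote style.
    text = _ENC_DECL.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}",
        text,
        count=1,
    )
    return text.encode(target)
```

With this, the declaration and the bytes agree after conversion, so the
document stays correctly labelled even when it is later read without a
MIME header.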
> However, IE 5.0 appears to always ignore the charset parameter and use
> the BOM or encoding declaration only. Therefore, IE 5.0 does not conform to
> RFC 2376.
Okay. But does RFC 2376 conflict with the XML 1.0 Recommendation?
> Proof: I made a UTF-8 XML document which also parses even when it is assumed as
> Shift_JIS. Then, I provided the correct charset parameter "utf-8"
> in the MIME header by configuring Apache and provided an encoding declaration
> "Shift_JIS" in the XML document. Such mismatch is perfectly legal and
> usual when proxies perform code conversion. I tried this document with IE 5.0.
> Incorrect characters were displayed. Q.E.D.
Okay, proof accepted.
> When the charset parameter is not specified, it is assumed as US-ASCII.
Wow. So, what this RFC says is that, when used in email and on HTTP, the
encoding declaration is *always ignored*.
That is a pretty big change and, frankly IMHO, ill-advised.
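
As I read the rule quoted above, the effective encoding of a text/xml
entity depends only on the MIME header. A small Python sketch of that
reading follows. The function name rfc2376_charset is hypothetical, and
this is my interpretation of the quoted text, not wording from the RFC.

```python
from email.message import Message


def rfc2376_charset(content_type: str) -> str:
    """Effective charset for text/xml: the charset parameter, else us-ascii.

    The document's own encoding declaration is never consulted.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        raise ValueError("this sketch only covers text/xml")
    # get_content_charset() lower-cases the value and strips quotes.
    return msg.get_content_charset("us-ascii")
```

Note that the function takes no document body at all, which is exactly
the problem.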
> If you are using Apache and overriding by AddType is allowed, you only have to
> create a file named .htaccess in your directory and write a line as below:
>
> AddType "text/xml; charset=utf-8" xml
Correction: if you are the *administrator* of an Apache server. One of
the ways in which the Web has changed over the last 5 years is that the
percentage of Web authors who also administer the site that they serve
from has dropped from a substantial majority to an insignificant
minority.
What this RFC appears to do is remove author control over correctly
labelling the encoding, and ensure that most if not all XML documents
get incorrectly labelled as US-ASCII. Then, if the parser is working
correctly, it will complain about all bytes with value >127 being
"illegal characters" and halt with a fatal error [1].
So, this RFC removes at a stroke the possibility of authors correctly
labelling the encoding of their XML documents and takes us back to that
dark time (the present) when the majority of, say, Japanese Web content
was mis-labelled. And it seems to have done this simply to save a very
small part of coding effort for people writing transcoders.
I suspect that this was not the desired result.
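
The failure mode is easy to reproduce. The Python sketch below (the
helper name first_non_ascii is made up) finds the first byte that a
strict US-ASCII reader would have to reject.

```python
from typing import Optional, Tuple


def first_non_ascii(body: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte above 127, or None."""
    for offset, byte in enumerate(body):
        if byte > 127:
            return offset, byte
    return None
```

Any UTF-8 or Shift_JIS document containing non-ASCII text returns a hit
here, so under the US-ASCII default it cannot be processed as labelled.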
This could have been avoided:
1) Require explicit charset for overriding the internal encoding
declaration, so if one really wants to re-label a document as US-ASCII
one actually has to send it out as text/xml; charset="US-ASCII"
2) Define the absence of an explicit charset encoding in the MIME
header not as "US-ASCII" but as "use encoding in XML instance" in
accordance with the XML 1.0 Recommendation.
3) Encourage transcoding software to rewrite the internal encoding
declaration.
4) Make suitable transcoding software freely available so that the cost
of not complying with point 3 (write your own) is higher than the cost
of complying with it (use a pre-built one).
Please consider points 1 and 2 to be a defect report on RFC 2376.
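
Points 1 and 2 together amount to a simple precedence rule, sketched
here in Python. The function name proposed_charset is hypothetical. The
declaration sniffing assumes an ASCII-compatible encoding and ignores
BOM and UTF-16 detection, and the final UTF-8 fallback is the XML 1.0
default for a document with neither a BOM nor a declaration.

```python
import re
from email.message import Message

# Encoding name inside a leading XML declaration, matched on raw bytes.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def proposed_charset(content_type: str, body: bytes) -> str:
    """Explicit MIME charset wins; otherwise use the XML instance itself."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()
    if explicit:
        # Point 1: an override must be stated explicitly in the header.
        return explicit
    match = _DECL.match(body)
    if match:
        # Point 2: no charset parameter means "use encoding in XML instance".
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

A transcoding proxy can still override the declaration by sending an
explicit charset, but an author with no control over the server
configuration is no longer forced into US-ASCII.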
--
Chris
[1] http://www.w3.org/TR/REC-xml.html#charencoding
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)