IE5.0 does not conform to RFC2376
MURATA Makoto
murata at apsdc.ksp.fujixerox.co.jp
Wed Mar 24 02:04:26 GMT 1999
XML requires a draconian approach. 100% interoperability for conformant
implementations is most important. Very low interoperability
for non-conformant implementations is acceptable.
As for HTML, users see corrupted documents when the browser chooses
an incorrect encoding. They can then tell the browser the correct
encoding. Thus, it might not be a bad idea to provide 80% interoperability
for conformant implementations and 50% interoperability for non-conformant
implementations. The heuristics in HTML 4.0 are based on such an assumption,
as I see it.
As for XML, recipients of XML might be programs or database systems.
In the worst case, corrupted documents will contaminate the entire
database. A single XML document on the WWW may destroy XML-aware
search engines. Hence, I believe that we need a draconian approach; we
have to ensure 100% interoperability for conformant implementations.
Ideally, it should be possible to point out non-conformant data and
implementations. expat sometimes detects incorrect charsets.
HTTP/1.1 quite clearly says that the charset parameter is authoritative.
If RFC 2376 had said something different, interoperability for
conformant implementations would have been destroyed.
Chris Lilley wrote:
>
>
> MURATA Makoto wrote:
> >
> > I believe that IE 5.0 does not conform to RFC2376 (XML Media Types),
> > of which I am a co-author.
> >
> > As for the XML media type "text/xml", the charset parameter in the
> > MIME header is authoritative. Encoding declarations have to be ignored
> > so that transcoding is possible.
>
> So, if the file is saved to some local browser cache and then re-read,
> it may have no MIME header so the encoding declaration is then
> authoritative.
The same thing applies to HTML. The cache must have MIME headers as well.
> Why can't the transcoding proxy also rewrite the encoding declaration,
> since it is rewriting the file anyway? It is trivially easy to find,
> process, and change.
For security reasons, transcoding proxies should not rewrite documents.
Moreover, if we mandate embedded encoding signatures for HTML, XML, CSS,
etc., I18N of flat text will become impossible.
I had believed that there was a consensus in the W3C team, and I am quite
puzzled by your response. You might want to speak with Martin Duerst.
> I imagine that someone could take some generic charset-converting code
> and make an XML-aware transcoding servlet that rewrote the encoding
> declaration in about what, an hour? If someone does this, I will see
> about getting it included in the next Jigsaw version.
Please don't do that.
> > However, IE 5.0 appears to always ignore the charset parameter and use
> > the BOM or encoding declaration only. Therefore, IE 5.0 does not conform to
> > RFC 2376.
>
> Okay. But does RFC 2376 conflict with the XML 1.0 Recommendation?
As John Cowan pointed out, it does not.
> > When the charset parameter is not specified, it is assumed to be US-ASCII.
>
> Wow. So, what this RFC says is that, when used in email and on HTTP, the
> encoding declaration is *always ignored*.
If the media type is text/xml, yes. As for application/xml, we use
the procedure in Appendix F of XML 1.0.
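
The rule described above can be sketched in Python as follows. This is
only an illustration, not code from RFC 2376 or from any browser: the
function names are hypothetical, the detection step is a much simplified
stand-in for Appendix F of XML 1.0 (it handles only byte order marks and
ASCII-compatible encoding declarations), and it assumes that an explicit
charset parameter also wins for application/xml.

```python
import re
from email.message import Message

# Encoding declaration at the very start of an ASCII-compatible entity.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def parse_content_type(header):
    """Return (media_type, charset or None) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_type(), msg.get_content_charset()


def sniff_encoding(body):
    """Simplified stand-in for XML 1.0 Appendix F (BOM, then declaration)."""
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def effective_encoding(content_type, body):
    """Pick the encoding a recipient should use for an XML entity."""
    media_type, charset = parse_content_type(content_type)
    if charset:
        # The charset parameter is authoritative; the declaration is ignored.
        return charset
    if media_type == "text/xml":
        # No charset parameter on text/xml: assume US-ASCII.
        return "us-ascii"
    if media_type == "application/xml":
        # Assumption: fall back to Appendix F style detection.
        return sniff_encoding(body)
    raise ValueError("not an XML media type: " + media_type)
```

Note that for text/xml the body is never inspected, which is what makes
it possible for a proxy to transcode the entity without rewriting it.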
> That is a pretty big change and, frankly IMHO, ill-advised.
Frankly, I am quite surprised that a W3C team member would say such a
thing in public after the RFC has been published.
Chris Lilley wrote:
>
> Correction: if you are the *administrator* of an Apache server. One of
> the ways in which the Web has changed over the last 5 years is that the
> percentage of Web authors who also administer the site that they serve
> from has dropped from a substantial majority to an insignificant
> minority.
Are you aware of the "AddCharset" patch developed by W3C Keio? It
allows casual users to configure Apache. Please contact Koga-san at
W3C Keio (y-koga at ccs.mt.nec.co.jp).
Chris Lilley wrote:
> Please consider points 1 and 2 to be a defect report on RFC2376
These points are clearly in conflict with HTTP/1.1.
Cheers,
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata at apsdc.ksp.fujixerox.co.jp