IE5.0 does not conform to RFC2376

MURATA Makoto murata at apsdc.ksp.fujixerox.co.jp
Wed Mar 24 02:04:26 GMT 1999


XML requires a draconian approach.  100% interoperability for conformant 
implementations is most important.  Very low interoperability 
for non-conformant implementations is acceptable.

As for HTML, users see corrupted documents when the browser chooses 
an incorrect encoding.  Then, they can tell the correct encoding to 
the browser.  Thus, it might not be a bad idea to provide 80% interoperability 
for conformant implementations and 50% interoperability for non-conformant 
implementations.  The heuristics in HTML 4.0 are based on such an assumption, 
as I see it.

As for XML, recipients of XML might be programs or database systems.  
In the worst case, corrupted documents will contaminate the entire 
database.  A single XML document on the WWW may destroy XML-aware 
search engines.  Hence, I believe that we need a draconian approach; we 
have to ensure 100% interoperability for conformant implementations.  
Ideally, it should be possible to point out non-conformant data and 
implementations.  expat sometimes detects incorrect charsets.
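
To illustrate the draconian approach, here is a minimal Python sketch 
(not taken from any RFC or from expat; the function name decode_strictly 
is hypothetical).  It rejects a document outright when its bytes are not 
valid in the stated charset, instead of guessing another encoding.

```python
def decode_strictly(body: bytes, charset: str) -> str:
    """Decode body with the stated charset, or refuse the document.

    Hypothetical helper: no fallback encoding is ever tried, so a
    mislabeled document is rejected instead of being stored corrupted.
    """
    try:
        return body.decode(charset, errors="strict")
    except (LookupError, UnicodeDecodeError) as exc:
        # LookupError: unknown charset name; UnicodeDecodeError: bad bytes.
        raise ValueError(f"rejecting document: not valid {charset!r}") from exc
```

A check of this kind catches only some mislabeled documents, because 
many byte sequences are valid in more than one charset.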

HTTP/1.1 quite clearly says that the charset parameter is authoritative.  
If RFC 2376 had said something different, interoperability for 
conformant implementations would have been destroyed.

Chris Lilley wrote:
> 
> 
> MURATA Makoto wrote:
> > 
> > I believe that IE 5.0 does not conform to RFC2376 (XML Media Types),
> > of which I am a co-author.
> > 
> > As for the XML media type "text/xml", the charset parameter in the
> > MIME header is authoritative.  Encoding declarations have to be ignored
> > so that transcoding is possible.
> 
> So, if the file is saved to some local browser cache and then re-read,
> it may have no MIME header so the encoding declaration is then
> authoritative.

The same thing applies to HTML.  The cache must have MIME headers as well.

> Why can't the transcoding proxy also rewrite the encoding declaration,
> since it is rewriting the file anyway? It is trivially easy to find,
> process, and change.

For security reasons, transcoding proxies should not rewrite documents.  
Moreover, if we mandate embedded encoding signatures for HTML, XML, CSS, 
etc., I18N of flat text will become impossible.  
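
As a rough illustration of transcoding without rewriting, the Python 
sketch below converts the bytes of a text/xml entity and updates only 
the charset parameter.  The function name transcode and the sample 
document are hypothetical, and other Content-Type parameters are dropped 
for brevity.

```python
from email.message import Message


def transcode(body: bytes, content_type: str, target: str) -> tuple[bytes, str]:
    """Re-encode body into target and return (new_body, new_content_type).

    Hypothetical sketch: the markup is treated as opaque text, so any
    encoding declaration inside it is left exactly as it was.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # For text/xml, a missing charset parameter means US-ASCII.
    source = msg.get_param("charset", "us-ascii")
    text = body.decode(source)
    new_type = f"{msg.get_content_type()}; charset={target}"
    return text.encode(target), new_type


original = '<?xml version="1.0" encoding="Shift_JIS"?><t>日本語</t>'.encode("shift_jis")
new_body, new_type = transcode(original, "text/xml; charset=shift_jis", "utf-8")
```

After the call, new_body is UTF-8 but still carries the old encoding 
declaration.  A recipient that trusts the charset parameter reads it 
correctly; one that trusts the declaration does not.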

I had believed that there was a consensus in the W3C team, and I am quite 
puzzled by your response.  You might want to speak with Martin Duerst.  

> I imagine that someone could take some generic charset-converting code
> and make an XML-aware transcoding servlet that rewrote the encoding
> declaration in about what, an hour? If someone does this, I will see
> about getting it included in the next Jigsaw version.

Please don't do that.
 
> > However, IE 5.0 appears to always ignore the charset parameter and use
> > the BOM or encoding declaration only.  Therefore, IE 5.0 does not conform to
> > RFC 2376.
> 
> Okay. But does RFC 2376 conflict with the XML 1.0 Recommendation?

As John Cowan pointed out, it does not.

> > When the charset parameter is not specified, it is assumed to be US-ASCII. 
> 
> Wow. So, what this RFC says is that, when used in email and on HTTP, the
> encoding declaration is *always ignored*.

If the media type is text/xml, yes.  As for application/xml, we use 
the procedure in Appendix F of XML 1.0.
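
The rule can be sketched in Python as follows.  This is only an 
illustration: the function names are hypothetical, the autodetection is 
a much-simplified stand-in for Appendix F (UTF-16 byte order marks and 
an ASCII-readable encoding declaration only), and using the charset 
parameter of application/xml when it is present is my reading of the 
RFC rather than something stated above.

```python
import re
from email.message import Message

# Matches an encoding declaration readable as ASCII at the start of the entity.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(body: bytes) -> str:
    """Simplified stand-in for XML 1.0 Appendix F (not the full procedure)."""
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when nothing else is signalled


def effective_charset(content_type: str, body: bytes) -> str:
    """Pick the charset a recipient should use for an XML entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")
    if media_type == "text/xml":
        # The charset parameter is authoritative; the declaration is ignored.
        return charset.lower() if charset else "us-ascii"
    if media_type == "application/xml":
        # Assumption: an explicit charset parameter wins; otherwise sniff.
        return charset.lower() if charset else sniff_xml_encoding(body)
    raise ValueError(f"not an XML media type: {media_type}")
```

For a body that begins with an encoding declaration naming Shift_JIS, 
effective_charset("text/xml", body) still returns "us-ascii", while 
effective_charset("application/xml", body) returns "shift_jis".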
 
> That is a pretty big change and, frankly IMHO, ill-advised.

Frankly, I am quite surprised that a W3C team member says such a thing 
in a public place after an RFC is published.  

Chris Lilley wrote:
> 
> Correction: if you are the *administrator* of an Apache server. One of
> the ways in which the Web has changed over the last 5 years is that the
> percentage of Web authors who also administer the site that they serve
> from has dropped from a substantial majority to an insignificant
> minority.

Are you aware of the "AddCharset" patch developed by W3C Keio?  It 
allows casual users to configure Apache.  Please contact Koga-san at 
W3C Keio (y-koga at ccs.mt.nec.co.jp).

Chris Lilley wrote:
> Please consider points 1 and 2 to be a defect report on RFC2376

These points are clearly in conflict with HTTP/1.1.

Cheers,



Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata at apsdc.ksp.fujixerox.co.jp

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



