IE5.0 does not conform to RFC2376

MURATA Makoto murata at apsdc.ksp.fujixerox.co.jp
Wed Apr 7 07:40:53 BST 1999


Chris,

> It's good to see a concrete proposal. On the other hand, relying on a
> complex convention of filename suffixes is problematic:

I understand your concern.  However, Uchida-san's proposal is not an attempt 
to use a convention instead of the charset parameter.  It is intended 
to help provide the correct charset parameter.  I agree that there are some 
side effects that some people might object to.

> An alternative method for achieving the same result is to use a filter
> (this can be done in Apache and in Jigsaw) which automatically emits the
> correct charset parameter based on reading the encoding declaration in
> the XML instance. This can easily cache its results, and need not
> result in processing overhead on each request.

I strongly agree.   This is the best approach.  I sincerely hope that such 
an attempt will happen at W3C.

> > At *IETF*, the default of the charset parameter for text/HTML *is* 8859-1.
> 
> Yes, which is different to the default for text/* - this demonstrates
> that it is possible to give a more specific rule for a particular
> registration.

Actually, in the case of HTTP MIME, the default of the charset parameter of 
text/* is always ISO-8859-1.  In the case of real MIME, the default of 
the charset parameter of text/* is always US-ASCII.  text/html is not an exception.
text/xml is an exception, since its default is always US-ASCII, even over HTTP.  This was 
recommended by the IESG.
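
The defaulting rules described above can be sketched as a small lookup.  This 
is only an illustration: the function names and the "http"/"mime" transport 
labels are invented for the example, and media types outside text/* are 
deliberately left unresolved.  An explicit charset parameter, when present, 
takes priority over any default.

```python
from email.message import Message


def default_charset(media_type, transport):
    """Default charset when no charset parameter is given.

    transport is "http" or "mime" (hypothetical labels for this sketch).
    """
    media_type = media_type.lower()
    if media_type == "text/xml":
        return "us-ascii"      # the exception: US-ASCII on every transport
    if media_type.startswith("text/"):
        return "iso-8859-1" if transport == "http" else "us-ascii"
    return None                # not covered by this sketch


def effective_charset(content_type, transport):
    """Charset a recipient should assume for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()   # lower-cased, or None if absent
    if explicit is not None:
        return explicit
    return default_charset(msg.get_content_type(), transport)
```

For example, `effective_charset("text/html", "http")` gives `"iso-8859-1"`, 
while `effective_charset("text/xml", "http")` gives `"us-ascii"`.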

> > It is going to be very difficult or
> > impossible, since HTTP and MIME people will disagree.
> 
> I think you mean, HTTP and Mail(SMTP/IMAP/POP). MIME is used by both
> email and HTTP.

HTTP MIME is not quite the same as real MIME.  There are many differences 
between the two.

> > There has been a lot of discussion about this issue.  None of your arguments
> > are new to me.  In fact, my original opinion was not so different from yours but
> > I have changed my mind during the discussion.  More about this, see the archive
> > of the XML SIG (around April and May of 1998).
> 
> OK, I will check this out. I cannot of course discuss such material in
> this forum, however. Perhaps you could post your technical reasons for
> the change of direction here?

text/xml has to be consistent with HTTP and MIME.  Autodetection 
or the use of META tags as the default of the charset parameter has been 
extensively discussed by HTTP people and MIME people.  They strongly oppose both.

> But, if it is not present,
> then the XML Rec says exactly what should happen; 

Appendix F is non-normative.  RFC 2376 supersedes it, as intended by the 
XML WG.   XML 1.0 clearly says:

  "Rules for the relative priority of the internal label and the MIME-type 
  label in an external header, for example, should be part of the RFC document 
  defining the text/xml and application/xml MIME types.   ...  in particular, 
  when the MIME types text/xml and application/xml are defined, the recommendations 
  of the relevant RFC will supersede these rules."

By the way, now that RFC 2376 is published, XML 1.0 will be revised.

> careful wording which
> this RFC nullifies. Problems arise if an XML file is saved from the Web
> to a local filesystem, perhaps for further editing; the MIME charset
> information is lost. It could perhaps be stored in some way - but, there
> is already a standard way - the XML encoding declaration.

Since it is a standard way, RFC 2376 recommends that recipient programs 
rewrite encoding declarations.
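
A minimal sketch of such a rewrite, for a recipient that saves a received 
document to a local file: it relabels the XML declaration so that it agrees 
with the charset the body was actually received in.  The function name is 
hypothetical, and the byte-level approach assumes an ASCII-compatible charset; 
UTF-16 bodies would have to be decoded first.

```python
import re

# XML declaration: version, optional encoding, then anything else
# (such as standalone) up to the closing "?>".
_XML_DECL = re.compile(
    rb'^<\?xml(\s+version\s*=\s*(?:"[^"]*"|\'[^\']*\'))'
    rb'(\s+encoding\s*=\s*(?:"[^"]*"|\'[^\']*\'))?'
    rb'([^?]*)\?>'
)


def relabel_for_saving(body: bytes, charset: str) -> bytes:
    """Return body with its encoding declaration set to charset."""
    label = charset.encode("ascii")
    m = _XML_DECL.match(body)
    if m is None:
        # No declaration at all: prepend one carrying the charset.
        return b'<?xml version="1.0" encoding="' + label + b'"?>\n' + body
    new_decl = (b"<?xml" + m.group(1)
                + b' encoding="' + label + b'"'
                + m.group(3) + b"?>")
    return new_decl + body[m.end():]
```

The bytes of the document itself are left untouched; only the label changes, 
so the saved file stays self-describing once the MIME header is gone.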

> And if the charset parameter is present, then it should say the same
> thing as the encoding declaration. 

This disallows code conversion by proxy servers.  One could argue 
that proxy servers should rewrite encoding declarations.  However, 
documents should not be rewritten for security reasons.  Moreover, 
if we require different code conversion for different subtypes of text, 
there is not much hope for interoperability, especially because 
fallback to text/plain is required.

> The best way to ensure this is to
> treat the XML encoding declaration as the primary metadata resource and
> to programmatically derive the charset parameter from this;  greater

If that is done when the document is stored on the WWW server, that would be 
superb.

> However, I will point out that it is the consensus of the XML 1.0
> Recommendation that I am respecting - and that the RFC does not, by
> altering the meaning of the default encoding. It could have been
> harmonised with the XML REC; it was not. 

RFC 2376 IS the consensus (it was not unanimous, though).  It is based 
on really extensive discussion at the XML SIG and XML WG.  My mail 
folder named text/xml has 687 e-mails ;-(   Larry Masinter (the HTTP WG 
chair) and Martin Duerst (the I18N IG chair) were heavily involved.  On 
the other hand, the appendix in XML 1.0 is merely informative and was meant 
to be replaced by the XML media type RFC.

Cheers,

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata at apsdc.ksp.fujixerox.co.jp




