IE5.0 does not conform to RFC2376

MURATA Makoto murata at apsdc.ksp.fujixerox.co.jp
Wed Mar 24 02:28:30 GMT 1999


Chris Lilley wrote:
> > Unfortunately this is a side effect of the rules for the media type
> > "text/*", which says that the default value of "charset" is always US-ASCII.
> 
> The default applies only if no other rule is in place for a specific
> media type. The registration for text/xml can override this behaviour
> if it wishes to.

HTTP/1.1 (RFC2068 and the latest "Draft Standard") quite clearly says:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value.

Here, the default is 8859-1 ;-(

The latest I-D for RFC2376 also said that the default is ISO-8859-1 when the
XML document is transmitted over HTTP. However, the IESG requested US-ASCII
as the default:

> > IESG discussed the document today that defines the text/xml media type.
> > We note that it continues the practice of text/plain where the default
> > charset is iso-8859-1 if transported over HTTP, but us-ascii if
> > transported over SMTP.
> >
> > This inconsistency was a result of a wide deployment of HTTP
> > implementations that did not properly follow the MIME spec.
> > Having one media type which is used inconsistently between HTTP
> > and SMTP is bad enough, but we don't want to continue this practice
> > for new media types.  Inconsistencies between HTTP and SMTP
> > usage make it more difficult to gateway between HTTP and email,
> > or to use HTTP to access email contents.
> >
> > We suggest to have the charset parameter default to US-ASCII regardless
> > of transport, and strongly recommend that the parameter always be
> > supplied by senders.  (If the sender is unsure whether the charset
> > is US-ASCII or ISO-8859-1, it can safely label it as ISO-8859-1,
> > since the former is a subset of the latter).
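
As a rough illustration of the rule the IESG asked for, here is a minimal
Python sketch. The function name effective_charset is hypothetical, and the
sketch assumes that only the Content-Type header value is available. It
returns the explicit charset parameter if one is present, and otherwise falls
back to US-ASCII regardless of transport.

```python
from email.message import Message


def effective_charset(content_type_header: str) -> str:
    """Return the charset that applies to a text/xml entity.

    Hypothetical helper: an explicit charset parameter wins; if it is
    absent, the default is US-ASCII, whether the entity arrived over
    HTTP or SMTP.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_param("charset")  # None when the parameter is absent
    if charset is not None:
        return str(charset).lower()
    return "us-ascii"


print(effective_charset("text/xml"))
print(effective_charset('text/xml; charset="UTF-8"'))
```

A sender that always supplies the parameter, as the IESG recommends, never
reaches the fallback branch.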


Chris Lilley wrote:
> So, in consequence: example files such as the Chinese XML examples at
> http://xml.ascc.net/xml/test/index.html (where each example is available
> in UTF-8, Big5 and GB2312, all correctly labelled in the XML encoding
> declaration) are now sets of invalid XML files which are required to
> produce a critical error because of the invalid byte sequences in what
> is now described as a US-ASCII file?

Yes. Conformant XML parsers must report a fatal error. This is great, since
non-conformant data can always be detected.
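
To make the detection concrete, here is a small Python sketch. It is only an
illustration, not a parser: the name is_decodable is hypothetical, and it
assumes that the charset parameter (or the US-ASCII default) decides how the
bytes are read, while the encoding declaration inside the document is not
consulted.

```python
from typing import Optional


def is_decodable(body: bytes, charset_param: Optional[str] = None) -> bool:
    """Check whether the bytes are legal in the charset that applies.

    Hypothetical helper: with no charset parameter the entity is
    treated as US-ASCII, so any byte above 0x7F is an error. A real
    XML parser would report a fatal error at that point.
    """
    charset = charset_param or "us-ascii"
    try:
        body.decode(charset, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# A Big5 document whose encoding declaration is correct.
big5_doc = '<?xml version="1.0" encoding="Big5"?><t>\u4e2d\u6587</t>'.encode("big5")

print(is_decodable(big5_doc))          # sent as plain "text/xml"
print(is_decodable(big5_doc, "big5"))  # sent as "text/xml; charset=Big5"
```

The same bytes are rejected when sent without a charset parameter and
accepted once the sender labels them, which is the behaviour described above.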

Examples of conformant XML documents are available at: 
http://www.fxis.co.jp/DMS/sgml/xml/charset/

Cheers,

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata at apsdc.ksp.fujixerox.co.jp
