text/xml vs. application/xml

David Megginson ak117 at freenet.carleton.ca
Sat Dec 20 18:32:28 GMT 1997

MURATA Makoto writes:

 > >  http://www.microstar.com/XML/donne.xml
 > >  http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

 > As a co-editor of an (upcoming) RFC for text/xml and
 > application/xml, I think that I should point out the correct
 > procedure for encoding determination.  (I have not checked these
 > two Web sites, and Ælfred.)

Thank you very much for the information.  Currently, both of these web
servers return "application/octet-stream" as the MIME type for *.xml
and *.dtd files: in this case, is it correct for an XML parser to fall
back on other character-encoding detection techniques, as Ælfred does?
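A minimal sketch of the kind of fallback detection in question, for the case where the transport says nothing useful about the encoding. It is not Ælfred's actual code: the function name sniff_encoding is hypothetical, and the byte patterns are the well-known byte-order-mark and "<?" signatures for UTF-8 and UTF-16 only (UCS-4 and EBCDIC are left out for brevity).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible byte stream.
_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_encoding(data: bytes) -> str:
    """Guess an XML entity's encoding from its first bytes (hypothetical helper)."""
    # 1. A byte order mark, if present, settles the question.
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # 2. No BOM: "<?" stored as 16-bit units gives the byte order away.
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    # 3. ASCII-compatible start: trust the encoding declaration if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").upper()
    # 4. Neither BOM nor declaration: assume UTF-8.
    return "UTF-8"
```

A parser could call sniff_encoding() on the first few dozen bytes of the entity and then reopen the stream with the resulting decoder.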

 > For those XML documents transmitted by the HTTP protocol, XML parsers 
 > should use the charset parameter of the media type text/xml  (BTW, 
 > the default of this parameter is 8859-1).  XML parsers should ignore
 > the encoding declaration within XML documents transmitted by HTTP.  
 > More about this, see the XML PR and the HTTP/1.1

I have two important queries:

1) Are you certain that ignoring the encoding declaration is
   conforming behaviour?  It seems to me that it would make more sense
   to report an error if the charset parameter and the encoding
   declaration differ (especially since the PR requires any document
   without a BOM or encoding declaration to be in UTF-8).

2) Why pick a default encoding that conforming XML parsers are not
   required to support?  Ælfred does accept encoding="ISO-8859-1", but
   some other parsers do not.  It seems to me that either the RFC or
   the PR needs to be amended.
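To make the first query concrete, here is a hedged sketch of the stricter behaviour: the charset parameter still wins for text/xml, but a conflicting encoding declaration is reported instead of being silently ignored. The names resolve_encoding and EncodingConflict are hypothetical, the ISO-8859-1 default is taken from the quoted message rather than from any published RFC, and comparing the declaration against that default when no charset is sent is an assumption of this sketch.

```python
from email.message import Message


class EncodingConflict(ValueError):
    """Raised when the HTTP charset and the XML encoding declaration differ."""


def resolve_encoding(content_type_header, declared=None):
    """Pick the encoding for an XML entity received over HTTP (hypothetical helper).

    content_type_header: raw Content-Type value, e.g. 'text/xml; charset=UTF-8'
    declared: encoding named in the XML declaration, or None if absent
    """
    # Reuse the standard library's MIME header parser.
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()    # lower-cased, e.g. 'text/xml'
    charset = msg.get_content_charset()    # lower-cased, or None

    if media_type != "text/xml":
        # Not text/xml: the transport has no say in this sketch, so the caller
        # falls back on the declaration (None means "autodetect").
        return declared

    if charset is None:
        # Assumption: default quoted in the message above.
        charset = "iso-8859-1"

    if declared is not None and declared.lower() != charset:
        raise EncodingConflict(
            f"charset={charset!r} but document declares {declared!r}"
        )
    return charset
```

With the quoted default, a UTF-8 document served as plain text/xml with no charset parameter would be rejected by this check, which is exactly the situation the second query is worried about.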

I can also anticipate a different problem: few private individuals (as
opposed to companies or organisations) have any control at all over
what their HTTP servers send out.

Imagine an exchange student at a big American university who wants to
publish a UTF-8 or UCS-2 Arabic XML text in her personal web space.
She will have a very hard time even finding out who is in charge of
the university's HTTP server (if she knows what an HTTP server is),
and she will probably have graduated before the university's
administration gets around to letting the webmaster look into
reporting the correct encoding for her document.

In the end, it looks like application/xml is a _much_ better choice
than text/xml -- with Ælfred, I have found that I can do a very good
job autodetecting character encoding, and I imagine that other parser
writers will find the same.

All the best,


David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
