text/xml vs. application/xml

Mon Dec 22 03:36:24 GMT 1997

> From: David Megginson <ak117 at freenet.carleton.ca>

> I strongly support Rick's suggestion for application/xml, partly
> because it will avoid the requirement to make several last-minute
> changes to the PR, and partly because it will save XML from being
> trapped by some of the same constraints as HTML.  If typical (private)
> users cannot post XML documents in their web space in languages other
> than English, then the whole effort will be at least a partial
> failure.

No, XML the way it is will be a partial success, and a complete 
success for most scripts and most systems :-)

And people will avoid doing things that break, so eventually mistakes 
will be learned from. This is not as bad a thing as might be expected:
look how much consensus there is on most of XML. Many people do not
trust experts when the experts come from a slightly different domain
(e.g. SGML people's expertise in electronic publishing was previously
sometimes regarded as being off-topic to web publishing, whereas in fact
there is a good deal of overlap).

This is only natural, and the way humans are. As a methodology, it is
a way to get the simplest possible system: start with the easiest, see
where it breaks, fix it.  

This is like a child playing with a knife: a parent may think themselves
wise to allow a child to play with the knife to discover how sharp it is,
but that parent is not being very far-sighted. 

So we shouldn't despair if people are not ready to accept what I am
saying (and I think Gavin Nicol and most people who have been trying to
come up with useful solutions agree on this), which is that

* all documents must be adequately labelled with "prime metadata" 
(i.e., all the information needed to process the information without 
inexact heuristics), and 

* all this prime metadata must be kept with the document at all stages 
of its transmission, in whatever form.

This is why text/xml falls down (broken as designed, as someone might say)
if any intermediate transcoders do not rewrite the MIME headers correctly.
Point-to-point protocols allow makers of intermediate WWW systems to
fiddle with the prime metadata in unpleasant ways.  However, there is no
guarantee that generators of XML will get the encoding correct in the
first place (in which case a guess by the server based on locale may
give better results anyway).  Nevertheless, I think end-to-end service is 
far better in this regard, especially if we need to transmit database
material. 

For example, suppose a database record is sent as

<person>Pavel Ha*ek</person>

where the * character is the c with hacek in 8859-2 (&#*c8;), but
the record is blindly re-labelled as being 8859-1. In that case the
* is E grave, which will be quite wrong. If there is a chance of 
intermediate systems around the world relabelling MIME character
sets inappropriately, then that is a big problem for text/xml.
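
The mislabelling can be reproduced in a few lines of Python. This is
only a sketch: it assumes the * stands for the single byte 0xC8, and
the names record, as_sent and as_relabelled are illustrative.

```python
# Hypothetical record; 0xC8 is assumed to be the byte behind the "*".
record = b"<person>Pavel Ha\xc8ek</person>"

# Decoded with the charset the sender actually used.
as_sent = record.decode("iso-8859-2")

# Decoded after an intermediate system relabels the same bytes as 8859-1.
as_relabelled = record.decode("iso-8859-1")

print(as_sent)
print(as_relabelled)
print(as_sent == as_relabelled)  # the two readings disagree
```

The bytes are identical in both cases; only the label differs, and
that alone turns the C with hacek (U+010C) into E grave (U+00C8).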

Note that in the above example, a transcoder would also stuff up
the data, unless it was smart enough to know that the file was XML,
and so put in the correct numeric character reference.
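
What such an XML-aware transcoder might do can be sketched with
Python's standard xmlcharrefreplace error handler. The assumptions
here are that the source charset is known to be 8859-2, the target is
8859-1, and 0xC8 is the byte in question; the function name
transcode_xml_aware is hypothetical.

```python
def transcode_xml_aware(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode XML text, writing any character the target charset
    cannot represent as a numeric character reference."""
    text = data.decode(src)
    return text.encode(dst, errors="xmlcharrefreplace")


# Hypothetical record; 0xC8 is assumed to be the byte behind the "*".
record = b"<person>Pavel Ha\xc8ek</person>"
print(transcode_xml_aware(record, "iso-8859-2", "iso-8859-1"))
# prints b'<person>Pavel Ha&#268;ek</person>'
```

This only covers character data. A real transcoder would also have to
rewrite any encoding declaration in the document, and could not use
character references inside element or attribute names.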

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/