multiple encoding specs (Re: IE5.0 does not conform to RFC2376)

Tue Apr 6 21:09:36 BST 1999

Rick Jelliffe wrote:
> 
> From: Chris Lilley <chris at w3.org>
> 
> >An alternative method for achieving the same result is to use a filter
> >(this can be done in Apache and in Jigsaw) which automatically emits the
> >correct charset parameter based on reading the encoding declaration in
> >the XML instance.
> 
> I think this is the approach that, ultimately, we all are hoping will be
> deployed.
> 
> We are having a variant of this at our site: for when serving XHTML 

Well, and good luck to you making it conform to all these requirements
*and* be processed correctly (for a given value of correct) by today's
HTML "implementations".

> one must
> 
> * make sure the HTML meta tag is correct (for HTML conformance)
> * make sure the XML encoding is correct (for XML conformance)
> * make sure the MIME charset is correct (for MIME conformance)
> 
> Three chances to get something wrong!  (And not forgetting that some
> HTML editors push the metatags or PI around, so you may end up with
> duplicated tags relating to character set inside the document.)

So, yes. One needs to be made the master and the others derived from
it, so that any derived duplicates can be deleted.

> Given that an XML processor may transcode the document without knowing
> the meanings of the elements (i.e., that the meta tag means something),
> the XML encoding has to have priority over the HTML meta tag value. 

Yes

> And
> given that proxies can transcode text/* files without knowing what
> kind of text it is (i.e., that it is XML, and so has a label), the MIME
> header has to have priority over the XML header PI.

But, 

Given that documents can be stored locally (this is, I think, still the
99% case for document authoring for example), then one can equally show
that the MIME charset parameter has to be derived from the XML encoding
declaration. Alternatively, transcode away but remember to alter the
encoding declaration. (This was my original proposal, although now I
think that auto-generating the MIME data from the document is the best
approach)
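
To make the "transcode away but remember to alter the encoding
declaration" option concrete, here is a rough sketch in Python. The
function name transcode, and the choice to add a declaration when none
is present, are my own assumptions for illustration; it also assumes the
caller already knows the source encoding.

```python
import re

# XML declaration at the very start of the document: a version, then an
# optional encoding pseudo-attribute (captured as group 2).
_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*(["\'])[^"\']*\1'
    r'(\s+encoding\s*=\s*(["\'])[A-Za-z][A-Za-z0-9._-]*\3)?'
)


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode an XML document and keep its encoding declaration in step."""
    text = data.decode(source)
    if text.startswith("\ufeff"):
        # Drop a byte order mark that survived decoding (e.g. UTF-8 with BOM).
        text = text[1:]
    match = _DECL.match(text)
    if match is None:
        # No XML declaration at all: add one that names the new encoding.
        text = f'<?xml version="1.0" encoding="{target}"?>\n' + text
    else:
        # Keep everything up to the end of the version pseudo-attribute,
        # then write a fresh encoding pseudo-attribute after it.
        cut = match.start(2) if match.group(2) else match.end()
        text = f'{text[:cut]} encoding="{target}"' + text[match.end():]
    return text.encode(target)
```

The point is only that the label and the bytes are changed in the same
step, so they cannot drift apart.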

> I think that is the
> logical order: generic operations must be allowed.

I guess I think that loading and saving documents is a generic process
too. Quick question - how many transcoding proxies are on your current
machine? How many on your server?

Now, how many locally stored XML documents are on your current machine?
How many on your server?

Thought so.

> However, it is all spoiled if there are systems which corrupt the
> labels: for example by rewriting the charset parameter incorrectly. It
> is far better to send the XML file without a charset parameter than to
> send it with a wrong one.

Yes, that was also my point - given that XML 1.0 Rec already has an
excellent description of how to read the encoding declaration, and given
that (as has been pointed out) it already has that machinery to deal
with application/xml, then use that declaration as the primary label.
For consistency and robustness, I would make the server also send that
information again as a MIME charset parameter (in the case of text/xml).
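
A minimal sketch of what such a server-side step could look like, in
Python for illustration only. The names declared_encoding and
content_type_for are hypothetical, and this is a simplification of the
autodetection described in the XML 1.0 Rec: it only looks at UTF-8 and
UTF-16 byte order marks and at ASCII-compatible declarations, and falls
back to the XML default of UTF-8. It is not the Apache or Jigsaw filter.

```python
import codecs
import re

# XML declaration at the start of the entity; group 1 is the declared
# encoding name, if an encoding pseudo-attribute is present.
_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def declared_encoding(head: bytes) -> str:
    """Return the encoding the document itself claims, from its first bytes."""
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The declaration is not readable as ASCII here; trust the BOM.
        return "UTF-16"
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    match = _DECL.match(head)
    if match and match.group(1):
        return match.group(1).decode("ascii")
    # No BOM and no encoding declaration: XML 1.0 says UTF-8.
    return "UTF-8"


def content_type_for(head: bytes, media_type: str = "text/xml") -> str:
    """Build a Content-Type value whose charset is derived from the document."""
    return f"{media_type}; charset={declared_encoding(head)}"
```

For example, content_type_for(b'<?xml version="1.0"
encoding="ISO-8859-1"?><doc/>') gives "text/xml; charset=ISO-8859-1".
The document stays the master copy of the label and the header is only
ever a derived duplicate.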

> >And if the charset parameter is present, then it should say the same
> >thing as the encoding declaration. The best way to ensure this is to
> >treat the XML encoding declaration as the primary metadata resource and
> >to programmatically derive the charset parameter from this; greater
> >robustness is at once achieved and also harmonisation of the MIME and
> >XML labelling.
> 
> Yes, with the exception that the XML encoding PI could itself be derived
> from internal data (e.g. in XHTML). 

Well, I will declare that one out of scope except to note a problem - if
there are multiple incompatible META tags - a problem you pointed out -
then which of them do you use?

> The problem isn't really "which should have precedence?" (because all
> systems will break somewhere, given the state of webservers and current
> awareness) as much as it is "how can we move the Web towards safety and
> interoperability, where the markup now available at each stage is made
> available to the next?"  I think the current XML media types for MIME
> gives the appropriate policy for charset preference (transcodability is
> one property of text/* which application/* must not have), but, as
> Chris is pointing out, mechanisms to set the MIME charset parameter from
> XML (and to overwrite it on delivery too) have yet to be put into
> place.

I am working on that. Jigsaw is a nice system to prototype on, and
Apache is a good target given its >50% market share.

> >However, I will point out that it is the consensus of the XML 1.0
> >Recommendation that I am respecting - and that the RFC does not, by
> >altering the meaning of the default encoding.  It could have been
> >harmonised with the XML REC; it was not.
> 
> I think the XML SIG and WG pretty much all had consensus on the RFC at
> the end, in full knowledge of XML 1.0.   But I think many of us came out
> of it thinking that it is safer to use application/xml.

So, the text/xml registration could have said "do not use this media
type for the following reasons".

> In particular, I think that a mismatch between the XML encoding
> declaration and the MIME charset (and the XHTML meta tag) should be some
> kind of weak Reportable User Error:

On the contrary, I think it should be a fatal error. You can't parse the
document if you don't know what a character is.
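
As a sketch of what treating the mismatch as fatal might mean in a
receiving application (Python for illustration; EncodingLabelMismatch
and agreed_encoding are made-up names, and comparing labels through
Python's codec registry is just one assumed way of deciding that two
names mean the same encoding):

```python
import codecs
from typing import Optional


class EncodingLabelMismatch(Exception):
    """The MIME charset and the XML encoding declaration disagree."""


def agreed_encoding(xml_encoding: str, mime_charset: Optional[str]) -> str:
    """Return the encoding to parse with, or refuse if the labels conflict."""
    if mime_charset is None:
        # No charset parameter was sent, so there is nothing to contradict.
        return xml_encoding
    # codecs.lookup() maps aliases (e.g. "latin1", "ISO-8859-1") to a single
    # codec name and raises LookupError for labels Python does not know.
    if codecs.lookup(xml_encoding).name != codecs.lookup(mime_charset).name:
        raise EncodingLabelMismatch(
            f"charset={mime_charset} but encoding declaration says {xml_encoding}"
        )
    return xml_encoding
```

Here agreed_encoding("ISO-8859-1", "latin1") is accepted because both
names resolve to the same codec, while agreed_encoding("ISO-8859-1",
"UTF-8") raises rather than guessing.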

--
Chris

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)