Content-Document-Type: was (Re: MIME types vs. DOCTYPE)

Mon Mar 1 20:09:54 GMT 1999

Greetings,

	Here I am speaking for myself, not the HTML Working Group or CNET:

> -----Original Message-----
> From: Walter Underwood [mailto:wunder at infoseek.com]
> Sent: Friday, February 26, 1999 9:42 AM
> To: xml-dev at ic.ac.uk
> Cc: www-html-editor at w3.org
> Subject: Re: Content-Document-Type: was (Re: MIME types vs. DOCTYPE)

<SNIP Content="off-topic"/>

> The objection about thin clients or palmtops not wanting to download
> large files doesn't really hold water. XML will generally be the 
> smallest files. Mine are almost always smaller than the corresponding
> HTML. Powerpoint, PDF, JPEG -- those are big files. 

	This is simply incorrect. The limited capabilities of thin clients
and the expense of transmitting the information require capabilities-based
analysis and profiling of documents on a per-client basis. As an example,
consider a web-enabled cellphone such as this one:
http://www.attws.com/business/pocketnet/index.html
The transmission costs to this device vary greatly worldwide, from
~$1/minute in the US to ~$22/minute in Nairobi (actually you can only get
basic cell phone service via satellite in Nairobi, but let's pretend). If I
send a half-megabyte XHTML file to this device, including its 100K CSS
stylesheet, the user is entirely justified in bringing legal action against
me. The page would cost many tens or hundreds of dollars to send, and of
course could not be displayed. In fact the client phone would necessarily
display an HTTP error message (or its equivalent) on the tiny screen. Not to
mention the cost of transmitting the inevitable ~12k banner ad, which again
cannot be displayed. (Information may want to be free, but information
providers want to get paid.)
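
	To put rough numbers on this, here is a minimal sketch of the
arithmetic. The per-minute rates are the ones quoted above; the link rate is
purely an assumption for illustration (not a measured figure for this
phone), and the function and constant names are hypothetical. The result
scales inversely with whatever link rate is actually achieved.

```python
def transfer_cost(size_bytes, link_bits_per_second, dollars_per_minute):
    """Airtime cost of moving size_bytes over a link billed per minute."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 60 * dollars_per_minute

# Half a megabyte in total, stylesheet included, as in the example above.
PAGE_BYTES = 512 * 1024

# ASSUMPTION: a nominal 9600 bit/s wireless data link. Real throughput
# may be lower, which only makes the bill larger.
ASSUMED_LINK_BPS = 9600

for place, rate in (("US", 1.0), ("Nairobi", 22.0)):
    cost = transfer_cost(PAGE_BYTES, ASSUMED_LINK_BPS, rate)
    print(f"{place}: ${cost:.2f} for a page that cannot be displayed")
```

Protocol overhead, retransmissions and the banner ad are ignored here, so
this is a lower bound under the stated assumption.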

	At this point in time, no method other than MIME types exists for
informing the client of the type of content arriving without first
downloading the entire file and then checking it, an obvious absurdity.
Doctypes, FPIs, etc. have all been suggested, but none of these solutions
provides the level of transaction control required to identify the content
prior to content reception. Given the massive costs involved, the client
must always be allowed to reject content prior to downloading the entire
file.
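
	As a sketch of what rejecting content before reception could look
like on the client side: the decision below uses only the response headers
(for example, those returned by an HTTP HEAD request), so no body bytes
need to cross the link. The accepted types, the size budget and the function
names are all hypothetical, chosen for illustration rather than taken from
any real device.

```python
# Hypothetical client policy: decide from the headers alone, before any
# of the body is read. All names and limits here are illustrative.
ACCEPTED_TYPES = {"text/html", "text/plain"}   # assumed device capabilities
MAX_BODY_BYTES = 8 * 1024                      # assumed per-page budget

def media_type(content_type):
    """Return the bare, lower-cased media type, dropping parameters
    such as charset."""
    return content_type.split(";", 1)[0].strip().lower()

def should_accept(headers):
    """Return True only if the declared type and size fit the device.

    `headers` is a plain dict keyed by the exact names "Content-Type"
    and "Content-Length"; a real client would match names
    case-insensitively.
    """
    if media_type(headers.get("Content-Type", "")) not in ACCEPTED_TYPES:
        return False
    length = headers.get("Content-Length", "").strip()
    if not length.isdigit():
        return False   # unknown size: refuse rather than risk the airtime
    return int(length) <= MAX_BODY_BYTES
```

A client using a policy like this would drop the connection (or never issue
the GET) whenever should_accept() returns False. It works only to the extent
that the server labels its content honestly.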

> Adding an XML-specific HTTP header line makes HTTP 1.1 more complex
> (shudder), and imposes an extra coding and testing burden on HTTP
> implementations. Also, it does nothing for XHTML over other 
> transports,
> like SMTP or FTP.

	It also introduces a new set of dependencies for all XML
documents. Not feasible.

> Essentially, this is document information, not protocol information. 
> It belongs in the document. To describe the document out-of-line, 
> use RDF, not HTTP headers.

	Thin clients will almost necessarily reject all RDF documents (and
most XML documents in general). RDF is complex and experimental; I am
unconvinced that a cell phone should have to deal with it.

> Pragmatically, HTTP Content-type isn't even reliable. Somebody will 
> decide that Excel and XML are the same thing, and start serving 
> spreadsheets as text/xml. Cell phones have to deal with that world, 
> and adding things to the HTTP spec doesn't fix ignorant sysadmins. 

True; unfortunate; costly for the victims; possibly legally actionable.

> XHTML Spec comment: the spec doesn't mention application/xml. 
> It should. 
> If application/xml is never appropriate for XHTML (say, the UTF-16
> encoding is forbidden), then say so.

	The XHTML spec is very clear on this, explicitly stating the MIME
types that can be used. No other MIME types are *ever* appropriate. With
MIME types being used for document type identification, sending a document
with the wrong MIME type guarantees an error.

> 
> XHTML Spec comment: Are the Strict, Transitional, and Frameset DTDs
> subsets or extensions? Or neither? Is one a subset of another? These
> intentions should be spelled out in the spec so that future versions
> won't break them.
> 

The 3 XHTML DTDs are neither subsets nor extensions in a literal sense. They
correspond as closely as possible to the HTML 4.0 DTDs of the same names.
While to some extent the 'strict' DTD is a subset of the other two, it also
uses different content models for elements with the same name. One could
not, for practical purposes, use it as an external subset and include the
frameset DTD as an internal DTD subset without conflict between their
content models. I will not attempt to justify the division of HTML into
these 3 groupings - this was decided by the HTML 4.0 committee and is
loosely justified by the HTML 4.0 specification. Current attempts are
designed to follow this existing prior art to the greatest extent possible.

Regards,

D-

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)