Content-Document-Type: was (Re: MIME types vs. DOCTYPE)

MURATA Makoto murata at apsdc.ksp.fujixerox.co.jp
Fri Feb 26 13:25:33 GMT 1999


I am a co-author of RFC 2376 (XML media types).   I am attaching two 
of my e-mails about text/xml and application/xml.

I am quite sympathetic to Jonathan, but I do not think that the URI 
of the DTD is always appropriate.  Tim's suggestion (a namespace URI plus 
the root element type) sounds very interesting. 

Since so many people (MIME, HTTP, XHTML, XML, ...) are very much interested, 
I am not sure if xml-dev is the right place to make the final call.  I still 
do not know where the right place is.  But I promise to read the discussion 
on xml-dev carefully.

Cheers,

Fuji Xerox Information Systems

Makoto
----------------------------------------------------------------------
Mail 1.

Regarding additional parameters for text/xml and application/xml, there 
has been some private discussion between Paul Hoffman, Frank Dawson 
and me (see Mail 2).

I have come to like the idea of adding two optional parameters to text/xml and 
application/xml.  They are "profile" and "map".  The "profile" parameter specifies 
a URI, which indicates the profile (e.g., XHTML, SMIL, and MathML) of the XML 
document.  This URI may coincide with the URI of the DTD but may differ.  
By introducing this parameter, W3C can avoid the task of registering many MIME 
subtypes.
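A minimal sketch of how a MIME parser might dispatch on the proposed "profile" parameter, using Python's standard email.message module. The profile URIs and handler names are assumptions for illustration only (the XHTML and MathML namespace URIs are borrowed as stand-in profile values); RFC 2376 defines no such parameter.

```python
from email.message import Message

# Hypothetical profile URIs mapped to hypothetical handler names.
HANDLERS = {
    "http://www.w3.org/1999/xhtml": "xhtml-browser",
    "http://www.w3.org/1998/Math/MathML": "mathml-renderer",
}

def make_header(media_type: str, profile: str) -> Message:
    """Build a Content-Type header carrying the proposed "profile" parameter."""
    msg = Message()
    msg["Content-Type"] = media_type
    msg.set_param("profile", profile)  # quoted automatically when needed
    return msg

def choose_handler(msg: Message) -> str:
    """Pick an application from the profile URI, else a generic XML tool."""
    if msg.get_content_type() not in ("text/xml", "application/xml"):
        return "not-xml"
    profile = msg.get_param("profile")  # None when the parameter is absent
    return HANDLERS.get(profile, "generic-xml-processor")

msg = make_header("application/xml", "http://www.w3.org/1999/xhtml")
print(msg["Content-Type"], "->", choose_handler(msg))
```

The point of the design is that an unknown profile still lands on a generic XML processor, because the media type itself stays text/xml or application/xml.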

The "map" parameter specifies the conversion table from the specified 
charset to Unicode.  It appears that most existing charsets 
have more than one possible conversion table, and it is a good idea to solve 
this conversion issue in one place. 
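To make the "map" problem concrete: Python ships two different conversion tables for Shift_JIS, the "shift_jis" codec (based on the JIS X 0208 mapping) and the "cp932" codec (Microsoft's table). With these two tables, the byte pair 0x81 0x60 decodes to U+301C WAVE DASH or to U+FF5E FULLWIDTH TILDE, depending on which table is chosen. The map names below are hypothetical values for the proposed parameter.

```python
# Hypothetical values for the proposed "map" parameter, each naming one
# conversion table from charset=Shift_JIS to Unicode (both are Python codecs).
SJIS_MAPS = {
    "jis-x0208": "shift_jis",    # table derived from the JIS X 0208 mapping
    "microsoft-cp932": "cp932",  # Microsoft's code page 932 table
}

def decode_with_map(data: bytes, map_name: str) -> str:
    """Decode Shift_JIS bytes with the conversion table named by map_name."""
    return data.decode(SJIS_MAPS[map_name])

wave = b"\x81\x60"  # one Shift_JIS character whose Unicode mapping differs
for name in SJIS_MAPS:
    print(name, "U+%04X" % ord(decode_with_map(wave, name)))
```

Without a "map" parameter, the receiver has to guess which of the two tables the sender meant.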

Issue 1: A long time ago, Michael said that having special types (e.g., text/xhtml) 
helps negotiation.  Does the "profile" parameter make such negotiation impossible?

Issue 2: Some classes of XML applications might require even more 
information.  (Another parameter was proposed by Frank Dawson.)

Issue 3: Confusion between the namespace URI, schema URI, DTD URI, 
and this URI.

Issue 4: Adding the "map" pseudo-attribute to encoding declarations.

Issue 5: Do we need another parameter whose value is either "external dtd subset", 
"external parameter entity", "external parsed entity", or "document entity"?  
(Note that some MIME entities are external parsed entities AND document 
entities at the same time.  Some external DTD subsets can also be used as 
external parameter entities.)
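Issue 5 lists the possible values as alternatives ("either ..."), but the note says one entity can play two roles at once. A sketch, assuming a hypothetical comma-separated parameter value, that therefore treats the value as a set of roles rather than a single choice:

```python
from enum import Flag, auto

class EntityRole(Flag):
    """Roles a MIME entity might play (the four values listed in Issue 5)."""
    EXTERNAL_DTD_SUBSET = auto()
    EXTERNAL_PARAMETER_ENTITY = auto()
    EXTERNAL_PARSED_ENTITY = auto()
    DOCUMENT_ENTITY = auto()

def parse_roles(value: str) -> EntityRole:
    """Parse a hypothetical value like "external parsed entity, document entity"."""
    roles = EntityRole(0)
    for token in value.split(","):
        name = token.strip().upper().replace(" ", "_")
        roles |= EntityRole[name]  # KeyError for an unknown role name
    return roles

print(parse_roles("external parsed entity, document entity"))
```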

Cheers,

Makoto


-------------------------------------------------------------------------
Mail 2

Needs more information in the MIME header!

MURATA Makoto, Paul Hoffman, Frank Dawson, Jim Whitehead

1. Problem statement

We would like the MIME parser to be able to dispatch different sorts
of XML documents to different applications, such as specialized
programs that handle just one type of XML document.  Because MIME
parsers do not look inside the MIME parts, identifying the sort of
documents must be done in the MIME headers.  However, neither text/xml
nor application/xml allows such information.

2. Possible solutions

Three approaches have been proposed.  They are (1) specialized media
types such as text/calendar, (2) a top-level media type xml and its
subtypes such as xml/calendar, and (3) a new parameter "externalid" of
text/xml and application/xml.

(1) Specialized mime types

For each specialized application of XML, we introduce a new subtype.
It may introduce more parameters and might even have some added
security considerations.  This is the approach that has been assumed by
the authors of RFC2376.

Pros:

Each application will be documented by some RFC.

Cons: 

When the MIME parser does not know of such a subtype, the only
available fallback is text/plain or application/octet-stream.  That
is, the MIME parser cannot invoke generic XML parsers/viewers, but has
to display the document as a plain text file or save the document in a
file.
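The fallback described above follows RFC 2046: an unrecognized subtype of "text" is treated as text/plain, an unrecognized "multipart" subtype as multipart/mixed, and other unrecognized types as application/octet-stream. A simplified sketch (the set of known types is an assumption) shows that a receiver which does not know a specialized subtype never reaches an XML processor:

```python
# Media types this hypothetical receiver has handlers for.
KNOWN_TYPES = frozenset({"text/plain", "text/xml", "application/xml",
                         "application/octet-stream", "multipart/mixed"})

def effective_type(media_type: str, known=KNOWN_TYPES) -> str:
    """Return the media type the receiver will actually handle (RFC 2046 defaults)."""
    mt = media_type.split(";", 1)[0].strip().lower()
    if mt in known:
        return mt
    top = mt.split("/", 1)[0]
    if top == "text":
        return "text/plain"            # unknown text/* subtype
    if top == "multipart":
        return "multipart/mixed"       # unknown multipart/* subtype
    return "application/octet-stream"  # everything else

for t in ("text/calendar", "application/whizzy", "text/xml; charset=utf-8"):
    print(t, "->", effective_type(t))
```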

Each specialized application will require a new subtype registration,
which takes a lot of time and therefore can have long delays.

(2) New top-level media type xml and its subtypes

Pros:

Fallback to "xml/plain" allows the use of generic XML parsers/viewers.

We can also lift the line termination rule of the top-level media type
"text".
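Under approach (2), a receiver could fall back from an unknown xml/* subtype to xml/plain and still hand the body to a generic XML parser. A sketch assuming this never-registered "xml" top-level type; the handler table and return strings are hypothetical:

```python
import xml.etree.ElementTree as ET

def handle(media_type, body, specialized):
    """Dispatch a body under the hypothetical top-level "xml" media type."""
    mt = media_type.strip().lower()
    if mt in specialized:
        return specialized[mt](body)
    if mt.startswith("xml/"):
        # Unknown xml/* subtypes are treated as xml/plain, so a generic
        # XML parser can still make sense of the body.
        root = ET.fromstring(body)
        return "generic XML viewer: <%s>" % root.tag
    return "save to file"

print(handle("xml/calendar", b"<vcalendar><vevent/></vcalendar>", {}))
```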

Cons:

It is extremely difficult to register a new top-level media type and
therefore can have long delays (practically, who wants to do this?).
The default behavior is probably not a good enough reason.

Each specialized application will require a new subtype registration,
which takes a lot of time and therefore can have long delays.

(3) A new parameter "externalid" for text/xml and application/xml

This parameter specifies the externalID from the DOCTYPE of the XML
document (if the DOCTYPE is present).  Examples would be:

Content-type: text/xml;
   externalid="http://www.foo.com/whizzy.dtd"
or
Content-type: application/xml; charset="utf-16be";
   externalid="-//IETF//DTD vCard v3.0//EN"
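A sketch of how a sender might fill in the proposed "externalid" parameter from the document's own DOCTYPE, using Python's xml.dom.minidom (which reports the DOCTYPE's publicId and systemId) and email.message. Preferring the public identifier when both are present is an assumption of this sketch, not part of the proposal.

```python
from email.message import Message
from xml.dom import minidom

def externalid_header(xml_text, media_type="text/xml"):
    """Build a Content-Type header carrying the proposed "externalid" parameter."""
    msg = Message()
    msg["Content-Type"] = media_type
    doctype = minidom.parseString(xml_text).doctype
    if doctype is not None:
        # Assumption of this sketch: prefer the public identifier, else the
        # system identifier, matching the two example headers above.
        ext_id = doctype.publicId or doctype.systemId
        if ext_id:
            msg.set_param("externalid", ext_id)
    return msg

vcard = '<!DOCTYPE vcard PUBLIC "-//IETF//DTD vCard v3.0//EN" "vcard.dtd"><vcard/>'
msg = externalid_header(vcard, "application/xml")
print(msg["Content-Type"])
print(msg.get_param("externalid"))
```

A document without a DOCTYPE simply gets no "externalid" parameter, which is the RDF case raised under Cons.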

Pros:

This is probably the easiest solution which also provides XML-specific
fallback.

Requires no registration with a central authority.

Other parameters can be added in the future when we have new schemata
or when we find new usage patterns for DTDs. The definitions for those
parameters can define which sets of parameters can appear together.
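One way parameter definitions could state which sets of parameters may appear together is a small rule table. The rules below are hypothetical, loosely modelled on (4), where "optinfo" and "ADD-PARAM" are alternatives added on top of "externalid". Unknown parameters are skipped, since RFC 2045 requires MIME implementations to ignore parameters they do not recognize, and names are lowercased because MIME parameter names are case-insensitive.

```python
# Hypothetical co-occurrence rules for future parameters.
RULES = {
    "optinfo":   {"requires": {"externalid"}, "excludes": {"add-param"}},
    "add-param": {"requires": {"externalid"}, "excludes": {"optinfo"}},
}

def check_params(params):
    """Return a list of rule violations for a dict of Content-Type parameters."""
    names = {n.lower() for n in params}
    problems = []
    for name in sorted(names):
        rule = RULES.get(name)
        if rule is None:
            continue  # unknown or unconstrained parameter
        for missing in sorted(rule["requires"] - names):
            problems.append("%s requires %s" % (name, missing))
        for clash in sorted(rule["excludes"] & names):
            problems.append("%s cannot appear with %s" % (name, clash))
    return problems

print(check_params({"optinfo": "method=REQUEST"}))
```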

Cons:

DTD's do not necessarily exist.  For example, RDF metadata do not have
DTD's.

The use of DTD's to choose applications might be an abuse of DTD's.
Moreover, some DTD's might be handled by many different programs on a
system, such as by a specialized processor and one or more XML
browsers.  On the other hand, some applications (such as XML browsers)
handle a variety of DTD's.

There will be new schemata that will probably overshadow DTD's, and
these schemata may not use externalIDs the same way they are used in
today's DTDs.

(4) Yet another parameter "optinfo" or "ADD-PARAM"

On top of (3), provide yet another parameter "optinfo" (list of name-value pairs) 
or "ADD-PARAM" (plain text) for additional information.
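The "optinfo" proposal leaves the inner syntax open. Assuming, purely for illustration, comma-separated name=value pairs inside a quoted MIME parameter, a receiver could read it as follows (the iCalendar public identifier is made up):

```python
from email.message import Message

def read_optinfo(msg):
    """Split a hypothetical optinfo value such as "method=REQUEST,component=VEVENT"."""
    raw = msg.get_param("optinfo")
    if not raw:
        return {}
    pairs = {}
    for item in raw.split(","):
        name, sep, value = item.partition("=")
        if sep:  # ignore items without "=" in this sketch
            pairs[name.strip().lower()] = value.strip()
    return pairs

msg = Message()
msg["Content-Type"] = "application/xml"
msg.set_param("externalid", "-//Example//DTD iCalendar//EN")  # made-up identifier
msg.set_param("optinfo", "method=REQUEST,component=VEVENT")
print(msg["Content-Type"])
print(read_optinfo(msg))
```

The inner name=value syntax is exactly what a generic MIME parser cannot know in advance; only the application that defines it can read it.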

Pros:

Some applications of XML require even more application-specific information so 
as to launch appropriate software tools.  For example, if iCalendar information 
is captured by an XML DTD, the text/xml or application/xml MIME header has 
to mimic "method", "component", and "optinfo" of text/icalendar.  (The 
latest internet draft for text/icalendar is available at:  
ftp://ftp.isi.edu/internet-drafts/draft-ietf-calsch-ical-12.txt)

If this parameter is not available, people will abuse the parameter "externalid" 
by providing different names for the same DTD so as to express more information.  
This parameter stops such abuse.

Cons:

The "optinfo" parameter makes it difficult for a simple MIME parser to know 
what to expect in the parameter.  The "ADD-PARAM" parameter does not have 
this problem, but is not expressive enough.


Note: None of the above proposals handle non-monolithic XML documents very well, 
since different islands of non-monolithic XML documents belong to different 
namespaces and thus different schemata.  For example, the MIME parser cannot 
invoke vCard applications if the vCard is embedded via the namespace mechanism.
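A short illustration of the non-monolithic case: the document below (with a made-up vCard namespace URI) contains two namespaces, yet a single "profile" or "externalid" value can describe only one of them, typically the root's. Collecting the namespace URIs requires parsing the body, which is exactly what a MIME parser does not do.

```python
import xml.etree.ElementTree as ET

def namespaces_used(xml_text):
    """Collect the namespace URIs of all elements in a document."""
    uris = set()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith("{"):  # ElementTree writes "{uri}local"
            uris.add(elem.tag[1:].split("}", 1)[0])
    return uris

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <card xmlns="urn:example:vcard"><fn>Example Person</fn></card>
  </body>
</html>"""
print(sorted(namespaces_used(doc)))
```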

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1