Peter Murray-Rust Peter at ursus.demon.co.uk
Sat Mar 1 18:54:28 GMT 1997

I believe that the use of MIME types for interacting with legacy data
has great potential for XML.  I'd welcome comments on the following.

Many legacy document formats have been registered as MIME types, and are also
capable of being represented as structured documents.  This means that an XML
application can read documents of these types on-the-fly and convert them to
XML internally.  (The only key requirement is that the document description is
well-defined and stable, so that it is possible to write a DTD (or meta-DTD)
for it.)

I have done this for my JUMBO parser.  It is able to read in ~12 MIME types
(belonging to chemistry) in native form.  It builds a Tree object
internally, adding Nodes and Attributes where appropriate as it parses the
document serially.  This tree is isomorphic to the equivalent XML document
and can be displayed in the GUI, edited, etc. and written out as XML.  It
is obviously capable of SD (structured document) searches
as well. The average user therefore sees JUMBO as a universal browser and
possibly as a transformation tool (though _writing_ legacy formats from an
arbitrary tree is usually difficult and information is lost).

The architecture is (fairly) simple.  Each MIME type requires a Java
subclass of SGMLTree.  As the (FORTRAN) document is read, it is poked into the
nodes as appropriate.  One enormous advantage of this is that the order
of the data in the document doesn't cause any problems in writing the
code (whereas for a conventional parser it can be a nightmare - 'have we
already read this section?').  I am still amazed at how valuable this
simple tree-building is.  Of course, SD search techniques can then be 
used to add contextual information for processing or the tree can be
reordered, pruned, etc.
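
A minimal sketch of this order-independent tree-building, in Java.  It is
not JUMBO's code: the class names (LegacyTreeSketch, Node), the record
keywords (TITLE, ATOM, BOND) and the element names are all invented for
illustration.  Each record is poked into whichever part of the tree it
belongs to, so the reader never has to ask whether a section has already
been seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; this is not JUMBO's SGMLTree API. */
final class LegacyTreeSketch {

    /** A tree node with a name, attributes and ordered children. */
    static final class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Returns the first child with this name, creating it if absent. */
        Node childNamed(String childName) {
            for (Node c : children) {
                if (c.name.equals(childName)) return c;
            }
            Node created = new Node(childName);
            children.add(created);
            return created;
        }

        /** Serialises this subtree as XML elements and attributes. */
        void writeXml(StringBuilder out) {
            out.append('<').append(name);
            for (Map.Entry<String, String> a : attributes.entrySet()) {
                out.append(' ').append(a.getKey()).append("=\"")
                   .append(escape(a.getValue())).append('"');
            }
            if (children.isEmpty()) { out.append("/>"); return; }
            out.append('>');
            for (Node c : children) c.writeXml(out);
            out.append("</").append(name).append('>');
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Reads an invented record format: a keyword followed by fields. */
    static Node read(String legacyText) {
        Node root = new Node("molecule");
        // Create the containers up front so the output layout is fixed
        // no matter which record type turns up first in the input.
        root.childNamed("atoms");
        root.childNamed("bonds");
        for (String line : legacyText.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f[0].equals("TITLE") && f.length >= 2) {
                root.attributes.put("title", f[1]);
            } else if (f[0].equals("ATOM") && f.length >= 3) {
                Node atom = new Node("atom");
                atom.attributes.put("id", f[1]);
                atom.attributes.put("element", f[2]);
                root.childNamed("atoms").children.add(atom);
            } else if (f[0].equals("BOND") && f.length >= 3) {
                Node bond = new Node("bond");
                bond.attributes.put("from", f[1]);
                bond.attributes.put("to", f[2]);
                root.childNamed("bonds").children.add(bond);
            }
            // Unknown or blank records are ignored in this sketch.
        }
        return root;
    }

    static String toXml(Node root) {
        StringBuilder out = new StringBuilder();
        root.writeXml(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The BOND record arrives before the atoms it refers to.
        System.out.println(toXml(read("BOND a1 a2\nATOM a1 O\nTITLE water\nATOM a2 H")));
    }
}
```

Feeding the same records in a different order gives the same XML, because
each record finds its own place in the tree rather than relying on the
reader's position in the file.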

I think it would be enormously valuable to have MIME->XML converters for
helping us at the editing stage.

This may be easier than we think.  Reading the Java Beans spec (a few months
old, so it may have changed), there are statements like:

'... the [current proposal] .. is that the MIME namespace for data types
shall be used by _DataFlavors_' [an interface for transferable data].

'we want [Java beans] to be able to pretend to be an Excel document inside
a Word document'.

This implies that interfaces (?IDLs) will be produced for common MIME types.
It should therefore be possible to obtain Word, Excel, GIF, RTF, etc. beans.
The XML implementation would then be:

legacy--[bean]-->JavaInterface--[Java application]-->SDinMemory--[DTD]-->XML
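
One way the pipeline above might look in Java, as a rough sketch under
stated assumptions.  The LegacyReader interface stands in for whatever
interface a real bean would expose; it, the MimeToXmlSketch class and the
CSV example reader are hypothetical.  The in-memory structured document is
represented with the standard org.w3c.dom classes, and the DTD validation
step is left out.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: dispatch on MIME type, build a DOM, write XML. */
final class MimeToXmlSketch {

    /** Assumed stand-in for the interface a per-format bean would offer. */
    interface LegacyReader {
        void readInto(String legacyText, Document doc);
    }

    private final Map<String, LegacyReader> readers = new HashMap<>();

    /** Associates a reader with a MIME type (compared case-insensitively). */
    void register(String mimeType, LegacyReader reader) {
        readers.put(mimeType.toLowerCase(Locale.ROOT), reader);
    }

    /** legacy text -> reader -> in-memory document -> XML string. */
    String convert(String mimeType, String legacyText) {
        LegacyReader reader = readers.get(mimeType.toLowerCase(Locale.ROOT));
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered for " + mimeType);
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            reader.readInto(legacyText, doc);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalStateException("Conversion failed for " + mimeType, e);
        }
    }

    /** Example reader for comma-separated text; element names are invented. */
    static final LegacyReader CSV = (text, doc) -> {
        Element table = doc.createElement("table");
        doc.appendChild(table);
        for (String line : text.split("\n")) {
            if (line.isEmpty()) continue;
            Element row = doc.createElement("row");
            table.appendChild(row);
            for (String cell : line.split(",")) {
                Element c = doc.createElement("cell");
                c.setTextContent(cell.trim());
                row.appendChild(c);
            }
        }
    };

    public static void main(String[] args) {
        MimeToXmlSketch converter = new MimeToXmlSketch();
        converter.register("text/csv", CSV);
        System.out.println(converter.convert("text/csv", "a,b\n1,2"));
    }
}
```

Supporting a further MIME type would then mean registering one more
reader; the tree handling and the XML output are shared.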

I haven't kept in close touch with Beans (although I have played with the
beta-release and it's very powerful for what I want to do).  If we could
offer Java browsers for common MIME types, with automatic viewing, editing,
merging and transformation into XML, it could be a very attractive way of
bringing people into this arena.


[The only downside is that the magic of XML is completely hidden from
the user :-)]

Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
