Combining DTDs

Fri Sep 11 18:22:14 BST 1998

At 17:28 11/09/98 +0200, Ron Bourret wrote:
[...]
>
>I suggest that a short-term solution for the latter is to simply combine
>elements from different DTDs as one sees fit.  Although the resulting
>documents are not valid wrt their original DTDs and cannot be used by
>DTD-specific applications, XML does not require valid documents and the use
>of standard tags facilitates the search process.  I am advocating a certain
>degree of anarchy here, but the Web is inherently anarchic and if we wait
>until we find a way to combine DTDs without breaking DTD-specific
>applications, we're missing the chance to build some extremely useful
>applications right now.
>
>(By the way, a nice feature of XML editors that would help this along would
>be to read DTDs/schemas from said Yahoo-like repositories, let users insert
>elements wherever they want from whatever DTDs/schemas they want, and
>generate new DTDs as requested.)
>
I agree with this. I have written two DTDs in XML (CML and VHG) both of
which have to interoperate with other *unknown* DTDs. As a simple example,
a paper in chemical physics requires (at least) xHTML, MathML, CML, RDF and
DC. It is inconceivable that a generic DTD can be created that has valid
content models for all conceivable applications in this domain. [It *is*
conceivable that J. Chem. Phys. produces a DTD, and it's also highly
probable that if J. Phys. Chem. did so too it would use a different one.] I
cannot see how, except in very carefully regulated domains (such as legal,
patent, regulatory) it will be possible to combine generic DTDs to provide
a useful mixture. For example, if someone wishes to embed a <price> in a
<molecule> this is a perfectly possible and reasonable thing to do. Why
should I say they can't?

Example:
<molecule>
  <price currency="USD" unit="litre">1.0</price>
  <atomArray builtin="element">O H H</atomArray>
</molecule>

This does NOT break my software because it simply scans the content for
things it knows about (e.g. <atomArray>). Similarly it's perfectly possible
to scan the document with XLink/XPointer (whenever they get finalised) to
find a <molecule> with a descendant of type <price> that has a currency
attribute. <price> could easily come from a well defined DTD, as will
<molecule>. This is - and has to be - the approach that CML takes. So
almost all XML elements will have to have ANY content. This is a pity,
because I'd like to be able to insist that <molecule> contained
(atomArray)* - yes, a molecule without atoms is conceivable. I think that
schemas must allow for this - and I believe that XSchema does.
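
To make the "scan for what you know about" idea concrete, here is a minimal
sketch in Python using the standard xml.etree.ElementTree module. It is not
how the CML software itself works; the function names atom_symbols and
priced_molecules are made up for illustration, and the ElementTree path
expression merely stands in for the XLink/XPointer query described above.

```python
import xml.etree.ElementTree as ET

# The mixed-vocabulary example from above, used as in-memory test data.
DOC = """<molecule>
  <price currency="USD" unit="litre">1.0</price>
  <atomArray builtin="element">O H H</atomArray>
</molecule>"""


def atom_symbols(molecule):
    """Collect element symbols from every <atomArray> under `molecule`.

    Elements the function does not know about (such as <price>) are
    simply never looked at, so they cannot break it.
    """
    symbols = []
    for atom_array in molecule.iter("atomArray"):
        symbols.extend((atom_array.text or "").split())
    return symbols


def priced_molecules(root):
    """Return every <molecule> that has a descendant <price> with a
    currency attribute (hypothetical helper, not part of any spec)."""
    return [
        m for m in root.iter("molecule")
        if m.find(".//price[@currency]") is not None
    ]


root = ET.fromstring(DOC)
print(atom_symbols(root))
for molecule in priced_molecules(root):
    price = molecule.find(".//price[@currency]")
    print(price.get("currency"), price.text)
```

Neither function needs a DTD: the chemistry code ignores <price>, and the
price query ignores <atomArray>.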

The other approach is to allow links - and I really wish that we could see
some work going on here. There are two ways - one is to have a link on the
molecule, e.g.:
<molecule id="H2O" href="price.xml#water">...

and the other is to have a link database (I have left out the other XLink
attributes for brevity and because I can't remember the current version of
the spec):

<extendedLink title="chemical catalogue">
  <locator href="molecules.xml#H2O"/>
  <locator href="prices.xml#H2O"/>
</extendedLink>

This is perhaps cleaner, but it's a lot more complicated and not many
people (with 2-3 honourable exceptions) seem to be interested in developing
XLink applications or software.
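
As a rough illustration of what a consumer of such a link database would
have to do, here is a sketch in Python. It assumes things the draft specs do
not promise: that the fragment after "#" is a plain id attribute value, and
that molecules.xml and prices.xml look like the invented strings in
DOCUMENTS below. The names find_by_id and resolve_link are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# The link database from above (other XLink attributes still omitted).
LINKBASE = """<extendedLink title="chemical catalogue">
  <locator href="molecules.xml#H2O"/>
  <locator href="prices.xml#H2O"/>
</extendedLink>"""

# Invented stand-ins for the two linked files; the real contents are unknown.
DOCUMENTS = {
    "molecules.xml": (
        '<molecules><molecule id="H2O">'
        '<atomArray builtin="element">O H H</atomArray>'
        "</molecule></molecules>"
    ),
    "prices.xml": (
        '<prices><price id="H2O" currency="USD" unit="litre">1.0</price>'
        "</prices>"
    ),
}


def find_by_id(root, wanted):
    """Return the first element whose id attribute equals `wanted`, or None."""
    for element in root.iter():
        if element.get("id") == wanted:
            return element
    return None


def resolve_link(link):
    """Resolve each <locator> of an <extendedLink> to the element it points at."""
    resolved = []
    for locator in link.iter("locator"):
        # Split "file.xml#fragment" into the document name and the fragment.
        doc_name, fragment = urldefrag(locator.get("href"))
        root = ET.fromstring(DOCUMENTS[doc_name])
        resolved.append(find_by_id(root, fragment))
    return resolved


targets = resolve_link(ET.fromstring(LINKBASE))
print([target.tag for target in targets])
```

The molecule and its price stay in separate documents, each of which can
remain valid against its own DTD; only the link database knows they belong
together.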

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/