Weak DTDs

Sat Oct 18 10:26:04 BST 1997

At 17:05 17/10/97 -0400, Paul Prescod wrote:
>
><!ENTITY TEI SYSTEM "http://....">
>%TEI;
><!ELEMENT CAUTION TYPEOF P>
>
This is the sort of construct that I started with (but using HTML2.0). CML
was designed to allow other frequently-used DTDs to be incorporated into a
single conventional DTD that would validate any 'CML' document.  The
namespace syntax had not been addressed then, and gave me some headaches,
but even when that is neglected I found difficulties.  In essence they
could be summarised by:
	- wishing to insert CML elements within HTML sections (since HTML has very
weak support for typed data).
	- wishing to insert HTML sections within CML elements (e.g. the descriptive
hypertext for, say, a molecule).
	In the end the complex rules I devised became so unworkable, even for me
(the author), that I abandoned them. I therefore gave up formal DTD validation.

Yesterday evening I converted a typical chemical manuscript into CML
including RDF and DC metadata, images, spectra, molecules, bibliography,
XML-LINKs to several related XML and non-XML documents, and so on. I found
the freedom of NOT having a 'conventional' DTD was very liberating. I
believe that (with the latest JUMBO) it displays quite attractively and
meaningfully to human readers. 

So what is the formal value of the document to *non-human* readers? I can
see at least the following:
	- TEI 'searches' of the document (especially with STRING) are very
powerful. [BTW, the fact that TEI defines substrings in PCDATA but not in
attribute values means that I now favour using subelements rather than
attributes. To that extent I think the XML-specs tilt the balance.] I
should like to 'extend' the TEI approach to search for more complex
fragments (early drafts suggested a FOREIGN keyword, which means that any
algorithm can be tacked on). I'd like to keep in step with others here - is
there any consensus on a formalised search language for XML documents?
	- many 'readers' will not need to access all the data in the document, and
can reasonably extract small fragments, e.g.
DESCENDANT(ALL,PERSON)CHILD(1,VAR,BUILTIN,EMAIL)
will locate all the people who have e-mail addresses. 
	- XML-STYLE looks like being extremely valuable for many document
transformations. [In the early days of JUMBO I wrote a lot of horrible code
to process and display specific elements, and I now realise this should be
done in XML-STYLE. Is anyone else hacking a Java version of XML-STYLE or do
I have to do it myself?].
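
For anyone who wants to experiment with the 'small fragments' idea above
without a TEI pointer engine, here is a rough sketch in Python. It is only an
approximation of DESCENDANT(ALL,PERSON)CHILD(1,VAR,BUILTIN,EMAIL): I am
assuming the pointer means 'for every PERSON descendant, take the first VAR
child whose BUILTIN attribute is EMAIL'. The sample document, its element
layout and the function name first_email_vars are all made up for
illustration and are not taken from any CML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical CML-like fragment; the structure is assumed for illustration.
SAMPLE = """
<CML>
  <PERSON>
    <VAR BUILTIN="NAME">A. Chemist</VAR>
    <VAR BUILTIN="EMAIL">a.chemist@example.org</VAR>
  </PERSON>
  <BIB>
    <PERSON>
      <VAR BUILTIN="NAME">B. Author</VAR>
    </PERSON>
  </BIB>
</CML>
"""


def first_email_vars(root):
    """Approximate DESCENDANT(ALL,PERSON)CHILD(1,VAR,BUILTIN,EMAIL).

    Note: root.iter() would also yield the root itself if it were a PERSON,
    which a strict DESCENDANT step would presumably exclude.
    """
    hits = []
    for person in root.iter("PERSON"):
        # Direct children only, in document order.
        for child in person:
            if child.tag == "VAR" and child.get("BUILTIN") == "EMAIL":
                hits.append(child)
                break  # keep only the first matching child per PERSON
    return hits


root = ET.fromstring(SAMPLE)
emails = [var.text for var in first_email_vars(root)]
print(emails)
```

No DTD is consulted here: the search works on any well-formed document, and
a PERSON without an e-mail address is simply skipped.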

The most common operations on a generic CML document look like being:
	- display this attractively to a human
	- search document(s) for particular chunks of information and <do
something useful with them>

I then see a role for more specific DTDs for those people who need their
documents to conform to specific formats (e.g. regulatory submissions,
safety sheets, pharmacopeias, etc.) Hopefully they will pick features out
of CML so that the semantics of elements is consistent throughout the
community. I am an idealist :-)

BTW I am particularly interested in actual implementations of things
discussed on this list, or people who are interested in developing them
collaboratively. Although XML has come a long way, we are nowhere near
having enough examples of tools to convince the rest of the world :-)

	P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)