EMBED and validation

Peter Murray-Rust peter at ursus.demon.co.uk
Sun Nov 30 11:01:11 GMT 1997

At 01:25 30/11/97 +0200, Gerard Freriks wrote:
>As an outsider I follow the discussions about the topic.

Welcome Gerard,
	We do not have 'outsiders' here :-). We welcome the diversity and
cross-fertilisation from other disciplines.


I was invited to a Drug Information Association meeting two weeks ago about
e-submissions for new drugs. There was a lot of excitement about XML. :-)

>Within Health CAre I forsee a need to achieve the following:

I assume you are familiar with the HL7 effort - I believe they are
seriously thinking of using XML.

>- there will be one Universal DTD (or whatever)
>- based on this one DTD users will select portions of it to construct
>- these messages might contain other messages or references to it
>- depending on circumstances decided upon by the user he might or might not
>want to view the whole collection of data as one piece (merged) or as data
>plus references
>- messages will be added to a receiving master patient record and either be
>shown as references or merged.

I think this is a very general concern among the XML/SGML community. A
useful concept is 'information objects' or 'DTD fragments' [please correct
me if these are not identical :-)]. Essentially they are 'Pick-N-Mix' DTDs,
which you combine for your own purposes. Thus in submitting a new drug, you
have to submit clinical records, manufacturing processes, personal data,
documents, safety, statistics, and (yes) chemistry. IMO it is impossible to
create a single DTD that covers all of this. These are all different and
complex disciplines and it is much better to re-use the work that people
who are experts have done. (So, gratifyingly, there was interest in using
CML for drug submissions.)

I would therefore strongly advise people not to develop a multidiscipline
DTD at present without looking carefully at what is being done by the
specialist communities.  That may even extend to textual passages (at least
for technical documents). For example I use XMLised HTML for all my
chemical stuff rather than invent my own <PARA>, <TITLE>, etc. 

The technical problem of how these are combined in any given document is a
very active concern of the W3C and related community. The problem is that
if you simply combine all the relevant DTDs you will get name clashes. E.g.
<A> means anchor for HTML, may mean Answer for someone else, may mean
Author for another. If these are blindly combined, the validation will fail
(DavidD has pointed this out clearly). 
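
To make the clash concrete, here is a minimal sketch in Python. The two DTD
fragments are invented for illustration (they are not taken from HTML or from
any real DTD), and the helper names declared_elements and name_clashes are
hypothetical. It only looks at <!ELEMENT ...> declarations and reports names
that are declared in more than one fragment, which is the situation that
would make a blindly merged DTD fail.

```python
import re

# Hypothetical DTD fragments, invented for illustration only.
HTML_FRAGMENT = """
<!ELEMENT A (#PCDATA)>
<!ELEMENT P (#PCDATA | A)*>
"""

QUIZ_FRAGMENT = """
<!ELEMENT Q (#PCDATA)>
<!ELEMENT A (#PCDATA)>
"""

# Matches an element declaration and captures its name and content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*)>")


def declared_elements(dtd_text):
    """Map each declared element name to its content model."""
    return {name: model.strip() for name, model in ELEMENT_DECL.findall(dtd_text)}


def name_clashes(*dtds):
    """Return the sorted element names declared in more than one fragment."""
    first_seen = {}   # element name -> index of the fragment that declared it first
    clashes = set()
    for index, dtd in enumerate(dtds):
        for name in declared_elements(dtd):
            if name in first_seen and first_seen[name] != index:
                clashes.add(name)
            first_seen.setdefault(name, index)
    return sorted(clashes)


print(name_clashes(HTML_FRAGMENT, QUIZ_FRAGMENT))
```

Here <A> is the anchor in one fragment and the Answer in the other, so it is
the name the sketch reports. A real tool would also have to look at
attributes, entities and parameter entities, which this ignores.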

Two current XML ways to get round this are:
XLL, where sections from different DTDs are XML-LINKed, rather than being
merged or included via entities. If the two components are to be jointly
displayed or otherwise combined the application has to be quite flexible.
JUMBO does this by using different java.awt.Frames to display them.

Namespaces. The W3C/XML community is investigating namespaces as a way of
tackling this. There are no firm recommendations yet, so treat this with
great caution. The formal position is (XML 2.3) that 'the colon character
is [...] reserved for experimentation with name spaces'. So, if JUMBO
(which is nothing if not experimental :-) is given two elements
<MathML:VAR> and <CML:VAR> it knows they are different and can also link to
different 'schema' files which will tell you about the different namespaces
(and will enable namespace-dependent display). 
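
As a rough sketch of the idea (this is not JUMBO's code, and since there is
no namespace recommendation yet the colon is treated purely as a naming
convention), a prefixed name can be split into a prefix and a local part and
the prefix used to look up a schema. The SCHEMAS table, its file names, and
the functions split_name and schema_for are all assumptions made up for this
example.

```python
# Hypothetical prefix-to-schema table; the file names are invented.
SCHEMAS = {
    "MathML": "mathml-schema.xml",
    "CML": "cml-schema.xml",
}


def split_name(qualified_name):
    """Split 'prefix:local' into (prefix, local); prefix is None without a colon."""
    prefix, colon, local = qualified_name.partition(":")
    if not colon:
        return None, qualified_name
    return prefix, local


def schema_for(qualified_name):
    """Return the schema file registered for the name's prefix, or None."""
    prefix, _local = split_name(qualified_name)
    return SCHEMAS.get(prefix)


for name in ("MathML:VAR", "CML:VAR", "VAR"):
    print(name, split_name(name), schema_for(name))
```

The two VAR elements share a local name but carry different prefixes, so they
resolve to different schemas, while the unprefixed VAR resolves to none.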

>So which way you organise it, I don't mind.
>And Oh Yes.
>We in medicine count upon the fact that all DTD's and subDTD's will be
>stored in an Internet repository.

Absolutely essential. The curation of DTDs and semantics (e.g. through
terminology) is a critical part of markup. Most DTDs are semantically void
(the semantics are added through prose) and this worries me. I therefore
see the need for additional representation of semantics in machine-readable
form, and XML is the obvious format. Therefore JUMBO is able to read
'schemas' (which use DTD information if available) and include help,
datatyping, etc.


Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary

