Proposed process for DTDs in XML (was Re: Q: SW for parsing DTD's, etc.)

Peter Murray-Rust peter at ursus.demon.co.uk
Sun May 24 00:40:01 BST 1998


At 01:14 23/05/98 -0400, Paul Prescod wrote:
>service in a couple of months. I believe that Jumbo already converts DTDs
>into something XML instance-ish internally.

Yes. And it's because I'd like this to be *syntactically* compatible with
any other similar tool that I'm keen on exploring this idea.
 
>
>If every tool did this, then it wouldn't matter whether DTDs were in
>instance syntax or DTD syntax. Yes, I recognize that this means that every

Agreed. BUT we would have to agree on a set of elementTypes (DTD-speak) or a
property set (Grove-speak). As a very simple example, do we use 'ATTRIBUTE',
'Attribute', 'AttName' or something else? Agreement here would go a long way
towards interoperability.

I have raised this idea periodically, SimonStL has persevered with it over the
last 2 weeks, and my feeling is that there is a critical mass of people who
would like to see whether something could be formalised out of this. I
encouraged Simon to keep posting, so I take responsibility for the continued
discussion. This posting includes a proposal for how we go forward.

NOTE: Objections have been raised on the basis that:
	- such a proposal is impossible and we are bound to fail. In my mind the
operations we are proposing are simply syntactic transformations with an
agreed vocabulary, and are therefore almost trivial and automatable. I think
that these objectors believe we are proposing something far more ambitious
than we actually are.
	- the proposal is not worthwhile, because the DOM/WG/XML-data/RDF/etc. are
working on this and we are simply duplicating work that they are/will_be
doing. This is a potentially valid objection, but I suspect that our
effort will be valuable in any case. Even if it is later subsumed by other
efforts, the experience gained will be valuable and may help those efforts.
	- the proposal is irresponsible because it will encourage people to do
things they ought not to be doing: if we create a DTD syntax that is
potentially extensible, it will actually be extended. I think that this
community has shown itself responsible, and I shall suggest that our
proposal be outlined in such a way as to encourage responsibility.

Motivation
----------

There seems to be a feeling that the current DTD syntax does not meet a
number of needs. *** In all discussion that follows I am NOT suggesting
that XML DTD syntax should be replaced *** . I hope that the suggestion
will enhance it.

Current limitations seem to be:

1. There is no mechanism for adding human-readable semantics to a DTD. The
point has been made strongly on XML-DEV that DTDs must be documented, but
the method of documentation is undefined. Even in a simple example like:

<!ELEMENT FOO EMPTY>
<!-- This represents a widget -->
<!ATTLIST FOO PLUGH CDATA #REQUIRED>

it is impossible to know whether the comment is associated with the
element, the attribute, both or neither. 
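In an instance-syntax DTD the ambiguity disappears, because documentation becomes a child of whatever it documents. The Java below is a minimal sketch of that idea, assuming a purely hypothetical vocabulary (ELEMENT, ATTRIBUTE and DOC elements with name/content/type/default attributes); choosing that vocabulary is exactly what we would have to agree on. Note that the DTD author now has to decide explicitly that "This represents a widget" belongs to FOO rather than to PLUGH.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: the element and attribute names used here are hypothetical. */
public class DtdXmlSketch {

    // Minimal escaping for character data and attribute values.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** attrs maps attribute name to {type, default, documentation or null}. */
    static String element(String name, String content, String doc,
                          Map<String, String[]> attrs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ELEMENT name=\"").append(esc(name))
          .append("\" content=\"").append(esc(content)).append("\">\n");
        if (doc != null) {
            // DOC is a child of ELEMENT, so it cannot be read as attribute documentation.
            sb.append("  <DOC>").append(esc(doc)).append("</DOC>\n");
        }
        for (Map.Entry<String, String[]> e : attrs.entrySet()) {
            String[] v = e.getValue();
            sb.append("  <ATTRIBUTE name=\"").append(esc(e.getKey()))
              .append("\" type=\"").append(esc(v[0]))
              .append("\" default=\"").append(esc(v[1])).append("\"");
            if (v[2] == null) {
                sb.append("/>\n");
            } else {
                // Attribute-level documentation nests inside its own ATTRIBUTE.
                sb.append("><DOC>").append(esc(v[2])).append("</DOC></ATTRIBUTE>\n");
            }
        }
        return sb.append("</ELEMENT>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> attrs = new LinkedHashMap<>();
        attrs.put("PLUGH", new String[] {"CDATA", "#REQUIRED", null});
        System.out.print(element("FOO", "EMPTY", "This represents a widget", attrs));
    }
}
```

Run on the FOO/PLUGH example, this prints an ELEMENT for FOO containing the DOC text and an empty ATTRIBUTE for PLUGH.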

[I am in the last stages of releasing the next VHG DTD and feel the lack of
a mechanism for documenting it very strongly.] This problem alone is
enough to convince me that a mechanism would be desirable. We all agree
that a DTD per se cannot carry semantic information but there is an urgent
need to be able to associate semantic information with a DTD.

2. There is no mechanism for associating machine-readable semantics with a
DTD. This is also a serious problem for me. I need to be able to link
elements and attributes to behavior (at present through Java classes). I am
not suggesting that we develop universal mechanisms for doing this, but
that we choose a syntax which allows it.

3. There are no defined tools or other processes for analysing a DTD in XML
format. The attraction of a DTD-in-XML is that we can use the very large
number of XML tools for manipulating, filtering, rendering, etc. For
example, JUMBO1 can create a tree from the DTD and can therefore express it
as a JUMBO object just like a document. XSL and CSS could apply to DTDs as
well as documents. Help could be created from DTDs, etc.
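As an illustration of the point about Help, here is a sketch that uses nothing but the standard DOM parser shipped with Java to pull documentation out of a DTD held in instance syntax. The vocabulary (ELEMENT, ATTRIBUTE, DOC, name) is again a hypothetical assumption, and the "FOO/@PLUGH" label is just an illustrative way of naming an attribute, not an agreed addressing scheme.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch only: builds plain-text Help from DOC children in a hypothetical DTDXML. */
public class DtdXmlHelp {

    static String help(String dtdXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dtdXml)));
            StringBuilder out = new StringBuilder();
            NodeList docs = doc.getElementsByTagName("DOC");  // document order
            for (int i = 0; i < docs.getLength(); i++) {
                Element d = (Element) docs.item(i);
                if (!(d.getParentNode() instanceof Element)) continue;  // stray DOC at root
                Element owner = (Element) d.getParentNode();
                String label = owner.getAttribute("name");
                if (owner.getTagName().equals("ATTRIBUTE")
                        && owner.getParentNode() instanceof Element) {
                    // Qualify an attribute with its element name, e.g. FOO/@PLUGH.
                    label = ((Element) owner.getParentNode()).getAttribute("name") + "/@" + label;
                }
                out.append(label).append(": ").append(d.getTextContent().trim()).append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed DTDXML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(help("<DTD><ELEMENT name=\"FOO\" content=\"EMPTY\">"
                + "<DOC>A widget</DOC><ATTRIBUTE name=\"PLUGH\" type=\"CDATA\""
                + " default=\"#REQUIRED\"><DOC>Widget id</DOC></ATTRIBUTE>"
                + "</ELEMENT></DTD>"));
    }
}
```

The same tree could equally be handed to a stylesheet or to JUMBO; the point is only that no DTD-specific parser is involved.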

It is possible that the XML property set (if it is anywhere defined) might
meet part of this need. If so, and if it can be expressed in a tree
structure, then it could be isomorphic with an XML representation of a DTD.

4. There is a need for an additional level of semantic validation using
concepts not expressible in a DTD [cardinality, data typing, etc.] I think
this is one of the more sensitive areas of this proposal because it
encourages the creation of schemas as opposed to DTDs.

Proposal
--------

I think we can proceed by the methodology that David Megginson developed so
successfully for SAX. The Megginsonic 'dialogue' consists of posting
succinct questions and gathering feedback; in the light of the answers, a
new round of questions is proposed.

I personally do not intend to play this central role, although I am happy to
play a subsidiary role as before. It seems that we need to:
	- agree on goals (and especially limit their scope)
	- define the limits of use of the resulting document
	- define timescales.

For myself I would suggest the following criteria, and hope that others
will add to them:

	- the DTDXML DTD should be algorithmically derivable from the XML 1.0 spec
	- a DTDXML for a given DTD should represent the DTD after normalisation
(i.e. no support for PEs and other lexical operations). It should
correspond to information potentially available after parsing and should
correspond roughly to the goals of SAX (i.e. be simple and not try to
represent everything in a DTD)
	- the DTDXML DTD must support Help and documentation. Individual
attributes should be documentable.
	- a DTDXML should be internally addressable by Xlink (unlike DTDs). The
granularity should allow addressing of attributes and elements.
	- the DTDXML should NOT devise methods for extending the range of
semantic validation.
	- from a DTDXML it should be possible to recreate the corresponding
(normalised) DTD without loss.
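The last criterion (recreating the normalised DTD without loss) is easy to test mechanically. Below is a minimal round-trip sketch in Java, assuming the same hypothetical ELEMENT/ATTRIBUTE/DOC vocabulary used for illustration elsewhere; it handles only the keyword defaults #REQUIRED and #IMPLIED plus simple literal defaults, and deliberately skips #FIXED and literals containing quotes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch only: regenerates normalised declarations from a hypothetical DTDXML. */
public class DtdXmlToDtd {

    // Keyword defaults pass through unchanged; anything else is re-quoted as a literal.
    static String defaultDecl(String d) {
        return d.startsWith("#") ? d : "\"" + d + "\"";
    }

    static String toDtd(String dtdXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dtdXml)));
            StringBuilder out = new StringBuilder();
            NodeList elements = doc.getElementsByTagName("ELEMENT");
            for (int i = 0; i < elements.getLength(); i++) {
                Element el = (Element) elements.item(i);
                String name = el.getAttribute("name");
                out.append("<!ELEMENT ").append(name).append(' ')
                   .append(el.getAttribute("content")).append(">\n");
                // DOC children are ignored: they document the DTD but are not part of it.
                NodeList attrs = el.getElementsByTagName("ATTRIBUTE");
                if (attrs.getLength() > 0) {
                    out.append("<!ATTLIST ").append(name);
                    for (int j = 0; j < attrs.getLength(); j++) {
                        Element at = (Element) attrs.item(j);
                        out.append(' ').append(at.getAttribute("name"))
                           .append(' ').append(at.getAttribute("type"))
                           .append(' ').append(defaultDecl(at.getAttribute("default")));
                    }
                    out.append(">\n");
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed DTDXML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(toDtd("<DTD><ELEMENT name=\"FOO\" content=\"EMPTY\">"
                + "<DOC>This represents a widget</DOC>"
                + "<ATTRIBUTE name=\"PLUGH\" type=\"CDATA\" default=\"#REQUIRED\"/>"
                + "</ELEMENT></DTD>"));
    }
}
```

For the FOO example this gives back `<!ELEMENT FOO EMPTY>` and `<!ATTLIST FOO PLUGH CDATA #REQUIRED>`. The only thing lost is the documentation, which was never expressible in the DTD anyway.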

I do not know whether Simon can take on the central role himself or whether
volunteers are needed. If he were to devise some procedural questions, we
could get a feel for whether and how the process could proceed further.

	P.



Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



