Proposal for src files

Fri Apr 3 11:51:46 BST 1998

At 11:20 01/04/98 -0800, Tim Bray wrote:

>For the record, I disagree with the first half of Eliot's thesis here.
>I think that there is an *excellent* chance of getting consortium and
>leading vendors to coalesce in support of a schema proposal which attains
>the notorious MPRDV (Minimum Progress Required to Declare Victory) level
>but does not rule out downstream extensibility.
>
>MPRDV components IMHO are
>
>1. does what DTDs do in as intuitive as possible a way
>2. uses XML syntax
>3. is compatible with the RDF data model
>4. does basic lexical data typing of character data and attribute
>    values

This is a good place to start from. (I have also received some private
support for my suggestions.) In fact I was being very conservative, and my
proposal was limited to 1 and 2 because I thought that it would be
relatively uncontroversial. Personally I also need 4 (and am keen on 3,
although I'm not sure of the implications).

It's fairly important to do something quickly as otherwise there will be
arbitrary conflicting syntaxes. An example:

DTD: <!ELEMENT FOO (BAR*)>

How is this represented in XML? The first version of XML-data used
something like OCCURS="STAR", whilst the latest uses occurs="ZEROORMORE".
We all agree that any DTD in XML syntax would need to represent the "*"
concept; I am *merely asking that we standardise the syntax we use* :-)
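
To make the "*" case concrete, here is a minimal sketch in Python of the one
mapping that would need to be agreed. ZEROORMORE and ONEORMORE are the tokens
that appear in this message; OPTIONAL and ONE are assumed names chosen only for
illustration, and split_occurrence is a hypothetical helper, not part of any
draft.

```python
# Map a DTD occurrence indicator to an attribute value for the XML form.
# ZEROORMORE and ONEORMORE appear in this message; OPTIONAL and ONE are
# assumed names, used here only so that the table is complete.
OCCURS = {
    "*": "ZEROORMORE",
    "+": "ONEORMORE",
    "?": "OPTIONAL",
    "": "ONE",
}


def split_occurrence(particle):
    """Split a content particle such as 'BAR*' into (name, occurs token)."""
    if particle and particle[-1] in "*+?":
        return particle[:-1], OCCURS[particle[-1]]
    return particle, OCCURS[""]


print(split_occurrence("BAR*"))
```

Whatever the tokens end up being called, the table stays this small; the
argument is only that everyone should use the same one.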

Similarly XML-data uses 'elementType' to represent XML's <!ELEMENT ...>
construct. Perfectly reasonable, but arbitrary. Others might choose ELEMENT,
element_type or whatever.

contentspecs are slightly more challenging as we could either simply hold
the string, or could expand this with Choice, Seq, etc.

The *simplest* way to resolve this would be to use the terminology in the
spec itself. Thus we should use AttType [54], AttDef [53], etc. Although
there are probably things that I've overlooked, I can't see this exercise
taking more than two hours in a pub. At the end of this we would have:

***A DTD (in XML-DTD syntax) for representing DTDs in XML***

nothing more. An example might be:
<ELEMENT Name="foo">						<!-- [45] -->
	<contentspec>						<!-- [46] -->
		<children type="choice" occurs="ONEORMORE">	<!-- [47] -->
			<cp Name="bar"/>
			<cp Name="plugh"/>
		</children>
	</contentspec>
	<AttlistDecl>						<!-- [52] -->
		<AttDef>					<!-- [53] -->
			<Name>ID</Name>
			<AttType>ID</AttType>			<!-- [54] -->
			<DefaultDecl type="#REQUIRED"/>		<!-- [60] -->
		</AttDef>
	</AttlistDecl>
</ELEMENT>

It seems to me that the spec is so clear that the only decisions are on a
few attribute names (e.g. type above) or whether some attributes should be
elements.
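
The example above is regular enough that the original declarations can be
regenerated from it mechanically. The following Python sketch does this for
the foo example only. It assumes that type="seq" would be the counterpart of
type="choice" and that OPTIONAL, ZEROORMORE and ONE exist alongside ONEORMORE;
none of those is settled here, and to_dtd is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed vocabulary: only "choice" and "ONEORMORE" come from the example.
OCCURS = {"ZEROORMORE": "*", "ONEORMORE": "+", "OPTIONAL": "?", "ONE": ""}
SEPARATOR = {"choice": " | ", "seq": ", "}

EXAMPLE = """
<ELEMENT Name="foo">
  <contentspec>
    <children type="choice" occurs="ONEORMORE">
      <cp Name="bar"/>
      <cp Name="plugh"/>
    </children>
  </contentspec>
  <AttlistDecl>
    <AttDef>
      <Name>ID</Name>
      <AttType>ID</AttType>
      <DefaultDecl type="#REQUIRED"/>
    </AttDef>
  </AttlistDecl>
</ELEMENT>
"""


def to_dtd(element_decl):
    """Turn one ELEMENT (with a children content model) into DTD text."""
    name = element_decl.get("Name")
    children = element_decl.find("contentspec/children")
    particles = [cp.get("Name") for cp in children.findall("cp")]
    model = "(" + SEPARATOR[children.get("type")].join(particles) + ")"
    model += OCCURS[children.get("occurs", "ONE")]
    lines = ["<!ELEMENT %s %s>" % (name, model)]
    for att in element_decl.findall("AttlistDecl/AttDef"):
        lines.append("<!ATTLIST %s %s %s %s>" % (
            name,
            att.findtext("Name"),
            att.findtext("AttType"),
            att.find("DefaultDecl").get("type"),
        ))
    return "\n".join(lines)


print(to_dtd(ET.fromstring(EXAMPLE)))
```

Nested groups, mixed content and default values are left out; they would need
decisions that the example does not yet make.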

Since this is an XML document we can use XML technology to process it. ***
In particular we could create a stylesheet which filtered out any elements
or attributes not in the XMLDTD set. Thus if someone added
dataType="integer" to an ELEMENT we could easily ignore it whilst reading
the rest. The point is that whatever *additions* are made to the document
above the 'true' DTD can be easily extracted.*** This means that if I
encounter a 'schema' which honours the philosophy above I can
*automatically* extract the DTD from it. 
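
The text suggests a stylesheet for this; as a rough illustration, the same
filtering is sketched below in Python with the standard library instead. The
CORE table is an assumption drawn from the foo example, not an agreed XMLDTD
vocabulary, and strip_extensions is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Core element names and the attributes each may carry. This table is an
# assumption based on the example in this message, not an agreed set.
CORE = {
    "ELEMENT": {"Name"},
    "contentspec": set(),
    "children": {"type", "occurs"},
    "cp": {"Name"},
    "AttlistDecl": set(),
    "AttDef": set(),
    "Name": set(),
    "AttType": set(),
    "DefaultDecl": {"type"},
}


def strip_extensions(elem):
    """Remove, in place, every attribute and child element not in CORE.

    The element passed in must itself be a core element.
    """
    allowed = CORE[elem.tag]
    for attr in list(elem.attrib):
        if attr not in allowed:
            del elem.attrib[attr]
    for child in list(elem):
        if child.tag in CORE:
            strip_extensions(child)
        else:
            elem.remove(child)
    return elem


# A 'schema' that adds a dataType attribute and a dataType element.
schema = ET.fromstring(
    '<ELEMENT Name="bar" dataType="integer">'
    '<contentspec><children type="choice" occurs="ONEORMORE">'
    '<cp Name="plugh"/></children><dataType>integer</dataType></contentspec>'
    "</ELEMENT>"
)
print(ET.tostring(strip_extensions(schema), encoding="unicode"))
```

The filter needs no knowledge of what the additions mean; it only needs the
list of core names, which is exactly what the proposed DTD would fix.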

Our motivations are, of course, for extending it in different ways. The
proposal above seems to preserve complete latitude in how we do this. I
make the following suggestions.

(a) as the DTD is now an XML document we have precise methods for linking
to any component of it. If we wished to say that bar represented an integer
with given ranges, this could be expressed through an out-of-line link
using XLL. This seems to me the purest way of extending it - essentially an
annotation. [We've spent a long time creating XLL - why not start using it
:-)]

(b) we could put in-line links in the XMLDTD. Thus bar could have an
xml:link to jumbo/xml/bar.class (Java).

(c) we can add elements in the content of ELEMENT, AttDef or other fields.
Thus bar might be:
<ELEMENT Name="bar">
	<contentspec type="#PCDATA">		<!-- a daring compression -->
		<dataType>integer</dataType>
		<AllowedValue>3</AllowedValue>
		<AllowedValue>65</AllowedValue>
	</contentspec>
</ELEMENT>

Note that a purist processor can ignore any children of contentspec other
than those in XML 1.0.
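
A processor that does understand the extension could use it along the
following lines. This Python sketch assumes that dataType "integer" means the
content must parse as an integer and that the AllowedValue children list the
permitted values one by one; neither meaning is defined in this message, and
is_valid is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DECL = """
<ELEMENT Name="bar">
  <contentspec type="#PCDATA">
    <dataType>integer</dataType>
    <AllowedValue>3</AllowedValue>
    <AllowedValue>65</AllowedValue>
  </contentspec>
</ELEMENT>
"""


def is_valid(decl, text):
    """Check character data against the extension children of contentspec.

    Assumed semantics: dataType 'integer' requires an integer literal, and
    AllowedValue children, if any, enumerate the permitted values.
    """
    spec = decl.find("contentspec")
    if spec.findtext("dataType") == "integer":
        try:
            int(text)
        except ValueError:
            return False
    allowed = [v.text for v in spec.findall("AllowedValue")]
    return not allowed or text.strip() in allowed


decl = ET.fromstring(DECL)
for value in ("3", "4", "three"):
    print(value, is_valid(decl, value))
```

A purist processor never calls anything like is_valid; it reads
type="#PCDATA" and stops there.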

If we can agree on the base terminology and syntax we can then move to
discussing the much more difficult questions of how to extend and whether
there is likely to be any consensus. If we can't agree on the extensions,
at least we have a base that everyone honours. I am often over-simplistic,
but I can't see any downside to this (other than making it slightly easier
to open Pandora's box - which will happen anyway).

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)