Proposed process for DTDs in XML (Implementations)

Peter Murray-Rust peter at ursus.demon.co.uk
Mon May 25 15:50:31 BST 1998


Many thanks to all those posting.  I am sensing the same sort of critical
mass and focus that we had just before SAX.

At 14:31 25/05/98 +0200, Ron Bourret wrote:
>
>I might be getting a bit ahead of the game here, so please bear with me --
>these thoughts are in my head now and I'd like to get them down.
>
>Trees vs. Events
>----------------
>It seems like we need to decide early on whether we are interested in
>getting the DTD as events or a tree.  Arguing in favor of events is the
>fact that it is more reasonable to build a tree from events than vice
>versa (less memory usage), so events are the more basic form.  However,
>I also think that what is returned really depends on intended usage.

I suspect that a tree will be the method of choice if it is used for
retrospective exploration (i.e. after the parsing). In that case the tree
will not be ordered. The only reason I can see for events is that they may
help the parser build the DTD in a particular order (?efficiency?).
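
To make the "tree from events" direction concrete, here is a minimal Python
sketch. It assumes a hypothetical event stream with one event per
declaration; DeclEvent and build_table are illustrative names, not part of
SAX or of any existing parser. It also shows why declaration order need not
survive: the resulting table is the same whichever order the events arrive in.

```python
from collections import namedtuple

# Hypothetical event: one per DTD declaration, delivered in document order.
DeclEvent = namedtuple("DeclEvent", ["kind", "name", "payload"])

def build_table(events):
    """Fold a stream of declaration events into a name -> info lookup table."""
    table = {}
    for ev in events:
        entry = table.setdefault(ev.name, {"content": None, "attributes": []})
        if ev.kind == "element":
            entry["content"] = ev.payload            # raw content model text
        elif ev.kind == "attlist":
            entry["attributes"].append(ev.payload)   # one attribute definition
    return table

# Events for the two-element DTD quoted further down in this message.
events = [
    DeclEvent("element", "a", "(b)"),
    DeclEvent("element", "b", "(#PCDATA)"),
]
table = build_table(events)
```

Once the table exists, retrospective exploration is a plain lookup such as
table["a"]["content"], with no dependence on the order of declaration.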

I *hope* that we shan't get to the stage where memory usage of DTDs is a
problem. I am aware that DOCBOOK takes ca. 3000 lines (but that includes
PEs) - I assume that TEI in all its glory is larger. But even they
shouldn't cause problems compared to document size.

>
>In my limited imagination, events are mostly useful for display -- read in
>the DTD definition-by-definition and display it.  This is a common
>operation with the text in an XML document and is presumably why SAX
>returns events.  Except for displaying a DTD or building a tree, how else
>would DTD events be used?
>
>The two prime uses of DTDs that I can think of are validation and
>exploration.  Both of these require the information to stay in memory and
>be accessed randomly, which (to me) implies a tree, hash table, or similar
>structure.  Are there any common uses of DTDs that require serial access?

The *order* of declaration of elements in a DTD is presumably irrelevant. I
imagine that parsers have to build the DTD in memory anyway.

AFAIR it was said on this list that the two uses of DTDs were:
	- syntactic/structural validation
	- processing minimisation

I added some other *possible* uses of XTD yesterday, and it would probably
be useful to group these and other suggestions and offer them as questions.

>
>Flat Trees vs. Tree Trees
>-------------------------
>If trees are used, another question is what form the tree takes.  XML-Data
>currently defines a tree that uses XML's hierarchy as a way to group
>information about individual elements.  However, the relation between
>those elements is actually flat.  For example, the following DTD converts
>to the following XML-Data structure:
>
>DTD:
><!DOCTYPE a [
><!ELEMENT a (b)>
><!ELEMENT b (#PCDATA)>
>]>
>
>XML-Data:
><schema id = "a">
>   <elementType id = "a">
>      <element type = "#b"/>
>   </elementType>
>   <elementType id = "b">
>      <string/>
>   </elementType>
></schema>
>
>Notice that the definitions of a and b are at the same level.  That is,
>when I build a DOM tree from this XML, a and b are siblings, not parent
>and child.  When exploring a DTD, the parent-child relationship is far
>nicer -- I move up and down the DOM tree and get the metadata I need at
>each level.  On the other hand, such a tree complicates the DTD (sorry ;)
>for XSchema/XTD/etc. and I'm not sure if representing children with
>multiple parents would even be possible, given the strict nesting
>requirements of XML.  Comments?

In JUMBO1 the *elements* are all children of a root XTD node. Each element
has a number of ATTLIST children, and also a single contentSpec child. The
ATTLIST is very flat (just type, default, etc) but the contentSpec can be
hierarchical. I used the terms in the spec (Choice and Seq) as nodes which
a contentSpec could possess recursively. I'd strongly urge sticking to this
because it makes it easy to extract sub-contentSpecs and trivial to parse.
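
As a rough illustration of that layout (not the actual JUMBO1 classes; every
class and field name below is an assumption made for the sketch), the
structure might look like this in Python: a flat list of element
declarations under one root, each with flat attribute definitions and a
contentSpec built recursively from Seq and Choice nodes.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Name:
    """Leaf of a content model: a reference to an element by name."""
    name: str
    occurrence: str = ""          # "", "?", "*" or "+"

@dataclass
class Seq:
    """Sequence group, e.g. (a, b)."""
    children: List["Particle"]
    occurrence: str = ""

@dataclass
class Choice:
    """Choice group, e.g. (a | b)."""
    children: List["Particle"]
    occurrence: str = ""

Particle = Union[Name, Seq, Choice]

@dataclass
class AttDef:
    """One flat attribute definition taken from an ATTLIST."""
    name: str
    type: str
    default: str

@dataclass
class ElementDecl:
    name: str
    content_spec: Particle
    attlist: List[AttDef] = field(default_factory=list)

@dataclass
class XTD:
    """Root node: every element declaration is a direct child."""
    elements: List[ElementDecl] = field(default_factory=list)

def names_in(particle):
    """Collect the element names referenced anywhere in a contentSpec."""
    if isinstance(particle, Name):
        return [particle.name]
    found = []
    for child in particle.children:
        found.extend(names_in(child))
    return found

# <!ELEMENT doc (title, (para | list)*)>
doc = ElementDecl(
    "doc",
    Seq([Name("title"), Choice([Name("para"), Name("list")], "*")]),
    [AttDef("id", "ID", "#IMPLIED")],
)
xtd = XTD([doc])
```

Because a sub-contentSpec is just a Seq or Choice node, pulling one out is a
matter of taking doc.content_spec.children[1], and a short recursive walk
such as names_in is enough to traverse the whole model.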

I don't see that there is a useful way that a non-flat tree could be built
up - if the tree is attempting to show the children directly (e.g. not
using Choice and Seq) then we get into recursion. This is the sort of
problem that is faced by tools like Earl Hood's (very nice) dtd2html - a
Perl script for producing SGML documentation. He expands content models
fully the first time and then uses ellipses when the elements re-occur at a
lower level.
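
The expand-once-then-ellipsis idea can be sketched in a few lines of Python.
This is not how dtd2html itself is written (that is a Perl script); the
function name, the children_of mapping and the output format are assumptions
for illustration only. As a simplification, any element that re-occurs gets
an ellipsis, even one with no children of its own.

```python
def expand(name, children_of, seen=None, depth=0):
    """Return indented lines for `name` and its descendants.

    An element's children are expanded only the first time the element is
    met; any later occurrence is marked with '...' so recursion terminates.
    """
    if seen is None:
        seen = set()
    indent = "  " * depth
    if name in seen:
        return [indent + name + " ..."]
    seen.add(name)
    lines = [indent + name]
    for child in children_of.get(name, []):
        lines.extend(expand(child, children_of, seen, depth + 1))
    return lines

# Hypothetical recursive content models: a list holds items, and an item
# may hold a para or a nested list.
children_of = {"list": ["item"], "item": ["para", "list"], "para": []}
outline = expand("list", children_of)
```

Here outline is ["list", "  item", "    para", "    list ..."]: the nested
list is shown but not expanded a second time.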

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



