Proposed process for DTDs in XML (Implementations)

Mon May 25 15:26:26 BST 1998

Simon St. Laurent wrote:

> I feel strongly that this project will need an implementation, though I also 
> fear that I'm not a good programmer to execute it.  I'd like to see the 
> implementation built on SAX if possible, to continue the tradition of openness 
> it began.  I can see something like a 'validating SAX', (vSAX?) a program 
> which uses the SAX API to parse a DTD (or whatever we call it) and then uses 
> SAX again to parse the document, validating it against the DTD.  vSAX would 
> then use the same SAX API to pass the information to the routine which called 
> it in the first place.  Applications already using SAX could call vSAX without 
> having to make many changes.
> 
> This may go beyond the capabilities of the event-driven model.  Building this 
> project in such a way that the vSAX parser could validate documents without 
> having to build an entire tree would likely warp the DTDs dramatically.  That 
> could be interesting, but I suspect vSAX would have to build a tree 
> internally. 

I might be getting a bit ahead of the game here, so please bear with me -- these 
thoughts are in my head now and I'd like to get them down.

Trees vs. Events
----------------
It seems like we need to decide early on whether we are interested in getting 
the DTD as events or a tree.  Arguing in favor of events is the fact that it is 
cheaper to build a tree from events than to produce events from a tree (the 
event stream never requires the whole DTD to be in memory), so events are the 
more basic form.  However, I also think that what is returned really depends on 
the intended usage.
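
To illustrate why events are the more basic form, here is a minimal sketch in 
Python of folding a SAX event stream into a tree.  It uses ordinary document 
events from the standard xml.sax module as a stand-in; events describing DTD 
declarations could presumably be folded the same way.  The Node and TreeBuilder 
classes and the build_tree function are hypothetical names invented for this 
example, not part of SAX.

```python
import xml.sax


class Node:
    """Hypothetical tree node: a name, a parent link, children, and text."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Folds SAX start/characters/end events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._current = None  # the element that is currently open

    def startElement(self, name, attrs):
        node = Node(name, self._current)
        if self._current is None:
            self.root = node
        else:
            self._current.children.append(node)
        self._current = node

    def characters(self, content):
        if self._current is not None:
            self._current.text += content

    def endElement(self, name):
        # Closing an element moves back up to its parent.
        self._current = self._current.parent


def build_tree(xml_text):
    """Parse a string with SAX and return the root Node of the built tree."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

Going the other way (walking a finished tree to emit events) is also possible, 
but only after the whole tree is already in memory.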

In my limited imagination, events are mostly useful for display -- read in the 
DTD definition-by-definition and display it.  This is a common operation with 
the text in an XML document and is presumably why SAX returns events.  Except 
for displaying a DTD or building a tree, how else would DTD events be used?

The two prime uses of DTDs that I can think of are validation and exploration.  
Both of these require the information to stay in memory and be accessed 
randomly, which (to me) implies a tree, hash table, or similar structure.  Are 
there any common uses of DTDs that require serial access?
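
As a rough sketch of the random-access point, the following Python keeps the 
declarations in a hash table and looks them up as document events arrive.  The 
table format (element name mapped to an exact ordered list of child element 
names) is an assumption made for this example and is far simpler than real DTD 
content models; the Validator class and the validate function are likewise 
hypothetical.

```python
import xml.sax

# Assumed in-memory form of a two-element DTD:
#   <!ELEMENT a (b)>        -> "a" must contain exactly one b
#   <!ELEMENT b (#PCDATA)>  -> "b" contains no child elements
DECLARATIONS = {"a": ["b"], "b": []}


class Validator(xml.sax.ContentHandler):
    """Checks each element's children against the declaration table."""

    def __init__(self, declarations):
        super().__init__()
        self.declarations = declarations
        self.stack = []   # one list of child names per open element
        self.errors = []

    def startElement(self, name, attrs):
        if name not in self.declarations:
            self.errors.append("undeclared element: " + name)
        if self.stack:
            self.stack[-1].append(name)  # record as child of the open parent
        self.stack.append([])

    def endElement(self, name):
        seen = self.stack.pop()
        expected = self.declarations.get(name)  # random access by name
        if expected is not None and seen != expected:
            self.errors.append(
                "%s: expected children %r, found %r" % (name, expected, seen)
            )


def validate(xml_text, declarations=DECLARATIONS):
    """Return a list of error strings; an empty list means no errors found."""
    handler = Validator(declarations)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors
```

Note that the document itself is still read serially here; only the DTD needs 
to stay in memory for lookup.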

Flat Trees vs. Tree Trees
-------------------------
If trees are used, another question is what form the tree takes.  XML-Data 
currently defines a tree that uses XML's hierarchy as a way to group information 
about individual elements.  However, the relation between those elements is 
actually flat.  For example, the following DTD converts to the following  
XML-Data structure:

DTD:
<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>

XML-Data:
<schema id = "a">
   <elementType id = "a">
      <element type = "#b"/>
   </elementType>
   <elementType id = "b">
      <string/>
   </elementType>
</schema>

Notice that the definitions of a and b are at the same level.  That is, when I 
build a DOM tree from this XML, a and b are siblings, not parent and child.  
When exploring a DTD, the parent-child relationship is far nicer -- I move up 
and down the DOM tree and get the metadata I need at each level.  On the other 
hand, such a tree complicates the DTD (sorry ;) for XSchema/XTD/etc. and I'm not 
sure if representing children with multiple parents would even be possible, 
given the strict nesting requirements of XML.  Comments?
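
One way to picture the trade-off is the sketch below, which keeps the flat 
form and derives the parent-child view on demand.  The three-element schema is 
hypothetical: it extends the a/b example with an element c that has two parents, 
which the flat table handles without duplication.  The function names are 
invented for this example.

```python
# Hypothetical flat schema: element name -> names of its child elements.
# "c" is a child of both "a" and "b".
FLAT = {"a": ["b", "c"], "b": ["c"], "c": []}


def parents_of(name, schema):
    """All elements whose content model mentions the given element."""
    return sorted(p for p, children in schema.items() if name in children)


def expand(name, schema, path=()):
    """Nested view of an element's descendants, built from the flat table.

    A shared child is repeated under each parent.  A recursive content
    model is cut off with None rather than expanded forever.
    """
    if name in path:
        return None
    return {
        child: expand(child, schema, path + (name,))
        for child in schema[name]
    }
```

With this approach the stored structure stays flat, and the nested view that 
is nicer for exploration is computed only when someone asks for it.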

-- Ron Bourret

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)