Proposed process for DTDs in XML (Implementations)
Ron Bourret
rbourret at dvs1.informatik.tu-darmstadt.de
Mon May 25 15:26:26 BST 1998
Simon St. Laurent wrote:
> I feel strongly that this project will need an implementation, though I also
> fear that I'm not a good programmer to execute it. I'd like to see the
> implementation built on SAX if possible, to continue the tradition of openness
> it began. I can see something like a 'validating SAX', (vSAX?) a program
> which uses the SAX API to parse a DTD (or whatever we call it) and then uses
> SAX again to parse the document, validating it against the DTD. vSAX would
> then use the same SAX API to pass the information to the routine which called
> it in the first place. Applications already using SAX could call vSAX without
> having to make many changes.
>
> This may go beyond the capabilities of the event-driven model. Building this
> project in such a way that the vSAX parser could validate documents without
> having to build an entire tree would likely warp the DTDs dramatically. That
> could be interesting, but I suspect vSAX would have to build a tree
> internally.
I might be getting a bit ahead of the game here, so please bear with me -- these
thoughts are in my head now and I'd like to get them down.
Trees vs. Events
----------------
It seems like we need to decide early on whether we want to get the DTD as
events or as a tree. In favor of events: it is more reasonable to build a tree
from events than to generate events from a tree (the event route never forces
the whole DTD into memory), so events are the more basic form. However, I also
think that what is returned really depends on the intended usage.
In my limited imagination, events are mostly useful for display -- read in the
DTD definition-by-definition and display it. This is a common operation with
the text in an XML document and is presumably why SAX returns events. Except
for displaying a DTD or building a tree, how else would DTD events be used?
The two prime uses of DTDs that I can think of are validation and exploration.
Both of these require the information to stay in memory and be accessed
randomly, which (to me) implies a tree, hash table, or similar structure. Are
there any common uses of DTDs that require serial access?
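To make the events-versus-table question concrete, here is a rough sketch in
Python (my choice of language for illustration; nothing in this thread
prescribes it). It uses the element-declaration callback of Python's
xml.parsers.expat module as a stand-in for DTD events and collects each event
into a dictionary, so that a validator can later look up any element at random.
The names child_names, load_element_table, and may_contain are made up for this
example, and the check ignores ordering and occurrence counts.

```python
import xml.parsers.expat

SAMPLE = """<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>
<a><b>text</b></a>"""


def child_names(model):
    """Collect element names mentioned anywhere in an expat content model.

    expat passes the model as a nested tuple (type, quantifier, name, children);
    name is None for groups and for #PCDATA.
    """
    _ctype, _quant, name, children = model
    names = [name] if name is not None else []
    for child in children:
        names.extend(child_names(child))
    return names


def load_element_table(document):
    """Turn element-declaration events into a dict: element name -> child names."""
    table = {}

    def on_element_decl(name, model):
        # One event per <!ELEMENT ...> declaration, delivered in document order.
        table[name] = child_names(model)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return table


def may_contain(table, parent, child):
    """Random-access lookup of the kind a validator needs (hypothetical helper)."""
    return child in table.get(parent, [])


table = load_element_table(SAMPLE)
print(table)
print(may_contain(table, "a", "b"))
print(may_contain(table, "b", "a"))
```

The events arrive serially, but everything useful happens after they have been
stored, which is the sense in which validation seems to want a table or tree
rather than the event stream itself.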
Flat Trees vs. Tree Trees
-------------------------
If trees are used, another question is what form the tree takes. XML-Data
currently defines a tree that uses XML's hierarchy as a way to group information
about individual elements. However, the relation between those elements is
actually flat. For example, the following DTD converts to the following
XML-Data structure:
DTD:
<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>
XML-Data:
<schema id = "a">
<elementType id = "a">
<element type = "#b"/>
</elementType>
<elementType id = "b">
<string/>
</elementType>
</schema>
Notice that the definitions of a and b are at the same level. That is, when I
build a DOM tree from this XML, a and b are siblings, not parent and child.
When exploring a DTD, the parent-child relationship is far nicer -- I move up
and down the DOM tree and get the metadata I need at each level. On the other
hand, such a tree complicates the DTD (sorry ;) for XSchema/XTD/etc. and I'm not
sure if representing children with multiple parents would even be possible,
given the strict nesting requirements of XML. Comments?
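The sibling-versus-child point can be checked with a short sketch, again in
Python as an illustrative choice. It parses the XML-Data sample (retyped here
as well-formed XML) with xml.dom.minidom, confirms that the two elementType
nodes share a parent, and then expands the flat table into a nested structure
by following the type="#..." references. The helpers flat_table and nest are
hypothetical names, and the handling of the reference syntax is my assumption
about how the sample is meant to be read, not a statement of what XML-Data
specifies.

```python
from xml.dom import minidom

XML_DATA = """<schema id="a">
  <elementType id="a">
    <element type="#b"/>
  </elementType>
  <elementType id="b">
    <string/>
  </elementType>
</schema>"""


def flat_table(doc):
    """Map each elementType id to the ids its <element type="#..."> children name."""
    table = {}
    for etype in doc.getElementsByTagName("elementType"):
        refs = [
            ref.getAttribute("type").lstrip("#")
            for ref in etype.getElementsByTagName("element")
        ]
        table[etype.getAttribute("id")] = refs
    return table


def nest(table, name, path=()):
    """Expand the flat table into a parent-child tree rooted at name.

    A child referenced by several parents is copied under each one, and a
    recursive content model is cut off with a marker instead of looping.
    """
    if name in path:
        return {name: "...recursive reference..."}
    return {name: [nest(table, child, path + (name,)) for child in table.get(name, [])]}


doc = minidom.parseString(XML_DATA)
types = doc.getElementsByTagName("elementType")
# In the DOM, the definitions of a and b hang off the same <schema> node.
same_parent = types[0].parentNode is types[1].parentNode

table = flat_table(doc)
tree = nest(table, "a")
print(same_parent)
print(table)
print(tree)
```

If b had a second parent, nest would simply produce a second copy of the b
subtree. That sidesteps the strict-nesting problem, but only by duplicating
definitions, so it is an exploration view rather than a storage format.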
-- Ron Bourret
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)