Data models: Groves and tutti quanti.

Didier PH Martin martind at netfolder.com
Tue Feb 1 17:04:44 GMT 2000


Hi Michael

Michael said:
That's what the DOM WG was trying to do when we first wrestled with Groves.
Most of us would probably be a little more general and say that what's
behind the DOM *could be* an XML Grove plan, or it could be the proprietary
MS, Netscape, SoftQuad, Arbortext, Inso, etc. data structures which may or
may not look like Groves.  Or perhaps Groves is already a reasonable
abstraction for all these data structures .... I think we were agnostic on
that subject.

Didier replies:
The DOM is not as agnostic as it seems to be. There is an implicit data
model. Of course, it does not say explicitly how the NodeList is implemented
(an array? a list?), but the data model is nonetheless very present in the
DOM.
For instance, just saying that you have attribute nodes implies that, if I
have the following expression:
<Book author="Didier PH Martin" publisher="Wrox" subject="XML">Professional
XML</Book>

Then the DOM implicit model is:

element node
     |___ attribute node = author
     |___ attribute node = publisher
     |___ attribute node = subject
     |___ text node = Professional XML
     |___ other element node if there are any.


So, there is a data model, and the DOM is not as agnostic as it seems to be.
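
To make the implicit model concrete, here is a minimal sketch that parses the
Book element above. It assumes Python's xml.dom.minidom as the DOM
implementation (my choice for illustration; any conforming DOM should expose
the same node kinds), and the variable names are only illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Book author="Didier PH Martin" publisher="Wrox" subject="XML">'
    'Professional XML</Book>'
)
book = doc.documentElement

# Attributes are exposed as attribute nodes through a NamedNodeMap.
attributes = {name: value for name, value in book.attributes.items()}

# The character data is a separate text node among the element's children.
children = [(child.nodeType, child.nodeValue) for child in book.childNodes]

print(attributes)
print(children)
```

Note that in this implementation the attribute nodes are reached through
book.attributes rather than book.childNodes, so the tree drawn above is a
simplification, but the node kinds themselves are fixed by the DOM whatever
the storage behind them.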

Michael said:
How about a mixed content example?  That's where my headache always starts
... What does

   <book> The book
         <title>Professional XML</title> by
         <author>Didier PH Martin</author>
         is available from
         <publisher>Wrox</publisher>
         now.
   </book>

parse to? Is a "content" object what the DOM calls a TextNode?

Didier replies:
A headache, you said? That's a euphemism! Mixed content is what gives
problems to any model :-))
In my model, where an element = an object, mixed content brings some problems
too, because here we have two contents instead of one (and possibly even
more than two). Moreover, the content has to be placed in a specific
order. So, for mixed content, my data model does not map easily to the
structure, and I would have to map it into:

object = Book ---- {content = The Book}
   |___ object = title ----- {content = Professional XML}
   |___ object = author ---- {content = Didier PH Martin }
   |___ object = content ---- {content = is available from}
   |___ object = publisher --- {content = Wrox}

So, for mixed content, the solution is less elegant than for other kinds of
elements. Now, the question is: which is more frequent? Do we create a data
model for the 20% (or maybe 10%) of cases, or a data model that fits the 80%
of cases?
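
For comparison, here is a rough sketch of what a DOM parser hands back for the
mixed content example. It assumes Python's xml.dom.minidom (an assumption for
illustration only), and whitespace is stripped just to keep the listing
readable.

```python
from xml.dom import minidom

source = (
    "<book> The book <title>Professional XML</title> by "
    "<author>Didier PH Martin</author> is available from "
    "<publisher>Wrox</publisher> now. </book>"
)
book = minidom.parseString(source).documentElement

# Walk the children in document order; text runs and child elements alternate.
sequence = []
for child in book.childNodes:
    if child.nodeType == child.TEXT_NODE:
        sequence.append(("text", child.data.strip()))
    elif child.nodeType == child.ELEMENT_NODE:
        sequence.append((child.tagName, child.firstChild.data))

for kind, value in sequence:
    print(f"{kind}: {value}")
```

Each run of text between child elements comes back as its own text node, in
document order, which is exactly the ordering constraint that makes mixed
content awkward for an element = object model.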

Conclusion:
Yes, Michael, mixed content gives me a big headache too. This $"%$/?/($*
(translate all this with all kinds of good words about mixed content :-))))
kind of expression forces us to use a data model closer to the parse tree and
further from semantic modeling.

If I create an object only for elements (with the exception of mixed content
stuff), then each object is mapped not to a parsed element but rather to a
semantic element. For instance, to take your example, the "book" object is
semantically significant, as are the title, the author and the publisher.
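
A minimal sketch of this element = object mapping follows. The SemanticObject
class and the to_object function are hypothetical names invented for this
example (they belong to no DOM API), and xml.dom.minidom is assumed only as a
convenient parser.

```python
from dataclasses import dataclass, field
from xml.dom import minidom


@dataclass
class SemanticObject:
    """Hypothetical 'element = object' node; not part of any DOM API."""
    name: str
    content: str = ""
    children: list = field(default_factory=list)


def to_object(element):
    """Map one DOM element to one object, wrapping any extra text runs."""
    obj = SemanticObject(element.tagName)
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            obj.children.append(to_object(child))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            text = child.data.strip()
            if not obj.content and not obj.children:
                # Text before any child element becomes the object's content.
                obj.content = text
            else:
                # Mixed content: later text runs need synthetic "content" objects.
                obj.children.append(SemanticObject("content", text))
    return obj


doc = minidom.parseString(
    "<book><title>Professional XML</title>"
    "<author>Didier PH Martin</author>"
    "<publisher>Wrox</publisher></book>"
)
book = to_object(doc.documentElement)
print([(child.name, child.content) for child in book.children])
```

Without mixed content, every object corresponds to a semantically significant
element. Feed the same function the mixed content example and the synthetic
"content" objects appear between the real ones, which is the less elegant
case.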

So, Michael, I agree: for the 20% of cases where mixed content is used, my
data model is probably no more elegant than the current model. However, for
the other 80% it may be better and more seamless with macro structures like
directory services, so that, from the macro to the micro, we can have the
same model.

Cheers
Didier PH Martin
----------------------------------------------
Email: martind at netfolder.com
Conferences: Web New York (http://www.mfweb.com)
Book : XML Pro published by Wrox Press
Products: http://www.netfolder.com


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions and unsubscriptions
are  now ***CLOSED*** in preparation for list transfer to OASIS.




