ANN: XML and Databases article

Thu Sep 9 11:36:06 BST 1999

Steven R. Newcomb wrote:
> > The nice thing about groves is that all groves, regardless of what
> > they are built on, have certain commonalities, such as
> > addressability, so you can perform certain common functions with
> > them.
>
> Right.  All nodes in groves have the same "object model" (I'm using
> this term in a more formal, scientific sense than the term is used in
> the phrase "Document Object Model (DOM)".)  The grove object model is:
> Groves have nodes, nodes conform to classes, and classes have named
> properties with value constraints.  Nodes have named properties, and
> values for those properties.  That's about it; the rest is detail.
> (It's pretty interesting detail.)
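
To check that I've understood the object model, here is how I would
sketch it in Python. Every name below (PropertyDef, NodeClass, Node,
Grove, the "tag" property) is my own invention for illustration; none of
it comes from the grove standard or from GroveMinder.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PropertyDef:
    """A named property plus a constraint on the values it may hold."""
    name: str
    constraint: Callable[[Any], bool]


@dataclass
class NodeClass:
    """A class of node: a name and the properties its nodes may carry."""
    name: str
    properties: Dict[str, PropertyDef] = field(default_factory=dict)

    def define(self, name: str, constraint: Callable[[Any], bool]) -> None:
        self.properties[name] = PropertyDef(name, constraint)


@dataclass
class Node:
    """A node conforms to one class and holds values for its properties."""
    node_class: NodeClass
    values: Dict[str, Any] = field(default_factory=dict)

    def set_property(self, name: str, value: Any) -> None:
        prop = self.node_class.properties.get(name)
        if prop is None:
            raise KeyError(f"{self.node_class.name} has no property {name!r}")
        if not prop.constraint(value):
            raise ValueError(f"bad value for {name!r}: {value!r}")
        self.values[name] = value


@dataclass
class Grove:
    """A grove is a collection of nodes."""
    nodes: List[Node] = field(default_factory=list)


# Hypothetical "element" class with a single string-valued property.
element = NodeClass("element")
element.define("tag", lambda v: isinstance(v, str))
para = Node(element)
para.set_property("tag", "para")
grove = Grove([para])
```

If that is roughly right, then anything written against Node and
NodeClass works on any grove, whatever it was built from.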

What other common functions can you perform besides addressing / 
hyperlinking?

> > GroveMinder is generic grove middleware. It has plug-ins, called
> > Minders (I think of them as drivers),
>
> Hooray, thank you!  I have sometimes called them "notation drivers"
> only to get the blankest stares imaginable.  (I then have asked
> something lame, like, "Do you know what a device driver is, and why we
> have them?")  But you obviously get the point of Minders: Minders
> represent plug and play support for individual notations, in a system
> that makes all content look alike (i.e., conform to the grove object
> model).

Actually, if you had said "notation driver" to me, I would have given you 
the same blank stare.  The problem for me is not "driver", but "notation" 
-- it might mean something to SGML gurus, but is unlikely to mean much to 
the average programmer, who at least has a chance with "driver".  I can't 
think of a simple generic term right off, but the following would do pretty 
handily -- "Just like you need a different ODBC driver for each brand of 
database, you need a different Minder for each type of data -- SGML 
document, database, email, Word documents, and so on."  It's probably not 
quite technically correct, but it will get the point across.
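
For programmers, the analogy can be made concrete with a small plug-in
registry. This is only a sketch of the idea in Python, assuming a grove
can be stood in for by a list of dicts; the registry, the decorator, and
both toy Minders are hypothetical and say nothing about how GroveMinder
is actually built.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a data type to the function that builds a
# grove (here just a list of dicts) from raw input of that type.
MINDERS: Dict[str, Callable[[str], List[dict]]] = {}


def minder(data_type: str):
    """Register a builder function as the Minder for one data type."""
    def register(func):
        MINDERS[data_type] = func
        return func
    return register


@minder("csv")
def csv_minder(raw: str) -> List[dict]:
    # One node per non-empty line, with the comma-separated fields as a property.
    return [{"class": "row", "fields": line.split(",")}
            for line in raw.splitlines() if line]


@minder("email")
def email_minder(raw: str) -> List[dict]:
    # One node per "Name: value" header line; other lines are skipped.
    nodes = []
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            nodes.append({"class": "header", "name": name.strip(),
                          "value": value.strip()})
    return nodes


def build_grove(data_type: str, raw: str) -> List[dict]:
    """Pick the Minder for data_type, much as ODBC picks a driver."""
    if data_type not in MINDERS:
        raise LookupError(f"no Minder installed for {data_type!r}")
    return MINDERS[data_type](raw)
```

The caller only ever sees build_grove(); supporting a new type of data
means installing one more Minder, not changing the caller.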

> > that can build groves over different property sets. For example,
> > there is one Minder for SGML/XML documents and a different Minder
> > for relational databases.
>
> Well, actually, there's probably a one-to-one correspondence between
> property sets and database schemas.  In order to address information
> in terms of its structure, you have to know the structure.  In
> grove-land, the structure is defined by a property set.  Different
> databases have different structures, normally expressed as database
> schemas.  Making a database look like a grove is very straightforward.
> The bulk of the work is translating the schema into a property set
> (which is, after all, a kind of schema).  There's a bit of coding
> involved, too, but the GroveMinder developer kit has tools that make
> this amazingly easy.  (At least the Lockheed-Martin people were
> amazed, and they said so publicly at XML '98.)

What do you mean by "different databases" here?  If you mean relational 
databases v. hierarchical databases v. object-oriented databases, then 
there's no problem -- I would expect each to have a different property set. 
On the other hand, if you mean DB2 v. Oracle v. Informix v. SQL Server, 
then it seems there is something broken -- I would very much expect all 
relational databases to have the same property set.
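
To make the two readings concrete, here is a rough Python sketch. The
class and property names, and the toy schema, are made up for the
purpose; I am assuming a property set can be stood in for by a mapping
from class name to property names.

```python
from typing import Dict, List

# Reading 1: one fixed property set shared by every relational database.
# These class and property names are invented for illustration.
GENERIC_RELATIONAL: Dict[str, List[str]] = {
    "table": ["name", "columns", "rows"],
    "column": ["name", "datatype"],
    "row": ["values"],
}


def schema_to_property_set(schema: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Reading 2: derive one node class per table, one property per column."""
    return {table: list(columns) for table, columns in schema.items()}


# A toy schema: table name -> {column name: SQL type}.
schema = {
    "customer": {"id": "INTEGER", "name": "VARCHAR", "address": "VARCHAR"},
    "orders": {"id": "INTEGER", "customer_id": "INTEGER"},
}
per_schema = schema_to_property_set(schema)
```

Under the first reading DB2 and Oracle look identical to a grove tool;
under the second, every schema yields its own property set, regardless
of vendor.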

> The grove paradigm breaks down the distinction between documents
> (resources) and databases.  Everything, in its addressable form, is a
> grove, and a grove is a database.

Saying that a grove is a database strikes me as a bit misleading, at least 
in the database world.  A grove is a database, in the sense that it 
contains data and you can extract that data, but in this sense, a 
spreadsheet is a database, a Word document is a database, a file system is 
a database, and so on. I think it would be more accurate to say that a 
database (in the traditional sense of the term) can be used to persist a 
grove, especially as you go on to say:

> ... If the resource is *already* a database, there's
> probably no parsing or processing involved.  All that needs to be done
> is to put a translating layer over it that makes the database look
> like a grove.  Then, the database and all its contents are fully able
> to participate in the wider world of interchangeable information
> resources: they can be linked, re-used by reference, have any kind of
> metadata associated with them, etc. etc.
>
> > One thing GroveMinder can do is store a grove in its own
> > database. (Note that this is separate from the database addressed by
> > the relational database Minder -- it has a structure designed to
> > store groves.) Thus, GroveMinder can store an XML document in a
> > database as a grove and is what I, in my article, called a content
> > management system. That is, it can store and retrieve an XML
> > document as a document.
>
> Sounds right to me.  ("...its own database" sounds a bit odd because
> GroveMinder can use any ODBMS for grove storage.)

What I meant to do here was distinguish between the database that 
GroveMinder uses to store groves and other databases that might be external 
resources, such as that Informix database run by Billy Bob over in 
Engineering. (Granted, GroveMinder's database can undoubtedly be treated in 
a fashion similar to Billy Bob's database, but that's getting a bit 
self-referential and misses the point that GroveMinder runs just fine 
without Billy Bob's database but can't run without its own.) I'm also 
assuming that the database used by GroveMinder is configured for grove 
storage, although this might not be such an earth-shaking amount of work -- 
I'll take a guess that the class definitions are fed pretty much directly 
into the database as schema.

> > Some questions:
>
> > 1) Is it possible to combine groves of different types? For example,
> > can I take a grove representing a table in a relational database and
> > stuff it into a grove for an XML document, for example as the
> > content of an element?
>
> I'm afraid I don't grasp the intent of this question.  When such an
> XML document is exported from its grove as an XML document, what
> should the document look like?
>
> There's no need (and no way) to stuff something into something else.
> It is only necessary that the "content" property of the element have,
> as its value, the node in the database grove that represents the
> table.  The ISO standard SGML Property Set does not allow this; only
> certain classes of nodes within the same grove are allowed as the
> value of the "content" property of "element" nodes.  However, if you
> want to change your operative SGML Property Set so that this will be
> permitted, nothing (other than good sense) prevents you from doing it;
> the grove paradigm will readily support you in your madness.
>
> I don't know why it would be sensible to regard an RDBMS table as the
> content of an SGML or XML element.  The normal meaning of "content" is
> elements, character data, and/or other SGML constructs, right there,
> inside the element.  There is no way to write a general purpose
> grove-to-SGML converter unless the classes of the nodes that can
> appear in element content are limited and known.  (We certainly don't
> want to dump arbitrary data into the content of an element; this would
> invite a situation in which the document that is ultimately exported
> is unparsable.)

What I meant here was whether you could perform an operation similar to 
embedding an Excel spreadsheet in a Word document -- that is, can I 
(easily, generically, and without modification of property sets) combine 
information from different groves into a single grove. This would be 
extremely useful because it would be a step on the way towards being able 
to query the entire enterprise with requests such as, "Get me the names, 
addresses, and company prospectuses of all customers that I've sent email 
to in the last three days". The names and addresses come from the corporate 
database, the email information comes from my email, and the prospectuses 
come from a document database somewhere or perhaps the Web. The result 
could be navigable by a generic grove navigation tool.
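
As a rough picture of the kind of query I have in mind, here is a Python
sketch that joins three stand-in "groves" on a customer id. The data,
the field names, and the function are all made up, and plain lists and
dicts stand in for real groves.

```python
from datetime import date, timedelta

# Three stand-in "groves", each from a different source. All data is made up.
customers = [
    {"id": 1, "name": "Acme", "address": "1 Main St"},
    {"id": 2, "name": "Globex", "address": "9 Elm Rd"},
]
sent_mail = [
    {"to_customer": 1, "sent": date(1999, 9, 8)},
    {"to_customer": 2, "sent": date(1999, 8, 1)},
]
prospectuses = {1: "acme-prospectus.xml", 2: "globex-prospectus.xml"}


def recently_mailed(today: date, days: int = 3) -> list:
    """Join the three sources on customer id for mail sent in the last `days` days."""
    cutoff = today - timedelta(days=days)
    recent_ids = {m["to_customer"] for m in sent_mail if m["sent"] >= cutoff}
    return [
        {"name": c["name"], "address": c["address"],
         "prospectus": prospectuses.get(c["id"])}
        for c in customers if c["id"] in recent_ids
    ]


result = recently_mailed(date(1999, 9, 9))
```

The join itself is trivial; the hard part, and the part I am asking
about, is getting all three sources to look alike in the first place.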

My guess is that groves right now give me this functionality by 
hyperlinking one grove to the next.  This is fine in some cases (a 
grove-based query tool), not in others (wanting to expose tabular database 
data as XML). Actually, I would have been absolutely amazed if groves could 
do this without any sort of conversion software.

(Note that when the Word document is persisted, the result isn't really a 
"Word" document; it's (if I've got my terminology right) an OLE Compound 
Document. When you get to the point where the spreadsheet is, Word no 
longer has a clue how to process it. Instead, there's a flag that says, 
"Yo! Go start Excel" and processing is handed over to Excel. A similar 
situation would be reasonable in a "compound" grove. It could easily be 
persisted to a grove database, which understands groves, but couldn't 
readily be persisted as XML or a database without conversion software.

This is the fundamental difference between what I classify as data transfer 
middleware and content management systems. Data transfer middleware views 
the XML document as something it understands -- data -- and "converts" it to 
a database format; this is similar to Word storing a spreadsheet as a 
table. Content management systems view XML documents as documents and 
simply store them in a database rather than a text file.)

> >  If so, does the table grove retain its table-ness, or is it
> > converted to one or more XML elements?  Both cases seem reasonable,
> > although the latter would presumably require a special converter. If
> > the latter case is true, then GroveMinder might also fit what I call
> > data transfer middleware, depending on how the conversion is done.
>
> I would suggest that an efficient way to handle this would be to
> convert the table into node classes that *are* permitted to appear in
> element content, and then make *those* nodes the value of the content
> property.  If you do it this way, you're necessarily making the
> decisions that must be made about how the XML document, when exported,
> will reflect the table data.

Exactly.
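
For example, one such set of decisions (a <row> element per row, a child
element per column) might look like the following Python sketch. It uses
the standard xml.etree.ElementTree module as a stand-in for grove nodes;
the element names and the mapping are my assumptions, and other mappings
are equally defensible.

```python
import xml.etree.ElementTree as ET


def table_to_element(table_name, columns, rows):
    """Turn tabular data into an element subtree: one <row> per row,
    one child element per column. This mapping is one choice among many."""
    table = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return table


# Made-up data: a two-row "customers" table placed inside a <report> element.
doc = ET.Element("report")
doc.append(table_to_element("customers", ["id", "name"],
                            [(1, "Acme"), (2, "Globex")]))
xml_text = ET.tostring(doc, encoding="unicode")
```

Once the table has been turned into ordinary elements, exporting the
document is no different from exporting any other XML.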

> You're right that one application of GroveMinder is data transfer
> middleware.  The conversion program is comparatively easy to write,
> since everything already conforms to the same object model.

I'm not quite sure I understand this. Above, you've just said you need to 
make decisions about how database node classes are converted to XML node 
classes, which makes sense to me. What do you mean by "everything already 
conforms to the same object model"?  As far as I can tell, groves allow 
everything to be expressed as objects (which have a few common properties), 
but these objects are no more the "same object model" than a model for a 
person is the same as a model for a book.  Groves may have gotten me from 
some other format (notation) into an object model, but whether converting 
one object to another is easy depends on the objects themselves.
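
A small Python sketch of the distinction I am drawing, with nodes stood
in for by dicts and all class and property names made up: the first
function works on any node because it relies only on what all nodes
share, while the second works only because someone decided how two
particular classes relate.

```python
def describe(node: dict) -> str:
    """Generic: works on any node, since every node has a class and properties."""
    props = ", ".join(f"{k}={v!r}" for k, v in sorted(node["properties"].items()))
    return f'{node["class"]}({props})'


def person_to_author_entry(person: dict) -> dict:
    """Specific: encodes a decision about how "person" maps to "author"."""
    p = person["properties"]
    return {"class": "author",
            "properties": {"display": f'{p["family"]}, {p["given"]}'}}


# Made-up nodes of two unrelated classes.
person = {"class": "person", "properties": {"given": "Ada", "family": "Lovelace"}}
book = {"class": "book", "properties": {"title": "Groves"}}
```

describe() accepts both nodes; person_to_author_entry() fails on the
book, and nothing in the shared model tells me how to write the
equivalent function for it.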

> > 2) Are groves themselves relevant at a high level in a discussion of
> > XML and databases? It strikes me that, like SAX and the DOM, they
> > are a useful tool in implementing software that stores/retrieves XML
> > documents (or data from those documents) in a database but are not
> > directly relevant to the discussion itself. Instead, they are most
> > relevant to the user in that they are likely to weigh heavily in the
> > feature set exposed by a content management system or (possibly)
> > data transfer system.
>
> Good question.  I guess that's for the person who's doing the
> discussing to decide.  Since groves can be persistent (e.g., stored in
> databases), and since XML resources can become groves, it seems to me
> that groves are relevant.  You're right, the real reason they're
> interesting is their impact on feature sets.  But aren't feature sets
> (and especially tradeoffs between feature sets) what technical
> discussions are all about?

In this case (since I'm the discusser ;) I'd say they're worth discussing 
in individual product descriptions, but not in the meat of the article, any 
more than the DOM is.

-- Ron

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1