ANN: XML and Databases article

Fri Sep 10 19:59:43 BST 1999

[Ron Bourret:]

> What other common functions can you perform besides addressing / 
> hyperlinking?

Hmmm.  

Anything that you can do with markup -- and that covers a lot of
territory.  Especially anything that you could do with markup but,
for one reason or another, can't use markup to do.  (E.g., you aren't
allowed to change what you want to mark up, or what you want to mark
up can't be marked up without making it useless for its intended
purpose.)

A better question might be, "What *can't* you do if you have the
ability to attach any information at all, for any reason whatsoever,
to any other piece(s) of information, without changing any information
except for creating the attachment instructions (which can be
completely separate from everything else), in such a way that an
application can always tell what is attached to what?"
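
To make "attachment instructions" concrete, here is a minimal sketch.
It assumes a plain character string as the information and a list of
(start, end, info) tuples as the separately kept attachments; the
names `document`, `attachments` and `attachments_at` are invented for
illustration and are not HyTime syntax or any product's API.

```python
# The information being annotated. It is never modified.
document = "Groves let applications ignore the difference."

# Attachment instructions, stored apart from the document itself.
# Each entry: (start offset, end offset, attached information).
attachments = [
    (0, 6, {"note": "structure defined by a property set"}),
]

def attachments_at(attachments, position):
    """Return everything attached to the character at `position`."""
    return [info for start, end, info in attachments
            if start <= position < end]

def attached_text(document, attachment):
    """Return the span of the document that an attachment points at."""
    start, end, _info = attachment
    return document[start:end]
```

The point of the sketch is only that the application can tell what is
attached to what without anything being written into the document.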

> > Well, actually, there's probably a one-to-one correspondence between
> > property sets and database schemas.  In order to address information
> > in terms of its structure, you have to know the structure.  In
> > grove-land, the structure is defined by a property set.  Different
> > databases have different structures, normally expressed as database
> > schemas.  Making a database look like a grove is very straightforward.
> > The bulk of the work is translating the schema into a property set
> > (which is, after all, a kind of schema).  There's a bit of coding
> > involved, too, but the GroveMinder developer kit has tools that make
> > this amazingly easy.  (At least the Lockheed-Martin people were
> > amazed, and they said so publicly at XML '98.)
> 
> What do you mean by "different databases" here?  If you mean
> relational databases v. hierarchical databases v. object-oriented
> databases, then there's no problem -- I would expect each to have a
> different property set.  On the other hand, if you mean DB2
> v. Oracle v. Informix v. SQL Server, then it seems there is
> something broken -- I would very much expect all relational
> databases to have the same property set.

Let me define some terms here, to make this less confusing:

database: a bunch of data, such as the Accounts Receivable of XYZ
          Corporation as of 6:00 am, Greenwich Mean Time, on January
          1, 1999.  Databases can have a variety of genres and
          schemas, and their management is generally facilitated by
          some sort of DBMS.

database management system (DBMS): some specific software for storing
          and retrieving databases and components of databases, such
          as Oracle 8i, ObjectStore, etc.  When I say "database", I
          try not to mean "DBMS", although I make this mistake a lot.

database genre: a distinct but broad database paradigm, such as
          "relational", "hierarchical", "object-oriented", etc.  When
          I say "database", I try not to mean "database genre",
          although I make this mistake a lot.

database schema: the formal expression of the structure of a
          particular database.  (Not the structure of a DBMS, and not
          a database.)

When I said "different databases", I meant "different databases" --
different bunches of data.  The data found in a particular field is an
example of a "component" of a database.  A particular table is a
component of some particular database.

It's true, what you said: you can have a property set for relational
databases in general.  I (confusingly) skipped that step, assuming
that we would be making a property set for accessing the semantic
aspects of databases in terms of their schemas, so that the properties
of the database get their names from the field definitions in the
schema, etc.

What you're suggesting (i.e., that the properties of all relational
databases can be represented with a single property set) is very
similar in spirit to the SGML Property Set.  The SGML Property Set is
not configured for a particular DTD; it works for all DTDs.  It knows
nothing about the semantic properties of any particular DTD's
vocabulary.  Instead, the DTD is regarded as one of the properties of
an SGML document.  We could do the same thing for all relational
databases, by regarding the schema as one of the properties of the
database.

But I think that it's often attractive, in terms of software
re-usability and in terms of the reliability of information
interchange, to create property sets for the information sets of
specific DTDs (or "vocabularies"), in addition to using the SGML or
XML property set.  In the case of databases, in the same way and for
the same reasons, it's attractive to create property sets for
particular schemas, *in addition to* the generic SGML Property Set
(or, someday, the XML Property Set) and the generic Relational
Database Property Set.  When an XML document is processed, you get a
"primary XML grove", plus as many groves as there are vocabularies
used in that document, each with values for the properties defined for
that vocabulary.  Similarly, when a grove interface is provided to a
relational database, you get a "primary relational database grove",
plus one or more groves for the schema(s) that govern that database.
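
As a rough illustration of the two levels, assume a generic
relational-database node that carries its schema as just another
property, and a helper that derives schema-specific nodes whose
property names come from the field definitions.  The dictionary
layout and the name `schema_specific_nodes` are assumptions made up
for this sketch, not a real property set.

```python
# Generic view: one "database" node; the schema is one of its properties.
generic_db = {
    "class": "database",
    "schema": {"customers": ["name", "city"]},
    "tables": {"customers": [["Ann", "Austin"], ["Bob", "Dallas"]]},
}

def schema_specific_nodes(db):
    """Build nodes whose property names are taken from the schema."""
    nodes = []
    for table, fields in db["schema"].items():
        for row in db["tables"][table]:
            node = {"class": table}
            # Pair each field name from the schema with the row's value.
            node.update(zip(fields, row))
            nodes.append(node)
    return nodes
```

In the generic view an application asks for "row 0 of table
customers"; in the schema-specific view it can ask for the `city`
property of a `customers` node directly.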

> Saying that a grove is a database strikes me as a bit misleading, at
> least in the database world.

I take your point.  I was trying to show that in grove-land it's all a
matter of perspective whether something is a "document" (a word that
connotes a lump of interchangeable data that normally must be read and
processed before anything useful can be done with the information it
contains) or a "database" (a word that connotes a bunch of data whose
interconnectedness and accessibility exist in a state of full
application-readiness).  Groves offer both perspectives at the same
time, which is why they allow applications to ignore the difference
between documents and databases.

> What I meant to do here was distinguish between the database that
> GroveMinder uses to store groves and other databases that might be
> external resources, such as that Informix database run by Billy Bob
> over in Engineering. (Granted, GroveMinder's database can
> undoubtedly be treated in a fashion similar to Billy Bob's database,
> but that's getting a bit self-referential and misses the point that
> GroveMinder runs just fine without Billy Bob's database but can't
> run without its own.) I'm also assuming that the database used by
> GroveMinder is configured for grove storage, although this might not
> be such an earth-shaking amount of work -- I'll take a guess that
> the class definitions are fed pretty much directly into the database
> as schema.

Yes.  When all is said and done, GroveMinder runs as a native
application of the DBMS in which the groves persist.  A lot of the
GroveMinder technology is devoted to making GroveMinder portable.

> What I meant here was whether you could perform an operation similar to 
> embedding an Excel spreadsheet in a Word document -- that is, can I 
> (easily, generically, and without modification of property sets) combine 
> information from different groves into a single grove.

The answer is, "Yes, iff your property sets allow it."  Since you can
write your own property sets (and, in the case of GroveMinder, your
own minders), the short answer is "Yes".

> This would be extremely useful because it would be a step on the way
> towards being able to query the entire enterprise with requests such
> as, "Get me the names, addresses, and company prospectuses of all
> customers that I've sent email to in the last three days". The names
> and addresses come from the corporate database, the email
> information comes from my email, and the prospectuses come from a
> document database somewhere or perhaps the Web. The result could be
> navigable by a generic grove navigation tool.

Ah.  As you say below, you don't need to combine groves to do that.
All you need is a document (grove), such as a HyTime document (grove)
that addresses all these things in all these different groves (the
corporate database grove, the email groves, and the document database
grove or Web grove), so that it returns to your application an
amalgamated node list of what you want.
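
A minimal sketch of the idea, assuming each grove is simply a list of
nodes (dictionaries here) and that the addressing document boils down
to a list of (grove name, test) pairs.  The grove names, the node
properties and the function `resolve` are hypothetical; real HyTime
addressing is far richer than this.

```python
# Three independent groves; none of them is merged or modified.
groves = {
    "corporate-db": [
        {"class": "customer", "name": "Ann", "address": "1 Main St"},
        {"class": "customer", "name": "Bob", "address": "2 Oak Ave"},
    ],
    "email": [
        {"class": "message", "to": "Ann", "days_ago": 1},
        {"class": "message", "to": "Bob", "days_ago": 10},
    ],
}

def resolve(groves, locators):
    """Follow each locator into its grove; return one combined node list."""
    result = []
    for grove_name, matches in locators:
        result.extend(node for node in groves[grove_name] if matches(node))
    return result

# Recipients of mail sent in the last three days.
recent = {m["to"] for m in groves["email"] if m["days_ago"] <= 3}

# The "addressing document": which nodes to pick from which grove.
locators = [
    ("email", lambda n: n["days_ago"] <= 3),
    ("corporate-db", lambda n: n["name"] in recent),
]

node_list = resolve(groves, locators)
```

The result is a single list containing nodes from both groves, while
each grove stays exactly where and what it was.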

> My guess is that groves right now give me this functionality by
> hyperlinking one grove to the next.  This is fine in some cases (a
> grove-based query tool), not in others (wanting to expose tabular
> database data as XML).

What advantage does converting the nodes of a tabular database into
XML nodes offer, over accessing the nodes of the tabular database
directly?

> (Note that when the Word document is persisted, the result isn't
> really a "Word" document, it's (if I've got my terminology right),
> an OLE Compound Document. When you get to the point where the
> spreadsheet is, Word no longer has a clue how to process
> it. Instead, there's a flag that says, "Yo! Go start Excel" and
> processing is handed over to Excel. A similar situation would be
> reasonable in a "compound" grove. It could easily be persisted to a
> grove database, which understands groves, but couldn't readily be
> persisted as XML or a database without conversion software.)

Yeah, something has to copy the nodes from one grove (say, the Word
grove) to a grove that is an XML grove, and then export the XML grove.
But I don't see how this improves one's access to the information in
the Word grove.  (I can see how it improves access by people who have
XML applications but not Word applications, but if you already have a
Word grove, making it into an XML grove offers little advantage that I
can see, unless you just need XML because you need XML.)

> > You're right that one application of GroveMinder is data transfer
> > middleware.  The conversion program is comparatively easy to write,
> > since everything already conforms to the same object model.
> 
> I'm not quite sure I understand this. Above, you've just said you
> need to make decisions about how database node classes are converted
> to XML node classes, which makes sense to me. What do you mean by
> "everything already conforms to the same object model"?  As far as I
> can tell, groves allow everything to be expressed as objects (which
> have a few common properties), but these objects are no more the
> "same object model" than a model for a person is the same as a model
> for a book.  Groves may have gotten me from some other format
> (notation) into an object model, but whether converting one object
> to another is easy depends on the objects themselves.

I think what I'm saying is confusing only because it is too obvious.
When I said "object model", above, I wasn't talking about particular
classes of objects; I was talking about "what an object is" (or, "the
virtual base class to which all objects belong, regardless of their
class").  Fundamentally, the grove paradigm's object model is:

                  Nodes (objects) consist of named properties, and
                  values for those named properties.

As object models go, it's a pretty simple one.  Most object models
also allow objects to have methods, but the grove paradigm's object
model doesn't have methods.
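
For illustration only, that model can be written down in a few lines.
The class name `Node`, its fields, and the sample property names are
assumptions chosen for this sketch; they are not GroveMinder's API or
the wording of any property set.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Node:
    """A node: a class name plus named properties and their values.

    It carries data only; it has no behavior of its own.
    """
    class_name: str
    properties: Dict[str, Any] = field(default_factory=dict)

# Nodes of very different classes still have the same shape.
row = Node("customer-row", {"name": "XYZ Corporation", "balance": 1200})
para = Node("element", {"gi": "para",
                        "content": [Node("data", {"chars": "Hello"})]})

def property_names(node):
    """Generic code: works on any node, whatever its class."""
    return sorted(node.properties)
```

Because `property_names` never needs to know a node's class, a
conversion program only has to decide which properties of one class
map to which properties of another; reading and writing the nodes
themselves is the same everywhere.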

> In this case (since I'm the discusser ;) I'd say they're worth
> discussing in individual product descriptions, but not in the meat
> of the article, any more than a discussion of the DOM is relevant to
> the discussion.

I can't argue with that.  Thanks for the stimulating discussion, Ron.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn at techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 231 4098
fax    +1 972 994 0087
pager (150 characters max): srn-page at techno.com

3615 Tanner Lane
Richardson, Texas 75082-2618 USA
