Storing Lots of Fiddly Bits (was Re: What is XML for?)

Tue Feb 9 01:08:46 GMT 1999

On Thu, Apr 08, 1999 at 12:17:24PM -0500, Paul Prescod wrote:
> Marcelo Cantos wrote:
> > 
> > ... [best of both worlds] ...  You get a nice object oriented
> > layer on top to talk to, and an industrial strength, robust
> > repository underneath.
> > 
> > Your comments give me the impression that this is unacceptable to
> > you in the XML/hierarchical universe.  You don't want DOM at any
> > level.  You insist on going straight to objects.  It is not even
> > good enough to build an object layer on top of the DOM layer.  I
> > find this a little implausible and hence am certain that you had
> > something else in mind.  Is it rather that you simply don't care
> > what the underlying API is, that you are only interested in what
> > happens at the object level?
> 
> If I had evidence that a bottom-level XML/"DOM" layer would "buy me"
> an industrial strength, robust repository then I would go for it. As
> you have pointed out, I can cover up the ugliness with objects. But
> to me, an industrial strength, robust repository implies
> sophisticated tree-smart *and* link-smart ad hoc query support. The
> DOM isn't a query language and doesn't (AFAIK) have a query
> interface. It might be okay as an API to the results of a query but
> even there I'm leery...

I agree with all this.  If you're dealing with objects, go with an
OODB.  I think, however, that the situation is far less clear when we
are dealing with pure data structures as opposed to first-class
objects with behaviour.  When it comes to maintaining and querying a
large database of _data_ (not objects), I believe a text retrieval
engine will generally outperform an object database, often by several
orders of magnitude (witness Eliot Kimber's anecdotal post).  If
scalability and performance are issues (and, judging by recent
discussions, they often are) then text retrieval technology becomes
much more attractive.

Object databases excel in the area of expressiveness, which enables
them to support much more complex queries than we can.  At present,
our product (SIM) doesn't support ad hoc queries.  It is more like a
relational database in that you define fields, which can be physical
fields or calculated fields.  This means we support arbitrarily
complex structure, but have to decide in advance which set of queries
to support, a compromise that has kept our customers happy so far.  We
are, however, looking at full structure queries in the near future.
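
To make the "fields defined in advance" idea concrete, here is a rough
sketch in Python.  It is not SIM code: FieldIndex, define_physical and
define_calculated are hypothetical names invented for illustration,
and a real engine would do far more (tokenising, ranking, persistent
storage).  It only shows that queries are limited to fields declared
before the data is loaded.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical illustration only: none of these names come from SIM.
Record = Dict[str, str]
Extractor = Callable[[Record], List[str]]


class FieldIndex:
    """Inverted index restricted to fields declared in advance."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}
        self._postings: Dict[str, Dict[str, Set[int]]] = {}

    def define_physical(self, name: str) -> None:
        # A physical field is read straight from the stored record
        # and split into lower-case terms.
        self.define_calculated(
            name, lambda rec: rec.get(name, "").lower().split()
        )

    def define_calculated(self, name: str, extractor: Extractor) -> None:
        # A calculated field is derived from the record by arbitrary code.
        self._extractors[name] = extractor
        self._postings[name] = defaultdict(set)

    def add(self, doc_id: int, record: Record) -> None:
        # Index the record under every field defined so far.
        for name, extract in self._extractors.items():
            for term in extract(record):
                self._postings[name][term].add(doc_id)

    def query(self, field: str, term: str) -> Set[int]:
        # Only predeclared fields can be queried: no ad hoc queries.
        if field not in self._postings:
            raise KeyError(f"field {field!r} was not defined in advance")
        return set(self._postings[field].get(term, set()))


index = FieldIndex()
index.define_physical("title")
index.define_calculated("decade", lambda rec: [rec["year"][:3] + "0s"])
index.add(1, {"title": "Storing Fiddly Bits", "year": "1999"})
index.add(2, {"title": "Object Databases", "year": "1987"})
```

Asking for a field that was never defined (say, "author") raises an
error rather than scanning the data, which is the compromise described
above.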

So while the IR community is closing the gap in the area of
expressiveness, I wonder if the Object community can catch up in the
area of performance (or maybe it's already there and I just don't know
it).

> Since trees can be built as a special case of links, I tend to look
> for such a beast to come out of the OO world (where links are
> usually primary) instead of the text processing world (where the
> tree is usually primary).  Maybe you guys at rmit.edu can surprise
> me though.

We certainly hope so.  Our customers constantly praise the performance
of SIM.  However, we definitely see a strong need to beef our product
up in the standards area.  We are looking into support for XQL and
DOM, and we have the framework to incorporate both without too much
effort.  In fact, DOM is almost in, since it is quite similar to our
existing model.  XQL is somewhat more work: the path indexing required
to support queries over multi-gigabyte collections would require
little effort, but the hard part is query evaluation and, more
importantly, optimisation.
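
As a rough idea of what I mean by path indexing, here is a minimal
sketch in Python using the standard xml.etree.ElementTree parser.  It
is an assumption-laden toy, not our implementation and not XQL: the
function names are hypothetical, the index lives in memory, and it
records only which documents contain a given element path.  The hard
part, query evaluation and optimisation, is not shown at all.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def build_path_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each root-to-element path (e.g. /book/chapter/title)
    to the ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)

    def walk(elem: ET.Element, prefix: str, doc_id: int) -> None:
        path = f"{prefix}/{elem.tag}"
        index[path].add(doc_id)
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, text in docs.items():
        walk(ET.fromstring(text), "", doc_id)
    return index


def docs_with_path(index: Dict[str, Set[int]], path: str) -> Set[int]:
    """Documents containing an exact absolute path."""
    return set(index.get(path, set()))


def docs_with_suffix(index: Dict[str, Set[int]], suffix: str) -> Set[int]:
    """Documents containing any path that ends in the given steps,
    roughly a 'descendant anywhere' lookup."""
    tail = "/" + suffix.strip("/")
    result: Set[int] = set()
    for path, ids in index.items():
        if path.endswith(tail):
            result |= ids
    return result


docs = {
    1: "<book><chapter><title>A</title></chapter></book>",
    2: "<article><title>B</title></article>",
}
path_index = build_path_index(docs)
```

The exact lookup is a single dictionary probe, which is why the index
itself is the easy part.  The suffix lookup here scans every distinct
path, which is tolerable only because the number of distinct paths is
usually far smaller than the number of documents.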

> But note that a DOM-on-the-bottom is the opposite of the
> architecture that I am speaking out against. I'm concerned about
> people who want to layer the DOM on "top" of things that do not look
> substantially like XML. In that case you are covering up an
> optimized, purpose-built abstraction with a homogenized "dumb tree"
> layer. That's a step backwards. Note that even the DOM creators do
> not view an XML-DOM as a "universal tree API." That's why there are
> several variants of the DOM -- for XML, HTML, CSS etc.

I must conclude from this that we have little to disagree about in
terms of the uses for DOM.  I had misunderstood you to mean that DOM
is _never_ appropriate for the bottom layer, and I, coming from the
document repository universe, would have disagreed.  Having said that,
however, we tend to view DOM more as a box-ticking exercise, since it
doesn't really give SIM anything it doesn't already have, albeit in a
non-standard way.

My views on Object databases are ambivalent.  Their highly expressive
nature seems unfortunately coupled with poor performance.  However, my
opinion may be skewed by the very few attempts I've personally seen at
piggy-backing a text retrieval engine on an Object database (or, for
that matter, on a relational one).

Cheers,
Marcelo Cantos

-- 
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)