Frontier as a scalable XML repository (was Re: Is XML dead already or what?)
Liam R. E. Quin
liamquin at interlog.com
Sun Jan 31 06:18:18 GMT 1999
On Sat, 30 Jan 1999, David Megginson wrote:
> I said that there is not a good, scalable XML repository available
> right now.
I have been involved in several attempts to build SGML or XML storage
systems using databases.
Performance is always a big issue.
With an rdbms you have the interesting notion that the database
understands neither sequence nor containment directly. You also
have a difficulty with text retrieval, especially if you need it
to span element <kw>boundaries</kw> and find "element boundaries"
twice in this paragraph, yet still treat the "kw" element as a
separately indexed object.
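To make the relational mismatch concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is an illustration, not any system mentioned in this thread. The "node" table, its columns and the helper names are all assumptions: containment is faked with a parent pointer, sequence with an explicit ordinal column, and even the simple "p containing kw containing a word" lookup costs two self-joins.

```python
import sqlite3

# Hypothetical schema: one row per node. The database itself knows nothing
# about trees, so containment is a parent pointer and sequence is an ordinal.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    ordinal   INTEGER NOT NULL,
    kind      TEXT NOT NULL,
    name      TEXT,
    content   TEXT
);
"""

# <p>to span element <kw>boundaries</kw> and find "element boundaries" twice</p>
ROWS = [
    (1, None, 0, "element", "p", None),
    (2, 1, 0, "text", None, "to span element "),
    (3, 1, 1, "element", "kw", None),
    (4, 3, 0, "text", None, "boundaries"),
    (5, 1, 2, "text", None, ' and find "element boundaries" twice'),
]

# Direct-child containment only; arbitrary depth would need a recursive query.
QUERY = """
SELECT p.id
FROM node AS p
JOIN node AS kw ON kw.parent_id = p.id
JOIN node AS t  ON t.parent_id = kw.id
WHERE p.kind = 'element' AND p.name = 'p'
  AND kw.kind = 'element' AND kw.name = 'kw'
  AND t.kind = 'text' AND t.content = ?
"""


def build_db():
    """Create an in-memory database holding the sample paragraph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def find_p_with_kw(conn, word):
    """Return ids of <p> nodes with a <kw> child whose text equals word."""
    return [row[0] for row in conn.execute(QUERY, (word,))]


def children_in_order(conn, parent_id):
    """Sequence is only recoverable by sorting on the ordinal column."""
    sql = "SELECT id FROM node WHERE parent_id = ? ORDER BY ordinal"
    return [row[0] for row in conn.execute(sql, (parent_id,))]
```

Each extra level of nesting in the query adds another join, and the phrase "element boundaries" that straddles the kw element is not visible to any single row's content column.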
On the other hand, the relational database vendors have worked very
hard at performance, and you get lots of benefits built in, such as
journalling, rollback, backup, standard texts on SQL... the whole bit.
Object oriented databases *do* understand sequence and containment,
and understand it very well. But they don't have a standard query
language. OQL is not implemented very evenly yet, and when it is, it
has restrictions.
We (at Groveware) found it difficult to represent a query such as
    find an elementNode with .name = "P"
        containing a childList
            containing an object of type elementNode
            with 'name' = "kw"
                containing a childList
                    containing an object of type cdata
                    where strcmp(content, "boundaries") == 0
both in OQL and in Object Design's non-OQL query language.
One would like to say
    find <p> containing <kw> containing "boundaries"
and have that be efficient, and that's a real challenge.
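As a rough illustration of what the short form of the query is asking for, here is a sketch using Python's standard xml.etree.ElementTree on a small in-memory document. The sample text and the function names are assumptions made up for this example. It says nothing about efficiency at repository scale, since it simply walks the whole tree, which is exactly the part that is hard to do well in a database.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the paragraph discussed earlier.
SAMPLE = (
    '<doc><p>to span element <kw>boundaries</kw> and find '
    '"element boundaries" twice</p><p>no keyword here</p></doc>'
)


def find_p_containing_kw(root, word):
    """Return every <p> that has a descendant <kw> whose text equals word."""
    hits = []
    for p in root.iter("p"):
        if any((kw.text or "").strip() == word for kw in p.iter("kw")):
            hits.append(p)
    return hits


def count_phrase(element, phrase):
    """Count a phrase in the element's text with the markup flattened away,
    so that a match may span element boundaries."""
    flat = "".join(element.itertext())
    return flat.count(phrase)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for p in find_p_containing_kw(root, "boundaries"):
        print(count_phrase(p, "element boundaries"))
```

The kw element is still addressable on its own (for separate indexing), while the phrase search sees the flattened text of the enclosing paragraph.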
The OLAP people and the text retrieval people are probably best
placed to handle large quantities of XML, if the text retrieval people
can manage to swallow the word "dynamic" and the OLAP people can
get a grip on text retrieval :-)
Speaking of which, I am working on my C/Unix text retrieval package
(lq-text) again, hoping to add some XML support soon.
But I digress.
If you haven't seen an OODB system that scaled well, it may be (I am
speculating) that the generic systems are too slow because query
optimisation is still too hard; the application-specific ones are less
visible, and probably perform very well.
Lee
--
SGML/XML consultant, Toronto, Canada -- liamquin at interlog.com --
http://www.interlog.com/~liamquin/
also Director of Development, Groveware Inc, http://www.groveware.com/~lee/