XML tools and big documents

Ingo Macherius macherius at darmstadt.gmd.de
Thu Sep 3 02:45:41 BST 1998

David Megginson <david at megginson.com> wrote at 1 Sep 98, 16:57:
> I do not need to build a tree for the whole document; instead [...] 
> dump it into my SQL database [...]. 
=> Put it into an RDBMS

> [...] it makes more sense to build the specialised object tree
> directly from the event stream rather than building a DOM tree
=> Put it into an OODBMS

"Michael Kay" <M.H.Kay at eng.icl.co.uk> wrote at Wed, 2 Sep 1998 10:31:41 +0100:
> [...] storing the Java serialization of DOM-like models on disk [...]
> takes a lot longer than reparsing original XML
=> Put it in a file and reparse

So when it gets big, use a database? Did I get this wrong, and XML 
was never meant to be a storage paradigm?

Anyway, I can confirm Michael's results.
We implemented an experimental database storage for SGML with jjc's 
SP and Informix's IUS. It generalizes something similar to David's 
second suggestion. Object-aggregation is done by marking the content 
of specified element types (e.g. <act> in a Shakespeare play) to be 
stored unparsed. When it comes to queries it is reparsed on the fly. 
Kind of automatic object generation.
Queries turned out to become slow as the granularity gets finer. 
Most navigations trigger child/sibling lookups, which trigger object 
ID table lookups. That's at least one SQL statement firing for every 
DOM navigation call. Caching helps, but doesn't really solve the 
problem. Trees in an RDBMS are no fun. Michael writes they are no fun 
in an OODB, either. IMHO the good timings of in-memory DOM implementations result 
from the fact that looking up children is a cheap operation. In 
current DB systems it's not cheap at all.
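
To make the cost argument above concrete, here is a minimal sketch in 
Python with SQLite. It is not the SP/IUS setup described above: the 
table layout and the names node, children and walk are assumptions 
invented for the illustration. It only shows that a DOM-style child 
lookup backed by a table fires one SQL statement per navigation call, 
while the in-memory equivalent is a plain dictionary access.

```python
import sqlite3

# Hypothetical schema: one row per element, linked to its parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    "id INTEGER PRIMARY KEY, parent INTEGER, pos INTEGER, name TEXT)"
)
rows = [
    (1, None, 0, "play"),
    (2, 1, 0, "act"),
    (3, 1, 1, "act"),
    (4, 2, 0, "scene"),
    (5, 2, 1, "scene"),
    (6, 3, 0, "scene"),
]
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)

statements = 0  # SQL statements fired by navigation calls


def children(node_id):
    """DOM-style child lookup: exactly one SELECT per call."""
    global statements
    statements += 1
    cur = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (node_id,)
    )
    return [r[0] for r in cur]


def walk(node_id):
    """Depth-first traversal that reaches every node via children()."""
    visited = [node_id]
    for child in children(node_id):
        visited.extend(walk(child))
    return visited


order = walk(1)

# In-memory equivalent: the child list is built once, after which
# each lookup is a dictionary access with no statement involved.
in_memory = {}
for node_id, parent, pos, name in rows:
    in_memory.setdefault(parent, []).append(node_id)

print(order, statements, in_memory[1])
```

Walking the whole tree calls children() once per node, so the number 
of statements grows with the number of elements visited. Storing 
coarser granules unparsed reduces the row count, at the price of 
reparsing them at query time.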

Is anybody aware of literature on efficient addressing in trees? 
This should help both in-memory DOMs and DBs.

A bit disillusioned,

Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
mailto:macherius at gmd.de http://www.darmstadt.gmd.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
