XML tools and big documents (was: Re: Is there a size limitation on XML file given to MSXSL as input?)

Ingo Macherius macherius at darmstadt.gmd.de
Tue Sep 1 20:41:21 BST 1998

David Megginson <david at megginson.com> wrote at 1 Sep 98, 12:55:

> Ingo Macherius writes:
>  > My overall impression is that most available tools do well with
>  > toy examples, but any input being in the MB range easily blasts
>  > them. At least that's true for what came from MS so far.
> I don't think that that's true in general.  Most of the Java-based XML
> parsers I've tried seem to be able to handle Jon Bosak's XML Old
> Testament (nearly 4MB) just fine

That's right, but as discussed on one of the XML lists a few weeks ago, 
that's "just" middleware. With few exceptions (e.g. Techno2000) parsers were 

> The problem comes if the parser tries to build a tree rather than
> simply reporting an event stream.

How many real-world applications will be happy with just the event 
stream? XSL visualization always needs two trees, the parser tree 
and the resulting Formatting Object Tree (FOT). Double impact! XML 
queries/DOM need to build transformed versions as well. Triple impact!
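
For contrast, here is a minimal sketch of what an application can do with 
the event stream alone. It uses Python's standard xml.sax module as a 
stand-in for the parsers discussed in this thread; the handler class, the 
function name and the sample document are made up for illustration. Memory 
use stays roughly constant because nothing but the running counts is kept.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Collects statistics from the event stream; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.counts = {}      # element name -> number of occurrences
        self.text_chars = 0   # total length of character data seen

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # May be called several times per text node; summing lengths is safe.
        self.text_chars += len(content)


def count_elements(xml_bytes):
    """Parse the given bytes and return (element counts, text length)."""
    handler = ElementCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.counts, handler.text_chars


# Hypothetical sample document, only for illustration.
SAMPLE = b"<book><v>In the beginning</v><v>And the earth</v></book>"
counts, chars = count_elements(SAMPLE)
```

This works for counting, filtering or extraction, but anything that has to 
look backwards or reorder content (formatting, queries) needs some retained 
structure, which is where the trees come in.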

Each processing stage seems to duplicate data over and over. A 
possible way out is a shared pool which trees may only point to. 
IBM's xml4j goes in that direction with "subtree hashes". And 
(surprise, surprise) DOM-processing with xml4j was feasible.
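
A rough sketch of the shared-pool idea, assuming immutable nodes: every 
distinct subtree is stored once, and any tree (source, transformed, FOT) 
only holds references into the pool. This is not a description of how xml4j 
implements its subtree hashes; NodePool and its make method are 
hypothetical names chosen for the example.

```python
class NodePool:
    """Stores each distinct subtree exactly once (hash-consing)."""

    def __init__(self):
        self._nodes = {}

    def make(self, name, text="", children=()):
        # A node is an immutable tuple, so structurally equal subtrees
        # hash and compare equal and can share one pooled instance.
        candidate = (name, text, tuple(children))
        return self._nodes.setdefault(candidate, candidate)

    def __len__(self):
        return len(self._nodes)


pool = NodePool()
verse_a = pool.make("v", "Amen")
verse_b = pool.make("v", "Amen")   # same content, so same pooled object

# Two different trees that point at the same pooled verse.
source_tree = pool.make("chapter", children=[verse_a, verse_b])
result_tree = pool.make("div", children=[verse_a])
```

The price is that nodes cannot be modified in place and carry no parent 
pointer, so a transformation has to create new pooled nodes along the 
changed path instead of editing existing ones.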

> Depending on the implementation,
> document trees tend to be very large.  With a naive tree
> implementation, a 10MB document might use 100MB or more of virtual
> memory for the document tree -- that'll bring most current desktop
> systems to a screeching halt.

IE5b1 needs 28 MB for the parse tree of a 0.6 MB document and the 
resulting (very simple) JScript-generated FOT. "Game Over" happens 
if I increase the source document size from 0.6 MB to 0.8 MB. Little 
change, great effect. I won't even mention the one-minute screen 
freeze during JavaScript/CSS processing. OK, my scripts are 
straightforward, but I wouldn't call them plain dumb. I hope MS does 
use a "naive" implementation in the beta ...
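
The figures above work out to roughly 47 times the source size (28 / 0.6) 
for parse tree plus FOT. To check such a ratio for another tool, one can 
measure it directly. The sketch below does this for Python's 
xml.dom.minidom using tracemalloc; it says nothing about IE5 or any other 
parser named in this thread, and the function name and synthetic document 
are assumptions made for the example.

```python
import tracemalloc
import xml.dom.minidom


def tree_overhead(xml_text):
    """Return (source size, traced tree size, ratio) for a minidom parse."""
    source_bytes = len(xml_text.encode("utf-8"))
    tracemalloc.start()
    document = xml.dom.minidom.parseString(xml_text)
    # Memory still allocated after parsing is approximately the tree itself.
    tree_bytes, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    document.unlink()  # break reference cycles so the tree can be freed
    return source_bytes, tree_bytes, tree_bytes / source_bytes


# Hypothetical synthetic input: many small elements, as in a verse-per-element text.
synthetic = "<doc>" + "<v n='1'>some verse text</v>" * 5000 + "</doc>"
source_bytes, tree_bytes, ratio = tree_overhead(synthetic)
print(f"source {source_bytes} bytes, tree about {tree_bytes} bytes, ratio {ratio:.1f}")
```

The exact ratio depends on the implementation and on the document's shape: 
many small elements cost far more per source byte than a few large text 
nodes.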

Cruel reality ... XML rules when viewed from a theoretical standpoint. 
But I was beamed from campus right into heavy-duty database research. 
I'm the XML geek, and I'm given database community tasks. Solving them 
with today's XML tools turned out harder than expected.


Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
mailto:macherius at gmd.de http://www.darmstadt.gmd.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
