XML tools and big documents (was: Re: Is there a size limitation on XML file given to MSXSL as input?)
macherius at darmstadt.gmd.de
Tue Sep 1 20:41:21 BST 1998
David Megginson <david at megginson.com> wrote at 1 Sep 98, 12:55:
> Ingo Macherius writes:
> > My overall impression is that most available tools do well with
> > toy examples, but any input in the MB range easily overwhelms
> > them. At least that's true for what has come from MS so far.
> I don't think that that's true in general. Most of the Java-based XML
> parsers I've tried seem to be able to handle Jon Bosak's XML Old
> Testament (nearly 4MB) just fine
That's right, but as discussed on one of the XML lists a few weeks ago,
that's "just" middleware. With few exceptions (e.g. Techno2000) parsers were
> The problem comes if the parser tries to build a tree rather than
> simply reporting an event stream.
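The event-stream style mentioned in the quoted text can be sketched as follows. This is only an illustration using Python's standard xml.sax module; the handler name, the scan() helper, and the sample document are made up for this example and are not taken from any parser discussed in this thread. The handler keeps a few counters and nothing else, so memory use does not grow with the number of elements reported.

```python
import io
import xml.sax


class DepthCounter(xml.sax.ContentHandler):
    """Counts elements and tracks the maximum nesting depth; builds no tree."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        # Called once per start tag; only the counters are updated.
        self.elements += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        # Called once per end tag.
        self.depth -= 1


def scan(stream):
    """Parse a file-like object and return (element count, max depth)."""
    handler = DepthCounter()
    xml.sax.parse(stream, handler)
    return handler.elements, handler.max_depth


# Hypothetical sample input: one <book> holding 1000 small <verse> records.
sample = b"<book>" + b"<verse><n>1</n><t>text</t></verse>" * 1000 + b"</book>"
elements, max_depth = scan(io.BytesIO(sample))
```

Anything that needs to look back at earlier parts of the document has to store that state itself, which is exactly the limitation raised in the reply below.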
How many real-world applications will be happy with just the event
stream? XSL visualization always needs two trees: the parse tree
and the resulting Formatting Object Tree (FOT). Double impact! XML
queries/DOM need to build a transformed version as well. Triple impact!
Each processing stage seems to duplicate the data over and over. A
possible way out is a shared pool that the trees may only point to.
IBM's xml4j goes in that direction with "subtree hashes". And
(surprise, surprise) DOM processing with xml4j was feasible.
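To make the shared-pool idea concrete, here is a minimal sketch in which structurally identical subtrees are stored once and referred to by an id. It is not a description of how xml4j implements its "subtree hashes"; the SubtreePool class, its intern() method, and the sample document are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


class SubtreePool:
    """Stores each structurally distinct subtree once (hash-consing)."""

    def __init__(self):
        self._ids = {}    # node key -> node id
        self.nodes = []   # node id -> node key

    def intern(self, elem):
        """Return the pool id for elem, adding it only if it is new."""
        # Children are interned first, so a node is described by its own
        # data plus the ids of its (already shared) children.
        child_ids = tuple(self.intern(child) for child in elem)
        key = (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),  # tail text is ignored in this sketch
            child_ids,
        )
        node_id = self._ids.get(key)
        if node_id is None:
            node_id = len(self.nodes)
            self._ids[key] = node_id
            self.nodes.append(key)
        return node_id


def count_elements(elem):
    """Number of elements in an ordinary, unshared tree."""
    return 1 + sum(count_elements(child) for child in elem)


# Hypothetical sample input with many identical records.
doc = ET.fromstring(
    "<book>" + "<verse><n>1</n><t>text</t></verse>" * 1000 + "</book>"
)
pool = SubtreePool()
root_id = pool.intern(doc)
```

The saving depends entirely on how repetitive the document is: with this deliberately repetitive sample the pool holds far fewer nodes than the tree has elements, while a document with no repeated subtrees would gain nothing.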
> Depending on the implementation,
> document trees tend to be very large. With a naive tree
> implementation, a 10MB document might use 100MB or more of virtual
> memory for the document tree -- that'll bring most current desktop
> systems to a screeching halt.
IE5b1 needs 28MB for the parse tree of a 0.6MB document and the
resulting (very simple) JScript-generated FOT. "Game Over" happens
if I increase the source document size from 0.6MB to 0.8MB. Little
change, great effect. I won't even mention the one minute screen
forward, but I wouldn't call them plain dumb. I hope MS uses a
"naive" implementation in the beta ...
Cruel reality ... XML rules when viewed from a theoretical standpoint.
But I was beamed from campus straight into heavy-duty database research.
I'm the XML geek, and I'm given database community tasks. Solving them
with today's XML tools turned out to be harder than expected.
Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
mailto:macherius at gmd.de http://www.darmstadt.gmd.de/~inim/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)