XML parsing memory overhead concerns

Paul Miller stele at fxtech.com
Thu Dec 16 23:00:00 GMT 1999

As I've written before, I've been working on a callback-based streaming
XML parser that is somewhat DOM-like, intended specifically for reading
application data from XML files whose object hierarchy is known in
advance. At first I tried layering my work over expat, to no avail:
expat uses a push model, and I needed a pull model (so that a subelement
could parse its own subtree).
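
To make the push model concrete, here is a minimal sketch using Python's `xml.parsers.expat` binding. Python is used only for brevity; the post concerns the C library. The function `collect_events` and the element names `scene`, `object`, and `point` are hypothetical. Every event goes to the same handlers, so a handler for `<object>` cannot ask the parser for its own children. Any per-subtree state has to live outside the callbacks.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Push model: expat drives the parse and calls our handlers."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    # expat invokes these callbacks itself; the caller never asks for "the next" item.
    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return events


# Hypothetical document; element names are illustrative only.
doc = '<scene><object id="a"><point x="1"/></object></scene>'
for event in collect_events(doc):
    print(event)
```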

Now that I am almost finished with the first cut and about to release
it, I was explaining the solution to a colleague, who said I should have
tried to build it on expat anyway. The only way I could have done that
while keeping the sub-element parsing model I want would have been to
have expat parse the entire document into one big in-memory buffer. One
of the advantages of my solution is that you only need a small file
buffer, since it's streaming. With the expat approach, a data file with
100,000 elements would have to be held in memory in its entirety, along
with a couple of extra megabytes of housekeeping data. My colleague
said, "So what?"
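
A pull reader that streams through a small buffer, and lets a subelement consume its own subtree, can be sketched as follows. This assumes Python's `xml.etree.ElementTree.XMLPullParser`. `PullReader`, `read_object`, `read_scene`, the `chunk_size` parameter, and the element names are all hypothetical. It illustrates the idea only and is not the design described in the post.

```python
import io
from collections import deque
from xml.etree.ElementTree import XMLPullParser


class PullReader:
    """Hypothetical pull-style reader: the caller asks for each event."""

    def __init__(self, stream, chunk_size=64):
        self._stream = stream
        self._chunk_size = chunk_size  # only this much raw input is read at a time
        self._parser = XMLPullParser(events=("start", "end"))
        self._pending = deque()
        self._done = False

    def next_event(self):
        """Return the next (event, element) pair, or None at end of input."""
        while not self._pending:
            if self._done:
                return None
            chunk = self._stream.read(self._chunk_size)
            if chunk:
                self._parser.feed(chunk)
            else:
                self._parser.close()  # signals end of input
                self._done = True
            self._pending.extend(self._parser.read_events())
        return self._pending.popleft()


def read_object(reader, start_elem):
    """Consume events up to and including this <object>'s own end tag."""
    obj = {"id": start_elem.get("id"), "points": []}
    depth = 1  # we are already inside <object>
    while depth:
        event, elem = reader.next_event()
        if event == "start":
            depth += 1
            if elem.tag == "point":
                obj["points"].append((float(elem.get("x")), float(elem.get("y"))))
        else:
            depth -= 1
    return obj


def read_scene(reader):
    """Top-level loop: hand each <object> subtree to its own reader function."""
    objects = []
    while True:
        item = reader.next_event()
        if item is None:
            return objects
        event, elem = item
        if event == "start" and elem.tag == "object":
            objects.append(read_object(reader, elem))


doc = ('<scene><object id="a"><point x="1" y="2"/><point x="3" y="4"/></object>'
       '<object id="b"/></scene>')
print(read_scene(PullReader(io.StringIO(doc), chunk_size=8)))
```

ElementTree still attaches each parsed element to its parent, so a fuller version would also release finished subtrees to keep memory flat.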

So here is my plea for feedback about memory usage. My current solution
works as designed and streams through a small buffer, but it only
supports ASCII, doesn't validate, and relies on C-style callback
functions, which can require slightly more code. If I weren't worried
about memory, I could rewrite my design on top of expat (gaining all of
its benefits, and presumably validation in the future) and provide an
optional DOM-like interface (without all the extra DOM mumbo-jumbo). It
would also be possible to combine streaming with parsing and throw away
the housekeeping data once elements are no longer needed (for example,
when we've moved on to the next subelement tree).
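
The compromise in the last paragraph streams, parses, and discards housekeeping data once a subtree has been handled. It can be sketched with Python's `xml.etree.ElementTree.iterparse`. This is a minimal sketch that assumes `<object>` elements are direct children of the root. `sum_x_per_object` and the element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def sum_x_per_object(stream):
    """Stream <object> subtrees, keeping only a per-object summary."""
    totals = {}
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    for event, elem in context:
        if event == "end" and elem.tag == "object":
            totals[elem.get("id")] = sum(float(p.get("x")) for p in elem.iter("point"))
            # This subtree is finished: detach it (and any earlier siblings) from
            # the root so its elements are no longer referenced by the tree.
            root.clear()
    return totals


doc = (b'<scene><object id="a"><point x="1"/><point x="2.5"/></object>'
       b'<object id="b"><point x="4"/></object></scene>')
print(sum_x_per_object(io.BytesIO(doc)))  # {'a': 3.5, 'b': 4.0}
```

Only the current `<object>` subtree and the small summary dictionary are kept at any moment. Previously handled subtrees are dropped as soon as their end tag has been processed.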

What do people think? Should I spare the memory and provide a simpler
(and slightly less capable) solution, or store the entire thing in
memory, use the nice stuff in expat, and offer more features?

Paul Miller - stele at fxtech.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
