XML Memory Requirements (was Re: Feeling good about SML)

Thu Nov 18 00:24:23 GMT 1999

At 05:55 PM 11/17/99 -0500, David Megginson wrote:
>Again, the big problem in creating a tiny XML processor is the
>required error reporting for illegal characters in names, attribute
>values, etc. -- if you build lookup tables, you're looking at an
>enormous amount of memory for each table.

Actually, it's not so bad as all that. The trick is this: there are
only a few interesting states, and you're in one of these 3 most of
the time:

 - processing character data
 - processing a name (e.g. element type or attribute name)
 - processing the first character of a name

For the first, you can check the small number of excluded regions with
an if statement.
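
As a rough illustration of that if statement (a sketch in Python, not
Lark's actual Java code; the function name is made up for this example),
the check below follows the Char production of XML 1.0, which excludes
most C0 controls, the surrogate block, and #xFFFE/#xFFFF:

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp is legal XML 1.0 character data.

    Hypothetical helper; mirrors the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    if cp < 0x20:
        # Only tab, line feed and carriage return are allowed below space.
        return cp in (0x9, 0xA, 0xD)
    # Excluded regions: surrogates D800-DFFF, FFFE, FFFF, and anything
    # past the end of Unicode.
    return cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF
```

No table is needed for this state: a handful of comparisons covers the
whole code space.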

For the others, you keep a (relatively) small array of number pairs,
the low and high edges of the legal ranges, and binary-search it.
In Lark, the class that does all the work is only 3.5K (the
arrays are compressed in the class file, so maybe 10 times that in memory).
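
A minimal sketch of the range-pair lookup, again in Python and not taken
from Lark. The names NAME_START_RANGES and in_ranges are invented for
this example, and the table is deliberately truncated to the Latin-1
part of the name-start characters; a real processor would carry the full
sorted list from the XML spec.

```python
# Flat array of (low, high) pairs, sorted and non-overlapping.
# ASSUMPTION: truncated excerpt for illustration, not the full XML table.
NAME_START_RANGES = [
    0x3A, 0x3A,  # ':'
    0x41, 0x5A,  # 'A'-'Z'
    0x5F, 0x5F,  # '_'
    0x61, 0x7A,  # 'a'-'z'
    0xC0, 0xD6,
    0xD8, 0xF6,
    0xF8, 0xFF,
]


def in_ranges(cp: int, ranges: list) -> bool:
    """Binary-search a flat low/high pair array for code point cp."""
    lo, hi = 0, len(ranges) // 2 - 1  # indices count pairs, not entries
    while lo <= hi:
        mid = (lo + hi) // 2
        if cp < ranges[2 * mid]:        # below this pair's low edge
            hi = mid - 1
        elif cp > ranges[2 * mid + 1]:  # above this pair's high edge
            lo = mid + 1
        else:
            return True                 # low <= cp <= high
    return False
```

For example, in_ranges(ord("a"), NAME_START_RANGES) is true, while
in_ranges(ord("1"), NAME_START_RANGES) is false, since a digit cannot
start a name. Each lookup costs about log2(number of pairs) comparisons,
and the storage is two numbers per range rather than one entry per
character.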

The performance is good enough that in Lark, the limiting factor was
my lousy input buffering, not the character class checking; and Lark
was always in the top half of the performance table. -Tim

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)