SAX drivers bug ... or feature !

Tyler Baker tyler at infinet.com
Sat Nov 21 21:54:46 GMT 1998


Toivo Lainevool wrote:

> ---david at megginson.com wrote:
> >
> > Depending on the virtual machine, this could be a killer.  Remember
> > that a medium-sized XML document (such as a book) might have 10,000
> > elements: that would mean an extra 10,000 attribute lists allocated
> > and then garbage collected in what should be only a few seconds of
> > parsing.
> >
>
> If you're worried about the performance of the parser, just setting the
> attributeList to null would be faster than doing the
> AttributeListImpl::clear() which would cause a removeAllElements() on
> each of the underlying member vectors.  If you're cranking away with the
> parser, chances are the low priority gc task wouldn't be fired while
> you're doing this, unless you hit your memory limit.
>
> If you're worried about memory space, the clear() and resulting
> removeAllAttributes() would allow you to reuse the
> AttributeListImpl and Vector objects, but the removeAllElements() just
> releases their hold on the underlying Strings within the vectors,
> meaning that the Strings, which I assume would account for most of the
> memory, would be left hanging around for the gc to free anyway.
>
> So which of these approaches would result in a more optimized parser
> would highly depend on the size of the document, the amount of memory
> you have available, and the gc algorithm your VM uses.
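
To make the two approaches in the quoted text concrete, here is a rough
sketch.  SimpleAttrs is a hypothetical stand-in for an attribute list backed
by parallel Vectors (it is not the real AttributeListImpl), and freshFor and
reuseFor are names made up for illustration.

```java
import java.util.Vector;

class AttrReuseSketch {
    /** Hypothetical stand-in for an attribute list backed by parallel Vectors. */
    static final class SimpleAttrs {
        final Vector<String> names = new Vector<>();
        final Vector<String> values = new Vector<>();

        void add(String name, String value) {
            names.addElement(name);
            values.addElement(value);
        }

        // Keeps this object and both Vectors alive for reuse, but drops
        // their references to the Strings, which the gc must still free.
        void clear() {
            names.removeAllElements();
            values.removeAllElements();
        }

        int getLength() {
            return names.size();
        }
    }

    // Approach 1: allocate a new list for every element and let the old
    // one become garbage.
    static SimpleAttrs freshFor(String name, String value) {
        SimpleAttrs attrs = new SimpleAttrs();
        attrs.add(name, value);
        return attrs;
    }

    // Approach 2: clear and refill one shared list for every element.
    static SimpleAttrs reuseFor(SimpleAttrs shared, String name, String value) {
        shared.clear();
        shared.add(name, value);
        return shared;
    }
}
```

Either way the Strings themselves are new for every element; the sketch only
changes whether the list and Vector objects are reallocated too.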

Simply put, the number one killer in XML parsing, as well as in application use
of XML data, is the creation of temporary handler objects.  In the native
interface of our parser, we have utility routines on our CharacterData interface
(as well as the AttributeList interface) for parsing raw booleans, integers, and
base64 content, mainly because the java.lang.Integer utility routines only
accept Strings )-:

For some applications which are performance sensitive, creating the temporary
String object needed to call Integer.parseInt(String s) can really bog things
down.
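
As a rough illustration of the idea (not our parser's actual routine), here is
a sketch that parses a decimal int straight out of a character buffer, the way
a SAX characters(char[], int, int) callback delivers it.  The class name
CharParse and the method signature are assumptions made up for this example.

```java
final class CharParse {
    private CharParse() {}

    /**
     * Parses a signed decimal int from buf[offset .. offset + length)
     * without allocating an intermediate String.
     */
    static int parseInt(char[] buf, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty input");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (buf[i] == '-' || buf[i] == '+') {
            negative = (buf[i] == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative number so that Integer.MIN_VALUE fits.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int result = 0;
        while (i < end) {
            int digit = buf[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a decimal digit");
            }
            if (result < limit / 10) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

A handler could then call CharParse.parseInt(ch, start, length) on the buffer
it was handed instead of building a String first.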

I feel, and many others seem to feel, that XML parsers are in the same league as
I/O libraries in terms of their need to be optimized as much as possible,
especially for server environments.  That includes making the parser itself fast
at tokenizing the content as well as making the handoff of the parsed content to
the application as fast as possible.  From the application developer's
perspective, what is important in a component like an XML parser is first that
the component is fast, and second that the code you have to write to use the
component is not slow, something many tools vendors unfortunately neglect.

Tyler





