recursion in XML parser

Fri Apr 16 06:01:07 BST 1999

On Thu, Apr 15, 1999 at 07:06:37PM +0100, Kay Michael wrote:
> > My reasoning for not using recursion was performance (function
> > call/stack framing considerations) and that it made the code
> > easier to understand.  
> 
> Did not-using-recursion make it faster? I'd be surprised. The
> superstition that recursion is slow dates back to COBOL and IBM 360
> days, i.e. to machine architectures with very inefficient memory
> architectures and subroutine calls. I don't know much about the Java
> VM, but I doubt it shares those characteristics.

I can't speak for the JVM, but it is far from safe to generalise and
state that a function call is as fast as a stack push, particularly
when the programmer knows exactly what needs to be pushed.

Moreover, modern architectures often penalise you heavily for deep
recursion.  For instance, the SPARC architecture uses register
windowing.  The register window allows access to three register groups
(something like in, local, out).  Passing parameters to a function in
this scheme typically involves setting the out registers as required
and moving the register window down by two groups.  Consequently, when
the subroutine is called, the calling function's out registers become
the called function's in registers.  This technique is blindingly
fast, until you run off the end of the register stack.  When that
happens, a register trap is invoked which saves two complete register
groups out to main memory to free up more space.

In short, when you recurse too deeply and too frequently, you will be
hitting main memory in a big way and performance will suffer
enormously.

In shorter, recursion is not necessarily fast, even today.

> "Easier to understand" is obviously in the eye of the beholder, but
> in my view recursive algorithms are usually far easier to understand
> than their non-recursive equivalents.

I don't think anyone disagrees here, but sometimes performance is far
more important than clarity.

> > It would be interesting to do some benchmarks on various parsers
> > out there to measure performance.  The Java parsers I've tested
> > (Sun, IBM) are _dog_ slow compared to expat, etc.
> 
> How slow is a dog (greyhounds are quite fast)? What kind of factor
> are you talking about? A lot depends on your Java VM implementation.
> My experience is that most of the mill is used in my application,
> not in the parser.

But with Java you suffer (vs. C/C++) across the board, not just in the
parser.

> > For server-side I don't think that matters, since in the corporate
> > scene people tend to just add more servers/infrastructure and not
> > worry about performance.
> 
> Not true when you're serving a million pages a day! Except that in
> that scenario we cache the rendered pages so we don't keep
> re-rendering the same thing. But a factor of 2 in performance is
> definitely worth investing in.

It is more true than ever when you are serving a million pages a day.
Take AltaVista.  Last I heard they had six huge servers dishing out
HTML.  It is often far less expensive to add hardware than to
rearchitect software.

> > Client-side XML is a completely different kettle o' fish tho'
> > since you can't just keep popping in processors every time your
> > machine at home/work bogs down.
> 
> On the contrary: you've got a rather strange client configuration if
> it takes longer to render a page on your machine than to download
> it.

Got some perf. specs on a LAN?  I can easily imagine a Java XML parser
on the client slowing things down noticeably.

Cheers,
Marcelo

-- 
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)