Benchmark of 6 XML parsers on Linux

Sun May 9 21:51:25 BST 1999

(Not sent to the Perl list.)

* Dave Winer
|
| Interesting article! Now I'm curious to know why Perl, Python and
| Java are so much slower than the C parsers? 

For one thing, the Python application seems to spend somewhere around
40% of its time counting UTF-8 characters. If the character-counting
code were replaced by a C Unicode implementation (either that of
Fredrik Lundh or the one by Martin von Löwis in the XML-SIG package),
Python would show much better performance. (I'd send in a version that
did this if I had the time to spare.)
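Here is a sketch of the difference. It is written in modern Python 3 and is not the benchmark's actual code, which isn't shown here. The first function counts UTF-8 characters byte by byte inside the interpreter. It relies on the fact that every UTF-8 byte except a continuation byte (bit pattern 10xxxxxx) starts a new character. The second function hands the work to Python's C-implemented UTF-8 codec. Both function names are hypothetical.

```python
import timeit


def count_utf8_chars_py(data: bytes) -> int:
    # Pure-Python loop: every byte passes through the interpreter.
    # A byte starts a new character unless it is a continuation byte (10xxxxxx).
    count = 0
    for b in data:
        if b & 0xC0 != 0x80:
            count += 1
    return count


def count_utf8_chars_c(data: bytes) -> int:
    # Delegate to the C-implemented UTF-8 decoder and take the length.
    # Unlike the loop above, this also validates the input and raises
    # UnicodeDecodeError on malformed UTF-8.
    return len(data.decode("utf-8"))


if __name__ == "__main__":
    sample = ("Martin von Löwis " * 10000).encode("utf-8")
    print("pure Python:", timeit.timeit(lambda: count_utf8_chars_py(sample), number=10))
    print("C codec:    ", timeit.timeit(lambda: count_utf8_chars_c(sample), number=10))
```

The exact timings depend on the machine, but the codec version avoids one trip through the interpreter loop per byte.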

Another thing is that although Perl and Python both use a C parser (in
this benchmark), calling from C into the interpreters is slow, and you
have to do that once for each element as well as once for each and
every piece of text.
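The sketch below shows how many interpreter callbacks this means. It uses the expat binding in today's Python standard library (`xml.parsers.expat`). The benchmark's own Python and Perl bindings may differ in detail. With expat, every start tag, end tag and run of character data crosses from C into Python once. The helper name `count_callbacks` is hypothetical.

```python
import xml.parsers.expat


def count_callbacks(xml_bytes: bytes) -> dict:
    # Tally how often the C parser calls back into Python handlers.
    counts = {"start": 0, "end": 0, "text": 0}

    def start(name, attrs):
        counts["start"] += 1

    def end(name):
        counts["end"] += 1

    def text(data):
        counts["text"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_bytes, True)  # True: this is the whole document
    return counts


if __name__ == "__main__":
    print(count_callbacks(b"<doc><a>x</a><b>y</b></doc>"))
```

Setting the parser's `buffer_text` attribute to `True` makes expat merge adjacent character data into fewer calls. That is one way to reduce the number of crossings.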

Since the example application also does a fair amount of work, most of
its running time is probably spent in the application code rather than
in the parser.

| FWIW, the parser built into Frontier is fully native. No script code
| executed when parsing XML.

Does this mean that Frontier doesn't have a callback mode? How do you
deal with huge documents, then?
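This sketch shows roughly what callback mode buys you with a huge document. It again uses Python's `xml.parsers.expat` and feeds the parser fixed-size chunks from a stream. Only one chunk is in memory at a time, no matter how large the document is. The function name and the 64 KB default chunk size are arbitrary choices made for illustration.

```python
import io
import xml.parsers.expat


def count_elements_streaming(stream, chunk_size=64 * 1024):
    # Count elements without ever holding the whole document in memory.
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # False: more data may follow
    parser.Parse(b"", True)  # signal the end of the document
    return count


if __name__ == "__main__":
    doc = b"<root>" + b"<item/>" * 10000 + b"</root>"
    print(count_elements_streaming(io.BytesIO(doc), chunk_size=100))  # prints 10001
```

Expat copes with tags that are split across chunk boundaries, so the chunk size only affects memory use and call overhead. It does not affect correctness.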

--Lars M.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)