lex, yacc, and xml

David Megginson ak117 at freenet.carleton.ca
Mon Dec 22 22:10:15 GMT 1997

Ward Harold writes:

 > <question name="why hand code parsers" class="potentially stupid">
 > Why is it that all of the XML parsers/processors I've seen appear to be
 > hand coded rather than generated via lex/yacc or flex/bison? I seem to
 > recall seeing something to the effect that yacc/bison can't handle the
 > class of grammar that XML falls into. Then again I'm not a compiler
 > constructor, opted for the AI sequence in graduate school, so I may be
 > imagining things. Even if there is a technical reason for eschewing
 > parser generation surely the basic lexing and scanning could be done
 > with lex/flex, no?
 > </question>

This is actually a very good question, but I will second most of Tim's
comments.  With Ælfred, I set out to produce a Java-based XML parser
under 20K (I missed by about 6K, but I'm still working on it).  A
hand-crafted recursive-descent parser seemed like the only reasonable
choice, and it turned out to be very fast as well.

In fact, it is not much harder to write a recursive-descent parser
than it is to write out EBNF productions, at least not once you get
into a rhythm and write a few helper methods for lexical scanning
(like "readName()").
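
To make the comparison concrete, here is a minimal sketch of that style.
It is not code from Ælfred: the class name, the two-production grammar,
the simplified name rule, and the event trace are all assumptions chosen
for illustration.  The only name taken from the text above is
readName().  Each production becomes one method, and the lexical helper
does the scanning that lex would otherwise do.

```java
// Hypothetical sketch, not Ælfred's code. Handles only a tiny XML-like subset:
//   element ::= '<' Name '>' content '</' Name '>'
//   content ::= (element | text)*
class TinyParser {
    private final String in;
    private int pos = 0;
    private final StringBuilder events = new StringBuilder();

    TinyParser(String input) { this.in = input; }

    /** Parses one element and returns a space-separated trace of events. */
    static String parse(String input) {
        TinyParser p = new TinyParser(input);
        p.parseElement();
        if (p.pos != p.in.length()) throw p.error("trailing input");
        return p.events.toString().trim();
    }

    // element ::= '<' Name '>' content '</' Name '>'
    private void parseElement() {
        require('<');
        String name = readName();
        require('>');
        events.append("start:").append(name).append(' ');
        parseContent();
        require('<');
        require('/');
        String end = readName();
        if (!end.equals(name)) throw error("mismatched end tag " + end);
        require('>');
        events.append("end:").append(name).append(' ');
    }

    // content ::= (element | text)*
    private void parseContent() {
        while (pos < in.length()) {
            if (in.charAt(pos) == '<') {
                if (in.startsWith("</", pos)) return; // end tag belongs to the caller
                parseElement();
            } else {
                int start = pos;
                while (pos < in.length() && in.charAt(pos) != '<') pos++;
                events.append("text:").append(in, start, pos).append(' ');
            }
        }
    }

    // Lexical helper (simplified rule, an assumption for this sketch):
    //   Name ::= (Letter | '_') (Letter | Digit | '_' | '-' | '.')*
    private String readName() {
        int start = pos;
        if (pos < in.length()
                && (Character.isLetter(in.charAt(pos)) || in.charAt(pos) == '_')) {
            pos++;
            while (pos < in.length() && isNameChar(in.charAt(pos))) pos++;
        }
        if (start == pos) throw error("name expected");
        return in.substring(start, pos);
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    // Consumes the expected character or reports a syntax error.
    private void require(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }
}
```

Calling TinyParser.parse("<a>hi<b>x</b></a>") returns the trace
"start:a text:hi start:b text:x end:b end:a", and a mismatched end tag
raises an IllegalArgumentException.  A real XML parser also needs
attributes, entities, comments, and the rest, but each of those would
be one more method of the same shape.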

All the best,


David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
