lex, yacc, and xml

Peter Murray-Rust peter at ursus.demon.co.uk
Tue Dec 23 00:21:19 GMT 1997

At 17:07 22/12/97 -0500, David Megginson wrote:
>Ward Harold writes:
> > <question name="why hand code parsers" class="potentially stupid">
> > Why is it that all of the XML parsers/processors I've seen appear to be
> > hand coded rather than generated via lex/yacc or flex/bison? I seem to

I think there are also historical reasons :-). Most (but not all) of the
XML parsers have been written in Java and the lex/yacc functionality wasn't
as fully available in Java.  (As Tim says, Norbert has used JACC - which is
the "right" way to do it, but it does generate a large amount of Java
code/classes. This is fairly impenetrable, so in the absence of an agreed
API it isn't easy to integrate into other applications if you want access
to things not in the API.)

Another reason was that some early constructs in the languages (especially
involving Parameter Entities) were difficult for some humans and machines
to interpret :-). The current approach to PEs is considerably simpler and
(I assume) can be lexed and yacc'ed OK.

In my view the difficulty in writing a parser is not the BNF or recursive
descent (even I have partially written one of those) but agreeing on what
the semantics are in various cases :-)

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
