Roll-Your-Own Parsers (was: Re: What Clean Specs Achieve)

roddey at us.ibm.com
Fri Feb 12 19:01:11 GMT 1999




>>There will always be a tradeoff between code size, performance and
>>conformance to the spec.  We have taken the same approach: for XML which
>>might go outside our environment or come in from outside, we use a
>>heavyweight parser with full validation.  But where it's "behind the
>>covers" we use a homegrown (tiny, nonconformant) parser and just check the
>>structures a few times during design, with a validating parser.
>
>If we could work with parser layers rather than parsers, this might become
>a lot easier to manage.  We could just turn on the parts we need and turn
>off the ones we don't.

We have taken that approach with our "Version 2" parsers, Java and C++.
They are pretty well layered and pluggable. Don't plug in a validation
handler and you won't do any validation work. Don't plug in an entity
handler, and you won't get any entity information, etc... Basically we've
just extended the concept of a SAX-like handler all the way into the core
of the parser. It allows both for extensibility by rolling your own
handler, and for the client who is putting together a particular type of
parser configuration to tell the lowest level of the parser "do the least
work possible for this group of things, since I'm not even interested".
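
To make the idea concrete, here is a rough Java sketch of handlers that
reach into the core scanning loop. It is not our actual parser code: the
names (TinyParser, ContentHandler, EntityHandler, run) are made up for
illustration, and the scanner only understands <tag>, </tag> and text. The
point is just that an unplugged handler means the core skips that work
entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class LayeredParserSketch {

    /** Hypothetical content layer: receives element and text events. */
    interface ContentHandler {
        void startElement(String name);
        void endElement(String name);
        void characters(String text);
    }

    /** Hypothetical entity layer: receives entity references found in text. */
    interface EntityHandler {
        void entityReference(String name);
    }

    /** Toy scanner: only <tag>, </tag> and text; no attributes, no validation. */
    static class TinyParser {
        private ContentHandler content;  // null means "not interested"
        private EntityHandler entities;  // null means "not interested"

        void setContentHandler(ContentHandler h) { content = h; }
        void setEntityHandler(EntityHandler h) { entities = h; }

        void parse(String xml) {
            int i = 0;
            while (i < xml.length()) {
                if (xml.charAt(i) == '<') {
                    int close = xml.indexOf('>', i);
                    if (close < 0) throw new IllegalArgumentException("unterminated tag");
                    String tag = xml.substring(i + 1, close);
                    if (content != null) {
                        if (tag.startsWith("/")) content.endElement(tag.substring(1));
                        else content.startElement(tag);
                    }
                    i = close + 1;
                } else {
                    int next = xml.indexOf('<', i);
                    if (next < 0) next = xml.length();
                    String text = xml.substring(i, next);
                    // Entity scanning happens only when an entity handler is plugged in.
                    if (entities != null) scanEntities(text);
                    if (content != null) content.characters(text);
                    i = next;
                }
            }
        }

        private void scanEntities(String text) {
            int amp = text.indexOf('&');
            while (amp >= 0) {
                int semi = text.indexOf(';', amp);
                if (semi < 0) break;
                entities.entityReference(text.substring(amp + 1, semi));
                amp = text.indexOf('&', semi);
            }
        }
    }

    /** Builds a parser configuration and returns the events it reported. */
    static List<String> run(String xml, boolean withEntityLayer) {
        final List<String> events = new ArrayList<>();
        TinyParser parser = new TinyParser();
        parser.setContentHandler(new ContentHandler() {
            public void startElement(String name) { events.add("start:" + name); }
            public void endElement(String name) { events.add("end:" + name); }
            public void characters(String text) { events.add("chars:" + text); }
        });
        if (withEntityLayer) {
            parser.setEntityHandler(name -> events.add("entity:" + name));
        }
        parser.parse(xml);
        return events;
    }

    public static void main(String[] args) {
        String doc = "<a>x &amp; y</a>";
        System.out.println(run(doc, false)); // content layer only
        System.out.println(run(doc, true));  // content + entity layers
    }
}
```

With only the content handler plugged in, the sketch reports start:a, the
raw text, and end:a, and never scans for entities. Plug in the entity
handler and an extra entity:amp event shows up ahead of the text. A
validation layer could be hung off the core the same way.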

Though, relative to the original conversation, despite allowing for better
scalability and optimization in the field according to need, it does
increase the complexity of the parser itself in some ways.

BTW, the C++ version should hopefully hit Alphaworks before too much
longer. We are on our 4th or 5th internal release, and the next few weeks
will be the 'making the last details work' part of the effort. I can't say
when it will get out there, since I dunno about such things (I'm just the
measly author :-), but it should be relatively soon. In terms of the
external interfaces to the client code, it pretty much matches our Version 2
Java parser architecture, though internally it is quite different.



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



