nestable C/C++ XML parser?

Thomas B. Passin tpassin at idsonline.com
Wed Dec 8 13:24:23 GMT 1999


----- Original Message -----
From: Paul Miller <stele at fxtech.com>

> I'm trying to develop a tag-based front-end to expat and having no luck.
> I'd like to be able to parse an XML document in nestable chunks, by
> calling into a nestable parser. In other words, I'd like to start
> parsing, then branch to a function to handle a specific element, parsing
> in there until that element is closed, then fall back out of the
> function to continue parsing the rest of the document.
>
I take it that you want to be able to ignore part of the document and only
process the pieces you are interested in.  Is that right?  Then each piece
would be well-formed XML if it were enclosed in a root element.  You don't
need to literally do what you have suggested, that is, "parse in there...".
You do need to handle the elements of different pieces differently.  Three
approaches come to mind.

1) Preprocess to extract just the pieces you want, wrap them in root
elements so they are complete documents, then run expat (or whatever)
separately on them using SAX. The preprocessing step should be fast and easy,
and perhaps could be done using regular expressions, or SAX.  Alternatively,
if the XML is relatively simple, don't wrap the fragments, and process them
using regular expressions instead. (Search the archives of this list for
the last few months to find a reference to "shallow parsing using regular
expressions").

2) You really are talking about a state machine, I think.  That is, once you
have reached the right piece of the document, you switch to a different way
of handling the elements (they will still parse the same, it's just the
handling that would be different).  So you could explicitly maintain a state
variable and have the SAX (or whatever) callbacks behave differently
according to the state.  This would be conceptually simple but might be a
pain to implement, depending on how many different element handlers you will
use.

3) Again as a state machine, you could use function pointers to specify the
callbacks, and when you change state you change the function pointers to
point to different handlers.  I don't know whether you would have to modify
expat to do this or not, but changes should be minor if needed.
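
To make approaches 2 and 3 a bit more concrete, here is a rough sketch in C.
It does not link against expat; the handlers only mirror the parameter lists
of expat's element handlers (assuming XML_Char is plain char), and with expat
they would presumably be registered through XML_SetElementHandler and
XML_SetUserData.  The element names "order" and "item" and every type and
function name below are invented for illustration.  The first half keeps an
explicit state variable; the second half swaps function pointers inside a
small dispatcher struct, so the parser only ever sees two fixed trampolines
and should not need modifying.

```c
#include <string.h>

/* ---- Approach 2: one pair of handlers plus an explicit state variable ---- */

typedef enum { OUTSIDE, INSIDE_ORDER } State;   /* hypothetical states */

typedef struct {
    State state;
    int depth;        /* nesting depth inside the element of interest */
    int items_seen;   /* example result collected while INSIDE_ORDER */
} Context;

/* Same parameter list as an expat start-element handler (XML_Char == char). */
void on_start(void *user_data, const char *name, const char **atts)
{
    Context *ctx = user_data;
    (void)atts;
    switch (ctx->state) {
    case OUTSIDE:
        if (strcmp(name, "order") == 0) {
            ctx->state = INSIDE_ORDER;
            ctx->depth = 1;
        }
        break;
    case INSIDE_ORDER:
        ctx->depth++;
        if (strcmp(name, "item") == 0)
            ctx->items_seen++;
        break;
    }
}

void on_end(void *user_data, const char *name)
{
    Context *ctx = user_data;
    (void)name;
    /* Closing the element that opened the region returns us to OUTSIDE. */
    if (ctx->state == INSIDE_ORDER && --ctx->depth == 0)
        ctx->state = OUTSIDE;
}

/* ---- Approach 3: swap handler pointers when the state changes ---- */

typedef struct Dispatcher Dispatcher;
struct Dispatcher {
    void (*start)(Dispatcher *d, const char *name);
    void (*end)(Dispatcher *d, const char *name);
    int depth;
    int items_seen;
};

void order_start(Dispatcher *d, const char *name);
void order_end(Dispatcher *d, const char *name);

void default_start(Dispatcher *d, const char *name)
{
    if (strcmp(name, "order") == 0) {   /* enter the region: switch handlers */
        d->depth = 1;
        d->start = order_start;
        d->end = order_end;
    }
}

void default_end(Dispatcher *d, const char *name)
{
    (void)d; (void)name;                /* nothing to do outside the region */
}

void order_start(Dispatcher *d, const char *name)
{
    d->depth++;
    if (strcmp(name, "item") == 0)
        d->items_seen++;
}

void order_end(Dispatcher *d, const char *name)
{
    (void)name;
    if (--d->depth == 0) {              /* region closed: restore defaults */
        d->start = default_start;
        d->end = default_end;
    }
}

/* Trampolines: the only functions the parser itself would ever see. */
void dispatch_start(void *user_data, const char *name, const char **atts)
{
    Dispatcher *d = user_data;
    (void)atts;
    d->start(d, name);
}

void dispatch_end(void *user_data, const char *name)
{
    Dispatcher *d = user_data;
    d->end(d, name);
}
```

In both halves the depth counter is what tells the end handler that the
element which opened the region has closed, rather than some nested child.
With many different element types, the second form tends to stay more
readable, since each region gets its own small pair of functions instead of
another case in one large switch.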

Regards,

Tom Passin



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




