RFC: "even simpler" C++ XML parser for object hierarchies
Stephen R. Savitzky
steve at rsv.ricoh.com
Wed Dec 8 01:31:27 GMT 1999
This is basically a traditional top-down, recursive-descent parser.
Unfortunately, it's completely different from the way most XML parsers I've
seen work, although I believe there's a lexical layer underneath expat that
can be made to work this way.
But there's a better way of looking at the situation, namely that what you
really want to do is make a top-down traversal of the document's parse tree.
In other words, at any given position in the tree, you want to run something
like the following pseudocode:
    // process a <foo> element.
    void Foo::process(const XML::Element &elem) {
        // do the setup
        for (XML::Node *node = elem.getFirstChild();
             node != nullptr;
             node = node->getNextSibling())
        {
            processChild(node);  // dispatch on node's type & tag
        }
        // do the cleanup
    }
This works as-is if the result of your parse is a DOM tree or some
equivalent parse-tree representation of the document, but trees take memory.
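
For concreteness, here is a minimal sketch of that traversal over an
in-memory tree. The Node struct, appendChild(), and demoTraversal() are
hypothetical stand-ins invented for this example, not the API of any real DOM
library; only getFirstChild() and getNextSibling() come from the pseudocode
above. The "setup" and "cleanup" steps just append markers to a string so the
visiting order is visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; not the API of any real library.
struct Node {
    std::string tag;                              // element name
    std::vector<std::unique_ptr<Node>> children;  // owned child nodes
    Node* nextSibling = nullptr;                  // non-owning sibling link

    explicit Node(std::string t) : tag(std::move(t)) {}

    // Appends a new child and links it to the previous last child.
    Node* appendChild(std::string t) {
        children.push_back(std::make_unique<Node>(std::move(t)));
        Node* child = children.back().get();
        if (children.size() > 1)
            children[children.size() - 2]->nextSibling = child;
        return child;
    }
    Node* getFirstChild() const {
        return children.empty() ? nullptr : children.front().get();
    }
    Node* getNextSibling() const { return nextSibling; }
};

// Top-down traversal: setup, visit each child in order, cleanup.
void process(const Node& elem, std::string& out) {
    out += "<" + elem.tag + ">";              // "setup" for this element
    for (Node* node = elem.getFirstChild();
         node != nullptr;
         node = node->getNextSibling()) {
        process(*node, out);  // a real handler would dispatch on type & tag
    }
    out += "</" + elem.tag + ">";             // "cleanup" for this element
}

// Builds the tree for <foo><bar/><baz><qux/></baz></foo> and walks it.
std::string demoTraversal() {
    Node root("foo");
    root.appendChild("bar");
    root.appendChild("baz")->appendChild("qux");
    assert(root.getFirstChild() != nullptr);
    std::string out;
    process(root, out);
    return out;
}
```

Every node here stays allocated until the root goes out of scope, which is
the memory cost referred to above.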
So the next step is to use a parser that looks like a tree traverser:
    // process a <foo> element.
    void Foo::process(TreeTraverser &it) {
        // do the setup, using it.getAttrList(), etc. on the current node
        if (it.hasChildren()) {
            for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
                processChild(it);  // dispatch on new current node's type & tag
            }
            it.toParent();  // go back up the tree
        }
        // do the cleanup
    }
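
One way such a traverser might sit directly on top of a tokenizer is sketched
below. This is an illustration under heavy assumptions, not the PIA
implementation: it handles only a toy subset of XML (start tags, end tags,
and empty-element tags, with no attributes, text, or whitespace), it holds
the input in a string rather than reading a stream, and tag(), process(), and
demoStreaming() are names made up for the example. The only state it keeps is
the current position and a stack of ancestor tag names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical streaming traverser over a tags-only XML subset.
class TreeTraverser {
public:
    explicit TreeTraverser(std::string input) : in_(std::move(input)) {
        readStartTag();  // position on the root element
    }
    const std::string& tag() const { return tag_; }
    bool atEnd() const { return atEnd_; }
    // True if the current element has children that are still unread.
    bool hasChildren() const { return !atEnd_ && !empty_ && !nextIsEndTag(); }
    void toFirstChild() {
        assert(hasChildren());
        parents_.push_back(tag_);
        readStartTag();
    }
    void toNextSibling() {
        assert(!atEnd_);
        skipRestOfCurrent();
        if (nextIsEndTag()) atEnd_ = true;
        else readStartTag();
    }
    void toParent() {
        assert(!parents_.empty());
        while (!atEnd_) toNextSibling();  // discard unvisited siblings
        tag_ = parents_.back();
        parents_.pop_back();
        empty_ = false;
        atEnd_ = false;
    }

private:
    enum class Kind { Open, Close, Empty };
    struct Token { Kind kind; std::string name; };

    bool nextIsEndTag() const {
        return pos_ >= in_.size() || in_.compare(pos_, 2, "</") == 0;
    }
    Token readToken() {
        std::string::size_type end = in_.find('>', pos_);
        if (pos_ >= in_.size() || in_[pos_] != '<' || end == std::string::npos)
            throw std::runtime_error("malformed input");
        std::string body = in_.substr(pos_ + 1, end - pos_ - 1);
        pos_ = end + 1;
        if (!body.empty() && body.front() == '/')
            return {Kind::Close, body.substr(1)};
        if (!body.empty() && body.back() == '/') {
            body.pop_back();
            return {Kind::Empty, body};
        }
        return {Kind::Open, body};
    }
    void readStartTag() {
        Token t = readToken();
        if (t.kind == Kind::Close) throw std::runtime_error("unexpected end tag");
        tag_ = t.name;
        empty_ = (t.kind == Kind::Empty);
        atEnd_ = false;
    }
    // Consumes everything up to and including the current element's end tag.
    void skipRestOfCurrent() {
        if (empty_) return;
        for (int depth = 1; depth > 0;) {
            Token t = readToken();
            if (t.kind == Kind::Open) ++depth;
            else if (t.kind == Kind::Close) --depth;
        }
    }

    std::string in_;
    std::string::size_type pos_ = 0;
    std::string tag_;
    bool empty_ = false;
    bool atEnd_ = false;
    std::vector<std::string> parents_;  // ancestor tags only, never a tree
};

// Same shape as the pseudocode above; "setup"/"cleanup" just log the tag.
void process(TreeTraverser& it, std::string& out) {
    out += "<" + it.tag() + ">";
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling())
            process(it, out);
        it.toParent();
    }
    out += "</" + it.tag() + ">";
}

std::string demoStreaming() {
    TreeTraverser it("<foo><bar/><baz><qux/></baz></foo>");
    std::string out;
    process(it, out);
    return out;
}
```

A handler that is not interested in a subtree can simply call
toNextSibling() without descending; the sketch then skips the unread tokens
instead of materializing nodes for them.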
Note that if your parser has this interface, you may never have to actually
build the whole tree. Similarly, you can output to a ``tree constructor''
that merely appends characters to a string.
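
The output side can be sketched the same way. The TreeConstructor interface
and its method names below are assumptions made up for this example (the
post does not spell out the real interface); the point is only that one
implementation of such an interface could build nodes while another, like
this one, just appends characters to a string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical output interface; the method names are illustrative only.
class TreeConstructor {
public:
    virtual ~TreeConstructor() = default;
    virtual void startElement(const std::string& tag) = 0;
    virtual void text(const std::string& chars) = 0;
    virtual void endElement() = 0;
};

// A "tree constructor" that never builds a tree: it serializes as it goes.
class StringConstructor : public TreeConstructor {
public:
    void startElement(const std::string& tag) override {
        out_ += "<" + tag + ">";
        open_.push_back(tag);  // remember the tag for the matching end tag
    }
    void text(const std::string& chars) override {
        for (char c : chars) {  // escape the markup-significant characters
            switch (c) {
                case '<': out_ += "&lt;"; break;
                case '>': out_ += "&gt;"; break;
                case '&': out_ += "&amp;"; break;
                default:  out_ += c;
            }
        }
    }
    void endElement() override {
        assert(!open_.empty());
        out_ += "</" + open_.back() + ">";
        open_.pop_back();
    }
    const std::string& str() const { return out_; }

private:
    std::string out_;                 // the serialized document so far
    std::vector<std::string> open_;   // stack of currently open tags
};

std::string demoOutput() {
    StringConstructor out;
    out.startElement("foo");
    out.startElement("bar");
    out.text("a < b");
    out.endElement();
    out.endElement();
    return out.str();
}
```

Because a processing routine would only see the abstract interface, it could
write to a real tree or to a string without changing.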
We've built a document-processing system (currently in Java) using this kind
of interface; you can find it at <http://RiSource.org/PIA/>.
--
Stephen R. Savitzky <steve at rsv.ricoh.com> <http://rsv.ricoh.com/~steve/>
Platform for Information Applications: <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
voice: 650.496.5710 front desk: 650.496.5700 fax: 650.854.8740
home: <steve at theStarport.org> URL: http://theStarport.org/people/steve/