RFC: "even simpler" C++ XML parser for object hierarchies

Stephen R. Savitzky steve at rsv.ricoh.com
Wed Dec 8 01:31:27 GMT 1999


This is basically a traditional top-down, recursive-descent parser.
Unfortunately, it's completely different from the way most XML parsers I've
seen work, although I believe there's a lexical layer underneath expat that
can be made to work this way.

But there's a better way of looking at the situation, namely that what you
really want to do is make a top-down traversal of the document's parse tree.
In other words, at any given position in the tree, you want to run something
like this pseudocode:

// process a <foo> element.
void Foo::process(const XML::Element &elem) {
   // do the setup
   for (XML::Node *node = elem.getFirstChild();
        node != nullptr;
        node = node->getNextSibling())
   {
        processChild(node); // dispatch on node's type & tag
   }
   // do the cleanup
}
 
This works as-is if the result of your parse is a DOM tree or some
equivalent parse-tree representation of the document, but trees take memory.
So the next step is to use a parser that looks like a tree traverser:

// process a <foo> element.
void Foo::process(TreeTraverser &it) {
   // do the setup, using it.getAttrList(), etc. on the current node
   if (it.hasChildren()) {
       for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
           processChild(it); // dispatch on new current node's type & tag
       }
       it.toParent();  // go back up the tree
   }
   // do the cleanup
}
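
To make the cursor operations concrete, here is a rough, self-contained
sketch of a TreeTraverser. It is only an illustration: the Node struct, the
getTag() accessor and the outline() helper are made up for this example, and
the cursor walks an in-memory tree purely so the loop can be exercised. A real
implementation would sit on top of the parser's input instead. It assumes a
C++17 compiler (Node holds a std::vector of itself).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory node, used only to drive the example.
struct Node {
    std::string tag;
    std::vector<Node> children;
};

// Hypothetical cursor offering the operations used by process() above.
// The position is stored as a path of child indices from the root.
class TreeTraverser {
public:
    explicit TreeTraverser(const Node &root) : root_(&root) {}

    // Valid only while the cursor is on a node, i.e. !atEnd().
    const std::string &getTag() const { return nodeAt(path_.size())->tag; }
    bool hasChildren() const { return !nodeAt(path_.size())->children.empty(); }

    // True once toNextSibling() has stepped past the parent's last child.
    bool atEnd() const {
        return !path_.empty() &&
               path_.back() >= nodeAt(path_.size() - 1)->children.size();
    }

    void toFirstChild() { path_.push_back(0); }
    void toNextSibling() {
        assert(!path_.empty());  // the root has no siblings
        ++path_.back();
    }
    void toParent() {
        assert(!path_.empty());  // cannot go above the root
        path_.pop_back();
    }

private:
    // Follow the first `depth` indices of the path down from the root.
    const Node *nodeAt(std::size_t depth) const {
        const Node *n = root_;
        for (std::size_t i = 0; i < depth; ++i) n = &n->children[path_[i]];
        return n;
    }

    const Node *root_;
    std::vector<std::size_t> path_;
};

// Same shape as Foo::process(TreeTraverser &): setup, loop over children,
// go back up. Here the "setup" just records the tag, indented by depth.
void process(TreeTraverser &it, std::size_t depth, std::string &out) {
    out.append(depth * 2, ' ');
    out += it.getTag();
    out += '\n';
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
            process(it, depth + 1, out);
        }
        it.toParent();  // go back up the tree
    }
}

// Hypothetical helper: returns one line per element, in document order.
std::string outline(const Node &root) {
    TreeTraverser it(root);
    std::string out;
    process(it, 0, out);
    return out;
}
```

Nothing in process() touches Node directly, so the same function would work
unchanged with a cursor that pulls events from a parser as it goes.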
 
Note that if your parser has this interface, you may never have to actually
build the whole tree.  Similarly, you can output to a ``tree constructor''
that merely appends characters to a string.
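
As a rough sketch of the output side, the following shows what a string-only
"tree constructor" might look like. The Output interface and its method names
(startElement, text, endElement) are assumptions made for this example, not
the interface of any particular library, and attributes are left out to keep
it short.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical output interface: the calls a tree builder would receive.
class Output {
public:
    virtual ~Output() = default;
    virtual void startElement(const std::string &tag) = 0;
    virtual void text(const std::string &chars) = 0;
    virtual void endElement() = 0;
};

// A "tree constructor" that never builds a tree: it appends markup
// to a string and remembers only the stack of open tags.
class StringOutput : public Output {
public:
    void startElement(const std::string &tag) override {
        result_ += '<';
        result_ += tag;
        result_ += '>';
        open_.push_back(tag);
    }

    void text(const std::string &chars) override {
        // Escape the characters that are special in character data.
        for (char c : chars) {
            switch (c) {
                case '<': result_ += "&lt;";  break;
                case '>': result_ += "&gt;";  break;
                case '&': result_ += "&amp;"; break;
                default:  result_ += c;       break;
            }
        }
    }

    void endElement() override {
        assert(!open_.empty());  // must match an earlier startElement()
        result_ += "</";
        result_ += open_.back();
        result_ += '>';
        open_.pop_back();
    }

    const std::string &str() const { return result_; }

private:
    std::string result_;
    std::vector<std::string> open_;
};

// Hypothetical driver standing in for a processor that writes its result.
std::string renderSample() {
    StringOutput out;
    out.startElement("foo");
    out.startElement("bar");
    out.text("a < b");
    out.endElement();
    out.endElement();
    return out.str();
}
```

A processor written against Output need not know whether it is building nodes
or text; swapping in a DOM-building implementation would not change its code.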

We've built a document-processing system (currently in Java) using this kind
of interface; you can find it at  <http://RiSource.org/PIA/>.

-- 
Stephen R. Savitzky  <steve at rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve at theStarport.org> URL: http://theStarport.org/people/steve/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1