RFC: "even simpler" C++ XML parser for object hierarchies

Paul Miller stele at fxtech.com
Wed Dec 8 00:37:46 GMT 1999

Thanks to all who have given feedback on my desire for a relatively
atypical parsing idiom for XML. Some of my interest is based on a
proprietary parser I wrote a few years ago, which I've used for
everything since. It's tag-based and object-oriented, and each block of
a document can be parsed as a complete unit. When used to parse
object-oriented data, it lets each object easily handle its own parsing.

Now I'd like to apply the same concepts to an XML parser, to be used
primarily when object-oriented program data is stored as XML.

I believe the best way to describe what I want to do (and why) is to
show a concrete example. Suppose I have a program that generates images
composed of layers with multiple objects in each layer. Each layer has a
size associated with it as well.

The classes I have are:
	Document (contains one or more layers)
	Layer (contains one or more objects and a Size)
	Object (some type of object)
	Size (an object which represents a width and height)
	Point (x,y value)
	Rect (x1,y1 to x2,y2)
	Circle (type/subclass of Object)
	Square (type/subclass of Object)

Ideally, each object would be able to write out its data in XML form
and parse its own data (along with a list of attributes, if it uses any).

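On the writing side, the same "each object handles itself" idea might look like the minimal sketch below. It assumes a hypothetical Write(std::ostream &) member and an EscapeAttr() helper, neither of which is part of the proposed XML:: API. Size emits the same "WxH" character data that Size::sParse reads back.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Escape the characters that may not appear verbatim inside a
// double-quoted XML attribute value.
std::string EscapeAttr(const std::string &s)
{
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '"': out += "&quot;"; break;
        default:  out += c; break;
        }
    }
    return out;
}

// Hypothetical Write() methods: each class emits only its own element and
// delegates its children to their own Write(), mirroring the Parse() side.
struct Size {
    int width = 0;
    int height = 0;

    // Same "WxH" character-data form that Size::sParse reads.
    void Write(std::ostream &out) const
    {
        out << "<Size>" << width << 'x' << height << "</Size>";
    }
};

struct Layer {
    std::string name;
    Size size;

    void Write(std::ostream &out) const
    {
        out << "<Layer name=\"" << EscapeAttr(name) << "\">";
        size.Write(out);   // the child writes itself
        out << "</Layer>";
    }
};
```

A container writes its own start tag, lets each child write itself, and then closes the element, so adding a new class never touches a central writer.
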
Here is an example XML file:

<Document name="mydocument">
	<Layer name="background">
		<Size>640x480</Size>
		<Object type="circle">
			...
		</Object>
		<Object type="square">
			...
		</Object>
	</Layer>
</Document>

If you think about the object hierarchy associated with this document,
you have something like this:

	Document ("mydocument")
		contains Layer ("background")
			contains Size (640x480)
			contains Circle (Object)
				contains Point (320,240)
				contains float (25)
			contains Square (Object)
				contains Rect (10,10 - 40,40)

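Expressed as C++ types, that hierarchy could be modeled roughly as follows. This is a sketch only: the member names (center, radius, bounds) and the TypeName() method are assumptions, since the example says just that a Circle holds a Point and a float, and owning children through std::unique_ptr is one choice among several.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical data classes for the hierarchy above. Each container owns
// its children through unique_ptr, so destroying a Document destroys
// everything beneath it.
struct Point { int x = 0, y = 0; };
struct Rect  { int x1 = 0, y1 = 0, x2 = 0, y2 = 0; };
struct Size  { int width = 0, height = 0; };

struct Object {
    virtual ~Object() = default;
    virtual std::string TypeName() const = 0;  // matches the type="" attribute
};

struct Circle : Object {
    Point center;
    float radius = 0.0f;   // the "float (25)" in the hierarchy, assumed to be a radius
    std::string TypeName() const override { return "circle"; }
};

struct Square : Object {
    Rect bounds;
    std::string TypeName() const override { return "square"; }
};

struct Layer {
    std::string name;
    Size size;
    std::vector<std::unique_ptr<Object>> objects;
};

struct Document {
    std::string name;
    std::vector<std::unique_ptr<Layer>> layers;
};
```
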
I tend to design APIs from the programmer's point of view. As the
number of classes in my application grows, I want to minimize the
amount of extra code I have to write, so I'd like to reduce parsing to
the minimum necessary boilerplate. So let's assume that each object has
its own Parse() method. This method is called with an XML::Element
object that holds the name and attributes for that object. Parsing an
entire object should be an atomic operation.

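One way to get that atomic behavior is to parse into a temporary and commit only at the end. In this sketch, Element is a hypothetical stand-in for XML::Element, GetAttribute() is assumed to throw when an attribute is missing, and the "opacity" attribute is invented purely to give Parse() a second step that can fail.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for XML::Element: just a name and its attributes.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;

    // Assumption: a missing attribute is an error, reported by exception.
    std::string GetAttribute(const std::string &key) const
    {
        auto it = attributes.find(key);
        if (it == attributes.end())
            throw std::runtime_error("missing attribute: " + key);
        return it->second;
    }
};

struct Layer {
    std::string name;
    int opacity = 100;   // hypothetical second field, to show atomicity

    // Parse into a temporary and commit only after every step succeeded,
    // so a throw part-way through leaves *this unchanged.
    void Parse(const Element &elem)
    {
        Layer tmp;
        tmp.name = elem.GetAttribute("name");
        // std::stoi throws std::invalid_argument on non-numeric text
        tmp.opacity = std::stoi(elem.GetAttribute("opacity"));
        *this = std::move(tmp);
    }
};
```
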
I use pointers to static functions as callbacks to avoid having to
subclass from any XML-specific classes. User data is passed along during
parsing, so it can be cast back to the necessary type in the element
handlers. The code is presented in C++, but the parsing operations could
easily have a "C" interface. Exceptions are thrown if anything goes
wrong, so there are no error codes.

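The callback scheme can be sketched without any XML library at all. Everything here is hypothetical: Element is an in-memory stand-in, ParseChildren() plays the role of elem.Parse(), and children with no matching handler are silently skipped, since the proposal does not say what should happen to them. The optional third ElementHandler argument overrides the caller's user data, which is how Layer::Parse routes the Size child straight to &mSize.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory element: a name, character data, and children.
struct Element {
    std::string name;
    std::string text;
    std::vector<Element> children;
};

// A handler pairs an element name with a plain function pointer, so no
// class has to inherit from anything XML-specific.
typedef void (*HandlerFunc)(const Element &elem, void *userData);

struct ElementHandler {
    const char *name;
    HandlerFunc func;
    void *userData;   // if non-null, overrides the caller's userData

    ElementHandler(const char *n, HandlerFunc f, void *ud = nullptr)
        : name(n), func(f), userData(ud) {}
};

// Calls the first matching handler for each child element; children with
// no matching handler are skipped (an assumption).
void ParseChildren(const Element &parent, const ElementHandler *handlers,
                   std::size_t count, void *userData)
{
    for (const Element &child : parent.children) {
        for (std::size_t i = 0; i < count; ++i) {
            if (child.name == handlers[i].name) {
                void *ud = handlers[i].userData ? handlers[i].userData : userData;
                handlers[i].func(child, ud);
                break;
            }
        }
    }
}
```

Captureless lambdas convert to plain function pointers, so they can stand in for static members such as sParseLayer when exercising a dispatcher like this.
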
Here is the code needed to open the XML file and find the top-level XML
Document element:

Document *App::LoadDocument(const char *path)
{
	// specify a handler to look for "Document" elements
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Document", sParseDocument)
	};
	XML::Input file(path);
	file.Parse(handlers, this);
	// ... return the Document that sParseDocument added
}

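file.Parse(handlers, this) passes the handler array with no count. One way a C++ interface could support that, while still offering the explicit-count form a "C" interface would need, is a template that deduces the array length. Dispatch() and this simplified ElementHandler (whose callback receives only the element name) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef void (*HandlerFunc)(const char *elementName, void *userData);

struct ElementHandler {
    const char *name;
    HandlerFunc func;
};

// Runtime worker with an explicit count, the form a "C" interface would expose.
// Returns true if a handler matched.
bool Dispatch(const char *elementName, const ElementHandler *handlers,
              std::size_t count, void *userData)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(elementName, handlers[i].name) == 0) {
            handlers[i].func(elementName, userData);
            return true;
        }
    }
    return false;
}

// C++ convenience overload: deduces N from the array type, so callers can
// write Dispatch(name, handlers, userData) with no count and no sentinel.
template <std::size_t N>
bool Dispatch(const char *elementName, const ElementHandler (&handlers)[N],
              void *userData)
{
    return Dispatch(elementName, handlers, N, userData);
}
```

The deduction only works while the caller still holds a real array; once it decays to a pointer, the explicit-count overload is the only option.
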
From here on out, each object is responsible for parsing itself, based
on an XML::Element object that is passed to it. Please examine the code
closely to see the intended design and flow.

// when a Document element is found, it is passed to the sParseDocument
// handler
void App::sParseDocument(const XML::Element &elem, void *userData)
{
	// userData is the App * from the file.Parse() call above
	App *app = (App *)userData;
	// we found a document element, so make one using the attributes
	Document *doc = new Document(elem.GetAttribute("name"));
	// now parse the document
	doc->Parse(elem);
	// if we get here without a thrown exception, the Document parsed
	// okay and we can add it (AddDocument is a hypothetical name)
	app->AddDocument(doc);
}

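With a raw pointer from new Document(...), nothing deletes the half-built object if parsing throws, and the same applies to the Layer and Object handlers. A minimal sketch of one fix, assuming C++14: hold the new object in a std::unique_ptr while it parses and transfer ownership only after Parse() returns. The Parse(bool) stand-in and LoadOne() are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Document {
    std::string name;
    explicit Document(std::string n) : name(std::move(n)) {}

    // Hypothetical stand-in for Document::Parse; throws to simulate a
    // malformed element.
    void Parse(bool valid)
    {
        if (!valid)
            throw std::runtime_error("bad document: " + name);
    }
};

struct App {
    std::vector<std::unique_ptr<Document>> documents;

    // The Document is owned by a unique_ptr while it parses, so if Parse
    // throws it is destroyed automatically and the App is left unchanged.
    void LoadOne(const std::string &name, bool valid)
    {
        auto doc = std::make_unique<Document>(name);
        doc->Parse(valid);
        documents.push_back(std::move(doc));   // only reached on success
    }
};
```
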
void Document::Parse(const XML::Element &elem)
{
	// specify handlers to look for "Layer" elements
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Layer", sParseLayer)
	};
	elem.Parse(handlers, this);
	// if we needed to do something special, like validating the
	// document, we could do it right here
}

void Document::sParseLayer(const XML::Element &elem, void *userData)
{
	// again, userData is the Document * passed in elem.Parse() above
	Document *doc = (Document *)userData;
	// make a new layer
	Layer *layer = new Layer(elem.GetAttribute("name"));
	// parse the layer, then add it (AddLayer is a hypothetical name)
	layer->Parse(elem);
	doc->AddLayer(layer);
}

void Layer::Parse(const XML::Element &elem)
{
	// specify handlers to look for "Size" and "Object" elements
	// note that for the Size element we call the Size class's static
	// sParse function directly, passing the address of our contained
	// Size member as its user data, so we do not need an additional
	// static handler that forwards to a Size member Parse() method
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Size", Size::sParse, &mSize),
		XML::ElementHandler("Object", sParseObject)
	};
	elem.Parse(handlers, this);
}

void Size::sParse(const XML::Element &elem, void *userData)
{
	Size *size = (Size *)userData;
	// size has no attributes, just data, so read it directly
	// note that elem.ReadData() reads character data up to the
	// ending element tag and returns the number of characters read
	char tmp[40];
	// leave room for the terminating '\0'
	size_t len = elem.ReadData(tmp, sizeof(tmp) - 1);
	tmp[len] = '\0';
	// std::runtime_error is an assumed exception type
	if (sscanf(tmp, "%dx%d", &size->width, &size->height) != 2)
		throw std::runtime_error("invalid Size data");
}

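The "%dx%d" scan still accepts input such as "640x480px" or "640x480x3", because sscanf stops at the first mismatch and only reports how many conversions succeeded. A stricter variant, sketched here as a hypothetical free function ParseSizeText(), uses %n to confirm the whole string was consumed and throws std::runtime_error (an assumed exception type) otherwise. Leading and trailing whitespace is still allowed, since character data often includes it.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse "WxH" character data (e.g. "640x480") into
// width and height, throwing instead of returning an error code.
// The outputs are only written on success.
void ParseSizeText(const std::string &text, int &width, int &height)
{
    int w = 0, h = 0, consumed = 0;
    // %n stores how many characters were read so far, so trailing junk
    // such as "640x480px" can be detected; it does not count toward the
    // return value of sscanf.
    if (std::sscanf(text.c_str(), " %dx%d %n", &w, &h, &consumed) != 2
        || consumed != static_cast<int>(text.size())
        || w < 0 || h < 0)
        throw std::runtime_error("invalid Size data: \"" + text + "\"");
    width = w;
    height = h;
}
```
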
void Layer::sParseObject(const XML::Element &elem, void *userData)
{
	// again, userData is the Layer * passed in elem.Parse() above
	Layer *layer = (Layer *)userData;
	// make a new object from the object type
	std::string type = elem.GetAttribute("type");
	// I would normally use a factory here, but this illustrates the
	// point better
	Object *obj = NULL;
	if (type == "circle")
		obj = new Circle();
	else if (type == "square")
		obj = new Square();
	else
		throw std::runtime_error("unknown Object type: " + type);

	// now let the object (whatever type it is) parse itself, then add
	// it (AddObject is a hypothetical name)
	obj->Parse(elem);
	layer->AddObject(obj);
}

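Since a factory is mentioned as the usual approach, here is a minimal sketch of one: a registry mapping the type attribute to a creation function. ObjectFactory, Register(), Create() and MakeObject<T>() are hypothetical names, and an unknown type throws, in keeping with the no-error-codes design.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Object {
    virtual ~Object() = default;
    virtual std::string TypeName() const = 0;
};
struct Circle : Object { std::string TypeName() const override { return "circle"; } };
struct Square : Object { std::string TypeName() const override { return "square"; } };

// Hypothetical factory: maps the type="" attribute to a creation function,
// so adding a new Object subclass means one Register() call rather than
// another else-if in Layer::sParseObject.
class ObjectFactory {
public:
    typedef std::unique_ptr<Object> (*Creator)();

    void Register(const std::string &type, Creator create)
    {
        creators_[type] = create;
    }

    std::unique_ptr<Object> Create(const std::string &type) const
    {
        auto it = creators_.find(type);
        if (it == creators_.end())
            throw std::runtime_error("unknown Object type: " + type);
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// One creation function per subclass, generated from a template.
template <class T>
std::unique_ptr<Object> MakeObject() { return std::make_unique<T>(); }
```

With this in place, sParseObject could reduce to creating the object via factory.Create(elem.GetAttribute("type")) and then calling Parse(elem) on the result.
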
So I hope this gets the idea across. I'd be interested in feedback.

Paul Miller - stele at fxtech.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1