Simple approaches to XML implementation

Peter S. Housel housel at
Thu Mar 6 04:34:54 GMT 1997

Gavin Nicol (gtn at wrote:
>I would tend toward an event-driven interface, and an
>option-setting interface as the core parser API. For example:
>class XMLEventHandler {
>public boolean OnComment(String comment);
>public boolean OnElementStart(...)
>class XMLParser {
>parser(XMLEventHandler handler);

That's one way of doing things.  The main problem I see with this interface
is that there are quite a few possible methods (I count 71 classdefs in
the SGML property set, though of course not all of those are applicable to
XML), and it becomes difficult to expand the set of events.
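
One way to soften the expansion problem, offered only as a sketch and not
as anything Gavin proposed: give every event method a do-nothing default,
so that adding an event later does not break existing handlers.  The method
names, the parameter lists, and the meaning of the boolean result (true =
keep parsing) are assumptions made for illustration.

```java
/** Hypothetical handler: every event has a default, so new events can be added safely. */
interface XMLEventHandler {
    // Assumption: returning true means "continue parsing".
    default boolean onComment(String comment) { return true; }
    default boolean onElementStart(String name) { return true; }
    default boolean onElementEnd(String name) { return true; }
}

/** An application overrides only the events it cares about. */
class ElementCounter implements XMLEventHandler {
    int elements = 0;

    @Override
    public boolean onElementStart(String name) {
        elements++;
        return true;
    }
}
```

This does nothing about the sheer number of methods; it only keeps old
handlers compiling when the set grows.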

There's also the issue of "who's in charge?"  This is actually a tough
issue.  I like the way P.J. Plauger put it in Programming on Purpose:
when you're designing the program's architecture, first you draw a graph
of nodes, with arrows showing the flow of information from subsystem to
subsystem.  Then you grab a node and shake the graph.  What you get is
your call graph, with the main processing loop located in the node you
shook, making requests to the other subsystems.

As much as possible, a good reusable component should not force the
user's hand when choosing what node to grab onto.  As an example,
YACC is pretty bad about this.  You supply it with a lexer (with a
fixed name) and a set of handlers to be called when productions are
reduced.  The YACC-generated parser insists on being in charge.

If all of today's popular languages had coroutines, we wouldn't have
this problem.  Every component could be written as if it were in
charge.  Unfortunately, most languages don't have a portable coroutine
facility.

For an XML document parsing system, the components we need to
consider are:

1. An external entity manager, responsible for obtaining document
   instances (the "start" document and others), DTD's, etc. from
   local storage, the web, some database, etc.  This should probably
   be user-customizable.

2. An encoding manager, responsible for mapping one of the possible
   XML document encodings (Latin-n, UTF-7, UTF-8, UCS-2, UTF-16, whatever)
   onto ISO10646 characters.

3. The parser itself, responsible for turning characters into XML events,
   and possibly into grove structures.

4. The user's application.
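
As a rough sketch of how components 1 and 2 might look as separate,
replaceable pieces (the names EntityManager, EncodingManager, fetch and
decode are hypothetical, not part of any proposed API), the encoding
manager below leans on the standard java.nio.charset classes to map bytes
in a named encoding onto ISO 10646 code points:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/** Component 1 (hypothetical name): obtains the raw bytes of an external entity. */
interface EntityManager {
    byte[] fetch(String systemId);
}

/** Component 2 (hypothetical name): maps encoded bytes onto ISO 10646 code points. */
final class EncodingManager {
    static int[] decode(byte[] bytes, String encodingName) {
        // Charset.forName throws if the encoding is unknown to this JVM.
        Charset charset = Charset.forName(encodingName);
        return charset.decode(ByteBuffer.wrap(bytes)).toString().codePoints().toArray();
    }
}
```

With this, the two bytes 0xC3 0xA9 decoded as UTF-8 and the single byte
0xE9 decoded as ISO-8859-1 both come out as the one code point U+00E9, so
the parser proper never has to know which encoding the entity arrived in.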

As far as I can see, we have the following scenarios:

* [Browser] If you're building a web browser, you want the network
  to be in charge.  That is, when a packet's worth of document/DTD/whatever
  data comes in from the net, the parser should use that to parse as much
  of the document as it can, and pass as many events on to the application as
  possible.  This gives optimal user response, provided you don't need the
  whole document to start displaying it.  The external entity manager would
  have a callback for requesting additional external entities, that would
  add the request to an internal queue and return immediately to the parser.

  In this architecture, the user would create a parser object by specifying
  an external entity manager callback, a set of parser options (grove plan,
  validate or not, etc.), and an XMLEventHandler like the one shown above.
  Then your external entity manager would send a message to the parser
  giving it a buffer full of bytes and an indication of which entity they
  belong to.

* [YACC] You may want the parser to be in charge, like YACC.  In this case
  you would call the parser, specifying the external entity manager object
  (written using the Strategy pattern), list of options, and an
  XMLEventHandler object (which corresponds to the Builder pattern).

* [XMLEventStream] You want some part or another of your application to be
  in charge, and you want a stream of XMLEvent objects.  In this case, you
  create a parser object (XMLEventStream), specifying an external entity
  object, a start document, and a list of options.  You send a message to
  the object whenever you want another event from the stream.

* [Grove] You want to access nodes in a grove.  So, you pass in your
  entity manager, your start document, and your options, and you get a root
  node back.  The parser might construct the whole grove, or do it lazily
  when you ask for a property that hasn't been computed yet.
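
To make the difference between the [Browser] and [XMLEventStream] styles
concrete, here is a toy sketch.  It is not an XML parser: it only splits
input into start-tag, end-tag and text events, events are plain strings,
and trailing text with no following tag is never flushed.  The class names
PushTokenizer and EventStream are made up for the example.  The first class
lets the data source be in charge; the second wraps it so the application
can pull one event at a time.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;
import java.util.function.Consumer;

/** [Browser] style: whoever has data calls feed(); the tokenizer never blocks. */
class PushTokenizer {
    private final Consumer<String> handler;   // stand-in for an XMLEventHandler
    private final StringBuilder pending = new StringBuilder();

    PushTokenizer(Consumer<String> handler) { this.handler = handler; }

    /** Accept whatever has arrived and emit every event that is now complete. */
    void feed(String chunk) {
        pending.append(chunk);
        while (pending.length() > 0) {
            if (pending.charAt(0) == '<') {
                int close = pending.indexOf(">");
                if (close < 0) return;        // tag incomplete: wait for more data
                String tag = pending.substring(1, close);
                handler.accept(tag.startsWith("/") ? "end:" + tag.substring(1)
                                                   : "start:" + tag);
                pending.delete(0, close + 1);
            } else {
                int open = pending.indexOf("<");
                if (open < 0) return;         // text may continue in the next chunk
                handler.accept("text:" + pending.substring(0, open));
                pending.delete(0, open);
            }
        }
    }
}

/** [XMLEventStream] style: the application asks for the next event when it wants one. */
class EventStream {
    private final Queue<String> ready = new ArrayDeque<>();
    private final PushTokenizer tokenizer = new PushTokenizer(ready::add);
    private final Iterator<String> chunks;    // stand-in for the entity manager

    EventStream(Iterator<String> chunks) { this.chunks = chunks; }

    /** Returns the next event, or null once the input is exhausted. */
    String next() {
        while (ready.isEmpty() && chunks.hasNext()) {
            tokenizer.feed(chunks.next());
        }
        return ready.poll();
    }
}
```

Feeding the push tokenizer the chunks "<a>he", "llo</" and "a>" delivers
start:a, text:hello and end:a to the handler, even though no chunk boundary
lines up with an event boundary.  The pull wrapper needs a queue because a
single chunk can complete more than one event.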

These scenarios assume that the document(s) are stored in ordinary files or
on the web.  As Peter Newcombe pointed out, another scenario is when the
document is stored in a database, possibly in grove form.  In this case
being able to specify an entity manager probably isn't desirable, and the
[Browser] scenario probably doesn't fit at all.

So, which of these scenarios do we want to specify for an XML API?  Should
all of them be?  Should [Browser] be one of the ones included?

[Browser] gives the most complicated parser, since it has to asynchronously
handle information from several different documents.

[YACC] is the easiest to write, but it's less flexible.  Given [Browser],
it's easy to write [YACC].  (Given [XMLEventStream] you can also derive
[YACC], but with greater overhead.)
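
Both derivations amount to a small loop that takes charge.  The sketch
below assumes a hypothetical PushParser interface for the [Browser]-style
parser and uses a plain Iterator of strings to stand in both for the
entity data and for the event stream; none of these names come from a real
API.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical push interface: a [Browser]-style parser accepts data as it arrives. */
interface PushParser {
    void feed(String chunk);
}

final class ParserInCharge {
    /** [YACC] from [Browser]: own the loop and feed the push parser until input runs out. */
    static void runToCompletion(PushParser parser, Iterator<String> chunks) {
        while (chunks.hasNext()) {
            parser.feed(chunks.next());
        }
    }

    /** [YACC] from [XMLEventStream]: pull each event and hand it to the handler. */
    static void drain(Iterator<String> events, Consumer<String> handler) {
        while (events.hasNext()) {
            handler.accept(events.next());
        }
    }
}
```

In the first adapter the parser's own handler callbacks fire from inside
feed(), so nothing extra is allocated.  In the second, every event has to
exist as an object before it can be handed over, which is presumably where
the greater overhead comes from.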

[XMLEventStream] and [Grove] give you the most flexibility with respect to
the grove plan.

Hope this helps to clarify the issues a little.  I've been thinking about
this for a while, in the context of reusable parser components for
languages, but the only firm conclusion I've come to is that I really wish
I could use coroutines.

