relation between DOM, SAX, and Schema?

Mark D. Anderson mda at discerning.com
Thu Nov 26 03:31:02 GMT 1998


Thanks for the clarification, David.

>For streaming information
>(say, from a real-time news feed or an e-commerce server), event-based 
>APIs are probably the only workable choice.

Speaking of which, a month or so ago I was hacking some Perl to parse
up a file of XML which in some cases was still being appended to.
I was using XML::Parser, which really seemed to want complete XML
to be parsed (even though it is event based), so I had to slurp in the
file, append a "</root>" to the string, and then parse it.
I still feel unclean.
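For comparison, here is a minimal sketch of feeding a still-growing document to an incremental parser. Python and its stdlib `xml.sax` stand in for Perl and XML::Parser; that substitution is mine. Events fire for whatever markup has arrived so far, and no closing `</root>` has to be faked as long as `close()` is never called. The `EventRecorder` class and the chunk contents are hypothetical.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records start/end events as they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventRecorder()
parser = xml.sax.make_parser()  # expat-backed IncrementalParser with feed()
parser.setContentHandler(handler)

# Simulate a file that is still being appended to: each (hypothetical) chunk
# is what a tail-style reader might have picked up so far.
for chunk in ['<root><item id="1"/>', '<item id="2">text</item>']:
    parser.feed(chunk)
    print(handler.events)

# parser.close() is deliberately not called: </root> has not arrived yet.
```

Calling `parser.close()` at this point would send a fatal error to the default error handler, which raises `SAXParseException`, because the root element is still open. A tail-following reader would keep calling `feed()` as the file grows and close only once the writer is done.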

>SAX could also handle this with an additional metadata event interface 
>without disrupting existing implementations, but the library approach
>seems more flexible.

Yes, in retrospect I can't actually think of a good reason for them
to be joined up. It must have been some after-effect of trying to understand
SGML groves (http://www.prescod.net/groves/shorttut/). I'm still not
sure whether those give complete access to all DTD info or not; somehow
my brain was not cut out to understand SGML.

I also found a useful message from Ken MacLeod (ken at bitsko.slc.ut.us) to
the perl-xml list, attached here without permission.

-mda

From: Ken Macleod
Subject: Re: Subclassing XML::Parser
Date: Monday, September 07, 1998 4:07 PM
To: perl-xml at lyris.activestate.com

Speaking of styles (from an earlier message) and subclassing (the current
thread), and the fact that XML::Parser 2.x ``breaks'' the style that I was
using in XML::Grove, I thought it'd be useful to try to characterize the
styles of interfaces so we have a common ground to talk about.

NOTE: some of these can be derived from the others, and mixed styles can
be used; I'm not suggesting that any one is ``best''.

In all of these cases, ``caller'' refers to the procedure that
initiates the parsing and receives the events.  These interface
characteristics can generally be applied to both parsers and object
tree traversals.

Generally speaking, we have these types of interface behaviors:

* callback function(s)
  A callback interface is one where the caller passes a function or a
set of functions as arguments, and the parser ``calls back'' the
function(s) for each event in the parse.

* callback object or package
  A callback object or package interface is one where the caller
passes an object or a package name and the parser ``calls back'' the
methods on the object or methods in the package for each event in the
parse.

* subclassing the parser
  This is very similar to a callback object interface, except that the
methods called are those of the parser subclass rather than those of a
receiver object or package.

* event generator
  An event generator is a function or an object that the caller
repeatedly asks for the next event in the parse, used like this:

   $parser = Parser->new;
   while ($event = $parser->next) { [...] }
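To make the four behaviours concrete, here is a sketch that borrows Python's standard library as a stand-in. None of these modules are discussed in the message. `xml.parsers.expat` shows callback functions (expat is the same C library XML::Parser wraps). `xml.sax` shows a callback object, `html.parser` shows the subclassing style, and `xml.etree.ElementTree.iterparse` shows an event generator. All function and class names are hypothetical. Each style should report the same start/end events for the sample document.

```python
import html.parser
import io
import xml.etree.ElementTree as ET
import xml.parsers.expat
import xml.sax

DOC = b"<doc><p>hello</p></doc>"


# 1. Callback functions: the caller hands the parser one function per event type.
def callback_functions(data):
    seen = []
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = lambda name, attrs: seen.append(("start", name))
    p.EndElementHandler = lambda name: seen.append(("end", name))
    p.Parse(data, True)
    return seen


# 2. Callback object: the caller hands over an object whose methods get called.
class Receiver(xml.sax.ContentHandler):  # hypothetical receiver class
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(("start", name))

    def endElement(self, name):
        self.seen.append(("end", name))


def callback_object(data):
    receiver = Receiver()
    xml.sax.parseString(data, receiver)
    return receiver.seen


# 3. Subclassing the parser: the parser's own overridden methods get the events.
#    html.parser is an HTML parser, used only because it is the stdlib's
#    example of this style (it lowercases tag names).
class SubclassedParser(html.parser.HTMLParser):  # hypothetical subclass
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(("start", tag))

    def handle_endtag(self, tag):
        self.seen.append(("end", tag))


def subclassing(data):
    p = SubclassedParser()
    p.feed(data.decode("utf-8"))
    p.close()
    return p.seen


# 4. Event generator: the caller pulls the next event in a loop.
def event_generator(data):
    return [(event, elem.tag)
            for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end"))]


for style in (callback_functions, callback_object, subclassing, event_generator):
    print(style.__name__, style(DOC))
```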


With those types of interface behaviors, there are two styles of
arguments:

* an object
  An object event encapsulates the whole event as a single object;
the object may be of a specific class, or a generic object like a hash
or dictionary.

* a parameter list
  A parameter list event passes a list of arguments appropriate for
the type of event, with the type of the event as one of the
arguments.  For an object- or package-style callback, the type of the
event is implicit in the method called.
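Since one argument style can be derived from the other, here is a small adapter sketch in Python. A SAX handler receives parameter-list callbacks, where the event type is implied by the method called, and repackages each one as a single object event with an explicit type. `Event` and `ObjectEvents` are hypothetical names. `xml.sax` stands in for whichever parser is actually in use.

```python
from dataclasses import dataclass, field
import xml.sax


@dataclass
class Event:
    """Hypothetical object-style event: the whole event in one value."""
    type: str
    name: str
    attrs: dict = field(default_factory=dict)


class ObjectEvents(xml.sax.ContentHandler):
    """Hypothetical adapter: receives parameter-list callbacks and passes
    each one on as a single Event object with the type made explicit."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink  # any callable taking one Event

    def startElement(self, name, attrs):
        self.sink(Event("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.sink(Event("end", name))


events = []
xml.sax.parseString(b'<doc><p id="x"/></doc>', ObjectEvents(events.append))
for e in events:
    print(e)
```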


Some common extensions are usually available from the parser directly
or through a helper module.

* tag based
  Callbacks or generated events are based on the name of the tag, rather
than on a generic element.  The event can have either a start/end
flag or there can be different event types for start tags and end
tags.

* conversion to application objects
  The events are gathered to build a complete application object, and
then that application object is passed as the event.

* architectural forms or tag mapping
  This is very similar to tag-based events, except that the tags are
first mapped by a mapping table to another tag set.
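All three extensions can be layered on top of a generic callback interface. Here is a rough Python sketch, with `xml.sax` standing in for the parser. `TagDispatcher` maps tag names through a mapping table and then calls `start_<tag>`/`end_<tag>` methods. `ArticleBuilder` gathers those events into `Article` objects and hands each one on whole. `TAG_MAP`, `Article`, and the method-naming convention are all made up for the example.

```python
from dataclasses import dataclass
import xml.sax

# Hypothetical mapping table (architectural-forms style): several source tag
# names are mapped onto the one tag set the application understands.
TAG_MAP = {"headline": "title", "hl": "title", "story": "article"}


@dataclass
class Article:
    """Hypothetical application object built from several events."""
    title: str


class TagDispatcher(xml.sax.ContentHandler):
    """Tag-based events: maps each tag through TAG_MAP (unmapped tags keep
    their own name), then calls start_<tag>/end_<tag> if such a method exists."""

    def __init__(self):
        super().__init__()
        self.text = []

    def startElement(self, name, attrs):
        handler = getattr(self, "start_" + TAG_MAP.get(name, name), None)
        if handler:
            handler(attrs)

    def endElement(self, name):
        handler = getattr(self, "end_" + TAG_MAP.get(name, name), None)
        if handler:
            handler()

    def characters(self, content):
        self.text.append(content)


class ArticleBuilder(TagDispatcher):
    """Conversion to application objects: gathers events into an Article
    and passes each finished Article on as a single event."""

    def __init__(self, on_article):
        super().__init__()
        self.on_article = on_article
        self.title = ""

    def start_title(self, attrs):
        self.text = []  # collect only the text inside the title

    def end_title(self):
        self.title = "".join(self.text)

    def end_article(self):
        self.on_article(Article(self.title))


articles = []
doc = (b"<feed><story><hl>Rates rise</hl></story>"
       b"<article><title>Markets calm</title></article></feed>")
xml.sax.parseString(doc, ArticleBuilder(articles.append))
print(articles)
```

Because the mapping is applied before dispatch, the builder only ever sees `title` and `article`, whichever tag set the feed used.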


Based on these characteristics, here's where I place several current
implementations:

* Larry Wall's XML-Parser-0.0
  A callback interface taking a package name (explicitly as an
argument or implicitly by using `caller') and passing a parameter list
as arguments with the event type implicit in the method name.
XML-Parser-0.0 also supports tag-based events (via the subs package).

* Clark Cooper's XML-Parser-2.x
  A callback interface taking multiple callback functions for event
types, passing a parameter list as arguments with the event type
implicit in which handler is called.

* David Megginson's SGMLSpm
  An event generator that returns a parse object for each event
generated by the parser.

* David Megginson, et al's SAX
  A callback interface taking an object to receive events and passing
a parameter list as arguments with the event type implicit in the
method name.  SAX is a standardized interface that has implementations
available in Java and Python.

* W3C's DOM
  DOM is an object tree.  I was going to say it supports an
event generator interface, but it appears to be missing from
the Aug 18 Proposed Recommendation.

* Ken MacLeod's XML::Grove and SGML::Grove
  XML::Grove and SGML::Grove are both object trees supporting a
callback interface taking an object to receive events and passing a
tree object with the event type implicit in the method name.  The Grove
modules also support tag-based callbacks.
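The Grove style can be approximated over a DOM tree: a callback object driven by a walk over a tree that has already been built, receiving tree objects, with the event type implicit in the method name. Here is a sketch using Python's `xml.dom.minidom` as a stand-in. This is not XML::Grove's API, and `Visitor`, `visit_element`, `visit_text`, and `accept` are hypothetical names.

```python
from xml.dom import minidom


class Visitor:
    """Hypothetical receiver: the method called names the event, and the
    argument is the tree node itself rather than a parameter list."""

    def __init__(self):
        self.log = []

    def visit_element(self, node):
        self.log.append("element " + node.tagName)

    def visit_text(self, node):
        if node.data.strip():  # skip whitespace-only text nodes
            self.log.append("text " + node.data.strip())


def accept(node, visitor):
    """Hypothetical depth-first walk that calls back the visitor per node."""
    if node.nodeType == node.ELEMENT_NODE:
        visitor.visit_element(node)
    elif node.nodeType == node.TEXT_NODE:
        visitor.visit_text(node)
    for child in node.childNodes:
        accept(child, visitor)


tree = minidom.parseString("<doc><p>hello</p><p>world</p></doc>")
v = Visitor()
accept(tree.documentElement, v)
print(v.log)
```

The same `Visitor` could just as well be driven by parser events instead of a tree walk. That is the sense in which these interface characteristics apply to both parsers and object tree traversals.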



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



