API thoughts...

Thu Mar 6 16:57:29 GMT 1997

At 2:14 PM -0800 3/5/97, Bill Smith wrote:
>> Well, the Lark event-stream API sure looks & feels like a bunch of
>> callbacks.  You make a Lark object, call its readXML() method with one
>> argument being a Handler object; Handler being a data-less class that
>> just has a bunch of methods called things like doPI() and doStartTag() and
>> doEntityReference() and doText() and so on; you'd normally subclass Handler
>> replacing the methods for the events you wanted to see, and pass in
>> that kind of object.  Lark calls these upon recognizing
>> the constructs in the input stream, passing the byte offset info, the
>> element & entity stack (*if* you're treebuilding), and other currently
>> relevant info.  These methods are all booleans; if any returns true,
>> Lark stops and returns control to whoever called readXML().

I like this boolean approach -- it lets the Handler object take back the
flow of control, pretty easily. If Lark could break PCDATA up when the
buffer stalls, you could easily implement a Browser-style application.

>Another way to do this is to have the Lark object (or interface) define the
>event methods rather than have a separate Handler object. When it's time to
>parse something, create a subclass that overrides the (standard) event methods
>for the Lark object.

I don't like this quite as well for a generic API as I can see the use of
Handler objects that don't know how to parse -- they can be glued to other
event sources to run off of DB engines -- or even broken across a network
to provide and XML event-stream mechanism...

>A possible advantage to this method is that it makes clear the inheritance
>relationship between the "standard" parser and something more specific. It
>is also "easier" to create a more specific parser from an exisiting parser
>object - simply subclass the existing parser and override the methods
>required to provide the desired new functionality.

If the methods in the standard parser don't do something you are interested
in, you still have to do override them all -- and I don't see what default
behavior would make sense other than "do nothing". It seems that you could
get the benefits of having that simply by supplying a predefined Handler
object that has null implementations for its methods.

>The Lark model "hides" the inheritance relationship in the Handler object
>making it necessary to look inside a Lark object to determine the type of
>a given parser (something you might need to do when debugging). An
>alternative is to create a new parser object that contains a subclassed event
>handler. This makes it possible to distinguish the type of parser at the
>"outer" level but requires two new objects instead of one to perform the
>subclass.

The debugging issue is certanly a bit inconvenient. If we use interfaces
rather than classes for the API (almost certainly a good idea), then we can
certainly create a Parser that implements Handler.

>I'm not a parser expert so the subclass model may not make any sense but this
>is a mechanism I have successfully used building other object-oriented
>(including GUI-based) systems. I have also used callbacks but find them most
>useful when forced to use C or other non-object-based languages.

I think that subclassing here just means that I might be forced to pull in
parser baggage (or null methods) when I want to implement a parser-free
event handler or event generator.

   -- David

_________________________________________
David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)