Simple approaches to XML implementation

Thu Mar 6 13:54:54 GMT 1997

>>class XMLParser {
>>...
>>parser(XMLEventHandler handler);
>>...
>>}
>
>That's one way of doing things.  The main problem I see with this interface
>is that there are quite a few possible methods (I count 71 classdefs in
>the SGML property set, though of course not all of those are applicable to
>XML), and it becomes difficult to expand the set of events.

I use about 8 event handlers for most of my APIs...
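
To make that concrete, a small handler interface might look something
like the sketch below. The eight method names are purely illustrative
(they are not taken from any existing parser or from the SGML property
set); the point is only that a handful of callbacks goes a long way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler with eight callbacks; the names are illustrative only.
interface SmallEventHandler {
    void startDocument();
    void endDocument();
    void startElement(String name);
    void endElement(String name);
    void characters(String text);
    void processingInstruction(String target, String data);
    void comment(String text);
    void error(String message);
}

// Records each event as a line of text, just to show the interface in use.
class LoggingHandler implements SmallEventHandler {
    final List<String> log = new ArrayList<>();

    public void startDocument() { log.add("startDocument"); }
    public void endDocument() { log.add("endDocument"); }
    public void startElement(String name) { log.add("start " + name); }
    public void endElement(String name) { log.add("end " + name); }
    public void characters(String text) { log.add("text " + text); }
    public void processingInstruction(String target, String data) {
        log.add("pi " + target + " " + data);
    }
    public void comment(String text) { log.add("comment " + text); }
    public void error(String message) { log.add("error " + message); }
}
```

A parser written against such an interface calls the methods in document
order; anything richer can be layered on top by the handler itself.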

>As much as possible, a good reusable component should not force the
>user's hand when choosing what node to grab onto.  As an example,
>YACC is pretty bad about this.  You supply it with a lexer (with a
>fixed name) and a set of handlers to be called when productions are
>reduced.  The YACC-generated parser insists on being in charge.

Sure. The important thing is that if you want to query into
a document, you have to have parsed at least as far as the nodes you
want to access, and that having a tree representation for such cases
makes it a *lot* easier. For cases where you "want to be in control",
I would have the event handler be a grove constructor, and have the
application work upon the grove. Note that accessing a grove, or
querying a document, is *different* to *parsing* a document.
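
A rough sketch of what I mean by a handler that is a grove constructor
follows. The class and method names are made up for illustration, and a
real grove has far more node classes and properties than this toy tree;
treat it as a shape, not an implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; a real grove has many more node classes and properties.
class GroveNode {
    final String name;
    final GroveNode parent;
    final List<GroveNode> children = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    GroveNode(String name, GroveNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Depth-first search for the first descendant element with the given name.
    GroveNode find(String wanted) {
        for (GroveNode child : children) {
            if (child.name.equals(wanted)) return child;
            GroveNode hit = child.find(wanted);
            if (hit != null) return hit;
        }
        return null;
    }
}

// Event handler whose only job is to build the tree while the parser drives.
class GroveBuilder {
    private final GroveNode root = new GroveNode("#document", null);
    private GroveNode current = root;

    void startElement(String name) {
        GroveNode node = new GroveNode(name, current);
        current.children.add(node);
        current = node;
    }

    void characters(String s) {
        current.text.append(s);
    }

    void endElement(String name) {
        // Step back up, but never above the document root.
        if (current.parent != null) current = current.parent;
    }

    GroveNode result() {
        return root;
    }
}
```

The parser stays in charge only while it feeds the builder; afterwards
the application walks or queries the tree in whatever order it likes.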

>1. An external entity manager, responsible for obtaining document
>   instances (the "start" document and others), DTD's, etc. from
>   local storage, the web, some database, etc.  This should probably
>   be user-customizable.

I'm not sure about this. In some ways, I cannot see the reason for
*exposing* an entity manager, but then again, I cannot imagine an
implementation without one either....
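
One possible middle ground, sketched below under my own assumptions: the
parser owns an entity source internally and the application only ever
hands it a system identifier. EntitySource, MapEntitySource and
SketchParser are hypothetical names, and the in-memory map stands in for
local storage, the web, or a database.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical internal interface: maps a system identifier to a byte stream.
interface EntitySource {
    InputStream open(String systemId);
}

// In-memory implementation, standing in for files, the web, or a database.
class MapEntitySource implements EntitySource {
    private final Map<String, String> entities = new HashMap<>();

    void put(String systemId, String content) {
        entities.put(systemId, content);
    }

    public InputStream open(String systemId) {
        String content = entities.get(systemId);
        if (content == null) {
            throw new IllegalArgumentException("unknown entity: " + systemId);
        }
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
}

// The parser holds the entity source privately; callers only name the entity.
class SketchParser {
    private final EntitySource source;

    SketchParser(EntitySource source) {
        this.source = source;
    }

    // Fetches the raw text of an entity (assumed here to be UTF-8).
    String load(String systemId) {
        try (InputStream in = source.open(systemId)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```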

>2. An encoding manager, responsible for mapping one of the possible
>   XML document encodings (Latin-n, UTF-7, UTF-8, UCS-2, UTF-16, whatever)
>   onto ISO10646 characters.

Streams...
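
That is, put the decoding in a character stream wrapped around the byte
stream, so the parser only ever sees characters. A minimal sketch using
Java's InputStreamReader; DecodingSketch and decode are hypothetical
names, and detecting which encoding a document uses is left out
entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class DecodingSketch {
    // Drains a byte stream through a decoder and returns the characters read.
    static String decode(InputStream bytes, Charset encoding) {
        StringBuilder out = new StringBuilder();
        try (Reader chars = new InputStreamReader(bytes, encoding)) {
            int c;
            while ((c = chars.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The same character comes out whether the bytes arrived as Latin-1 or as
UTF-8, which is all the parser needs to know.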

>3. The parser itself, responsible for turning characters into XML events,
>   and possibly into grove structures.

Push grove building off to later stages.

>[Browser] gives the most complicated parser, since it has to asynchronously
>handle information from several different documents.
>
>[YACC] is the easiest to write, but it's less flexible.  Given [Browser],
>it's easy to write [YACC].  (Given [XMLEventStream] you can also derive
>[YACC], but with greater overhead.)
>
>[XMLEventStream] and [Grove] give you the most flexibility with respect to
>the grove plan.

I think these conflate many different processing layers.

>languages, but the only firm conclusion I've come to is that I really wish
>I could use coroutines.

Amen to that sentiment.
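
Lacking coroutines, the usual substitute is a second thread and a queue:
the parser pushes events on one side and the application pulls them on
the other. The sketch below is only that, a sketch; PushSource and
PullAdapter are hypothetical names, and any push-style parser would have
to be wrapped to fit the PushSource shape.

```java
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical push-style source: calls the handler once per event, then returns.
interface PushSource {
    void run(Consumer<String> handler);
}

// Runs the push source on its own thread so the application can pull events.
class PullAdapter {
    // A capacity of 1 makes the producer block until the consumer catches up.
    private final BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(1);
    private boolean finished = false;

    PullAdapter(PushSource source) {
        Thread producer = new Thread(() -> {
            try {
                source.run(event -> put(Optional.of(event)));
            } finally {
                put(Optional.empty()); // end-of-stream marker
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    private void put(Optional<String> item) {
        try {
            queue.put(item);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    // Returns the next event, or null once the source has finished.
    String next() {
        if (finished) return null;
        try {
            Optional<String> item = queue.take();
            if (!item.isPresent()) finished = true;
            return item.orElse(null);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }
}
```

It works, but it costs a thread per document and a context switch per
event, which is exactly the overhead a coroutine would avoid.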

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)