Simple approaches to XML implementation
Gavin Nicol
gtn at ebt.com
Thu Mar 6 13:54:54 GMT 1997
>>class XMLParser {
>>...
>>parser(XMLEventHandler handler);
>>...
>>}
>
>That's one way of doing things. The main problem I see with this interface
>is that there are quite a few possible methods (I count 71 classdefs in
>the SGML property set, though of course not all of those are applicable to
>XML), and it becomes difficult to expand the set of events.
I use about 8 event handlers for most of my APIs...
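For a sense of what a small, fixed callback set looks like, here is a rough Java sketch. The name `XMLEventHandler` comes from the quoted pseudocode. The eight method names, the `TinyParser` class (which only understands bare start tags, end tags and text, with no attributes, entities or well-formedness checks) and the `EventLog` helper are all hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical callback set of about eight events. Each has a no-op default,
// so a handler overrides only the events it cares about.
interface XMLEventHandler {
    default void startDocument() {}
    default void endDocument() {}
    default void startElement(String name, Map<String, String> attributes) {}
    default void endElement(String name) {}
    default void characters(String text) {}
    default void processingInstruction(String target, String data) {}
    default void comment(String text) {}
    default void error(String message) {}
}

// Toy push parser: it drives the handler and stays "in charge" of the loop.
class TinyParser {
    static void parse(String doc, XMLEventHandler h) {
        h.startDocument();
        int i = 0;
        while (i < doc.length()) {
            int lt = doc.indexOf('<', i);
            if (lt < 0) lt = doc.length();
            if (lt > i) h.characters(doc.substring(i, lt)); // text before the next tag
            if (lt == doc.length()) break;
            int gt = doc.indexOf('>', lt);
            if (gt < 0) { h.error("unterminated tag"); break; }
            String tag = doc.substring(lt + 1, gt);
            if (tag.startsWith("/")) h.endElement(tag.substring(1));
            else h.startElement(tag, Collections.emptyMap());
            i = gt + 1;
        }
        h.endDocument();
    }
}

class EventLog {
    // Records the events a document produces as simple strings.
    static List<String> events(String doc) {
        List<String> log = new ArrayList<>();
        TinyParser.parse(doc, new XMLEventHandler() {
            @Override public void startElement(String name, Map<String, String> a) { log.add("start " + name); }
            @Override public void endElement(String name) { log.add("end " + name); }
            @Override public void characters(String text) { log.add("text " + text); }
        });
        return log;
    }
}
```

With default methods (available in modern Java), new events can be added to the interface later without breaking existing handlers, which is one answer to the quoted worry about expanding the event set.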
>As much as possible, a good reusable component should not force the
>user's hand when choosing what node to grab onto. As an example,
>YACC is pretty bad about this. You supply it with a lexer (with a
>fixed name) and a set of handlers to be called when productions are
>reduced. The YACC-generated parser insists on being in charge.
Sure. The important thing is that if you want to query into
a document, you have to have parsed at least as far as the nodes you
want to access, and that having a tree representation for such cases
makes it a *lot* easier. For cases where you "want to be in control",
I would have the event handler be a grove constructor, and have the
application work upon the grove. Note that accessing a grove, or
querying a document, is *different* from *parsing* a document.
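A minimal Java sketch of "the event handler is a grove constructor": the handler does nothing but build a tree, and queries run against that tree after parsing. A plain element tree stands in for a grove here; a real grove built from the SGML property set has many more node classes. The names `Handler`, `Node`, `GroveBuilder`, `textOf` and `GroveDemo` are hypothetical, and the events are fed by hand where a parser would normally emit them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal hypothetical callback set, enough to build a tree.
interface Handler {
    void startElement(String name);
    void endElement(String name);
    void characters(String text);
}

// A plain element tree standing in for a grove.
class Node {
    final String name;   // element name, or "#text" for character data
    final String text;   // non-null only for text nodes
    final List<Node> children = new ArrayList<>();
    Node(String name, String text) { this.name = name; this.text = text; }
}

// An event handler whose only job is to construct the tree.
class GroveBuilder implements Handler {
    private final Deque<Node> open = new ArrayDeque<>(); // stack of unclosed elements
    final Node root = new Node("#document", null);
    GroveBuilder() { open.push(root); }

    public void startElement(String name) {
        Node n = new Node(name, null);
        open.peek().children.add(n);
        open.push(n);
    }
    public void endElement(String name) { open.pop(); }
    public void characters(String text) { open.peek().children.add(new Node("#text", text)); }

    // Example query, run after parsing: collect all text inside elements named `name`.
    static void textOf(Node n, String name, boolean inside, List<String> out) {
        boolean in = inside || n.name.equals(name);
        if (n.text != null && in) out.add(n.text);
        for (Node c : n.children) textOf(c, name, in, out);
    }
}

class GroveDemo {
    static List<String> run() {
        GroveBuilder b = new GroveBuilder();
        // Events a parser would emit for <doc><title>XML</title><p>body</p></doc>
        b.startElement("doc");
        b.startElement("title"); b.characters("XML"); b.endElement("title");
        b.startElement("p"); b.characters("body"); b.endElement("p");
        b.endElement("doc");
        List<String> out = new ArrayList<>();
        GroveBuilder.textOf(b.root, "title", false, out);
        return out;
    }
}
```

The parser never knows a tree exists; swapping `GroveBuilder` for another handler changes what gets built without touching parsing.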
>1. An external entity manager, responsible for obtaining document
> instances (the "start" document and others), DTD's, etc. from
> local storage, the web, some database, etc. This should probably
> be user-customizable.
I'm not sure about this. In some ways, I cannot see the reason for
*exposing* an entity manager, but then again, I cannot imagine an
implementation without one either...
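To make the trade-off concrete, here is a Java sketch of what a user-customizable entity manager might look like if it were exposed. The `EntityManager` interface, its `open(publicId, systemId)` signature, and the `MapEntityManager` and `EntityDemo` classes are hypothetical; the point is only that the parser asks for a character stream by identifier and the application decides where it comes from.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity manager: maps an entity's public/system identifiers to a
// character stream. A parser would call it for DTDs and external entities.
interface EntityManager {
    Reader open(String publicId, String systemId) throws IOException;
}

// One user-supplied policy: serve known entities from memory, defer the rest.
class MapEntityManager implements EntityManager {
    private final Map<String, String> bySystemId = new HashMap<>();
    private final EntityManager fallback; // e.g. a file or HTTP resolver; may be null

    MapEntityManager(EntityManager fallback) { this.fallback = fallback; }

    MapEntityManager put(String systemId, String content) {
        bySystemId.put(systemId, content);
        return this;
    }

    @Override
    public Reader open(String publicId, String systemId) throws IOException {
        String s = bySystemId.get(systemId);
        if (s != null) return new StringReader(s);
        if (fallback != null) return fallback.open(publicId, systemId);
        throw new IOException("cannot resolve entity " + systemId);
    }
}

class EntityDemo {
    // Drains a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        return sb.toString();
    }
}
```

An implementation that does not expose this would simply hard-wire one policy (say, local files) inside the parser.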
>2. An encoding manager, responsible for mapping one of the possible
> XML document encodings (Latin-n, UTF-7, UTF-8, UCS-2, UTF-16, whatever)
> onto ISO10646 characters.
Streams...
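One reading of "Streams...": the encoding manager is just a decoding layer between the byte stream and the parser, which only ever sees characters. A Java sketch using the standard `InputStreamReader` and `PushbackInputStream`; the `Decoding` class and its methods are hypothetical. It picks a charset from a byte order mark only, falls back to UTF-8, and does not read the encoding declaration (UTF-7 is also not among the charsets Java guarantees).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Decoding {
    // Wraps a byte stream in a Reader, choosing the charset from a byte order mark.
    static Reader openDecoded(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {                       // read up to 3 bytes to sniff a BOM
            int r = in.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        Charset cs = StandardCharsets.UTF_8;  // default when no BOM is present
        int bomLen = 0;
        int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF, b2 = head[2] & 0xFF;
        if (n >= 2 && b0 == 0xFE && b1 == 0xFF) { cs = StandardCharsets.UTF_16BE; bomLen = 2; }
        else if (n >= 2 && b0 == 0xFF && b1 == 0xFE) { cs = StandardCharsets.UTF_16LE; bomLen = 2; }
        else if (n >= 3 && b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) { bomLen = 3; }
        // Push back any bytes read past the BOM so the decoder sees them.
        if (n > bomLen) in.unread(head, bomLen, n - bomLen);
        return new InputStreamReader(in, cs);
    }

    // Convenience: decode a whole byte array to a String.
    static String decodeAll(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = openDecoded(new ByteArrayInputStream(bytes))) {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }
}
```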
>3. The parser itself, responsible for turning characters into XML events,
> and possibly into grove structures.
Push grove building off to later stages.
>[Browser] gives the most complicated parser, since it has to asynchronously
>handle information from several different documents.
>
>[YACC] is the easiest to write, but it's less flexible. Given [Browser],
>it's easy to write [YACC]. (Given [XMLEventStream] you can also derive
>[YACC], but with greater overhead.)
>
>[XMLEventStream] and [Grove] give you the most flexibility with respect to
>the grove plan.
I think these conflate many different processing layers.
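As one illustration of keeping the layers apart: if the parser exposes a pull-style event stream, the quoted point that [YACC] can be derived from [XMLEventStream] becomes a short loop layered on top. A Java sketch; `Event`, `Event.Kind` and `Driver.drive` are hypothetical names, and an `Iterator` stands in for whatever stream interface a parser might offer.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical event record: a kind plus a name or text payload.
final class Event {
    enum Kind { START, END, TEXT }
    final Kind kind;
    final String value;
    Event(Kind kind, String value) { this.kind = kind; this.value = value; }
}

// Push-style handlers driven from a pull-style stream. The caller owns the
// stream, so it could equally stop early or interleave other work.
final class Driver {
    static void drive(Iterator<Event> stream, Consumer<String> onStart,
                      Consumer<String> onEnd, Consumer<String> onText) {
        while (stream.hasNext()) {
            Event e = stream.next();
            switch (e.kind) {
                case START: onStart.accept(e.value); break;
                case END:   onEnd.accept(e.value);   break;
                case TEXT:  onText.accept(e.value);  break;
            }
        }
    }
}
```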
>languages, but the only firm conclusion I've come to is that I really wish
>I could use coroutines.
Amen to that sentiment.
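Without coroutines, the usual workaround for going the other way (turning a push parser that "insists on being in charge" into a pull stream) is to run the parser on its own thread and hand events across a bounded queue. A Java sketch; the `PushToPull` class and the shape of its producer argument are hypothetical. The extra thread and the per-event handoff are exactly the overhead a coroutine would avoid.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Runs a push-style producer on its own thread and exposes its output as a
// pull-style Iterator. The thread stands in for a coroutine.
final class PushToPull implements Iterator<String> {
    // Unique sentinel object, compared by identity, marking end of stream.
    private static final String END = new String("<end-of-stream>");
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
    private String next; // buffered item, or null if none fetched yet

    PushToPull(Consumer<Consumer<String>> producer) {
        Thread t = new Thread(() -> {
            try {
                producer.accept(ev -> put(ev)); // the producer pushes; we enqueue
            } finally {
                put(END);                       // always signal completion
            }
        });
        t.setDaemon(true);
        t.start();
    }

    private void put(String ev) {
        try { queue.put(ev); } // blocks while the consumer is behind
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    @Override public boolean hasNext() {
        if (next == null) {
            try { next = queue.take(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return false; }
        }
        return next != END;
    }

    @Override public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String ev = next;
        next = null;
        return ev;
    }
}
```

Usage would look like `new PushToPull(emit -> someParser.parse(doc, emit))`, where `someParser` is whatever push-style parser is being adapted.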
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)