API thoughts...

Tue Mar 4 18:50:51 GMT 1997

I was thinking that my earlier comments have been a bit too abstract, and
Richard's post got me thinking about what kinds of calls we might like to
have ... so I'm going to post some incomplete Java declarations that
express the kind of protocol that I'm suggesting. The details are not the
issue here; the overall structure is what I'm proposing. Norbert's parser is
rather similar to this in some respects, as far as I've seen (I've unpacked
and browsed the source, but not executed it yet).

/** An XMLParser can be constructed with explicit or default options, and
always takes an XMLBuilder as an argument. XMLBuilder is an interface
that declares a callback for each significant event that the parser
detects.

I think we should also provide a dummy starter class that implements
XMLBuilder, and implements null operations on each event -- otherwise we're
making implementors perform a typing exercise for events they don't care
about.
*/
public interface XMLParser {
     /** You can't put constructors in an interface, but the idea should be
         clear. I'm not sure how Java maps to IDL anyway... */
     XMLParser(XMLBuilder builder);  // make a parser that calls back builder,
                                     // with all options set to defaults

     XMLParser(XMLBuilder builder, XMLOptions options);
                                     // here we also set the options

     public void parse(String url_start);  // start parsing a document

     public void parse(InputStream input);  // start parsing a document
               // Methods like this may require a base URL argument. They also
               // might not make sense in the public interfaces...

     public void set_options(XMLOptions new_options); // change parsing options

     // One way to handle entity resolution is to make it part of the
     // XMLBuilder API, but it may be better to instead have a method like
     // the following, and of course a new "protocol" object to encapsulate
     // the operations.
     public void set_entity_resolver(XMLResolver resolver);
               // Set external resolution strategy

}
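
The "dummy starter class" idea mentioned above can be illustrated with a small sketch. The XMLBuilder below is a hypothetical, cut-down stand-in with only two events (and a simplified attribute signature) so that the example compiles on its own; NullXMLBuilder and ElementCounter are illustrative names, not part of the proposal.

```java
// Hypothetical, trimmed-down XMLBuilder: only two of the events sketched in
// this post, with a simplified attribute() signature (an assumption).
interface XMLBuilder {
    void start_element(String name);
    void attribute(String name, String value);
}

// The "dummy starter class": every event is a null operation.
class NullXMLBuilder implements XMLBuilder {
    public void start_element(String name) { }
    public void attribute(String name, String value) { }
}

// A builder that only cares about element starts overrides one method and
// inherits the no-op behaviour for everything else.
class ElementCounter extends NullXMLBuilder {
    int count = 0;

    @Override
    public void start_element(String name) {
        count++;
    }
}

class ElementCounterDemo {
    public static void main(String[] args) {
        ElementCounter counter = new ElementCounter();
        // Stand-in for a parser: fire a few events by hand.
        counter.start_element("doc");
        counter.attribute("id", "d1");
        counter.start_element("para");
        System.out.println("elements seen: " + counter.count);
    }
}
```

The point of the design is that adding a new event to XMLBuilder only requires a new no-op in the starter class; builders that extend it keep compiling.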

/** If you pass an XMLTreeBuilder to an XMLParser it will create an
XMLDocumentTree object, and return it to you, letting you keep the results
of a parse. */
public class XMLTreeBuilder implements XMLBuilder {
    public XMLDocumentTree product(); // return the built tree after a parse.

    /* ... XMLBuilder operations omitted ... */
}
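
A minimal sketch of how such a tree builder might work, under some loud assumptions: the XMLBuilder here is a cut-down stand-in, end_element is an assumed event (the post only shows start_element, though XMLOptions mentions element close events), and TreeNode is a hypothetical placeholder for whatever XMLDocumentTree ends up being.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for XMLBuilder, cut down to two events.
// end_element is an assumption; the post only declares start_element.
interface XMLBuilder {
    void start_element(String name);
    void end_element(String name);
}

// Hypothetical minimal tree node; the post leaves the tree classes open.
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }
}

class XMLTreeBuilder implements XMLBuilder {
    // Elements that have been started but not yet ended, innermost on top.
    private final Deque<TreeNode> open = new ArrayDeque<>();
    private TreeNode root;

    public void start_element(String name) {
        TreeNode node = new TreeNode(name);
        if (open.isEmpty()) {
            root = node;                     // first element becomes the root
        } else {
            open.peek().children.add(node);  // attach to the enclosing element
        }
        open.push(node);
    }

    public void end_element(String name) {
        open.pop();
    }

    // Return the built tree after a parse (null if no events were seen).
    public TreeNode product() {
        return root;
    }
}
```

A parser would call start_element and end_element as it reads the document; afterwards the caller asks the builder for product().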

/** An XMLDocumentTree should be the start of a nest of document
representation classes. I don't have many special ideas here, and you all
probably have a better idea about how it should work than I do.

   My one idea is that it should be able to drive a Builder in just the same
way that a parser does.

I'm not sure whether we should be providing classes like this, or if
everything should be an interface....
*/

public class XMLDocumentTree {

    /** This method runs a builder over the document tree, calling the
builder for the virtual events found during traversal, optionally with
explicit options. This can be useful if you want to build several different
views of a document without building them all in a single pass. */
    public void traverse(XMLBuilder builder);  // traverse the tree with
         // standard options
    public void traverse(XMLBuilder builder, XMLOptions options);
         // traverse with specified options.

    public XMLDocumentElement access_TEI_location(String TEIpointer);
         // We probably won't make methods like this part of the
         // public interface

/* actual data access methods to be determined...

    I see two main approaches to creating the data access methods:
    1. to create a bunch of particular objects Element, Attribute, etc. and
allow looking at them directly. This does make for a rather fat interface,
and a lot of objects. In some contexts this is good (low object coupling),
in others, bad (currently applets pay a high price for using many classes,
and this will take at least a year to improve).

    2. Create a general node object that can represent an element or
attribute or entity, etc., and use a general protocol to explicitly test and
act on node types, and to traverse. This is essentially the grove model, as
I understand it.  The disadvantage is that it's not very concrete, and so
it's harder to understand. You also lose the ability to use type-based
dispatching if your programming style favors it -- you have to test the
generic nodes yourself.

   Either of these models is good, but we need to examine the tradeoffs
much more carefully and explicitly.
*/
}
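
To make the traversal idea concrete, here is a rough sketch that combines it with approach 2 (a general node with an explicit type tag). Everything in it is an assumption made for illustration: the cut-down XMLBuilder, the end_element and text events, and the GenericNode and EventLog classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XMLBuilder. end_element and text are assumed
// events; the post only declares start_element.
interface XMLBuilder {
    void start_element(String name);
    void end_element(String name);
    void text(String data);
}

// Approach 2: one general node class with an explicit type tag.
class GenericNode {
    enum Kind { ELEMENT, TEXT }

    final Kind kind;
    final String value;  // element name, or character data for TEXT nodes
    final List<GenericNode> children = new ArrayList<>();

    GenericNode(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    GenericNode add(GenericNode child) {
        children.add(child);
        return this;
    }

    // Replay this subtree as builder events, the way a parser would emit them.
    void traverse(XMLBuilder builder) {
        if (kind == Kind.TEXT) {
            builder.text(value);
            return;
        }
        builder.start_element(value);
        for (GenericNode child : children) {
            child.traverse(builder);
        }
        builder.end_element(value);
    }
}

// A trivial builder that records the events it receives as markup.
class EventLog implements XMLBuilder {
    final StringBuilder log = new StringBuilder();

    public void start_element(String name) {
        log.append('<').append(name).append('>');
    }

    public void end_element(String name) {
        log.append("</").append(name).append('>');
    }

    public void text(String data) {
        log.append(data);
    }
}
```

Note the explicit test on kind inside traverse: that is the cost of the generic-node model mentioned above. With approach 1, each node class would override traverse and the test would disappear into type-based dispatch.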

/** Simple class that holds flags and other options for an XML parse or
tree traversal. Default values are set by initialization and can be
overridden by subclassing, or by simply assigning values. */
public class XMLOptions {
   // There should be flags for each individual type of event. Since that will
   // be a lot of flags, we should consider having some flags that lump
   // together frequently occurring options, e.g.:
   public boolean visit_elements = true; // Visit elements
   public boolean element_start = true; //  element open events
   public boolean element_end = true; // element close events

   public boolean expand_external_entities = true;   // Should external
          // entities be automatically expanded?

   // ....
}
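
The two ways of overriding defaults could look like the sketch below. XMLOptions is repeated here (with only the flags shown above) so the example stands alone; NoExpansionOptions and OptionsDemo are hypothetical names. One Java detail worth noting: fields cannot be overridden, so "overriding" in a subclass has to mean resetting the inherited field, for example in the constructor.

```java
// Copy of the option flags from the post, so the example compiles alone.
class XMLOptions {
    public boolean visit_elements = true;
    public boolean element_start = true;
    public boolean element_end = true;
    public boolean expand_external_entities = true;
}

// Override by subclassing: the constructor resets an inherited default.
// (Redeclaring the field would hide it rather than override it.)
class NoExpansionOptions extends XMLOptions {
    NoExpansionOptions() {
        expand_external_entities = false;
    }
}

class OptionsDemo {
    public static void main(String[] args) {
        // Override by simple assignment on a default instance.
        XMLOptions quiet = new XMLOptions();
        quiet.element_end = false;

        // Override by using the subclass wherever XMLOptions is expected.
        XMLOptions noExpand = new NoExpansionOptions();

        System.out.println("element_end: " + quiet.element_end);
        System.out.println("expand: " + noExpand.expand_external_entities);
    }
}
```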

public interface XMLBuilder {
     // I've included a DocumentPosition for each item that has content.
     // This is for full-text indexers, and the like.

     public void start_element(String name); // an element began

     public void attribute(String name, String value,
            AttributeDeclarationInfo attinfo);
                   // attinfo may be null if the DTD was not parsed, or the
                   // parser was requested to discard such information.

     public void internal_entity_reference(String name, String value,
            String type);
                   // Some applications will need to know this for XML->XML
                   // transformation. It's also useful since we no
                   // longer have SDATA

     public boolean external_entity_reference(String name, String value,
            String type, String notation_name);
                    // The boolean return could be used to allow case-by-case
                    // decisions on whether or not to expand the entity in
                    // line. This is the alternative to making it just a
                    // global option.
                    // If an XMLDocument gets a request to parse an unparsed
                    // external entity, it should create and invoke a new
                    // parser with the options that it was originally created
                    // with, and then resume traversing the new items (added
                    // to its tree).

  /* ... etc. ... */
}
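
As an illustration of the case-by-case decision that the boolean return allows, here is a small sketch. The XMLBuilder is a stand-in showing only that one callback, SelectiveBuilder is a hypothetical name, and the policy (decline anything that carries a notation name) is just an assumption chosen to make the example concrete. The post does not pin down what value and type hold, so the sketch ignores them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in showing only the one callback discussed above.
interface XMLBuilder {
    boolean external_entity_reference(String name, String value,
            String type, String notation_name);
}

// Asks for in-line expansion of entities without a notation, and declines
// the rest. The policy itself is an assumption for illustration only.
class SelectiveBuilder implements XMLBuilder {
    // Names of the entities this builder declined to expand.
    final List<String> skipped = new ArrayList<>();

    public boolean external_entity_reference(String name, String value,
            String type, String notation_name) {
        if (notation_name != null) {
            skipped.add(name);
            return false;   // tell the caller not to expand this one
        }
        return true;        // expand in line
    }
}
```

A parser (or a traversing document tree) would consult the return value for each reference instead of a single global expand_external_entities flag.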

   Just a sketch of the kind of API that I'd like to integrate with.

  -- David

_________________________________________
David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)