YAXPAPI (Yet Another XML Parser API) - an XDEV proposal

Sun Dec 14 02:03:00 GMT 1997

Tim Bray writes:

 > >  attribute(XmlParser, String, String, boolean) 
 > 
 > It seems completely wrong to have an attribute event separate from
 > start-element events.

I have worried about this myself.  My design goal with Ælfred has been
to limit myself to two class files: one for the parser itself, and one
for the interface for the callbacks -- hence the separate event for
attributes.  This decision has forced some pretty severely hacked-up
internal code accompanied by very careful documentation.

I could send a hashtable of attribute names and values with the
startElement() callback, and let users look up types (etc.) with my
query methods, but I would have to lose a bit on two counts:

1) Allocating a new hashtable for every start tag will slow down the
   parser a fair bit.

2) I'd have no way to show which attributes were specified and which
   were defaulted (see below).

 > What's the boolean?  I don't think the application author should
 > have to deal with anything but the name and value of attributes.

The boolean tells whether the attribute was specified or defaulted.  I
include this to allow people to do useful XML-to-XML transformations.
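
To make the XML-to-XML case concrete, here is a minimal sketch in
present-day Java of a handler that re-serializes a document and keeps
only the attributes that were explicitly specified.  Only the argument
list of attribute() (parser, String, String, boolean) comes from the
proposal; the parameter meanings, the startElement()/endElement()
signatures, the use of Object for the parser, and the assumption that
attribute events arrive before the owning element's start event are
all illustrative guesses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface. Only the attribute() argument list
// (parser, two Strings, boolean) is taken from the message; everything
// else here is an assumption made for the sketch.
interface SketchProcessor {
    void attribute(Object parser, String name, String value, boolean isSpecified);
    void startElement(Object parser, String name);
    void endElement(Object parser, String name);
}

// Re-serializes a document, keeping attributes that were written in the
// source and dropping those that were supplied as defaults.
class SpecifiedOnlyWriter implements SketchProcessor {
    private final StringBuilder out = new StringBuilder();
    private final List<String> pending = new ArrayList<>();

    // Assumes attribute events are delivered before the startElement
    // event of the element that owns them.
    public void attribute(Object parser, String name, String value, boolean isSpecified) {
        if (isSpecified) {
            pending.add(" " + name + "=\"" + escape(value) + "\"");
        }
    }

    public void startElement(Object parser, String name) {
        out.append('<').append(name);
        for (String attr : pending) {
            out.append(attr);
        }
        out.append('>');
        pending.clear();
    }

    public void endElement(Object parser, String name) {
        out.append("</").append(name).append('>');
    }

    // Escapes the characters that cannot appear literally in a
    // double-quoted attribute value.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    String result() {
        return out.toString();
    }
}
```

With a plain name-to-value table handed to startElement(), the
isSpecified test above would have nothing to look at, which is the
second cost listed earlier.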

 > >  data(XmlParser, String) 
 > 
 > I feel that the 2nd argument should not be a String.  It is a recipe
 > for disastrous inefficiency if the processor has to cook up a 
 > java.lang.String object for every little chunk of text.  

The overhead isn't that bad with Ælfred because I coalesce my data
into the largest chunks possible before allocating the String.  I
think that returning a char[] array would be confusing for users, and
would lead to many bugs in their code as they ignored our warnings not
to rely on the value in the char[] array outlasting the callback.
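
The kind of bug I have in mind can be sketched as follows.  This is
not code from Ælfred or Lark; the class, the handler interface, and
the 64-character shared buffer are all hypothetical, and the chunks
are assumed to fit in that buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a parser-like producer that reuses one buffer for
// every chunk of text, and two consumers that treat the buffer differently.
class BufferReuseDemo {
    interface CharsHandler {
        void characters(char[] buf, int length);
    }

    // Delivers every chunk through the same shared buffer, as a parser
    // trying to avoid allocation might do. Chunks must fit in 64 chars.
    static void deliver(String[] chunks, CharsHandler handler) {
        char[] shared = new char[64];
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), shared, 0);
            handler.characters(shared, chunk.length());
        }
    }

    // Buggy consumer: keeps a reference to the buffer and reads it after
    // the callback has returned, by which time it has been overwritten.
    static List<String> retainReferences(String[] chunks) {
        List<char[]> kept = new ArrayList<>();
        List<Integer> lengths = new ArrayList<>();
        deliver(chunks, (buf, len) -> {
            kept.add(buf);
            lengths.add(len);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            result.add(new String(kept.get(i), 0, lengths.get(i)));
        }
        return result;
    }

    // Correct consumer: copies the characters into a String while the
    // callback is still running.
    static List<String> copyInCallback(String[] chunks) {
        List<String> result = new ArrayList<>();
        deliver(chunks, (buf, len) -> result.add(new String(buf, 0, len)));
        return result;
    }
}
```

Both consumers compile and run without complaint; only the second one
returns the text that was actually delivered.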

 > Lark uses two
 > arguments, a char[] array and a character count; the app can
 > make a String if it needs to.  If you find this awkward, create
 > a new data type called Text so that if you need a String you
 > can make it with lazy-evaluation in Text.toString(), but if you
 > don't need it you don't build it.

Again, I'm reluctant to create new classes beyond XmlParser and
XmlProcessor.
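
For reference, the lazy-evaluation idea in the quoted suggestion could
look roughly like the sketch below.  The class name LazyText and all
of its methods are hypothetical; nothing here is taken from Lark.

```java
// Hypothetical Text-style wrapper: it holds a reference into the parser's
// buffer and builds a String only if the application asks for one.
final class LazyText {
    private final char[] buf;
    private final int offset;
    private final int length;
    private String cached; // built at most once, on first toString()

    LazyText(char[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > buf.length) {
            throw new IndexOutOfBoundsException("range outside buffer");
        }
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buf[offset + index];
    }

    // Answers a common question without allocating a String at all.
    boolean isWhitespace() {
        for (int i = 0; i < length; i++) {
            if (!Character.isWhitespace(buf[offset + i])) {
                return false;
            }
        }
        return true;
    }

    // The String is created here, and only here.
    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buf, offset, length);
        }
        return cached;
    }
}
```

Note that the wrapper still points at the parser's buffer, so
toString() has to be called before the buffer is reused; the lifetime
problem is moved rather than removed.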

 > Also, it shouldn't be named "data" - it should be named
 > characterData or charData or text or some such term that can
 > be mapped directly to the spec.

Agreed.  I will not change Ælfred now, but I think that this is a good
idea.

 > >  resolveEntity(XmlParser, String, String, URL) 
 > 
 > I don't think entities have any place in the first cut of this 
 > interface.  The processor exists to make these problems go away.

Normally, you should just return the URL argument; however, this
callback gives users a chance to do public-identifier resolution, URL
substitution, etc., and to return a different URL if desired.  For
example, if we had a DTD at

  http://www.microstar.com/XML/msldoc.dtd

and you had a local copy, you could substitute a local URL on your own
computer.  Likewise, you could do a catalogue lookup on the public
identifier "-//microstar//DTD Microstar Sample Document//EN" and
choose a different system identifier than the default supplied in the
document.

That said, I agree that this probably doesn't belong in the common
event API.
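
As an illustration of the substitution described above, here is a
minimal sketch of a resolver with a small in-memory catalogue.  The
class CatalogueResolver, its helper methods, and the local file URL in
the usage note are hypothetical, and the meanings assigned to the two
String arguments (entity name and public identifier) are an assumption
about the resolveEntity() signature, not something stated in the
proposal.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver: maps public identifiers and remote URLs to
// replacement URLs, and falls back to the URL it was given.
class CatalogueResolver {
    private final Map<String, URL> byPublicId = new HashMap<>();
    // Keyed by the URL's string form to avoid URL.equals(), which may
    // attempt host name resolution.
    private final Map<String, URL> byUrl = new HashMap<>();

    // Convenience wrapper that turns the checked exception into an
    // unchecked one.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    void mapPublicId(String publicId, URL replacement) {
        byPublicId.put(publicId, replacement);
    }

    void mapUrl(URL remote, URL replacement) {
        byUrl.put(remote.toString(), replacement);
    }

    // Assumed parameter meanings: entity name, public identifier (may be
    // null), and the default URL taken from the document.
    URL resolveEntity(Object parser, String entityName, String publicId, URL defaultUrl) {
        if (publicId != null && byPublicId.containsKey(publicId)) {
            return byPublicId.get(publicId);
        }
        if (defaultUrl != null && byUrl.containsKey(defaultUrl.toString())) {
            return byUrl.get(defaultUrl.toString());
        }
        return defaultUrl; // the normal case: no substitution
    }
}
```

A user with a local copy of the DTD might register
mapUrl(url("http://www.microstar.com/XML/msldoc.dtd"),
url("file:///usr/local/dtds/msldoc.dtd")), where the file path is
purely an example.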

 > Generalities: 
 > Lark has a thing where if any callback returns 'true', the
 > parser drops out of its loop... which is awfully useful and easy
 > I think.  Lark will also re-enter, but this need not be a requirement.

Awfully easy with a DFA-driven parser, but trickier with a
recursive-descent parser like Ælfred.  I'd probably have to throw an
exception, and could not allow any kind of re-entry.
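
A toy sketch of what that would look like, assuming a callback that
returns true to ask the parser to stop.  TinyParser, its
parenthesized mini-grammar, and the StopParsing exception are invented
for the illustration and have nothing to do with Ælfred's internals.

```java
// Hypothetical exception used only to unwind the recursive calls.
class StopParsing extends RuntimeException {
    StopParsing() {
        super("parse stopped by handler");
    }
}

// Toy recursive-descent parser for: element := '(' name element* ')'
class TinyParser {
    interface Handler {
        // Returns true to ask the parser to stop.
        boolean startElement(String name);
    }

    private final String input;
    private final Handler handler;
    private int pos;

    TinyParser(String input, Handler handler) {
        this.input = input;
        this.handler = handler;
    }

    // Returns true if the whole element was parsed, false if the
    // handler asked to stop part-way through.
    boolean parse() {
        try {
            parseElement();
            return true;
        } catch (StopParsing stop) {
            // The Java call stack that held the parse state is gone,
            // so there is nothing left to resume from.
            return false;
        }
    }

    private void parseElement() {
        expect('(');
        int start = pos;
        while (pos < input.length() && Character.isLetter(input.charAt(pos))) {
            pos++;
        }
        String name = input.substring(start, pos);
        if (handler.startElement(name)) {
            throw new StopParsing();
        }
        while (pos < input.length() && input.charAt(pos) == '(') {
            parseElement(); // nested elements recurse
        }
        expect(')');
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at " + pos);
        }
        pos++;
    }
}
```

The parse state lives in the nested parseElement() calls, and throwing
the exception discards all of them at once.  A table-driven parser
keeps its state in explicit variables and can simply return from its
loop, which is why dropping out and re-entering is cheap there.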

 > Also, for application programmers, especially dealing with smallish
 > objects, a tree interface is very natural.  I've written both
 > event-stream and tree apps using Lark, and the trees are a lot
 > easier to use for anything even moderately complex.  So the API 
 > should have Element, Attribute, and Text classes. 

Perhaps -- I may have to give in and allow Ælfred to use more than one
class file; or alternatively, these would be an optional extra, along
with the SAX-J layer.
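
As a rough sketch of how such an optional layer could sit on top of
the event stream, the builder below turns events into a small tree.
The Element, Attribute, and Text names follow the quoted suggestion;
TreeSketch, Node, Builder, the callback names, and the assumption that
attribute events precede the start of their element are all
hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tree layer driven entirely by parser events.
class TreeSketch {
    static class Node { }

    static class Text extends Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    static class Attribute {
        final String name;
        final String value;
        Attribute(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    static class Element extends Node {
        final String name;
        final List<Attribute> attributes = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Element(String name) { this.name = name; }
    }

    // Receives events and assembles the tree with a stack of open elements.
    static class Builder {
        private final Deque<Element> open = new ArrayDeque<>();
        private final List<Attribute> pending = new ArrayList<>();
        private Element root;

        // Assumes attributes are reported before their element starts.
        void attribute(String name, String value) {
            pending.add(new Attribute(name, value));
        }

        void startElement(String name) {
            Element e = new Element(name);
            e.attributes.addAll(pending);
            pending.clear();
            if (open.isEmpty()) {
                root = e;
            } else {
                open.peek().children.add(e);
            }
            open.push(e);
        }

        void charData(String text) {
            if (!open.isEmpty()) { // ignore text outside the root element
                open.peek().children.add(new Text(text));
            }
        }

        void endElement(String name) {
            open.pop();
        }

        Element root() {
            return root;
        }
    }
}
```

Because the builder is just another event consumer, the parser itself
would not need to know that a tree is being built.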

 > And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
 > API for XML?  Maybe SAX-J for the Java bindings. -Tim

How about RUSTY?

All the best,

David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/