YAXPAPI (Yet Another XML Parser API) - an XDEV proposal

Tim Bray tbray at textuality.com
Sun Dec 14 00:05:36 GMT 1997


At 12:03 AM 14/12/97, Peter Murray-Rust wrote:
>I am listing the main calls from Lark and AElfred that I find useful. As
>you can see there is a great similarity - I confess that I find the AElfred
>ones slightly easier to understand.

OK, let's get concrete.  I think that having each AElfred callback take
an XmlParser argument is a good idea.  Also, AElfred's names are better;
the "Do*" prefix in Lark is silly.  So on the event-stream stuff, I'd
go with the AElfred model, modulo the following changes:

>  attribute(XmlParser, String, String, boolean) 

It seems completely wrong to have an attribute event separate from
start-element events.  To start with, it suggests that the order of
attributes is significant, which is incorrect.  Secondly, since much
element-specific processing depends on what attributes are there, it is
less convenient for the application programmer.  Third, if the processor
(as it must) does defaulting, it's going to have to do some attribute
list wrangling anyhow, so building the list can't really be extra work.

What's the boolean?  I don't think the application author should
have to deal with anything but the name and value of attributes.

Anyhow, I'd go with 

startElement(XmlParser processor, String type, Attribute[] attributes);

and lose the attribute() method.
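
A minimal sketch of how the merged callback might look from the
application side.  The Attribute class, the empty XmlParser stand-in,
the ElementHandler interface and the LinkCollector example are all
assumptions made for illustration; none of them is the actual Lark or
AElfred code.  The events are simulated by hand rather than produced by
a real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not the actual Lark or AElfred classes.
final class Attribute {
    final String name;
    final String value;

    Attribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Empty stand-in for the parser object handed to every callback.
class XmlParser { }

interface ElementHandler {
    // One event per start tag; all attributes arrive together.
    void startElement(XmlParser processor, String type, Attribute[] attributes);
}

// Example application: collect the href value of every <a> element.
class LinkCollector implements ElementHandler {
    final List<String> hrefs = new ArrayList<String>();

    public void startElement(XmlParser processor, String type, Attribute[] attributes) {
        if (!"a".equals(type)) {
            return; // element type and attributes are inspected in one place
        }
        for (Attribute attribute : attributes) {
            if ("href".equals(attribute.name)) {
                hrefs.add(attribute.value);
            }
        }
    }
}

class StartElementDemo {
    public static void main(String[] args) {
        LinkCollector collector = new LinkCollector();
        XmlParser parser = new XmlParser();
        // Simulate the events a parser would deliver.
        collector.startElement(parser, "p", new Attribute[0]);
        collector.startElement(parser, "a", new Attribute[] {
            new Attribute("title", "home"),
            new Attribute("href", "http://example.org/")
        });
        System.out.println(collector.hrefs);
    }
}
```

Because the handler sees the element type and the whole attribute list
in a single call, it never has to buffer attribute events or care about
the order in which they were written.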

>  data(XmlParser, String) 

I feel that the 2nd argument should not be a String.  It is a recipe
for disastrous inefficiency if the processor has to cook up a 
java.lang.String object for every little chunk of text.  Lark uses two
arguments, a char[] and a character count; the app can
make a String if it needs to.  If you find this awkward, create
a new data type called Text, so that if you need a String you
can make it by lazy evaluation in Text.toString(), but if you
don't need it you don't build it.
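
One way the lazily evaluated Text type could be sketched.  Everything
here is an assumption for illustration: the offset field, the
isWhitespace() helper and the caching of the String are not taken from
Lark.  The sketch also assumes the parser does not overwrite the buffer
while the application still holds the Text.

```java
// Hypothetical Text type: wraps a slice of the parser's character buffer
// and builds a java.lang.String only if the application asks for one.
final class Text {
    private final char[] buffer;
    private final int offset;
    private final int length;
    private String cached; // stays null until toString() is first called

    Text(char[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        return buffer[offset + index];
    }

    // Answers a common question without allocating a String.
    boolean isWhitespace() {
        for (int i = 0; i < length; i++) {
            if (!Character.isWhitespace(buffer[offset + i])) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buffer, offset, length);
        }
        return cached;
    }
}

class LazyTextDemo {
    public static void main(String[] args) {
        char[] buffer = "  Hello".toCharArray();
        Text indent = new Text(buffer, 0, 2);
        Text word = new Text(buffer, 2, 5);
        // The indent is classified without ever becoming a String.
        System.out.println(indent.isWhitespace());
        System.out.println(word.toString());
    }
}
```

An application that only skips whitespace or counts characters never
pays for a String; one that needs the text calls toString() once and
gets the same object back on later calls.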

Also, it shouldn't be named "data" - it should be named
characterData or charData or text or some such term that can
be mapped directly to the spec.

>  resolveEntity(XmlParser, String, String, URL) 

I don't think entities have any place in the first cut of this 
interface.  The processor exists to make these problems go away.

Generalities: 
Lark has a thing where if any callback returns 'true', the
parser drops out of its loop... which is awfully useful and easy
I think.  Lark will also re-enter, but this need not be a requirement.
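
The early-exit convention can be sketched with a toy driver.  ToyParser
below is a hypothetical stand-in that walks a prepared list of event
names instead of parsing XML, and StoppableHandler and StopAtTitle are
likewise invented for the example.  Re-entry is read here as "calling
the loop again resumes where it stopped", which is an interpretation
rather than a description of Lark's behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical handler: every callback returns true to ask the parser to stop.
interface StoppableHandler {
    boolean startElement(String type);
    boolean endElement(String type);
}

// Toy stand-in for a parser. Events are plain names; a leading "/" marks an end tag.
class ToyParser {
    private final List<String> events;
    private int next = 0;

    ToyParser(String... events) {
        this.events = Arrays.asList(events);
    }

    boolean hasMore() {
        return next < events.size();
    }

    // Delivers events until a callback returns true or the input runs out.
    // Calling run() again continues with the next undelivered event.
    void run(StoppableHandler handler) {
        while (hasMore()) {
            String event = events.get(next++);
            boolean stop = event.startsWith("/")
                    ? handler.endElement(event.substring(1))
                    : handler.startElement(event);
            if (stop) {
                return;
            }
        }
    }
}

// Example handler: stop as soon as a <title> start tag is seen.
class StopAtTitle implements StoppableHandler {
    int startTagsSeen = 0;

    public boolean startElement(String type) {
        startTagsSeen++;
        return "title".equals(type);
    }

    public boolean endElement(String type) {
        return false;
    }
}

class EarlyExitDemo {
    public static void main(String[] args) {
        ToyParser parser = new ToyParser(
                "doc", "head", "title", "/title", "/head", "body", "/body", "/doc");
        StopAtTitle handler = new StopAtTitle();
        parser.run(handler); // returns after the <title> start tag
        System.out.println(handler.startTagsSeen + " " + parser.hasMore());
        parser.run(handler); // re-enter and consume the rest
        System.out.println(handler.startTagsSeen + " " + parser.hasMore());
    }
}
```

An application that only wants, say, the document title can stop after
a handful of events instead of paying for the whole parse.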

Also, for application programmers, especially dealing with smallish
objects, a tree interface is very natural.  I've written both
event-stream and tree apps using Lark, and the trees are a lot
easier to use for anything even moderately complex.  So the API 
should have Element, Attribute, and Text classes. 
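
A minimal sketch of what such tree classes might look like.  The class
names follow the paragraph above, but the Node base class and every
field and method (attribute(), text(), and so on) are assumptions made
for illustration, not Lark's tree interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common base for the two kinds of child node in this sketch.
abstract class Node { }

final class Attribute {
    final String name;
    final String value;

    Attribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

final class Text extends Node {
    final String value;

    Text(String value) {
        this.value = value;
    }
}

final class Element extends Node {
    final String type;
    final List<Attribute> attributes = new ArrayList<Attribute>();
    final List<Node> children = new ArrayList<Node>();

    Element(String type) {
        this.type = type;
    }

    // Looks an attribute up by name; returns null if it is absent.
    String attribute(String name) {
        for (Attribute a : attributes) {
            if (a.name.equals(name)) {
                return a.value;
            }
        }
        return null;
    }

    // Concatenates the character data of this element and its descendants.
    String text() {
        StringBuilder out = new StringBuilder();
        collect(this, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node instanceof Text) {
            out.append(((Text) node).value);
        } else {
            // Element is the only other Node subclass in this sketch.
            for (Node child : ((Element) node).children) {
                collect(child, out);
            }
        }
    }
}

class TreeDemo {
    public static void main(String[] args) {
        // Hand-built tree for: <p class="note">Hello <b>world</b></p>
        Element p = new Element("p");
        p.attributes.add(new Attribute("class", "note"));
        p.children.add(new Text("Hello "));
        Element b = new Element("b");
        b.children.add(new Text("world"));
        p.children.add(b);
        System.out.println(p.attribute("class") + ": " + p.text());
    }
}
```

With the tree in hand, questions such as "what is the text of this
element?" become a method call instead of state kept across callbacks.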

And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
API for XML?  Maybe SAX-J for the Java bindings. -Tim

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



