YAXPAPI (Yet Another XML Parser API) - an XDEV proposal

Tim Bray tbray at textuality.com
Sat Dec 13 17:58:18 GMT 1997


At 03:19 PM 13/12/97, Peter Murray-Rust wrote:

I agree with Peter that we should just buckle down and get on with what used
to be known as XAPI.  

But my approach would be quite different.  I think that the first step 
should be the end-user's API, the kind of thing that someone using a SMIL 
or RDF processor would need.  Such a person really doesn't want to wrestle
with entities and references and PIs and marked sections; all they want
is elements and attributes and the basic doctype info; they want the
processor to deal with entities and refs and quote marks and white space in
markup and encodings and so on.  

This would go a long way to address the whinings of the RDF & SMIL type 
people, who thought XML just meant elements and attributes.  I think that 
from their point of view, it should be; all the other stuff in the syntax 
is strictly to support authoring and management convenience.

It should come in event-stream flavor and tree flavor. 

Minimal event stream API:

1. Doctype, returns: root type, external subset system/public idents
2. Element start, returns: type, attribute name-value pairs, whether it's empty
3. Text
4. End Element, returns: type
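
A minimal sketch of how the four events above might look as a Java callback
interface.  This is an illustration only, not Lark's interface or any agreed
API: the names XmlEventHandler, EventLogger and EventStreamSketch are
hypothetical, and demo() stands in for a real parser by firing the events by
hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface mirroring the four events listed above.
interface XmlEventHandler {
    void doctype(String rootType, String systemId, String publicId);
    void startElement(String type, Map<String, String> attributes, boolean empty);
    void text(String chunk);
    void endElement(String type);
}

// A handler that records each event as a readable string.
class EventLogger implements XmlEventHandler {
    final List<String> log = new ArrayList<>();

    public void doctype(String rootType, String systemId, String publicId) {
        log.add("doctype " + rootType + " system=" + systemId + " public=" + publicId);
    }

    public void startElement(String type, Map<String, String> attributes, boolean empty) {
        log.add("start " + type + " " + attributes + (empty ? " (empty)" : ""));
    }

    public void text(String chunk) {
        log.add("text " + chunk);
    }

    public void endElement(String type) {
        log.add("end " + type);
    }
}

class EventStreamSketch {
    // Stands in for a parser: fires, by hand, the events that
    // <!DOCTYPE a SYSTEM "a.dtd"><a href="x">hi</a> would plausibly produce.
    static List<String> demo() {
        EventLogger handler = new EventLogger();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "x");
        handler.doctype("a", "a.dtd", null);
        handler.startElement("a", attrs, false);
        handler.text("hi");
        handler.endElement("a");
        return handler.log;
    }
}
```

An application would implement XmlEventHandler and never see entities,
references, PIs or marked sections; the processor resolves those before the
callbacks fire.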

Minimal tree API:

1. Document, with methods: root type, system ID, public ID, root element
2. Element, with methods: parent, children, attributeValueByName, allAttributes
3. Attribute, with methods: name, value
4. Text (presumably hiding lazy evaluation)
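
The tree flavor above could be sketched in Java along these lines.  The
method names parent, children, attributeValueByName, allAttributes, name and
value come from the list; everything else is an assumption made for
illustration: Element.type(), Text.content(), the construction helpers attr()
and add(), and the TreeSketch class.  No lazy evaluation is attempted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Attribute {
    private final String name;
    private final String value;
    Attribute(String name, String value) { this.name = name; this.value = value; }
    String name() { return name; }
    String value() { return value; }
}

class Text {
    private final String content;
    Text(String content) { this.content = content; }
    String content() { return content; } // accessor name is an assumption
}

class Element {
    private final String type;
    private Element parent;
    private final List<Object> children = new ArrayList<>();
    private final Map<String, Attribute> attributes = new LinkedHashMap<>();

    Element(String type) { this.type = type; }

    String type() { return type; } // assumption: not in the method list above
    Element parent() { return parent; }
    List<Object> children() { return children; }
    String attributeValueByName(String name) {
        Attribute a = attributes.get(name);
        return a == null ? null : a.value();
    }
    List<Attribute> allAttributes() { return new ArrayList<>(attributes.values()); }

    // Construction helpers for the sketch, not part of the proposed API.
    Element attr(String name, String value) {
        attributes.put(name, new Attribute(name, value));
        return this;
    }
    Element add(Object child) {
        if (child instanceof Element) ((Element) child).parent = this;
        children.add(child);
        return this;
    }
}

class Document {
    private final Element root;
    private final String systemId;
    private final String publicId;
    Document(Element root, String systemId, String publicId) {
        this.root = root; this.systemId = systemId; this.publicId = publicId;
    }
    String rootType() { return root.type(); }
    String systemId() { return systemId; }
    String publicId() { return publicId; }
    Element rootElement() { return root; }
}

class TreeSketch {
    // Builds by hand the tree a processor might return for
    // <doc lang="en"><p>hello</p></doc> with system ID "doc.dtd".
    static Document sample() {
        Element p = new Element("p").add(new Text("hello"));
        Element doc = new Element("doc").attr("lang", "en").add(p);
        return new Document(doc, "doc.dtd", null);
    }
}
```

With this shape, TreeSketch.sample().rootElement().attributeValueByName("lang")
yields "en", and each child of an element is either an Element or a Text.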

I acknowledge this is grossly insufficient for basing an editor on. You want
that, use the DOM.  Only a few choices have design implications:

1. How children are returned.  Possibilities would be to have Element and 
   Text crammed into the same class with a method for asking which is which,
   or to have separate Text and Element classes, so that children returns an
   Object array or a Vector and you can find out what kind of child each
   member is using the instanceof operator.  I favor the latter; Lark does
   this.

2. Whether it's worthwhile putting children into a special ChildList class
   with enumerator and indexing, as opposed to a native array or Vector, so
   you can hide lazy evaluation behind it.  I favor the ChildList; the DOM
   does this but Lark doesn't.

3. Whether the processor should be required to coalesce adjacent Text
   objects.  Suppose you have <a>foo <!--comment--> bar &ref; <?pi?>baz</a>;
   it's immensely less work if the processor can give this to the app
   as 4 Text chunks.  I think most of the processors do this now.
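
The three design choices above can be put together in one small Java sketch:
separate text and element classes told apart with instanceof, a ChildList
that loads its contents on first access, and text that arrives as adjacent
uncoalesced chunks.  All names here (TextChunk, ElementNode, ChildList,
ChildListSketch) are hypothetical stand-ins, and the Supplier-based loading
is just one assumed way of hiding lazy evaluation.  The sample uses a simpler
input than the one in point 3: <p>one <!--c-->two<b/></p>, where the comment
splits the text into two chunks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Stand-ins for the separate Text and Element classes (names hypothetical).
class TextChunk {
    final String content;
    TextChunk(String content) { this.content = content; }
}

class ElementNode {
    final String type;
    ElementNode(String type) { this.type = type; }
}

// Hypothetical ChildList: offers indexing and enumeration, and builds the
// underlying list only on first access, so a processor could defer work.
class ChildList implements Iterable<Object> {
    private final Supplier<List<Object>> loader;
    private List<Object> loaded; // stays null until first access

    ChildList(Supplier<List<Object>> loader) { this.loader = loader; }

    private List<Object> items() {
        if (loaded == null) loaded = loader.get();
        return loaded;
    }

    boolean isLoaded() { return loaded != null; }
    int size() { return items().size(); }
    Object get(int index) { return items().get(index); }
    public Iterator<Object> iterator() { return items().iterator(); }
}

class ChildListSketch {
    // Children of <p>one <!--c-->two<b/></p>, delivered without coalescing:
    // two adjacent text chunks, then one element.
    static ChildList sampleChildren() {
        return new ChildList(() -> {
            List<Object> kids = new ArrayList<>();
            kids.add(new TextChunk("one "));
            kids.add(new TextChunk("two"));
            kids.add(new ElementNode("b"));
            return kids;
        });
    }

    // Application-side walk: instanceof tells text from elements, and
    // adjacent text chunks are simply appended in order.
    static String describe(ChildList children) {
        StringBuilder out = new StringBuilder();
        for (Object child : children) {
            if (child instanceof TextChunk) {
                out.append(((TextChunk) child).content);
            } else if (child instanceof ElementNode) {
                out.append("<").append(((ElementNode) child).type).append(">");
            }
        }
        return out.toString();
    }
}
```

An application that needs contiguous text pays only for a StringBuilder
append per chunk, while the processor is spared the buffering that mandatory
coalescing would require.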
 
If I formalized and published this, it would look a lot like part of 
Lark's interface, but I bet all the other parsers could implement it.  
Should I? -Tim




