XML Java API - An Idea(*)

Sat Jun 21 20:01:28 BST 1997

-----Original Message-----
From:	Peter Murray-Rust [SMTP:Peter at ursus.demon.co.uk]
Sent:	Saturday, June 21, 1997 11:31 AM
To:	xml-dev at ic.ac.uk
Subject:	Re: XML Java API Standardization

In message <199706211310.JAA17653 at smtp2.erols.com> "Peat" writes:
> If the document is very large, and the parser is required to maintain the
> grove, we would then require the parser to also then include some type of
> defined memory management.  Can this be a problem, where different parsers
> implement resource management differently?

Memory management shouldn't be an issue in the API standardization. If you are using a parser that cannot serialize the tree, then you are certainly going to be limited by memory. If you are using an object database to implement the grove, then you don't have size limitations, but speed may become an issue.

This is an important point and one which I've been conscious of but ignored so
far.  JUMBO is quite large (with all the MOL classes in, there's about half a
megabyte of classes) and I have had out-of-memory failures with large files (ca.
1 Mbyte legacy input and translation into a tree).  I don't know whether there
is a generic solution to this.  I tried running the garbage collector (JDK1.02)
occasionally and this helps, but since parser, browser and document all have
to be in memory, large docs are a problem.

Presumably in an application subtrees can be saved to disk (serialized?)
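
As a rough illustration of saving a subtree to disk, here is a minimal sketch using standard Java object serialization. It assumes a JDK that provides java.io.Serializable (1.1 or later) and is written in present-day Java syntax. TreeNode and SubtreeStore are hypothetical names invented for this sketch; they are not part of JUMBO or of any parser API discussed in this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node (an assumption for illustration only).
class TreeNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    // Appends a child and returns this node so calls can be chained.
    TreeNode add(TreeNode child) { children.add(child); return this; }
}

// Hypothetical helper that writes a subtree to a temporary file and reads it back.
class SubtreeStore {
    static File save(TreeNode subtree) {
        try {
            File file = File.createTempFile("subtree", ".ser");
            file.deleteOnExit();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(subtree); // writes the node and all its descendants
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TreeNode load(File file) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (TreeNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An application could call SubtreeStore.save on a subtree it is not displaying, drop its reference so the garbage collector can reclaim the nodes, and call SubtreeStore.load when the subtree is needed again. Default serialization recurses through the children, so very deep trees may need a different encoding.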
> 
> I would think if this burden is on the application layer, then knowledge of
> the application can be used to optimize resources.

I would think that if the author uses entities, then knowledge of the entity
structure would help.  In the browser the entities could be treated as 
'pointers' and resolved only when required.

Yes, this is how other groves have been implemented.
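
The idea of treating an entity as a 'pointer' that is resolved only when required can be sketched roughly as below. This is only an illustration in present-day Java: LazyEntity and its loader argument are hypothetical names, and a real grove implementation would presumably read the replacement text from the entity's storage object rather than from a supplied function.

```java
import java.util.function.Supplier;

// Hypothetical lazily resolved entity reference (illustration only).
class LazyEntity {
    private final String name;
    private final Supplier<String> loader; // fetches the replacement text on demand
    private String content;                // stays null until first resolved
    private boolean resolved;

    LazyEntity(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    String name() { return name; }

    boolean isResolved() { return resolved; }

    // Resolves the entity on first access and caches the result afterwards.
    String content() {
        if (!resolved) {
            content = loader.get();
            resolved = true;
        }
        return content;
    }
}
```

A browser built this way would pay the memory cost of an entity only when the user first opens the part of the document that references it.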

> 
> Grove standardization is a good idea.  Any ideas on how the grove
> standardization can be implemented up one layer?
                                     ^^  ???  ^^^

I'm just entering this thread so I don't know what solutions have been discussed. There is already an API to draw from in the DSSSL spec and a definition of the SGML property set which gives us a common language to work from. The problem is that an XML API to a grove should be simple with a small interface and should leverage the object-oriented power and syntax of Java.

Personally, when working with groves I find some abstractions very useful in an API. I would rather have an API based on iterators than one based on a set of navigation function calls.  I'm talking about navigating the grove rather than building the grove. An iterator API would be extremely simple, well abstracted and more in line with patterns of C++ and Java programming than the SDQL API found in DSSSL. Iterators could also maintain an adherence to the syntax of the SGML property set.

Here is an example although my naming syntax probably does not correspond to the SGML property set here.

// Assuming we have an object provided by the parser that is a grove, instantiate an iterator and navigate to the first element that is a TITLE tag

// A Factory is an object that defines what SGML/XML constructs the iterator knows how to iterate. It provides the grove iterator with a different node iterator for each property node that it knows how to walk.

ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);

while (XMLIter++ != XMLIter.end())
{
	XMLBaseProperty Prop = XMLIter.Object(); // in C++ we would use the dereference operator like this: XMLBaseProperty Prop = *XMLIter;
	if (Prop.GetClass() == Element.Class) // is this an element?
	{
		Element aElement = Prop; // let's convert the property from a base class object to its concrete class
		// Now we have an element object and can call all its member functions
		if (aElement.GetIdent() == String("TITLE"))
			break;
	}
}
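
For comparison, the same forward walk might look as follows in plain Java using the standard java.util.Iterator interface. This is a sketch under stated assumptions, not a proposed API: GroveNode, ElementNode, TextNode, ForwardGroveIterator and GroveSearch are hypothetical names that do not follow the SGML property set, and the factory argument from the example above is left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical grove node types (illustration only).
class GroveNode {
    final List<GroveNode> children = new ArrayList<>();

    // Appends a child and returns this node so calls can be chained.
    GroveNode add(GroveNode child) { children.add(child); return this; }
}

class ElementNode extends GroveNode {
    final String ident;
    ElementNode(String ident) { this.ident = ident; }
}

class TextNode extends GroveNode {
    final String text;
    TextNode(String text) { this.text = text; }
}

// Visits nodes in document order (pre-order), starting at the given node.
class ForwardGroveIterator implements Iterator<GroveNode> {
    private final Deque<GroveNode> pending = new ArrayDeque<>();

    ForwardGroveIterator(GroveNode start) { pending.push(start); }

    @Override
    public boolean hasNext() { return !pending.isEmpty(); }

    @Override
    public GroveNode next() {
        if (pending.isEmpty()) throw new NoSuchElementException();
        GroveNode node = pending.pop();
        // Push children in reverse so the first child is visited first.
        for (int i = node.children.size() - 1; i >= 0; i--) {
            pending.push(node.children.get(i));
        }
        return node;
    }
}

class GroveSearch {
    // Returns the first element with the given identifier, or null if there is none.
    static ElementNode firstElement(GroveNode root, String ident) {
        Iterator<GroveNode> it = new ForwardGroveIterator(root);
        while (it.hasNext()) {
            GroveNode node = it.next();
            if (node instanceof ElementNode) { // is this an element?
                ElementNode element = (ElementNode) node;
                if (element.ident.equals(ident)) return element;
            }
        }
        return null;
    }
}
```

The calling code never touches the tree structure directly; swapping in an iterator that pages nodes in from disk or from an object database would leave GroveSearch unchanged.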

// OK, let's instantiate a new iterator to walk back up to the root of the grove
// use the copy constructor to produce a reverse iterator from our x and functions of individual properties in the grove. Hence we can use the SGML property set or another property set with the same code.
6.) Iterators work well in different memory models and garbage collection schemes.
7.) Iterators, Factories, and Algorithms can be combined in very powerful and flexible ways.
8.) Finally, Iterators are fun!!

Chris Lloyd
clloyd at gorge.net

Again, I reiterate that I'd like to see something concrete in a few days and
not to lose the momentum again.  

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
