Groves, IsNess and the Generic Data Object.

Paul Prescod paul at prescod.net
Mon Sep 27 14:47:10 BST 1999


Sean Mc Grath wrote:
> 
> That level of abstraction -- nodes -- is all I need to
> process this data -- given some simple API. XML
> provides such a simple API. 

XML is not an API. XML is a serialization. What API are we *really*
talking about: SAX, DOM or something invented? If we're going to compare
the convenience and power of a grove-based API we need to compare it
*to* something.

> I don't buy it. I personally do not find the prospect of
> programming the latter rather than the former in any way
> daunting or limiting. It is a trivial transformation to
> take a hierarchy of typed nodes and associated attributes
> and create an object hierarchy if I really want to be able
> to use "object.instance attribute" syntax.

Really? What's the object.instance hierarchy for the following:

<A>
  <B>Foo</B>
  <C C="Foo"/>
  <D>Bar</D>
  <E>1</E>
  <D>Baz</D>
</A>

This is a serialization of an object.instance hierarchy I have in my
head but I'd like to see how you will reconstruct it automatically.
Here's a notation you can use:

K.Q=5
K.Q.S="abc"
K.Q.D[0].E='a'
K.Q.D[1].E='b'

Don't forget to reconstruct the primitive data types.
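
To make the difficulty concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree, with the sample document written on one line and C written as an empty element. The variable names are illustrative only. It shows what a generic parser can actually recover: strings and element order, but no primitive types and no indication of whether the repeated D is a list.

```python
import xml.etree.ElementTree as ET

# The sample document from above, with <C> written as an empty element.
DOC = '<A><B>Foo</B><C C="Foo"/><D>Bar</D><E>1</E><D>Baz</D></A>'

root = ET.fromstring(DOC)

# Every leaf value arrives as a string; nothing says E was an integer.
e_value = root.find("E").text

# Nothing says whether D is a two-item list or two unrelated properties.
d_values = [d.text for d in root.findall("D")]

# Nothing says whether the attribute C="Foo" and the content of <B>
# are the same kind of property in the original object model.
c_attr = root.find("C").get("C")

print(type(e_value).__name__, repr(e_value))
print(d_values)
print(repr(c_attr))
```

Any automatic reconstruction has to add conventions of its own (for types, for lists, for attributes versus content) on top of what the parser reports.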

> This is where I think you have misunderstood my position. As human
> beings we have a cognitive pre-disposition to thinking in terms
> of hierarchies. 

But in the modern world most data is NOT modelled primarily in terms of
its hierarchy. Mostly it is modelled in terms of property/value pairs.
This is the case with every modern programming language and most APIs. 

We interpret those property/value pairs as hierarchy because that allows
us to do enumeration. This in turn allows us to serialize the data
structure as XML.
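
As a rough illustration of that last step, the sketch below enumerates the property/value pairs of a nested Python structure and emits one element per property. The function name to_element, the "item" element name for list members and the sample record are all assumptions made for this example, not part of any standard.

```python
import xml.etree.ElementTree as ET

def to_element(name, value):
    """Serialize a value as an element by enumerating its properties."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        # Each property/value pair becomes a child element.
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        # List members get a made-up "item" element name.
        for item in value:
            elem.append(to_element("item", item))
    else:
        # Primitives are flattened to text; their type is not recorded.
        elem.text = str(value)
    return elem

record = {"title": "Groves", "year": 1999, "tags": ["sgml", "xml"]}
xml_text = ET.tostring(to_element("record", record), encoding="unicode")
print(xml_text)
```

The hierarchy in the output is a product of the enumeration order; the data itself was only ever a set of named properties.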

> This API is all you need to process arbitrary hierarchies
> of data. It is *not* a pre-condition of programming
> to this API that the data must have been previously
> serialized in XML notation!

Of course. But the "XML API" was designed completely with XML in mind.
No sane person would have proposed something like the DOM as a
"universal API to data" five years ago. They would have said it was way
too cumbersome and that its concepts seemed to be pulled out of nowhere. It
makes sense to us because we know that its concepts are pulled out of
*XML*.

> This is exactly my point! XML is syntax for representing
> a hierarchy. This syntax leads naturally to an API
> that is couched in terms of elements/attributes. 

The syntax leads naturally to an API that is natural for XML and
incredibly UNNATURAL and inconvenient for anything else.

> I see it as a trivial transformation to convert a
> hierarchy of elements and attributes into a
> collection of objects with associated instance
> variables.
> 
> I believe this has been done on
> numerous occasions in the SGML world. I think
> it was Bob DuCharme who wrote a paper about
> transforming SGML instances into object hierarchies
> using Smalltalk as the implementation language.

If it is trivial then why is Bob DuCharme writing papers on it? Why is
Andrew Layman writing papers on it? I'm willing to bet that there are
half a dozen other papers on it out there also. They all propose
incompatible ways of interpreting XML elements as objects and objects as
XML elements. So what happens if the MPEG engine uses the DuCharme
method to convert objects to XML and the application uses the Layman
method to convert the XML back to objects?
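
A small sketch of that failure mode follows. The two conventions here are invented for illustration and are NOT the actual DuCharme or Layman proposals: one writes each property as a child element, the other expects each property as an XML attribute.

```python
import xml.etree.ElementTree as ET

def encode_as_children(name, props):
    """Hypothetical convention 1: each property becomes a child element."""
    elem = ET.Element(name)
    for key, value in props.items():
        ET.SubElement(elem, key).text = str(value)
    return ET.tostring(elem, encoding="unicode")

def decode_from_children(xml_text):
    """Decoder that matches convention 1."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}

def decode_from_attributes(xml_text):
    """Hypothetical convention 2: properties are read from attributes."""
    return dict(ET.fromstring(xml_text).attrib)

original = {"bar": "5"}
wire = encode_as_children("foo", original)

matched = decode_from_children(wire)       # same convention on both sides
mismatched = decode_from_attributes(wire)  # different convention on the reader

print(wire)
print(matched)
print(mismatched)
```

The XML is well-formed in both cases; the data is lost only because writer and reader disagree about how objects map onto elements and attributes.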

The obvious solution is to standardize the representation. Let's say we
arbitrarily choose the Layman representation. Now we have:

foo.bar =XML-izer=> XML =object-izer=> foo.bar

What in the world is the benefit of the XML in between? We've encoded
and decoded for nothing! The only possible benefit is if the client and
server are on different machines (or virtual machines). In other words
we're back to using XML for *interchange* not as an API.

Plus there's a more subtle problem here. The whole point of this entire
exercise was to make foo.bar *addressable*. We never wanted to provide
an API on top of an API for its own sake. We wanted to augment an
existing API with addressability (that's why I called it a "base class"
in my original message). But in your universe the addressability comes
from the XML. So addressing is done in terms of the middle layer even
though programming is done in terms of the object API. This is
incredibly inconvenient because the programmer must mentally switch back
and forth between the arbitrarily chosen XML representation ("the Layman
representation") and the object API.

> Somewhere along the line someone seems to have had
> an "Aha!" moment which went like this
>         "...ergo we need groves".

I doubt that there was a single such event because there are various
reasons we need groves. 

The first Aha experience is probably the same one that gave us the
information set in the XML world. Do you agree that we need the
information set? From there it is clear that addressing is done in terms
of the information set, not the serialization. From there it becomes
clear that addressing into non-XML media should NOT be done in terms of
the XML information set...it just doesn't make any sense. Every medium
implicitly has its own information set model. From there it should be
clear that we need a schema language for information sets (the W3C is
using RDF schemas, groves use a property set).

I have recently been toying with the idea that most people do too much
in terms of the information set of XML. For instance if we invented a
transformation language that was optimized for property/value pair
structures then it could be directly applied to (e.g.) Python or Perl
objects or OQL result sets. Instead we've got XSLT which requires things
to be encoded as XML. If your only goal is object->object transformation
(not interchange), encoding as XML is just another unnecessary step.
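
As a very rough sketch of what operating directly on property/value structures could look like, the function below rewrites a nested Python structure in place of an XML round trip. The transform function and its rename table are hypothetical, chosen only to show the idea, and are far simpler than a real transformation language would be.

```python
def transform(value, rename):
    """Recursively rebuild a property/value structure, renaming properties."""
    if isinstance(value, dict):
        return {rename.get(key, key): transform(child, rename)
                for key, child in value.items()}
    if isinstance(value, list):
        return [transform(item, rename) for item in value]
    # Primitives pass through untouched, so their types survive.
    return value

source = {"K": {"Q": 5, "D": [{"E": "a"}, {"E": "b"}]}}
result = transform(source, {"Q": "quantity", "E": "entry"})
print(result)
```

Because nothing is ever flattened to text, the integer stays an integer and the list stays a list, with no encoding convention to agree on.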

 Paul Prescod





