Addressing the Enterprise: Why the Web needs Groves

Sun Jun 6 11:17:28 BST 1999

I'm posting the first few sections of 

http://www.prescod.net/groves/shorttut/

I've rewritten a lot of it to demonstrate the importance to the Web. I am
hoping to generate some discussion on XML-DEV and xlxp.

--------------------------------------------------------------------------------

1. Introduction
This paper is a high-level introduction to the grove paradigm. Just as
SGML was a hidden jewel buried in among the ISO standards for screwdriver
heads, groves are another well-kept secret. The time has come to make
"groves for the Web". This document should be relevant to the people that
would do the specifying and coding to make groves available on the Web,
but also to technically-oriented managers that are not interested in the
fine details. 

This document is intended to explain the importance of the grove paradigm
to the Web.
It is intended to clarify in people's minds what the result of parsing an
SGML or XML document should look like. Some variation on the grove model
could be imagined, but the basics of the model seem fundamental and
unavoidable to me: for instance, W3C's DOM reflects the same basic
concepts. 
Groves were invented to solve the problems that had become revealed at a
particular point in the development of the SGML family of standards. XML
has reached the same point so the time is right to popularize the grove
idea.

Please send me your comments on this document. It will eventually become
an ISOGEN technical paper, but it is still rough. 

--------------------------------------------------------------------------------

2. Background

2.1 The Problem

In the early and mid-1990s, the ISO groups that were responsible for the
SGML family of standards realized that they had a large problem. The
people working on the DSSSL and HyTime standards found that they had
slightly different ideas of the abstract structure of an SGML document.
Understanding an SGML document's structure is easy for simple things, but
there are many issues that are quite complex. For instance, it is not
clear whether comments should be available for a DSSSL spec. to work on,
or whether they should be addressable by hyperlinks. It isn't clear
whether it should be possible to address every character, or only
non-contiguous spans of characters. Should it be possible to address and
process tokens in an attribute value or only character spans? Should it be
possible to address markup declarations? XLink and XSL must solve all of
the same issues.

Although this paper will discuss many problem domains, the reader should
keep in mind that addressing is the central one. If you cannot address
information (e.g. through a URL) then you cannot do anything else you need
to do with it, such as retrieve it, bind methods to it, attach metadata to
it, apply access control lists to it, render it, or work with it in a
programming language. Addressing is the key. Value follows naturally and
immediately.

The reason that addressing into XML (and other data formats) is
ill-defined is that the XML specification speaks of the syntax of the
XML language, not the abstract, addressable objects encoded in the
document. Linking and processing are done in terms of some data model, not
in terms of syntax. When you make a link between two elements, you are not
linking in terms of the character positions of the start- and end-tags in
an SGML or XML entity. You are linking in terms of abstract notions such
as "element", "attributes" and "parse tree". The role of an XML parser is
to throw away the syntax and rebuild the logical ("abstract") view. The
role of a linking engine (such as a web browser) is to make links in terms
of that logical view. The role of a stylesheet engine is to apply
formatting in terms of that logical view.

Unless stylesheet languages, text databases, formatting engines and
editors share a view, processing will be unreliable and complicated. It is
not very common for XML and SGML applications and toolkits to provide all
of the information necessary for building many classes of sophisticated
applications, such as editors. There is not even a standardized way for a
toolkit to express what information from the SGML/XML document it will
preserve. Even if two toolkits preserve exactly the same information, it
is quite possible that they use different terminology to describe the
information. In some cases, APIs might be identical except that they use
different structures to organize the information. Even one or two such
differences can make navigating the APIs very different.

In the software engineering world we have a technique for avoiding this
sort of problem: modelling. Using languages like the Unified Modelling
Language (UML) we can build sophisticated, intricate models of the world
that can be independently implemented and yet interoperable. I can hand a
model of a human resources application to a developer on the other side of
the planet and we can build logically compatible applications. Of course
UML is at a very high level. The precise expression of an object in a
particular programming language or system is not fixed by UML. The UML is
a mathematical expression of the entities and relationships in a problem
domain. It doesn't usually translate directly into code or APIs. That is
why we also have to use more concrete object description languages such as
IDL, ODL and STEP Express.

2.2 Why Not the DOM

The W3C has partially addressed this situation with a specification called
the Document Object Model (DOM). Unfortunately the DOM is not really an
object model in the abstract sense. It is rather just a collection of IDL
interfaces and some descriptions of how they relate. This is different
from an abstract object model because it is too flexible in some places
and not flexible enough in others. 

The DOM is too flexible in that it is not rigorous enough to be a basis
for addressing. For instance the DOM says that a string of four characters
could be broken up into multiple text nodes or treated as a single one. If
we describe addresses in terms of DOM text nodes, those addresses will be
interpreted differently by various DOM implementations. This is one reason
that XPointer and XSL are not defined in terms of the DOM. This weakness
of the DOM is fatal for using it for addressing but it is also annoying
for programmers. In some cases they must write special code to work with
documents that have different text breaking algorithms because the DOM has
given implementors too much flexibility here. It puts their ease of
implementation above the ease of coding for DOM users.
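
A minimal Python sketch of the text-node problem, using the standard
library's xml.dom.minidom as one example DOM implementation. The build()
helper and the sample text are illustrative assumptions, not part of any
specification:

```python
from xml.dom import minidom

def build(chunks):
    """Build a <p> element whose text is split into the given text nodes."""
    doc = minidom.Document()
    p = doc.createElement("p")
    doc.appendChild(p)
    for chunk in chunks:
        p.appendChild(doc.createTextNode(chunk))
    return p

one_node = build(["abcd"])         # four characters in a single text node
two_nodes = build(["ab", "cd"])    # the same characters in two text nodes

# Both trees serialize to exactly the same XML ...
assert one_node.toxml() == two_nodes.toxml() == "<p>abcd</p>"

# ... but "the first text node of <p>" names different data in each tree.
print(one_node.childNodes[0].data)    # abcd
print(two_nodes.childNodes[0].data)   # ab
```

An address phrased as "first text node of p" is therefore only as stable
as the chunking policy of whichever implementation built the tree.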

In other ways the DOM is not flexible enough. One important weakness is
that it is defined in IDL, which does not permit much variation in language
mappings and bindings. We have found this very limiting in the Python and
Perl worlds. With these high level languages there are more convenient
ways of mapping the high level XML concepts into APIs than the ways
dictated by CORBA. If we use these ways instead of the DOM ways, however,
our APIs are conformant to the DOM only in spirit, not in terms of the
formal detail of the specification.

2.3 Defining views

The DOM has a more important inflexibility. It would be useful for the
programmer using the DOM to be able to define whether all adjacent text
nodes are merged or not. There is a "normalize" method that attempts to
provide this feature, but that method actually modifies the tree, so all
viewers must see the same view. Another useful view is one in which every
character is a separate node. That view allows us to address individual
characters very easily. Another view might provide DTD information for a
document. Yet another view would provide linking information. Still
another view would attach RDF properties to the DOM.
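
The difference between a view and a modification can be sketched in
Python with xml.dom.minidom. The two view functions below are
hypothetical helpers invented for this illustration; only normalize() is
an actual DOM method:

```python
from xml.dom import minidom

doc = minidom.Document()
p = doc.createElement("p")
doc.appendChild(p)
for chunk in ("ab", "cd"):            # two adjacent text nodes
    p.appendChild(doc.createTextNode(chunk))

def merged_text_view(element):
    """Hypothetical read-only view: adjacent text nodes appear as one string."""
    items = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if items and isinstance(items[-1], str):
                items[-1] += child.data
            else:
                items.append(child.data)
        else:
            items.append(child)
    return items

def character_view(element):
    """Hypothetical read-only view: each character of the direct text is an item."""
    return [ch for item in merged_text_view(element)
            if isinstance(item, str) for ch in item]

print(merged_text_view(p))    # ['abcd']
print(character_view(p)[2])   # c
print(len(p.childNodes))      # 2: the underlying tree is untouched

p.normalize()                 # the DOM's own answer rewrites the tree
print(len(p.childNodes))      # 1: every other user now sees the merge too
```

The views leave the tree alone, so different callers can pick different
views of the same document at the same time.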

We can also make views that are simpler than the default DOM view. We
could have a view that got rid of CDATA nodes and treated them just as
text. Another view might remove processing instructions based on the
principle that many applications do not use them. It would also be very
nice to be able to remove "insignificant" whitespace from a view. The W3C
is working on a subset of XML to make XML easier to process for parsers
but there is no such spec to make simpler DOMs for application writers.

Let's take this back to the addressing realm for a second. Given all of
these views of a document, we could do things like query for the node with
the "author" RDF property with value "Paul" or for the node that is
referenced by a particular hyperlink or for the third character of an
element and so forth.

There is an important truth here. Every time we create a new specification
built on XML, we implicitly define new properties that should be attached
to the nodes: almost a whole new data model! Consider a document type
based not only on XML but also on XLink, namespaces and RDF. That document
has many different views. Here are some obvious ones:

 * First there is the low-level XML view. It would have elements,
attributes, characters and so forth.

 * Then there is the namespace view built on top of that. A "namespace
engine" would add some namespace information to the tree. It would
probably hide namespace attributes that were visible in the lower view.

 * Then there is yet another view that adds hyperlinking information. The
engine that provides this view can let us know whether a node is an anchor
or a link.

On top of that there could be a view built specifically for that document
type. It would understand the constructs in the document type and make
them available to a programmer as objects with properties.

Let's step back for a minute again. If we can make all of these views
available to the programmer in some simple, consistent way, then we could
surely make them all available to people doing querying and addressing
also. That means that we could make a query language that could do queries
based on constructs from all four levels! We could also easily define
query languages that were specialized and optimized for a particular
level.
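
A rough Python sketch of layered views and of a query that spans them.
Everything here (the Node class, add_view, query, and the toy "link" and
"rdf" engines) is a hypothetical illustration, not an existing API or the
grove standard's own notation:

```python
class Node:
    """A tree node whose properties are grouped by the view that supplied them."""
    def __init__(self, name, children=()):
        self.views = {"xml": {"name": name}}   # the low-level XML view
        self.children = list(children)

    def walk(self):
        """Yield this node and all of its descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()

def add_view(root, view_name, engine):
    """Let an 'engine' attach its own properties to every node."""
    for node in root.walk():
        node.views[view_name] = engine(node)

def query(root, conditions):
    """Yield nodes matching every (view, property, value) condition."""
    for node in root.walk():
        if all(node.views.get(view, {}).get(prop) == value
               for view, prop, value in conditions):
            yield node

root = Node("doc", [Node("title"), Node("cite"), Node("para", [Node("cite")])])

# Two toy engines layered on the XML view; the rules they apply are made up.
add_view(root, "link", lambda n: {"is_anchor": n.views["xml"]["name"] == "cite"})
add_view(root, "rdf",
         lambda n: {"author": "Paul"} if n.views["xml"]["name"] == "title" else {})

by_author = list(query(root, [("rdf", "author", "Paul")]))
anchors = list(query(root, [("xml", "name", "cite"), ("link", "is_anchor", True)]))
print(by_author[0].views["xml"]["name"], len(anchors))   # title 2
```

The second query mixes a condition from the XML view with one from the
linking view, which is the kind of cross-level query described above.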

The way we currently handle this is through different "levels" of the DOM.
The second one is being worked on right now. These levels tend to lag
behind the specifications that they are supposed to work with by months or
years. There is a DOM for XML, HTML and CSS, but nothing for namespaces,
RDF, XLink, XSL queries or XHTML. There is a single group within the W3C
that will be responsible for building all of these "levels" of the DOM.
This group of intelligent, well-meaning people is the most fundamental
bottleneck in the standards world today.

Even if all of the DOM people gave up their day jobs and became full-time
DOM builders they could never keep up with the amount of innovation
occurring within the W3C. Consider then, that the problem is in no way
limited to the W3C. People are building little XML-based languages with
their own data models all over the Web. A central API bottleneck is not
merely inconvenient: it is impossible. The DOM cannot be a universal API
for all
XML-based languages.

The XML Information Set project is similar to the DOM except that it works
in terms of abstractions instead of APIs. That is an important first step.
But the Information Set is designed only for a single view of XML with
certain optional features. It does not seem at this time that it will be
possible for "end-users" (programmers and query-writers) to tweak the
views. It also does not seem that the model is designed to be extensible
to completely new views. In other words it provides the very bottom layer
but does not define the infrastructure to build the upper ones.

In the ISO world we solve this problem by farming out the definition of
data models. A "property set" is a formal model for a view of an XML
document. A property set is half way between an abstract, unimplementable
UML data model and a narrowly defined IDL definition. It speaks in terms
of the higher level concepts that are the basis of hypermedia and so can
be implemented conveniently in high level programming languages.

The important thing about property sets is that they embed and embody the
requirements necessary for a data model to be useful in a hypermedia
context. That means that every node in the grove is addressable. It is
easy to construct an address for any given node (for instance the
character under a mouse click) or node list (e.g. a selected list of
characters).
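
To make "every node is addressable" concrete, here is a small Python
sketch in which any node can be turned into an address and back. The
GroveNode class and the use of child-index paths as addresses are
assumptions made for illustration; the ISO standards define their own
addressing mechanisms:

```python
class GroveNode:
    """A node that knows its parent, so an address can be computed from it."""
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

def address_of(node):
    """Return the path of child indexes leading from the root to this node."""
    path = []
    while node.parent is not None:
        path.append(node.parent.children.index(node))
        node = node.parent
    return tuple(reversed(path))

def resolve(root, address):
    """Follow a path of child indexes from the root back to a node."""
    node = root
    for index in address:
        node = node.children[index]
    return node

# One paragraph whose characters are individual nodes.
root = GroveNode("doc", [GroveNode("para", [GroveNode(ch) for ch in "grove"])])

clicked = root.children[0].children[3]   # e.g. the character under a mouse click
address = address_of(clicked)
print(address)                           # (0, 3)
print(resolve(root, address).label)      # v
```

A node list such as a selected run of characters could then be expressed
as a list of such addresses.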

2.4 Hypermedia to the Max
Now we get to the really exciting thing about property sets. You can build
property sets for views of XML documents. Those are extremely useful and
powerful. Even more powerful, though, are property sets for things that
are not even XML. SQL databases and OLE objects can have property sets.
LaTeX files can have property sets. People have defined experimental
property sets for CSS, CGM and for something as abstract as legal
documents. After all, a property set is just a simple data model. You
could define UML models for all of those types. Defining a property set is
no harder.

But property sets have a huge benefit over UML: once you define a property
set for a data object type, that data object becomes addressable. This
means that every subcomponent of every data object in an enterprise is
potentially addressable. The important point is that you do not have to
convert all of your data resources into XML or HTML to make them
addressable. You may need to turn them into XML or HTML to render or
transfer them between machines, but there are many other things that we
need to be able to do with addressable resources. We can attach access
control lists to them, make hyperlinks to them, attach metadata to them
and so forth.

Some "XML people" have this idea already but they express it in terms of
"building a DOM" over some non-XML resource. The idea is right but the
expression of the idea is wrong. The logic goes this way: We want an
addressable data model for the resource. XML has a data model. Therefore
let's use the XML data model for the resource. This model seems logical
but it is inefficient both in terms of computer time and programmer time.

If the underlying object is a relational database then it makes no sense
to take numbers (for example) and encode them as strings so that the
client application can unencode them back to numbers. Similarly, it makes
no sense to turn a database record into an XML element so that the final
application can think of it as a record again.

If all you need to do is address a database record then what you want to
do is the minimum required to turn a database record into something
addressable. The grove model is designed so that defining a property set
for the database is the minimum you have to do. In this case you can
forget about XML altogether!
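
A sketch of that idea in Python with the standard sqlite3 module. The
table, the PROPERTY_SET dictionary and the (table, key, column) address
form are all hypothetical stand-ins; a real property set is a formal
document, not a Python dictionary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (7, 'Ada', 5200.5)")

# Toy stand-in for a property set: which node classes (tables) exist and
# which properties (columns) each one exposes.
PROPERTY_SET = {"employee": ("id", "name", "salary")}

def resolve(conn, address):
    """Resolve a (table, primary key, column) address to the stored value."""
    table, key, column = address
    if column not in PROPERTY_SET.get(table, ()):
        raise KeyError(address)
    # Safe to interpolate: both names were just checked against PROPERTY_SET.
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE id = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(address)
    return row[0]

salary = resolve(conn, ("employee", 7, "salary"))
print(type(salary).__name__, salary)   # float 5200.5
```

The number arrives as a number: nothing was encoded as a string or
wrapped in an element on the way from the database to the caller.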

In buzzword-speak this is "addressing the enterprise." Every data object
in an organization from the smallest non-profit to the largest
multinational can be addressed through a single data model and query
language. You might also think of these as meta-models and meta-query
languages in that the grove model and its associated query language give
you a framework for defining the details of more precise models and richer
query languages.

Let me say again that rendering the document is another matter altogether.
Given an address, the easiest way to render the object might be via XML.
For a database record this might be the case. For a slide within a
PowerPoint document, however, the easiest way to render it might be
through OLE. Addressing is separate from rendering. Groves allow you to
say that you want to see the slide. OLE or XML/XSL might provide the
technology that you need to actually see the slide.

Without groves, hypermedia addressing is very poorly defined. For
instance, how do you, today, make a hyperlink to a particular frame in an
MPEG movie, or a particular note in a MIDI sequence? How would you extract
that information in a stylesheet (for instance for sequencing a multimedia
hyperdocument)? It makes no sense to address in terms of bytes, because
often a single logical entity, like a frame, is actually spread across
several bytes and they may not be contiguous. Addressing in terms of
characters would make even less sense because MPEG movies and MIDI
sequences are not character based. The web solves this problem by
inventing a new "query language" (in the form of extensions to URLs called
"fragment identifiers") for each data type. This more or less works, but
it leads to a proliferation of similar, but incompatible query languages
doing the same basic thing. These languages have different syntax and
underlying models.

This brings us to the next point: implementation. Under today's W3C way of
doing things you would implement a hypermedia browser (e.g. SMIL) by
hard-coding support for each different type of query for each type of
playable object. If resources hyperlinked to each other through these
fragment identifiers, the implementation engine would have to implement
separate query languages for each fragment identifier type. That is an
annoying waste of time.

Consider the issue of metadata attached to parts of media objects through
links. For instance I might add a title to an MPEG frame so that I could
locate it later. Or I could add a pop-up-video style annotation.

Usually this would be implemented as some sort of on-disk or in-memory
database. In one column of a record you would have the properties that you
want to attach (expressed somehow). On the other side you need to have
things to attach them to. We need a generic term for "things that you can
address within media objects." Generically, we could call these "nodes"
and "node lists." As soon as you make that leap to describing the targets
of references generically, regardless of media type, you have essentially
reinvented groves. It follows that standards like RDF implicitly depend
upon a (currently underdefined) concept similar to groves. Instead of
reinventing them, however, you have the option of using an international
standard that specifies them! I hope that one day there will also be a W3C
standard that does something similar.
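
The annotation database described above can be sketched in a few lines
of Python. The resource names and the form of the node addresses are
invented for illustration; the point is only that one store can serve
every media type once its keys are generic node addresses:

```python
from collections import defaultdict

# Key: (resource, path of steps into that resource's tree of nodes).
# Value: the properties attached to that node.
annotations = defaultdict(dict)

def annotate(resource, path, **properties):
    """Attach properties to one node, whatever kind of media it lives in."""
    annotations[(resource, tuple(path))].update(properties)

def lookup(**wanted):
    """Return the addresses of all nodes carrying the given property values."""
    return [address for address, props in annotations.items()
            if all(props.get(key) == value for key, value in wanted.items())]

annotate("movie.mpg", ["frame", 1200], title="Opening shot")
annotate("song.mid", ["track", 2, "note", 57], title="Key change")
annotate("movie.mpg", ["frame", 1200], note="pop-up text")

print(lookup(title="Opening shot"))   # [('movie.mpg', ('frame', 1200))]
```

Neither annotate() nor lookup() knows anything about MPEG or MIDI; they
only handle addresses of nodes.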

More at: http://www.prescod.net/groves/shorttut/

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Silence," wrote Melville, "is the only Voice of God." The assertion,
like its subject, cuts both ways, negating and affirming, implying both
absence and presence, offering us a choice; it's a line that the Society
of American Atheists could put on its letterhead and the Society of
Friends could silently endorse while waiting to be moved by the spirit
to speak. - Listening for Silence by Mark Slouka, Apr. 1999, Harper's
