Three Access Language Paradigms

Tue Nov 18 20:15:42 GMT 1997

I have been thinking intensely about several issues these
past few days, and I've been trying to put them all together
into a coherent whole.  So far I'm not succeeding, so I'm
initiating a series of discussions to help me make sense of
things.  Here's the first...

We would like clients to be able to remotely manage documents
residing on servers.  Clients need to be able to both query
and edit those documents.  This might be done via OMG CORBA
interfaces, or it might be done via a human-readable query
language.  Whatever the mechanism, I'd like to call the
mechanism a "document access language" or just an "access
language" for purposes of this discussion.  In this posting
I explore three different access language paradigms.

It seems to me that so far the W3C has focused on using DOM
as the language by which clients remotely access documents.
Under DOM, clients view documents through CORBA interfaces
that make the document look like a tree of XML objects.
Once the W3C has established all of the necessary interfaces,
a client will have full control over a document's contents,
subject to DTD and access control constraints.
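
To make the tree-of-objects view concrete, here is a minimal
sketch.  It uses Python's xml.dom.minidom in-process rather
than the CORBA interfaces discussed above, and the <memo>
document and its element names are invented purely for
illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element names are invented for illustration.
doc = parseString("<memo><to>Ann</to><body>Lunch at noon?</body></memo>")

root = doc.documentElement                    # the <memo> element node
to_node = root.getElementsByTagName("to")[0]  # first <to> descendant

# Query: read the text held by the <to> element.
recipient = to_node.firstChild.data

# Edit: replace that text in place through the same tree interface.
to_node.firstChild.data = "Bob"

updated = root.toxml()
print(recipient)
print(updated)
```

The client both queries and edits by walking nodes, so the
server has to present its content as such a tree.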

More recently, we have discussed possibly supplementing the
DOM approach with a human-readable access language.  A
streamable access expression would be shipped to the server,
and the server would provide a streamed response.  Document
content would have to transfer between client and server,
and the form of the content would be constrained by the DTD
that defines the document.  The syntax of the human-readable
language is undecided.  It might be OQL, it might be SDQL with
extensions, or it might be XML with embedded content.

I'd like to present still another form of access language.
This approach is based on a different way of thinking about
documents.  Instead of asking document repositories to look
like XML documents to the external world, we only ask that
the repositories speak XML with the external world.  DTDs
would be defined for the protocols that repositories might
care to speak.  The DTDs would define the structure of the
protocol messages rather than the structures of documents.
One repository might speak several protocols (e.g. 'Patient
Records Protocol V.152' or 'Bank Transaction Protocol 2A').
If the repository were capable of containing arbitrary XML
documents, the repository might speak a specific protocol
called 'XML Document Protocol V.1.0'.

Under the third approach, XML documents would appear less
often as persistent repositories and more often as transient
messages between clients and servers.  It would still be
necessary to define the base DTD for all of these protocols
since one server port must be able to parse them all well
enough to identify the protocol.  It may even be possible
to define the syntax for queries, insertions, and updates,
so that the individual protocols have less inventing to do.
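
As a rough sketch of what "parse them all well enough to
identify the protocol" might look like: the <message>
envelope element, its protocol attribute, and the handler
table below are all my own assumptions, not part of any
existing DTD.  Only the protocol names come from the
discussion above.

```python
import xml.etree.ElementTree as ET

def identify_protocol(message: str):
    """Read only the (hypothetical) envelope of a protocol message."""
    root = ET.fromstring(message)
    if root.tag != "message":
        raise ValueError("not a protocol message")
    protocol = root.get("protocol")
    if protocol is None:
        raise ValueError("missing protocol attribute")
    # The payload elements are passed on uninterpreted; only the
    # handler for the named protocol knows what they mean.
    return protocol, list(root)

# Placeholder handlers; a real server would do actual work here.
HANDLERS = {
    "XML Document Protocol V.1.0": lambda payload: "document handler",
    "Bank Transaction Protocol 2A": lambda payload: "bank handler",
}

def dispatch(message: str) -> str:
    protocol, payload = identify_protocol(message)
    handler = HANDLERS.get(protocol)
    if handler is None:
        raise LookupError(f"unsupported protocol: {protocol}")
    return handler(payload)

msg = ('<message protocol="Bank Transaction Protocol 2A">'
       '<query account="123"/></message>')
print(dispatch(msg))
```

The base DTD would only need to pin down the envelope; each
protocol's own DTD would define what may appear inside it.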

Briefly consider the benefits of the third approach.  The
most significant benefit is that it completely frees the
repository from having to conform to an XML object model.
We could expose a legacy database to the world through one
of the protocols with only a thin wrapper around the
database.  New databases could restrict the protocols they
support and specialize their structures according to the
kind of data they care to represent.  They could be based
on custom object-oriented schemas or relational schemas.
This approach also lowers the barrier to entry into the data
repository server world.  We could think of servers more
as information warehouses than as virtual documents.
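
Here is a minimal sketch of the "thin wrapper" idea, assuming
an in-memory SQLite database as a stand-in for the legacy
system.  The table, the column names, and the <get-patient>
and <patient-record> message elements are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for a legacy relational database (table and column names
# are invented for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pt (pid INTEGER PRIMARY KEY, nm TEXT, dob TEXT)")
db.execute("INSERT INTO pt VALUES (7, 'Ann Lee', '1960-02-29')")

def handle(request_xml: str) -> str:
    """Answer a hypothetical <get-patient id="..."/> request."""
    request = ET.fromstring(request_xml)
    if request.tag != "get-patient":
        raise ValueError("unsupported request")
    row = db.execute(
        "SELECT nm, dob FROM pt WHERE pid = ?", (int(request.get("id")),)
    ).fetchone()
    # The response uses the protocol's vocabulary, not the table's.
    response = ET.Element("patient-record")
    if row is not None:
        ET.SubElement(response, "name").text = row[0]
        ET.SubElement(response, "birth-date").text = row[1]
    return ET.tostring(response, encoding="unicode")

print(handle('<get-patient id="7"/>'))
```

Nothing in the database had to be restructured to look like
an XML document; only the messages are XML.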

The most significant drawback of this approach is that it
doesn't give us a single access language.  It probably
gives us a different access language for each protocol.
(Somebody please let me know whether this need not be so.)
One of those access languages would be defined in the
'XML Document Protocol,' and this is the language that we
have been looking for so far.  Ideally, the access
languages for all of the protocols would have the same
syntactic substrate, so that the only new additions to
each protocol would be elements that are specific to the
information being represented.  However, it is not
immediately apparent to me that this will be possible.

Still, there are many ways to represent data, both in XML and
in other forms such as relational and persistent OO.
The database vendor should not be constrained to use an
architecture that will export the repository as something
that looks like XML (such as DOM).  For example, many
different DTDs can be invented to represent a given set
of data, and no standard should constrain a vendor to use
a specific DTD for organizing the information.  A standard
should exist for how to query and update information and
for how to represent the data of concern (e.g. patient
records or transactions) -- that's what the DTDs should
define.  Hence, I came to the protocol proposal.

Now it's time to talk about SQL and OQL.  To a large degree
these languages expose the representation underlying the
database.  SQL exposes tables and columns, while OQL
exposes the persistent classes and their methods.  These
access languages are defined based on the schemas, so that
once the schemas are defined, voila, so are the access
languages.  We save ourselves a lot of time.
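
A small sketch of what "exposing the representation" means
in practice, using SQLite from Python.  The two schemas are
invented for illustration: both could hold the same patient
data, yet the client's query text is tied to one of them.

```python
import sqlite3

# The client's query names a specific table and specific columns.
client_query = "SELECT nm, dob FROM pt WHERE pid = 7"

def run(schema_sql: str) -> str:
    """Run the fixed client query against a database with this schema."""
    db = sqlite3.connect(":memory:")
    db.execute(schema_sql)
    try:
        db.execute(client_query).fetchall()
        return "ok"
    except sqlite3.OperationalError as err:
        return f"failed: {err}"

print(run("CREATE TABLE pt (pid INTEGER, nm TEXT, dob TEXT)"))
print(run("CREATE TABLE patient (id INTEGER, name TEXT, born TEXT)"))
```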

The SQL and OQL approach has one extremely significant
drawback: databases are compatible only if they have
identical schemas.
Where are the clients that speak 'Patient Record Schema
V.2.1,' and where are all the databases that are
compliant with this schema standard?  Everybody uses
generic database backends, and no little guys can come
in to compete by specializing for a given standard.  If
we had based these older query languages on protocols,
it wouldn't have been much of a problem for object-
oriented vendor X to come in and replace relational
vendor Y's server implementation of a standard; there
would have been no need to replace the clients.
Shouldn't we be building that sort of flexibility into
our new XML-compliant databases now, so that we will be
able to accommodate tomorrow's unexpected architectures?
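
To illustrate the kind of flexibility I mean, here is a
sketch in which a relational server and an object-based
server answer the same message, and the client code is
identical for both.  The class names, the storage layouts,
and the <get-patient> and <patient-record> elements are all
assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

def reply(name: str) -> str:
    # Both servers build the same protocol response.
    response = ET.Element("patient-record")
    ET.SubElement(response, "name").text = name
    return ET.tostring(response, encoding="unicode")

class RelationalServer:
    """Vendor Y: stores patients in a table."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE pt (pid INTEGER, nm TEXT)")
        self.db.execute("INSERT INTO pt VALUES (7, 'Ann Lee')")

    def handle(self, request_xml: str) -> str:
        pid = int(ET.fromstring(request_xml).get("id"))
        row = self.db.execute(
            "SELECT nm FROM pt WHERE pid = ?", (pid,)
        ).fetchone()
        return reply(row[0])

@dataclass
class Patient:
    name: str

class ObjectServer:
    """Vendor X: stores patients as objects."""
    def __init__(self):
        self.patients = {7: Patient("Ann Lee")}

    def handle(self, request_xml: str) -> str:
        pid = int(ET.fromstring(request_xml).get("id"))
        return reply(self.patients[pid].name)

def client_lookup(server, pid: int) -> str:
    # The client knows only the message format, not the storage model.
    answer = server.handle(f'<get-patient id="{pid}"/>')
    return ET.fromstring(answer).findtext("name")

print(client_lookup(RelationalServer(), 7))
print(client_lookup(ObjectServer(), 7))
```

Swapping one server for the other requires no change to
client_lookup, which is the point of standardizing the
protocol rather than the schema.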

I do not believe that it is necessary for an access
language to expose the database's architecture.  In our
case, I do not believe an access language must assume
that the database is architected in a way that allows it
to appear externally as an XML document.  It might be
desirable to do this, since it could keep us from having
to extend the query language for each protocol, but I do
not think that it is necessary.  It is only necessary
that the client and the server agree on the structure
and the meanings of messages sent between them.  We ought
not place constraints on our servers that need not be
there.  I think DTDs for persistent documents are going
to be over-constraining.

I have more issues to discuss regarding DOM and the
required nature of an XML-document query language.
Everything seems related to everything else, but I'll
end this topic here just to get things started.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp at acm.org
