Access Languages are Tied to Schemas

Joe Lapp jlapp at acm.org
Thu Nov 20 14:28:02 GMT 1997


I have been searching for the properties that a repository access
language must have.  Here I present an argument for why an access
language must be tied to a repository's architecture in a manner
analogous to how SQL and OQL are tied to database schemas.  I
infer what this implies for XML DTDs and then ask a question whose
answer I think has important repercussions.

Let's say that a "repository" is any software object that contains
information and that provides a way for clients to read, write,
and modify that information.  A client must know how to talk to
the repository in order to get the repository to do anything.
We'll call the language that the client must speak the "access
language."  The client uses this language to submit requests and
to understand responses.  The repository uses this language to make
sense of requests and to submit responses.  Both the client and
the repository must house knowledge of this access language.

(The access language may use distinct subset languages for requests
and responses, but both software objects would still have to contain
knowledge of both subset languages.  For simplicity, I assume that
requests and responses use the same language, but my argument should
hold even if they are different.)

The access language must convey information in two directions.  In
order for the information to be comprehensible, it must be conveyed
in recognizable units.  Both the client and the repository must
know how to generate and parse these units.  Hence, a standard must
exist to which both sides conform.  This standard says what kind of
information units there are and what they look like.

Information units usually have relationships with one another.  A
client often cares about accessing units that have a particular
relationship with some other unit.  For example, a client might
care to retrieve all liens on a particular property.  The access
language must allow a client to select units according to their
relationships with other units.  In particular, a client must be
able to identify the relationships of concern.  Both the client
and the repository must now be in agreement about the kinds of
relationships that may exist among information units.  We find we
also need a standard that says what kinds of relationships there
are and what kinds of information units participate in them.
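As a rough illustration of the lien example, here is a minimal Python
sketch.  Everything in it is hypothetical and chosen only for
illustration: the Property and Lien classes, the Repository class,
its liens_on method, and the sample data.  The point is only that the
request "all liens on property P1" makes sense because both sides
agree that liens and properties are kinds of units and that a lien
relates to a property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    property_id: str
    address: str


@dataclass(frozen=True)
class Lien:
    lien_id: str
    property_id: str  # the relationship: this lien is held on that property
    amount: int


class Repository:
    """Hypothetical repository exposing one relationship-based selection."""

    def __init__(self, properties, liens):
        self._properties = {p.property_id: p for p in properties}
        self._liens = list(liens)

    def liens_on(self, property_id):
        # The client names a unit (a property) and a relationship (lien-on).
        if property_id not in self._properties:
            raise KeyError(property_id)
        return [lien for lien in self._liens if lien.property_id == property_id]


repo = Repository(
    properties=[Property("P1", "12 Elm St"), Property("P2", "9 Oak Ave")],
    liens=[Lien("L1", "P1", 5000), Lien("L2", "P1", 200), Lien("L3", "P2", 750)],
)
on_p1 = repo.liens_on("P1")  # the two liens whose property_id is "P1"
```

A client that did not know the lien-on-property relationship existed
would have no way to phrase this request.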

It seems that the standard has quite a bit to say.  It says what
kinds of information units there are, what kinds of information
they contain, what kinds of relationships there are, and what
information units participate in those relationships.  What we
have is an object model.  This is the kind of thing that OMT and
UML are very good at expressing.  We have learned that both the
client and the repository must have knowledge of the same object
model.  Moreover, in the spirit of object-oriented design, each
side should harbor some representation of this model.  That is,
both sides have components that share a common architecture.

In retrospect, this makes sense.  Were the two sides working with
different models we'd have a case of the infamous impedance
mismatch.  We normally think of impedance mismatch as occurring
between an object-oriented application and a relational database,
but it can also occur between two object-oriented applications.
One organization may decide that liens are not useful entities in
themselves and so bottle them up with their associated properties
(i.e. properties would be aggregates containing liens, and liens
would not be classes of the schema).  Another organization may
want to store liens separately so that they can select all liens
that meet a given criterion (i.e. properties would be associated
with liens, and liens would be classes of the schema).  When the
second organization decides to hook its client up to the first
organization's database, the client can neither select among
liens nor properly interpret property objects.
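The mismatch between the two organizations can be sketched in Python
as follows.  This is an assumption-laden toy, not a description of
any real system: the Repository class, its select method, the
embedded_liens field, and the large_liens client function are all
hypothetical names invented here.  The first repository has no
"Lien" class in its schema, so the second organization's client
cannot even phrase its selection.

```python
from dataclasses import dataclass, field


@dataclass
class Lien:
    lien_id: str
    property_id: str
    amount: int


@dataclass
class Property:
    property_id: str
    # Only the first organization fills this in: liens as embedded data.
    embedded_liens: list = field(default_factory=list)


class Repository:
    """Holds units grouped by kind; the kinds are the classes of the schema."""

    def __init__(self, units_by_kind):
        self._units = units_by_kind

    def select(self, kind, predicate):
        if kind not in self._units:
            raise LookupError(f"schema has no class named {kind!r}")
        return [unit for unit in self._units[kind] if predicate(unit)]


# First organization: liens live inside properties, so "Lien" is not a class.
first = Repository({
    "Property": [Property("P1", embedded_liens=[{"id": "L1", "amount": 5000}])],
})

# Second organization: liens are classes of the schema and refer to a property.
second = Repository({
    "Property": [Property("P1")],
    "Lien": [Lien("L1", "P1", 5000), Lien("L2", "P1", 200)],
})


def large_liens(repository, threshold):
    """The second organization's client: it assumes a "Lien" class exists."""
    return repository.select("Lien", lambda lien: lien.amount > threshold)
```

Calling large_liens(second, 1000) selects lien L1, while
large_liens(first, 1000) raises LookupError.  The same client would
also have no idea what to do with the embedded_liens it finds inside
the first organization's property objects.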

Okay, so we've established the need for industries to standardize
on object models.  These standard object models would only say
what the repositories need to look like through an access 
language.  Any given repository is free to transparently translate
that model into a more suitable internal one.  We've also
established the need for access languages to reflect these object
models.  SQL and OQL conform to this requirement by having clients
use the language of the database's persistent storage schema.  XML
introduces another way to model information, a way that is
distinct from the relational approach but somewhat similar to the
object-oriented approach.  XML repositories have schemas too, and
these schemas are defined by the DTDs.
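The same dependence on schema shows up with XML.  The Python sketch
below uses the standard library's xml.etree.ElementTree, which does
not validate against a DTD, so the two document structures here only
stand in for what two different DTDs might declare.  The element and
attribute names (registry, property, lien, id) and both helper
functions are hypothetical.  Each query is written against one
structure and silently finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Structure 1: liens are nested inside the property they are held on.
NESTED = """<registry>
  <property id="P1"><lien id="L1"/><lien id="L2"/></property>
  <property id="P2"><lien id="L3"/></property>
</registry>"""

# Structure 2: liens are top-level elements that point at a property.
LINKED = """<registry>
  <property id="P1"/>
  <property id="P2"/>
  <lien id="L1" property="P1"/>
  <lien id="L2" property="P1"/>
  <lien id="L3" property="P2"/>
</registry>"""


def liens_nested(xml_text, property_id):
    """Query written for structure 1: follow containment."""
    root = ET.fromstring(xml_text)
    path = f"./property[@id='{property_id}']/lien"
    return [element.get("id") for element in root.findall(path)]


def liens_linked(xml_text, property_id):
    """Query written for structure 2: follow the property attribute."""
    root = ET.fromstring(xml_text)
    path = f"./lien[@property='{property_id}']"
    return [element.get("id") for element in root.findall(path)]
```

Both liens_nested(NESTED, "P1") and liens_linked(LINKED, "P1") find
liens L1 and L2, but liens_nested(LINKED, "P1") returns an empty
list: the request is only meaningful relative to the structure the
repository's DTD declares.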

Before concluding I'd like to ask a question whose answer may
have significant repercussions.  It seems that by asking an XML
repository to manage information for a particular industry, we
are asking ourselves to create DTDs that model the industry.  The
question is this: to what extent are DTDs to specify the object
model of a given industry?  More specifically, do we intend for
the following capabilities to fully implement an object model:
(1) the ability of a repository to ensure that the information it
contains is always in conformance with the DTDs, and (2) the
ability of the clients to properly interpret the informational
units and the relationships that the DTDs declare?

In conclusion, it seems that an access language must impose
architectural constraints on at least a component of a repository
and that these architectural constraints will apply to all
repositories that conform to a particular industry standard.  In
particular, it does not seem possible to create individual access
language protocols that won't to some degree constrain the
architectures of the repositories.  Such languages are probably
feasible only when we can think of a repository as a flat file
of unrelated information units.  Since an object model will have
to be developed for each industry, we might as well standardize
on a way to access object models in general.  This way we won't
be asking industries to perform the additional work of inventing
an access language for each object model.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp at acm.org

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



