Access Languages are Tied to Schemas

Thu Nov 20 16:03:52 GMT 1997

At 09:26 AM 11/20/97 -0500, Joe Lapp wrote:

>I have been searching for the properties that a repository access
>language must have.  Here I present an argument for why an access
>language must be tied to a repository's architecture in the manner
>analogous to how SQL and OQL are tied to database schemas. 

Ideally, the logical model exposed by an SGML repository should be the
structure of the document itself, not the implementation details used for a
particular repository architecture. An SGML DTD defines structures in the
same way that the table declarations do for SQL, and in the same way that
the class declarations do for object databases that use OQL.

This is in keeping with the fundamental idea behind object persistence in
object oriented databases: if you use an object oriented database with C++,
your C++ class declarations are your schema. In the same way, if you use a
repository with SGML or XML, the logical model is declared by the DTD. 
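
To make the analogy concrete, here is a minimal sketch that puts a DTD
fragment next to the class declarations it would correspond to in an object
database. The element names (property, address, lien, holder, amount) and
the class names are hypothetical, chosen to match the lien example; they are
not taken from any real DTD.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical DTD fragment: this is the schema as an SGML/XML repository sees it.
DTD = """
<!ELEMENT property (address, lien*)>
<!ELEMENT address  (#PCDATA)>
<!ELEMENT lien     (holder, amount)>
<!ELEMENT holder   (#PCDATA)>
<!ELEMENT amount   (#PCDATA)>
"""

# The same structure written as class declarations, which is what an
# object database would treat as its schema.
@dataclass
class Lien:
    holder: str
    amount: str

@dataclass
class Property:
    address: str
    liens: List[Lien] = field(default_factory=list)  # mirrors lien*

example = Property("12 Elm Street", [Lien("First Bank", "5000")])
```

In both cases the declarations describe the logical structure only; neither
says anything about how a repository stores it.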

>A client must know how to talk to
>the repository in order to get the repository to do anything.
>We'll call the language that the client must speak the "access
>language."  The client uses this language to submit requests and
>to understand responses.  The server uses this language to make
>sense of requests and to submit responses.  Both the client and
>the repository must house knowledge of this access language.

If we're talking traditional databases, that means that both sides must know
SQL, or both sides must know OQL, or whatever. Since we are talking SGML or
XML repositories, that means that both sides must know SGML or both sides
must know XML.

>The access language must convey information in two directions.  In
>order for the information to be comprehensible, it must be conveyed
>in recognizable units.  Both the client and the repository must
>know how to generate and parse these units.  Hence, a standard must
>exist to which both sides conform.  This standard says what kind of
>information units there are and what they look like.

For an SGML repository, these recognizable units are SGML elements.

Of course, for any particular SGML application, there would also be a DTD
that defines the schema for the application, and the clients may well have
knowledge of this schema. The server might not need to have this knowledge
in some cases, as long as it knows how to manage SGML in general. And there
may be some clients that do not need this knowledge either: a general-purpose
querying and browsing client should be written to work for any DTD, as
should a formatting and printing engine.

In order to make general-purpose clients possible, clients must have some
way of asking the repository for the schema - either the DTD schema or the
structure of a particular document.
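
As a rough illustration of the second option, the sketch below shows how a
general-purpose client might work out the structure of a particular document
with no knowledge of its DTD. It assumes the repository hands back the
document as XML text; the function name summarize_structure and the sample
document are hypothetical, and Python's standard xml.etree.ElementTree
stands in for whatever parser the client really uses.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def summarize_structure(xml_text):
    """Map each element name to the sorted names of its child elements."""
    root = ET.fromstring(xml_text)
    children = defaultdict(set)
    for elem in root.iter():
        children[elem.tag]  # touch the key so leaf elements are listed too
        for child in elem:
            children[elem.tag].add(child.tag)
    return {tag: sorted(kids) for tag, kids in children.items()}

# Hypothetical document; the client knows nothing about it in advance.
doc = """<property>
  <address>12 Elm Street</address>
  <lien><holder>First Bank</holder><amount>5000</amount></lien>
</property>"""

for tag, kids in summarize_structure(doc).items():
    print(tag, "->", kids)
```

This recovers only the containment structure that happens to occur in one
instance. Asking for the DTD itself would also reveal optional elements and
content models that the instance does not exercise.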

>Information units usually have relationships with one another.  A
>client often cares about accessing units that have a particular
>relationship with some other unit.  For example, a client might
>care to retrieve all liens on a particular property.  The access
>language must allow a client to select units according to their
>relationships with other units.  In particular, a client must be
>able to identify the relationships of concern.  

The relationships among objects often express much of the semantics of any
system - "it's not what you know, it's who you know". SGML/XML has two kinds
of relationships: containment and links. Queries should be able to handle
both. The ability to query relationships has proven invaluable in OQL and
SQL-3.
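
A small sketch of the two kinds of query, using the lien example. The
markup is hypothetical: I assume each lien points at its property through an
"on" attribute holding the property's id, which is just one simple way to
express a link. The path expressions are the limited XPath subset supported
by Python's xml.etree.ElementTree, not a proposal for the query language
itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: liens refer to properties by id (a link),
# while addresses sit inside properties (containment).
registry = ET.fromstring("""<registry>
  <property id="p1"><address>12 Elm Street</address></property>
  <property id="p2"><address>9 Oak Avenue</address></property>
  <lien on="p1"><holder>First Bank</holder></lien>
  <lien on="p1"><holder>City Tax Office</holder></lien>
  <lien on="p2"><holder>First Bank</holder></lien>
</registry>""")

# Containment query: every address contained in a property.
addresses = [a.text for a in registry.findall("./property/address")]

def liens_on(root, property_id):
    """Link query: holders of all liens that point at the given property."""
    path = f"./lien[@on='{property_id}']"
    return [lien.findtext("holder") for lien in root.findall(path)]

print(addresses)
print(liens_on(registry, "p1"))
```

The containment query follows the element hierarchy, while the link query
has to follow an attribute value from one element to another. A query
language that handles only the first cannot answer "all liens on a
particular property" for this markup.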

>We find we
>also need a standard that says what kinds of relationships there
>are and what kinds of information units participate in them.

But this standard can be quite general; the definition of SGML/XML itself
would do. Again, this is analogous to using C++ or Java to define schemas in
object oriented databases.

>It seems that the standard has quite a bit to say.  It says what
>kinds of information units there are, what kinds of information
>they contain, what kinds of relationships there are, and what
>information units participate in those relationships.  What we
>have is an object model.  

An object model of the kind you discuss here seems like the object model of
a particular application.

>Moreover, in the spirit of object-oriented design, each
>side should harbor some representation of this model.  That is,
>both sides have components that share a common architecture.

In the spirit of object oriented systems, metadata is the way one system
finds out about another system, unless they belong to the same application,
in which case they share class declarations. The same should hold for
SGML/XML repositories: programs that are part of the same application may
have knowledge of the DTD, but metadata is the way to write general purpose
programs, and writing general purpose software as much as possible is
usually a big win. 

>We normally think of impedance mismatch as occurring
>between an object-oriented application and a relational database,
>but it can also occur between two object-oriented applications.
>One organization may decide that liens are not useful entities in
>themselves and so bottle them up with their associated properties
>(i.e. properties would be aggregates containing liens, and liens
>would not be classes of the schema).  Another organization may
>want to store liens separately so that they can select all liens
>that meet a given criterion (i.e. properties would be associated
>with liens, and liens would be classes of the schema).  When the
>second organization decides to hook its client up to the first
>organization's database, the client can neither select among
>liens nor properly interpret property objects.

That depends, of course, on how the programs function. As long as I have
access, I can log into anybody's database, browse it, formulate queries to
find information, etc., because I use a general-purpose browsing and query
facility. If I have programs dependent on the classes defined in a
particular schema, then my programs do need to know the schema, e.g. the DTD.

One of the great advantages of architectural forms is that they make it
possible to write programs that depend only on an agreed-upon abstract
representation of the schema, while each individual organization can build
on that abstraction to create documents that meet its own needs. This is a
real strength of the HL7 Kona proposal for medical record attachments, which
would allow parties to interchange information based on a set of
well-defined architectural forms, yet leave each party free to implement
its own DTDs based on these architectural forms in order to accommodate its
own needs. This is, of course, analogous to the "design patterns" approach
of object oriented design, which strongly encourages writing programs
against the abstract base classes that define the interfaces rather than
against the concrete classes that implement them.
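
Here is a much simplified sketch of the idea. Two organizations use
different element names and nesting, but both mark the relevant elements
with the same architectural form, and the program looks only at the form.
Everything here is hypothetical: the attribute name "form", the form name
"patient-name", and both documents are invented for illustration and are not
taken from the Kona proposal. Real architectural forms are declared through
the DTD rather than with a bare attribute like this.

```python
import xml.etree.ElementTree as ET

FORM_ATTR = "form"  # hypothetical attribute naming the architectural form

def collect_by_form(xml_text, form):
    """Return (element name, text) for every element mapped to the given form."""
    root = ET.fromstring(xml_text)
    return [(e.tag, (e.text or "").strip())
            for e in root.iter()
            if e.get(FORM_ATTR) == form]

# Two organizations, two DTDs, one shared architectural form.
hospital_a = '<report><pt-name form="patient-name">Ann Lee</pt-name></report>'
hospital_b = ('<chart><section>'
              '<who form="patient-name">Bo Chen</who>'
              '</section></chart>')

for doc in (hospital_a, hospital_b):
    print(collect_by_form(doc, "patient-name"))
```

The function never mentions pt-name or who, just as a program written
against an abstract base class never names the concrete classes.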

Jonathan
________________________________

Jonathan Robie
Email: jonathan at texcel.no
Texcel Research, Inc. ("http://www.texcel.no")

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/