Query Languages for XML

W. Eliot Kimber eliot at isogen.com
Mon Nov 17 17:48:41 GMT 1997

At 08:04 AM 11/15/97 +0000, Richard Light wrote:
>One important thing about "Standard Query Language" is that it doesn't
>just query.  It is actually a complete language for "defining, accessing
>and otherwise managing relational databases".

In other words, SQL, in addition to enabling *queries* (that is, requests
for information about tables), is *also* an editor scripting language where
the documents are relational tables.

The SGML/XML world view can be thought of as a place where there are two
fundamental types of activity: query and edit.  A query is always
read-only. An edit results in a new document.  This also suggests that
there is no fundamental difference between editors and document management
systems that manage abstractions of documents (like Chrystal's Astoria or
Texcel's Information Manager).  In other words, a document management system
is just a very beefy editor with a poor user interface, or, conversely,
editors are weak document management systems with poor persistence but good
interfaces.

Thus, SDQL is a "pure" query language in that its only purpose is to
return the results of queries on the properties of nodes in groves.
However, the DSSSL transformation language can be thought of as an editing
scripting language because the result of applying a DSSSL transformation to
a document is a new document.  
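
As a rough illustration of the query/edit distinction, here is a minimal
sketch in Python (not SDQL or the DSSSL transformation language).  The Node
class and the query and transform functions are hypothetical names invented
for this example; they only mimic the idea that a query reads node
properties while an edit yields a new grove.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)
class Node:
    """A toy grove node (hypothetical): name, properties, ordered children."""
    name: str
    props: Tuple[Tuple[str, str], ...] = ()
    children: Tuple["Node", ...] = ()


def query(root: Node, pred: Callable[[Node], bool]) -> Iterator[Node]:
    """Read-only: yield every node, in document order, that satisfies pred."""
    if pred(root):
        yield root
    for child in root.children:
        yield from query(child, pred)


def transform(root: Node, rule: Callable[[Node], Node]) -> Node:
    """Edit: build a new grove bottom-up; the input grove is left untouched."""
    new_children = tuple(transform(c, rule) for c in root.children)
    return rule(Node(root.name, root.props, new_children))


doc = Node("doc", (), (Node("para", (("id", "p1"),)), Node("note")))

# A query only inspects the grove.
paras = list(query(doc, lambda n: n.name == "para"))

# An edit produces a new grove (here, renaming every "para" to "p").
renamed = transform(
    doc,
    lambda n: Node("p", n.props, n.children) if n.name == "para" else n,
)
```

Because Node is frozen, the query cannot modify anything even by accident,
and the only way to "change" the document is to produce another one.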

Note that it doesn't matter how the creation of the new document is
*implemented*. Whether you literally generate an entirely new grove from
scratch or simply add and remove nodes and properties from the one you
have, the result is the same: a new grove, which means a new document.
DSSSL simplifies its abstract processing model by making groves static *in
the abstract*.  However, implementations are free to make groves dynamic
*under the covers*.
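
To make the point about implementation concrete, here is a small Python
sketch under stated assumptions: a grove is modelled as a plain nested dict,
and rebuild_without and remove_in_place are hypothetical helpers, not part
of DSSSL or any real library.  One builds a fresh grove from scratch, the
other mutates the grove it is given, and the resulting groves compare equal.

```python
import copy


def rebuild_without(grove: dict, name: str) -> dict:
    """Generate an entirely new grove, omitting every node called `name`."""
    return {
        "name": grove["name"],
        "props": dict(grove["props"]),
        "children": [
            rebuild_without(child, name)
            for child in grove["children"]
            if child["name"] != name
        ],
    }


def remove_in_place(grove: dict, name: str) -> dict:
    """Mutate the grove we already have, removing every node called `name`."""
    grove["children"] = [c for c in grove["children"] if c["name"] != name]
    for child in grove["children"]:
        remove_in_place(child, name)
    return grove


source = {
    "name": "doc",
    "props": {},
    "children": [
        {"name": "para", "props": {"id": "p1"}, "children": []},
        {"name": "note", "props": {}, "children": []},
    ],
}

fresh = rebuild_without(source, "note")
# Work on a copy so that `source` stays available for comparison.
mutated = remove_in_place(copy.deepcopy(source), "note")
```

Seen from outside, the two strategies are indistinguishable: either way the
result is a grove without the "note" node.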

Remember also that unless you're talking about SED scripts or Perl hacks,
it's not meaningful to talk about operations on XML documents--it's only
meaningful to talk about operations on abstractions of XML documents, i.e.,
groves.  This is why both the DSSSL and HyTime standards are defined in
terms of operations on groves, not operations on SGML documents.  

If we define "editing" as the process by which the abstraction of a
document is modified and a new document is created (here using the term
"document" as it's defined by SGML and XML, that is, a character string
conforming to the syntax defined by the standard), then *any process* that
creates a new document is an editor.  The only question then is whether
the editor is interactive or batch, which is really a question of user
interface, not functionality.

All editing languages must include a query language because you must be
able to examine the properties of the objects the editor is manipulating,
but I think that it is confusing to call an editing language a query
language just because SQL is incorrectly called a query language.

Or, said another way: given a robust query mechanism, such as SDQL, it is
possible to create an infinite number of editing languages that provide the
appropriate interaction and convenience characteristics needed for a
particular editing application.  When the tasks of querying and editing are
kept separate, it becomes clear that it is not necessary to bind them
together (although doing so may have advantages in some environments).

Thus, the argument that SDQL is insufficient for complete XML processing
and is thus not useful misses the point that what was asked for was not a
query language at all, but an editing scripting language, which SDQL is
not.  However, SDQL could be of service to any number of scripting
languages by providing a ready-made syntax and set of semantics that can be
used directly.  

I can easily imagine creating a simple set of DSSSL expression language
functions that provide the grove manipulation actions needed: delete node,
add node, set property, delete property.  Implementing these would be easy
enough to do once you had code that managed groves (i.e., a DOM-based
read-write browser), which we have in both Netscape and IE4 and will likely
have in SGML/XML editors in the near future.
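
The four actions named above might look something like the following
sketch.  It is written in Python rather than the DSSSL expression language,
and GroveNode, add_node, delete_node, set_property, and delete_property are
all hypothetical names made up for illustration; this is not the DOM API or
any existing grove implementation.

```python
from typing import Dict, List, Optional


class GroveNode:
    """A toy mutable grove node (hypothetical) with a parent link."""

    def __init__(self, name: str, props: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.props: Dict[str, str] = dict(props or {})
        self.children: List["GroveNode"] = []
        self.parent: Optional["GroveNode"] = None


def add_node(parent: GroveNode, child: GroveNode,
             index: Optional[int] = None) -> GroveNode:
    """Attach child under parent, at index or at the end; return the child."""
    if child.parent is not None:
        raise ValueError("node is already attached to a grove")
    child.parent = parent
    if index is None:
        parent.children.append(child)
    else:
        parent.children.insert(index, child)
    return child


def delete_node(node: GroveNode) -> None:
    """Detach node (and its subtree) from its parent."""
    if node.parent is None:
        raise ValueError("cannot delete the root node")
    node.parent.children.remove(node)
    node.parent = None


def set_property(node: GroveNode, key: str, value: str) -> None:
    """Create or overwrite a property on node."""
    node.props[key] = value


def delete_property(node: GroveNode, key: str) -> None:
    """Remove a property from node; a missing key is ignored."""
    node.props.pop(key, None)


doc = GroveNode("doc")
para = add_node(doc, GroveNode("para"))
note = add_node(doc, GroveNode("note"))
set_property(para, "id", "p1")
delete_property(para, "id")
delete_node(note)
```

Any query mechanism could then be used to locate the nodes these functions
operate on, which keeps the querying and editing halves separate.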


<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004

