Opinions requested
W. E. Perry
wperry at fiduciary.com
Fri Mar 5 07:23:19 GMT 1999
Marcelo Cantos wrote:
> "Jeffrey E. Sussna" wrote:
>
> > There is not (AFAIK) yet any such thing as an XDBMS
>
> I am continually surprised to hear remarks such as this. SIM _is_ an XDBMS (it is also an SGML, MARC, RTF, etc. database with structure and full content query capabilities). As an XDBMS it has weaknesses (it only supports predefined indexes and limited structure querying), but in some ways provides a model that is even richer than XML (it provides structure below element level, and has the concept of fields
In addition to this vision of an XML database, there has been much discussion of XML as a front end or a query-and-response framework for data stores, but I would argue that such applications of XML markup are not an XML database. A true XML database is shaped by the essential characteristics of XML itself: it should be freely eXtensible; it should be defined and manipulated by Markup; and it should be cast in a Document Structure within which Elements identify Data Constructs, and Attributes provide Data Characterization.
Like XML itself, the XML database is fundamentally mismatched to the familiar storage and transmission frameworks of filesystem, relational table, object serialization or data stream. In the first case, any item (document, data table, or executable, whether 'text' or binary) which is committed to storage in a filesystem is treated as a file: that is, as unitary and indivisible within the perspective and capabilities of the filesystem. A word processing program may, by opening a document, be able to identify and to manipulate as individual elements the sentences, paragraphs and chapters of that document. By contrast, the filesystem in which that document is stored reads, writes, renames, searches for or deletes the document as a whole. In XML terms, the filesystem sees the document as a single element: a root. Regardless of how many subelements we might mark up within that <root>, the filesystem, designed for a generic 'file-like' document, is capable of manipulating only one.
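To make the filesystem point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample document and the helper names elements_in and retag are hypothetical, chosen only for illustration. The filesystem knows the document only as one path and one byte count; the subelements become addressable only after parsing, and even renaming one kind of element means reading and rewriting the whole file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document with several marked-up subelements.
DOC = "<root><chapter><para>One.</para><para>Two.</para></chapter></root>"


def elements_in(path: str) -> list[str]:
    """Tags of every element, in document order, visible only by parsing."""
    return [el.tag for el in ET.parse(path).getroot().iter()]


def retag(path: str, old: str, new: str) -> None:
    """Rename every <old> element to <new>.

    The filesystem has no element-level operation, so the whole file is
    read, parsed, changed in memory and written back as a single unit.
    """
    tree = ET.parse(path)
    for el in list(tree.getroot().iter(old)):
        el.tag = new
    tree.write(path, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "doc.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(DOC)
        # The filesystem knows only a path and a byte count: one unit.
        print(path, os.path.getsize(path))
        print(elements_in(path))  # ['root', 'chapter', 'para', 'para']
        retag(path, "para", "paragraph")
        print(elements_in(path))  # ['root', 'chapter', 'paragraph', 'paragraph']
```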
In a similar way, a relational table--and the database engine behind it--can store, index, or construct joins upon only those data records which correspond to the schema of the table. While it is possible to use SQL or proprietary database tools to rewrite an existing table to a different schema, that is substantially different from submitting to a database engine, as an entry to a particular table, a single record which follows a unique schema of its own.
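The same limitation can be seen in a relational engine. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in for any relational database; the table person and its fields are hypothetical. A record whose fields match the table's schema is stored, while a record carrying a field of its own is refused unless the table is first rewritten to a new schema.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    """An in-memory table whose schema is fixed when it is created."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
    return conn


def insert_record(conn: sqlite3.Connection, record: dict[str, str]) -> bool:
    """Try to store a record that carries its own set of fields.

    Returns True if the row is stored, False if the engine rejects it
    because a field has no matching column in the table's schema.
    Column names come straight from the record's keys, so this sketch
    assumes trusted input; never build SQL this way from untrusted data.
    """
    cols = ", ".join(record)
    marks = ", ".join("?" for _ in record)
    try:
        conn.execute(f"INSERT INTO person ({cols}) VALUES ({marks})",
                     tuple(record.values()))
        return True
    except sqlite3.OperationalError:
        return False


if __name__ == "__main__":
    conn = make_table()
    print(insert_record(conn, {"name": "Ann", "city": "Oslo"}))  # True
    # A record with a field of its own is refused unless the table is
    # first rewritten (e.g. with ALTER TABLE) to a new schema.
    print(insert_record(conn, {"name": "Bob", "city": "Rome",
                               "email": "bob@example.com"}))  # False
```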
In the terms of both filesystem and relational table, an XML document is effectively a BLOB, in that its specifically XML structure is outside the ability of either to discern or to make any use of. Just as, for example, with audio or video content more commonly recognized as BLOBs, the filesystem or relational database engine is obliged to invoke a particular, content-specific processor in order to understand, and then to implement, the structure conveyed by markup in every XML document. Yet this need for pre-defined, content-specific handlers undermines the benefits of XML as a general solution. Indeed, it is not really XML at all if the markup possibilities are circumscribed by the need to conform to what a pre-defined handler can implement.
XML, by definition, is freely extensible. This fundamental characteristic trumps any hoped-for convenience in processing to be achieved by defining 'standard' tagsets, industry-wide 'domain' procedures, or normative namespace references. That this essential capability of XML is irreconcilably mismatched to conventional filesystems and relational databases means that, if we are building true XML tools, we are obliged to create new equivalents of the filesystem and the database which do conform to the extensible nature of XML. 'Internally', extensibility means that the structural definition of existing XML documents may be altered at any time by indicating, in a document instance, new subelements of the elements previously defined or, occasionally, by consolidating (and eliminating) previously defined elements in favor of more general ones. This is not a simple rearrangement of the elements of an XML document, but a fundamental redefinition of its structure. 'Externally', the extensibility of XML means that documents arriving from any number of (not necessarily well-known) sources may claim recognition by our XML database engine and expect, for example, to be accepted as input data, solely because the document root element has a tag which matches one defined in our system. Of course, below that apparently familiar root element may lie subelements whose type we have not seen before, or which are structured in a different hierarchy than we expect, or whose tag names are unfamiliar variants of what we use 'internally'.
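The 'external' case can be sketched as an admission check that looks only at the root tag and then reports what lies beneath it that we do not recognize. This is a minimal illustration, not a design for a real engine: the KNOWN vocabulary and the accept function are hypothetical, and the document is parsed with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical 'internal' vocabulary: root tags we recognize, each with
# the subelement tags we already know how to handle.
KNOWN: dict[str, set[str]] = {
    "order": {"customer", "item", "quantity"},
}


def accept(xml_text: str) -> tuple[bool, set[str]]:
    """Admit a document solely because its root tag is one we define.

    Returns (accepted, unfamiliar_tags): the subelement tags below that
    root which our vocabulary has not seen before.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in KNOWN:
        return False, set()
    below = {el.tag for el in root.iter() if el is not root}
    return True, below - KNOWN[root.tag]


if __name__ == "__main__":
    doc = "<order><client>Acme</client><item><sku>42</sku></item></order>"
    print(accept(doc))  # accepted, but 'client' and 'sku' are unfamiliar
    print(accept("<invoice><total>9</total></invoice>"))  # (False, set())
```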
A true XML database engine must inherently and efficiently handle the demands of both this internal and external extensibility. Effectively this means that the data schema must (potentially) be rewritten with every new 'record' accepted, or altered, in the database. That is, if we posit that those 'records' are XML documents then, as XML documents, they may be marked up at any time to a finer (or coarser) elemental granularity, and a true XML database engine must respond by reading, writing, querying, and generally processing them in sync with the markup. In the case of 'external' items (effectively, data entry submitted to the XML database), the database engine must identify the schema with the data source. That is, it must understand that the markup of items originating from one source may be aliases of the markup in documents from another source and, again, may present a finer or coarser elemental granularity than analogous documents from a different source.
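Identifying the schema with the data source might look, in the simplest case, like a per-source alias table applied before anything else is done with the document. The sketch below assumes such tables can be written down in advance; the source names, the ALIASES mapping and the normalize helper are all hypothetical. After normalization, documents from two sources share tag names while still differing in granularity.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source alias tables, mapping each source's tag names
# onto the names used internally. Tags not listed are kept unchanged.
ALIASES: dict[str, dict[str, str]] = {
    "supplier-a": {"client": "customer", "qty": "quantity"},
    "supplier-b": {"buyer": "customer"},
}


def normalize(xml_text: str, source: str) -> ET.Element:
    """Parse a document and rename its tags using its source's aliases."""
    root = ET.fromstring(xml_text)
    table = ALIASES.get(source, {})
    for el in list(root.iter()):
        el.tag = table.get(el.tag, el.tag)
    return root


def tags(root: ET.Element) -> list[str]:
    """Element tags in document order."""
    return [el.tag for el in root.iter()]


if __name__ == "__main__":
    a = normalize("<order><client>Acme</client><qty>3</qty></order>", "supplier-a")
    b = normalize("<order><buyer><name>Acme</name></buyer></order>", "supplier-b")
    print(tags(a))  # ['order', 'customer', 'quantity']
    print(tags(b))  # ['order', 'customer', 'name']  (finer granularity)
```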
What is missing in this, of course, is the traditional role of the DTD for validation. It is omitted because XML 1.0 defines two very different markup and processing disciplines, distinguished by whether there is a DTD, and in order to build XML tools it is necessary to choose which of these definitions we are following. XML is routinely introduced as both of its very different selves. Newcomers are usually first lured in with the promise of unlimited markup: define your own tags which exactly suit your unique situation. Only after they have taken that bait are they told about the limitations imposed by the DTD. Yet the fact is that XML 1.0 defines one XML in which the DTD is omitted, and a simple and logical projection of that definition leads to an XML where markup is freely extensible and the data schema is whatever the sum of the markup in the system at any moment implies.
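One way to picture a schema that is "the sum of the markup" is to treat it as the set of root-to-element paths found in every accepted document, with no DTD consulted and nothing rejected. This is a minimal sketch under that assumption; the paths function and ImpliedSchema class are hypothetical. It only grows, so coarser re-markup or removed documents would require recomputing the set from the documents currently held, which the sketch leaves out.

```python
import xml.etree.ElementTree as ET


def paths(elem: ET.Element, prefix: str = "") -> set[str]:
    """Every root-to-element path in a document, e.g. 'order/customer/name'."""
    here = f"{prefix}/{elem.tag}" if prefix else elem.tag
    found = {here}
    for child in elem:
        found |= paths(child, here)
    return found


class ImpliedSchema:
    """A schema that is simply the union of the markup accepted so far.

    No DTD is consulted and no document is rejected; each document may
    add paths, for instance when it marks up an element more finely.
    """

    def __init__(self) -> None:
        self.known: set[str] = set()

    def accept(self, xml_text: str) -> set[str]:
        """Add a document's paths to the schema; return those that were new."""
        new = paths(ET.fromstring(xml_text)) - self.known
        self.known |= new
        return new


if __name__ == "__main__":
    schema = ImpliedSchema()
    print(schema.accept("<order><customer>Acme</customer></order>"))
    print(schema.accept(
        "<order><customer><name>Acme</name><city>Oslo</city></customer></order>"))
    # ['order', 'order/customer', 'order/customer/city', 'order/customer/name']
    print(sorted(schema.known))
```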
Respectfully,
Walter Perry
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1