Data manipulation languages for XML (was Query Languages ...)
W. Eliot Kimber
eliot at isogen.com
Tue Nov 18 15:22:16 GMT 1997
At 08:14 AM 11/18/97 +0000, Richard Light wrote:
>If so, we need mechanisms to specify changes to existing documents. I
>don't really buy the model that says that every change to an XML
>document produces a completely new document. You will certainly have a
>hard time selling that idea to an end-user who changes one word in a
>document, or to a database vendor who has to take back the complete
>document and work out for themselves what (if anything) has changed, in
>order to update the relevant nodes.
I think you're misunderstanding my use of the term "document" and my
reference to the *abstract* processing model of DSSSL and groves, as
opposed to how an implementation might work or how a user might perceive
the result.
By "document" I mean what SGML and XML mean by document: a character string
conforming to the rules of the standard. Two documents are identical only if
their character strings are identical. If I change one character *I
have a new document*. However, when "document" is used to mean "an
abstraction of a container for information", which is the usual everyday
meaning of the term, then the edited document is not a new document unless
the user considers it to be one.
Note the difference: I'm talking about the mechanics of data manipulation
as related to the formal definition of SGML and XML, whereas users are
thinking about the abstractions of information creation. These are two
different domains.
For the purpose of thinking about standards for defining document
processing, it is a very useful simplification to think of every change as
creating a new *grove* (which, if used to generate an SGML or XML character
string, would result in a new SGML or XML document). Obviously, in an
implementation, you would probably not literally create an entirely new
grove, but would simply modify the one you have and, presumably, remember
the actions that transformed grove[0] to grove[1]. But that implementation
approach doesn't change the truth of the abstract model, which is that
grove[1] *is a different grove* from grove[0]. That's all I'm getting at.
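
The distinction between the abstract model and an implementation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Node`, `set_text`, and `edit_log` are hypothetical names, and the node structure is far simpler than the SGML property set. In the sketch, every edit returns a new grove value, while unchanged subtrees are shared rather than copied and the edit itself is remembered as an action.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical, heavily simplified grove node (not the real property set)."""
    name: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def set_text(node: Node, path: Tuple[int, ...], new_text: str) -> Node:
    """Return a new grove in which the node at `path` carries `new_text`.

    The input grove is never modified; subtrees off the path are reused.
    """
    if not path:
        return replace(node, text=new_text)
    i = path[0]
    new_child = set_text(node.children[i], path[1:], new_text)
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return replace(node, children=children)


grove0 = Node("doc", children=(Node("para", "Hello world"), Node("para", "Second")))
grove1 = set_text(grove0, (0,), "Hello there")

# An implementation could keep only this record of how grove0 became grove1.
edit_log = [("set_text", (0,), "Hello there")]
```

Here `grove1` is a different grove from `grove0`, yet the untouched second paragraph is the very same object in both, which is one way an implementation can honor the abstract model without literally rebuilding everything.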
>Also, in the real world you need access control (c.f. GRANT in SQL).
>The very nature of XML documents means that this control needs to be at
>the node rather than the document level, if only to deal with entities.
Not a problem. Remember that we're talking about *editing* here, which
*can only happen* on groves, which consist of nodes, which can therefore be
individually locked if your editor provides that function. There is
nothing in the definition of groves or the DSSSL expression language that
precludes node-level access control within an editor. That's an editing
issue, which is outside the scope of SGML, XML Lang, or DSSSL (as they are
only data representation languages, not editor specifications).
>Also, you need to know which parts of the document you are allowed to
>change as you start editing - it is not good enough to be told some time
>afterwards that certain changes should not have been made!
Again, not a problem as long as your editor provides some system for
associating access policies with nodes, either directly (by addressing
individual nodes) or by algorithm (e.g., elements in context). Again, this
is an editor design issue, not a data representation issue.
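
As a rough sketch of what such an editor-side facility might look like, the Python below associates access policies with nodes in both ways mentioned above: directly, by naming individual nodes, and by algorithm, here an "element in context" rule. All names (`Node`, `AccessPolicy`, `element_in_context`, `set_text`) are hypothetical and belong to no standard or product. The point is only that the editor can answer "may I change this node?" before the user starts typing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes fit in a set
class Node:
    """Hypothetical simplified grove node."""
    name: str
    parent: Optional["Node"] = None
    text: str = ""


Rule = Callable[[Node], bool]


@dataclass
class AccessPolicy:
    locked_nodes: Set[Node] = field(default_factory=set)    # direct addressing
    locked_rules: List[Rule] = field(default_factory=list)  # by algorithm

    def can_edit(self, node: Node) -> bool:
        """True unless the node is locked directly or matched by a rule."""
        if node in self.locked_nodes:
            return False
        return not any(rule(node) for rule in self.locked_rules)


def element_in_context(name: str, parent_name: str) -> Rule:
    """Build a rule matching elements called `name` directly inside `parent_name`."""
    def rule(node: Node) -> bool:
        return (node.name == name
                and node.parent is not None
                and node.parent.name == parent_name)
    return rule


def set_text(policy: AccessPolicy, node: Node, text: str) -> None:
    """Editing action that checks the policy before touching the node."""
    if not policy.can_edit(node):
        raise PermissionError(f"node <{node.name}> is read-only")
    node.text = text


doc = Node("doc")
title = Node("title", doc, "Old title")
legal = Node("legal", doc)
notice = Node("para", legal, "Do not edit")
body = Node("para", doc, "Body")

policy = AccessPolicy()
policy.locked_nodes.add(title)                                   # one specific node
policy.locked_rules.append(element_in_context("para", "legal"))  # a class of nodes

# The editor can work out what is editable up front, before any change is made.
editable = [n.name for n in (title, notice, body) if policy.can_edit(n)]
set_text(policy, body, "New body")
```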
>I agree that you can perfectly well define changes to an XML document
>via its representation as a grove, but this grove needs to be linked
>back to the physical objects that gave rise to it. For example, if you
>edit a phrase that happens to be within an entity that is referenced
>more than once within the document you are editing, then perform an
>UPDATE, in principle _all_ references to that entity should be updated.
What's your point? A grove that includes information about the text
entities used to organize it has enough information to correlate references
to entities to their content. How could it be otherwise? A grove has to
enable *complete* representation of the original document. In a complete
grove (one that includes all the properties defined in the property set),
the original document can be recreated byte for byte because the original
document string is stored as part of the grove (using the so-called
"markup" properties).
I'm afraid I don't see how using groves as the fundamental abstraction for
editing is inconsistent with satisfaction of any of the requirements. All
that's needed on top of what DSSSL provides are functions that represent
the editing actions needed (as opposed to modeling editing as a transform,
which is probably not a useful approach). If SQL provides a useful model
for defining such functions, we should use it.
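
To illustrate what one such editing function might look like, here is a minimal Python sketch loosely modeled on SQL's UPDATE and applied to the entity example raised earlier. Everything in it is an assumption made for illustration: `Grove`, `Element`, `EntityRef`, `references`, and `update_entity` are hypothetical names, and the structure ignores most of what a real grove records. Because the entity's replacement text is held once and each reference points to it by name, a single update is visible through every reference.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class EntityRef:
    """A reference such as &prod; the replacement text lives in Grove.entities."""
    name: str


@dataclass
class Element:
    name: str
    content: List[Union[str, EntityRef, "Element"]] = field(default_factory=list)


@dataclass
class Grove:
    root: Element
    entities: Dict[str, str] = field(default_factory=dict)

    def text_of(self, element: Element) -> str:
        """Character data of an element with entity references resolved."""
        parts = []
        for item in element.content:
            if isinstance(item, EntityRef):
                parts.append(self.entities[item.name])
            elif isinstance(item, Element):
                parts.append(self.text_of(item))
            else:
                parts.append(item)
        return "".join(parts)


def references(element: Element, name: str) -> Iterator[EntityRef]:
    """Yield every reference to entity `name` at or below `element`."""
    for item in element.content:
        if isinstance(item, EntityRef) and item.name == name:
            yield item
        elif isinstance(item, Element):
            yield from references(item, name)


def update_entity(grove: Grove, name: str, new_text: str) -> int:
    """Hypothetical editing function in the spirit of SQL UPDATE.

    Changes the entity's replacement text once and returns the number of
    references through which the change is now visible.
    """
    if name not in grove.entities:
        raise KeyError(name)
    grove.entities[name] = new_text
    return sum(1 for _ in references(grove.root, name))


title = Element("title", [EntityRef("prod"), " Manual"])
para = Element("para", ["Install ", EntityRef("prod"), " first."])
grove = Grove(root=Element("doc", [title, para]), entities={"prod": "Widget"})

affected = update_entity(grove, "prod", "Gadget")
```

After the call, both the title and the paragraph resolve to the new text, because neither holds its own copy of the entity content.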
Cheers,
Eliot
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
</Address>
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)