Topic Maps on SQL

W. Eliot Kimber eliot at dns.isogen.com
Wed Nov 25 00:45:42 GMT 1998


At 03:00 PM 11/24/98 -0800, Tim Bray wrote:
>At 10:40 PM 11/21/98 -0600, len bullard wrote:
>>Steve brings up the point that I do wish would be looked at 
>>seriously by other language communities:  the potential of 
>>using property set/grove concepts to create information 
>>standards that are independent of lexical/syntax representation 
>>and implementation.  
>
>You know, this goes straight to the core of a deep issue.  Where
>I have often felt out of sync with the grove/property-set evangelists
>is that I perceive syntax as fundamental and any particular
>data model as ephemeral.  I guess I must admit that I don't believe in 
>"information standards that are independent of lexical/syntax
>representation".

I have said at various times two statements that appear to be contradictory:

1. Syntax standards like SGML and XML are critical to our lives and
   livelihoods
2. Syntax is unimportant

But these statements are contradictory *only if* you consider them in the
same context.  They are not intended to be so considered.

Syntax standards are critical because they provide a reliable mechanism for
data storage and interchange that is both easy to verify and easy to use,
for all the reasons Tim says.  So Tim's belief and confidence in syntax is
both well founded and reasonable.  Without 8879 and XML, we can't ever be
sure if what we have is both correct and complete.  With them, we can at
least tell if what we have is syntactically correct, even if we can't
verify semantic correctness.  Even if we're operating in the
abstractosphere, we can always drop down to syntacta firma to check our work.

But...

Once you enter the realm of semantic processing, that is, applying business
logic to the manipulation of information and knowledge, you are working in
the abstract domain where syntax becomes irrelevant. This is the domain in
which processing standards like HyTime, DSSSL, XLink, and XSL operate.
Except for SED and AWK, computers do not operate on syntax, they operate on
abstractions derived from syntax.  *As soon as* you parse a document into
elements, attributes, and character data, it is an abstraction. 
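The point can be seen in miniature with any XML parser. A sketch in modern Python (obviously not what was available in 1998, but the idea is the same): two syntactically different byte streams parse to the same abstraction, and everything downstream sees only that abstraction.

```python
import xml.etree.ElementTree as ET

# The same logical document arriving with different quoting and
# whitespace -- the syntax varies, the abstraction does not.
doc_a = '<memo from="eliot"><p>Syntax is unimportant</p></memo>'
doc_b = "<memo  from='eliot' ><p>Syntax is unimportant</p></memo>"

tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

# Downstream processing works on elements, attributes, and character
# data -- the abstraction -- never on the raw character stream.
assert tree_a.get("from") == tree_b.get("from") == "eliot"
assert tree_a.find("p").text == tree_b.find("p").text
```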

To operate on an abstraction reliably over time, you must either own its
definition (because you write the code) or the abstraction must itself be
standardized. When you write a one-off program that takes syntax as input
and does something more than string substitution, you are creating a
private abstraction in your program and then operating on it.  Nothing
wrong with that, but it's important to remember that *there is an
abstraction there*.

If you want to have interchangeable and predictable processing on things
like XML documents, there must be an accepted standard for the abstraction
that the processing operates on. Without that, there can be no hope of
predictable behavior because the data is too complex and there are too many
arbitrary choices that can be made about its abstract representation.  In
addition, different types of processing will need different views of the
same fundamental abstraction, so you'll need ways to configure the
abstraction formally.
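A toy way to picture "different views of the same fundamental abstraction" (this is only an illustration, not the grove standard's actual grove-plan mechanism): one parsed tree, with each kind of processing looking at a different projection of it.

```python
import xml.etree.ElementTree as ET

# One underlying abstraction...
tree = ET.fromstring("<doc><p>one</p><p>two</p></doc>")

# ...and two views of it, configured for different processing needs.
# View 1: element structure only -- what a structure checker might want.
element_view = [el.tag for el in tree.iter()]

# View 2: character data only -- what a full-text indexer might want.
text_view = "".join(tree.itertext())

assert element_view == ["doc", "p", "p"]
assert text_view == "onetwo"
```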

Both the STEP standard and the grove standard attempt to provide a
standardized framework for defining standardized abstractions.  They were
both independently designed to meet the same requirements: different
software operating predictably on the same data sets.

While I have no great faith in standardized APIs, I do have faith in
standardized data models.  But given a standardized data model, it's much
more likely that a standardized, or at least conventionalized, API for that
data model will appear. And even if it doesn't, it's much easier to adapt
code to different API views of the same data model than it is to different
data models. Thus, even though, for example, SP, GroveMinder, and the DOM
all provide different APIs to access XML groves, it's easy for me to map
all of them into PHyLIS' grove API because the underlying data model is the
same.  My life would be even easier if they all used the same API, but not
*that much* easier, because the cost of managing the API mappings relative
to the total cost of the system is small.
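In sketch form, the API mapping I mean looks like this (the class and method names below are invented for illustration; they are not the actual interfaces of SP, GroveMinder, or the DOM): two APIs expose the same tree-shaped model under different names, and thin adapters reduce both to one common view that the real processing code is written against.

```python
# Two toy APIs exposing the same tree-shaped data model under
# different method names -- stand-ins for, say, a DOM-style API
# and a grove-style API.
class DomishNode:
    def __init__(self, name, kids=()):
        self.nodeName, self.childNodes = name, list(kids)

class GrovishNode:
    def __init__(self, name, kids=()):
        self._name, self._kids = name, list(kids)
    def node_name(self):
        return self._name
    def subnodes(self):
        return self._kids

# Thin adapters map each API onto one common view.  Because the
# underlying data model is identical, each mapping is a few lines.
def name_of(node):
    return node.nodeName if hasattr(node, "nodeName") else node.node_name()

def children_of(node):
    return node.childNodes if hasattr(node, "childNodes") else node.subnodes()

def tags(node):
    """Generic traversal written once against the common view."""
    yield name_of(node)
    for kid in children_of(node):
        yield from tags(kid)

dom_tree = DomishNode("doc", [DomishNode("p"), DomishNode("p")])
grove_tree = GrovishNode("doc", [GrovishNode("p"), GrovishNode("p")])
assert list(tags(dom_tree)) == list(tags(grove_tree)) == ["doc", "p", "p"]
```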

Thus, I think that there is lots of value in standardized data models, even in
the absence of standardized APIs.  I think the DOM is useful and good, but
it's not sufficient because it represents a particular set of optimization
and API design choices. By definition, it can never be complete or
completely satisfactory for all tasks for which we might need an API to XML
documents.  So we should *never* expect to have 100% standardization of
APIs even when we do have standard data models.

And any grove or subgrove can always be viewed as a string simply by
providing the appropriate iterator over the nodes, so the syntax is never
far away.
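"Viewed as a string by providing the appropriate iterator over the nodes" is, in sketch form, just a depth-first walk that emits markup as it visits each node (a toy illustration over a simple element/text model, not the grove standard's node classes):

```python
import xml.etree.ElementTree as ET

def as_string(el):
    """Re-derive a syntactic view of the abstraction by walking its nodes."""
    attrs = "".join(f' {k}="{v}"' for k, v in el.attrib.items())
    inner = (el.text or "") + "".join(
        as_string(child) + (child.tail or "") for child in el
    )
    return f"<{el.tag}{attrs}>{inner}</{el.tag}>"

src = '<memo from="eliot"><p>Syntax is never far away</p></memo>'
tree = ET.fromstring(src)
assert as_string(tree) == src
```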

And I should point out that part of the grove standard is a canonical
string representation of a grove (the "canonical grove representation (cgr)
document") for the express purpose of allowing simple string-based comparison
of groves (while you could, in theory, use it for processing, the result is
so verbose as to be silly for that purpose).

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



