Linking and Query question

W. Eliot Kimber eliot at isogen.com
Tue Aug 5 20:10:29 BST 1997


At 09:54 AM 8/5/97 -0700, Andrew Layman wrote:
>Can I create a link that, in effect, contains a query so that it
>references one document among a set? For example, if I know that several
>versions of a document exist, and I want to reference the latest
>version, but I'm willing to accept either of the two prior versions, can
>I express that?  If so, how?  Thanks.

If the reference to a document is via an entity reference, the query can be
part of the system ID for the document.  As system IDs in XML are always
URLs, if you have a way of expressing the query in a URL, you can do it
that way.  If not, then the short answer is "no" (unless there's some
aspect of URLs or TEI extended pointers I've overlooked, which is quite
possible).
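
As a rough illustration of putting the query in the URL itself, here is a
minimal Python sketch.  The host name, the "version" parameter, and the
idea that the server understands it are all assumptions made for the
example; nothing in XML or in the URL syntax defines them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def versioned_url(base_url, constraint):
    """Append a hypothetical "version" query parameter to a document URL."""
    return base_url + "?" + urlencode({"version": constraint})

# Hypothetical document URL; the result could serve as an entity's system ID.
url = versioned_url("http://example.com/docs/mydoc.xml", ">=1.0")
print(url)

# What a server-side script would see after decoding the query string.
print(parse_qs(urlsplit(url).query))
```

The constraint characters are percent-encoded in the URL, so whatever
serves the document has to decode and interpret them.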

In a general SGML system, there are three basic approaches:

1. Define your own application-specific addressing syntax and semantics and
use it, hoping tools will support it or providing your own support (because
the scope of use is totally within your control).

2. Use Formal System Identifiers and make the query part of an entity's
system ID.

3. Use query addressing and make the query part of a direct or indirect
address  (that does not use a declared entity).

The only difference between these three approaches is that two and three
are done within the framework of standardized definitional mechanisms
defined by ISO/IEC 10744:1997 while one is not.  In all three cases you
still have to implement support for the query and provide the necessary
integration with the tools you're using (browser to repository, editor to
repository, etc.).

The Formal System Identifier Definition Requirements (FSIDR) facility of
ISO/IEC 10744:1997 (Annex A.6, reviewable at
http://www.drmacro.com/hythtml/clause-A.6.html) provides a syntax for
associating repository-specific attributes with system IDs.

For example, say you have a repository with a "version" property for
storage objects.  You can refer to this property by declaring the
repository as a "storage manager" and providing an attribute (or
attributes) for specifying the version you want, something like this:

<!-- Use "FSISM" PI to identify the names of storage manager notations: -->
<?IS10744 FSISM MyDocManager>

<!-- Declare notation for storage manager.  Serves to provide local name
     for repository so generic system can call repository's API or 
     human observer can tell what the repository is. -->
<!NOTATION MyDocManager PUBLIC "-//ME//NOTATION FSISM My Document
Manager//EN" >

<!-- Declare attributes for passing parameters to the repository: -->
<!ATTLIST #NOTATION MyDocManager
   version -- The required version.  Syntax is "([<>][=]?)?[0-9]+(\.[0-9]+)?"
              Prefixes for version number:
              <    Anything less than specified version
              >    Anything greater than specified version
              <=   Anything less than or equal to specified version
              >=   Anything greater than or equal to specified version
              If no prefix specified, only specified version is used.
           --
     CDATA #IMPLIED  -- Default: latest version --
>

Obviously, these declarations can be provided by the storage manager
provider and used by reference from documents--you wouldn't expect authors
to type these things themselves (or even necessarily be aware of their
presence or use).
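
The version syntax documented in the ATTLIST comment above could be
interpreted by a repository roughly as follows.  This is a Python sketch
under stated assumptions, not part of any standard: the function names are
hypothetical, versions are compared as (major, minor) integer pairs, and
when several versions match, the newest one is returned.

```python
import re

# Same syntax as the ATTLIST comment: optional <, >, <=, or >= prefix,
# then a version number with an optional minor part.
VERSION_SPEC = re.compile(r"([<>]=?)?([0-9]+(?:\.[0-9]+)?)")

def version_key(text):
    """Map "1.2" to (1, 2) and "3" to (3, 0) for numeric comparison."""
    major, _, minor = text.partition(".")
    return (int(major), int(minor or 0))

def select_version(spec, available):
    """Return the newest version in `available` that satisfies `spec`.

    `spec` is None when the attribute is omitted (default: latest).
    Returns None when nothing matches.  Picking the newest match is an
    assumption of this sketch.
    """
    ordered = sorted(available, key=version_key)
    if spec is None:
        return ordered[-1] if ordered else None
    match = VERSION_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError("bad version spec: %r" % spec)
    prefix, target = match.group(1), version_key(match.group(2))
    tests = {
        None: lambda v: v == target,   # no prefix: exact version only
        "<": lambda v: v < target,
        ">": lambda v: v > target,
        "<=": lambda v: v <= target,
        ">=": lambda v: v >= target,
    }
    matches = [v for v in ordered if tests[prefix](version_key(v))]
    return matches[-1] if matches else None
```

With versions 1.0 through 1.3 in the repository, a request for "the latest
version, or either of the two before it" could be written as ">=1.1".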

You then invoke the storage manager by treating the notation name as an
element type name within the system ID:

<!ENTITY A-Doc SYSTEM "<MyDocManager version='>1.2'>mydoc.xml" CDATA SGML >

As the semantics of the tags within a system ID are well defined by the
FSIDR, it is probably reasonable for XML systems to treat the tag name as a
repository notation name even when the formal declarations are not present.
 If the storage manager name is well understood (e.g., "URL"), there's no
problem.  It's probably also reasonable to assume that storage manager
names are generally unique and therefore processing can be associated with
the names directly (rather than by requiring a notation declaration with a
public ID).  This is analogous to being able to map entities by entity name
within an SGML Open catalog.

A processor would provide a way to associate the storage manager notation
MyDocManager with that storage manager's API (i.e., the integrator of the
storage manager would register a DLL or DLL entry point with the notation's
public identifier).  The processor would then pass the value of the version
attribute and the data following the MyDocManager start tag to the API.
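
The dispatch described above might look like the following Python sketch.
It is deliberately simplified: it handles a single storage manager tag at
the start of the system ID, treats names as case-sensitive, and keys the
registry on the notation name rather than the public identifier.  The
registry, the function names, and the sample handler are all hypothetical.

```python
import re

# A start tag whose quoted attribute values may themselves contain ">".
START_TAG = re.compile(
    r"""<([A-Za-z][\w.-]*)((?:\s+[\w.-]+\s*=\s*(?:'[^']*'|"[^"]*"))*)\s*>""")
ATTRIBUTE = re.compile(r"""([\w.-]+)\s*=\s*(?:'([^']*)'|"([^"]*)")""")

def parse_fsi(system_id):
    """Split a system ID into (manager name, attributes, remaining data)."""
    tag = START_TAG.match(system_id)
    if tag is None:
        raise ValueError("no storage manager tag in %r" % system_id)
    attributes = {}
    for found in ATTRIBUTE.finditer(tag.group(2)):
        single, double = found.group(2), found.group(3)
        attributes[found.group(1)] = single if single is not None else double
    return tag.group(1), attributes, system_id[tag.end():]

# Hypothetical registry mapping notation names to integration callbacks.
STORAGE_MANAGERS = {}

def register_storage_manager(name, handler):
    STORAGE_MANAGERS[name] = handler

def resolve(system_id):
    """Pass the attributes and trailing data to the registered manager."""
    name, attributes, object_id = parse_fsi(system_id)
    if name not in STORAGE_MANAGERS:
        raise LookupError("no storage manager registered for %r" % name)
    return STORAGE_MANAGERS[name](attributes, object_id)

# Stand-in for a real repository API: it only reports what it was asked for.
def my_doc_manager(attributes, object_id):
    return "fetch %s, version %s" % (object_id, attributes.get("version", "latest"))

register_storage_manager("MyDocManager", my_doc_manager)
print(resolve("<MyDocManager version='>1.2'>mydoc.xml"))
```

A real integration would call the repository's API inside the handler and
return the storage object's content instead of a descriptive string.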

If you're not addressing the document as an entity but using some other
query, I don't think XML Link provides a way to do this (because it doesn't
generalize the notion of addressing by query).

The HyTime architecture does generalize addressing by query such that you
can declare a query notation with whatever semantics you want and then use
that query.  The only requirement is that the result of the query be a list
of nodes in groves.  In DOM terms this would mean you get back objects
conforming to the DOM model, rather than the unparsed data of the document
addressed. (All addressing is in terms of the results of parsing, not the
unparsed source.)

For example, to create and use such a query, you could do something like this:

<!-- Declare a notation for my query.  The syntax and semantics of 
     this query are presumably documented somewhere.  The public ID
     of the notation should get an observer to this documentation. -->
<!NOTATION MyDocQuery  PUBLIC "-//ME//NOTATION My Document Query//EN" >

<!-- Now declare an element type that uses this query notation for
    addressing: -->

<!ELEMENT DocLink  -- A hyperlink to another document using a query --
  - - (#PCDATA) -- Content is title of document linked to --
>
<!ATTLIST DocLink
    document CDATA #REQUIRED -- Contains query of document to link to --
    loctype  CDATA #FIXED "document QUERYLOC MyDocQuery" 
      -- Associate referential 'document' attribute with query
         notation 'MyDocQuery' (uses "reference location address" facility)
      --
    HyTime   NAME  #FIXED "hylink" -- This is a HyTime hyperlink --
    anchrole CDATA #FIXED "refmark document"
      -- Roles of the anchors of this link.  DocLink element is reference
         mark. --
    anchcstr CDATA #FIXED "self required" 
      -- Indicate that the first anchor role (refmark) is a "self anchor",
         that is played by the link element itself. --
>

...
<p>See <doclink document="mydoc.xml[version 1.2+]">My document</doclink>...

A HyTime-aware processor interprets the above as follows:

1. Sees that DocLink is a hyperlink.  Looks for the required (by HyTime)
"anchrole" attribute, from which it will determine the names of the
attributes used to address the anchors (they are the same as the anchor
role names).

2. Sees that "refmark" is a self anchor, so no addressing attribute is
needed for it.  Sees that second role is "document".  Looks for attribute
named "document".

3. Finds attribute named "document".  Looks for attribute named "loctype"
(location type) to see if a location type has been associated with this
attribute (without location type, the HyTime engine has no way of knowing
what form of addressing is being used [unless the attribute is declared as
IDREF(s) or ENTITY/ENTITIES]).

4. Finds a loctype attribute and sees that the document attribute is a
query location that uses the notation named "MyDocQuery".

5. Looks to see if a notation named MyDocQuery has been declared.  It has.

6. Passes the value of the document attribute to the MyDocQuery API (again,
registered using whatever integration API the browser provides).  The
query processor (my document manager in this case) interprets the query and
provides a response.

7. Waits until it gets a response, which had better be a list of objects in
an object model it understands (e.g., grove nodes, DOM objects, etc.).

8. Assuming it gets a response, enables traversal to the returned objects.
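
The eight steps above can be sketched in Python as follows.  This is an
illustration, not a HyTime engine: the element is reduced to a dictionary
of attribute values with lower-cased names, only QUERYLOC addressing is
handled, loctype parsing is simplified, and the function names and the
sample query handler are hypothetical.

```python
def parse_loctype(value):
    """Parse "attname LOCTYPE [notation]" groups into a dictionary."""
    tokens = value.split()
    result, i = {}, 0
    while i + 1 < len(tokens):
        attribute, kind = tokens[i], tokens[i + 1].upper()
        i += 2
        notation = None
        if kind == "QUERYLOC" and i < len(tokens):
            notation = tokens[i]   # query locations name their notation
            i += 1
        result[attribute] = (kind, notation)
    return result

def resolve_link(attrs, declared_notations, query_handlers):
    """Return a mapping from anchor role to "self" or the query result."""
    if attrs.get("hytime") != "hylink":               # step 1
        return None
    roles = attrs["anchrole"].split()
    constraints = attrs.get("anchcstr", "").split()
    loctypes = parse_loctype(attrs.get("loctype", ""))
    anchors = {}
    for index, role in enumerate(roles):
        if index < len(constraints) and constraints[index] == "self":
            anchors[role] = "self"                    # step 2
            continue
        address = attrs[role]                         # step 3
        if role not in loctypes:
            raise ValueError("no location type for %r" % role)
        kind, notation = loctypes[role]               # step 4
        if kind != "QUERYLOC":
            raise NotImplementedError(kind)
        if notation not in declared_notations:        # step 5
            raise LookupError("undeclared notation %r" % notation)
        anchors[role] = query_handlers[notation](address)   # steps 6 and 7
    return anchors                                    # step 8

# Stand-in for the document manager's query API.
def my_doc_query(query):
    return ["node for " + query]

doclink = {
    "hytime": "hylink",
    "anchrole": "refmark document",
    "anchcstr": "self required",
    "loctype": "document QUERYLOC MyDocQuery",
    "document": "mydoc.xml[version 1.2+]",
}
print(resolve_link(doclink, {"MyDocQuery"}, {"MyDocQuery": my_doc_query}))
```

A real handler would return grove nodes or DOM objects rather than
strings, and the processor would then offer traversal to them.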

XML Link removes the need for the above general processing by providing a
fixed set of query notations that XML Link recognizes (URLs and TEI
extended pointers).  However, this limits your ability to do things these
two query notations don't provide for.  Note also that the XML Link
specification can be defined in terms of the HyTime generalizations such
that any general-purpose HyTime engine can process XML Link documents (and
you would expect HyTime engines to have built-in support for XML Link so
that there would be no additional integration required to process XML Link
documents).

The HyTime mechanism has no "magic"--it just provides a framework within
which the integration you'd have to do in any case can be done.  It simply
provides a way to name things (queries, storage managers) with
universally-unique names (public IDs) and associate these universal names
with local names (notation names).  This framework standardizes the formal
declaration of what you're doing and (hopefully) makes the integration
mechanism consistent across tools, which should make integration easier.
It doesn't remove the need for tools to be plugged together by humans
(either directly or through the definition of API standards like the DOM or
CORBA or ODBC).

Cheers,

Eliot


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



