Resolving links

W. Eliot Kimber eliot at isogen.com
Sat Sep 6 01:00:55 BST 1997


At 11:35 AM 9/5/97 -0700, Andrew Cogan wrote:
>Introductory apology: I'm a newcomer to XML, so forgive me if this topic
>has already been covered.
>
>Does/will XML include a way to resolve links by using a mapping table
>external to the originating document, or alternatively by calling a
>process?
>
>In this scenario, there would be either a "library manager" process, or a
>registry file containing a list of symbolic document names along with
>their physical locations. This would enable a link in document "A" to
>refer to document "B" without concern for whether document B's location
>is on a CD-ROM, a hard disk, or the Web. It would also allow document
>B's location to change over time without invalidating the link in
>document A.

This problem, that of changes in the resource pointed to requiring changes
in the documents that point to it, is one of the fundamental weaknesses of
URLs as a form of address.  You cannot have "industrial strength"
addressing without some form of indirection that lets you isolate
references in A from changes in B.  SGML provides one fundamental form of
indirect address, the entity reference, which when used with public IDs
(rather than system IDs) protects the entity declarations from changes in
the system identifiers of storage objects.  However, entities alone cannot
protect you from changes inside a storage object, so you also need some way
to make references to objects inside storage objects indirect.

The current XML Link spec does not allow entity references as a form of
resource address.  It also does not provide any other form of indirection.
However, you're not limited to using only XML Link with XML documents--you
can use anything you want, including normal SGML mechanisms and other
public addressing architectures, such as the TEI and the HyTime architecture.

Here's how you do entity-based indirection:

<?XML Version=1.0?>
<!DOCTYPE MyDoc [
  <!NOTATION XML PUBLIC "-//W3C//NOTATION eXtensible Markup Language//EN" >
  <!ENTITY YourDoc PUBLIC "-//You//DOCUMENT Your Document//EN" 
           CDATA XML >
  <!ELEMENT Link EMPTY >
  <!ATTLIST Link
     resource   ENTITY #REQUIRED
  >
]>
<MyDoc>
 <Link resource="YourDoc"/><!-- NOTE: this isn't legal XML syntax today -->
</MyDoc>

Somewhere else, you'd have a mapping for the public ID to the system ID:

-- SGML Open catalog --
PUBLIC "-//You//DOCUMENT Your Document//EN" 
       "/home/you/docs/mydoc.xml"
-- End of catalog --
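To make the catalog lookup concrete, here is a minimal Python sketch that reads the PUBLIC entries of a catalog like the one above into a public-ID-to-system-ID table. It assumes the catalog contains only comments and PUBLIC entries (other catalog keywords are not handled), and it collapses runs of whitespace inside public IDs so that an ID wrapped across lines still matches. The function names are illustrative, not taken from any catalog library.

```python
from __future__ import annotations

import re

# A catalog is a sequence of tokens: "-- comments --", quoted literals,
# or bare keywords. The comment alternative is tried first.
_TOKEN = re.compile(r'--.*?--|"[^"]*"|\'[^\']*\'|\S+', re.DOTALL)


def normalize_public_id(public_id: str) -> str:
    """Collapse whitespace runs so that line-wrapped public IDs still match."""
    return " ".join(public_id.split())


def parse_public_entries(catalog_text: str) -> dict[str, str]:
    """Build a {public ID: system ID} map from a catalog's PUBLIC entries.

    Assumption: only PUBLIC entries (and comments) are present; other
    catalog keywords are not handled by this sketch.
    """
    tokens = [t for t in _TOKEN.findall(catalog_text) if not t.startswith("--")]
    entries: dict[str, str] = {}
    i = 0
    while i + 2 < len(tokens):
        if tokens[i].upper() == "PUBLIC":
            public_id = normalize_public_id(tokens[i + 1].strip("\"'"))
            entries[public_id] = tokens[i + 2].strip("\"'")
            i += 3
        else:
            i += 1
    return entries


CATALOG = """
-- SGML Open catalog --
PUBLIC "-//You//DOCUMENT Your Document//EN"
       "/home/you/docs/mydoc.xml"
-- End of catalog --
"""

catalog = parse_public_entries(CATALOG)
print(catalog["-//You//DOCUMENT Your Document//EN"])  # /home/you/docs/mydoc.xml
```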

You could imagine a service analogous to DNS that would resolve public IDs
to storage IDs (or rather, would resolve owner IDs to public ID servers;
that is, "-//You" would be associated with your public ID server, which
would then take the rest of the public ID and resolve it to a storage object).
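The two-stage lookup can be sketched in Python as follows. Loud assumptions: the owner registry and the per-owner server are hypothetical in-memory stand-ins for the DNS-like service, and the owner identifier is taken to be everything before the "//" that follows a leading "-//" or "+//" prefix (or before the first "//" otherwise).

```python
from __future__ import annotations

from typing import Callable

# A "public ID server" is modelled as a function from the rest of a
# public ID to a storage object ID. This is a hypothetical interface.
PublicIdServer = Callable[[str], str]


def split_public_id(public_id: str) -> tuple[str, str]:
    """Split a formal public ID into (owner identifier, rest)."""
    # Unregistered ("-//") and registered ("+//") owner identifiers contain
    # their own leading "//", so the separator is the next "//" after them.
    start = 3 if public_id[:3] in ("-//", "+//") else 0
    sep = public_id.find("//", start)
    if sep < 0:
        raise ValueError(f"not a formal public ID: {public_id!r}")
    return public_id[:sep], public_id[sep + 2:]


def resolve_public_id(public_id: str, owner_registry: dict[str, PublicIdServer]) -> str:
    owner, rest = split_public_id(public_id)
    server = owner_registry[owner]  # stage 1: owner ID -> that owner's server
    return server(rest)             # stage 2: the server resolves the rest


# Hypothetical registry: "-//You" is served by a simple in-memory table.
your_server = {"DOCUMENT Your Document//EN": "/home/you/docs/mydoc.xml"}
registry = {"-//You": lambda rest: your_server[rest]}

print(resolve_public_id("-//You//DOCUMENT Your Document//EN", registry))
```

Splitting the lookup this way means each owner only has to keep its own table current; the top-level registry changes only when an owner's server moves.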

XML Lang, of course, does allow you to declare ENTITY attributes, as I've
done above; it's just that XML Link does not associate any particular
semantic with ENTITY attributes.  So you can do the above, but you can't
depend on systems that only support XML Lang and XML Link to help you
(though any existing SGML system should handle the above).

Both the TEI spec and the HyTime architecture provide indirect addresses
that you can use to isolate a reference from the ultimate location of the
target.  For example, using HyTime indirect addresses, you could have a
separate document that provided the mapping of persistent object names to
URLs for those objects:

<?XML Version=1.0?>
<!-- URL of this document is "http://www.me.com/docs/urlmap.xml" -->
<!DOCTYPE URL.map.Table [
 <?IS10744 ArcBase HyTime ?>
 <!ELEMENT URL.map.Table (URLloc+) >
 <!NOTATION URL PUBLIC "-//IETF//NOTATION Uniform Resource Locator//EN" >
 <!ELEMENT URLloc  (#PCDATA) > <!-- Content is a URL -->
 <!ATTLIST URLloc 
    id     ID   #REQUIRED
    HyTime NAME #FIXED "queryloc"
    notation NOTATION (URL) #FIXED "URL"
 >
]>
<URL.map.Table>
<URLloc id="my.document.1">http://www.me.com/docs/mydoc1.xml</URLloc>
<URLloc id="my.document.2">http://www.me.com/docs/mydoc2.xml</URLloc>
</URL.map.Table>
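Reading such a table is straightforward. Here is a minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the map document has already been retrieved as text. The XML declaration and DOCTYPE are omitted from the embedded copy; the sketch only indexes `URLloc` elements by ID and ignores the HyTime attributes, to which a plain XML parser assigns no meaning.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Document instance of the URL map table (declaration and DOCTYPE omitted).
URL_MAP_DOC = """<URL.map.Table>
<URLloc id="my.document.1">http://www.me.com/docs/mydoc1.xml</URLloc>
<URLloc id="my.document.2">http://www.me.com/docs/mydoc2.xml</URLloc>
</URL.map.Table>"""


def load_url_map(xml_text: str) -> dict[str, str]:
    """Index every URLloc element by its ID; the value is its URL content."""
    root = ET.fromstring(xml_text)
    return {
        loc.get("id"): (loc.text or "").strip()
        for loc in root.iter("URLloc")
    }


url_map = load_url_map(URL_MAP_DOC)
print(url_map["my.document.2"])  # http://www.me.com/docs/mydoc2.xml
```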

You could then use the mapping by making references to the URLloc elements:

<?XML Version=1.0?>
<!DOCTYPE MyDoc [
 <?IS10744 ArcBase HyTime ?>
 <!NOTATION URL PUBLIC "-//IETF//NOTATION Uniform Resource Locator//EN" >
 <!ELEMENT Link (#PCDATA) >
 <!ATTLIST Link
     href  CDATA #REQUIRED 
     HyTime NAME #FIXED "clink"
     loctype CDATA #FIXED "href queryloc URL"
     HyNames CDATA #FIXED "linkend href"
 >
]>
<MyDoc>
<Link href="http://www.me.com/docs/urlmap.xml#my.document.1">Click here</Link>
</MyDoc>

The href in my document points to a URLloc element in the URL map document,
whose content is the real URL.  That URL can change at any time; only the
map document has to be updated, not the documents that link through it.
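The single level of indirection behind that href can be sketched in Python as below. `MAP_DOCUMENTS` is a hypothetical, already-fetched and already-indexed copy of urlmap.xml; a real system would retrieve and parse http://www.me.com/docs/urlmap.xml instead. The href is split into its storage-object part and its ID part with the standard `urllib.parse.urldefrag`.

```python
from __future__ import annotations

from urllib.parse import urldefrag

# Hypothetical pre-fetched map documents, keyed by the map's own URL and
# already indexed by URLloc ID.
MAP_DOCUMENTS = {
    "http://www.me.com/docs/urlmap.xml": {
        "my.document.1": "http://www.me.com/docs/mydoc1.xml",
        "my.document.2": "http://www.me.com/docs/mydoc2.xml",
    },
}


def resolve_href(href: str, map_documents: dict[str, dict[str, str]]) -> str:
    """Split href into (map document, ID) and return the URL that ID names."""
    map_url, location_id = urldefrag(href)
    if not location_id:
        raise ValueError(f"href has no #ID part: {href!r}")
    return map_documents[map_url][location_id]


print(resolve_href("http://www.me.com/docs/urlmap.xml#my.document.1", MAP_DOCUMENTS))
# http://www.me.com/docs/mydoc1.xml
```

If mydoc1.xml moves, only its entry in the map document changes; the href in MyDoc stays exactly as written.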

One advantage of the entity approach is that you can use different catalogs
without changing any of the documents involved, because the entity
declaration and public ID provide an additional level of indirection that
lives outside any document, namely in the public ID mapping catalog.
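A minimal Python sketch of that catalog swap, assuming the entity declarations have been reduced to a name-to-public-ID table and each catalog to a public-ID-to-system-ID table. The first catalog entry is the one from the catalog example above; the Web location in the second catalog is hypothetical.

```python
from __future__ import annotations

# Entity declarations from MyDoc's internal subset, reduced to
# entity name -> public ID. These stay fixed when storage moves.
ENTITY_PUBLIC_IDS = {"YourDoc": "-//You//DOCUMENT Your Document//EN"}

# Two catalogs for the same public ID; the Web location is hypothetical.
HOME_CATALOG = {"-//You//DOCUMENT Your Document//EN": "/home/you/docs/mydoc.xml"}
WEB_CATALOG = {"-//You//DOCUMENT Your Document//EN": "http://www.you.example/docs/mydoc.xml"}


def resolve_entity_attribute(value: str, catalog: dict[str, str]) -> str:
    """ENTITY attribute value -> entity's public ID -> catalog's system ID."""
    public_id = ENTITY_PUBLIC_IDS[value]
    return catalog[public_id]


# The same resource="YourDoc" attribute resolves differently per catalog.
for catalog in (HOME_CATALOG, WEB_CATALOG):
    print(resolve_entity_attribute("YourDoc", catalog))
```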

As the XML Link specification is not yet finalized, it's possible that we
may include a way to address entities as resources of links and do indirect
addressing.

It should be clear from the above that the mechanism at work is pretty
simple: given a two-part address (storage object and ID within that
object), use it to look up the next stage in the address (i.e., the URL in
the content of the URLloc elements).  That's all there is to it, and the
above is 100% HyTime conforming (and if you implemented the above, you
could call your system a conforming HyTime application).
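The same two-part lookup can be applied repeatedly when one map entry points into another map. A minimal Python sketch, assuming map documents are already loaded into a hypothetical `{map URL: {ID: next address}}` dictionary and that any address whose storage part is not a known map is a final target. The alias map, the hop limit and the cycle check are illustrative safeguards for the sketch, not part of the HyTime mechanism itself.

```python
from __future__ import annotations

from urllib.parse import urldefrag


def resolve_chain(address: str, maps: dict[str, dict[str, str]], max_hops: int = 8) -> str:
    """Follow (storage object, ID) lookups until reaching a final address.

    An address is treated as indirect only if its storage part is a known
    map document and it carries an #ID; anything else is returned as-is.
    """
    seen: set[str] = set()
    for _ in range(max_hops + 1):
        storage, ident = urldefrag(address)
        if storage not in maps or not ident:
            return address  # final target reached
        if address in seen:
            raise ValueError(f"indirection cycle at {address!r}")
        seen.add(address)
        address = maps[storage][ident]  # look up the next stage
    raise ValueError(f"more than {max_hops} levels of indirection")


# Hypothetical maps: an alias table that points into the URL map table.
MAPS = {
    "http://www.me.com/docs/aliases.xml": {
        "current.doc": "http://www.me.com/docs/urlmap.xml#my.document.1",
    },
    "http://www.me.com/docs/urlmap.xml": {
        "my.document.1": "http://www.me.com/docs/mydoc1.xml",
    },
}

print(resolve_chain("http://www.me.com/docs/aliases.xml#current.doc", MAPS))
```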

Cheers,

Eliot

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)