Web Resource Identity

Thu Jun 24 22:21:39 BST 1999

At 10:59 28/05/1999 -0700, Tim Bray wrote:
>At 10:01 AM 5/28/99 -0500, Paul Prescod wrote:
>>I believe that the Web needs a concept of a canonical URL, if it doesn't
>>already have one. Retrieving a document or the HEAD for the document
>>should describe the canonical URL. I wouldn't mind if the canonical URL
>>was a totally unreadable UUID as long as I can take two URLs and figure
>>out whether they refer to two things that happen to have the same content
>>or actually refer to the SAME THING.
>
>It's pretty crystal-clear that at the moment, given the existence of 
>content negotiation, the web has no built-in concept of a canonical URI.
>While the idea has been attractive ever since Ted Nelson postulated (30 
>years ago) that in a networked environment there ought really only to be 
>one instance of each object, all the attempts that I know of to address 
>the issue of canonically naming things have shuffled down the path to 
>dusty death, either quickly or slowly.

The reason for this is IMHO that these attempts have been based either
1) on syntactic equivalence of the name, or 2) on a fixed definition of
what 'the same' means.

Content negotiation is really just an attempt at defining what 'same'
means: This document is the same but in two different languages, or in two
different encodings, etc.

I think there are several major reasons why conneg hasn't been used any
more than it is: HTTP/1.0 caching didn't support it, it requires
hard-coded configuration of popular servers like Apache, and there is no
way to edit relationships remotely. Metadata can change this by making the
relationships explicit and allowing them to be shipped around over the
Net.

You could then define a canonical (or generic) URI as having the identity
relationship 'this' with itself:

	http://www.w3.org/Overview.html.fr	--french-->	http://www.w3.org
	http://www.w3.org/Overview.html.da	--danish-->	http://www.w3.org
	http://www.w3.org			--this-->	http://www.w3.org

Without these relationships being explicit, there is no way that we can
compare URIs except at a purely syntactic level.
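
As a rough illustration of the difference, here is a minimal Python sketch
of comparing URIs through explicit relationships rather than by string
equality. The table holds only the three example relationships above; the
names RELATIONS, generic_uri and same_resource are made up for this sketch
and are not part of any existing metadata format or library.

```python
# Hypothetical relationship table built from the three examples above.
# Each URI maps to (relationship, target URI).
RELATIONS = {
    "http://www.w3.org/Overview.html.fr": ("french", "http://www.w3.org"),
    "http://www.w3.org/Overview.html.da": ("danish", "http://www.w3.org"),
    "http://www.w3.org": ("this", "http://www.w3.org"),
}


def generic_uri(uri, relations=RELATIONS):
    """Follow relationships until a URI related to itself by 'this' is found.

    Returns None when the URI is not described by the table.
    """
    seen = set()  # guards against cycles in the relationship data
    while uri in relations and uri not in seen:
        seen.add(uri)
        relation, target = relations[uri]
        if relation == "this" and target == uri:
            return uri
        uri = target
    return None


def same_resource(a, b, relations=RELATIONS):
    """Compare two URIs by their generic URI when both are known."""
    ga = generic_uri(a, relations)
    gb = generic_uri(b, relations)
    if ga is not None and gb is not None:
        return ga == gb
    # Without explicit relationships only syntactic comparison is left.
    return a == b
```

With this table, the French and Danish URIs compare as the same resource
even though the strings differ, while a URI that has no recorded
relationships can only be compared syntactically.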

>This doesn't worry me.  As Dan Connolly will tell you until your ears
>bleed, if you are an organization that cares about persistence, uniqueness,
>and managing your web space properly, there's nothing about plain ol' URLs
>that gets in the way.

As always this is a truth with modifications - from the point of view of
not moving your files around, it is true. From the point of view of
changing or evolving the access mechanism, it is not. Just think about the
interactions between https: and http: - there is no graceful way to
change access mechanisms in the Web. This is not a question of being
careful with your names, this is a bug in the infrastructure.

What we do need is a mechanism for discovering and negotiating different
ways to access resources dynamically. Note, however, that this is different
from the discussion of generic URIs.

>  Empirically, it is the case that a lot of 
>organizations who should know better are shoddy about the design of
>their web spaces in a way that, as Paul points out, is going to make
>it hard for them to take advantage of RDF.  Maybe if we're lucky, since
>URLs are the only credible thing to hang Web metadata on, and since the
>need for ubiquitous Web metadata is becoming mind-numbingly obvious, people
>will be motivated to start doing the right thing.  But we in the computing
>profession, as with all other professions, are all idiots at least some
>of the time... I am doubtful that any canonical-addressing scheme can
>combat the human propensity to screw up sometimes. -Tim

Unless metadata can be made easy enough to handle that people don't care
about the name of the resource but only about the relationships, I would
agree. In fact, I do believe it can, and that when you want to create a
new resource it is possible not to have to think of a name, but instead to
think about how you want this resource to be related to the rest of the
world.

Henrik
--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk
