Web Resource Identity
Henrik Frystyk Nielsen
frystyk at w3.org
Thu Jun 24 22:19:43 BST 1999
ps: I just realized that my responses to the resource identity thread were
rejected because I wasn't subscribed to the list - for the sake of
completeness, here they are again.
pps: I would like to point people at the W3C Web Characterization Activity
mailing list, which is the intended place to discuss the Web Terms
Working Draft. You must be subscribed to the list in order to post -
follow the directions at  or subscribe directly by following:
mailto:www-wca-request at w3.org?subject=subscribe
At 10:01 28/05/1999 -0500, Paul Prescod wrote:
>It is encouraging because it is long needed.
Great to hear!
>It is disturbing because I
>believe it identifies a key problem with the Web (or with my understanding
>of the Web).
>This document refers to the URI specification in its definition of
>"resource": "...anything that has identity." This is troubling because
>there is no definition of identity. In the HyTime and object oriented
>worlds, I believe that the defining characteristic of things with identity
>is that you can take two references and determine if they refer to the
>same thing.
>I do not see how to do this on the Web. Consider the following URLs:
>Do they refer to the same resource? Let's try the answer both ways:
The only way these resources can be considered related at all is if
there is an explicit statement describing exactly how they relate (in this
case, what exactly "same" means). On most servers, this is done in a
global configuration file and is known to the publisher but to nobody else.
Metadata can be used to describe these relationships in a way that is
accessible to parties outside the server serving the resources (not
necessarily the whole world).
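A minimal sketch of this view: if sameness is defined only by explicit statements, a consumer can treat published "same-as" statements as an equivalence relation. It never compares URI strings or content. The class `SameAsRegistry`, its methods, and the example.org URIs are hypothetical. The sketch assumes the statements have already been obtained from metadata the publisher exposes. It does not follow any W3C specification.

```python
class SameAsRegistry:
    """Hypothetical registry of explicit "same-as" statements.

    Two URIs denote the same resource only if a chain of explicit
    statements links them; URI strings and content are never compared.
    Implemented as union-find with path compression.
    """

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, uri: str) -> str:
        # A URI nobody has made a statement about is its own class.
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the chain at the root.
        while self._parent[uri] != root:
            nxt = self._parent[uri]
            self._parent[uri] = root
            uri = nxt
        return root

    def declare_same(self, a: str, b: str) -> None:
        """Record a statement (e.g. from publisher metadata) that a and b are the same resource."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same_resource(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


# Hypothetical URIs: without a statement, even similar-looking URIs are distinct.
registry = SameAsRegistry()
print(registry.same_resource("http://www.example.org/", "http://www.example.org/Home.html"))  # False
registry.declare_same("http://www.example.org/", "http://www.example.org/Home.html")
print(registry.same_resource("http://www.example.org/", "http://www.example.org/Home.html"))  # True
```

The relation is transitive by construction. If the publisher states A = B and B = C, then A and C are the same resource, even though nobody stated that pair directly.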
>I believe that the Web needs a concept of a canonical URL, if it doesn't
>already have one. Retrieving a document or the HEAD for the document
>should describe the canonical URL. I wouldn't mind if the canonical URL
>was a totally unreadable UUID as long as I can take two URLs and figure
>out whether they refer to two things that happen to have the same content
>or actually refer to the SAME THING.
I think it is important to realize that the canonical (or generic) URI
doesn't have to be tied to the syntax of the URI - it is a question of
how the resource it identifies relates to the rest of the world. In your
example above, either of the names could be the one considered canonical.
Note, btw, that the last two examples are equivalent at a syntactic level.
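Syntactic equivalence, unlike sameness of resource, can be checked mechanically. The sketch below assumes plain http/https URIs with no IPv6 host literal, and it ignores percent-encoding normalization. `normalize_http_uri` and `syntactically_equivalent` are hypothetical helpers; only `urlsplit`/`urlunsplit` come from Python's standard library.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the schemes this sketch handles (assumption: http/https only).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_http_uri(uri: str) -> str:
    """Return a syntactically normalized form of an http(s) URI.

    Lowercases scheme and host, drops the default port, and turns an
    empty path into "/". Does not handle IPv6 literals or
    percent-encoding normalization.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    hostport = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        hostport = f"{host}:{parts.port}"
    # Keep any user-info ("user@") prefix unchanged.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def syntactically_equivalent(a: str, b: str) -> bool:
    return normalize_http_uri(a) == normalize_http_uri(b)


print(syntactically_equivalent("HTTP://WWW.Example.ORG:80", "http://www.example.org/"))  # True
print(syntactically_equivalent("http://www.example.org/", "http://www.example.org/Home.html"))  # False
```

A `True` result only says the two strings spell the same URI. A `False` result says nothing about whether the publisher regards the two as the same resource.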
This of course also has to do with trust - which of these URIs would you
most likely trust to provide the authoritative URI for the W3C home page
(where 'none' is a valid answer):
Without a mechanism for identifying who the authoritative publisher is, it
is hard to talk about a generic URI. This is why, in the WCA terminology
draft, we define a publisher as "The principal responsible for
the publication of a given resource and for the mapping between the
resource and any of its resource manifestations".
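A minimal sketch of how a consumer might combine canonical-URI claims with trust in a publisher. The names `CanonicalClaim` and `trusted_canonical`, and the idea of trusting exactly one principal per host, are assumptions made for illustration. They are not defined in the WCA terminology draft. Claims from any principal other than the trusted publisher are ignored, and "no trusted answer" is returned as `None`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CanonicalClaim:
    """Hypothetical metadata statement: `asserted_by` says `canonical` is the generic URI for `alias`."""
    asserted_by: str
    alias: str
    canonical: str


def trusted_canonical(alias: str,
                      claims: list[CanonicalClaim],
                      trusted_publishers: dict[str, str]) -> Optional[str]:
    """Return the canonical URI for `alias`, or None.

    `trusted_publishers` maps a host name to the principal the caller
    trusts as publisher for that host (an assumption of this sketch).
    Claims from anyone else are ignored; None means 'no trusted answer'.
    """
    host = (urlsplit(alias).hostname or "").lower()
    publisher = trusted_publishers.get(host)
    if publisher is None:
        return None
    for claim in claims:
        if claim.alias == alias and claim.asserted_by == publisher:
            return claim.canonical
    return None


# Hypothetical principals and URIs.
claims = [
    CanonicalClaim("someone-else", "http://www.example.org/Home.html", "http://spoof.example.net/"),
    CanonicalClaim("example-publisher", "http://www.example.org/Home.html", "http://www.example.org/"),
]
trust = {"www.example.org": "example-publisher"}
print(trusted_canonical("http://www.example.org/Home.html", claims, trust))
```

The same set of claims can therefore yield different answers for different consumers, depending on whom each one trusts as publisher.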
So, in summary, I would argue that the concept of a generic URI is useful,
but that it is tied not to syntax but to relationships and trust.
Henrik Frystyk Nielsen,
World Wide Web Consortium
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1