Identity

Hunter, David dhunter at Mobility.com
Wed Jun 23 18:26:22 BST 1999


Lars Marius Garshol [mailto:larsga at ifi.uio.no] writes:
> | In another post on this thread, Lars Marius Garshol asked if the
> | following two URLs denote the same resource:
> | 
> | <URL: http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html>
> | <URL: http://birk105.studby.uio.no/linker/XMLtools.html>
> | 
> | My question is, does it matter?  Is there a case where we need an
> | application to know or think that these two URLs are the same? 
> 
> Definitely! When people do a search for 'Free XML software' on Google
> I want them to get a result more or less like:
> 
>   <li><a href="http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html">
>       Free XML software</a> (<a href="http://birk105.../">alternative</a>)
> 
> and not to see these as two completely unrelated sites.

But in this case the search engine isn't treating them as "the same thing";
it is treating them as two distinct "things", which are in some way
<em>related</em>.  ("ThingB" is a mirror of "ThingA".)  Having two "things"
which are related is a much different kettle of fish than having two
"things" and trying to figure out if in fact they are the same "thing".  (If
they were, in fact, "the same", then there would be no need to have a link
to the second "thing".)

> | OTOH, are THESE two URLs the same:
> | 
> | <URL:  http://a.server.com/dir/page.asp>
> | <URL:  http://a.server.com/dir/page.asp?param1=5&param2=6>
> | 
> | This, in my [small] mind, is a much more difficult question to
> | answer, but again, is there a case where we need an application to
> | know or think that these refer to the same thing?
> 
> Sure! Lots! Some examples:
> 
>  - a server log analyzer that provides a referral report should merge
>    references from these two

But to the web server itself, i.e. a.server.com, there really would never be
such a "thing" as "page.asp?param1=5&param2=6"; there would only be a
"page.asp", and anything else is just a parameter to the one "thing".  (This
is strictly when talking about ASP; if we talk about CGI I would be in over
my head, not having dealt with it, but I have a feeling that it would be
similar:  to the web server, there would only be one [executable?] which
would be our "thing", and anything else would be parameters.)
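To make the server's-eye view concrete, here is a minimal Python sketch (not ASP, and not how any particular web server is implemented) that splits a URL into the one "thing" and its parameters. The function name split_resource is hypothetical, and treating everything before the "?" as the "thing" is my assumption.

```python
from urllib.parse import urlsplit, parse_qsl

def split_resource(url):
    """Return (resource, params) for a URL.

    resource is the URL with its query string removed (an assumed
    definition of the "thing"); params is the list of decoded
    (name, value) pairs from the query string.
    """
    parts = urlsplit(url)
    resource = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return resource, parse_qsl(parts.query)

print(split_resource("http://a.server.com/dir/page.asp?param1=5&param2=6"))
# -> ('http://a.server.com/dir/page.asp', [('param1', '5'), ('param2', '6')])
```

Under that assumption, both URLs in the example above yield the same resource string, and only the parameter list differs.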

OTOH, if we move our point of reference to an external computer somewhere,
which I guess is where I've been talking from, if it is "merging" references
from the two, then it is treating both as different "things".  (If they're
both the same "thing", then there's nothing to merge.)

>  - a search engine should know whether they are the same, just as with
>    my example above

See the point I'm about to make below...

>  - software that builds an offline copy of a web site should know
>    whether to make separate copies for these two URLs
> 
>  and so on...
> 
> And, BTW, it's by no means obvious that those two URLs really refer to
> the same thing. I'm sure you'll agree that these two URLs refer to
> different resources, for example:
> 
> <URL: http://www.80s.com/cgi-bin/valley.cgi?url=http%3A%2F%2F208.206.40.209%2Fmyfamily%2Froad.html>
> <URL: http://www.80s.com/cgi-bin/valley.cgi?url=http%3A%2F%2F207.200.30.120%2F%47over%6Eor%2F%42ush.html>

> --Lars M.

Right, but this is kind of my point.  If two URLs (or URIs) are
character-for-character identical, then they're the same thing.  If they're
different <em>in any way</em>, then perhaps they should be treated as
different resources, or perhaps "different but related" resources.  i.e.

<URL:  http://a.server.com/dir/page.asp>
is the same as
<URL:  http://a.server.com/dir/page.asp>

and is different from
<URL:  http://a.server.com/dir/page2.asp>

and is different from, but related to,
<URL:  http://a.server.com/dir/page.asp?param1=5>

(I readily admit that this may be a gross over-simplification.)
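
The three cases above could be sketched in Python roughly as follows. The function name compare_urls is hypothetical, and the test for "related" (same scheme, host and path, with anything after that allowed to differ) is my own assumption, generalised from the single page.asp?param1=5 example.

```python
from urllib.parse import urlsplit

def compare_urls(a, b):
    """Classify two URLs as 'same', 'different but related', or 'different'."""
    # Character-for-character identical: the same thing.
    if a == b:
        return "same"
    pa, pb = urlsplit(a), urlsplit(b)
    # Assumed rule: same scheme, host and path, but something else
    # (e.g. the query string) differs: distinct things that are related.
    if (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path):
        return "different but related"
    # Anything else is simply a different resource.
    return "different"

base = "http://a.server.com/dir/page.asp"
print(compare_urls(base, "http://a.server.com/dir/page.asp"))           # same
print(compare_urls(base, "http://a.server.com/dir/page2.asp"))          # different
print(compare_urls(base, "http://a.server.com/dir/page.asp?param1=5"))  # different but related
```

Note that this never reports two differently spelled URLs as "the same"; a mirror on another host would come out as "different" unless some other source of information says the two are related.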

(And I heartily wish that I could remember how this discussion got started,
so that my examples could be more relevant.  Did it start with namespaces?
Or Schemas, and their use of namespaces?  Or something completely unrelated?
Even the very first "Identity" email was in reference to ANOTHER thread, so
I can't even trace it back...)

David Hunter
david.hunter at mediaserv.com
MediaServ Information Architects
http://www.MediaServ.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




