All this business about namespace URNs...

Fri Jun 4 19:32:39 BST 1999

I see how Paul's proposal meets the "forever unique" requirement, but how is it useful for retrieving the namespace information?

Regarding the collisions, I for one would be willing to live with either 1) a "forever unique" ID or 2) taking my chances with the odds on a more descriptive ID.

Finally, I think that you misunderstood me on my third question.  I never suggested making the URL part of the document.  I am proposing that it be part of the ID.  For example, xmlns="xxxxxxxxx:http://www.novell.com/myproject/mynamespace", where xxxxxxxxx is a "forever unique" identifier and the rest is an indication of where it can be retrieved.  The point is that the second part can be used to retrieve, while the unique identifier can be used to identify and possibly perform a search in case the information is moved.  The location part would be replaceable or ignorable without breaking the essence of the identifier.
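
As a rough illustration of the two-part identifier, here is a minimal Python sketch. The names `NamespaceId`, `split_namespace_id`, and `same_namespace` are hypothetical, the second URL is made up, and the rule that the unique part contains no colon is my own assumption, not part of any proposal.

```python
from typing import NamedTuple, Optional


class NamespaceId(NamedTuple):
    unique_id: str            # the "forever unique" part, used for identity
    location: Optional[str]   # last known location; may be stale or absent


def split_namespace_id(value: str) -> NamespaceId:
    """Split 'uniqueid:url' at the first colon.

    Assumption: the unique part itself never contains a colon.
    """
    unique_id, sep, location = value.partition(":")
    return NamespaceId(unique_id, location if sep and location else None)


def same_namespace(a: str, b: str) -> bool:
    """Two identifiers name the same namespace if their unique parts match."""
    return split_namespace_id(a).unique_id == split_namespace_id(b).unique_id


old = "xxxxxxxxx:http://www.novell.com/myproject/mynamespace"
moved = "xxxxxxxxx:http://www.example.com/archive/mynamespace"  # hypothetical new home

print(split_namespace_id(old).location)  # the retrieval hint
print(same_namespace(old, moved))        # identity ignores the location part
```

Comparing only the unique part is what lets the location be replaced or dropped without changing which namespace is meant.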

>>> Steve Dahl <sdahl at goshawk.com> 06/03/99 04:20PM >>>
Kent Sievers wrote:

> It amuses me to see how far off topic my posting got.  And so quickly.  With everyone fixating on a poorly worded clause of mine, I thought I had better try again, to see if I could improve on my rambling.  This time, I will pose it as 3 simplified questions:
>
> 1) If (as has been debated on this list) we are worried about URLs because they are not unique and change over time and do not have "ownership" without adding the HTTP protocol, then why don't we invent an ID for name spaces that IS globally unique?

Paul Prescod's proposal for identifiers of the form    "urn:urn-22:19990603:xml-dev at ic.ac.uk:e-mail"

..would be exactly the sort of ID you're talking about. Of course, we won't know whether it's "urn-22" or "urn-7" or "urn-1003" until the IETF assigns him a number. And I'm not sure whether I've used the date format correctly, but this is an example of his proposal, at least in the general structure. So it's been invented, but we need to wait for the IETF to ratify it.

> 2) Are we really that worried about collisions?  If I just used "mynamespace.myproject.Novell.com" as a name space ID, what are the odds that I will ever see any accidental collisions?  And if someone intends to collide with me, then what recourse would I have with any other identifier?

In your lifetime, maybe you'll never see a collision. Part of this issue is protecting not just ourselves, but our heirs, so to speak. We're trying to avoid the next Y2K problem.

Let's hypothesize that you use the above namespace ID. Further, assume that for some reason which you have not yet anticipated, someone wants to archive your XML files for long periods of time (maybe 100 years). Assume that 30 years from now, Novell (by acquisition or merger) changes its name, and sells "novell.com". Another company buys that DNS name, and starts creating namespace names. There is some remote probability that they could accidentally collide with your namespace name.

Here's a list of obvious objections, and my responses.

"My data / namespace will never last 100 years." Maybe, but this is the same mentality that led programmers to use 2 digits for the year number, and they've been proved wrong in a grand way. After the scare created by Y2K, I suspect a lot of people are going to pay a lot more attention to long-term issues, especially if they're not hard to account for.

"The odds of an accidental collision are very remote." Yes, it's very unlikely that your specific namespace will encounter a collision. But assume that over time billions or trillions of names will be generated--some of these will identify XML namespaces, but other resources will get named as well, and if we want to use the URN to retrieve documentation about the namespace, we don't want to collide with the URN for some other resource.

So we imagine trillions of names generated over hundreds of years. While the probability of a given name being used twice is extremely low, the probability that *some* name will be duplicated becomes quite high, when you have that big a data set. To make a comparison that's not too far off the mark, imagine that through poor planning, some research center released a disease that cannot be controlled, and where the odds that I will be killed by this on any given day are 1:100,000,000. I personally would not be very afraid that I might catch it. But I think most people would be infuriated if every day 60 people world-wide (including 3 Americans) were killed due to poor planning by this center. It's a contrived example that ignores how diseases spread, so don't pick too hard at the details, but it points up how small probabilities become big ones when the dice are rolled often enough.
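
The arithmetic behind this can be sketched in a few lines of Python. The population figures (6 billion world-wide, 300 million Americans) are round numbers I am assuming to match the 60 and 3 above. The collision estimate uses the standard birthday-bound approximation and assumes names are drawn uniformly at random from a fixed-size space, which real namespace names are not, so treat it as an illustration only.

```python
import math


def expected_daily_deaths(population: float, daily_odds: float) -> float:
    """Expected deaths per day when each person faces the same small daily odds."""
    return population * daily_odds


def prob_some_collision(num_names: float, space_size: float) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2N)).

    Assumes n names chosen uniformly at random from N possibilities.
    """
    return -math.expm1(-num_names * (num_names - 1) / (2 * space_size))


odds = 1 / 100_000_000
print(round(expected_daily_deaths(6e9, odds)))  # world-wide, assumed 6 billion people
print(round(expected_daily_deaths(3e8, odds)))  # assumed 300 million Americans

# A trillion names drawn from a hypothetical 64-bit space: any one name is
# very unlikely to be hit, yet some collision somewhere is close to certain.
n, space = 1e12, 2.0 ** 64
print(n / space)                      # rough chance that one given name is duplicated
print(prob_some_collision(n, space))  # chance that at least one pair collides
```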

"And if someone intends to collide with me, then what recourse would I have with any other identifier?" No recourse at all--no naming scheme can prevent collisions if one of the parties is malicious. This is all about avoiding collisions between people who honestly don't want collisions. If everyone follows the rules, how can we *guarantee* no collisions, without depending on a central naming authority, or on (as someone suggested) the serial numbers of specialized hardware?

> 3) Why isn't our name space identifier two parts:  1) the name space's name/ID that uniquely identifies it, and 2) the last known, or probable location of the information about the name space.  The advantages of this are: a) I can still uniquely identify the name space no matter where it is moved, b) I can retrieve information about the name space and c) if that information is moved, there may be some hope of searching for it and updating its location.  Think of it as an ID/URL combination with the URL part optional, but a true URL.

Because, if we have some server that knows how to map the namespace ID to an URL, why should we embed the same URL in the document? If we do a good job of choosing the URN (the namespace ID) so that it's permanently unique, we can guarantee that it will never be out of date. But we know there's a pretty good chance that the URL will change over time--it would be nice if the document didn't contain out-of-date URLs. By letting some server hold the mapping from URN to URL, and by updating that server when the URL changes, we make sure that any software would always be able to find, not the last known, but the *current* URL of the documentation.
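
A toy sketch of that mapping, in Python, might look like the following. No such server exists yet, so the class `NamespaceResolver`, its methods, and both example URLs are hypothetical. An in-memory dictionary stands in for whatever the real service would be.

```python
from typing import Dict, Optional


class NamespaceResolver:
    """In-memory stand-in for a server that maps a URN to its current URL."""

    def __init__(self) -> None:
        self._current_url: Dict[str, str] = {}

    def register(self, urn: str, url: str) -> None:
        """Record or update where the documentation for this URN lives now."""
        self._current_url[urn] = url

    def resolve(self, urn: str) -> Optional[str]:
        """Return the current URL, or None if the URN is unknown."""
        return self._current_url.get(urn)


resolver = NamespaceResolver()
urn = "urn:urn-22:19990603:xml-dev at ic.ac.uk:e-mail"  # example form from earlier in this message

resolver.register(urn, "http://www.example.com/docs/namespace")   # hypothetical first home
resolver.register(urn, "http://www.example.org/moved/namespace")  # documentation moves later

# Documents only ever contain the URN, so they never go stale;
# the lookup always returns the latest registered location.
print(resolver.resolve(urn))
```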

Assuming that such a server exists, it's a much more robust mechanism for finding information about the namespace. But the lack of such a server shouldn't necessarily keep us from using URNs. In fact, the more we use URNs, the more push there will be to create such servers. And if you need to be able to retrieve documentation *right now*, and not wait until this magical server exists, then you should use an URL rather than an URN as your namespace ID.

--
- Steve Dahl
sdahl at goshawk.com 

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)