Public Identifiers

Mon Sep 21 06:20:21 BST 1998

At 10:54 AM 9/20/98 -0500, Paul Prescod wrote:

>If I may paraphrase: "FPIs only provided
>a reliable way to interchange SGML data between heterogeneous systems for
>the last 15 years, and will continue to for the next 5 that it takes
>symbolic linking to become popular on Microsoft platforms." To me, the
>word "only" is out of place in such a statement.

I would argue that it has not in fact ever been generally possible to
interchange SGML data among heterogeneous systems in the way I think you
mean.  If you send me a package of SGML entities, it is up to me, the
receiver, to make sense of them, including reworking any entity
declarations and/or public identifier mappings that may be necessary. I
spent the last years of my tenure at IBM transferring SGML documents
between OS/2, DOS, and VM/CMS systems, and it was non-trivial to manage.
To make it work, I had to set up what was essentially a homogeneous system.
The only hope SGML ever had for interchange was the SDIF mechanism, which
defines a standard for packaging entities together so that automatic
processes can unpack them on the target system--but even there the standard
assumes rewriting of entity declarations to update system identifiers.
Unfortunately, nobody has ever fully implemented SDIF in a
publicly-available system (even though it shouldn't be that hard to do and
would be mighty useful if done).  [NOTE: ISO 9069 *DOES NOT* require the
use of ASN.1. Do not be fooled. You can use any mechanism you want for
representing the package, even tar or Zip.]

The idea that you might be able to interchange documents that refer to
public entities (that is, entities that are somehow publicly available) is
a nice one, but without a generally-available networking system for
accessing those entities, it's only an idea.  Today, only URLs come close
to providing a useful way to name truly-public entities.  Which is not to
say that URLs are the best choice, just that they're our only option at the
moment.

It is the use of entity declarations that provides even a hope of
interchange for SGML, not formal public identifiers.  Public identifiers
(formal or not, doesn't matter) help by giving you the option of being even
more indirect, but that's not a requirement for deriving most of the benefit
from entity declarations (centralizing the mapping from references to
storage objects in the document instance to the storage objects
themselves--that is, avoiding "embedded filenames" in instances).

Paul is right that if SGML hadn't required some form of indirection,
vendors never would have provided it, certainly not with the level of
consistency we have today with SOCATs. But even there, I don't have a
complete solution, because not all useful tools support SOCATs and not all
support the latest version (ADEPT*Editor, for example, only supports the
first version of the SOCAT spec, while SP supports the second).
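
As a rough illustration of the indirection a catalog provides, here is a minimal Python sketch. It assumes only the simplest entry form, PUBLIC "public id" "system id"; the identifiers, file names, and function names are hypothetical, and real catalogs have more entry types and quoting rules than this handles.

```python
import re

# Matches only the simplest entry form: PUBLIC "public id" "system id".
# This is a deliberate simplification, not a full catalog parser.
_PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def parse_catalog(text):
    """Return a dict mapping public identifiers to system identifiers."""
    return {pubid: sysid for pubid, sysid in _PUBLIC_ENTRY.findall(text)}


def resolve(pubid, catalog, declared_sysid=None):
    """Prefer the catalog mapping; fall back to the declared system identifier."""
    return catalog.get(pubid, declared_sysid)


# Hypothetical catalog maintained by the receiving system, not by the sender.
EXAMPLE_CATALOG = '''
PUBLIC "-//Example Corp//DTD Memo//EN" "dtds/memo.dtd"
PUBLIC "-//Example Corp//ENTITIES Logos//EN" "ents/logos.ent"
'''

catalog = parse_catalog(EXAMPLE_CATALOG)
print(resolve("-//Example Corp//DTD Memo//EN", catalog))
```

The receiver edits the catalog, not the document instances, which is the benefit of centralizing the mapping.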

Steve Newcomb asks if there is a difference between FPIs and URNs generally
and the answer from John Cowan was, correctly, "no, there's no difference".
 Public identifiers, and formal public identifiers in particular, are just
a special case of URN. They have no unique properties beyond those of URNs
generally (except, see below). They are not magic. They do nothing special
(except part you from 80 or 90 US dollars if you want to have a registered
owner name that is not an ISBN publisher prefix).  ISO 9070 does
standardize owner name registration mechanisms and there are three such
currently implemented: ISBN numbers, ISO registered owner names
(administered by the GCA, see <www.gca.org>), and Internet domain names
(with TC 2 to ISO 8879).  This has value because it does provide a pretty
solid infrastructure for management of name ownership, one of the
requirements for URNs generally.

That said, I must admit that Paul's arguments, along with things others
have said, have made me rethink my original statement that there's no
useful distinction between URLs and URNs (but see below).  It is still true
that URLs can be just as persistent as URNs. However, it is also the case
that we need a formal mechanism for associating names with name spaces,
which is what URNs do. URLs have a built-in name space, namely the universe
of resources on the Web (which is tautologically defined by those things
you can address by URL, but no matter).

In other words, we need to be able to say where to go to look up a name. It
doesn't matter how direct or indirect that lookup is.  Indirection isn't
the issue (because we always have some amount of it, regardless of the
addressing scheme--even a phone number is an indirection even though we
tend to think of it as a direct address).  It is always up to the machine
doing the resolution of names in a particular space to provide appropriate
optimizations--we shouldn't care what they might be when we specify a
pointer to something.

Thus, the concept of URN as a binding of name-space name to name is very
useful, in fact essential, because we need to be able to point to things
that exist in different universes (as Steve wants to do in his Topic Map
example) and we want different ways of naming things (FPI, ISBN number,
URL, etc.).

But...

I think that several errors of design have been made getting here:

1. The expectation a name engenders as to its persistence is a function of
the name, not its use. Therefore, the PUBLIC/SYSTEM distinction made by
SGML (and XML) is inappropriate as a matter of syntax.  A name is a name
and there should be exactly one declared for each entity.  Within an SGML
context, the formal system identifier mechanism (Annex A.6 of ISO/IEC
10744, see
<ftp://ftp.ornl.gov/pub/sgml/wg8/document/n1920/html/clause-A.6.html>)
could be used to distinguish formal public IDs from other forms of name, e.g.:

<!-- Declare notations that represent storage managers: -->
<?IS10744 FSIDR IS9070>

<!-- Declare a storage manager, in this case, formal public identifiers: -->
<!NOTATION IS9070 SYSTEM "ISO 9070//DOCUMENT ...//EN" >

<!-- Now declare an entity that uses that storage manager: -->
<!ENTITY foo SYSTEM "<is9070>+//IDN drmacro.com//..." NDATA SGML >

2. URLs are a special case of URN. Thus the term URI, meaning "URN or URL"
is unnecessary and misleading. There are only URNs, of which URL is a
special case where the prefix "urn:url:" can be omitted.  URLs can be
recognized because they don't start with "urn:", which all other URNs must.
 URLs are really an optimization of URN where the name space resolver is
already known and all Web browsers must know how to resolve URLs (thus
there's no need to apply the more general "look up the name space resolver"
mechanism you must use with any other form of URN).  If this design had
been used from the start on the Web, then "urn:url:http://www.drmacro.com"
would be recognized by all Web clients.  

Of course, URLs have this special status only within the context of Web
browsers and data formats that give special meaning to the syntactic things
that hold URLs (e.g., the "href" attribute of HTML).  Outside this context,
a URL would be no more privileged than anything else. In a different
context, other forms of names could be privileged (as public IDs are in an
SGML context).

Finally, note that URNs as currently defined are simply *a syntax* (of an
infinite possible number of syntaxes) for representing the binding of
name-space to name.  The formal system identifier example above is another
and my suggestion of a few days ago for a <urn:name> element is a third.
The current URN syntax is appropriate for use in HREF attributes, but it
shouldn't be seen as the one and only way to do this binding.  URN
resolution mechanisms should be independent of the syntax used for the
binding--they should simply expect two arguments, a name-space name and a
name in that name space. How the client that makes the resolution request
gets those two arguments is its business.  Particular data representations
can then define their own conventions for representing the binding, whether
it's the current URN syntax or something different.

3. We've confused the persistence of names with the persistence of
resources, which has led us to think that URLs (and system IDs) are
somehow fundamentally different from URNs (and public IDs).  We've set the
expectation that the naming method can solve problems when in fact it
can't. The evidence that this expectation has been set is the fact that
everything I read about so-called "persistent names" has gone out of its
way to stress that names alone can't guarantee persistence. They wouldn't
have to say this if people didn't expect it to be the case.
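
The convention proposed in point 2 above (anything starting with "urn:" names its name space explicitly, and anything else is treated as a URL whose "urn:url:" prefix has been omitted) can be sketched in Python roughly as follows. This illustrates the proposal in this message only, not the URN syntax as actually specified; the function name and the ISBN value are hypothetical.

```python
def split_identifier(identifier):
    """Return (name_space, name) under the convention proposed in point 2.

    Strings starting with "urn:" carry an explicit name space; everything
    else is assumed to be a URL with the "urn:url:" prefix left off.
    """
    if identifier.lower().startswith("urn:"):
        rest = identifier[len("urn:"):]
        name_space, sep, name = rest.partition(":")
        if not sep or not name_space or not name:
            raise ValueError(f"malformed identifier: {identifier!r}")
        return name_space.lower(), name
    # No "urn:" prefix: treat the whole string as a name in the "url" space.
    return "url", identifier


print(split_identifier("urn:url:http://www.drmacro.com"))
print(split_identifier("http://www.drmacro.com"))
print(split_identifier("urn:isbn:0-00-000000-0"))  # hypothetical ISBN
```

Under this reading the first two calls yield the same pair, which is the sense in which a URL is just an abbreviated URN.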

Given that my analysis is correct, here's what I'd like to see happen:

1. A general recognition of the need for name-space/name bindings in data
representation standards, regardless of the kind of data.  If these
bindings are further standardized along the lines of URNs (following their
semantics, not necessarily their syntax), so much the better.

2. Given item (1), data management systems (including operating systems and
networking systems) providing generalized name-space-to-resolver services
that reflect the general approach defined by item (1).  For Internet-based
resources, the DNS proposal is probably appropriate and reasonable.

3. Web clients upgraded to accept "urn:url:" as a prefix to otherwise
normal URLs.

4. People and enterprises providing non-URL name resolution servers.  These
could be along the lines of the PURL services currently being provided (and
could probably be implemented with the existing PURL software).  For
example, OASIS could fund a couple of public identifier servers.  Note that
these services needn't be free--it costs money to maintain machines and it
would be reasonable to charge people who wanted to provide published names
for their resources a reasonable fee for it.
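
A name-space-to-resolver service of the kind described in items 2 and 4 might look, in its simplest form, like the Python sketch below. The key assumption, taken from the discussion above, is that resolution takes exactly two arguments, a name-space name and a name, and is indifferent to the syntax in which the client found them. The class, the name-space labels, and the sample mappings are all hypothetical.

```python
class ResolverRegistry:
    """Maps name-space names to resolver callables (a sketch, not a standard)."""

    def __init__(self):
        self._resolvers = {}

    def register(self, name_space, resolver):
        """Associate a name space with a callable that resolves names in it."""
        self._resolvers[name_space] = resolver

    def resolve(self, name_space, name):
        """Look up the resolver for the name space, then hand it the name."""
        try:
            resolver = self._resolvers[name_space]
        except KeyError:
            raise LookupError(
                f"no resolver registered for name space {name_space!r}"
            ) from None
        return resolver(name)


# Hypothetical table a public identifier server might maintain.
FPI_TABLE = {"-//Example Corp//DTD Memo//EN": "http://example.org/dtds/memo.dtd"}

registry = ResolverRegistry()
registry.register("fpi", FPI_TABLE.__getitem__)
registry.register("url", lambda name: name)  # URLs already locate the resource

print(registry.resolve("fpi", "-//Example Corp//DTD Memo//EN"))
print(registry.resolve("url", "http://www.drmacro.com"))
```

How much indirection each resolver performs is its own business; the caller sees only the two-argument request.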

And now, having said that SGML formal public identifiers have no special
properties, let me point out that the fact that registered formal public
identifiers are registered means that you could use owner names to direct
public ID resolution to servers maintained by the name owner, rather than
relying on a central FPI resolution server (that is, "DNS for FPIs"). If I
understand the DNS-for-URN resolution proposal (which I very well may not,
not being an Internet expert by any stretch), the ability to do this is
inherent in the proposal.
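
To make the owner-directed idea concrete, here is a small Python sketch that extracts a host name from a registered formal public identifier whose owner is an Internet domain name (the "+//IDN ..." form used in the entity declaration earlier). It assumes the FPI fields are separated by "//" and that the owner field is the second one; the function name and the example identifiers are hypothetical, and what protocol a client would then speak to that host is left open.

```python
def resolution_host(fpi):
    """Return the owner's host name for '+//IDN host//...' FPIs, else None."""
    fields = fpi.split("//")
    if len(fields) < 3 or fields[0] != "+":
        # Unregistered ("-//"), ISO-owned, or not shaped like an FPI at all.
        return None
    owner = fields[1].strip()
    if owner.upper().startswith("IDN "):
        return owner[len("IDN "):].strip().lower()
    # Registered, but not by domain name (e.g. an ISBN-based owner).
    return None


print(resolution_host("+//IDN example.org//DTD Memo//EN"))
print(resolution_host("-//Example Corp//DTD Memo//EN"))
```

A client could try the owner's host first when one is returned and fall back to a central public identifier server when the result is None.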

If we could do these things, and none of them seem to me to be that
onerous, then we would, I think, be well on our way to realizing the dream
of "universal names" with some hope that persistence, whatever you want
that to mean, could be provided by those that care to. [As Robin Cover
pointed out in private mail to me, we will always be dependent on human
nature for these systems to work, and it is not always human nature to
provide persistence for names, at least not outside the scope of your own
Web server.]

Cheers,

Eliot
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>
