Public Identifiers

W. Eliot Kimber eliot at dns.isogen.com
Fri Sep 18 16:50:20 BST 1998


At 03:05 PM 9/18/98 +0100, Michael Kay wrote:

>I think this response is referring to SGML rather than XML.
>There is no SGML declaration in XML. There is no normative
>link between XML and SGML, and therefore no normative link
>between XML Public Identifiers and SGML FPIs.

This is not true. From the published recommendation (italics mine):

1. Introduction

Extensible Markup Language, abbreviated XML, describes a class of data
objects called XML documents and partially describes the behavior of
computer programs which process them. *XML is an application profile or
restricted form of SGML, the Standard Generalized Markup Language [ISO
8879]. By construction, XML documents are conforming SGML
documents.* 

>XML does not require a Public Identifier to be either public
>or an identifier; you can put anything in there that you
>like, and it has no defined meaning. Tim Bray's annotated
>XML spec (on www.xml.com) has this to say:

Here I agree. Public identifiers are, conceptually, the same as URNs, that
is, they are names that are intended to be indirected to their actual
system ID, rather than being direct references to storage locations, as
URLs normally are. However, as Dan Connolly and Tim B-L have argued, there's
no *functional* difference between a URN and URL because persistence is
always a function of the owner of the resource and cannot be guaranteed
simply by the choice of name. Thus, at most, the URN/URL or public
ID/system ID distinction can only express *intent*; it cannot guarantee
results.

It is a fact of life that any storage addressing scheme (or, in fact, any
addressing scheme at all) must include some notion of indirection. Both
SGML and HTTP do this and *neither* defines the mechanism by which the
indirection is implemented or managed. In SGML, there is a requirement that
entity managers provide some mechanism for resolving public IDs to system
IDs, but ISO 8879 does not define a mechanism. Likewise, HTTP provides a
mechanism by which a server can report that a URL has been redirected (the
3xx-series status codes) but doesn't define the mechanism by which a server
actually manages the redirection itself.
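
To make the public-ID-to-system-ID step concrete, here is a minimal sketch in Python of what an entity manager might do. ISO 8879 does not define this mechanism, so everything here is an assumption for illustration: the CATALOG table, the resolve_entity function, the example public identifier, and the file path are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical lookup table: public identifier -> system identifier.
# A real entity manager could keep this anywhere (a file, a database, ...).
CATALOG: Dict[str, str] = {
    "-//Example//DTD Memo//EN": "/usr/local/sgml/memo.dtd",
}


def resolve_entity(public_id: Optional[str],
                   system_id: Optional[str],
                   catalog: Dict[str, str] = CATALOG) -> Optional[str]:
    """Return the storage location to use for an external entity.

    Prefer a catalog mapping for the public ID; otherwise fall back
    to the system ID given in the declaration (which may be None).
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id
```

The point of the sketch is only that the indirection lives in the table, not in the identifier: nothing about the string passed as public_id makes it more persistent than the one passed as system_id.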

Thus, the unavoidable conclusion is that system IDs can be just as
indirect, and just as persistent, as so-called "public" IDs.  The only real
difference is what bit of software gets the value of the ID to resolve.
There is a useful notion of "published" names, that is names that the
resource owner or name owner (they may not be the same entity) assert will
be persistent, but there is no standard or even convention for making that
assertion.  The original idea in SGML was that public IDs would be used for
the names of "published" things, that is, resources that are available
beyond the local scope of the resource owner. However, that original intent
got lost in the more immediate need for general name indirection that
public IDs provided (because SGML systems are required to provide some sort
of mechanism). 

My conclusion at this point is that the URN/public ID distinction is not
helpful because it merely confuses the issue without actually solving any
problems. The only thing public IDs did was force vendors to provide *a
way* to do name indirection, which you do need on brain-dead operating
systems that lack something like symbolic links (which includes both VM/CMS
and DOS/Windows). If operating-system filename indirection were a universal
service, you'd just use that to manage redirection of entity storage IDs.
At the time SGML was developed, it certainly wasn't universal and it may
not have even been known outside of Bell Labs (I don't remember precisely
when Unix went public).
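
As a small illustration of operating-system filename indirection, the following Python sketch uses a symbolic link as the stable "system ID" for a file whose real name can change. It assumes a platform where os.symlink is available to the current user; the file names and the function name are hypothetical.

```python
import os
import tempfile


def demo_symlink_indirection():
    """Open a file through a stable name that is really a symbolic link."""
    with tempfile.TemporaryDirectory() as root:
        # The actual storage object, under a name that may change later.
        actual = os.path.join(root, "memo-v2.dtd")
        with open(actual, "w") as f:
            f.write("<!ELEMENT memo (#PCDATA)>\n")

        # The stable name that declarations would use as their system ID.
        stable = os.path.join(root, "memo.dtd")
        os.symlink(actual, stable)

        # Readers only ever see the stable name; the OS does the redirect.
        with open(stable) as f:
            content = f.read()
        return os.path.basename(os.path.realpath(stable)), content
```

Repointing the link is then the whole of "managing redirection": no declaration that uses the stable name has to change.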

In hindsight, it's clear to me that we never should have allowed public IDs
in XML.  Oh well.

This is not to say that the URN idea is totally useless--it's very useful
to have a syntax for saying what name space a particular name is unique
within, which is really what URNs do.  However, I do have a problem with
putting all of that information in a single string--it too severely limits
your choice of syntaxes.  I would much rather have some sort of name
structure, such as:

<urn:address id="local-id-for-remote-resource">
<urn:name-domain>ISBN</urn:name-domain>
<urn:name>ISBN 0-1233456-123-0</urn:name>
</urn:address>

The "name-domain" element names the domain of names in which the name is
unique (e.g., ISBN numbers in this example). The "name" element holds the
name itself. By using element content rather than an attribute, there are
no syntactic restrictions on the name (it could even have structuring
subelements).  You could also combine names together to form larger,
multi-part addresses, if necessary.
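
A structure like this is easy for software to take apart. The following is a rough Python sketch, not a definitive design: it assumes the "urn:" prefix is bound to a namespace, and the namespace URI, the NS mapping, and the parse_address function are all hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the example above does not declare one.
NS = {"urn": "http://example.org/urn-address"}

DOC = """\
<urn:address xmlns:urn="http://example.org/urn-address"
             id="local-id-for-remote-resource">
<urn:name-domain>ISBN</urn:name-domain>
<urn:name>ISBN 0-1233456-123-0</urn:name>
</urn:address>
"""


def parse_address(xml_text):
    """Return (local id, name domain, name) from a urn:address element."""
    root = ET.fromstring(xml_text)
    domain = root.findtext("urn:name-domain", namespaces=NS)
    name = root.findtext("urn:name", namespaces=NS)
    if domain is None or name is None:
        raise ValueError("address needs both a name-domain and a name")
    return root.get("id"), domain.strip(), name.strip()
```

Because the name is element content, the parser never has to know the syntax of ISBNs (or of any other name domain) in order to extract it.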

Now I can refer to any resource in any name space regardless of the syntax
the name-space uses for its names. Of course, there is still a problem with
naming the name spaces, but that can be solved either by providing a
general "name space registration service" à la DNS or by simply defining in
the relevant standards what the naming authorities are (as ISO 9070
does--9070 being the standard that defines the rules for SGML public
identifiers). [Note that I don't use the term "naming authority"--the same
name space may recognize several naming authorities, as is the case for
SGML public IDs.]

Remember: there's no magic to URLs or URNs--they're just identifiers that
some piece of software has to map to bytes at some point.  The only real
question is "is the pointer to the bytes also meaningful to humans or is it
only for machines?" URLs are intended to be "opaque", meaning that there is
no reliable intelligence in them. URNs are intended to be "meaningful" such
that a human observer might have some clue as to what the resource is at
the other end of it.  This is a useful distinction but it doesn't require
making the distinction at the point of reference (e.g., the PUBLIC/SYSTEM
distinction SGML and XML make). It is sufficient to have the distinction be
inherent in the form of address you're using, which means you need a way to
declare what the form is, which is what my example above does.
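
Once the form of address is declared, choosing the software that maps it to bytes is a simple dispatch on the name domain. Here is a minimal Python sketch under stated assumptions: the RESOLVERS registry, the register and resolve functions, and the two example resolvers (including the made-up file:///library/ layout) are all hypothetical.

```python
from typing import Callable, Dict

# A resolver maps a name in one name domain to a storage location.
Resolver = Callable[[str], str]

# Hypothetical registry: name domain -> resolver for that domain.
RESOLVERS: Dict[str, Resolver] = {}


def register(domain: str, resolver: Resolver) -> None:
    """Associate a name domain with the software that resolves its names."""
    RESOLVERS[domain] = resolver


def resolve(domain: str, name: str) -> str:
    """Hand the name to whichever resolver is registered for its domain."""
    try:
        resolver = RESOLVERS[domain]
    except KeyError:
        raise LookupError(f"no resolver registered for name domain {domain!r}") from None
    return resolver(name)


# Illustrative resolvers only: a "meaningful" name that needs a lookup rule,
# and a URL, which is already a storage location and passes through as-is.
register("ISBN", lambda name: "file:///library/" + name.replace(" ", "_") + ".sgm")
register("URL", lambda name: name)
```

The reference itself carries no PUBLIC/SYSTEM keyword; the declared name domain alone decides which bit of software gets the identifier.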

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



