Public Identifiers

W. Eliot Kimber eliot at dns.isogen.com
Thu Sep 24 09:28:20 BST 1998


At 11:56 AM 9/23/98 -0400, Deborah Aleyne Lapeyre wrote:
>>At 98/09/18 06:49 -0500, W. Eliot Kimber wrote:
>>>In hindsight, it's clear to me that we never should have allowed public IDs
>>>in XML.
>
>As Ken Holman has said, this was a long and bloody discussion.  But to sum
>up ONE of the critical points "in-favor-of" fpis:
>
>I need to get information to London, and you are taking away my quill pen
>and paper and giving me a radio, but telling me there will be no radio
>broadcasts until next year.

I don't understand this statement. FPIs don't do anything that can't be
done in the context of "system" identifiers. In particular, the SOCAT
mechanism, which you would presumably use to define the mapping for FPIs,
will work just as well for this "system" identifier: "urn:is9070fpi:+//IDN
drmacro.com//DOCUMENT some doc//EN".
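
To make that concrete, here is a minimal sketch in Python of a catalog-style lookup. It assumes a much-simplified subset of the SOCAT entry syntax (one PUBLIC or SYSTEM entry per line, both values in double quotes); the target file name "docs/some-doc.sgm" and the function names are hypothetical, chosen only for illustration. The point is that the same table maps the bare FPI and the FPI-as-URN system identifier equally well.

```python
import re

# A tiny catalog in a simplified, assumed subset of the SOCAT entry syntax.
# The storage location "docs/some-doc.sgm" is a made-up example.
CATALOG_TEXT = '''
PUBLIC "+//IDN drmacro.com//DOCUMENT some doc//EN" "docs/some-doc.sgm"
SYSTEM "urn:is9070fpi:+//IDN drmacro.com//DOCUMENT some doc//EN" "docs/some-doc.sgm"
'''

# keyword, quoted identifier, quoted replacement
ENTRY = re.compile(r'^(PUBLIC|SYSTEM)\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Build a {(keyword, identifier): target} table from catalog text."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            keyword, identifier, target = match.groups()
            table[(keyword, identifier)] = target
    return table


def resolve(table, keyword, identifier):
    """Return the mapped target, or None when the catalog has no entry."""
    return table.get((keyword, identifier))


table = parse_catalog(CATALOG_TEXT)
fpi = "+//IDN drmacro.com//DOCUMENT some doc//EN"
via_public = resolve(table, "PUBLIC", fpi)
via_system = resolve(table, "SYSTEM", "urn:is9070fpi:" + fpi)
```

Both lookups land on the same storage object, which is all the PUBLIC slot was buying you in the first place.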

All that removing the distinction between PUBLIC and SYSTEM in entity
declarations does is remove the possibility of ambiguous redundancy of two
identifiers for the same resource (remember that ISO 8879 doesn't define
which identifier takes precedence).  It doesn't remove the ability to use
names that have the characteristics of formal public identifiers.

The problem with SGML and XML with respect to public identifiers is that
they distinguish the kind of pointer as a function of the referencing
syntax, not that they also provide a form of managed, explicitly
system-independent name.

If XML required *formal* public identifiers, then the argument that they
are useful would be more compelling because at least you'd know that the
value following a PUBLIC keyword conformed to some known and useful rules.
However, XML doesn't require formal public IDs, so you could put any valid
literal following the PUBLIC keyword, including normal URLs. Given that,
there's no useful difference between putting a resource identifier in the
"PUBLIC" slot or the "SYSTEM" slot, except that if you use the PUBLIC slot,
you'll still have to specify a value for the "system" identifier, which is
silly. [It's also silly for XML to not be consistent with notations, but
nobody listened to me on that one.]
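
As a rough illustration of what "conformed to some known and useful rules" could mean in practice, here is a hedged Python sketch of a structural check for formal public identifiers. It is deliberately simplified and is not a full implementation of the ISO 8879 rules: it only looks for an owner identifier, a public text class, a description, and a two-letter language code, and it ignores optional parts such as display versions. The function name and the example URL are assumptions of mine.

```python
# Public text classes from ISO 8879, used here only for a rough structural check.
PUBLIC_TEXT_CLASSES = {
    "CAPACITY", "CHARSET", "DOCUMENT", "DTD", "ELEMENTS", "ENTITIES", "LPD",
    "NONSGML", "NOTATION", "SHORTREF", "SUBDOC", "SYNTAX", "TEXT",
}


def looks_like_fpi(literal):
    """Return True if the literal has the rough shape of a formal public ID.

    Simplified assumption: owner // class description // language, where the
    owner is either an ISO owner ("ISO ...") or is preceded by "+//"
    (registered) or "-//" (unregistered).
    """
    parts = literal.split("//")
    if parts[0] in ("+", "-"):
        parts = parts[1:]  # drop the registered/unregistered marker
    elif not parts[0].startswith("ISO "):
        return False
    if len(parts) != 3:
        return False
    owner, text_identifier, language = parts
    text_class, _, description = text_identifier.partition(" ")
    return (
        bool(owner)
        and text_class in PUBLIC_TEXT_CLASSES
        and bool(description)
        and len(language) == 2
        and language.isupper()
    )


candidates = [
    "+//IDN drmacro.com//DOCUMENT some doc//EN",
    "ISO 8879:1986//ENTITIES Added Latin 1//EN",
    "http://example.com/some-doc.xml",  # hypothetical URL, legal after PUBLIC in XML
    "any old string",
]
results = {literal: looks_like_fpi(literal) for literal in candidates}
```

An XML processor accepts all four literals after the PUBLIC keyword; only a check of this kind, which XML does not require, would tell the first two from the last two.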

I hear Debbie saying "public identifiers" are valuable because I know I
need to have different mappings for the same resource and SGML's public
identifier mechanism gives me that.  Reasonable enough, but it's not
compelling because you don't need public identifiers to get the result.

I hear Ken and Steve saying "*formal* public identifiers" are valuable
because they are managed name spaces that let me manage my names and trust (or
at least evaluate) names I get from others in a way that is independent of
the facilities of any operating system.  I can't agree more, but as we've
seen, FPIs can be used in a URN or formal system identifier context, so
again, you can get the benefit without having to preserve the PUBLIC/SYSTEM
distinction at the entity and notation declaration level.

My observation, based in part on my own reactions in the past, is that SGML
practitioners have been using entity declarations with their PUBLIC/SYSTEM
distinction for so long that we have lost sight of what the different parts
of the system are.  We've taken particular implementations as the definition
of what the standard and/or its intent is, which is not necessarily the
case.  I would urge everyone to revisit the wording of the standard. It is
very fuzzy.  The distinctions between PUBLIC and SYSTEM identifier are
highly semantic and subjective.  In particular, the term "system" is not
(and cannot be) crisply defined.  I could argue that public identifiers are
just as system specific as any other kind of identifier because they are
dependent on there being a system that knows how to resolve them, just that
this system may span individual computers and may have humans as necessary
components [get document, call sender of document, ask them what the
various public IDs map to, update local mapping tables].

NOTE: I am *not* arguing against well-managed, human-meaningful names. I am
*not* arguing against having lots of indirection between reference to
resource and data of resource.

All I am arguing against is the *syntax* of entity and notation
declarations that lets you specify two identifiers for a resource, that is,
the PUBLIC/SYSTEM keywords. The reason I make this argument is that
names *always* convey the name space to which they apply and it is the name
space that defines how direct or indirect its names are.  [Note: the name
space may be implicit in the processing context for the document and not
explicit in the syntax of the identifier itself, e.g., URLs used in a
Web-access context.]

Thus, given a syntax for fully qualifying names, there is no need for the
PUBLIC/SYSTEM distinction to be made outside the context of the name
specification itself.  We have at least two standardized schemes for fully
qualifying names: formal system identifiers (ISO/IEC 10744:1997, Annex A.6)
and URNs.

While the distinction SGML made was well intentioned and a reasonable
approach at the time, given that neither formal system identifiers nor URNs
had been invented, I still contend that there was no excuse for carrying
that mistake into XML.  The argument that "there is no URN resolution
facility" is incorrect. Certainly with respect to people who are today
using and want to continue using formal public identifiers it is not true
because implemented SOCAT-based systems can remap system identifiers and
can therefore remap system identifiers that are URNs, and in particular,
system identifiers that are FPI URNs.

Remember that SGML effectively requires that all general SGML processors
provide customizable entity managers. It is certainly the case for all the
SGML tools I use that the entity manager can be customized with more or
less effort, so that even if these tools don't support the latest SOCAT
specification (which, for example, ADEPT*Editor 7.x does not), you can
still modify them to resolve URNs of any sort (or formal system identifiers
of any sort, if you're not constrained by XML's URI-only requirement).

Because XML allows URIs and because public IDs can be used as URNs, the
argument that XML needed the PUBLIC keyword in order to allow the use of
FPIs is clearly bogus.  The most you can complain about is the need to
escape characters in URNs, which I grant is ugly, but not so ugly as to
compel the inclusion of the PUBLIC keyword in XML (especially if you agree
that tools should be handling the escaping at transmission time).
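
For what it's worth, the escaping is a few lines of code if a tool does it for you. The following Python sketch assumes generic percent-encoding of every reserved character and uses the "urn:is9070fpi:" prefix from the example earlier in this message, which is an illustrative prefix and not a registered URN namespace; the function names are likewise hypothetical.

```python
from urllib.parse import quote, unquote

# Prefix taken from the example in this message; assumed, not a registered namespace.
URN_PREFIX = "urn:is9070fpi:"


def fpi_to_urn(fpi):
    """Percent-encode an FPI so it can travel as a URN-style system identifier."""
    # safe="" forces "/", "+", and spaces to be escaped as well.
    return URN_PREFIX + quote(fpi, safe="")


def urn_to_fpi(urn):
    """Recover the original FPI from the escaped form."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not an FPI URN: " + urn)
    return unquote(urn[len(URN_PREFIX):])


fpi = "+//IDN drmacro.com//DOCUMENT some doc//EN"
escaped = fpi_to_urn(fpi)
round_trip = urn_to_fpi(escaped)
```

The author keeps typing the readable form; the escaped form only needs to exist at transmission time, and the round trip gives back exactly the FPI you started with.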

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



