Rick Jelliffe ricko at
Thu Jul 9 18:30:46 BST 1998

> From:   Michael Kay
> Perhaps I'm being pedantic, but I think it's worth pointing
> out that there's no such thing as an FPI in XML. The closest
> there is is a "Public Identifier", and the only things that
> the spec says about it are (a) that certain spaces within it
> are insignificant, and (b) that the processor can try and
> convert it to a URI (but it doesn't say how).

There are various kinds of public identifiers. The thing that makes an
identifier public is generally that it relies on registration with a body,
which is why there is an owner part. (If the document is private or of limited
circulation, this registration can be in-house or by arrangement between the
parties involved rather than by some external convention. SGML FPIs and MIME
media types simplify this by distinguishing "registered" and "unregistered"
owners, where registered ownership guarantees unique naming.)

The kinds of public identifiers around are:

* ISO 9070: uses "::" to delimit a hierarchy of names

* URNs: start with "urn:"

* URIs: you know

* SGML FPIs: formal public identifiers start with "-//"
(unregistered=private), "+//" (registered: ISBN, or IDN for internet, or a
name registered with the designated registration authority) or "ISO" (or
"IEC" or "ISO/IEC")

* MIME media types.
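
As a rough illustration of the list above, here is a Python sketch that guesses
the kind of an identifier from its leading characters. The function name, the
order of the checks, and the two regular expressions are my own assumptions;
none of the specifications mentioned defines such a test, and real identifiers
can be ambiguous.

```python
import re

# Assumed, simplified shapes; neither regex is taken from a specification.
MIME_RE = re.compile(r"[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+")
URI_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:(?!:)")

def classify_identifier(identifier):
    """Heuristically name the kind of public identifier (a sketch only)."""
    text = identifier.strip()
    if text.lower().startswith("urn:"):
        return "URN"
    if text.startswith("-//"):
        return "FPI, unregistered owner"
    if text.startswith("+//"):
        return "FPI, registered owner"
    if text.startswith(("ISO ", "IEC ", "ISO/IEC ")):
        return "FPI, ISO owner"
    if MIME_RE.fullmatch(text):
        return "MIME media type"
    if URI_SCHEME_RE.match(text):
        return "URI"
    if "::" in text:
        return "ISO 9070 name"
    return "unknown"
```

The FPI prefixes are tested before the "::" check because an FPI owner part may
itself contain "::".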

There are moves to extend URN syntax to encompass MIME types. I don't know
the status; perhaps it already does.

It may be surprising that MIME media types are actually public identifiers
currently.  But the RFCs define a mechanism for allowing other "registration
trees" apart from IETF.  One thing this may allow is for an ISO registration
tree:  e.g.  text/iso-8601  (the "-" is a significant delimiter).

Anyway there is a general expectation in XML that system identifiers should
be URIs  and that public identifiers should be SGML FPIs.  However, until
this is defined, there is no choice but to use MIME media types for SYSTEM
identifiers. Even though MIME media types are, strictly speaking, public
identifiers, they belong in the "WWW" slot not the "ISO" slot (i.e. the
SYSTEM identifier not the PUBLIC identifier). I guess there might be
differing views on this as a policy, but there should be an agreed approach.

But for future-proofing, can I suggest that it might be best if software
which interprets the system identifier also accepted whatever the likely
future URN syntax for MIME media types might be: e.g.
	"(urn:.*:)?.*/(.*-)?.*" such as

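To show how that pattern behaves in practice, here is a small Python sketch.
LOOSE is the pattern exactly as written above. STRICT and parse_media_type are
hypothetical additions of mine, not taken from any RFC, and the "urn:x-example:"
prefix in the usage note is made up, since no URN syntax for media types has
been settled.

```python
import re

# The pattern as quoted: optional "urn:...:" prefix, then type "/" subtype,
# with an optional "tree-" prefix on the subtype.
LOOSE = re.compile(r"(urn:.*:)?.*/(.*-)?.*")

# Hypothetical tighter variant (an assumption): each part excludes the
# characters that act as delimiters, so the pieces can be captured.
STRICT = re.compile(r"(?:urn:[^:/]+:)?([^/:\s]+)/(?:([^/\s-]+)-)?([^/\s]+)")

def parse_media_type(system_id):
    """Return (type, tree, subtype), or None if STRICT does not match."""
    m = STRICT.fullmatch(system_id)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

With this sketch, parse_media_type("text/iso-8601") and
parse_media_type("urn:x-example:text/iso-8601") both give
("text", "iso", "8601"). Note that LOOSE, used as a whole-string match, accepts
any single-line string containing a "/", including an ordinary URL, so
software relying on it would probably want something tighter.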
If you write your software so that it accepts the following notation
	"ISO 8601:1998//NOTATION
		Text elements and interchange formats -
		Information interchange -
		Representation of dates and time//EN">
then you would have to make sure it accepted all syntaxes for dates which
that standard defines. If your software only accepted a subset, you would
have to let it accept some other public identifier instead, e.g.
		simple date (subset of ISO 8601:1998)//EN" >
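
A minimal Python sketch of that policy: the software registers a handler only
under the identifier whose full syntax it really supports, and refuses
anything else. The owner "Example" in the identifier, and all the function
names, are hypothetical. Whitespace in the public identifier is normalized
before lookup, as the XML spec requires.

```python
from datetime import date

# Hypothetical identifier for the restricted notation; the owner is made up.
SIMPLE_DATE_FPI = "-//Example//NOTATION simple date (subset of ISO 8601:1998)//EN"

def normalize_public_id(public_id):
    """Collapse runs of whitespace to one space and trim both ends."""
    return " ".join(public_id.split())

def parse_simple_date(text):
    """Accept only the calendar-date form YYYY-MM-DD."""
    year, month, day = text.split("-")  # ValueError unless exactly three parts
    if (len(year), len(month), len(day)) != (4, 2, 2):
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    return date(int(year), int(month), int(day))

# Only the subset notation is registered; the full ISO 8601 identifier is
# deliberately absent because this code does not handle every ISO 8601 form.
HANDLERS = {SIMPLE_DATE_FPI: parse_simple_date}

def parse_with_notation(public_id, text):
    """Dispatch on the notation's public identifier, refusing unknown ones."""
    handler = HANDLERS.get(normalize_public_id(public_id))
    if handler is None:
        raise LookupError(f"unsupported notation: {public_id!r}")
    return handler(text)
```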

I am not sure if ISO 8601 has made it into the current version of ISO/IEC TR
9573-9:1997 "Standardized Data Notation". In ISO 10744 (HyTime) there are
also FPIs for time and distance.

(If you are interested in notations, I give several chapters on them, with
lots of listings for useful and common notations, in my book. I certainly
think that XML-DEV should get behind (Tim Bray's) collection of database
notations which is in XML-data.)

Rick Jelliffe

The XML & SGML Cookbook, by Rick Jelliffe
Charles F. Goldfarb Series on Open Information Management
656 pages + CD-ROM, Prentice Hall 1998, ISBN 0-13-614223-0  > Book Search > "Jelliffe"

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as:
To (un)subscribe, mailto:majordomo at the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at