Namespaces in XML and XHTML

Jon Bosak Jon.Bosak at eng.sun.com
Sun Sep 5 05:10:49 BST 1999


Several people have suggested to me that a recent article of mine on
XML namespaces might serve as a useful reference in the current
discussions of namespaces in XHTML.  An edited version of that article
appears below.  W3C members can find the original, dated 21 August, in
the archives of the W3C XML Plenary (w3c-xml-plenary).

It should be understood that this edited version represents nothing
more than my personal opinion.  It should also be understood that I am
speaking here neither in my role as Chair of the W3C XML Coordination
Group nor in my role as Chair of the W3C XML Plenary, but simply as an
interested individual.  W3C members should read the original to
recover its full context.

With the exception of quotations from other writers, used here by
permission, and excerpts from publicly available documents, this work
is my own.  I must acknowledge the kind assistance of Tim Bray, James
Clark, Dave Hollander, Eve Maler, Murray Maloney, David Megginson,
Makoto Murata, Bill Smith, C. M. Sperberg-McQueen, and Lauren Wood in
preparing this article.  However, this acknowledgement should not be
taken to imply their endorsement of its conclusions, in whole or in
part.  With the exception of material explicitly attributed to others,
all opinions expressed in the following are mine, and I am entirely
responsible for any errors or omissions.

Jon Bosak
Los Altos, California
4 September 1999


-------
Sources
-------

The Namespaces Recommendation is at

   http://www.w3.org/TR/1999/REC-xml-names-19990114/

The best explanation of namespaces was written by James Clark and can
be found at

   http://www.jclark.com/xml/xmlns.htm

James's paper should be required reading for anyone involved in a
discussion of namespaces.

------------------------------
What are namespaces all about?
------------------------------

Namespaces are about unique identification; they are not about
meaning.  Identification is necessary to the establishment of meaning,
but it does not constitute meaning in itself.

-----------------------------
What are namespaces good for?
-----------------------------

It might be thought that mere identification is not worth much, but
the designers of the Namespaces Recommendation thought otherwise.
Namespace declarations give us a way to confer upon element types and
attribute names a unique identity.  This does not tell us anything
about what such names mean, but it does provide a way to unambiguously
associate an unlimited number of different kinds of processes with
objects having the names so distinguished.  It allows us to say, "this
thing here has a certain quality (its type or name) that is identical
to the corresponding quality of the thing in that place over there,"
regardless of where, when, and in what context these things are
actually discovered, together or separately.  On the basis of that
ability, we trust that we can apply processes to elements having a
particular type, or to attributes having a particular name, and be
reasonably sure that such processes will not be applied to other
elements or attributes that seem to have the same type or name but
really belong to some other category.  This is considered to be a Good
Thing.

My favorite description of the function of namespaces comes from one
of the editors of the Namespaces Recommendation, Tim Bray: namespaces
are a device "for allowing programmers to write programs that will
reliably find the items of information that they are designed to
process even in a distributed and heterogeneous environment."

Another, I think equally valid, view of the function of namespaces as
they are defined in the Recommendation was given to me by another one
of the editors, Dave Hollander: allowing us to avoid name collisions
is an essential step toward being able to form new documents
consisting of pieces from other documents without forcing us to change
the element types or attribute names.
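
The collision-avoidance point can be sketched in a few lines of
Python using the standard xml.etree.ElementTree module.  This is only
an illustration: the two namespace URIs and the element names are
hypothetical, and ElementTree's "{uri}local" spelling of a qualified
name is that library's convention, not part of the Recommendation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, invented for this sketch.
BOOK_NS = "http://example.org/ns/book"
PERSON_NS = "http://example.org/ns/person"

# One document built from two vocabularies that both define "title".
doc = f"""
<record xmlns:bk="{BOOK_NS}" xmlns:p="{PERSON_NS}">
  <bk:title>Namespaces Explained</bk:title>
  <p:author>
    <p:title>Dr.</p:title>
    <p:name>A. Writer</p:name>
  </p:author>
</record>
""".strip()

root = ET.fromstring(doc)

# Each process selects elements by namespace name plus local name,
# so it never picks up the other vocabulary's "title".
book_titles = [e.text for e in root.iter(f"{{{BOOK_NS}}}title")]
person_titles = [e.text for e in root.iter(f"{{{PERSON_NS}}}title")]

print(book_titles)
print(person_titles)
```

Neither vocabulary had to rename its "title" element in order to be
combined with the other.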

----------------------------------
What kind of thing is a namespace?
----------------------------------

A namespace is a collection of names.  Each name in this collection is
a pair consisting of the namespace name and a colon-free XML 1.0 name.
The namespace name has the form of a URI reference.  URIs were chosen
for this purpose primarily because they (or at least their URL subset)
are under the control of a naming authority.  A URL in this role is
really performing the function that URNs are intended to perform; URLs
were allowed to serve in this place because it's possible to
administer URLs so that they work as URNs, and because URNs were an
unproven technology.
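
As a minimal sketch of the "pair" idea, the following Python fragment
splits the qualified names reported by xml.etree.ElementTree into
(namespace name, local name) pairs.  The namespace URI and the helper
function universal_name are hypothetical, introduced only for this
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name; it serves only as an identifier.
NS = "http://example.org/ns/books"

def universal_name(tag):
    """Split ElementTree's '{uri}local' tag into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)

# The same namespace, declared once with a prefix and once as a default.
with_prefix = ET.fromstring(f'<b:book xmlns:b="{NS}"/>')
as_default = ET.fromstring(f'<book xmlns="{NS}"/>')

name_a = universal_name(with_prefix.tag)
name_b = universal_name(as_default.tag)

# The prefix is not part of the name; only the pair matters.
print(name_a == name_b)
```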
 
A namespace is an abstract object.  Namespaces live in the same place
as other abstract objects such as the set of letters of the alphabet,
the set of words in the English language, and the set of integers.
The namespace itself is distinct from anything that might be
associated with it and belongs to a level of reality different from
the kinds of things that we typically would want to associate with it.

(Whether names have real existence is an interesting question.
Persons wishing to pursue this matter can start with the article in
any large encyclopedia under the heading "Realism" or "Nominalism,"
where they will discover that this issue, while not completely
resolved, has already received some attention.)

--------------------------------------
What kind of thing is not a namespace?
--------------------------------------

Anything that is not a name is not a part of a namespace; a namespace
can have nothing but names in it.  Here are some important examples of
things that are not namespaces:

 - A list of the names in a namespace

 - A sequence of characters that can be interpreted as a list of the
   names in a namespace

 - A description of the things named by the names in a namespace

 - A set of constraints on the things named by the names in a
   namespace

 - A set of procedures associated with the things named by the names
   in a namespace

-----------------------------------
What does a namespace URI refer to?
-----------------------------------

The URI in a namespace declaration may or may not refer to an actual
resource that can be retrieved by a computer; if it does, the resource
so identified may or may not have useful things to tell us about
things labeled with the names associated with the URI.  The statement
in the Namespaces Recommendation that "It is not a goal that it [the
namespace URI] be directly usable for retrieval of a schema (if any
exists)" is not mere rhetorical fluff but rather represents a concrete
position taken by the XML Working Group after months of debate and in
direct opposition to an equally concrete point of view to the
contrary.  In the words of James Clark's paper (cited above):

   The role of the URI in a universal name is purely to allow
   applications to recognize the name. There are no guarantees about
   the resource identified by the URI.

Some future Namespaces Recommendation _could_ say that namespace URIs
are expected to resolve to a resource or are required to resolve to a
resource.  However, the Namespaces Recommendation that we actually
have chose, very deliberately, to say that the utility of Namespaces
in XML does not depend on what (if any) resource is at the other end
of a URI.

As James notes in the same paper:

   It would of course be very useful to have namespace-aware
   validation: to be able to associate each URI used in a universal
   name with some sort of schema (similar to a DTD) and be able to
   validate a document using multiple such URIs with respect to the
   schemas for all of the URIs. The XML Namespaces Recommendation
   does _not_ provide this.

A good reason for not providing an explicit association between
namespaces and schemas is to allow namespaces to be associated with
things other than schemas and to allow them to be associated with
multiple expressions of the same schema, as for example a schema
expressed as a DTD and the same schema expressed using the eventual
W3C XML Schema language.
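
The point that a namespace URI serves for recognition rather than
retrieval can be illustrated with a short Python sketch.  The URI
below is hypothetical and deliberately uses the reserved ".invalid"
domain, so there is nothing at the other end of it; the element and
attribute names are likewise invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name.  The reserved ".invalid" domain never
# resolves, so no resource can be retrieved from this URI.
NS = "http://example.invalid/ns/inventory"

doc = f'<inv:item xmlns:inv="{NS}" inv:sku="A-1">widget</inv:item>'

# Parsing does not dereference the namespace URI.
root = ET.fromstring(doc)

# The application recognizes names by comparing strings.
recognized = root.tag == f"{{{NS}}}item"
sku = root.get(f"{{{NS}}}sku")

print(recognized, sku)
```

Whether a schema, a DTD, or nothing at all is published at that
address makes no difference to this kind of processing.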

--------------------------------------------------------
What kinds of things can be associated with a namespace?
--------------------------------------------------------

Just about anything can be associated with a namespace.  Some of the
more useful kinds of things that can be associated with namespaces are
stylesheets, Java classes, DTDs, Active-X objects, extended linking
groups, schemas, and prose descriptions.

--------------------------------------------------------------------
Is there a one-to-one correspondence between namespaces and schemas?
--------------------------------------------------------------------

With the understanding that this question has meaning only in the
relatively small class of all possible uses of namespaces in which
they are being used in conjunction with schemas, the answer is no,
there is not a one-to-one correspondence between namespaces and
schemas.  One namespace may be associated with multiple schemas; it
has been common practice for years to apply multiple schemas (DTDs) to
documents in order to check them against different sets of
constraints.  Conversely, it is easy to imagine cases where multiple
namespaces might be used within documents that conform to a single
schema.  David Megginson has provided a simple example:

   a geographical markup language...might use separate Namespaces
   for land elevation, coastline information, and land use to help
   avoid naming conflicts...

And, of course, a DTD can be generated for any arbitrary well-formed
document that happens to include names from multiple namespaces.  This
scenario is admittedly artificial, but it proves that the association
of multiple namespaces with a single schema is always possible.
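
A sketch along the lines of David Megginson's example: one small
document that draws on three namespaces.  The URIs, element names,
and attributes are all hypothetical; the fragment simply lists which
namespace names a single document instance uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names for the three vocabularies.
ELEV = "http://example.org/geo/elevation"
COAST = "http://example.org/geo/coastline"
USE = "http://example.org/geo/landuse"

doc = f"""
<map xmlns:e="{ELEV}" xmlns:c="{COAST}" xmlns:u="{USE}">
  <e:point height="120"/>
  <c:point tide="high"/>
  <u:zone kind="forest"/>
</map>
""".strip()

root = ET.fromstring(doc)

def namespace_of(tag):
    """Return the namespace part of a '{uri}local' tag, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

# Namespace names used anywhere in this one document.
namespaces_used = {namespace_of(el.tag) for el in root.iter()} - {None}

# Local names alone: "point" appears twice and would collide.
local_names = sorted(el.tag.split("}", 1)[-1] for el in root.iter())

print(len(namespaces_used))
print(local_names)
```

A single set of constraints written for this document type would
necessarily cover names from all three namespaces.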

Since there is no necessary association between a namespace and
anything in particular, there is no necessary association between a
namespace and a schema, and it follows from this that all arguments
based on the assumption of a necessary mapping from a single namespace
to a single schema are invalid on their face.  This includes in
particular the argument that XHTML should use more than one namespace
because it specifies more than one DTD.  This is not to say that the
conclusion that XHTML should have multiple namespaces is false; it's
just to say that you can't adduce a one-to-one mapping between
namespaces and schemas as a premise for that argument.

-------------------------------------
Should XHTML use multiple namespaces?
-------------------------------------

I agree with David Megginson that "the HTML WG [should] maintain a
single HTML Namespace as long as practical and find another mechanism
to indicate flavours and versioning."  Among the reasons that can be
offered for this view, I find the following most convincing:

 - The main argument for specifying three namespaces for XHTML rests
   on the assumption that there is a one-to-one association between
   namespaces and schemas.  This is not true.

 - A second argument for specifying three namespaces is that it's
   intended to indicate that XHTML actually specifies three different
   tag languages and that <h2> in one of these languages means
   something basically different from <h2> in the other two.  In my
   opinion, <h2> means basically the same thing in all three
   versions.  This is why we call all three of them "XHTML."  HTML 4
   has three DTDs, too, but no one has suggested that HTML 4 is
   actually three different languages.  Lauren Wood has pointed out
   to me that SoftQuad's HTML authoring tool has used something like
   15 different HTML DTDs in the lifetime of the product.  It would
   seem strange to say that the people using the product were
   actually working in 15 different languages, all of them called
   HTML.  This is not what we usually mean by the word "language."  I
   think that the HTML WG should reconsider whether it's really
   defining three different languages, being careful to distinguish
   between a machine-readable structural description such as that
   provided by a DTD and a complete human-readable statement of its
   meaning such as that provided by a complete and accurate prose
   description.  If the HTML WG decides to maintain the position that
   XHTML is defining three different languages, then it should be
   ready to explain how an <h2> in one would materially differ in
   meaning from an <h2> in another, "meaning" here being expressed in
   terms of the intention of the person who causes elements to have
   the type "h2."

 - Another way of making what I believe to be essentially the same
   point is that distinctions between a strict <h2> and a
   transitional <h2> are not reflected in actual machine processing
   outside of validation.

 - If XHTML really is several languages and the similarly named
   elements of those languages really are different from each other,
   then those different languages are going to require different HTML
   DOMs.
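
The point about machine processing outside of validation can be
sketched as follows.  This is an illustration under stated
assumptions, not a description of any actual XHTML tool: it assumes a
single namespace URI for XHTML (the value of XHTML_NS below), and the
DOCTYPE declarations and helper functions are chosen for the example.
The parser used here does not validate, so the DTD named in the
DOCTYPE plays no part in the processing.

```python
import xml.etree.ElementTree as ET

# Assumed single namespace name for all XHTML flavours.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative DOCTYPE declarations for two flavours.
STRICT = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
          '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
TRANSITIONAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')

def make_doc(doctype):
    """Build a tiny document that differs only in its DOCTYPE."""
    return (f'{doctype}<html xmlns="{XHTML_NS}">'
            f'<head><title>t</title></head>'
            f'<body><h2>Section</h2></body></html>')

def headings(xml_text):
    """Collect the text of every h2 in the assumed XHTML namespace."""
    root = ET.fromstring(xml_text)
    return [h.text for h in root.iter(f"{{{XHTML_NS}}}h2")]

strict_h2 = headings(make_doc(STRICT))
transitional_h2 = headings(make_doc(TRANSITIONAL))

print(strict_h2 == transitional_h2)
```

With one namespace, the same heading-extraction code serves both
flavours; with one namespace per flavour, the code would need to know
about each of them to find what is, to the author, the same element.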

Jon

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1