Why namespaces?

Fri Sep 3 14:51:02 BST 1999

Mark Birbeck wrote:
> 
> I understand "XHTML inside another document" to be something different,
> and is illustrated by the MathML example in the proposal. As with any
> mixing of vocabularies in a document, it requires a namespace for each
> vocabulary. XHTML 1.0 *is* intended to address that, although through no
> fault of its own, it cannot create combined documents that can be
> validated.

I must point out that the SGML architecture mechanism enables exactly
the ability to mix element types from different "vocabularies" while
retaining the ability to validate the document against any of the
vocabularies in isolation. 

Documents are mapped to architectures (document type *definitions*).
Each architectural mapping defines a "view" over the document that
reflects only those parts of the document mapped to the architecture.
This view, which is logically another XML document, can then be
validated against the architectural DTD without interference from any
other view.  This "view" idea in architectures is very similar to the
concept of views in relational databases: it's an abstraction of a
document expressed as another document, just as a database view is an
abstraction over a set of tables expressed as another table or set of
tables.

There are a few constraints, the most severe being that each different
view must be rooted at the document element (and in the examples below
I've invented an "xlinkdoc" element type so that XLink can be used as an
architecture--XLink editors please take note).

In the abstract processing model for architectures, each view over the
base document maintains pointers back to the base document--imagine a
DOM environment where you have the base DOM tree and then one additional
DOM tree for each different architectural view where each architectural
DOM node points back to the base ("client") node from which it was
constructed, and vice versa (each base node points to all of the
architectural nodes constructed from it). This enables a processor to
work in any and all views and navigate to the others.

For example, you might find the HTML view DOM, walk about in it until
you find something of interest, then navigate back to the base element
to get its local details, and go from there. Or you might walk the base
DOM and check all the architectural mappings (follow the pointers to its
architectural DOM nodes) to see if it's mapped to anything you
recognize.  The key is that you have all the data all the time--you
don't just create the architectural view and throw away the base
document. That's why you can't think of architectures as simply a
transform in the A-in-B-out sense (although, as Paul Prescod points out,
architectural mappings are transforms in the mathematical sense).

If I'm using XHTML and XLink in the same document, the same base element
could be both an XHTML "a" element and an XLink "simple" at the same
time, which I might indicate like so:

<xhtml:a xlink:type="simple" href="#something">this is a link no matter
how you slice it</xhtml:a>

[or like so:

<xlink:simple (oops nothing defined for doing it this way! Help! My
document's been taken over by an autocratic spec hobbled by an
overly-simple mapping mechanism that imposes element type names on me
with no recourse.)>]

From an architecture viewpoint, the local element type "xhtml:a" is
mapped to two architectural elements: "a" in the XHTML architecture and
"simple" in the XLink architecture. From a DOM perspective, this could
be represented like so:

(Within the same overall DOM memory space:)

Base DOM:
  ...
  Node [node000] Class = ElementNode
     TagName = "xhtml:a"
     ArchitecturalNodes: node001, node002

XHTML View DOM:
  ...  
  Node [node001] Class = ElementNode
     TagName = "a"
     ClientNode: node000

XLink View DOM:
  ...
  Node [node002] Class = ElementNode
     TagName = "simple"
     ClientNode: node000

A processor can easily see that the base element "xhtml:a" is both an
HTML A element and an XLink simple element and do whatever processing it
thinks is appropriate.  [Note that for this to be completely rigorous,
there must be some formal and machine-validatable binding between each
DOM and its governing architecture (see below)]. 
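
The linked-node picture above can be sketched in a few lines of Python.
This is only an illustration under my own assumptions: the Node class,
its field names, and the map_to helper are hypothetical and come from no
DOM binding or architecture engine. It shows a base node that knows its
architectural nodes and architectural nodes that point back to their
client node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """An element node in the base DOM or in an architectural view (hypothetical)."""
    tag_name: str
    # On architectural nodes: the base ("client") node it was constructed from.
    client: Optional["Node"] = field(default=None, repr=False)
    # On base nodes: architecture name -> architectural node built from it.
    architectural: Dict[str, "Node"] = field(default_factory=dict, repr=False)


def map_to(base: Node, arch: str, arch_tag: str) -> Node:
    """Create an architectural node for `base` and link the two in both directions."""
    view_node = Node(tag_name=arch_tag, client=base)
    base.architectural[arch] = view_node
    return view_node


node000 = Node("xhtml:a")                      # base DOM
node001 = map_to(node000, "xhtml", "a")        # XHTML view DOM
node002 = map_to(node000, "xlink", "simple")   # XLink view DOM

# Start in the XHTML view, hop back to the base node, then over to the XLink view.
print(node001.client.tag_name)                          # xhtml:a
print(node001.client.architectural["xlink"].tag_name)   # simple
```

Because the base node is never thrown away, a processor holding any one
of the three nodes can reach the other two.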

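With the architecture mechanism I can also use my own element type names
and still have a clear mapping. E.g., I could do this:

<mylink xhtml="a" xlink="simple" href="#something">this is a link no
matter how you slice it</mylink>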

Note that you have *exactly the same information* about the element
"mylink" that you had about the "xhtml:a" element, namely that it is
both an "A" element in XHTML and a "simple" element in XLink. Actually
you have more information, because it's also a "mylink". The syntactic
difference is trivial: I've moved the mapping to XHTML from the GI to
another attribute. But I've also freed up the local GI to be whatever I
want it to be.

One way to think of namespace prefixes on element types is as a
shorthand for attribute-based architectural mapping. 

Note, for example, that XLink *cannot require* the use of namespace
prefixes because one of its requirements is that you can use your own
element type names and map them to the XLink semantics (that is, it
allows you to have a "mylink" element and not always use
"xlink:simple"). This is, I think, the general case, which of course the
namespace mechanism cannot support.  [Which suggests that namespace
syntax must be a special case of a more general mechanism which the W3C
has not yet defined.]

Alternatively, the XHTML spec could define "A" as a specialization of
"simple" (which simple was designed to enable), which would make the
architectural DOMs for "mydoc" look like this:

(Within the same overall memory space:)

Base DOM:
  ...
  Node [node000] Class = ElementNode
     TagName = "mylink"
     ArchitecturalNodes: node001

XHTML View DOM:

  Node [node001] Class = ElementNode
     TagName = "A"
     ClientNode: node000
     ArchitecturalNodes: node002

XLink View DOM:

  Node [node002] Class = ElementNode
     TagName = "simple"
     ClientNode: node001

That is, I can have multiple levels of specialization: mylink ISA "A"
ISA "simple" (therefore, mylink ISA simple).
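
As a rough sketch of how a processor might follow such a chain of
specializations, assume a simple lookup table from each name to the form
it specializes. The SPECIALIZES table and both function names are
hypothetical, and a real system would keep one such table per
architecture rather than one flat dictionary.

```python
from typing import Dict, List, Optional

# Hypothetical table: each name maps to the form it specializes, if any.
SPECIALIZES: Dict[str, Optional[str]] = {
    "mylink": "A",       # local element type -> XHTML form
    "A": "simple",       # XHTML form -> XLink form
    "simple": None,      # top of the chain
}


def isa_chain(name: str) -> List[str]:
    """Return `name` followed by every form it specializes, nearest first."""
    chain = [name]
    while SPECIALIZES.get(chain[-1]) is not None:
        chain.append(SPECIALIZES[chain[-1]])
    return chain


def isa(name: str, ancestor: str) -> bool:
    """True if `name` is `ancestor` or a (possibly indirect) specialization of it."""
    return ancestor in isa_chain(name)


print(isa_chain("mylink"))       # ['mylink', 'A', 'simple']
print(isa("mylink", "simple"))   # True
```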

The architectural mapping rule is that elements in the base document
that are not explicitly mapped to an architecture are simply ignored for
that view. Consider this document:

<?xml version="1.0"?>
<?IS10744 arch name="xhtml" system-id="http://w3.org/.../PR-XHTML.xml"?>
<?IS10744 arch name="xlink" system-id="http://w3.org/.../DR-XLink.xml"?>
<mydoc xhtml="html" xlink="xlinkdoc">
  <foo>This is purely mine</foo>
  <metadata xhtml="header">
    this is mine and xhtml's
  </metadata>
  <stuff xhtml="body">
    <p xhtml="p"><mylink xhtml="a" xlink="simple" href="foo">both HTML's
and XLink's</mylink></p>
  </stuff>
</mydoc>

We now have three views of this document:

1. The base view, looking only at the element type names. These map to
my private semantics, whatever they might be.

2. The HTML view:

  <html><header></header><body><p><a href="foo">both HTML's and
XLink's</a></p></body></html>

  This maps to the XHTML semantics.

3. The XLink view:

  <xlinkdoc><simple href="foo">both HTML's and
XLink's</simple></xlinkdoc>

  This maps to the XLink semantics.

With appropriate pointers back and forth, of course.
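
Here is a minimal sketch of deriving the HTML and XLink views from the
document above by keying off the mapping attributes, using Python's
standard xml.etree.ElementTree. The function names are my own, the set of
architecture names is passed in by hand rather than read from the
IS10744 PIs, and the sketch builds element structure only: character
data and the back-pointers to the base nodes are left out. It also
assumes that every attribute other than a mapping attribute is carried
into the view, which is a simplification.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

DOC = """<mydoc xhtml="html" xlink="xlinkdoc">
  <foo>This is purely mine</foo>
  <metadata xhtml="header">this is mine and xhtml's</metadata>
  <stuff xhtml="body">
    <p xhtml="p"><mylink xhtml="a" xlink="simple" href="foo">both HTML's
and XLink's</mylink></p>
  </stuff>
</mydoc>"""


def _copy_attrs(src: ET.Element, dst: ET.Element, arch_names: Set[str]) -> None:
    # Assumption: carry over every attribute that is not a mapping attribute.
    for key, value in src.attrib.items():
        if key not in arch_names:
            dst.set(key, value)


def _add_children(base_parent: ET.Element, view_parent: ET.Element,
                  arch: str, arch_names: Set[str]) -> None:
    for child in base_parent:
        mapped = child.get(arch)
        if mapped is None:
            # Not mapped: ignored in this view, but its descendants may be mapped.
            _add_children(child, view_parent, arch, arch_names)
        else:
            view_child = ET.SubElement(view_parent, mapped)
            _copy_attrs(child, view_child, arch_names)
            _add_children(child, view_child, arch, arch_names)


def build_view(base: ET.Element, arch: str, arch_names: Set[str]) -> Optional[ET.Element]:
    """Return the view for `arch`, or None if the document element is not mapped."""
    root_name = base.get(arch)
    if root_name is None:
        return None  # each view must be rooted at the document element
    root = ET.Element(root_name)
    _copy_attrs(base, root, arch_names)
    _add_children(base, root, arch, arch_names)
    return root


def outline(elem: ET.Element):
    """Nested (tag, children) tuples, for a compact look at the structure."""
    return (elem.tag, [outline(child) for child in elem])


ARCH_NAMES = {"xhtml", "xlink"}
base = ET.fromstring(DOC)
xhtml_view = build_view(base, "xhtml", ARCH_NAMES)
xlink_view = build_view(base, "xlink", ARCH_NAMES)
print(outline(xhtml_view))
print(outline(xlink_view))  # ('xlinkdoc', [('simple', [])])
```

Note how "foo" disappears from both views, while "mylink" surfaces
directly under "xlinkdoc" in the XLink view because none of its
ancestors below the document element are mapped to XLink.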

The "<?IS10744 arch" PIs in the example above are the "architecture use
declarations" and do essentially what the namespace declarations do
except they do more:

1. They define a local name for the architecture (analogous to the
namespace prefix). This name is then taken to be the name of an
attribute that defines the mapping to names in the architecture (e.g.,
"xhtml='A'").

Note that when DTD declarations are present, these mappings can be fixed
in the DTD rather than having to be specified on each element instance,
which can be a significant space savings in non-trivial documents. You
can't do this with namespaces.

2. They point to the resource that serves to define the architecture as
a whole (in this case I've pointed to the W3C-managed specs for XHTML
and XLink, which are the one true definitions of these architectures).
This binds the document to a set of *semantic* definitions. The
architecture mechanism does not (and cannot) say how the semantics are
defined (apart from presuming some use of prose). Therefore, this could
be any number of things, formal and informal, including XML Schemas,
program code, UML drawings, EXPRESS schemas, whatever.

3. To enable architectural validation, they can point to a set of DTD
declarations:

<?IS10744 arch name="xhtml" 
  system-id="http://w3.org/.../PR-XHTML.xml"
  dtd-system-id="http://w3.org/.../xhtml-strict.dtd"
?>

Note that in this example I've got one architecture, "XHTML", but I'm
suggesting that there are different architectural DTDs that I can choose
to validate against.  This reflects my view that there is one set of
general semantics with different structural constraints that the
architecture allows.

This part of the declaration binds the document to a machine-readable
set of names (the names defined in the DTD declarations). That is, if I
use "xhtml='foo'" I can now use an automatic and completely generic
process to detect that the name "foo" is not in the XHTML vocabulary
(that is, I don't have to have built-in knowledge of XHTML rules to do
this validation, unlike namespaces, where there is no defined mechanism
for defining vocabularies and therefore no way to have a generic
vocabulary validator).

The architecture standard stipulates DTD-syntax declarations because
that's all we had, but once XML Schemas are standardized by the W3C,
they would be an alternative--the key is machine processibility and
validatability.

Note that there is no requirement that the DTD declarations be fetched
unless validation is actually requested--this has the interesting side
effect of letting you eat your DTD-less document and still have the
cake of DTD-full validation.  I think that's pretty cool.

4. To enable attribute name mapping, you can declare the name of an
attribute for remapping attribute names unique to that architecture (by
default, attribute names map to architecture names if they're the same
as in the architectural DTD--the use of explicit mappings eliminates the
need to read the architectural DTD or build knowledge of it into an
architecture-aware processor):

<?IS10744 arch name="xhtml" 
  system-id="http://w3.org/.../PR-XHTML.xml"
  dtd-system-id="http://w3.org/.../xhtml-strict.dtd"
  arch-renamer-att="xhtml-names"
?>

There's more stuff you can specify, mostly having to do with controlling
various optional features that enable automapping of names to
architectures in the context of DTD-full processing, so these features
are really not practical in a general XML environment.
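
To make the generic vocabulary check from point 3 concrete, here is a
small Python sketch under stated assumptions. It reads the
pseudo-attributes of an architecture use declaration with a regular
expression, collects the element type names declared in a set of DTD
declarations, and reports any mapped name that is not declared. The
function names are hypothetical, TOY_DTD is a made-up stand-in that
mirrors the names in my example (it is not the real XHTML DTD), and the
regular expressions cover only the simple declaration forms shown here.

```python
import re
from typing import Dict, Set

PSEUDO_ATT = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')
ELEMENT_DECL = re.compile(r'<!ELEMENT\s+([\w.:-]+)')


def parse_arch_pi(pi_data: str) -> Dict[str, str]:
    """Parse the data part of an '<?IS10744 arch ...?>' PI into a dict."""
    return dict(PSEUDO_ATT.findall(pi_data))


def declared_names(dtd_text: str) -> Set[str]:
    """Element type names declared in a block of DTD declarations."""
    return set(ELEMENT_DECL.findall(dtd_text))


def unknown_names(used: Set[str], dtd_text: str) -> Set[str]:
    """Mapped names that the architectural DTD does not declare."""
    return used - declared_names(dtd_text)


# Hypothetical stand-in for the architectural DTD fetched via dtd-system-id.
TOY_DTD = """
<!ELEMENT html (header, body)>
<!ELEMENT header (#PCDATA)>
<!ELEMENT body (p)*>
<!ELEMENT p (#PCDATA | a)*>
<!ELEMENT a (#PCDATA)>
"""

decl = parse_arch_pi(
    'arch name="xhtml" '
    'system-id="http://w3.org/.../PR-XHTML.xml" '
    'dtd-system-id="http://w3.org/.../xhtml-strict.dtd"'
)
print(decl["name"])                                       # xhtml
print(unknown_names({"html", "body", "foo"}, TOY_DTD))    # {'foo'}
```

Nothing in the check knows anything about XHTML itself; it works from
the declarations alone, which is the point.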

Architecture-aware processing is already implemented for SAX through the
SaxArch package and has been in James Clark's SP parser for years (and
therefore, in anything built from SP as long as the facility is
exposed).  

Because architectural mappings are defined using attributes, there's
absolutely no difficulty in writing code that constructs architectural
views in a DOM-type setting--it's just normal XML processing where you
key off of attributes.  I have personally implemented architecture-aware
processing in a variety of processing contexts, including ADEPT command
language, Perl, Python, VB, etc.

Finally, note that the features provided by SGML architectures through
the use of attributes could be provided in a much more convenient and
complete way in an XML Schema mechanism--I fully expect the final XML
Schema mechanism to do this and will not consider any mechanism that
doesn't do this to be either complete or useful for my needs as a
practitioner and integrator. That's because *none* of the problems I
solve in my work can be solved effectively or completely without the use
of architectures for the simple reason that the information systems my
clients have and work with are complex and require a sophisticated
representation system. Namespaces are of little or no value in these
environments. They don't hurt, particularly, but they don't help either
and therefore simply add unnecessary complexity to an already complex
system.

Cheers,

E.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1