XHTML and the Three Namespaces

Andrew Layman andrewl at microsoft.com
Tue Sep 21 21:43:03 BST 1999


Here is my understanding of the XHTML namespace problem and its proposed
solution:

There is a vocabulary and syntax called "Strict". There is another called
"Transitional", which includes elements and attributes with the same names
as those in Strict, plus some additional elements and/or attributes. The
content models and attribute lists of Transitional are such that any
document valid under "Strict" is also valid under "Transitional". There is
also a third grammar called "Frameset", which stands in a similar superset
relation to Transitional.
 
One thing that people would like is to be able to clearly define which
documents are valid per the Strict, Transitional and Frameset rules. This is
currently done via three DTDs.

Another thing people would like is to be able to indicate in a document
which set of rules the document is intended to conform to. This is done by
giving each of the three grammars a namespace, and saying that the elements
in each namespace are to be validated against the syntax in the DTD
corresponding to that namespace.

This avoids having namespace transitions within a document (as would
presumably occur if the Transitional namespace contained only those elements
additive to Strict). It also permits each grammar/syntax to be described by
a DTD, which matters because DTDs do not handle mixing of namespaces
gracefully. Finally, it allows existing HTML documents, which carry no
indication of namespaces and consequently no namespace transitions, to be
interpreted according to the broadest grammar.

But it has the drawback that an element name from Strict and the matching
element name from Transitional have different URIs, so a namespace-capable
processor will treat them as different element types.
 
The actual difference between the elements is not substantially in their
meaning, but only in their content models and attribute lists.

That is, the intention of the three definitions of various elements is that
the more elaborate definition permits more elaborate content, but that an
element conforming to Strict would have the same processing consequences as
one conforming to Transitional, excepting only that a slightly different
validation test would have been performed.

Browsers, and other software specifically designed to deal with XHTML, could
deal with this fairly easily by hard-coding the relationship between the
definitions attending the three namespaces. However, generic software would
find no machine-readable connection between the namespaces, and this would
lead to awkwardness. For example, a search for documents with "A" tags that
specified the Strict namespace would miss all documents containing
Transitional "A" tags.
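
To make the search example concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree. The two namespace URIs are placeholders
invented for the illustration, not the URIs proposed for XHTML, and
find_anchors and HARD_CODED_EQUIVALENTS are hypothetical names. It shows a
search keyed on the Strict name missing the Transitional document, and then
the kind of hard-coded mapping that XHTML-aware software could apply.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, invented for this sketch.
STRICT = "urn:example:xhtml:strict"
TRANSITIONAL = "urn:example:xhtml:transitional"

strict_doc = ET.fromstring(
    f'<html xmlns="{STRICT}"><body><a href="one.html">one</a></body></html>'
)
transitional_doc = ET.fromstring(
    f'<html xmlns="{TRANSITIONAL}"><body><a href="two.html">two</a></body></html>'
)


def find_anchors(root, namespaces):
    """Return every 'a' element whose namespace is one of those given."""
    found = []
    for ns in namespaces:
        # ElementTree spells a qualified name as "{namespace-uri}local-name".
        found.extend(root.iter(f"{{{ns}}}a"))
    return found


docs = (strict_doc, transitional_doc)

# A generic search keyed on the Strict name sees only the Strict document.
strict_only_hits = [len(find_anchors(d, [STRICT])) for d in docs]

# XHTML-aware software can hard-code that both names mean the same thing.
HARD_CODED_EQUIVALENTS = [STRICT, TRANSITIONAL]
aware_hits = [len(find_anchors(d, HARD_CODED_EQUIVALENTS)) for d in docs]

print(strict_only_hits, aware_hits)
```

The second search succeeds only because the list of equivalent namespaces was
written into the program by someone who already knew the relationship.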

Are three namespaces the right answer?  Here is a provisional phrasing of
the problem we need to solve: How can we reliably distinguish elements
requiring slightly different processing, while at the same time permitting
them to be processed similarly to the degree that the differences do not
matter?

Let us look at an example to clarify the issue.  Generalizing, suppose we
have an element such as 

  <a:X>
    <Y/>
    <Z/>
  </a:X>

in one instance, and an element such as

  <b:X>
    <Y/>
    <Z/>
  </b:X>

in another.  Although 'b' might permit additional subelements in the
content, they are not in fact there in this instance.  We intend that, in
nearly all respects, a:X and b:X are processed as equivalent.

At the same time, b:X in another instance document could appear as

  <b:X>
    <Y/>
    <Z/>
    <W/>
  </b:X>

The content model of b:X permits subelements not permitted in a:X.
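
The three instances above can be checked mechanically. The following is only
a sketch under loud assumptions: the namespace URIs are made up, and each
content model is reduced to a set of permitted child names, which ignores the
ordering and cardinality that a real DTD content model would express. The
names CONTENT_MODELS, make_x, is_valid and local_name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up namespace URIs standing in for 'a' and 'b'.
A = "urn:example:a"
B = "urn:example:b"

# Permitted child names per element type; b:X allows all that a:X does, plus W.
CONTENT_MODELS = {
    f"{{{A}}}X": {"Y", "Z"},
    f"{{{B}}}X": {"Y", "Z", "W"},
}


def make_x(prefix, uri, children):
    """Build an X element in the given namespace with the given child markup."""
    return ET.fromstring(
        f'<{prefix}:X xmlns:{prefix}="{uri}">{children}</{prefix}:X>'
    )


def is_valid(element):
    """True if every child is permitted by the element's own content model."""
    allowed = CONTENT_MODELS[element.tag]
    return all(child.tag in allowed for child in element)


def local_name(element):
    """Strip the '{uri}' part that ElementTree puts in front of the name."""
    return element.tag.rsplit("}", 1)[-1]


a_x = make_x("a", A, "<Y/><Z/>")
b_small = make_x("b", B, "<Y/><Z/>")
b_big = make_x("b", B, "<Y/><Z/><W/>")
a_with_w = make_x("a", A, "<Y/><Z/><W/>")

results = [is_valid(e) for e in (a_x, b_small, b_big, a_with_w)]
print(results)
print(a_x.tag == b_small.tag, local_name(a_x) == local_name(b_small))
```

The first two instances have identical content and the same local name, yet
their qualified names differ; only the validation outcome for content
containing W actually depends on which namespace was used.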

So I ask myself "Is this problem unique to XML, or has it appeared in other
contexts, and if so, how was it solved there?"

What I notice is that a very similar issue appears in languages such as Java
or C++, and is solved in the following manner:

// File a/X.java
package a;

public class X {
  Object Y;
  Object Z;
}

// File b/X.java
package b;

class X extends a.X {
  Object W;
}

From this I conclude that if we had a way to declare the extended content
model of B as an extension of that of A, then we would be able to express,
in a machine-readable form, the relation between b:X and a:X.
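
As a rough sketch of what such a declaration might let generic software do,
the following represents the "extends" relation as plain data. Everything
here is hypothetical: the URIs, the DECLARATIONS table and its "extends" and
"adds" keys are invented for illustration and do not correspond to any
existing or proposed schema syntax.

```python
# Element types identified by (namespace URI, local name); URIs are made up.
A_X = ("urn:example:a", "X")
B_X = ("urn:example:b", "X")

# Hypothetical declaration table: each element type names the type it
# extends (or None) and the child elements it adds to that type's content.
DECLARATIONS = {
    A_X: {"extends": None, "adds": ["Y", "Z"]},
    B_X: {"extends": A_X, "adds": ["W"]},
}


def ancestry(element_type):
    """Return the type followed by every type it extends, nearest first."""
    chain = []
    while element_type is not None:
        chain.append(element_type)
        element_type = DECLARATIONS[element_type]["extends"]
    return chain


def is_kind_of(element_type, base_type):
    """True if element_type is base_type or is declared to extend it."""
    return base_type in ancestry(element_type)


def permitted_children(element_type):
    """Collect permitted children, starting from the most basic type."""
    children = []
    for t in reversed(ancestry(element_type)):
        children.extend(DECLARATIONS[t]["adds"])
    return children


print(is_kind_of(B_X, A_X), is_kind_of(A_X, B_X))
print(permitted_children(B_X))
```

With a table like this available, a generic search for a:X could also accept
b:X without any XHTML-specific knowledge being compiled into the program.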

Given that, it would be proper to have three namespaces, each designating a
slightly different set of validation rules.

So our present difficulty appears to be a timing problem: the three
namespaces distinguish the different validation rules of the three
categories of elements, but there is at present no machine-readable way to
express their relationship.  What we have now is readable by humans, but not
by validation programs, and what we will have eventually that is machine
readable is still under design by the Schemas working group.

Three namespaces, and the consequent mapping and other conversion
processing, are certainly more expensive than a single namespace. Yet that
expense must be compared against alternatives that actually solve the
problem we set out to address: How to reliably distinguish elements
requiring slightly different processing, while at the same time permitting
them to be processed similarly to the degree that the differences do not
matter?

Certainly, interpreting a document incorrectly because we did not read the
relevant definitions and mappings is not attractive.  Nor is it attractive
to label different element types indistinguishably so that the relevant
definitions cannot be determined.

Of the alternatives that I have seen, only the proposal for three distinct
namespaces seems to have sufficient information in it.  Perhaps I have
overlooked a proposal that also works, but at this point I conclude that the
burden of proof should rest with those who assert that the three-namespace
approach is faulty, and any such proof should include a demonstration of a
workable, better alternative approach that actually solves the same problem.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1