[Fwd: ATTN: Please comment on XHTML (before it's too late)]

Sun Aug 29 15:31:41 BST 1999

This is from Eliot Kimber (<eliot at isogen.com>):

Oren Ben-Kiki wrote:

> > It's no more of a "threat" than the "threat" of people creating their own
> > namespaces for any purpose, which is indeed the entire idea behind
> > namespaces.
>
> That depends on what you feel "the entire idea of namespaces" to be. To me,
> the main idea is to allow applications to distinguish between tags with
> different semantics. By qualifying a tag with a namespace, the document
> writer essentially informs the application that the semantics of the tag is
> that associated with the namespace. The fact that this semantics is defined
> outside the XML standards is beside the point.

No--namespaces have *absolutely nothing* to do with semantics. The only
possible purpose of namespaces is to disambiguate names. *ANY
ASSUMPTIONS MADE ABOUT SEMANTICS BASED ONLY ON NAME ARE UNJUSTIFIED AND
UNSUPPORTABLE*. For example, I may have many names in different name
spaces that map to the same semantic--there is no way to describe this
using name spaces alone.  A given name may have different semantics
based on its context (either its structural context or its use context).

Because the name space mechanism provides neither a formal binding
between name space names and semantic definitions nor a binding between
name space names and vocabulary definitions, it is impossible to make
*ANY* reliable inferences about the meanings of names in an XML document.
That is, you either know what the semantics of a particular name are or
you do not. The binding is always to the entire name, not just to the
namespace prefix.  Making assumptions based on the prefix alone is at
best a guess.

First, unless you personally defined the name space, you have no way of
knowing if a given name is in fact a "valid" name in the name space
because there is no definition of how one defines the set of names in a
name space. If I find the name "myspace:foo" there is no standardized
way to validate that "foo" is a member of the name space "myspace"
because there is no standardized definition mechanism for the vocabulary
of which "foo" may or may not be a member.

Thus, all the name-space prefix is doing is ensuring that "myspace:foo"
will not collide with any other name that ends in "foo". AND THAT IS
ALL.
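A small sketch of that collision-only role, assuming Python's standard xml.etree.ElementTree module as the parser (a tool choice, nothing the namespace spec prescribes) and made-up namespace names. ElementTree reports each element name as "{namespace name}local name", which shows three things at once: the same local name "foo" in two namespaces yields two distinct names, two different prefixes bound to one namespace yield the same name, and the parser accepts any local name whatsoever, because there is no vocabulary definition to check it against.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names. Prefixes "a" and "b" are bound to the
# same namespace; "y" is bound to a different one.
doc = """<root xmlns:a="urn:example:myspace"
              xmlns:b="urn:example:myspace"
              xmlns:y="urn:example:yourspace">
  <a:foo/><b:foo/><y:foo/><a:anything-at-all/>
</root>"""

root = ET.fromstring(doc)

# ElementTree reports each element name as "{namespace name}local name";
# the prefix is gone once it has been resolved.
names = [child.tag for child in root]
print(names)
# names is now:
# ['{urn:example:myspace}foo', '{urn:example:myspace}foo',
#  '{urn:example:yourspace}foo', '{urn:example:myspace}anything-at-all']
```

Nothing in the output says what any of these names mean; it only says which of them are the same name.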

Namespaces were intended to solve the problem of *name collision*, which
they do. But they explicitly do not have anything to do with binding
names to semantics and therefore you are *never* justified in inferring
semantics from namespace use.  You may know that a given name space *has
been bound* to a given set of semantics, but that's different. This
knowledge of the binding comes from some mechanism outside the namespace
mechanism itself [see below].

By this reasoning, it doesn't matter how many different name-space
prefixes XHTML uses because *none of them* give you a way to know that
what you are processing is in fact an XHTML document (or XHTML-specific
element). Rather, the binding between documents and their *governing
semantic definitions* (e.g., schemas, architectures, etc.) must be
provided by some other mechanism. In the absence of a generalized
mechanism for doing this binding, it can only be done in documentation
of the semantics.

Another thing to keep in mind is that there is not necessarily a
one-to-one binding of schemas to name spaces (or name spaces to
schemas).  The same abstract types could be mapped to many different
names (short and long, English and French, domain A and domain B,
enterprise 1 and enterprise 2, etc.). The same names could be mapped to
*different* semantics in different contexts (the element type
"myns:employee" maps to both "person" in schema A and "bold" in schema
B).  Assumptions about name-to-semantic bindings seem to be based on the
idea that there is always an exact one-to-one binding of names to
semantics. But of course this is not always (or even usually) the case.

For example, I would probably want to provide different name lengths or
national language bindings for the same abstract element types. Thus, I
would have one overall abstract schema, "MyElementTypes", and several
name spaces that provide specialized names for elements derived from the
abstract types.
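One way to picture that arrangement is a lookup table from (namespace name, local name) pairs to abstract types. This is a hypothetical sketch: the namespace names, local names and type names are invented for illustration. The point is that the table is written by someone who has read the schema documentation; it cannot be derived from the namespaces themselves.

```python
# Hypothetical namespace names for three sets of names derived from
# one abstract schema, "MyElementTypes".
EN_SHORT = "urn:example:myelementtypes:en-short"
EN_LONG = "urn:example:myelementtypes:en-long"
FR = "urn:example:myelementtypes:fr"

# The name-to-type binding lives in this table, which someone wrote after
# reading the schema documentation; nothing here comes from the namespaces.
MY_ELEMENT_TYPES = {
    (EN_SHORT, "p"): "Paragraph",
    (EN_LONG, "paragraph"): "Paragraph",
    (FR, "paragraphe"): "Paragraph",
    (EN_SHORT, "pers"): "Person",
    (EN_LONG, "person"): "Person",
    (FR, "personne"): "Person",
}

def abstract_type(ns, local):
    """Map a (namespace name, local name) pair to its abstract element type."""
    try:
        return MY_ELEMENT_TYPES[(ns, local)]
    except KeyError:
        raise LookupError(f"no known binding for {{{ns}}}{local}") from None

print(abstract_type(FR, "paragraphe"))  # Paragraph
```

Six distinct names, three namespaces, two abstract types: the mapping is many-to-one, and an unknown pair is simply unbound.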

Thus, while name spaces are mildly useful for disambiguating names, they
can do nothing, by themselves, to provide a reliable or complete binding
of names to semantics and therefore provide no basis for inferring
semantics based on name space alone.

> So according to this idea, applications are built under the assumption that
> 'my:foo' and 'your:foo' are completely different, with nothing whatsoever in
> common. The fact they both have the name 'foo' is considered accidental.
> _That's_ the whole idea.

But this assumption is completely unfounded--"my:foo" and "your:foo"
could in fact be mapped to exactly the same semantic--there is no way to
know from the namespace usage itself and nothing in the namespace spec
justifies the single mapping assumption.

> Providing three different namespaces which have the same semantics would
> force application writers to abandon this assumption. In XHTML,
> 'traditional:p', 'strict:p' and 'frameset:p' are the same thing. This would
> seriously mess XHTML applications up - put another way, it would cause
> generic XML applications to fail on XHTML documents.

Why would three name spaces cause more failures than one name space?
Either you know what the names mean or you don't. In this case, all I
have to do is know that the names in all three spaces map to the same
base type. Since there's no W3C-defined mechanism for this, the authors
of the XHTML spec can define an obvious one: use the base name as the
base type. Once I've implemented this mapping in my code, there's no
problem (unless someone uses a bogus base type name, which of course
there's no way to formally validate in the namespace universe).  That
is, my code looks like this:

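# The namespace strings are placeholders for the three XHTML namespace
# names; get_ns_name, get_base_part and the apply_* functions are assumed.
XHTML_NAMESPACES = {
    "XHTML strict ns URN",
    "XHTML traditional ns URN",
    "XHTML frameset ns URN",
}

def process_element(node):
    nsp = get_ns_name(node.tagName)    # namespace name of the element
    gi  = get_base_part(node.tagName)  # local name, i.e. the base type
    if nsp in XHTML_NAMESPACES:
        apply_XHTML_semantics(node)    # same semantics for all three
    elif nsp == "Some other namespace":
        apply_someother_semantics(node)
    else:
        raise UnknownNamespaceException(node)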

I don't see where the problem is, unless the concern is the amount of
typing one has to do.
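A runnable variant of the function above, as a sketch only: it assumes Python's xml.etree.ElementTree, uses made-up namespace names in place of the real XHTML ones, and substitutes a hypothetical split_name helper for get_ns_name and get_base_part and the built-in LookupError for UnknownNamespaceException.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace names standing in for the three XHTML namespaces.
XHTML_NAMESPACES = {
    "urn:example:xhtml:strict",
    "urn:example:xhtml:traditional",
    "urn:example:xhtml:frameset",
}

def split_name(tag):
    """Split ElementTree's "{ns}local" form into (ns, local); ns is None if absent."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag

def process_element(elem):
    """Return (semantics, base type) for an element, keyed on its namespace."""
    ns, local = split_name(elem.tag)
    if ns in XHTML_NAMESPACES:
        # All three namespaces share one set of base types: the local name.
        return ("xhtml", local)
    raise LookupError(f"unknown namespace: {ns!r}")

doc = """<div xmlns:s="urn:example:xhtml:strict"
             xmlns:t="urn:example:xhtml:traditional">
  <s:p/><t:p/>
</div>"""
results = [process_element(child) for child in ET.fromstring(doc)]
print(results)  # [('xhtml', 'p'), ('xhtml', 'p')]
```

Three namespaces cost one set-membership test; "strict:p" and "traditional:p" come out as the same base type.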

[But what it looks like to me is that they really have three different
*DTDs* (or rather, architectures) for the same base names. If this is in
fact the case, then the XHTML authors have inappropriately confused name
spaces with DTDs and they should fix that.  In fact, I think there are
four architectures at work here: the base architecture that defines the
types and imposes the minimal structural rules, then three derived
architectures, one each for "strict", "traditional", and "frameset",
which impose different detailed structural rules on documents.  There is
no way, using W3C-defined mechanisms alone, to define this system today
(you can do it with SGML Architectures). This may change when the XML
Schema work is finished if (and only if) it satisfies the same
requirements for type classification and constraint that SGML
Architectures satisfy (ideally it would satisfy more than what SGML
Architectures satisfy, but I'll settle for just having the equivalent of
architectures).]
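The four-architecture arrangement can be modelled very roughly as one set of base types plus three rule sets that restrict it. This is a hypothetical sketch: the type lists are illustrative and not taken from any XHTML draft, and conforms() checks only which types appear, not content models. What it shows is that "p" is the same base type under every derived architecture; only the constraints differ.

```python
# Hypothetical model: one base architecture defines the element types and
# each derived architecture restricts which of them may be used.
# The type lists are illustrative only.
BASE_TYPES = {"html", "head", "title", "body", "p", "font", "frameset", "frame"}

DERIVED = {
    "strict": BASE_TYPES - {"font", "frameset", "frame"},
    "traditional": BASE_TYPES - {"frameset", "frame"},
    "frameset": set(BASE_TYPES),
}

def conforms(element_types, derived):
    """True if every element type used is permitted by the derived architecture."""
    allowed = DERIVED[derived]
    return all(t in allowed for t in element_types)

doc_types = ["html", "body", "p", "font"]
print(conforms(doc_types, "strict"))       # False: "font" is excluded
print(conforms(doc_types, "traditional"))  # True
```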

> For example, consider that a generic XML application must never mix up a
> 'commercial:order' with an 'administrative:order', no matter what.

You say "must": do you mean that in the absolute "law of nature" sense
or in the "for this example, this is the business rule that applies"
sense? If the former, then the use of "must" is entirely unfounded.
Maybe "commercial:order" and "administrative:order" are in fact
specializations of a more general type "order" and there are processing
contexts in which they *must* be processed in exactly the same way.
Without knowing the semantics of all three types, there's no way to know
what the business rules are, but in any case, the business rules cannot
be inferred from the use or non-use of name spaces.
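A sketch of that possibility, with made-up namespace names: the business rules, recorded outside the namespace mechanism as a table, declare both element types to be specializations of a general "order" type. One processing context (a hypothetical audit) keys on the general type and treats them identically; another (a hypothetical fulfilment step) keys on the full qualified name and keeps them apart.

```python
# Hypothetical namespace names.
COMMERCIAL = "urn:example:commercial"
ADMINISTRATIVE = "urn:example:administrative"

# The business rules, not the namespaces, declare both element types to be
# specializations of one general "order" type.
GENERAL_TYPE = {
    (COMMERCIAL, "order"): "order",
    (ADMINISTRATIVE, "order"): "order",
}

def audit_key(name):
    """In this (hypothetical) audit context, all kinds of order are handled alike."""
    return GENERAL_TYPE.get(name, "other")

def fulfilment_key(name):
    """In this (hypothetical) fulfilment context, the full name decides."""
    return name

c, a = (COMMERCIAL, "order"), (ADMINISTRATIVE, "order")
print(audit_key(c) == audit_key(a))            # True
print(fulfilment_key(c) == fulfilment_key(a))  # False
```

The same two names are "the same" in one context and "different" in the other; neither answer comes from the namespaces.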

> On the other hand, one would expect that a 'strict:p' element would be
> interchangeable with a 'traditional:p' element. For example, in an XHTML
> editor, I'd expect to be able to cut one and paste it in replacement of
> another. That seems like a messy issue, unless I'm missing something.

Whether it is meaningful or not to replace one element type with another
can only be defined at the schema or application level. The use or
non-use of name spaces cannot tell you that. The mess is no different
from knowing whether or not "p" and "pre" are interchangeable.

A name is just an identifier and in the absence of a formal, verifiable
binding of names to semantics you cannot make any inferences from the
names. The fact that we have a body of knowledge about what we think "p"
means is a red herring. The only way to know if "strict:p" is
interchangeable with "traditional:p" is to read the XHTML docs: they are
the only place the semantics could possibly be defined, because
documentation is the only defining mechanism we have at the moment. So
there's no mess, because you *always* have to read the docs.

Now, the docs can say "there is a binding between the names in namespace
X and the semantic types defined in this document"--that's fine, but
that is not a computer processible statement--it's a directive to
programmers and document authors.  But since you can't know about this
statement unless you've read the docs first, if you see the namespace
first and make assumptions about semantic bindings, you are living
dangerously at best and may make wildly incorrect or inappropriate
inferences at worst.  The first thing you must do when you see a new
name space is chase down *all semantic documentation* that references
that name space to see what the possible semantics are.

Of course, this is impossible in the general case *BECAUSE THERE'S NO
BINDING FROM NAME SPACE NAMES TO SEMANTIC DEFINITIONS*. Oops. That is,
unless you *already know* what semantics are bound to a given name
space, you cannot find it out reliably.

Here's an experiment: find *the complete list* of semantic bindings for
these name spaces:

xmlns = "urn:schemas-microsoft-com:xml-data"
xmlns:dt = "urn:schemas-microsoft-com:datatypes"
xmlns:xa = "www.extensibility.com/schemas/xdr/metaprops.xdr"

I want documents, formal, machine-readable specifications, etc. such
that there can be no argument about what the set of valid bindings is.
I believe it is impossible to do.

Either the name space declaration must also bind to one or more semantic
definitions, or the document must bind to semantic definitions and then
bind those definitions to name spaces.

With the SGML Architecture mechanism defined in ISO/IEC 10744:1997, you
have the first: an architecture use declaration binds to both a semantic
definition (the architecture documentation) and a name-space definition
(the architectural DTD, which serves to define a vocabulary of element
types and attribute names). Local names are bound to architectural names
as part of the element type definition.  The same local name can be
bound to any number of architectures and multiple local names can be
bound to a single architectural name.  Name collisions from different
architectures are handled by using different local names (that is, given
two architectures that both define the element type "p", I might use the
local names "p1" and "p2", each mapped to the appropriate architectural
"p").
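The many-to-many relation described above can be mimicked with a plain table. This is only a loose Python model: the architecture names and local names are hypothetical, and in ISO/IEC 10744 the bindings are declared in the DTD (typically as fixed attributes on each element type) rather than in a program. It shows one local name bound to two architectures, several local names bound to one architectural type, and "p1"/"p2" resolving the collision between two architectures that both define "p".

```python
# Loose model only: architecture names and local names are hypothetical.
# Each local element type lists, per architecture, the architectural
# element type it is bound to.
LOCAL_TO_ARCH = {
    "p1": {"archA": "p"},
    "p2": {"archB": "p"},                          # archB also defines "p"
    "para": {"archA": "p", "archC": "paragraph"},  # one local name, two architectures
    "note": {"archA": "p"},                        # several local names, one archA type
}

def architectural_name(local, arch):
    """Architectural type that `local` is bound to in `arch`, or None."""
    return LOCAL_TO_ARCH.get(local, {}).get(arch)

def local_names_for(arch, arch_type):
    """Sorted local names bound to the given architectural type."""
    return sorted(n for n, b in LOCAL_TO_ARCH.items() if b.get(arch) == arch_type)

print(architectural_name("p2", "archB"))  # p
print(local_names_for("archA", "p"))      # ['note', 'p1', 'para']
```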

I mention architectures merely as an example of an existing,
standardized mechanism for solving the name-to-semantic binding problem.
I would expect the eventual XML Schema mechanism to provide the same
sort of mechanism that is at least as complete as the SGML Architecture
mechanism and, hopefully, more complete and convenient to use (the SGML
Architecture mechanism is limited by the fact that we had to do
everything within the constraints of DTD syntax).

> > If three namespaces present such an insurmountable problem, perhaps again,
> > the current "implementation" of namespaces is at fault.
>
> The problem is not with the namespaces implementation (or definition, or
> design). It is with using them to a different purpose than they were
> designed for.

Namespaces were designed for exactly one purpose: to lexically
disambiguate names within the global name space of URN-identified
things. They do that.  They do nothing else. Therefore, the XHTML use of
name spaces, whatever it is, must be correct.

NOTE: I have no opinion on XHTML's use or non-use of multiple name
spaces. It is entirely irrelevant to the usability or processibility of
XHTML documents. Fundamentally, there is no difference between
"strict:p" (or rather "urn:xmlnamespace:XHTML:strict:p") and
"urn:xmlnamespace:XHTML:strictp". They are both unique names from which
you can infer exactly the same amount of semantic information, which is
to say, none.  In both cases, I have to know, as an author and
programmer, what the element type means and the *only way* to know that
is to read the XHTML spec. Once I've read the spec, the names used in
documents are irrelevant as long as the mapping is implemented
correctly.  At best, the use of name spaces can provide a convenient
memory jog for remembering what the mapping is.

Cheers,

Eliot

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1