More on Namespaces (also long, but also optimistic)

Sun Aug 16 17:26:06 BST 1998

Peter Murray-Rust wrote:
> 
> There seems to be a certain amount of gloom over namespaces. I think this

I hope that neither the tone nor the content of my posts has contributed to
this impression. Upon further reflection, I am actually more pleased with the
present draft than with the former version. Since the recent draft eliminates
the namespace PI, and makes no claim as to the universal name which corresponds
to a given name in the DTD, it no longer requires that a prefix/URI binding
include the entire DTD within its scope. Presuming that the application is
informed of the declarations, this permits an application to employ an
analogous PI to rebind prefixes as needed to be able to perform the operations
which you identified as STAGE7/c.

> [...]
>
> STAGE 4 and 5

These sections discuss issues of DTD semantics - in particular reuse, model
combination, inclusion, and evolution - which arise even in cases which
exhibit no name ambiguity. Shouldn't they be titled something like 'STAGE I'
and 'STAGE II' to indicate that they have no direct relation to namespaces?

> [...]
>
> STAGE7
> 
> If there *is* a tag collision, then one of the DTDs has to yield its
> prefix. An example might be:
> 
> <VanityHG:Garden>
> [...]
> </VanityHG:Garden>
> 
> This is a syntactic problem and requires tools. The tools have to do the
> following:
>         - recognise the names from each DTD in a document and convert them
>         - recognise the same names in the DTD and convert them in precisely the
> same manner
> 
> We still don't need namespaces at this stage, just unique names. But we do
> need the tools. And these tools may have to convert DTDs quite often and
> keep track of which ones are used.

This point is not clear. You describe exactly the purpose which namespaces (in
general) serve. Given attribute-based bindings, there is no reason to yield
the prefix.

> 
> The problems ONLY arise when we start trying to put some semantics/meaning
> on the names. Since we don't have much experience at this for unqualified
> names it's not surprising we find it hard. The main namespace problems
> therefore are:
>         (a) - what does FOO mean? (in <FOO> and FOO="baz")
>         (b) - can I attach meaning through algorithms (schemas, stylesheets, Java)
>         (c) - can I identify the same FOO in different documents even if it has
> different prefixes?

(c) is incomplete: both sameness and difference matter to identity, and
intra-document identity matters as well.
          (c.i)  - can I identify the same FOO even if it occurs with
                   different prefixes?
          (c.ii) - can I identify different FOOs in a given document even if
                   they occur with the same prefix?
> 
> Namespaces ONLY address the third concern. This may not even be important
> if (a) and (b) are not solved.

If (c) is not addressed, it is not possible to address (a) and (b). One cannot
say that the something in the given examples means "what FOO means" until one
can say that the something "is" "FOO".
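
To make the two halves of (c) concrete, here is a minimal Python sketch. It
assumes that the identity of a name is the pair (URI, local part) and that the
bindings in effect at some point are available as a plain prefix-to-URI table.
The function name and the example.org URIs are hypothetical; none of this is
taken from the draft.

```python
def universal_name(qname, bindings):
    """Return the (uri, local) pair for a possibly prefixed name.

    `bindings` maps prefix -> URI; the null prefix is the empty string.
    An unbound prefix yields None as the URI.
    """
    prefix, _, local = qname.rpartition(":")
    return (bindings.get(prefix), local)


# Hypothetical bindings in effect at three points of one document.
at_point_1 = {"a": "http://example.org/ns1"}
at_point_2 = {"b": "http://example.org/ns1"}
at_point_3 = {"a": "http://example.org/ns2"}

# (c.i): different prefixes, same FOO.
same = universal_name("a:FOO", at_point_1) == universal_name("b:FOO", at_point_2)

# (c.ii): same prefix, different FOO.
different = universal_name("a:FOO", at_point_1) != universal_name("a:FOO", at_point_3)
```

Only once both comparisons can be made is it meaningful to ask what FOO means.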

Enabling architectures, for example, provide the ability to remap attributes.
Despite the fact that the word "namespace" never appears in the standard
document, the remapping mechanism performs exactly that function.

> 
> [There is another aspect to namespaces - scoping. IMO this is simply a
> minimisation procedure.

The issues of scope and extent are not directly related to minimisation. That it
is permitted to bind the null prefix is related. Once that is permitted, the
given rules for the scope and extent of that binding determine whether and
which universal name corresponds to a given unprefixed name. This is, however,
only a specific instance of their application.

>                        Personally I think it's unnecessary and dangerous

If a syntactic form is to have an effect with respect to other forms within
its document and over a process, it is necessary to specify that part of the
document within which the effect is observed (the scope) and the duration
over which the effect holds (the extent).
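
As an illustration of scope, a processor might keep a stack of binding tables
and push or pop one table per element. This is only a sketch under that
assumption; the class, its method names and the example.org URIs are
hypothetical, and it is not the algorithm of any draft.

```python
class BindingStack:
    """Prefix bindings whose scope is the element that declares them."""

    def __init__(self):
        self._frames = [{}]  # outermost frame: nothing is bound

    def enter(self, declared):
        """Start of an element: inherit enclosing bindings, add its own."""
        frame = dict(self._frames[-1])
        frame.update(declared)
        self._frames.append(frame)

    def leave(self):
        """End of the element: its bindings go out of scope."""
        self._frames.pop()

    def resolve(self, prefix):
        """URI bound to `prefix` at this point, or None if unbound."""
        return self._frames[-1].get(prefix)


stack = BindingStack()
stack.enter({"": "http://example.org/outer"})
outer = stack.resolve("")      # null prefix bound by the outer element
stack.enter({"": "http://example.org/inner"})
inner = stack.resolve("")      # rebound within the inner element
stack.leave()
restored = stack.resolve("")   # the outer binding holds again
```

The same unprefixed name thus corresponds to different universal names
depending on where it occurs, which is why the scope has to be specified.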

> [...]
> 
> There is a real problem with multiDTD documents. If we have a document
> which reasonably includes:
>         - DC
>         - RDF
>         - DCD
>         - XSL
>         - XLL
>         - HTML/IBTWSH
>         - Application1
>         - Application2
> I cannot see, with the best will we have, how we can possibly build a DTD
> that can validate this (unless it's a VERY formal document - legal, patent,
> safety, etc.) It's an n-squared problem.

It is not an n^2 problem. It is an O(n) problem. Each DTD-(fragment) must be
permitted to bind its own prefixes to its own URI(s). Each document must be
permitted to bind those URIs which it uses to prefixes which it chooses. This
can be accomplished by partitioning the names into as many regions as there
are URIs, and binding the URIs to the specified prefixes for the duration of
the parse of the respective entity (document entity, external subset, internal
subset, external parsed entities). At any given time, there are at most n
effective bindings, where n is the total number of namespaces which appear in
the document.
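
The partitioning can be sketched in Python as follows. The sketch assumes that
each entity is handed over together with the bindings which hold while it is
parsed; the name lists are illustrative only and do not reproduce the content
of any real DTD, and the function name is hypothetical.

```python
VANITY = "http://VanityHouseAndGarden.org.uk"
VHG = "http://vhg.org.uk"

# Each entity carries the bindings that hold for the duration of its parse.
entities = [
    ({"": VANITY}, ["Garden", "Vegetable"]),            # first DTD fragment
    ({"": VHG}, ["termEntry", "term", "definition"]),   # second DTD fragment
    ({"VanityHG": VANITY, "VHG": VHG},                  # document entity
     ["VanityHG:Garden", "VHG:termEntry", "VHG:term"]),
]


def partition_names(entities):
    """Group local names by URI, resolving each name with its own entity's bindings."""
    regions = {}
    for bindings, names in entities:
        for name in names:
            prefix, _, local = name.rpartition(":")
            regions.setdefault(bindings[prefix], set()).add(local)
    return regions


regions = partition_names(entities)
```

There is one region per URI, and no entity ever needs more bindings than there
are URIs, so the bookkeeping grows linearly with the number of namespaces.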

This permits a form of "qualified" validation. "Qualified", since it could be
performed by a processor application only given the universal names. That is,
the stages need to be
 parse -> namespace-filter -> validate
whereby the namespace filter can actually be integrated into the parser.
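
A rough Python sketch of these three stages, written as chained generators,
might look as follows. It assumes a parser which reports start tags as simple
events and a single binding table per entity; the function names, the event
shape and the undeclared element "Fruit" are inventions for the example.

```python
def parse(tags):
    """Stand-in for a parser: yields ('start', qname) events."""
    for qname in tags:
        yield ("start", qname)


def namespace_filter(events, bindings):
    """Rewrite each qualified name to a (uri, local) universal name."""
    for kind, qname in events:
        prefix, _, local = qname.rpartition(":")
        yield (kind, (bindings.get(prefix), local))


def validate(events, declared):
    """Return the universal names that were used but never declared."""
    return [name for kind, name in events
            if kind == "start" and name not in declared]


VANITY = "http://VanityHouseAndGarden.org.uk"
dtd_bindings = {"": VANITY}          # the DTD uses unprefixed names
doc_bindings = {"VanityHG": VANITY}  # the document chooses its own prefix

declared = {name for _, name in
            namespace_filter(parse(["Garden", "Vegetable"]), dtd_bindings)}

undeclared = validate(
    namespace_filter(
        parse(["VanityHG:Garden", "VanityHG:Vegetable", "VanityHG:Fruit"]),
        doc_bindings),
    declared)
```

The validating stage never sees a prefix. It compares universal names only,
which is what makes the validation "qualified".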

>                                            So I don't accept that 'namespaces
> have broken validation' but rather that complex monolithic XML documents
> are inherently unvalidatable except for expensive vital 'in-house'
> requirements.

'Namespaces' cannot "break validation". Ambiguous names preclude anything
other than trivial validation. Given ambiguous names, either the failure to
match or the resulting duplicates lead to trivially invalid documents. The
present draft does not "fix validation" because it does not ensure a 1-1
correspondence between names in the DTD and universal names, but it can't be
said to "break" it. The previous draft (if followed strictly wrt. the
placement of namespace pi's) actually did break it.

I suggest an alternative to your current approach. If one assumes that the
respective DTDs are present at the locations given below, and that the names
in the respective DTD, for this example, are unqualified (I understood that
they do not reference each other), then the following is sufficient to encode
the names unambiguously and to permit a processor to determine the qualified
validity of the document.

Note that the binding which affects the VHG:termEntry needs to be in the
containing element. Otherwise a consistent mechanism cannot be used to
determine its corresponding universal name unambiguously, given possible
attribute defaults.

<?xml version="1.0"?>
<!DOCTYPE VanityHG [
  <!ENTITY % VanityDTD
           SYSTEM "http://VanityHouseAndGarden.org.uk/DTD/VHG.dtd" >
  <!ENTITY % VHGDTD
           SYSTEM "http://VHG.org.uk/DTD/VHG.dtd" >
  <!-- assert the prefix for Vanity... here -->
  <?namespace  prefix=''  ns="http://VanityHouseAndGarden.org.uk" ?>
  %VanityDTD;
  <!-- assert the prefix for VHG... here -->
  <?namespace  prefix=''  ns="http://vhg.org.uk" ?>
  %VHGDTD;
  <!-- I suggest that the scope of the preceding namespace end here -->
]>

<?namespace prefix='VanityHG' ns="http://VanityHouseAndGarden.org.uk" ?>
<VanityHG:Garden xmlns:VHG="http://vhg.org.uk">
  <VanityHG:Vegetable>carrot</VanityHG:Vegetable>
  <VanityHG:Vegetable href="#artichoke01">artichoke</VanityHG:Vegetable>
  <VHG:termEntry id="artichoke01">
    <VHG:term>Jerusalem artichoke</VHG:term>
    <VHG:definition xml:lang="en">A  delicious tuber which causes
flatulence</VHG:definition>
    <VHG:seeAlso href="#artichoke02">globe artichoke</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="artichoke02">
    <VHG:term>globe artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious thistle-like
plant</VHG:definition>
    <VHG:seeAlso href="#artichoke01">Jerusalem artichoke</VHG:seeAlso>
  </VHG:termEntry>
</VanityHG:Garden>

If the parser reports attribute and element declarations, a processor would be
able to validate this document. The application would use the namespace
instructions, in addition to the namespace attributes, as the basis to rewrite
ALL names to universal names in a manner similar to that described by Mr Bray
some time back. That it is possible to assert prefix bindings for the DTD
solves the problem left open in his description. 

The document might not be "xml-1.0 valid", since
1. the DTD could contain more than one element declaration for the same
qualified name;
2. there may be duplicate attribute declarations with conflicting constraints
for a given qualified name;
3. the DTD may contain no element or attribute declarations for a given
qualified element or attribute name.

Despite this, it would be possible to determine conformance, since
1. it would be possible to assert that there should be a 1-1 correspondence
between the universal names which correspond to element tag names and those
which correspond to the names appearing in element and attlist declarations;
2. it would be possible to assert the same for element tag names and the names
which appear in content models;
3. it would be possible to do the same for attribute names in tags and those
which appear in attlist declarations;
4. if 1-3 were satisfied by the specific document entity and DTD, then it is
possible to determine whether attribute and element content comply with the
DTD's constraints.
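
The correspondence checks in 1-3 can be sketched in Python as below. The
sketch assumes that the names from the declarations and from the tags have
already been rewritten to (URI, local part) pairs; the function names and the
sample data are hypothetical, and the content checking of step 4 is left out.

```python
from collections import Counter

VANITY = "http://VanityHouseAndGarden.org.uk"
VHG = "http://vhg.org.uk"


def check_correspondence(declared, used):
    """Compare declared and used universal names.

    `declared` is a list, since duplicate declarations are significant;
    `used` is any iterable of universal names found in tags.
    """
    counts = Counter(declared)
    return {
        # names with more than one declaration
        "duplicates": {name for name, n in counts.items() if n > 1},
        # names which appear in tags but in no declaration
        "undeclared": set(used) - set(counts),
    }


def conforms(report):
    """True if every used name has exactly one declaration."""
    return not report["duplicates"] and not report["undeclared"]


# Illustrative data only.
declared = [(VANITY, "Garden"), (VHG, "term"), (VHG, "term")]
used = [(VANITY, "Garden"), (VHG, "seeAlso")]
report = check_correspondence(declared, used)
```

The same function can be applied separately to element names, to names in
content models and to attribute names, one call for each of 1-3.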

No gloom and doom here. I'm actually very optimistic.
Given this result, it is now possible to think about STAGES I and II.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/