More on Namespaces (long, but optimistic)

Peter Murray-Rust peter at ursus.demon.co.uk
Sat Aug 15 13:01:29 BST 1998


There seems to be a certain amount of gloom over namespaces. I think this
is because they appear to offer lots of exciting new possibilities, but
it's not clear how or whether these will work. There is a lot we can do
without namespaces, or by using them in simple ways, and I have tried to
separate out some of the parts of the confusion. Personally I remain very
optimistic.

At 17:53 14/08/98 -0500, len bullard wrote:
[...]
>
>HTML spawned a community that believed it could do multimedia without 
>code.  While that group may mainly only consist of marketing wonks 
>now, we have to deal with the issue that XML 1.0 is just the syntax 
>piece of a much broader system that taken altogether is more 
>complicated than SGML because it attempts to do more than SGML 
>attempts.

Len has put this precisely and accurately. The point is that in principle
XML holds out the opportunity to combine information components from
multiple sources. Even more, it suggests that we may be able to do this
without specific coding for each type of component. Put another way, the
dream is that an XML-unaware person (client) can receive a complex
multicomponent document and it will automatically 'do what is required'
without effort on the client's part. That is a year or two away at least,
but many of us are pointing in that direction. It's enormously ambitious
and will change the human race. What is described here is more modest.

We have to address lesser dreams at present. The perceived problem is, I
think, that the current namespace draft might give the impression that the
dream is realisable today. We all know it will require software to
make it work but above all it will require a change in the way that we
think communally. [An analogy is e-mail - how long has it taken to become
universal? How many directors still get their secretaries to type up their
e-mail?]

I suspect that implementing the whole of the namespace draft - with its
links into the other required components (RDF, XSL, XLL, XPointer, DOM,
DCD) - is going to take a considerable time to work through. Since these are
parallel activities it's not easy (or possibly desirable) to plan
everything to the last nut and bolt. I am confident that - as with HTML -
locally optimal solutions for important problems will be found. Some will
be elegant, some will be kludgy. Many will depend on communally available
software. The more of this we can stimulate here, the more experiments can
be tried.

I'd like to suggest a spectrum of activities where prefixes and possibly
namespaces should be quite tractable.

Let me make it clear that I am in favour of XML validation in the right
circumstances. As evidence of commitment I have authored an XML DTD for
terminology (VHG, http://www.vhg.org.uk) and I use it to validate
documents. Validation matters since I'm working on an industrial strength
project with it. Essentially the DTD functions as a contract, especially
where components have to be handed over, or signed off. I'll try to show
how it may incorporate namespaces. 

STAGE 1

When I started, namespaces were still under wraps, so early versions looked
like:
<termEntry>
  <term>XML</term>
  <definition>The world's number one markup language</definition>
</termEntry>

and a number of glossaries - which contained only VHG element types - were
created. They validate satisfactorily, can be updated, and continue to
validate after updating.

These glossaries can be used in standalone mode (e.g. for reference or
bedtime reading). So the full power of parsers, stylesheets, JUMBO, etc.
can be unleashed on them *now*. We should not underestimate the importance
of well-crafted, standalone validatable monoDTD XML documents. 

The next step was that the glossaries might contain information from other
DTDs, such as HTML for hypertext, or molecules. There are two possibilities:
	- planned and regulated
	- unplanned and unregulated
By planned and regulated I mean that every instance contains a precisely
known set of elementTypes from 2 (or more) DTDs in known contexts. It would
therefore be possible to write:

STAGE 2

<termEntry>
  <term>XML</term>
  <definition><P>The world's <B>number one</B> markup
metalanguage</P></definition>
</termEntry>

where <P> and <B> are from the HTML DTD (say 2.0), or IBTWSH. Since there is no
tag collision between VHG and HTML (deliberate!) we don't have a problem.
We can construct a content model for <definition> which is validatable.

<!ELEMENT definition (#PCDATA | P | B ...)*>
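As a rough illustration of what this content model permits, here is a
minimal Python sketch that checks the direct children of <definition>
against the two element types named above. It is a hand-rolled check, not a
DTD validator (Python's standard library has no validating parser), and the
names ALLOWED_IN_DEFINITION and disallowed_children are invented for the
example. Whatever the "..." in the content model stands for is left unfilled.

```python
import xml.etree.ElementTree as ET

# Only the element types actually named in the content model above;
# whatever the "..." stands for is deliberately not guessed at.
ALLOWED_IN_DEFINITION = {"P", "B"}

ENTRY = """<termEntry>
  <term>XML</term>
  <definition><P>The world's <B>number one</B> markup
metalanguage</P></definition>
</termEntry>"""


def disallowed_children(entry_xml):
    """Return the tags of direct <definition> children outside the allowed set."""
    root = ET.fromstring(entry_xml)
    return [
        child.tag
        for definition in root.iter("definition")
        for child in definition
        if child.tag not in ALLOWED_IN_DEFINITION
    ]


# An empty list means no unexpected element types were found.
print(disallowed_children(ENTRY))
```

A real validating parser would also check the content of <P> and <B>
themselves; this sketch only looks one level down.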

Now, suppose I have a tag collision between VHG and HTML. I can then use
prefixed elementNames, e.g.

STAGE 3

<VHG:termEntry>
  <VHG:term>XML</VHG:term>
  <VHG:definition><HTML:P>The world's <HTML:B>number one</HTML:B> markup
language</HTML:P></VHG:definition>
</VHG:termEntry>

Note - I don't think I need to have any namespace declarations yet, do I?
This is well-formed, validatable XML. The content models include:

<!ELEMENT VHG:definition (#PCDATA | HTML:B | HTML:P ...)*>
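The claim that the STAGE 3 instance is well-formed without any declarations
can be tried out with a parser that does no namespace processing. The sketch
below uses Python's xml.parsers.expat, where namespace processing is off
unless a separator is passed to ParserCreate. The helper element_names is
invented for the example, and the namespace-aware branch reflects how
namespace-aware parsers generally behave, not anything in the current draft.

```python
import xml.parsers.expat

STAGE3 = """<VHG:termEntry>
  <VHG:term>XML</VHG:term>
  <VHG:definition><HTML:P>The world's <HTML:B>number one</HTML:B> markup
language</HTML:P></VHG:definition>
</VHG:termEntry>"""


def element_names(text, namespace_aware=False):
    """Parse text and return the element names in document order."""
    names = []
    if namespace_aware:
        # With a separator set, expat does namespace processing and
        # rejects prefixes that have no declaration in scope.
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Plain XML 1.0 parsing: a colon is just another name character.
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(text, True)
    return names


print(element_names(STAGE3))
try:
    element_names(STAGE3, namespace_aware=True)
except xml.parsers.expat.ExpatError as err:
    print("namespace-aware parse failed:", err)
```

So the prefixed names are fine as plain XML 1.0 names; it is only software
that insists on interpreting the colon that needs declarations.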

STAGE 4

Now, suppose I want to include molecules in <VHG:note>. Example:

<!ELEMENT VHG:note (#PCDATA | CML:Molecule | HTML:B | HTML:P ...)*>

I have the following possibilities:
	- think of all possible content, hardcode it into the DTD and ban anything
else until the next revision.
	- revise the DTD every time I want to include a new type. 
	- use a content model of ANY
	- use XLink to link to the actual information, e.g.:

<!ELEMENT VHG:note (#PCDATA | VHG:Link | HTML:B | HTML:P ...)*>

where VHG:Link is a cunning pointer to another document and includes the
attributes show="embed" actuate="auto"

Although we have prefixed names we still haven't used namespaces.

Now, suppose someone else wishes to use VHG terminology in their own
document. They want to write something like:

STAGE 5

<Order>
  <Para>Please send five hundred <Link href="#widget02">widgets</Link> by
tomorrow.</Para>
  <VHG:termEntry id="widget01">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en">A specification for part of a graphics
screen</VHG:definition>
    <VHG:seeAlso href="#widget02">Beer widget</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="widget02">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en-GB">A gasification apparatus in a beer
can</VHG:definition>
    <VHG:seeAlso href="#widget01">Screen widget</VHG:seeAlso>
  </VHG:termEntry>
</Order>

If the creator of this document wants it to be validatable (I shall use
'validatable' to mean 'something more than ANY') they have to think where the
VHG stuff is going to occur in the document.  If the VHG stuff matters (and
it probably does since its role is to make precise identification) then
this effort has to be taken. XLink remains an alternative throughout:

STAGE 5a

<Order>
  <Para>Please send five hundred <Link
href="terms.xml#widget01">widgets</Link> by tomorrow.</Para>
</Order>

terms.xml
<VHG:VHG>
  <VHG:termEntry id="widget01">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en">A specification for part of a graphics
screen</VHG:definition>
    <VHG:seeAlso href="#widget02">Beer widget</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="widget02">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en-GB">A gasification apparatus in a beer
can</VHG:definition>
    <VHG:seeAlso href="#widget01">Screen widget</VHG:seeAlso>
  </VHG:termEntry>
</VHG:VHG>
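To make the STAGE 5a link concrete, here is a minimal Python sketch of how a
reference such as terms.xml#widget01 might be followed. It assumes that the
attribute named id really is an ID, that the fragment simply names such an
id, and that the DOCUMENTS dictionary stands in for fetching terms.xml over
the network. The names DOCUMENTS, definitions_by_id and resolve are invented
for the example; this is not the XLink processing model.

```python
import xml.parsers.expat

TERMS_XML = """<VHG:VHG>
  <VHG:termEntry id="widget01">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en">A specification for part of a graphics
screen</VHG:definition>
    <VHG:seeAlso href="#widget02">Beer widget</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="widget02">
    <VHG:term>widget</VHG:term>
    <VHG:definition xml:lang="en-GB">A gasification apparatus in a beer
can</VHG:definition>
    <VHG:seeAlso href="#widget01">Screen widget</VHG:seeAlso>
  </VHG:termEntry>
</VHG:VHG>"""

# Hypothetical stand-in for retrieving documents by name.
DOCUMENTS = {"terms.xml": TERMS_XML}


def definitions_by_id(text):
    """Map each VHG:termEntry id to its whitespace-normalised definition."""
    found = {}
    state = {"entry": None, "in_definition": False}

    def start(name, attrs):
        if name == "VHG:termEntry":
            state["entry"] = attrs.get("id")
            found[state["entry"]] = ""
        elif name == "VHG:definition":
            state["in_definition"] = True

    def end(name):
        if name == "VHG:definition":
            state["in_definition"] = False

    def chars(data):
        if state["in_definition"] and state["entry"] in found:
            found[state["entry"]] += data

    parser = xml.parsers.expat.ParserCreate()  # namespace processing off
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return {key: " ".join(value.split()) for key, value in found.items()}


def resolve(href):
    """Resolve 'document#id' to the definition text it points at."""
    document, _, fragment = href.partition("#")
    return definitions_by_id(DOCUMENTS[document])[fragment]


print(resolve("terms.xml#widget01"))
```

The order document stays small and the glossary can be validated on its own,
at the price of having to keep the two files together.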

None of this has required anything other than standard XML 1.0. The problem
only surfaces when people start mixing material in which
	- there will be tag collisions
	- they want the namespaces to mean something.

Tag collisions are unavoidable (I first invented 'XML' as an SGML
application before the current XML).

Suppose Vanity Homes and Gardens wishes to produce a catalog. And suppose
that they already use other DTDs. And because of this they have already
produced their own DTD as a prefixed one, e.g.

<VHG:Garden>
  <VHG:Vegetable>carrot</VHG:Vegetable>
  <VHG:Vegetable>artichoke</VHG:Vegetable>
</VHG:Garden>

and, because people don't know what sort of artichoke it is, they use a
glossary:

STAGE 6

<VHG:Garden>
  <VHG:Vegetable>carrot</VHG:Vegetable>
  <VHG:Vegetable href="#artichoke01">artichoke</VHG:Vegetable>
  <VHG:termEntry id="artichoke01">
    <VHG:term>Jerusalem artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious tuber which causes
flatulence</VHG:definition>
    <VHG:seeAlso href="#artichoke02">globe artichoke</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="artichoke02">
    <VHG:term>globe artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious thistle-like
plant</VHG:definition>
    <VHG:seeAlso href="#artichoke01">Jerusalem artichoke</VHG:seeAlso>
  </VHG:termEntry>
</VHG:Garden>

Note that this is perfectly OK, since there are (coincidentally) no tag
collisions between the two DTDs. It may not be pretty, but it's perfectly OK.
None of this has required namespaces.
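Whether two DTDs collide can be checked mechanically before they are mixed.
The following Python sketch pulls the declared element names out of two DTD
texts and intersects them. Both DTD fragments are invented for illustration
(neither is the real glossary or catalog DTD), as are the function names,
and the regular expression is a rough text-level match that does not cope
with parameter entities or declarations inside comments.

```python
import re

# Both DTD fragments are invented for illustration only.
GLOSSARY_DTD = """
<!ELEMENT VHG:termEntry (VHG:term, VHG:definition, VHG:seeAlso*)>
<!ELEMENT VHG:term (#PCDATA)>
<!ELEMENT VHG:definition (#PCDATA)>
<!ELEMENT VHG:seeAlso (#PCDATA)>
"""
GARDEN_DTD = """
<!ELEMENT VHG:Garden (VHG:Vegetable | VHG:termEntry)*>
<!ELEMENT VHG:Vegetable (#PCDATA)>
"""

# Captures the name that follows each <!ELEMENT keyword.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)")


def declared_elements(dtd_text):
    """Return the set of element names declared in a DTD text."""
    return set(ELEMENT_DECL.findall(dtd_text))


def collisions(dtd_a, dtd_b):
    """Return the element names declared in both DTDs."""
    return declared_elements(dtd_a) & declared_elements(dtd_b)


print(sorted(collisions(GLOSSARY_DTD, GARDEN_DTD)))
```

An empty result corresponds to the lucky STAGE 6 case; anything else means
one side has to change its names, which is STAGE 7.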

STAGE 7

If there *is* a tag collision, then one of the DTDs has to yield its
prefix. An example might be:

<VanityHG:Garden>
  <VanityHG:Vegetable>carrot</VanityHG:Vegetable>
  <VanityHG:Vegetable href="#artichoke01">artichoke</VanityHG:Vegetable>
  <VHG:termEntry id="artichoke01">
    <VHG:term>Jerusalem artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious tuber which causes
flatulence</VHG:definition>
    <VHG:seeAlso href="#artichoke02">globe artichoke</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="artichoke02">
    <VHG:term>globe artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious thistle-like
plant</VHG:definition>
    <VHG:seeAlso href="#artichoke01">Jerusalem artichoke</VHG:seeAlso>
  </VHG:termEntry>
</VanityHG:Garden>

This is a syntactic problem and requires tools. The tools have to do the
following:
	- recognise the names from each DTD in a document and convert them
	- recognise the same names in the DTD and convert them in precisely the
same manner
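A minimal sketch of such a tool in Python, assuming it is told which local
names belong to the DTD being re-prefixed. It works on raw text, so the same
function can be run over an instance and over its DTD; it would also rewrite
a matching string inside character data, which a real tool would have to
avoid. The function name reprefix, the NAME_CHAR approximation and the
one-line DTD are invented for the example.

```python
import re

NAME_CHAR = r"[\w.\-:]"  # rough approximation of XML name characters


def reprefix(text, old, new, local_names):
    """Rewrite old:local to new:local, for the given local names only."""
    alternatives = "|".join(
        re.escape(name) for name in sorted(local_names, key=len, reverse=True)
    )
    # The lookbehind and lookahead stop matches inside longer names.
    pattern = re.compile(
        r"(?<!%s)%s:(%s)(?!%s)" % (NAME_CHAR, re.escape(old), alternatives, NAME_CHAR)
    )
    return pattern.sub(lambda match: "%s:%s" % (new, match.group(1)), text)


GARDEN_NAMES = {"Garden", "Vegetable"}

instance = (
    "<VHG:Garden><VHG:Vegetable>carrot</VHG:Vegetable>"
    '<VHG:termEntry id="a"/></VHG:Garden>'
)
dtd = "<!ELEMENT VHG:Garden (VHG:Vegetable | VHG:termEntry)*>"  # hypothetical

print(reprefix(instance, "VHG", "VanityHG", GARDEN_NAMES))
print(reprefix(dtd, "VHG", "VanityHG", GARDEN_NAMES))
```

Because the same function and the same name list are applied to both texts,
the instance and the DTD stay in step, which is the property the two bullet
points above ask for.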

We still don't need namespaces at this stage, just unique names. But we do
need the tools. And these tools may have to convert DTDs quite often and
keep track of which ones are used.

The problems ONLY arise when we start trying to put some semantics/meaning
on the names. Since we don't have much experience at this for unqualified
names, it's not surprising we find it hard. The main namespace problems
therefore are:
	(a) - what does FOO mean? (in <FOO> and FOO="baz")
	(b) - can I attach meaning through algorithms (schemas, stylesheets, Java)
	(c) - can I identify the same FOO in different documents even if it has
different prefixes?

Namespaces ONLY address the third concern. This may not even be important
if (a) and (b) are not solved.

[There is another aspect to namespaces - scoping. IMO this is simply a
minimisation procedure. Personally I think it's unnecessary, dangerous and
confusing, and opens up the dream too early. I would recommend that we
include all prefixes explicitly to avoid confusion. This is a syntactic
concern which can be dealt with at parser or SAX level and should be kept
as far away from the application programmer as possible. I have had my say
repeatedly - please don't overcomplicate. We don't *have* to use the
complex bits of XML or namespaces.]

There is a real problem with multiDTD documents. If we have a document
which reasonably includes:
	- DC
	- RDF
	- DCD
	- XSL
	- XLL
	- HTML/IBTWSH
	- Application1
	- Application2
I cannot see, with the best will in the world, how we can possibly build a DTD
that can validate this (unless it's a VERY formal document - legal, patent,
safety, etc.). It's an n-squared problem. So I don't accept that 'namespaces
have broken validation' but rather that complex monolithic XML documents
are inherently unvalidatable except for expensive vital 'in-house'
requirements. I think that XLink provides a solution, but only if we are
happy with passing bundles of documents over the WWW with the belief that
the integrity of the bundle survives. We haven't cracked that yet, have we :-)

Now it happens that I also want to solve (a) and (b) :-). I can't use
namespaces for this, because they deliberately don't solve it. My best hope
is XSchema (or RDF) which allows me to define - hopefully along with lots
of other people - ways of attaching meaning. 

My current approach is then something like:
<?xml version="1.0"?>
<!DOCTYPE VanityHG [
]>
<?jumbo:namespace ns="http://vhg.org.uk" java="jumbo.vhg.*Node"?>
<VanityHG:Garden>
  <VanityHG:Vegetable>carrot</VanityHG:Vegetable>
  <VanityHG:Vegetable href="#artichoke01">artichoke</VanityHG:Vegetable>
  <VHG:termEntry id="artichoke01" xmlns:VHG="http://vhg.org.uk">
    <VHG:term>Jerusalem artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious tuber which causes
flatulence</VHG:definition>
    <VHG:seeAlso href="#artichoke02">globe artichoke</VHG:seeAlso>
  </VHG:termEntry>
  <VHG:termEntry id="artichoke02">
    <VHG:term>globe artichoke</VHG:term>
    <VHG:definition xml:lang="en">A delicious thistle-like
plant</VHG:definition>
    <VHG:seeAlso href="#artichoke01">Jerusalem artichoke</VHG:seeAlso>
  </VHG:termEntry>
</VanityHG:Garden>

This breaks the namespace problem into its components. It says that certain
names (VHG: and possibly some scoped attributes - I'm not yet sure) are
mapped onto the *STRING* "http://vhg.org.uk". If I used a different prefix
(e.g. VirtualHG) in another document I could still relate them through the
ns URI.
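A short Python sketch of that last point, using xml.etree.ElementTree, which
replaces each declared prefix by its URI string. The two one-line documents
and the helper expanded_name are made up for the example; the VirtualHG
prefix is the one suggested above.

```python
import xml.etree.ElementTree as ET

DOC_A = '<VHG:term xmlns:VHG="http://vhg.org.uk">widget</VHG:term>'
DOC_B = '<VirtualHG:term xmlns:VirtualHG="http://vhg.org.uk">widget</VirtualHG:term>'


def expanded_name(xml_text):
    """Return the root element's name with the prefix replaced by its URI."""
    return ET.fromstring(xml_text).tag


print(expanded_name(DOC_A))
print(expanded_name(DOC_A) == expanded_name(DOC_B))
```

The comparison is on the URI string plus the local name; the prefix plays no
part once the document has been parsed this way.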

The jumbo:namespace PI can be neglected. For those with a JUMBO browser and
jumbo.vhg.*.class it can add semantic enhancement - and I'll show this at
Montreal. But there can be other ways of associating semantics with the
*STRING* "http://vhg.org.uk" - stylesheets, other classes, etc. So it is a
semantic handle onto which anyone can map anything they like. The challenge
is whether we can come up with communal portable tools for doing that
easily and avoiding babelisation.
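One way to picture the 'semantic handle' is a lookup table keyed on the URI
string. The Python sketch below is only an analogy to what the
jumbo:namespace PI does with Java classes, not a description of JUMBO. The
registry HANDLERS, the decorator register and the handler describe_vhg are
all invented, and the sample document leaves the garden elements unprefixed
so that it parses with a namespace-aware parser.

```python
import xml.etree.ElementTree as ET

HANDLERS = {}  # namespace URI string -> function(local_name, element)


def register(ns_uri):
    """Decorator that attaches a handler to a namespace URI string."""
    def wrap(handler):
        HANDLERS[ns_uri] = handler
        return handler
    return wrap


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


@register("http://vhg.org.uk")
def describe_vhg(local, elem):
    text = " ".join("".join(elem.itertext()).split())
    return "VHG %s: %s" % (local, text)


def dispatch(root):
    """Call the registered handler, if any, for every element under root."""
    results = []
    for elem in root.iter():
        uri, local = split_tag(elem.tag)
        handler = HANDLERS.get(uri)
        if handler is not None:
            results.append(handler(local, elem))
    return results


DOC = (
    "<Garden><Vegetable>carrot</Vegetable>"
    '<VHG:term xmlns:VHG="http://vhg.org.uk">Jerusalem artichoke</VHG:term>'
    "</Garden>"
)
print(dispatch(ET.fromstring(DOC)))
```

Elements whose URI has no registered handler are simply skipped, which
matches the remark that the PI can be neglected by software that does not
know about it.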

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
