"Multiple" Namespaces? (but NOT for HTML)

Paul Prescod paul at prescod.net
Thu Oct 28 07:53:35 BST 1999


It seems that most of the "usual suspects" did not get around to
discussing your problem which is a pity because I think that it is one
of the most fundamental in the XML world. When I win the lottery, I will
spend a year or two studying it. Like other problems you've brought up,
it is at (er, probably past) the boundary of what we know how to do
scalably with modern technology. Sorry.

I claim that your problem is, in fact, intimately related not only to
the multiple HTML namespaces problem but also to the representation of
XLinks.

Let me first suggest that the solution to your problem is probably not
to put various element type names in one tag. I could be wrong on this
point so I'll trust you to set me straight if that's the case.

>         <DC:Creator GILS:Originator TEI:docAuthor>Tillich</DC:Creator
>                 GILS:Originator TEI:docAuthor>

Now you've said explicitly that your goal is to avoid duplicating your
data across multiple documents. But is duplicating the
semantic "author" better? I'm guessing that DC:Creator is *always* going
to be a synonym for TEI:docAuthor which means that saying so explicitly
in the document is redundant. It causes all of the usual problems of
database redundancy:

 * It increases the size of your database: it will quadruple (at least)
your indexes. 
 * It increases the possibility for error: authors or data generators
could "forget" to insert a TEI:docAuthor alongside a DC:Creator.
 * It reduces optimization opportunities because the database won't
cache "synonyms" properly.

Old-fashioned SGML smelliness aside, architectural forms were designed
to solve exactly this problem. Proponents claim that one of their great
virtues is that they allow you to do the mapping in EITHER the document
(duplicating data) OR the DTD (centralizing it). I'm not really happy
with the fact that it allows the "inline" mode, but the "centralized"
mode is just what you need. If you can convince me that you really need
multiple element type names *in each and every tag* then you will be the
first to do so.

As for your "standards-based" requirement: you can't beat an
"International Standard".

Architectural forms are expressed as attributes but they are supposed to
be INTERPRETED by an architectural processor (like nsgmls and jade) as
if they were element type names (generic identifiers). The syntax is,
IMHO, a hack to avoid violating XML's (and SGML's) rules. Note that
XLink borrows heavily from the hack.
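
To make the attribute-as-element-type idea concrete, here is a small
sketch in Python. It is a simplification, not how nsgmls or jade work:
the `record`/`Creator` vocabulary and the `tei` and `gils` attribute
names are made up, and real architectural processing also involves
declaring the architecture itself, which this ignores. The DTD fixes the
attribute values once (the "centralized" mode), so the instance just
says <Creator>, and the toy "processor" reads an attribute as if it were
the element type name. It relies on the parser filling in #FIXED values
from the internal DTD subset, which Python's expat-based ElementTree
does; other parsers may not.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical document. The internal DTD subset declares the architectural
# attributes once, so the instance never repeats them ("centralized" mode).
# The attribute names "tei" and "gils" are illustrative assumptions.
DOC = """<?xml version="1.0"?>
<!DOCTYPE record [
  <!ELEMENT record (Creator)>
  <!ELEMENT Creator (#PCDATA)>
  <!ATTLIST Creator
            tei  CDATA #FIXED "docAuthor"
            gils CDATA #FIXED "Originator">
]>
<record><Creator>Tillich</Creator></record>
"""


def architectural_type(elem: ET.Element, architecture: str) -> Optional[str]:
    """Interpret the architecture's attribute as if it were the element type name."""
    return elem.get(architecture)


root = ET.fromstring(DOC)
creator = root.find("Creator")
# The parser supplies the #FIXED values even though the instance omits them.
print(creator.tag, architectural_type(creator, "tei"), architectural_type(creator, "gils"))
```

Writing tei="docAuthor" directly on each <Creator> start-tag would be
the "inline" mode, with all the redundancy problems listed above.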

I claim, then, that what you need is a database that understands either
architectural forms or some similar technology. It would index in terms
of synonyms and recognize that asking for one synonym is as easy as
asking for another. As far as I know, architectural form indexing and
caching has never been implemented in a large-scale (multi-gigabyte) XML
database system but I could be wrong.
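
Here is roughly what I mean by indexing in terms of synonyms, as a toy
in-memory sketch. The `SynonymIndex` class and the document ids are
hypothetical, and this does not describe any real database. Each synonym
group is stored under one canonical key, so the postings exist once no
matter how many vocabularies name the concept, and a lookup by any
synonym is the same single dictionary access.

```python
from collections import defaultdict


class SynonymIndex:
    """Inverted index that stores postings once per synonym group."""

    def __init__(self, synonym_groups):
        # Map every element type name to one canonical representative.
        self._canonical = {}
        for group in synonym_groups:
            key = min(group)  # arbitrary but stable choice of representative
            for name in group:
                self._canonical[name] = key
        self._postings = defaultdict(list)

    def _key(self, element_type):
        # Names outside every group are their own canonical key.
        return self._canonical.get(element_type, element_type)

    def add(self, element_type, doc_id):
        self._postings[self._key(element_type)].append(doc_id)

    def lookup(self, element_type):
        # .get() avoids creating empty entries for unknown names.
        return list(self._postings.get(self._key(element_type), []))


index = SynonymIndex([{"DC:Creator", "GILS:Originator", "TEI:docAuthor"}])
index.add("DC:Creator", "doc-1")  # the document is tagged with one name only
print(index.lookup("TEI:docAuthor"))    # ['doc-1']
print(index.lookup("GILS:Originator"))  # ['doc-1']
```

The hard part, of course, is not this dictionary but doing the same
thing inside a multi-gigabyte store with its caching and query
optimization intact.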

There is hope, however. "Out of line" architectural forms are about to
be reinvented as "archetypes." Once they are reinvented in a syntax that
is OO-friendly and W3C-approved, it will become obvious that people will
need to do XPath-like queries based not only on element types, but also
on archetypes. Finally, search engine vendors are likely to "get it."
Whether they will be able to develop scalable algorithms to do it in the
general case is another question...
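
For a rough picture of what an archetype-aware, XPath-like query might
look like, here is a sketch. The archetype table, the `agent` archetype,
and the function names are all made up for illustration and do not
follow any proposed archetype syntax. The point is only that the query
matches on what an element *is* (its archetypes, inherited OO-style)
rather than just on its tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical out-of-line archetype declarations: each element type lists
# the archetypes it derives from, and archetypes may derive from others.
ARCHETYPES = {
    "Creator": {"docAuthor"},
    "docAuthor": {"agent"},
    "Publisher": {"agent"},
}


def conforms_to(name, archetype, table=ARCHETYPES):
    """True if `name` is `archetype` or derives from it, directly or transitively."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current == archetype:
            return True
        if current in seen:
            continue  # guards against cyclic declarations
        seen.add(current)
        stack.extend(table.get(current, ()))
    return False


def find_by_archetype(root, archetype):
    """A '//*'-style scan that matches archetypes as well as element types."""
    return [elem for elem in root.iter() if conforms_to(elem.tag, archetype)]


root = ET.fromstring(
    "<record><Creator>Tillich</Creator><Publisher>Harper</Publisher></record>"
)
print([e.text for e in find_by_archetype(root, "docAuthor")])  # ['Tillich']
print([e.text for e in find_by_archetype(root, "agent")])      # ['Tillich', 'Harper']
```

This is a linear scan with a transitive lookup per element: fine for a
toy, and exactly the part that would need real indexing work to scale.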

 Paul Prescod



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1




