Namespaces, modules and architectures paper available

Wed Feb 4 08:50:48 GMT 1998

http://itrc.uwaterloo.ca/~papresco/sgml/namespaces.html

Why We Need Namespaces (Modules)
An SGML/XML Feature Proposal

Abstract

The World Wide Web Consortium has recently published a note called
Namespaces in XML. Not everyone has access to it yet, but they will
soon. It proposes a simple convention for allowing instances to have
elements whose type names come from many different schemas. According to
that note:

"We envision applications of XML in which a document instance may
contain markup defined in multiple schemas. These schemas may have been
authored independently. One motivation for this is that writing good
schemas is hard, so it is beneficial to reuse parts from existing,
well-designed schemas. Another is the advantage of allowing search
engines or other tools to operate over a range of documents that vary in
many respects but use common names for common element types."

Advocates of ISO architectural forms ("archforms") have noticed that
these requirements are very similar to those for archforms and have
proposed archforms as a solution. They are correct that the basic
underlying problems are related, but the problems are not identical. We
need both archforms and namespaces. The two ideas are actually very
complementary. This note demonstrates why neither architectural forms
nor the current namespace proposal really solves the "namespace problem"
satisfactorily.

Background

I will use the document [1] 'A Proposal to Introduce "Module" Structures
Into SGML' as an example of a modules proposal that includes not just a
convention for namespace combination, but a syntax for actually
combining SGML DTD fragments. DTDs are currently the only standardized
schema language for either SGML or XML.

Architectural forms allow a "client document" to declare that certain
elements conform to an element type in a DTD other than the document's
DTD. For instance, you could say that a particular element is both a LINK
element in the document's DTD and a HyTime CLINK element in the HyTime
architecture; it is essentially both things at once. You can either
declare that a particular element has an architectural element type (in
addition to its ordinary element type), or declare that all of the
elements of a particular type conform to a particular architectural
element type. For instance, you could say that a particular "human"
element conforms to the "animal" architectural element type (if the
human was, for example, a "party animal"), or you could say that all
"dog" elements conform to the "animal" architectural element type.
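To make the two kinds of declaration concrete, here is a minimal Python sketch of the model just described. It assumes a toy representation: each element carries exactly one ordinary element type (its generic identifier), plus a set of architectural element types drawn either from a per-type declaration or from a per-element one, and that set may hold more than one type. The names `Element`, `TYPE_ARCHFORMS` and `architectural_types` are hypothetical and do not correspond to any SGML tool's API.

```python
from dataclasses import dataclass, field

# Hypothetical per-type declaration: every element whose ordinary type
# (generic identifier) is listed here conforms to these architectural types.
TYPE_ARCHFORMS: dict[str, set[str]] = {
    "dog": {"animal"},
}


@dataclass
class Element:
    gi: str  # the single ordinary element type (generic identifier)
    # Hypothetical per-element declaration, standing in for an
    # architectural attribute on this one element.
    archforms: set[str] = field(default_factory=set)


def architectural_types(elem: Element) -> set[str]:
    """Union of the per-type and per-element architectural element types."""
    return TYPE_ARCHFORMS.get(elem.gi, set()) | elem.archforms


rex = Element("dog")                                # "animal" via its type
guest = Element("human", {"animal"})                # "animal" via this element only
coder = Element("human", {"programmer", "animal"})  # more than one at once
plain = Element("human")                            # no architectural type

print(sorted(architectural_types(rex)))    # ['animal']
print(sorted(architectural_types(coder)))  # ['animal', 'programmer']
print(sorted(architectural_types(plain)))  # []
```

Note that `gi` stays a single string no matter how many architectural types an element picks up; the extra types live alongside it rather than replacing it.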

The Rub

A particular element can also conform to multiple architectural element
types. For instance, the aforementioned human could conform to both the
"programmer" and the "party animal" architectural element types (no,
those are not mutually exclusive). My claim is that this generality is
a powerful feature in many contexts, but it makes the simple case far
too complex for architectural forms to be the most basic namespace
management facility in XML. SGML and SGML tools are organized around the
idea that each element conforms to one and only one element type. We
have not yet rethought SGML processing in terms of elements with
multiple element types.

For instance, the most common form of SGML processing is validation.
SGML uses DTDs to define constraints on SGML documents. Under the
Japanese proposal, validation could be set up something like this:

<!DOCTYPE MATH.AND.HYPERLINKS [
<!MODULE MATH SYSTEM "math.module.dtd">
<!MODULE HY SYSTEM "hyperlinks.module.dtd">
<!ELEMENT MATH.AND.HYPERLINKS (#PCDATA|HY::LINK|MATH::FORMULA)*>
]>

Imagine that math.module.dtd and hyperlinks.module.dtd are hundreds of
lines long. Imagine also that they both have an element called "SET"
("mathematical set" and "link set" respectively). As far as I know,
there is no way to accomplish this namespace merging operation with
anything close to the same ease using architectural forms. Yes, I can do
it by copying math.module.dtd and hyperlinks.module.dtd into my document
type. I can then manually fix up namespace clashes like my "SET"
element. But it is exactly this sort of code duplication that the
modules proposal was explicitly designed to avoid. In fact, that is its
reason for existing. We can see, then, that architectural forms do not
solve the problem that the modules proposal was meant to solve: they do
not automatically merge namespaces.

Let me define some terms to clarify. A namespace is a mapping from names
to objects, such as from element type names to element types (explicitly
or implicitly declared). A namespace merge is the construction of a
namespace from two others in a way that preserves all of the entries of
the originals. Architectural forms provide access to multiple
namespaces, but they do not merge namespaces.
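These definitions can be made concrete with a small Python sketch. It assumes, purely for illustration, that a merge qualifies each name with its module's prefix, in the spirit of the `HY::LINK` and `MATH::FORMULA` names in the DOCTYPE example earlier; the toy dictionaries and the function `merge_namespaces` are hypothetical, and this is not claimed to be how the Japanese proposal actually resolves names. The point it shows is the difference between a merge, which keeps every entry, and a naive union, which lets one "SET" silently replace the other.

```python
# Toy namespaces: element type name -> element type (here just a description).
math_ns = {"FORMULA": "math formula", "SET": "mathematical set"}
hy_ns = {"LINK": "hyperlink", "SET": "link set"}


def merge_namespaces(modules: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge namespaces by qualifying every name with its module prefix.

    Because each entry is keyed as PREFIX::NAME, a name that appears in
    several modules (such as SET) is kept once per module instead of
    one definition silently replacing the other.
    """
    merged: dict[str, str] = {}
    for prefix, ns in modules.items():
        for name, element_type in ns.items():
            merged[f"{prefix}::{name}"] = element_type
    return merged


# A naive union is not a merge in this sense: one SET is lost.
naive = {**math_ns, **hy_ns}
print(len(naive), naive["SET"])  # 3 link set

merged = merge_namespaces({"MATH": math_ns, "HY": hy_ns})
print(len(merged))          # 4
print(merged["MATH::SET"])  # mathematical set
print(merged["HY::SET"])    # link set
```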

I suspect that some with a long background in SGML will be a little
baffled trying to understand why someone would want to do this. After
all, combining document types is typically difficult work performed by
experts, tested on teams of users, tweaked to perfection with element
names remapped to fit the terminology of the user community. Mixing and
matching DTD fragments in an ad hoc manner might not seem like a good
idea. But the fact is that we live in a brave new world. End users want
to take control of their own document types in many cases. They want to
mix and match DTD fragments and they are not willing to spend the amount
of effort that we professionals are. Good for them! They will make all
of our lives easier. In fact, when authors say that they want to "get
rid of" DTDs, what they typically mean is that they don't want to be
constrained by someone else's DTD and making their own is too difficult!
If we can make DTD maintenance easier, more people will use them. 

Perhaps SGML could be updated so that validation does not depend so
deeply on each element having a single element type, allowing content
models that combine elements from different architectures. If we did
that, my complaint might go away: architectures might gain some of the
validatory simplicity of the modules proposal. But this would require a
much more fundamental change to SGML than the modules proposal would.

Stylesheets

I will use stylesheets as another example of processing. The three most
interesting stylesheet languages right now are DSSSL, XSL and CSS. Each
of those has as its central organizing construct a rule triggered on an
element type name in a context. DSSSL has a feature that would allow
querying on architecture, but the feature is optional and is not
supported, for instance, by James Clark's Jade. Even where the feature
is available, the architectural form-based version of a stylesheet is
much more complicated than the equivalent based on a "flat" namespace
(such as a stylesheet for traditional SGML or SGML augmented with the
modules proposal). I invite architectural forms advocates to prove me
wrong by providing their stylesheets.

Here is what a module-enhanced DSSSL might look like:

<module target="mathml.dsl">
<module target="hyperlinks.dsl">
(element MATH.AND.HYPERLINKS (process-children))

As you can see, this has just enough lines to include the relevant
stylesheet modules and provide rules for the new elements. What would
the equivalent archform code look like? With DSSSL as it exists, it
would look quite ugly and convoluted. With some enhanced DSSSL it might
look reasonable (just as some enhanced SGML might be able to have
content models that span architectures), but nobody has yet proposed
what such a DSSSL would look like (just as nobody has proposed the
enhanced SGML). I am open to suggestions... 

I do not believe that either the current XSL proposal or CSS would allow
architecture-based processing at all. Once again, the idea that every
element has a single element type is a fundamental organizing principle
of these stylesheet languages. It is also an organizing principle of
most SGML editors, DTD editors and formatting and conversion tools I
have used. In fact, almost every SGML tool in the world operates under
that principle. The best tools will give you access to architectural
forms (through their architectural attributes), but they will typically
use the element type name as the major organizing feature of the
stylesheets. Archform-centric processing is typically awkward, if it is
possible at all.
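The contrast drawn in the last two paragraphs can be sketched in Python. This is not DSSSL, XSL or CSS; it is a toy model assuming that stylesheet rules are plain functions keyed either on an element type name or on architectural element types, and every name in it (`type_rules`, `arch_rules`, `style_by_type`, `style_by_arch`) is hypothetical. With one element type per element, rule selection is a single lookup. With several architectural types per element, the stylesheet needs an extra, explicit policy for deciding which rules fire, and the policy chosen here (apply every matching rule in name order) is arbitrary.

```python
from typing import Callable

# A stylesheet rule turns an element's text into output (toy markup here).
Rule = Callable[[str], str]

# Flat-namespace stylesheet: rules keyed on the single element type name,
# so exactly one rule is looked up for any given element.
type_rules: dict[str, Rule] = {
    "MATH::FORMULA": lambda text: f"<formula>{text}</formula>",
    "HY::LINK": lambda text: f"<a>{text}</a>",
}

# Architecture-based stylesheet: rules keyed on architectural element types.
arch_rules: dict[str, Rule] = {
    "animal": lambda text: f"[animal] {text}",
    "programmer": lambda text: f"[programmer] {text}",
}


def style_by_type(gi: str, text: str) -> str:
    """One element, one element type: a single dictionary lookup."""
    return type_rules[gi](text)


def style_by_arch(archforms: set[str], text: str) -> list[str]:
    """An element may match several rules, so a policy is needed; this
    sketch arbitrarily applies all matching rules in name order."""
    return [arch_rules[a](text) for a in sorted(archforms) if a in arch_rules]


print(style_by_type("HY::LINK", "home"))               # <a>home</a>
print(style_by_arch({"programmer", "animal"}, "Bob"))  # ['[animal] Bob', '[programmer] Bob']
```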

The "one element, one element type" principle is also central to every
course in SGML I have ever taken and every book on it I have ever read.
Even the SGML Handbook says that every element has a particular element
type (a single, particular element type).

The Argument From Usability

Imagine that you are a typical end user and have used archforms instead
of a namespace merging mechanism to combine DTD fragments. Now imagine
that you know that a particular element type name appears in both DTD
fragments. I think that most people would be very surprised to learn
that the way to associate this element with one or the other DTD is to
add an attribute. Because the generic identifier (the name in the
start-tag) usually establishes the element type, you would probably
expect to change the generic identifier to change the association. But
using architectural forms, you would instead have to add an
attribute that would essentially disassociate the element from one of
the element types: "I may have the same name as that element type, but
it isn't actually one of my element types." I think that this is a nasty
case of making the common, simple case of merging DTD fragments more
complicated in order to make life easier for those of us who have to
solve problems that may actually require the full generality of
architectural forms. Once again, I invite advocates to send me code
samples that demonstrate that this is simpler than I think. 

Who was it that said, "Make the easy things easy and the hard things
possible"? Architectural forms make hard things possible, but when
misapplied to the namespace problem, they make easy things unnecessarily
hard. Let me be clear: architectural forms (or something like them) have
an important role to play in SGML systems. We absolutely need some form
of semantic inheritance mechanism. But they work best in the
environment they were designed for: they are typically used as the
underlying basis of a DTD designed by a professional. The professional
DTD designer renames elements to avoid clashes. That individual is the
real solution to the "namespace problem" in most environments. In
environments where such a person exists, archforms are really, really
useful. They are not useful because they allow you to merge namespaces
(they don't). They are useful because they allow you to combine
semantics from different DTD fragments in powerful ways (but more or
less manually). I think that a modules/namespaces proposal would
actually be very useful for building architectures from DTD fragments. I
also think that architectural forms would be very useful on the Web. Not
every use of XML on the Web will be ad hoc. Some XML applications will
need the robust multi-level validation that architectural forms allow;
think about e-commerce, for example.

But many users will not need or want architectural forms. Most people
just need a simple way to combine fixed DTD fragments so that there are
no name clashes. The Japanese module proposal provides such a mechanism.
Presumably Web-centric DTD-replacement schema languages will provide
mechanisms like this also. If these sorts of things are made much easier
in these schema languages than they are in SGML DTD syntax, people will
just avoid SGML DTD syntax. This would be a big mistake for all
concerned. Let's please just fix SGML through a proposal like the one
submitted by the Japanese in 1996. Some modules proposal should be part
of the SGML revision. This would in no way preclude the wide deployment
of architectural forms as a solution to a different problem.

 Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
