Reusing schema vocabularies

Sun Sep 13 15:55:46 BST 1998

  REUSING SCHEMA VOCABULARIES: THINKING OUT LOUD
================================================

INTRODUCTION
------------

I'm still struggling with trying to figure out why namespaces are
needed, exactly what they achieve, what the cost is and why they are
the current preferred solution of the XML WG. This is an attempt to
clarify my thoughts by writing them down and hopefully having others
inspect (and correct) them.

Note that this is not a well-written and polished paper, just a sort
of 'textual dump' of my thoughts. The headings are there to organize
the dump somewhat and make it easier to read. I'm probably also a
little bit too optimistic about architectures, but I haven't got the
time to modify those parts now.

I will use the term 'DTD' when I refer to XML schemas as they are
defined in XML 1.0, and will use 'schema' to mean 'a DTD or a schema
in any XML schema language'. I use 'XML 1.1' to refer to 'XML 1.0 as
extended by the namespace WD'.

The namespace WD seems motivated by the need to be able to define
different schema vocabularies in a single document, or to be less
general: the need to be able to reuse element and attribute names from
different DTDs in a single document.

As far as I can see there are currently two ways to achieve this:
namespaces and architectures. I'll try to list the advantages and
disadvantages of each to see if I can understand why the WG has chosen
what it has.

EFFECTS ON OTHER STANDARDS
--------------------------

NAMESPACES

Namespaces, while superficially simple, are really a profound change
to the XML data model: one of the most basic concepts (the concept
'name') is changed from a string to a namespace identifier _and_ a
string. The reuse of schema vocabularies is enabled by this modified
concept of names, allowing processing software to pick out names
belonging to a specific schema/namespace and operate on them.
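A minimal sketch of this changed concept of names, using Python's
standard xml.etree.ElementTree, which reports each element name as
'{namespace}local'. The namespace URIs, element names and the helpers
split_name and names_in_namespace are made up for illustration, and
the sketch assumes the xmlns-attribute declaration syntax:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this illustration.
BOOK_NS = "http://example.org/ns/book"
HTML_NS = "http://example.org/ns/html"

DOC = f"""<b:book xmlns:b="{BOOK_NS}" xmlns:h="{HTML_NS}">
  <b:title>Reusing vocabularies</b:title>
  <h:p>A paragraph borrowed from another vocabulary.</h:p>
  <b:author>A. N. Author</b:author>
</b:book>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def names_in_namespace(root, uri):
    """Return the local names of all elements whose namespace is `uri`."""
    pairs = (split_name(elem.tag) for elem in root.iter())
    return [local for ns, local in pairs if ns == uri]


root = ET.fromstring(DOC)
book_names = names_in_namespace(root, BOOK_NS)  # names from the book vocabulary
html_names = names_in_namespace(root, HTML_NS)  # names from the other vocabulary
```

A processor written for the book vocabulary would only look at
book_names and could skip everything else.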

This is incompatible with the use of names in XML 1.0, which means
that validation and attribute defaulting no longer work as before. In
other words: both validating and non-validating parsers are affected,
but only in the interpretation of the names used in DTDs. (XML 1.0
documents will work with XML 1.1 parsers, but not vice versa for
namespace-using documents.)

To allow validation and attribute defaulting in XML 1.1 the schema
syntax will have to change, whether the new syntax is a modified DTD
syntax or some entirely new schema language. This means that XML 1.1
documents that use namespaces will not be valid SGML documents.

In XML 1.1 it is conceivable that different schemas can be combined
without needing to be rewritten. With the current DTD syntax this will
require a liberal use of ANY content models, which very much weakens
the benefits of validation and structured editors. It is conceivable
that a schema language could be designed with features for extending
the content models of elements from reused schemas. No such schema
language is available at present.

This also means that parsers must be modified to support XML 1.1, as
must the DOM and SAX, since they depend on the concept of names, which
has changed. (The DOM method getElementsByTagName should be
namespace-aware, for instance.) XSL and CSS2 will also have to take
XML 1.1 into account if they are to allow stylesheets written for one
schema to be used with a schema that incorporates the first schema.
(XSL patterns must then support the new names.)
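To make the DOM point concrete, here is a small sketch using Python's
xml.dom.minidom. It contrasts the string-based getElementsByTagName
with getElementsByTagNameNS, the namespace-aware lookup from the later
DOM Level 2. The namespace URI and the document are hypothetical:

```python
from xml.dom import minidom

BOOK_NS = "http://example.org/ns/book"  # hypothetical URI

DOC = (
    f'<doc xmlns:b="{BOOK_NS}">'
    "<title>plain title</title>"
    "<b:title>book title</b:title>"
    "</doc>"
)

dom = minidom.parseString(DOC)

# Name-as-string lookup: compares against the literal tag name, so the
# prefixed element 'b:title' is not found under 'title'.
by_string = dom.getElementsByTagName("title")

# Namespace-aware lookup: compares (namespace URI, local name) pairs.
by_pair = dom.getElementsByTagNameNS(BOOK_NS, "title")

plain_text = [node.firstChild.data for node in by_string]
book_text = [node.firstChild.data for node in by_pair]
```

The two lookups return different elements for the same local name,
which is why an API built on plain string names cannot simply be
reused unchanged.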

XPointer will not need to be modified, since XPointers are designed to
be tailor-written to the document they address into. Any XML query
language will have to be designed for XML 1.1 (which includes XPointer
if XPointer is used as a query language, as it can be). [XLink?]

A last problem with namespaces is less technical and more practical:
namespace names are awkward to work with, since they have a complex
syntax and must be long. This means that all XML applications that
rely on namespaces will be awkward where names are concerned, which is
almost everywhere. 

XML ARCHITECTURES

XML architectures are superficially complex, but require no changes to
the XML data model. They enable the reuse of schema vocabularies by
remapping names from the original document to a new 'virtual'
document, the architectural document.

This means that XML architectures can be layered on top of current
parsers (as XAF and xmlarch.py do), and furthermore that they require
no changes to XML 1.0. This means that SGML compatibility is retained.
Furthermore, it means that DOM, SAX, XSL, CSS2 and possible query
languages will not have to take architectures into account (beyond
allowing users to declare the architecture they wish XSL/CSS2/queries
to apply to), since they operate as before, but on an architectural
document instead of the original one.
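The remapping idea can be sketched roughly as follows. This is a
heavily simplified illustration, not the processing model of ISO 10744
and not how XAF or xmlarch.py work: it assumes that an attribute named
after the architecture (here the hypothetical 'biblio') gives each
element's architectural form, and that unmapped elements are simply
dropped. The function name architectural_view is made up as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical client document. The attribute named 'biblio' (the
# assumed architecture name) gives each element's architectural form.
DOC = """<report biblio="bibentry">
  <heading biblio="title">Reusing vocabularies</heading>
  <remark>Not part of the architecture.</remark>
  <writer biblio="author">A. N. Author</writer>
</report>"""


def architectural_view(elem, arch):
    """Return a list of remapped elements for `elem` under `arch`.

    Mapped elements are renamed to their architectural form. Unmapped
    elements are dropped (a simplification chosen for this sketch),
    while any mapped descendants they have are kept.
    """
    children = []
    for child in elem:
        children.extend(architectural_view(child, arch))
    form = elem.get(arch)
    if form is None:
        return children
    view = ET.Element(form)
    view.text = elem.text
    view.extend(children)
    return [view]


root = ET.fromstring(DOC)
(arch_root,) = architectural_view(root, "biblio")
arch_names = [elem.tag for elem in arch_root.iter()]
```

Software written for the 'biblio' vocabulary would then be handed
arch_root, an ordinary tree with ordinary string names, and would
never see 'report', 'heading' or 'remark'.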

In short, XML architectures do not affect any of the standards
currently in use or under design. (As will be seen later the
architecture syntax may have to change, but the effects of this change
are very likely minor.) XML architectures do require schemas reused in
compound schemas to be rewritten.

MEETING THE NOTE-WEBARCH-EXTLANG REQUIREMENTS
---------------------------------------------

Requirement #1:
  "It must be possible to introduce a new vocabulary in part of a
   document in a way that requires changes only locally within the
   document."

Namespaces meet this requirement by allowing new vocabularies to be
introduced on each element.

XML architectures as defined in ISO 10744:1997 A.3 do not meet this
requirement. The interesting question is of course: can they be
modified to do so?

As far as I can see, the answer must be yes. One way to do it might be
to allow the declaring PI to appear anywhere in a document, but only
to have scope from its declaration until an ending PI is met.
Architecture scopes must properly nest within each other (and within
elements).
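A sketch of how the nesting rule might be checked, using Python's
xml.parsers.expat. The PI targets 'arch-begin' and 'arch-end' and the
function name scopes_nest_properly are invented for this example;
nothing like them is defined in ISO 10744.

```python
import xml.parsers.expat

# Hypothetical PI targets for opening and closing an architecture scope.
BEGIN_PI = "arch-begin"
END_PI = "arch-end"


def scopes_nest_properly(document):
    """Return True if scopes nest within each other and within elements."""
    stack = []  # entries are ('element', name) or ('arch', name)
    ok = True

    def start_element(name, attrs):
        stack.append(("element", name))

    def end_element(name):
        nonlocal ok
        # An element may only close if no scope opened inside it is
        # still open, i.e. the element itself is on top of the stack.
        if not stack or stack.pop() != ("element", name):
            ok = False

    def processing_instruction(target, data):
        nonlocal ok
        arch = data.strip()
        if target == BEGIN_PI:
            stack.append(("arch", arch))
        elif target == END_PI:
            # A scope may only close if it is the innermost open item.
            if not stack or stack.pop() != ("arch", arch):
                ok = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.ProcessingInstructionHandler = processing_instruction
    parser.Parse(document, True)
    return ok and not stack


GOOD = "<doc><sec><?arch-begin biblio?><t>T</t><?arch-end biblio?></sec></doc>"
BAD = "<doc><sec><?arch-begin biblio?><t>T</t></sec><?arch-end biblio?></doc>"
```

Here scopes_nest_properly(GOOD) is True, while BAD is rejected because
the scope opened inside 'sec' is still open when 'sec' ends.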

This modified version of XML architectures meets the first two cases
listed in the motivation for requirement #1 in Note-webarch-extlang,
but not the third. However, the third is not met by namespaces either
and can only be met by a change to the XML 1.0 grammar. Given such a
change, both architectures and namespaces would meet the third case.

Requirement #2:
  "The syntax must unambiguously associate an identifier in a document
   with the related schema without requiring inspection of that or
   another schema."

By using URIs as namespace identifiers namespaces meet this
requirement.

XML architectures do not meet this requirement as they stand, since
the names of two architectures may clash. The modification suggested
above enables XML architectures to meet this requirement just as well
as namespaces do.

Namespace names cannot collide within a document that uses
namespaces, but prefixes can. If prefixes collide, the inner prefix
shadows the outer one. Prefix collisions do not concern applications,
since they use namespace names to identify elements and attributes.
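A small sketch of prefix shadowing as an application sees it, again
with Python's xml.etree.ElementTree. The namespace names are
hypothetical and the prefix 'a' is deliberately declared twice:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names; the prefix 'a' is declared twice.
DOC = """<a:outer xmlns:a="http://example.org/ns/one">
  <a:inner xmlns:a="http://example.org/ns/two"/>
  <a:sibling/>
</a:outer>"""

root = ET.fromstring(DOC)

# The parser resolves each prefix against the innermost declaration in
# scope, so the application only ever sees namespace names.
resolved = [elem.tag for elem in root.iter()]
```

In resolved, 'inner' carries the second namespace name while 'outer'
and 'sibling' carry the first; the prefix itself does not appear.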

XML architecture names may also collide, but can be specified to
shadow one another as with prefixes. To enable the unique
identification of architectures (even in the case of collisions)
architecture declaration PIs can be extended with a namespace
attribute that contains an identifying URI.

Requirement #3:
  "It should be possible to create an original document schema such
   that one can determine, without access to the extension schema,
   which uses of extensions to that document can be ignored."

I do not understand this requirement and so cannot comment on it.

SUMMARY
-------

From this discussion I emerge believing that XML architectures are a
superior solution to the problem of reusing schema vocabularies. They
have far less impact on the XML family of standards than namespaces do
and do not require XML to be modified or that SGML compatibility be
forsaken for documents that reuse schemas.

The nesting of namespaces is slightly more natural than that of
architectures, but since this nesting is only designed for
automatically generated documents (and since heavily nested namespaces
are more or less unreadable for humans anyway) this does not really
matter.

The data model of XML architectures is also much simpler than that of
namespaces, and XML architectures provide far better control over the
data model presented to processors designed for the original schemas.

--Lars M.
