XSD: Extensions

Thu May 28 07:06:26 BST 1998

Simon St.Laurent wrote:
> 
> Hopefully, in addition to the convenience factor of using the same parser for
> SDDs as is used for XML documents, we're building a base - a core module -
> which other standards can use to extend the kinds of information used to
> describe elements and attributes.  There are many concerns about the kinds of
> extensions that this will make possible, but hopefully we can rein them in
> enough to make XML more powerful without making it brutally chaotic.

I've been thinking about this issue at length.

There are three kinds of extensions I can imagine:

 1. Metadata extensions. These are harmless, and also quite powerful and useful.

 2. Constraint addition extensions. A constraint extension is an extension
to the basic specification which, when interpreted properly, rejects some
documents that the basic specification accepts. As an example, an
XLink-checker extension could reject documents with broken internal
XLinks. This is a little bit disturbing. Consider: my doctor sends me his
XSchema for "patient appointment requests." I fill it out and verify it
with "MSXSchema". MSXSchema gives the document a clean bill of health. I
send it to my doctor, who verifies it with "MozXSchema". MozXSchema
supports some proprietary constraint addition extensions. Unbeknownst to
me, the XSchema used those extensions. Now my supposedly valid document is
not.

 3. Constraint relaxing extensions. This form does the opposite. For
instance, we might define NUMBER to take only numerals, as XML does.
Spyglass might make an extension that allows NUMBERs to have periods and a
leading minus sign, which makes numbers more flexible. Someone at Spyglass
sends me a document that they have verified with no problems. MSXSchema
chokes on it because it doesn't know that it was supposed to have relaxed
the attribute character constraint.
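The divergence in the doctor and Spyglass scenarios can be sketched in a few lines of Python. This is a toy model, not any real schema language: documents are flat maps of attribute values, every value is a NUMBER, and the verifier names and rules (including a digit-sum rule standing in for a checksum) are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical core rule: a NUMBER value consists only of the digits 0-9.
def core_number_ok(value: str) -> bool:
    return re.fullmatch(r"[0-9]+", value) is not None

# Hypothetical constraint-relaxing extension: also allow a leading minus
# sign and a decimal point.
def relaxed_number_ok(value: str) -> bool:
    return re.fullmatch(r"-?[0-9]+(\.[0-9]+)?", value) is not None

# Hypothetical constraint-adding extension: the digits must also sum to a
# multiple of 10 (a toy stand-in for a real checksum rule).
def digit_sum_ok(value: str) -> bool:
    return core_number_ok(value) and sum(int(d) for d in value) % 10 == 0

Rule = Callable[[str], bool]
Verifier = Callable[[Dict[str, str]], bool]

def make_verifier(number_rule: Rule, extra_rules: List[Rule]) -> Verifier:
    """Build a verifier for a toy document whose attribute values are all NUMBERs."""
    def verify(doc: Dict[str, str]) -> bool:
        return all(
            number_rule(v) and all(rule(v) for rule in extra_rules)
            for v in doc.values()
        )
    return verify

# Three products that read the same schema but disagree on what "valid" means.
ms_verify = make_verifier(core_number_ok, [])               # core semantics only
moz_verify = make_verifier(core_number_ok, [digit_sum_ok])  # adds a constraint
spy_verify = make_verifier(relaxed_number_ok, [])           # relaxes a constraint

appointment = {"patient-id": "1235"}    # digits sum to 11
spyglass_doc = {"temperature": "-3.5"}

print(ms_verify(appointment), moz_verify(appointment))    # True False
print(spy_verify(spyglass_doc), ms_verify(spyglass_doc))  # True False
```

The same document gets a different verdict depending on which extensions a given verifier happens to implement, which is the portability problem described above.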

I'm starting to remember why I was so leery of schema extensibility in the
first place. Essentially, we are running into the verification version of
the "HTML Optimized for Netscape" problem. The solution to the HTML
problem was XML+XSL -- extensibility and a way to define the results of
extensions. The equivalent in SDD would have to be some way to "plug in"
schema language extensions in a portable way (ECMAScript, Java?). That is
way outside our mandate. Thus, I think that constraint adding and
constraint relaxing extensions are outside our mandate too.

I would suggest that the only type of extension that should be explicitly
allowed in version 1.0 is metadata. The verification language's semantics
should not themselves be extensible. In other words, every XSchema 1.0 verifier in
the world should agree on whether a document conforms to a particular
XSchema or not.

There is an interesting point in all of this: prohibiting extensions to
the verification language's semantics is not the same as completely
disallowing extensions. By analogy, XML's semantics are not extensible
(elements are always elements, attributes are always attributes, a
validating parser's acceptance or rejection of a document cannot be
changed, etc.) but XML allows semantics to be layered, through processing
instructions and special attributes or elements. In fact, all of the
XML-specific processing instructions are semantics layered on top of
SGML's. So by the time you layer your own semantics on, you are three
levels deep. (This is why I get distressed when people say XML has no
semantics. What is xml:lang? If it were in your favourite DTD, wouldn't
you say it has semantics?)

We can do the same in XSchema. The XLink verifier's rules can be opaque
"metadata" as far as an XSchema verifier is concerned, but another
process, for instance an XLink verifier, can still do its job based on this
metadata. What does this subtle difference mean in practice? It means that
it would be illegal for MozXSchema to say:

"This document does not conform to the XSchema specification because of a
broken link." or "This document does not conform to the XSchema
specification because the 'credit card' value does not have a valid
checksum."

it would be required to say:

"This document verifies according to the XSchema specification."

but it could then add:

"But some links are broken."
"But a credit card value does not have a valid checksum."

This is no different from an HTML checker that says that the document
conforms to its DTD, but some URLs point to non-existent resources. The
layering enforces a strict, unchanging definition of "XSchema
verification" that makes XSchema useful and worthwhile. We will still run
into the problem of having a document verify on one desktop and not
another, but it will not be inconsistent *XSchema* verification that is to
blame. Rather, it will be an unsupported "extra" non-XSchema layer.
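The layering proposed here can be sketched as a hypothetical Python model (not a proposed API). The core verdict from `xschema_verify` depends only on core constraints; metadata (here a made-up `checksum-attr` key) travels with the schema but is never interpreted by the core. A separate layer reads that metadata and can only append "But ..." warnings, so it cannot flip the XSchema verdict. The checksum layer uses the standard Luhn check used for credit card numbers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Schema:
    # Core constraints every verifier must enforce identically:
    # attribute name -> regular expression its value must match.
    required_attrs: Dict[str, str]
    # Opaque metadata: stored with the schema, never interpreted by the core.
    metadata: Dict[str, str] = field(default_factory=dict)

# A layered checker sees the document and the schema's metadata and returns
# a warning string, or None if it has nothing to add.
LayerCheck = Callable[[Dict[str, str], Dict[str, str]], Optional[str]]

def xschema_verify(schema: Schema, doc: Dict[str, str]) -> bool:
    """Core verification: uses required_attrs only, never metadata or layers."""
    return all(
        name in doc and re.fullmatch(pattern, doc[name]) is not None
        for name, pattern in schema.required_attrs.items()
    )

def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit, counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def card_checksum_layer(doc: Dict[str, str], metadata: Dict[str, str]) -> Optional[str]:
    """Hypothetical non-XSchema layer driven by the 'checksum-attr' metadata key."""
    attr = metadata.get("checksum-attr")
    value = doc.get(attr) if attr else None
    if value is None or re.fullmatch(r"[0-9]+", value) is None:
        return None  # nothing to check; format errors are the core's business
    if luhn_ok(value):
        return None
    return f"the '{attr}' value does not have a valid checksum."

def report(schema: Schema, doc: Dict[str, str], layers: List[LayerCheck]) -> List[str]:
    verdict = xschema_verify(schema, doc)  # fixed before any layer runs
    lines = [
        "This document verifies according to the XSchema specification."
        if verdict
        else "This document does not verify according to the XSchema specification."
    ]
    for check in layers:
        warning = check(doc, schema.metadata)
        if warning is not None:
            lines.append("But " + warning)
    return lines

schema = Schema(required_attrs={"card": r"[0-9]+"}, metadata={"checksum-attr": "card"})
for line in report(schema, {"card": "79927398710"}, [card_checksum_layer]):
    print(line)
```

Adding or removing layers changes only the "But ..." lines; the first line, the XSchema verdict, comes out the same on every desktop.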

This is analogous to the way that some documents that validate fine on an
SGML parser will cause an error message on an XML parser, because of XML's
extra semantics and requirements. And of course, some XML documents will
not validate properly as CML documents, because of CML's extra layer of
semantics and requirements. Nobody claims it is SGML's fault when a random
non-XML SGML document fails in an XML parser, or XML's when a non-CML XML
document fails in a CML viewer. Each layer is itself non-extensible, and
the move from one layer to the next is fairly clear for those who know
what they are doing.

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)