What are schemata

Mon Jun 1 12:10:44 BST 1998

Let me say first of all that (after wading through the gazillion messages
posted during and after SGML/XML Europe) this is an incredibly exciting and
valuable effort. It was fascinating to see the discussion evolve from an
abstract debate into a concrete effort to actually produce something of (at
the very least, exploratory) value. Hats off to everyone who helped initiate
this!

Anyway, in regard to partial validation: this seems very much bound up with
the question of what to do about entities (in general) and parameter
entities (in particular). Entities are one of the things I wish had never
been included in XML (along with CDATA, notations, etc.). There have been a
few excellent postings about why a single, unified linking mechanism is
desirable. As I understand it, Simon envisioned this as one of the
advantages of using XML syntax for the schema definition (this was implied
but not explained in detail in his original paper). In any case,
XLink/XPointer are more than adequate to do this in a structured way. If
simple text substitution is needed, then a separate, orthogonal mechanism
should be used, as Paul pointed out.

So if we postulate that parameter entities will be replaced in XSchema by a
more generic linking mechanism, the problem of partial validation seems to
be simply and elegantly solvable. For example (modifying one of Peter's
previous posts):

<xschema>
  <ElementTypeDef id="greetings">
    <ContentSpec>
      <Seq optional="yes" repeatable="yes">
        <ElementType>politeform</ElementType>
        <ElementType xml:link="simple" role="XSchema"
          href="http://www.xschema.org/library/person.xsc#id(person)">person</ElementType>
      </Seq>
    </ContentSpec>
  </ElementTypeDef>
</xschema>

(I changed the outer element type name from "ElementType" to
"ElementTypeDef". Do we want the GI to be the same both for the definition
itself and for references to other element types within the content model?)

Instead of using an entity reference, the link attributes point to the
appropriate definition of the content type for the "person" element. This is
way better than parameter entities because it uses a unified link syntax
*and* is far more flexible. I can whip up a quick schema just by throwing
together a few elements and pointing to their appropriate definitions using
URIs of whatever flavor. Certainly better than copying and pasting parameter
entities.
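To make "follow a link instead of expanding a parameter entity" concrete, here is a minimal Python sketch of how a schema processor might resolve the `href` on an `<ElementType>` reference to the `ElementTypeDef` it points at. It is only an illustration under stated assumptions: `SCHEMA_LIBRARY` is a hypothetical in-memory stand-in for fetching URIs, the contents of `person.xsc` are invented, `resolve_link` is a hypothetical helper rather than part of any XLink implementation, and only the simple `id(...)` pointer form from the example above is understood.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical stand-in for the network: schema URI -> schema text.
# The contents of person.xsc are invented purely for illustration.
SCHEMA_LIBRARY = {
    "http://www.xschema.org/library/person.xsc": """<xschema>
  <ElementTypeDef id="person">
    <ContentSpec>
      <Seq>
        <ElementType>name</ElementType>
      </Seq>
    </ContentSpec>
  </ElementTypeDef>
</xschema>""",
}

# ElementTree reports xml:-prefixed attributes under this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Only the simple "id(name)" pointer form is understood in this sketch.
ID_POINTER = re.compile(r"id\(([^)]+)\)")


def resolve_link(ref):
    """Follow a simple link on an <ElementType> reference to its definition."""
    if ref.get(XML_NS + "link") != "simple":
        raise ValueError("element does not carry a simple link")
    # Split "uri#fragment" into the schema URI and the pointer.
    uri, fragment = urldefrag(ref.get("href"))
    match = ID_POINTER.fullmatch(fragment)
    if match is None:
        raise ValueError(f"unsupported pointer: {fragment!r}")
    target = ET.fromstring(SCHEMA_LIBRARY[uri])
    # Return the element whose id attribute matches the pointer.
    for elem in target.iter():
        if elem.get("id") == match.group(1):
            return elem
    raise KeyError(f"no element with id {match.group(1)!r} in {uri}")


# The schema fragment from the post, with a link instead of a parameter entity.
GREETINGS_XSC = """<xschema>
  <ElementTypeDef id="greetings">
    <ContentSpec>
      <Seq optional="yes" repeatable="yes">
        <ElementType>politeform</ElementType>
        <ElementType xml:link="simple" role="XSchema"
          href="http://www.xschema.org/library/person.xsc#id(person)">person</ElementType>
      </Seq>
    </ContentSpec>
  </ElementTypeDef>
</xschema>"""

for ref in ET.fromstring(GREETINGS_XSC).iter("ElementType"):
    if ref.get("href") is not None:
        definition = resolve_link(ref)
        print(ref.text, "->", definition.tag, definition.get("id"))
# prints: person -> ElementTypeDef person
```

The same function would work unchanged whether the link sits in a schema or in a document instance, which is the point about one linking engine serving both.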

Another interesting factor is that these links could just as easily be
included in the document itself. So if I have a document which lacks a
schema, but I want to validate the "greetings" portion (for which I do have a
schema), I could link into the schema at that point, from the "greetings"
element itself.

In this case, only the "greetings" element and its content would be
validated. This is incredibly powerful considering its simplicity: the
content in question actually gets validated against two separate schemas
(really just bags of element type definitions) in two different files.
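The partial-validation idea can be sketched in Python as well. In this minimal sketch, assuming a flat `Seq` content model like the one in the schema example, each element type definition is compiled into a regular expression over the names of an element's children. Only elements that carry an `xml:link` attribute (the points where the document links into a schema) are validated, together with their descendants; everything else is ignored. The two schema files, the document, the `href` value and all helper names are hypothetical. For brevity the `href` is not dereferenced: definitions from both files are merged and looked up by element type name.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical schema files, each just a bag of element type definitions.
GREETINGS_XSC = """<xschema>
  <ElementTypeDef id="greetings">
    <ContentSpec>
      <Seq optional="yes" repeatable="yes">
        <ElementType>politeform</ElementType>
        <ElementType>person</ElementType>
      </Seq>
    </ContentSpec>
  </ElementTypeDef>
</xschema>"""

PERSON_XSC = """<xschema>
  <ElementTypeDef id="person">
    <ContentSpec>
      <Seq><ElementType>name</ElementType></Seq>
    </ContentSpec>
  </ElementTypeDef>
</xschema>"""

# A hypothetical document with no schema of its own; only <greetings>
# links into a schema (the href value is made up).
DOCUMENT = """<letter>
  <date>1 June 1998</date>
  <greetings xml:link="simple" href="greetings.xsc#id(greetings)">
    <politeform>Dear</politeform>
    <person><name>Peter</name></person>
  </greetings>
  <body>Anything goes here; it is never checked.</body>
</letter>"""

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
# (optional, repeatable) -> regex quantifier applied to the whole Seq.
QUANTIFIERS = {("yes", "yes"): "*", ("yes", "no"): "?", ("no", "yes"): "+"}


def compile_model(defn):
    """Turn a flat <Seq> of <ElementType>s into a regex over child names."""
    seq = defn.find("ContentSpec/Seq")
    body = "".join(re.escape(t.text) + " " for t in seq.findall("ElementType"))
    key = (seq.get("optional", "no"), seq.get("repeatable", "no"))
    return re.compile(f"(?:{body}){QUANTIFIERS.get(key, '')}")


def load_models(*schema_texts):
    """Merge the definitions from several schema files into one dict."""
    models = {}
    for text in schema_texts:
        for defn in ET.fromstring(text).iter("ElementTypeDef"):
            models[defn.get("id")] = compile_model(defn)
    return models


def validate(elem, models):
    """Check elem and its descendants against whatever definitions exist."""
    errors = []
    model = models.get(elem.tag)
    if model is not None:
        # Child element names, each followed by a space, e.g. "politeform person ".
        children = "".join(child.tag + " " for child in elem)
        if not model.fullmatch(children):
            errors.append(f"<{elem.tag}> has invalid content: {children.split()}")
    for child in elem:
        errors.extend(validate(child, models))
    return errors


def validate_partially(doc, models):
    """Validate only the subtrees whose root element links into a schema."""
    errors = []
    for elem in doc.iter():
        if elem.get(XML_NS + "link") is None:
            continue
        if elem.tag not in models:
            errors.append(f"no definition for <{elem.tag}>")
        errors.extend(validate(elem, models))
    return errors


models = load_models(GREETINGS_XSC, PERSON_XSC)
print(validate_partially(ET.fromstring(DOCUMENT), models))
# prints []
```

Content models describe regular languages, which is why a regular expression over child names suffices for this flat case; choice groups, nested groups and mixed content would need a fuller compiler.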

This does complicate the spec a bit, although the principle is actually very
simple. The only real decision is whether it is kosher to use XLink/XPointer
in a schema definition. To me it would seem a crying shame not to allow
this, whatever the implications for the "layering" of the architecture. Of
course, for this we need a linking engine, but at least the effort can be
leveraged for both schemas and documents, demonstrating why a unified syntax
makes sense in the first place.

Matthew

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)