XSchema: ease of use (design goal 5)

Tue Jun 2 20:06:44 BST 1998

Michael Kay wrote:
>Trying to solve this problem my imagination started running
>away with me:
>
>* one of the limitations of XML is that we cannot constrain
>the content of character strings (in attributes or PCDATA)
>in a DTD
>* the ideal way to define such constraints would be with
>regular expressions or BNF production rules
>* let's call "XML with constrained character strings" Rich
>XML or RXML. Of course an RXML document is an XML document;
>it just has some "content validity" rules in addition to the
>XML well-formedness and validation rules.
>* we can imagine a "pre-parser" which takes an RXML document
>and automatically generates additional XML markup so that
>all the syntax is now fully accessible as elements and
>attributes. This pre-parsed document would no longer be
>easily readable, but it would be easily processable using
>standard XML tools
>* We could encode both the current DTD information and the
>additional constraints in RXML
>* if we use RXML rather than plain XML to encode our DTD, we
>can continue to use BNF-like production rules written as
>text, while still being able to process the thing using
>general-purpose XML machinery.
>
>In other words, I think we have here not only a solution to
>the usability dilemma posed at the beginning of this
>posting, but a generally useful extension to the
>capabilities of XML.

There are several issues here:

1) What is easiest to author?
2) What is easiest to view?
3) What is easiest to process?

The current content model syntax is terse and easy to author (once you have
mastered it). Nevertheless, the design goals of XML very clearly point
towards favoring ease of processing over terseness, as Rick points out in
his reply. This is why SHORTTAG and co. are not in XML and why calls for
empty endtags led to such hue and cry. So 3) should take precedence,
implying an XML-based syntax. The whole point of unifying document and
schema syntax is to take advantage of existing tools when working with
schemas; this applies particularly to editors and browsers (as has been
stated numerous times).

On the other hand, I wonder if Mike's suggestion regarding "pre-parsing"
could not be linked with Peter's early suggestion about binding logic to
element type descriptions. In the same way, I could (when XSchema is
completed) design my own element type for content models which is tied to a
pre-parser (e.g. in Java) which spits out the elements and attributes defined
in standard XSchema. In other words, exactly what Mike describes except that
standard XSchema functionality rather than ad hoc mechanisms is used to get
the desired result. Something like:

In the metaschema for my XSchema variant:
<ElementTypeDef id="DTDContent">
<ContentSpec>
<!-- Don't suppose this Seq is really needed but it makes the point
clearer -->
<Seq optional="required" repeatable="no">
#PCDATA
</Seq>
</ContentSpec>
<Link role="behavior" href="http://.../contentmodelpreprocessor.class"/>
</ElementTypeDef>

In my schema itself:
<ElementTypeDef id="SomeElement">
<DTDContent>
FOO, (BAR | BAZ)
</DTDContent>
</ElementTypeDef>

So the use of DTD content models ends up being part of the "standard", but
in a highly generic way requiring only that someone write the
"ContentModelPreprocessor" class. Seems nicely self-referential to me...
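To make the idea a little more concrete, here is a rough sketch of what such a
pre-processor might look like in Java. It is only an illustration under
several assumptions: the output element names "Choice" and "Ref" (and the
"name" attribute) are hypothetical stand-ins for whatever XSchema finally
defines, only "Seq" appears in the example above, and the occurrence
indicators (?, *, +) and #PCDATA are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a pre-processor that turns a DTD-style content model such as
 * "FOO, (BAR | BAZ)" into element markup. The output vocabulary (Seq,
 * Choice, Ref) is an assumption made for illustration only.
 */
final class ContentModelPreprocessor {
    private final String src;
    private int pos = 0;

    private ContentModelPreprocessor(String src) {
        this.src = src;
    }

    /** Converts one content model string into a markup string. */
    static String toXml(String model) {
        ContentModelPreprocessor p = new ContentModelPreprocessor(model.trim());
        String xml = p.parseGroup();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return xml;
    }

    // group := item (',' item)* | item ('|' item)*
    private String parseGroup() {
        List<String> items = new ArrayList<>();
        items.add(parseItem());
        char sep = 0;
        while (true) {
            skipSpaces();
            if (pos >= src.length()) break;
            char c = src.charAt(pos);
            if (c != ',' && c != '|') break;
            // As in a DTD, one group may not mix the two connectors.
            if (sep != 0 && c != sep) {
                throw new IllegalArgumentException("Cannot mix ',' and '|' at " + pos);
            }
            sep = c;
            pos++;
            items.add(parseItem());
        }
        if (items.size() == 1) return items.get(0);
        String tag = (sep == ',') ? "Seq" : "Choice";
        return "<" + tag + ">" + String.join("", items) + "</" + tag + ">";
    }

    // item := '(' group ')' | element-name
    private String parseItem() {
        skipSpaces();
        if (pos >= src.length()) {
            throw new IllegalArgumentException("Unexpected end of content model");
        }
        if (src.charAt(pos) == '(') {
            pos++;
            String inner = parseGroup();
            skipSpaces();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < src.length() && isNameChar(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected element name at " + pos);
        }
        return "<Ref name=\"" + src.substring(start, pos) + "\"/>";
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(toXml("FOO, (BAR | BAZ)"));
    }
}
```

Fed the content of the DTDContent element above, this produces a Seq holding
a reference to FOO followed by a Choice between BAR and BAZ, which any
XML-aware tool can then process without knowing DTD syntax.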

BTW: Anyone who thinks traditional content models are so easy to write
should try to do so on a Czech keyboard. :-)

Matthew

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)