XSchema: ease of use (design goal 5)
ricko at allette.com.au
Tue Jun 2 17:45:15 BST 1998
> From: Michael Kay
> * one of the limitations of XML is that we cannot constrain
> the content of
> character strings (in attributes or PCDATA) in a DTD
> * the ideal way to define such constraints would be with
> regular expressions or BNF production rules
> * let's call "XML with constrained character strings" Rich
> XML or RXML. Of course an RXML document is an XML document;
> it just has some "content validity" rules in addition to the
> XML well-formedness and validation rules.
Actually, there is already a standard syntax to let you constrain the
content of character strings, using the SGML Lexical Typing Definition
Requirements (LTDR). In standard jargon this is called "lexical typing": the
stated intention of the ISO Working Group maintaining SGML is to move these
into the SGML standard--at the moment they are just parked as part of the
HyperText and Time Scheduling Languages standard (HyTime).
LTDR lexical typing gives you a standard way to use any syntax you like to
constrain strings. In particular it allows POSIX regular expressions. There
is also a token-matching language, which can look quite like content models.
You can see the standard online at
<RJ:PLUG>You can also see this explained in my book "The XML & SGML
Cookbook" (just out this week !!!) on page 2-117 "Defining Data Types".
You can embed your own syntax inside elements using NOTATION attributes, or
using PIs, of course.
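To make "lexical typing" concrete, here is a minimal sketch of the idea in Python. It is not LTDR syntax: the type names, the patterns, and the binding table are all invented for illustration, and Python's re module stands in for POSIX regular expressions (the two dialects differ in detail).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexical types, invented for this example (not LTDR syntax).
LEXICAL_TYPES = {
    "iso-date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "part-number": re.compile(r"[A-Z]{2}-\d{4}"),
}

# Hypothetical bindings: (element name, attribute name) -> lexical type.
# An attribute name of None means the element's character data.
BINDINGS = {
    ("order", "date"): "iso-date",
    ("part", None): "part-number",
}


def check_lexical_types(xml_text):
    """Return (element, attribute, value) triples whose value fails its type."""
    failures = []
    for elem in ET.fromstring(xml_text).iter():
        for (tag, attr), type_name in BINDINGS.items():
            if elem.tag != tag:
                continue
            if attr is None:
                value = (elem.text or "").strip()
            else:
                value = elem.get(attr)
                if value is None:
                    continue  # attribute absent: nothing to check
            # The whole string must match, not just a prefix of it.
            if LEXICAL_TYPES[type_name].fullmatch(value) is None:
                failures.append((tag, attr, value))
    return failures
```

The document stays plain XML throughout; the constraints live in a separate table, which is the role a lexical type definition plays.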
> * we can imagine a "pre-parser" which takes an RXML document
> and automatically generates additional XML markup so that
> all the syntax is now fully accessible as elements and
> attributes. This pre-parsed document would no longer be
> easily readable, but it would be easily processable using
> standard XML tools
If you are talking about a pre-parser, then the danger is that you are no
longer using XML. The ability to alias strings and delimiters to (entity
references which dereference to) elements, PIs, entity references etc. was
part of SGML (i.e., SHORTREF, DATATAG) that was jettisoned for XML. If you
are talking about the ability to pre-process an entity according to a
(pipeline of) processes, that also is part of SGML Extended Facilities
(i.e., Formal System Identifiers), which was jettisoned by XML when they
decided to use URI syntax. (It could easily be introduced again; or the URI
query syntax could of course be used to provide extra parameters; or a
custom XML document type could specify a transformation itself as part of
storage/entity management. But then the data is not XML, and so is outside
the XSchema scope, I think.)
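For concreteness, the kind of pre-parser Michael describes might look something like the sketch below. Everything in it is an assumption made for illustration: the date pattern, the element names derived from its named groups, and the function names. The output is ordinary XML, but the convention that a date element's text hides three fields lives entirely outside XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical constraint on the text of a <date> element.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def expand_text(elem, pattern):
    """Replace elem's text with one child per named group, if the text matches."""
    match = pattern.fullmatch((elem.text or "").strip())
    if match is None:
        return False  # leave non-conforming text untouched
    elem.text = None
    for name, value in match.groupdict().items():
        ET.SubElement(elem, name).text = value
    return True


def preparse(xml_text, tag, pattern):
    """Expand the text of every element named `tag` and return the new XML."""
    root = ET.fromstring(xml_text)
    # Snapshot the matches first, since expand_text adds children to the tree.
    for elem in list(root.iter(tag)):
        expand_text(elem, pattern)
    return ET.tostring(root, encoding="unicode")
```

Calling preparse('<doc><date>1998-06-02</date></doc>', 'date', DATE_PATTERN) yields a document in which the date element contains year, month and day child elements instead of a string.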
> * We could encode both the current DTD information and the
> additional constraints in RXML
> * if we use RXML rather than plain XML to encode our DTD, we
> can continue to use BNF-like production rules written as
> text, while still being able to process the thing using
> general-purpose XML machinery.
HyTime LTDR went down this track (i.e., a variant, specialized syntax) and I
think it did not work so well.
But I would be happy if the XML markup declarations were given explicit
specifications in terms of XSchema: that would allow a simple preprocessor
to convert existing DTDs into XSchemas.
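As a rough sketch of how simple such a preprocessor could be for the easy cases, the Python below turns element declarations into element syntax. The output vocabulary (schema, element-decl, a name attribute) is made up for this example and is not the XSchema syntax; the sketch also ignores ATTLIST and entity declarations, parameter entities, and comments.

```python
import re
import xml.etree.ElementTree as ET

# Matches <!ELEMENT name content-model>; parameter entities are not handled.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.DOTALL)


def dtd_to_element_syntax(dtd_text):
    """Re-express each element declaration as a (hypothetical) element-decl."""
    root = ET.Element("schema")  # hypothetical wrapper element
    for name, model in ELEMENT_DECL.findall(dtd_text):
        decl = ET.SubElement(root, "element-decl", {"name": name})
        # Keep the content model as text, with whitespace normalized.
        decl.text = " ".join(model.split())
    return ET.tostring(root, encoding="unicode")
```

A real converter would also have to break each content model down into elements of its own, which is where an explicit specification of the mapping would earn its keep.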
We have had two constituencies here: one wants to have declarations using
element syntax, the other wants to add new constraints. I would again
submit that there is a third which we should not ignore: people who might
not want to change either to XML markup declarations or to something like
XData, but want to keep their old schema-definition systems. Anyone who has
played with Adobe's excellent EDD knows how mind-numbing things can get when
you have to support one schema syntax with another almost identical one,
even with sophisticated translators.
The XML & SGML Cookbook, by Rick Jelliffe
Charles F. Goldfarb Series on Open Information Management
656 pages + CD-ROM, Prentice Hall 1998, ISBN 0-13-614233-0
http://www.phptr.com/ > Book Search > "Jelliffe"