direct/indirect/parallel structure modeling [RE: Douglas Hofstadter's 3-level message model]

Rick Jelliffe ricko at allette.com.au
Tue Jun 23 16:06:32 BST 1998


I think it is the business of international standards (e.g. SGML), vendor
profiles (e.g. XML), and interest group specifications to feed ideas and
solutions into each other. There is a tremendous amount of potential for
interaction between them all.

Hofstadter's 3-level model is, to me, a more useful way to talk than
"syntax" versus "semantics". I would say that one primary purpose of
pan-developer groups such as XML-DEV is to try to push for the inclusion
into the XML effort of well-layered "outer messages", which give domain
experts powerful enough tools to construct their "inner messages".

One of the most useful "outer layer" messages that XML-DEV could push for
W3C recognition of (i.e. in the future developments of XML-Data or
RDF-schema or XSchema or whatever emerges) is a standard attribute to allow
you to say

* "when I give this ID or href, I mean that you use the document as if the
element located were physically included at this point"; or
* "when I give this ID or href, I mean that you use the document as if the
contents of the element which is located were physically included at this
point"; or
* "when I give this ID or href, I mean that you use the document as if the
attribute x of that element was specified for this element"; or
* "when I give this ID or href, I mean that you use the document as if the
value of the attribute x of that element was specified as the value of
attribute y for this element".

In other words, to "redirect" location addresses from the direct value to
the indirectly-addressed value. The usefulness of having a standard way to
specify this should be intuitively obvious to any programmer by analogy:
imagine a programming language where the array item a[1] always meant
"a[1]" the string and never the contents of a[1]. This is the situation
with XML at the moment.
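
To make the four cases above concrete, here is a minimal Python sketch of a
processor that treats a document "as if" the referenced material were
present. The attribute names (ref, refmode, from, to) are invented for the
illustration and come from no specification; "at this point" is read as
"inside the referring element"; and copies lose their id attributes so that
ID uniqueness is not upset. Nested references are not followed.

```python
import copy
import xml.etree.ElementTree as ET

# All attribute names used here (ref, refmode, from, to) are hypothetical;
# they are not taken from XML, XLL, or HyTime.

def _copy_without_ids(elem):
    """Deep-copy elem, dropping id attributes so IDs stay unique."""
    clone = copy.deepcopy(elem)
    for node in clone.iter():
        node.attrib.pop("id", None)
    return clone

def resolve(root):
    """Rewrite root in place so each reference carries what it points to."""
    ids = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    for elem in list(root.iter()):      # snapshot, because nodes are added below
        mode = elem.get("refmode")
        if mode is None:
            continue
        target = ids[elem.get("ref")]
        if mode == "element":           # case 1: include the located element
            elem.append(_copy_without_ids(target))
        elif mode == "content":         # case 2: include only its contents
            elem.text = target.text
            for child in target:
                elem.append(_copy_without_ids(child))
        elif mode == "attr":            # cases 3 and 4: borrow attribute x,
            src = elem.get("from")      # optionally under another name y
            elem.set(elem.get("to", src), target.get(src))
    return root

doc = ET.fromstring(
    '<doc>'
    '<unit id="u1" name="metre" symbol="m"><em>SI</em> length</unit>'
    '<p1 ref="u1" refmode="element"/>'
    '<p2 ref="u1" refmode="content"/>'
    '<p3 ref="u1" refmode="attr" from="symbol"/>'
    '<p4 ref="u1" refmode="attr" from="symbol" to="abbrev"/>'
    '</doc>'
)
resolve(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is only that the application never has to know
whether a value was given directly or by reference; the resolution rule
lives in one declared place.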

Entity references are no good for this, because they can upset ID uniqueness.
XLL may have something of this in it, in that it has "embed" or "replace",
but I am not sure these are at the level of structure modelling; they may be
more concerned with browser policy.

I think XML needs:

1) A richer set of standard notations (e.g. Tim's database data types) and a
way to define simple new notations (e.g. ISO's Lexical Typing Definition
mechanism for defining new simple lexical models
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.2.html), with a way to
specify that these should apply to content (i.e. what the XML NOTATION
attribute does but people don't realise it) and attributes (e.g. HyTime's
Lexical Type specification mechanism
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.5.4.html).

In other words, to be able to say "the value of this date attribute has this
syntax".

(ISO also has defined a new attribute type, DATA, which it would be great if XML
picked up. It also addresses this issue in a really simple way.)
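
As a rough illustration of what saying "the value of this date attribute has
this syntax" could buy, here is a small Python sketch. Regular expressions
stand in for a lexical typing notation (the ISO mechanism has its own
syntax, which is not reproduced here), and the type names and the
element/attribute table are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexical types: a name mapped to a regular expression.
LEXICAL_TYPES = {
    "iso-date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "integer": re.compile(r"[+-]?\d+"),
}

# Hypothetical declarations: (element name, attribute name) -> lexical type.
ATTRIBUTE_TYPES = {
    ("event", "date"): "iso-date",
    ("event", "seats"): "integer",
}

def lexical_errors(root):
    """List attribute values that do not match their declared lexical type."""
    errors = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            type_name = ATTRIBUTE_TYPES.get((elem.tag, name))
            if type_name and not LEXICAL_TYPES[type_name].fullmatch(value):
                errors.append((elem.tag, name, value, type_name))
    return errors

doc = ET.fromstring(
    '<log>'
    '<event date="1998-06-23" seats="40"/>'
    '<event date="23 June 1998" seats="40"/>'
    '</log>'
)
errors = lexical_errors(doc)
print(errors)
```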

2) A richer basic vocabulary to allow indirection, as mentioned above. These
things are called "value references" and are found as part of ISO HyTime
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-6.7.html

3) A way to subtype IDREFs so that they can only point to particular IDs.
There is a standard way to do this using the HyTime General Architecture's
ireftype http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.5.5.html
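
A sketch of the idea in Python, for those who have not met it: a table says
which element types each IDREF attribute may point to, and a checker reports
references that resolve to the wrong kind of element. The table and element
names are invented for the example; this shows the constraint, not the
HyTime syntax for declaring it.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraint table:
# (element name, IDREF attribute) -> element types the reference may target.
ALLOWED_TARGETS = {
    ("cite", "ref"): {"bibentry"},
    ("figref", "ref"): {"figure"},
}

def idref_errors(root):
    """Report references that dangle or point at a disallowed element type."""
    ids = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    errors = []
    for elem in root.iter():
        for (tag, attr), allowed in ALLOWED_TARGETS.items():
            if elem.tag != tag or attr not in elem.attrib:
                continue
            target = ids.get(elem.get(attr))
            if target is None:
                errors.append((tag, elem.get(attr), "no such ID"))
            elif target.tag not in allowed:
                errors.append((tag, elem.get(attr), "points to " + target.tag))
    return errors

doc = ET.fromstring(
    '<doc>'
    '<figure id="f1"/><bibentry id="b1"/>'
    '<cite ref="b1"/><cite ref="f1"/><figref ref="f2"/>'
    '</doc>'
)
problems = idref_errors(doc)
print(problems)
```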

4) A definite way to name and freeze the DTD used for one document.

5) An "opaqueness" mechanism to allow arbitrary elements to be interposed
without upsetting the content model. (This is in the XML-data proposal, and
also is part of the SGML General Architecture
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.5.2.html, and also has
been the subject of several related proposals to ISO, e.g. an "#ANY" keyword
for content models, Paul Prescod's inheritance suggestions, and the calls
for "partial validation" or "partial content models"--I am not really sure
of the status of this at ISO).
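
One possible reading of such a mechanism, sketched in Python: the validator
skips a declared set of interposed elements and lets their children take
their place when matching the content model. The element names, the
"transparent" set, and the use of a regular expression over child names as a
content model are all assumptions made for the illustration; none of the
proposals mentioned above is being reproduced.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model: a section is a title followed by paragraphs.
CONTENT_MODELS = {"section": re.compile(r"(title )(para )+")}

# Hypothetical elements that may be interposed anywhere.
TRANSPARENT = {"revision-mark"}

def effective_children(elem):
    """Child names as the validator sees them: a transparent element is
    skipped and its own children are considered in its place."""
    names = []
    for child in elem:
        if child.tag in TRANSPARENT:
            names.extend(effective_children(child))
        else:
            names.append(child.tag)
    return names

def matches_model(elem):
    """Check elem's effective children against its content model."""
    sequence = "".join(name + " " for name in effective_children(elem))
    return CONTENT_MODELS[elem.tag].fullmatch(sequence) is not None

marked = ET.fromstring(
    '<section><title/><revision-mark><para/></revision-mark><para/></section>'
)
wrapped = ET.fromstring('<section><title/><note><para/></note></section>')
print(matches_model(marked), matches_model(wrapped))
```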


To me, these things are more important than the introduction of a
generalized "Architectural Form" mechanism into XML or a new schema
language. I have great sympathy for the people who want a more powerful
schema system for SGML, but I would certainly hope that people do not think
that SGML ends with ISO 8879 content models: a great deal of work for the
last 7 years has been spent on exactly these kinds of issues, and the HyTime
standard and its supporting annexes have a rich range of solutions.

In other words, these things are limitations of XML if we choose to ignore
the International Standards which layer appropriate solutions on top of
them.

What is required is to figure out what is simple to explain to journalists
(that is the test: if a blind-drunk journalist (BDJ) cannot understand
something in a couple of days, the public never will), what is powerful for
users, and how to extract this gold from the mountain of material in
HyTime.  The efforts to reduplicate the XML content models in instance
syntax are, to some extent, just reinforcing their incompleteness.

I have recently been saying some negative things about Architectural Forms,
but I hope no-one will get the impression that my comments speak ill of
their power rather than of their relevance (wrt GI naming conventions) and
their priority in the list of things that are needed currently.  I happen to
think the needs to improve the direct/indirect structure-modeling
capabilities of XML are greater than the need to improve the
parallel-structure modeling capabilities which AFs allow. (What is worse, I
think that talking about AFs too much actually makes people shy away from
the other good stuff in ISO HyTime and its annexes.)


> From: Peter Murray-Rust
> At 17:36 22/06/98 -0400, John Cowan wrote:
> [...]
> >from them.  At present, the only way we know to create such
> >universally understood messages is to write messages in some
> >(human or programming) language which can be presumed to be understood
> >everywhere.  The fact that no such language(s) exist is the
> >rationale for struggling to create them.
>
> I think on this list we have to make the fuzzy assumption that we can do a
> lot with existing computer languages... [i.e. I suspect that - unless it's
> very clear - a 3-level message will confuse people. In a way this happens
> in o RDF - one can get layers of meaning.

Yes, but if there is not enough structural-modeling capability, then
something like RDF will become more complicated than it needs to be, because
people will be scratching their heads saying "I expect to be able to have
some indirection here, but I cannot see how to do it".  In this case, the
ISO answer is not "ooh that is up to you to define" as you might expect; on
the contrary it is "yes, that is an important thing to do, and here is a
standard way to do it: value references".


Rick Jelliffe


(PS I have a discussion of some of these things in my book.
I was very concerned in my book that current SGML literature
has concentrated on the minutiae of SGML syntax rather than
the next generation of ideas which build on it, and I think
XML books will follow the same pattern for the first few years.)

==========================================================
The XML & SGML Cookbook, by Rick Jelliffe
Charles F. Goldfarb Series on Open Information Management
656 pages + CD-ROM, Prentice Hall 1998, ISBN 0-13-614223-0
http://www.sil.org/sgml/jelliffeXMLAnn.html
http://www.phptr.com/  > Book Search > "Jelliffe"
http://www.amazon.com/exec/obidos/ASIN/0136142230/002-4102466-3352420





