We do not need ampersand (was Re: XML-Data, "&" and inheritance)

Tue Apr 28 12:52:42 BST 1998

Andrew Layman wrote:
> So we have a funny situation in XML in which we've tried to make processing
> easier by forbidding certain things in the DTD, but the result is that
> people either avoid DTDs altogether or write bogus DTDs that don't fully
> describe the real syntax.  That is, we've simplified the implementation by
> being unable to express the intended syntax. 

Actually, we've made this decision dozens of times in XML. In fact, almost
every feature we left out of SGML fell under the heading of "simplified
implementation by being unable to express the intended syntax." And SGML
*already* left out hundreds of things that people have asked for to help
them model things more accurately (such as 5 to 10 occurrences of FOO,
mixed in with #PCDATA). These must be checked at the application level.
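For instance, an XML DTD can only declare mixed content as a starred choice such as `<!ELEMENT BAR (#PCDATA|FOO)*>`. That allows any number of FOO children, so the "5 to 10" part has to be checked afterwards. Below is a minimal Python sketch of such an application-level check. The container name BAR and the function `foo_count_ok` are hypothetical, and it uses the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration for the container element. XML mixed content
# can only say "any mix of text and FOO", never "5 to 10 FOOs":
#   <!ELEMENT BAR (#PCDATA|FOO)*>

def foo_count_ok(xml_text: str, tag: str = "FOO",
                 min_count: int = 5, max_count: int = 10) -> bool:
    """Return True if the root element has between min_count and
    max_count direct children named `tag`; interleaved text is ignored."""
    root = ET.fromstring(xml_text)
    count = sum(1 for child in root if child.tag == tag)
    return min_count <= count <= max_count


if __name__ == "__main__":
    five = "<BAR>intro " + "<FOO>x</FOO> text " * 5 + "</BAR>"
    four = "<BAR>intro " + "<FOO>x</FOO> text " * 4 + "</BAR>"
    print(foo_count_ok(five))  # True
    print(foo_count_ok(four))  # False
```

The DTD still does the structural part (only text and FOO may appear). The occurrence count is layered on top by the application.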

So I don't think that we *need* the ampersand operator, though we might
want it. My reasons follow.

> The
> motivating factor for including the ampersand operator in XML-Data is the
> significant number of customers who have asked for it.  In discussing DTDs
> with me and others, they showed examples like the following:
> 
>         <!ELEMENT person
> ((firstname|middlename|lastname|age|shoesize|hair|eyes|height|weight)*) >
> 
> When I've asked what this construction means, they said, in effect "What I
> mean is that the elements can occur in any order, but there isn't any good
> way to say that in XML DTDs."
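In SGML, the `&` connector means every listed element must appear exactly once, in any order. Without `&`, the only way to say that is to spell out every ordering using `,` and `|`. Below is a minimal Python sketch that generates the equivalent model. The function name `expand_and` is hypothetical. The output is factored so that each alternative begins with a different element, which keeps the model deterministic in the sense XML 1.0 asks for (for SGML compatibility).

```python
import math
from typing import List


def expand_and(names: List[str]) -> str:
    """Rewrite the SGML model (n1 & n2 & ... & nk) using only ',' and '|'.

    Each alternative starts with a different element name, so a parser
    can choose the branch from the first element it sees.
    """
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first},{expand_and(rest)})")
    return "(" + "|".join(branches) + ")"


if __name__ == "__main__":
    print(expand_and(["a", "b"]))
    # ((a,b)|(b,a))
    print(expand_and(["a", "b", "c"]))
    # ((a,((b,c)|(c,b)))|(b,((a,c)|(c,a)))|(c,((a,b)|(b,a))))
    # The nine elements of the person example admit 9! orderings:
    print(math.factorial(9))  # 362880
```

The model grows factorially with the number of elements. This is presumably why, in practice, people fall back on the looser starred choice shown in the person example.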

This relates to the fact that SGML straddles the line between an object
representation system and a document language definition system. Here is
something from a paper I have been working on for a few months:

"SGML defines both a language definition system and a (simple) type
system."

"One of the results of SGML's dual nature as a type system and a language 
is the existence of attributes. Attributes are like properties
on objects. Context-free and regular grammars have no equivalent concept.
There is something called an attribute grammar, but it attaches
attributes to the parse tree, not to the language itself. An SGML
document is very much like a parse tree, which is why attributes exist
and work. But at the same time, they cause problems. Language-based query
languages must be artificially enhanced to handle attributes (and 
Murata's does not yet). Automata must be enhanced to validate them as 
well. Their inherent lack of ordering (like properties on an object) 
makes them difficult to translate into a regular-language based 
framework.

The question is: are they useful or convenient? If they are merely 
convenient because they can be typed quickly, then we can invent a
short-form syntax that makes elements just as quick to type. The
semantic of property-of can be emulated at the application level, 
as it is today when properties are too complex to be able to fit 
in attribute values.

The opposite view (which I have sometimes held in the past) is that 
SGML should move wholeheartedly to embrace the object model view 
of documents, and make attributes even more useful. Attributes could 
have content models, sub-elements, sub-attributes and so forth. In 
this view, sequence is only occasionally needed. Paragraphs should be 
ordered, but the title for a section could be encoded before
or after the content of the section, as long as the application can 
find it (based on its property name) when it needs it. This is similar 
to object-oriented programming or knowledge representation languages
where properties can usually be listed in any order. This stands in 
contrast to the document processing world, where ordering is almost 
always more important than property-of relationships.

Which view you hold probably depends on what your background is, and 
what problems you are trying to solve right now."

This dichotomy explains why some think that XML-Data "disappearing 
property" subtyping is good enough and others think: "that would only 
solve a tiny subset of the problem." XML-Data-style inheritance is
just fine for knowledge representation systems and almost useful for
documents.

Let me further say that if SGML and XML were to move wholeheartedly
into the language definition realm and out of the object/property 
definition world, then we could add language definition features 
that would allow the modelling of properties *at the application 
level*. For example, we could define content models that would 
allow:

<PERSON><PROP><HAIR>Blue</></>
        <PROP><WEIGHT>800k</></>
        <PROP><HEIGHT>2m</></>
</PERSON>

But not:

<PERSON><PROP><HAIR>Blue</></>
        <PROP><HAIR>Red</></>
</PERSON>

This is a contextual constraint on siblings and can be expressed in
forest-automaton-based DTDs like those described by Murata-san at SGML/XML
97. But in my opinion, the context-sensitive view of the world is not very
compatible with the idea of SGML *as* a type system. As soon as you start
introducing contextual constraints on the content of elements, you
severely weaken the concept of an "element type." What is the content
model of the element type above? It depends on what is happening around
it! Should each prop element have the same attributes? Maybe not: maybe it
should depend on its content.

Forest automata theory, on the other hand, is very compatible with
SGML being used *underneath* type systems, as I have demonstrated above.
The language enforces uniqueness and the application imposes a type system
on top of it.
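Below is a minimal Python sketch of that application layer. It assumes the PERSON/PROP vocabulary from the example above, written with full end tags because XML does not allow SGML's empty end tag `</>`. A plain XML DTD cannot state the sibling-uniqueness constraint, so the sketch checks it in code. It then builds an unordered property map, which is the "type" the application imposes. The function name `read_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Map each property name to its text, rejecting repeated properties.

    Expects a root element whose children are PROP wrappers, each holding
    exactly one property element such as HAIR or WEIGHT.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("PROP"):
        children = list(prop)
        if len(children) != 1:
            raise ValueError("each PROP must wrap exactly one property element")
        name, value = children[0].tag, (children[0].text or "")
        if name in props:
            # The sibling constraint that a plain DTD cannot express.
            raise ValueError(f"property {name} given more than once")
        props[name] = value
    return props


if __name__ == "__main__":
    ok = """<PERSON><PROP><HAIR>Blue</HAIR></PROP>
            <PROP><WEIGHT>800k</WEIGHT></PROP>
            <PROP><HEIGHT>2m</HEIGHT></PROP></PERSON>"""
    print(read_properties(ok))
    # {'HAIR': 'Blue', 'WEIGHT': '800k', 'HEIGHT': '2m'}
    bad = """<PERSON><PROP><HAIR>Blue</HAIR></PROP>
             <PROP><HAIR>Red</HAIR></PROP></PERSON>"""
    try:
        read_properties(bad)
    except ValueError as err:
        print(err)  # property HAIR given more than once
```

Because the result is a dictionary, reordering the PROP elements yields an equal value. In this sketch, order survives in the document language but is ignored by the application's type.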

We could also move the other way, wholeheartedly into the object
definition realm. Even then we would not add the ampersand operator;
we would simply allow attributes to have content models and structured
content. The problem with going in that direction is that SGML and
XML are first and foremost for defining languages, so you can't really
deprecate the features that make them powerful in that way. Without those
features, they are no more interesting than S-expressions.

I've been trying to figure out how to move forward both as a language
system and a type system for a while. I'm coming to think it is
impossible. I am leaning, lately, to the view that we should move to a
language-centric view of SGML/XML and move issues of "type" to a higher
layer. But I might think differently a few months from now...

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Perpetually obsolescing and thus losing all data and programs every 10
years (the current pattern) is no way to run an information economy or
a civilization." - Stewart Brand, founder of the Whole Earth Catalog
http://www.wired.com/news/news/culture/story/10124.html

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/