Semantics (was Re: Inheritance in XML [^*])

Paul Prescod papresco at technologist.com
Fri Apr 24 09:16:02 BST 1998


Gregg Reynolds wrote:
> 
> Replace "element", "element type", "notation", etc, with "foo", "bar",
> "baz", etc.  Do you still have semantics?  The English words used by the
> spec happen to have commonly understood "meanings" which, to my eye,
> color the discussion in unfortunate ways.  

Good question. If the words were replaced with arbitrary strings of
letters, I believe that the XML REC would still have semantics.

> And what exactly is the
> semantics of "semantic concepts like ... that are *described by* the
> syntax"?  Isn't that begging the question a little bit?

It would be, except that everybody *depends on* these semantics. The body
that "owns" XML (W3C) is producing other specs left, right and center
based on the described abstractions. For example, the DOM couldn't give
two farts about the syntax of a document. It moves seamlessly between HTML
syntax, XML syntax and could easily handle SGML (and probably VRML, or
even PDF) syntax too. It cares about the abstract structure -- the tree of
attributed-elements described by an XML document. If XML has no semantics,
then how can it describe an abstract tree? If it doesn't describe a tree,
then what the heck is the DOM based on?

So I'm convinced that the XML WG believes (unknowingly!) that XML has
semantics even as they deny it. The concrete step that they could take to
prove that I am wrong is to require the DOM to be defined in terms of
XML's syntax instead of the tree abstraction.
 
> "Colorless green ideas sleep furiously."  Chomsky, late 50s or
> thereabouts.  Adj adj n v adv.
> English has semantics.  The quoted sentence has syntax, not semantics.

Right: it doesn't mean anything. But an XML document does mean something:
it is a linearization of an attributed element tree. If it can be
interpreted as NOT a linearization of this abstraction, then the DOM
rather falls apart. And it isn't just the DOM: XLL, MathML, SAX etc. have
the same problem. They are all defined in terms of the abstraction, not
the syntax. You can't both depend on the abstraction and claim it doesn't
exist.
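
A minimal sketch of what "linearization of an attributed element tree"
means in practice, assuming Python's standard xml.etree.ElementTree as
the parser. The two sample documents and the as_tree helper are made up
for illustration; they are not taken from the REC or the DOM spec. The
two strings differ as character sequences, yet reduce to the same tree
of element names, attributes and children:

```python
import xml.etree.ElementTree as ET

def as_tree(elem):
    """Reduce an element to (name, attributes, children).

    Quote style, spacing inside tags, and empty-element syntax are
    gone by this point; only the abstract structure is left.
    """
    return (elem.tag, dict(elem.attrib), [as_tree(child) for child in elem])

# Hypothetical documents: same structure, different spelling.
doc_a = '<memo to="Gregg"><para/></memo>'
doc_b = "<memo   to = 'Gregg' ><para></para></memo >"

tree_a = as_tree(ET.fromstring(doc_a))
tree_b = as_tree(ET.fromstring(doc_b))

print(doc_a == doc_b)    # the syntax differs
print(tree_a == tree_b)  # the described tree does not
```

Anything written against as_tree's output, like anything written
against the DOM, never sees which of the two spellings was used.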

> So I'm left wondering why we don't have formal definition for all this
> stuff.  The editors of the standard look like a pretty impressive bunch,
> which leaves me all the more mystified as to why prose instead of a
> formal language.  

It's a W3C standard. Look at HTML 4.0 and tell us about "prose."
Unofficially, W3C standards are intended to be partially tutorials as well
as specifications. On the other hand, Dan Connolly pushed harder for
formality than anyone, so it is probably more the "web community" that
drives this than the W3C staff.

I think we could have done better in this particular area without going
completely over to formal notation. As David M. pointed out to me in an
off-line conversation, the REC is very explicit that a processor must pass
whitespace to an application, but doesn't say that it must pass other
character data along! I attribute this to a half-hearted attempt to leave
semantics out.
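
For what it is worth, here is a small sketch of what one processor
hands to an application, assuming Python's standard xml.sax module with
its default non-validating parser. The CharacterCollector class and the
sample document are hypothetical, and the result shows this parser's
behaviour only, not what the REC requires:

```python
import xml.sax

class CharacterCollector(xml.sax.ContentHandler):
    """Stands in for the "application": records every characters() event."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # A parser may split character data into several chunks,
        # so only the concatenation is meaningful.
        self.chunks.append(content)

# Hypothetical document with whitespace-only text and ordinary text.
sample = b"<doc>\n  <item>text</item>\n</doc>"

collector = CharacterCollector()
xml.sax.parseString(sample, collector)
received = "".join(collector.chunks)
print(repr(received))
```

Here the whitespace between the tags and the word "text" both reach
the handler through the same callback, which is the behaviour everyone
assumes even where the prose does not spell it out.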

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Perpetually obsolescing and thus losing all data and programs every 10
years (the current pattern) is no way to run an information economy or
a civilization." - Stewart Brand, founder of the Whole Earth Catalog
http://www.wired.com/news/news/culture/story/10124.html
