Another look at namespaces

Rick Jelliffe ricko at allette.com.au
Mon Sep 20 13:12:45 BST 1999


From: James Tauber <jtauber at jtauber.com>

Simon St.Laurent wrote:
>> Whether or not _The SGML Cookbook_ makes this claim that a
>> language is not constrained to a single grammar, I'll be making
>> that claim in my next book,
>
>Depending on what you mean by "not constrained to" and what type of
>languages you mean, I either completely agree or completely disagree
>with you :-)

My book, "The XML & SGML Cookbook", is the only book in print AFAIR
on the subject of markup schemas in general (Eve Maler's book is a
methodology for creating DTDs and also contains much in this regard;
Dave Megginson's book looks at specific DTDs and similarly has much
thought on these issues).

I am looking forward to Simon's book too, and I am sure that many
new ideas will come out of the XML Schema development effort. If
other authors have books on schemas in general (i.e., not the syntax)
then I am sure that the XML-DEV readers will appreciate a
notification of their existence.

My book has nothing on the subject of formal language theory, though
it certainly mentions that architectural forms allow parallel content
models which work together to describe the syntax of a language. (I.e.,
we could say that a document satisfies a language if it is both valid
against the direct DTD and valid when validated against the
architecture.) This is a clear example of a language requiring more
than one content model.
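
The "valid against both" idea can be sketched in a few lines of Python.
This is only an illustration under loud assumptions: the Element class,
the element and form names, and the reduction of a content model to a
set of allowed children are all hypothetical, and real architectural
form processing is considerably richer.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str   # element type in the direct DTD (hypothetical names)
    form: str   # architectural form the element is mapped to (hypothetical)
    children: list = field(default_factory=list)

# Hypothetical content models, simplified to "set of allowed children".
DIRECT_MODELS = {"memo": {"para"}, "para": {"emph"}, "emph": set()}
ARCH_MODELS = {"doc": {"block"}, "block": {"inline"}, "inline": set()}

def valid(element, models, key):
    """Check one element tree against one set of content models."""
    allowed = models.get(key(element))
    if allowed is None:
        return False
    return all(
        key(child) in allowed and valid(child, models, key)
        for child in element.children
    )

def satisfies_language(root):
    """The document is in the language only if both views are valid."""
    return (valid(root, DIRECT_MODELS, lambda e: e.name)
            and valid(root, ARCH_MODELS, lambda e: e.form))
```

A tree can pass the direct check and still fail the architectural one,
which is the sense in which neither set of content models alone
defines the language.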

I think Paul's original comment mentioning my book is because IMHO
markup languages and DTDs are not currently about data modeling
but instead are concerned with software engineering. So in my chapter
on this I mention that Yourdon and Constantine's ideas of "cohesion
and coupling" of software modules also apply well to DTDs, to guide
when to make elements sub-elements or siblings. (People who need
XML to be concerned with data modeling will naturally be confused
about when to use an attribute or an element, etc.)

So in another post I took this cohesion/coupling idea one step further
to say that schema languages should be modeling these cohesion/coupling
relationships; superficial grammars such as content models hide these,
no matter how much we valiantly use PEs to document them.

The example I gave was the html:p element. IMHO defining and
namespace-naming the html:p element in terms of its content model
gives the content model too much importance in describing the actual
schema of HTML. An html:p element is a block element
found in the body element of an HTML document wherever general
block elements can be placed, and it can contain mixed content, with
any of the inline elements that are found in mixed content underneath
the body element. That is the deeper syntax for html:p, as far as I
can see.

Now you could model this using a grammar by introducing
productions to name content models:

    html:p                  ::=  ( $generic-text-elements )*
    $generic-text-elements  ::=  #PCDATA | html:em | html:a | ...

This reflects the deeper structure better. Also, it makes it clear
that a change in generic-text-elements does not bubble up to
require some change in our understanding of html:p.
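
As a rough sketch of why the named production helps, here is the same
idea as a Python table. The GRAMMAR dictionary, the "$" convention for
named groups, and the function names are my assumptions for
illustration; the group lists only the three members named above, since
the rest of the list is elided.

```python
# Hypothetical grammar table: names starting with "$" are named
# content-model groups; everything else is an element type or #PCDATA.
GRAMMAR = {
    "html:p": ["$generic-text-elements"],
    "$generic-text-elements": ["#PCDATA", "html:em", "html:a"],
}

def allowed_children(name, grammar):
    """Expand named groups recursively into the set of allowed child tokens."""
    allowed = set()
    for token in grammar[name]:
        if token.startswith("$"):
            allowed |= allowed_children(token, grammar)
        else:
            allowed.add(token)
    return allowed

def is_valid_p(children, grammar):
    """html:p ::= ( $generic-text-elements )*, i.e. any number, any order."""
    allowed = allowed_children("html:p", grammar)
    return all(child in allowed for child in children)
```

Adding a member to "$generic-text-elements" changes what is_valid_p
accepts, but the "html:p" entry itself is never edited, which is the
"does not bubble up" property.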

The html:p element type is highly cohesive to the html:body
element and to general-inline-text element types. But the
general-inline-text element types are not highly cohesive to
paragraphs: they can also appear in many other elements. There is an
asymmetry to the cohesion that reveals that there is an intermediate
layer; a schema describes couplings, not cohesion, but it is good
software engineering practice to couple only highly cohesive modules.
Otherwise you over- or under-specify the structure. An appropriate
schema language allows fine-enough grained coupling of highly cohesive
modules (element types, here): for example, the use of EBNF by RDF.

To go back to my original statement that Paul was responding to:
it disputed "the idea..that a language is defined by a single set of
content models". Paul just wanted me to avoid repeating the
cohesion/coupling line and try to explain my viewpoint w.r.t.
formal languages (grammars), I think.

To reiterate, content models (i.e., the specific technology
in XML, regular expressions on the child axis where there
is no first-class way to label parts of content models)
describe superficial syntax only. They can describe the superficial
syntax, and define validation. But content models do not
necessarily define a markup language: they may hide as
much as they describe.
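
To make "regular expressions on the child axis" concrete, here is a
minimal Python sketch. The element names and the model itself are
hypothetical; the point is only that the check sees the sequence of
immediate children and nothing else.

```python
import re

# Hypothetical content model: one title followed by one or more items.
# Each child name is followed by a space so names cannot run together.
CONTENT_MODEL = re.compile(r"(title )(item )+")

def children_match(child_names):
    """Validate the immediate children of one element against the model."""
    sequence = "".join(name + " " for name in child_names)
    return CONTENT_MODEL.fullmatch(sequence) is not None
```

Nothing in the pattern says why title and item belong together, or
where else item may appear; that is the information a content model
leaves out.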

Rick Jelliffe

The XML & SGML Cookbook: Recipes for Structured Information
Charles F. Goldfarb Series on Structured Information Management
Prentice Hall, 650 pages + CD-ROM
ISBN 0-13-614223-0
