Inheritance in XML

Mon Apr 20 11:19:13 BST 1998

Robin,

You really hit the nail on the head with this post! These are exactly the
kinds of issues that I was having some trouble expressing in my previous
mail. I have read this thread with great interest, and it seems to me that
if we synthesize the discussion we are getting close to the heart of the
matter. Here is my attempt:

* Terminology *

I personally don't agree that there are carved-in-stone, well-understood
definitions for terms like "inheritance" and "subtyping" in XML. While there
surely are in certain specific contexts, we are talking about something new,
i.e. inheritance in XML, and what we really need to do is choose a term and
define it precisely. Does HyTime model inheritance? It does if my definition
of inheritance in XML corresponds to what HyTime does (it doesn't: see below).
Is "subtyping" a better term? No, because it doesn't have the same resonance
as the word "inheritance" among non-programmer types.

I'll make a first attempt:
"Inheritance in XML refers to the process of creating new element types that
duplicate the content model and attribute list of existing element types (in
the same or a separate "base" DTD), while extending these to include
additional attributes and/or content. As such, instances of the new element
types can be used wherever the base element type can be used, and can be
processed polymorphically by any external processor which knows about the
base element type."

* HyTime *

I read through Eliot's post and understood some of it. :-) I never meant
to question any design decisions made in the specification of HyTime. They
are all well-justified in the context which prevailed at the time. Despite
the fact that HyTime models derivation (I'll stay away from the i-word in
light of the definition given above) of instances and not of schemata, it
remains one of the few attempts that have
been made at deriving document types and as such is an extremely valuable
basis for the thinking about a true inheritance mechanism for XML. To meet
the definition I proposed above, this mechanism would have to extend the DTD
syntax or create a new one (see below). The goals and uses of HyTime
derivation are and will continue to be somewhat different from this; I was
only trying to point out that we can benefit greatly from the experience
gained from HyTime in thinking about XML inheritance.

* Semantics and XML *

In last month's Wired, XML made it into the "hype list" with the comment
that we crazy XML types are kidding ourselves because XML will never fly
without well-defined semantics. These sentiments were echoed by several
posts on this list. I agree 100%, but as several people pointed out,
there are already a lot of semantics associated with XML, to the extent that
there are semantics associated with the idea of a hierarchy and with the
HAS-A relationship. XML-Link and XSL introduce a very valuable additional
set of semantic relationships. We are all so excited about XML, as opposed
to Excel files, Postscript or what have you, because there are tools like
XML parsers, editors and browsers which have value across the whole range of
XML applications.

I can write an XML file, and to the extent that existing semantics are
sufficient, I can do useful work with this file. I can, for example, display
it as a hierarchy. I can't do anything at all with an Excel file unless I
have Excel. This doesn't eliminate the need to define the specific
semantics of a given schema. This can only be done with clear documentation,
as Paul pointed out. What we can do is capture the semantics expressed in
this documentation and use them as the basis for new schemata. Sure, a lot
of this can be done using "parameter-entity hacks", or by writing content
models out by hand, but this isn't going to be an effective way to bring XML
to the masses.

The whole discussion about XML semantics is very apt in this context
precisely because inheritance is so important for making XML really useful.
Let me give an example implied by Peter (in reference to the agglutination
of DTDs for nuclear power plant software). Let's say that I am developing an
advanced medical diagnosis system based on chemical analysis of blood
samples. Part of the application is a hardware device which looks for
specific molecules in the sample and displays them on a monitor in 3D. I
decide to use CML to model these molecules, but I need to add additional
attributes and content to the molecule description which are specific to my
application. With the kind of inheritance mechanism I am talking about, I
could download a CML viewer and use it "out of the box" to display the
molecules, while still passing the entire XML structure (with my additional
information) to the application which attempts to create a diagnosis. Without
XML inheritance, I will probably "break" the viewer, so I find myself wading
through and adapting a lot of Java code. At this point I start wondering why
I decided to use XML in the first place...
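
To make the scenario a little more concrete, here is a rough Python sketch of
what polymorphic processing could look like. Everything in it is an assumption
made up for illustration: the element names "molecule", "atom" and
"bloodMolecule" are not real CML, the DERIVED_FROM table stands in for
information that a schema with inheritance would have to supply, and
view_molecule stands in for the off-the-shelf viewer.

```python
import xml.etree.ElementTree as ET

# Hypothetical derivation table: derived element type -> base element type.
# With an inheritance mechanism, this would be read from the schema.
DERIVED_FROM = {"bloodMolecule": "molecule"}


def is_a(element_type, base_type):
    """Return True if element_type is base_type or is derived from it."""
    while element_type is not None:
        if element_type == base_type:
            return True
        element_type = DERIVED_FROM.get(element_type)
    return False


def view_molecule(element):
    """Stand-in for a viewer that only knows the base type "molecule".

    It reads the attribute and children it understands and ignores the rest.
    """
    if not is_a(element.tag, "molecule"):
        raise ValueError("not a molecule: " + element.tag)
    atoms = [atom.get("symbol") for atom in element.findall("atom")]
    return element.get("id"), atoms


# A derived element carrying application-specific additions
# (the "concentration" attribute and the "marker" child are invented).
DOCUMENT = """
<bloodMolecule id="m1" concentration="0.4">
  <atom symbol="O"/>
  <atom symbol="H"/>
  <atom symbol="H"/>
  <marker>elevated</marker>
</bloodMolecule>
"""

sample = ET.fromstring(DOCUMENT)
print(view_molecule(sample))
```

The viewer never needs to know about "bloodMolecule"; it only asks whether the
element type derives from one it understands, while the diagnosis application
can still read the extra attribute and child from the same tree.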

* DTDs and schemata *

Francois Chahuneau's article makes a very effective argument for why we need
to extend or replace DTD syntax (thanks Robin). XML-Data is a reasonable
attempt to do so, but it is understandably controversial because it is such
a radical departure from the existing syntax. I quite like the idea of an
alternate, XML-based schema syntax, but the real lesson of XML-Data is that
creating an effective inheritance mechanism isn't rocket science. All that
is really needed is a keyword that says "this element type is derived from
that element type". Something like:

<!element dog extends animal...

...where the subsequent content model and attribute list are understood as
being extensions to those of the base element type. The only other issue is
whether more complex handling of the content model is needed.
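
As a sketch of how little machinery the simplest reading of "extends" would
need, the following Python fragment derives a new element type by appending
content and attributes to those of the base type, then writes the result out
as ordinary declarations. The in-memory schema layout, the function names
derive and to_dtd, and the sample content of "animal" and "dog" are all
hypothetical; nothing here is taken from XML-Data or any existing tool.

```python
# Hypothetical in-memory schema: element type -> (content particles in
# order, attribute declarations).
BASE_SCHEMA = {
    "animal": (["name", "age?"], {"id": "ID #REQUIRED"}),
}


def derive(schema, new_type, base_type, extra_content, extra_attributes):
    """Return a copy of schema in which new_type is derived from base_type.

    New content is appended after the base content model and new
    attributes are added to the base attribute list.
    """
    base_content, base_attributes = schema[base_type]
    clash = set(base_attributes) & set(extra_attributes)
    if clash:
        # Assumption: a derived type may add attributes but not redefine them.
        raise ValueError("already declared on base: %s" % sorted(clash))
    derived = dict(schema)
    derived[new_type] = (
        base_content + list(extra_content),
        {**base_attributes, **extra_attributes},
    )
    return derived


def to_dtd(schema, element_type):
    """Serialize one element type as plain ELEMENT/ATTLIST declarations."""
    content, attributes = schema[element_type]
    lines = ["<!ELEMENT %s (%s)>" % (element_type, ",".join(content))]
    for name, declaration in attributes.items():
        lines.append("<!ATTLIST %s %s %s>" % (element_type, name, declaration))
    return "\n".join(lines)


schema = derive(BASE_SCHEMA, "dog", "animal",
                ["breed"], {"pedigree": "CDATA #IMPLIED"})
print(to_dtd(schema, "dog"))
```

Because the derived type only ever adds to the base, anything that was valid
for the base content and attributes is still present in the same order.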

* Content model *

XML-Data (if I understand correctly) simply tacks any new content for a
derived element type at the end of the base content model. A valid question,
addressed briefly in my previous post, would be whether more robustness is
needed in modifying the existing content model. Steve and Robin both
mentioned this aspect as well; one of the most powerful features of
SGML/XML, as compared with OO languages, is the fact that content is
ordered. It would be nice, therefore, to take this into account in any
putative inheritance mechanism. Things like SGML exclusions don't fit the
above-mentioned definition of inheritance, for the reasons mentioned by
Robin (and others) in his post.

Having given this some more thought, I don't see any practical way to insert
new content in the middle of an existing content model. Maybe someone
cleverer than I has an idea about how this might be done (and whether it is
really useful). In the meantime, one useful approach might be to at least
enable new content to be added at the beginning of the base content model by
adding a #BASECONTENT keyword which is replaced by the base content model in
the derived element type description:

<!element dog extends animal (breed,#BASECONTENT,fleas*)>

This would simply mean that the breed element precedes the content of the
base element type, which is then followed optionally by some flea elements.
This approach is probably sufficient, since other modifications to the base
content model could be taken into account in the design phase of the base
schema (i.e. by breaking up monolithic elements, if necessary).
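
One way to picture the #BASECONTENT idea is as a preprocessing step that
rewrites a derived declaration into a plain one. The Python sketch below does
this by simple text substitution. The base content model given for "animal",
the function name expand, and the rule that the keyword must appear exactly
once are my own assumptions for the sake of the example.

```python
import re

# Hypothetical base declarations: element type -> content model.
BASE_MODELS = {"animal": "(name,age?)"}

# Matches the proposed, non-standard form:
#   <!element NAME extends BASE (CONTENT)>
DERIVED_DECL = re.compile(
    r"<!element\s+(?P<name>\S+)\s+extends\s+(?P<base>\S+)\s+"
    r"\((?P<content>.*)\)\s*>",
    re.IGNORECASE,
)


def expand(declaration, base_models):
    """Rewrite a derived declaration as an ordinary ELEMENT declaration."""
    match = DERIVED_DECL.fullmatch(declaration.strip())
    if match is None:
        raise ValueError("not a derived declaration: " + declaration)
    content = match.group("content")
    if content.count("#BASECONTENT") != 1:
        # Assumption: the keyword must appear exactly once.
        raise ValueError("expected exactly one #BASECONTENT")
    # The base model keeps its own parentheses, so it stays a single group.
    base_model = base_models[match.group("base")]
    content = content.replace("#BASECONTENT", base_model)
    return "<!ELEMENT %s (%s)>" % (match.group("name"), content)


print(expand("<!element dog extends animal (breed,#BASECONTENT,fleas*)>",
             BASE_MODELS))
```

Keeping the base model as one parenthesized group means that new content can
only go before or after it, which is exactly the restriction described above.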

* What now? *

More tricky than any of these technical issues is the question of what, if
anything, could be done to promote a mechanism of this sort. Obviously this
would require a change to the XML spec as well as modification to all
existing tools which process DTDs, so it's a pretty big deal. I wonder if
anyone besides me thinks that a simple mechanism like this would make sense.
If so, is there any room in the XML standards process to discuss a change of
this type at some point in the future (certainly not for XML 1.0)?

Cheers,

Matthew

-----Original Message-----
From: Robin Cover <robin at ACADCOMP.SIL.ORG>
To: xml-dev at ic.ac.uk <xml-dev at ic.ac.uk>
Date: Saturday, April 18, 1998 7:37 PM
Subject: Re: Inheritance in XML

>> Re: Subject: Re: Inheritance in XML (was Re: Problems parsing XML)
>> Date: Sat, 18 Apr 1998 08:49:07 +0100
>> Reply-To: "Martin Bryan" <mtbryan at sgml.u-net.com>
>
>>>What is missing is that we can't define one class (element type) as a
>>>subtype of another.
>
>> In SGML you can use exclusions to make an element a true subclass of
>> another:
>>
>> <!ELEMENT X  (%Y-contents;) -(a|b|c)>
>>
>> providing a, b and c are optional components within the model for Y.
>> Unfortunately XML dropped this useful option from the set of SGML
>> facilities it inherited
>>
>> Martin Bryan
>
>Martin, I wish I could believe this were true and useful.  It seems
>that we confront here one of the several troublesome mismatches
>between OO database modeling and SGML/XML markup, with respect to
>the simple analogy:
>
>OODB           SGML/XML Markup
>
>class defn     element declaration
>class name     element type
>object         element
>attribute      attribute
>
>If we accept this crude analogy, and accept SGML's notion of an
>"attribute" as a name-value pair, then the hope of creating subclasses
>through SGML/XML element declarations appears slim.  Appears "to me" I
>should say: I would welcome comments from the experts.
>
>For starters, subclassing normally would mean further specialization
>by the addition (possibly 'plus subtraction') of properties, viz., of
>attributes.  Formally, then, an SGML element declaration can't do the
>work: it would need to be an ATTLIST declaration.  But then we face
>the problem that you can't model a complex attribute with the SGML
>'attribute' anyway (if you want any validation): the "value" in
>'(name-)value' is a flat/string in SGML, at least in the literal sense.
>
>Of course, one can (and we all do) model "real" attributes using SGML
>elements -- since we have no realistic alternative -- but that creates
>other problems for the notion of using SGML element decls as a
>subclassing mechanism.  One such problem is that (real) attributes are
>unordered.  The straightforward way to model an object/element with
>(some optional, some required) attributes a, b, c, d, e, and f would
>seem to be: (a* & b? & c? & d & e?), but SGML/XML notions of
>prescribing order in the serialization are fairly strong, and XML
>won't even allow the use of the AND connector to indicate what I
>plainly mean in this sample assertion. (Perhaps Steph Tryphonas has
>written a program by now to convert all content models using AND to
>use only OR, without sacrificing any integrity constraints on
>occurrence and sequence).  In any case, the impulse toward
>serialization in SGML -- at least in practice, given tools that force
>end users to reckon with (arbitrary non-intuitive) "order" based upon
>sequence rules in content models -- tends to work against the easy use
>of SGML elements to model attributes.
>
>Even apart from these mismatches between "object" modelling
>and SGML/XML encoding, I question whether
>
>  " <!ELEMENT X  (%Y-contents;) -(a|b|c)> "
>
>creates a useful "true subclass."  Why would one want to create a
>subclass based upon the subtraction of optional "attributes"
>(subelements)?  I think that would make it a superclass in many OO
>systems.  In this connection, one might be inclined to argue that the
>treatment of "content" as a special attribute is unfortunate, at least
>from the perspective of data modelling, where "part-whole" has no
>quintessential role vis-a-vis "is-a" or "has-a" or "kind-of" or
>"points-to"...  At which point, others would quickly point out that
>they think it's specious to be talking about object modeling in terms
>of SGML-based markup languages anyway, since "these languages can
>neither formally express nor enforce semantic integrity constraints
>which are so critical to good object modelling..."
>
>I think this all leads me in the direction of favoring the efforts
>at defining other schema languages (beyond SGML/XML DTD syntax),
>granting that the validation of instances against their schemas,
>if/when critical, will need to be done outside the framework of
>the SGML/XML "parser/processor" as defined.  I have little doubt
>that someone as brilliant as Eliot can show how the desired
>objectives might be met through architecture processing by an
>appropriate architecture engine; I don't know whether this is the
>"best" path in all cases, or whether SGML/XML users will want to
>deal with all the layers of indirection that architectures seem to
>want.
>
>I hope that experts with some years of experience in OO systems
>will contribute their insights to the new "schema" projects.
>
>-rcc
>
>-------------------------------------------------------------------------
>Robin Cover                    Email: robin at acadcomp.sil.org
>6634 Sarah Drive
>Dallas, TX  75236  USA          >>> The SGML/XML Web Page <<<
>Tel: +1 (972) 296-1783 (h)     http://www.sil.org/sgml/sgml.html
>Tel: +1 (972) 708-7346 (w)
>FAX: +1 (972) 708-7380
>=========================================================================
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
