Inheritance in XML

Paul Prescod papresco at technologist.com
Mon Apr 20 15:05:44 BST 1998


Matthew Gertner wrote:
> 
> * Terminology *
> 
> I personally don't agree that there are carved-in-stone, well-understood
> definitions for terms like "inheritance" and "subtyping" in XML. 

I don't think that anyone claimed that there is a well-understood
definition for "inheritance" in any context -- even OO. But to be
consistent with English, it must have something to do with "getting
something for free." In the XML context the most obvious thing would be
declarations.

Subtyping is different. Subtyping comes straight from mathematics and is
as old as logic (at least). A type defines a set of objects. A subtype
describes a subset of those objects. Simple and precise.
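
To make the set view concrete, here is a minimal sketch in Python (not anything proposed in this thread). It assumes a "type" can be modelled as a membership test over a small, finite universe of values; the names `is_subtype`, `integer`, `even` and `universe` are all hypothetical.

```python
from typing import Callable, Iterable

# Assumption: a "type" is just a predicate that says whether an object belongs to it.
Type = Callable[[object], bool]

def is_subtype(sub: Type, sup: Type, universe: Iterable[object]) -> bool:
    """True if every object in `universe` accepted by `sub` is also accepted by `sup`."""
    return all(sup(x) for x in universe if sub(x))

# Two hypothetical types: integers, and the even integers among them.
integer: Type = lambda x: isinstance(x, int) and not isinstance(x, bool)
even: Type = lambda x: integer(x) and x % 2 == 0

universe = [-3, -2, 0, 1, 2, 7, 10, "ten", 2.5]

print(is_subtype(even, integer, universe))   # True: every even value is an integer
print(is_subtype(integer, even, universe))   # False: -3 is an integer but not even
```

Nothing here mentions how `even` was written. It happens to reuse `integer`, but the subset test would give the same answer if it did not.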

> Is
> "subtyping" a better term? No, because it doesn't have the same resonance as
> the word "inheritance" among non-programmer types.

I don't know why you think that. Non-programmer types are likely to balk
at either word, but at least subtyping is shorter, and can be precisely
defined. Anyhow, it is not as if the words were interchangeable. You
can't pick and choose from words that already have meanings.

> I'll make a first attempt:
> "Inheritance in XML refers to the process of creating new element types that
> duplicate the content model and attribute list of existing element types (in
> the same or a separate "base" DTD), while extending these to include
> additional attributes and/or content. As such, instances of the new element
> types can be used wherever the base element type can be used, and can be
> processed polymorphically by any external processor which knows about the
> base element type."

ACK! This definition was proven inadequate in the OO software world
around a decade ago. Both C++ and Java allow subtyping without
inheritance, and C++, Sather and Eiffel allow inheritance without
subtyping (I suppose to get that in Java, you would have to use
delegation). If we are going to borrow ideas from OO, then we should at
least use the updated, modern ideas, not those that were accidentally
confused in Simula 67 (and have been confused in programmers' minds ever
since).
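
The two combinations can be sketched in Python, used here only as a neutral illustration and not as what C++, Java, Sather or Eiffel actually do. The sketch assumes structural typing via `typing.Protocol` for "subtyping without inheritance" and plain delegation for "reuse without subtyping"; the names `Titled`, `Report`, `heading` and `Stack` are hypothetical.

```python
from typing import Protocol

class Titled(Protocol):
    """A type described only by what its members can do."""
    def title(self) -> str: ...

class Report:
    # No base class at all: Report is acceptable wherever a Titled is
    # expected purely because it has a matching title() method.
    def __init__(self, name: str) -> None:
        self.name = name

    def title(self) -> str:
        return self.name.upper()

def heading(doc: Titled) -> str:
    """Works on any Titled object, whatever it inherits from."""
    return "== " + doc.title() + " =="

class Stack:
    """Reuses list's implementation by delegation without being a list."""
    def __init__(self) -> None:
        self._items = []          # borrowed implementation, not a base type

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        return self._items.pop()

print(heading(Report("intro")))   # == INTRO ==
s = Stack()
s.push(1)
s.push(2)
print(s.pop())                    # 2
print(isinstance(s, list))        # False: reuse did not create a subtype
```

`Report` is substitutable without inheriting anything, and `Stack` gets list's code "for free" without being usable where a list is expected.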

The first major problem with your definition actually has nothing to do
with the inheritance/subtyping conundrum. The biggest problem is that if
you "extend" a content model, you are making a more flexible language,
which *cannot* be processed polymorphically by an external processor
which knows only about the base element type:

<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT MY-TITLE (#PCDATA|IMG|FOO|BAR)>

Now imagine software that generates a TOC from titles, presuming them to
be strictly textual. What does it do with images in titles?
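
A minimal sketch of that failure, using Python's standard `xml.etree.ElementTree`. The function `toc_entry` and the sample titles are hypothetical; the only assumption is a TOC generator written against the `(#PCDATA)` content model.

```python
import xml.etree.ElementTree as ET

def toc_entry(title: ET.Element) -> str:
    # Written against <!ELEMENT TITLE (#PCDATA)>: it assumes the whole
    # title is the element's leading text and never looks at children.
    return (title.text or "").strip()

plain = ET.fromstring("<TITLE>Wing assembly</TITLE>")
mixed = ET.fromstring('<MY-TITLE>Wing <IMG src="wing.gif"/> assembly</MY-TITLE>')

print(toc_entry(plain))   # Wing assembly
print(toc_entry(mixed))   # Wing
print(len(mixed))         # 1 child element the generator never expected
```

The text after the image is stored as the tail of `IMG`, so the generator silently truncates the title. Nothing in the "extended" content model says whether that is the right behaviour.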

Now let's talk about inheritance and subtyping. This is not a merely
theoretical issue. It has important practical implications. The most
interesting, important application of subtyping is allowing divergent
evolution of compatible schemas. This is why architectural forms were
invented. But for this to work, subtyping *must* be unhitched from
inheritance.

Suppose that Boeing has a content model:

<!ELEMENT AIRPLANE-DOC - - (FRONT, MIDDLE, REAR)>

Bombardier has a similar model (after all, they are modelling the same
thing):

<!ELEMENT AIRCRAFT-DOC - - (COCKPIT, STORAGE, TAIL)>

How does inheritance help me to unify these models and validate that
they are actually isomorphic? It doesn't. This is a job for subtyping. I
can also come up with examples where inheritance is more useful without
subtyping, but you can always achieve this through other means (which is
why Java does not support it).
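
A rough sketch of the unification, loosely in the spirit of architectural forms but not the actual mechanism. It assumes both vendors map their own element names onto one shared, ordered content model; the shared names (`NOSE-SECTION` and so on), the pairing of FRONT with COCKPIT etc., and the function `conforms` are all hypothetical.

```python
# Hypothetical shared "architecture": an ordered content model of three sections.
ARCHITECTURE = ("NOSE-SECTION", "BODY-SECTION", "TAIL-SECTION")

# Each vendor keeps its own names and declares how they map onto the architecture.
# The pairing below is an assumption made for illustration.
BOEING = {"FRONT": "NOSE-SECTION", "MIDDLE": "BODY-SECTION", "REAR": "TAIL-SECTION"}
BOMBARDIER = {"COCKPIT": "NOSE-SECTION", "STORAGE": "BODY-SECTION", "TAIL": "TAIL-SECTION"}

def conforms(content_model, mapping):
    """True if the local content model, renamed through `mapping`, equals ARCHITECTURE."""
    try:
        renamed = tuple(mapping[name] for name in content_model)
    except KeyError:
        return False              # an element with no architectural meaning
    return renamed == ARCHITECTURE

print(conforms(("FRONT", "MIDDLE", "REAR"), BOEING))          # True
print(conforms(("COCKPIT", "STORAGE", "TAIL"), BOMBARDIER))   # True
print(conforms(("TAIL", "STORAGE", "COCKPIT"), BOMBARDIER))   # False: wrong order
```

Neither vendor's declaration copies anything from the other or from the shared model. Both are simply checked against it, which is a subtyping relationship with no inheritance involved.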

Inheritance is a code reuse mechanism, so you can always emulate it with
cut and paste (or with parameter entities or, in a programming language,
with delegation). Subtyping is a type system extension. It is completely
different.

I can inherit stuff from my dad without becoming a dad. I can choose to
be a dad without inheriting anything either from my dad, or the "class
dad". They are different things.
 
> * DTDs and schemata *
> 
> Francois Chahuneau's article makes a very effective argument for why we need
> to extend or replace DTD syntax (thanks Robin). XML-Data is a reasonable
> attempt to do so, but it is understandably controversial because it is such
> a radical departure from the existing syntax. 

I think that XML-Data should be controversial because from my reading it
is just a mix and match combination of interesting features that people
want in schemas without a coherent theory of how they should fit
together. You can't just put 10 smart people into a working group and
have them throw in their good ideas and expect a coherent result.
XML-Data's inheritance mechanism does not take advantage of XML's nature
as a sequence-oriented language for encoding documents. In other words,
it doesn't solve the fundamental problem.

> I quite like the idea of an
> alternate, XML-based schema syntax, but the real lesson of XML-Data is that
> creating an effective inheritance mechanism isn't rocket science. All that
> is really needed is a keyword that says "this element type is derived from
> that element type". Something like:
> 
> <!element dog extends animal...

Sure. This isn't rocket science. But it doesn't solve the fundamental
problem at all. You haven't defined what happens to "BARK" sub-elements
in "DOG". Without that definition, any software dealing with animals
will croak on dogs. Which is exactly what subtyping was supposed to
avoid....
 
> More tricky than any of these technical issues is the question of what, if
> anything, could be done to promote a mechanism of this sort. Obviously this
> would require a change to the XML spec as well as modification to all
> existing tools which process DTDs, so it's a pretty big deal. I wonder if
> anyone besides me thinks that a simple mechanism like this would make sense.
> If so, is there any room in the XML standards process to discuss a change of
> this type at some point in the future (certainly not for XML 1.0)?

Personally, I have yet to see a decent proposal for inheritance and
subtyping in SGML. Coming up with one is difficult, which is why I've
spent the last year thinking about it. Dan Connolly has also spent
several years thinking about it. I know that there are many others in
the same boat. I think that we agree that it doesn't make sense to adopt
a solution that solves only 5% of the problem, which is why you will see
resistance to anything like that.

We will know that we have a complete solution to the problem when HTML
6.0 can be described as a subtype of HTML 5.0, and its behaviour in a
"subtype aware" HTML 5.0 browser is predictable and well-defined.
Further, HTML 6.0 must not just extend HTML 5.0 in trivial ways such as
new <HEAD> tags. It must actually have new elements, with new content
models mixed in at all levels. As I said, inheritance-at-the-end solves
about 5% of this problem.

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

"Journalism is good if you follow the rules. Don't allow the human 
rights groups to spoil your profession" 
    - Col. Godwin Ugbo of the  Nigerian military dictatorship
