Inheritance in XML (was Re: Problems parsing XML)

Paul Prescod papresco at
Fri Apr 17 19:23:20 BST 1998

Matthew Gertner:
>Nevertheless, inheritance of some sort is absolutely vital if XML is to
>fulfill its promise. If we can't produce standard DTDs which can be
>extended, *without* modifying the base DTD, then many of the advantages of
>XML go out the window.

Michael Kay wrote:
> I agree that this is central. Let's leave identity out of the discussion,
> as that does, I think, fall into the XML Linking domain, and concentrate
> on what I prefer to call subtyping.

You act as if this is just a terminological difference, but it isn't. He
is talking about one thing and you are talking about another. What he
describes, "producing standard DTDs which can be extended *without*
modifying the base DTD", is inheritance. It can be implemented right now
through parameter entity hacks, and it is not subtyping. You, on the other
hand, seem to be talking about subtyping:

> I know some people will disagree, but the way I use XML, a DTD is a
> schema, an element definition in a DTD is a class, a document is a
> database, and an element within a document is an instance of a class.
> What is missing is that we can't define one class (element type) as a
> subtype of another.

The only reason that the concepts *even intersect* is that:

a) subtyping without inheritance is often painful and leads to code
duplication. I claim that architectural forms and Java "interfaces" are
often painful for exactly this reason. Of course in [SG|X]ML,
inheritance can be hacked with parameter entities, which is something
HyTime does for its architectures. (Also, HyTime can only be thought of
as subtyping if you use it in a restricted form...)

b) inheritance without subtyping is only occasionally useful. I can't
remember the last time I used "private inheritance" in C++ and I don't
even remember right now if Java supports it.

But the fact that the two concepts work well together does not make them
synonyms. They are not.
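
To make the "parameter entity hack" concrete, here is a minimal sketch in
Python. It is a toy stand-in for entity expansion, not a real DTD parser.
It relies on the XML rule that the first declaration of an entity is the
binding one, so an extension can override a hook by declaring it before
the base DTD is read. The names chapter.extra, BASE_DTD, EXTENSION and
expand are all hypothetical, chosen only for illustration.

```python
import re

# Toy stand-in for parameter-entity expansion; NOT a real DTD parser.
# It handles one declaration per line and double-quoted entity values only.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r'%([\w.-]+);')

def expand(dtd_text):
    """Expand %name; references, letting the first declaration of a name win."""
    entities = {}
    output = []
    for line in dtd_text.splitlines():
        line = line.strip()
        if not line:
            continue
        decl = PE_DECL.fullmatch(line)
        if decl:
            name, value = decl.groups()
            value = PE_REF.sub(lambda m: entities[m.group(1)], value)
            # setdefault keeps an earlier declaration and ignores later ones.
            entities.setdefault(name, value)
        else:
            output.append(PE_REF.sub(lambda m: entities[m.group(1)], line))
    return output

# Hypothetical base DTD: it leaves a hook that is empty by default.
BASE_DTD = '''
<!ENTITY % chapter.extra "">
<!ELEMENT chapter (title, para+ %chapter.extra;)>
'''

# Hypothetical extension: read first, so its declaration is the binding one.
EXTENSION = '''
<!ENTITY % chapter.extra ", sidebar*">
'''

print(expand(BASE_DTD))
print(expand(EXTENSION + BASE_DTD))
```

The base text is never edited; the extension only supplies an earlier
declaration. Note that this reuses declarations and says nothing about
whether the extended chapter can stand in for the base one, which is the
subtyping question.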

> The main thing that's tricky is that you can get the "is-a" the wrong way
> round. If a PREFACE is-a-kind-of CHAPTER, that means you can find
> anything (elements, attributes) in a PREFACE that you can find in a chapter,
> and more besides. 

No it doesn't. If PREFACE is-a-kind-of CHAPTER then source code designed
to handle chapters should work with prefaces. That means that PREFACE
must either directly describe a *subset* of the language described by
CHAPTER (i.e. have a constrained content model) or PREFACE must provide
"some mechanism" for transforming its content into a language
understandable by CHAPTERs. In real world documents, we often want to be
able to have subtypes that are also extensions, which means that we need
to define some transformational system (as archforms do).
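
The subset requirement can be sketched in Python by treating a content
model as a regular expression over child element names. Everything here is
an assumption made for illustration: the one-letter codes, the three
content models, and the bounded search, which can only find
counterexamples up to a fixed length and does not prove language
inclusion.

```python
import re
from itertools import product

# Hypothetical content models, written as regexes over one-letter child codes:
# t = title, p = para, s = section, a = acknowledgements.
CHAPTER = re.compile(r"tp*s*")
PREFACE_RESTRICTED = re.compile(r"tp+")   # a constrained CHAPTER
PREFACE_EXTENDED = re.compile(r"tp+a?")   # adds an element CHAPTER lacks

ALPHABET = "tpsa"

def counterexample(sub, sup, max_len=5, transform=lambda seq: seq):
    """Return a child sequence accepted by `sub` whose transform `sup` rejects.

    Only sequences up to max_len are tried, so None means "no counterexample
    found within the bound", not a proof of inclusion.
    """
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            seq = "".join(letters)
            if sub.fullmatch(seq) and not sup.fullmatch(transform(seq)):
                return seq
    return None

def drop_acknowledgements(seq):
    """A deliberately weak 'transformation': delete the extra element."""
    return seq.replace("a", "")

print(counterexample(PREFACE_RESTRICTED, CHAPTER))
print(counterexample(PREFACE_EXTENDED, CHAPTER))
print(counterexample(PREFACE_EXTENDED, CHAPTER, transform=drop_acknowledgements))
```

The restricted preface needs no help, the extended one is rejected as it
stands, and it passes again once a transformation maps it back into the
chapter language.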

This transformational question is exactly what makes subtyping with
extension very tricky. Subtyping without extension is trivial. This is
why I have stepped back from the question of subtyping with extension
and am investigating transformation languages. In particular I am right
now looking at Forest Automata theory and a transformation language
designed by Makoto Murata.

> It also means you can reduce a PREFACE to a CHAPTER
> by removing these extra bits. I'm not entirely sure what "removing the extra
> bits" means: for example should it remove elements that cannot occur
> in a CHAPTER, or should it just remove the tags that surround those
> elements? This tends to show up the lack of semantics in the object
> model underlying XML.

That's exactly right. Your confusion is my confusion. The only way out
is through transformation languages -- either simple, relatively weak
ones like those provided by architectural forms, or more powerful (and
more complicated? I don't know yet?) ones like those described by
Murata-san in his various Principles of Documentation papers. They are

Unless you are much smarter than me, you will probably not find these
light reading, but my hope is that the concepts can be simply expressed
in a nice syntax in much the same way that regular expressions hide the
nastiness of DFAs. There is in fact such a thing as a regular tree
expression that is quite analogous to a regular expression. I don't yet
know if these can be hooked up to an easy-to-use (non-programmable!)
transformation language.
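
For a rough feel of how regular expressions carry over to trees, here is a
minimal Python sketch of a bottom-up automaton. It is not Murata's
formalism, only an illustration in the same spirit: each element gets a
state based on its label and on whether the sequence of its children's
states matches an ordinary regular expression. The RULES table and the
helper names state_of and accepts are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: (element label, regex over child states, resulting state).
# Each state is a single letter so a sequence of child states is a plain string.
RULES = [
    ("title",   r"",      "T"),
    ("para",    r"",      "P"),
    ("section", r"TP+",   "S"),
    ("chapter", r"TP*S*", "C"),
]

def state_of(element):
    """Assign a state bottom-up, or return None if no rule applies."""
    child_states = []
    for child in element:
        state = state_of(child)
        if state is None:
            return None
        child_states.append(state)
    word = "".join(child_states)
    for label, pattern, state in RULES:
        if element.tag == label and re.fullmatch(pattern, word):
            return state
    return None

def accepts(xml_text, final_state="C"):
    """True if the root element of the document reaches the final state."""
    return state_of(ET.fromstring(xml_text)) == final_state

GOOD = "<chapter><title/><para/><section><title/><para/></section></chapter>"
BAD = "<chapter><para/><title/></chapter>"
print(accepts(GOOD), accepts(BAD))
```

Character data is ignored here, and only acceptance is shown; a
transformation language would additionally have to say what each matched
subtree is rewritten into.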

Sorry for the brain dump. I'm late for a meeting.

Paul Prescod  -

[Woody Allen on Hollywood in "Annie Hall"]
Annie: "It's so clean down here."
Woody: "That's because they don't throw their garbage away. They make 
        it into television shows."
