Compound Documents - necessary for success?

Mon Feb 1 11:39:49 GMT 1999

Marcus Carr wrote:

> With all due respect Roger, I think that the problem is that we're both asking
> questions and with few exceptions, nobody's answering. In my own case, I assume
> that this is due to the fact that:
>
> a)  creating compound documents with fragments using the same DTD as the parent
> may cause problems, but that there would always be a better way to handle such
> documents,
>
> b)  nobody's sure whether this will be a problem once XLink, XPointer, XML
> Fragments and X?? have spun their magic,
>
> c)  I've not clearly explained what I think the problem is,
>
> d)  I'm missing the point so totally that nobody feels that it even merits a
> reply,
>

I've been following this conversation with interest.  I'll hazard two 
guesses for the lack of answers.  First is (b) -- schemas and fragments are 
likely to answer some, but not all, of these questions.  Second is that 
these questions are on or ahead of the bleeding edge, so it's not 
surprising that nobody has answers yet.

I think that many of us have a notion of a "compound document" and "reusing 
schemas" but that, for most of us, these notions don't go much beyond the 
actual words and a hazy, utopian, AI-intensive dream that XML documents will 
somehow magically recombine themselves to solve all of our problems.

Let's look at a simple example.  Suppose we have a DTD for NBA players:

<!ELEMENT Players (Player*)>
<!ELEMENT Player (Name, Team)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Team (#PCDATA)>

Now suppose we also have a DTD for heights:

<!ELEMENT Height (Scalar, Units)>
<!ELEMENT Scalar (#PCDATA)>
<!ELEMENT Units (#PCDATA)>

What I think a lot of people would like is to automagically combine these 
two DTDs so that the following document is valid:

   <?xml version="1.0" ?>
   <!-- Note the illegal syntax.  There is
        currently no legal way to express this. -->
   <!DOCTYPE "Player" System="players.dtd" System="height.dtd">
   <Players>
      <Player>
         <Name>Joe Tall</Name>
         <Team>Iowa Talls</Team>
         <Height>
            <Scalar>3</Scalar>
            <Units>meters</Units>
         </Height>
      </Player>
   </Players>

This does not currently work for two reasons.  First, there is no way to 
express that a document is valid under two different DTDs.  Second, the 
above document is clearly not valid under either of the above DTDs.  To 
create such a document under the current spec, we need to rewrite 
players.dtd:

<!ELEMENT Players (Player*)>
<!ELEMENT Player (Name, Team, Height)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Team (#PCDATA)>
<!ENTITY % height SYSTEM "height.dtd">
%height;

There are two important things to notice here:

1) We got nothing for free.  That is, we had to write a new DTD because we 
have a new file type, and the new file type (DTD) is different from either 
of the previous file types.  In Roger's case, he needs to generate new DTDs 
dynamically, as was mentioned in an earlier message.

2) When we wrote the new DTD, a *human* made the decision about where 
<Height> was legal.  Anybody figuring out a foolproof way for a machine to 
do this usefully -- that is, without defining the content model of all 
elements as ANY -- will probably get a Turing Award for AI.
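
To make point 2 concrete, here is a rough Python sketch of the only merge a machine can do without human judgment: collect every element name from both DTDs and declare each one as ANY. The function name merge_as_any and the regular expression are my own illustrative assumptions, not part of any XML tool, and the parsing only copes with simple declarations like the ones above.

```python
import re

PLAYERS_DTD = """<!ELEMENT Players (Player*)>
<!ELEMENT Player (Name, Team)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Team (#PCDATA)>"""

HEIGHT_DTD = """<!ELEMENT Height (Scalar, Units)>
<!ELEMENT Scalar (#PCDATA)>
<!ELEMENT Units (#PCDATA)>"""

# Matches one simple element declaration and captures the element name.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+[^>]*>")


def merge_as_any(*dtds):
    """Hypothetical helper: merge DTDs by declaring every element as ANY."""
    names = []
    for dtd in dtds:
        for name in ELEMENT_DECL.findall(dtd):
            if name not in names:  # keep first-seen order, skip duplicates
                names.append(name)
    return "\n".join(f"<!ELEMENT {name} ANY>" for name in names)


merged = merge_as_any(PLAYERS_DTD, HEIGHT_DTD)
print(merged)
```

The merged DTD accepts the compound document, but it would equally accept a <Height> nested inside a <Name>, so all of the structure a human would have specified is lost.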

I don't know much about fragments, but they appear to have more to do with 
delivering pieces of an XML document than with assembling and validating 
pieces from multiple documents. In particular, requirement 12 of 
the XML Fragment Interchange Requirements states that, "Issues involved 
with the possible "return" of any fragment to its original context and the 
determination of the possible validity of the "returned" fragment in its 
original context are beyond the scope of this activity."  However, I have 
no doubt that the fragments project will turn up some interesting ideas 
about compound documents.

In schema languages, the current approach to the problem is to generalize 
the step:

<!ENTITY % height SYSTEM "height.dtd">
%height;

That is, to define a general syntax that makes it easy to reuse parts 
(generally elements and attributes, but possibly any part) of other schemas 
without bringing in all of the second schema. This may not sound too 
exciting, but it is very useful.

I personally think that anything more utopian than this is going to 
require, at the very least, a new definition of validity.  One such 
definition was that proposed in this thread: that each subdocument is 
validated under its own DTD and the overall document is not validated but 
merely checked for well-formedness.  This obviously is a specific case, but 
interesting nonetheless, as it suggests a useful application for partial 
validity.  (As an aside, anybody figuring out an algorithm by which 
compound documents such as that shown above are "valid" under multiple DTDs 
and still work with existing tools would significantly advance the field.  
Personally, I'm not too hopeful.)
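
The partial-validity idea from the thread can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the dictionary HEIGHT_MODELS is a toy stand-in for height.dtd (it handles plain child sequences and #PCDATA, nothing else), and the names conforms and partially_valid are hypothetical. The standard library parser used here checks well-formedness only; it does not validate against real DTDs.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for height.dtd: element name -> ordered child names,
# or the string "#PCDATA" for text-only elements.
HEIGHT_MODELS = {
    "Height": ["Scalar", "Units"],
    "Scalar": "#PCDATA",
    "Units": "#PCDATA",
}

DOC = """<Players>
  <Player>
    <Name>Joe Tall</Name>
    <Team>Iowa Talls</Team>
    <Height>
      <Scalar>3</Scalar>
      <Units>meters</Units>
    </Height>
  </Player>
</Players>"""


def conforms(elem, models):
    """Check one subtree against the toy content models."""
    model = models.get(elem.tag)
    if model is None:
        return False  # element is not declared in this "DTD"
    children = list(elem)
    if model == "#PCDATA":
        return not children  # text-only element must have no child elements
    if [child.tag for child in children] != model:
        return False  # children must match the declared sequence exactly
    return all(conforms(child, models) for child in children)


def partially_valid(text, sub_dtds):
    """sub_dtds maps a subdocument's root element name to its models.

    The whole document is only checked for well-formedness; each
    subtree whose root is registered is checked against its own models.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed
    for elem in root.iter():
        models = sub_dtds.get(elem.tag)
        if models is not None and not conforms(elem, models):
            return False
    return True


print(partially_valid(DOC, {"Height": HEIGHT_MODELS}))
```

Note that nothing here checks where <Height> appears inside <Player>; that is exactly the part of validity this definition gives up.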

So for the moment, don't be disappointed by the lack of answers.  You're 
just ahead of us.

-- Ron Bourret

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/