Compound Documents - necessary for success?
Ronald Bourret
rbourret at ito.tu-darmstadt.de
Mon Feb 1 11:39:49 GMT 1999
Marcus Carr wrote:
> With all due respect Roger, I think that the problem is that we're both asking
> questions and with few exceptions, nobody's answering. In my own case, I assume
> that this is due to the fact that:
>
> a) creating compound documents with fragments using the same DTD as the parent
> may cause problems, but that there would always be a better way to handle such
> documents,
>
> b) nobody's sure whether this will be a problem once XLink, XPointer, XML
> Fragments and X?? have spun their magic,
>
> c) I've not clearly explained what I think the problem is,
>
> d) I'm missing the point so totally that nobody feels that it even merits a
> reply,
>
I've been following this conversation with interest. I'll hazard two
guesses for the lack of answers. First is (b) -- schemas and fragments are
likely to answer some, but not all, of these questions. Second is that
these questions are on or ahead of the bleeding edge, so it's not
surprising that nobody has answers yet.
I think that many of us have a notion of a "compound document" and "reusing
schemas" but that, for most of us, these notions don't go much beyond the
actual words and a hazy, utopian, AI-intensive dream that XML documents will
somehow magically recombine themselves to solve all of our problems.
Let's look at a simple example. Suppose we have a DTD for NBA players:
<!ELEMENT Players (Player*)>
<!ELEMENT Player (Name, Team)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Team (#PCDATA)>
Now suppose we also have a DTD for heights:
<!ELEMENT Height (Scalar, Units)>
<!ELEMENT Scalar (#PCDATA)>
<!ELEMENT Units (#PCDATA)> <!-- This would be feet, inches, meters, angstroms, etc. -->
What I think a lot of people would like is to automagically combine these
two DTDs so that the following document is valid:
<?xml version="1.0" ?>
<!-- Note the illegal syntax. There is
currently no legal way to express this. -->
<!DOCTYPE "Players" System="players.dtd" System="height.dtd">
<Players>
  <Player>
    <Name>Joe Tall</Name>
    <Team>Iowa Talls</Team>
    <Height>
      <Scalar>3</Scalar>
      <Units>meters</Units>
    </Height>
  </Player>
</Players>
This does not currently work for two reasons. First, there is no way to
express that a document is valid under two different DTDs. Second, the
above document is clearly not valid under either of the above DTDs. To
create such a document under the current spec, we need to rewrite
players.dtd:
<!ELEMENT Players (Player*)>
<!ELEMENT Player (Name, Team, Height)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Team (#PCDATA)>
<!ENTITY % height SYSTEM "height.dtd">
%height;
There are two important things to notice here:
1) We got nothing for free. That is, we had to write a new DTD because we
have a new file type, and the new file type (DTD) is different from either
of the previous file types. In Roger's case, he needs to generate new DTDs
dynamically, as was mentioned in an earlier message.
2) When we wrote the new DTD, a *human* made the decision about where
<Height> was legal. Anybody figuring out a foolproof way for a machine to
do this usefully -- that is, without defining the content model of all
elements as ANY -- will probably get a Turing Award for AI.
Although I don't know much about fragments, they appear to have more to do
with the delivery of pieces of an XML document than with assembling and
validating pieces from multiple documents. In particular, requirement 12 of
the XML Fragment Interchange Requirements states that "Issues involved
with the possible 'return' of any fragment to its original context and the
determination of the possible validity of the 'returned' fragment in its
original context are beyond the scope of this activity." However, I have
no doubt that the fragments project will turn up some interesting ideas
about compound documents.
In schema languages, the problem currently being addressed is how to
generalize the step:
<!ENTITY % height SYSTEM "height.dtd">
%height;
That is, to define a general syntax that makes it easy to reuse parts
(generally elements and attributes, but possibly any part) of other schemas
without bringing in the whole of the other schema. This may not sound too
exciting, but it is very useful.
I personally think that anything more utopian than this is going to
require, at the very least, a new definition of validity. One such
definition is the one proposed in this thread: each subdocument is
validated under its own DTD, and the overall document is not validated but
merely checked for well-formedness. This is obviously a special case, but
an interesting one, as it suggests a useful application for partial
validity. (As an aside, anybody who figures out an algorithm by which
compound documents such as the one shown above are "valid" under multiple
DTDs and still work with existing tools would significantly advance the
field. Personally, I'm not too hopeful.)
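The partial-validity idea proposed in this thread can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: Python's standard-library parser does not validate against DTDs, so the content models from height.dtd are hand-translated into regular expressions over child element names (each followed by a comma), and the names HEIGHT_MODELS, valid_subtree, and partially_valid are hypothetical. The whole document is only checked for well-formedness; each Height subtree is checked against the height models. The illegal DOCTYPE line is left out because a real parser would reject it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hand translation of height.dtd: element name -> regular
# expression over its child element names, each name followed by a comma.
HEIGHT_MODELS = {
    "Height": r"Scalar,Units,",  # <!ELEMENT Height (Scalar, Units)>
    "Scalar": r"",               # (#PCDATA): no child elements allowed
    "Units": r"",                # (#PCDATA): no child elements allowed
}

COMPOUND = """<Players>
  <Player>
    <Name>Joe Tall</Name>
    <Team>Iowa Talls</Team>
    <Height><Scalar>3</Scalar><Units>meters</Units></Height>
  </Player>
</Players>"""


def valid_subtree(elem, models):
    """Check elem and all of its descendants against the content models."""
    model = models.get(elem.tag)
    if model is None:
        return False  # element is not declared in this subdocument's DTD
    children = "".join(child.tag + "," for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(valid_subtree(child, models) for child in elem)


def partially_valid(text, sub_root, models):
    """Well-formedness for the whole document, validity for each subdocument."""
    try:
        root = ET.fromstring(text)  # raises ParseError if not well-formed
    except ET.ParseError:
        return False
    return all(valid_subtree(sub, models) for sub in root.iter(sub_root))


print(partially_valid(COMPOUND, "Height", HEIGHT_MODELS))  # True
```

A Height whose Units precedes Scalar, or a document with a mismatched end tag, makes partially_valid return False, while Players, Player, Name, and Team are never checked against any DTD. Text content is not examined, which mirrors the fact that (#PCDATA) says nothing about what the characters are.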
So for the moment, don't be disappointed by the lack of answers. You're
just ahead of us.
-- Ron Bourret
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/