Namespaces, Architectural Forms, and Sub-Documents

Thu Feb 5 00:18:04 GMT 1998

David Megginson wrote:
> 
> XML documents may (and perhaps, usually will) contain non-XML objects
> such as wordprocessor documents, spreadsheets, MPEG clips, Java
> applets, audio sequences, and many others -- to date, thankfully, no
> one has proposed uuencoding any of these and dumping them inline between
> a start and end tag.

Maybe not on this mailing list, but come on over to "SGML-TOOLS"
(formerly LinuxDoc). :) :)

> Why should we treat an equation marked up in XML differently than an
> equation marked up in Microsoft Word?  It seems easier (from a user's
> perspective) to treat everything as objects, rather than defining one
> special case.  

We should treat them differently for three reasons:

#1. XML data is text, and thus makes a certain amount of "sense" inline.
If I embedded LaTeX in an XML document I would probably inline it rather
than refer to it, for the same reason. Word formulae are binary.

#2. XML has concepts such as validation and id-reference that depend on
data being logically inline.

#3. If we do not do this, I do not think that people will use subdocs.
They will probably just abandon validation or use XML-Data.
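
To make reason #2 concrete, here is a minimal Python sketch. It is not a
validating parser (the standard library's ElementTree does not check DTDs),
and the element and attribute names (doc, eqn, xref, subdoc, id, ref, href)
as well as the helper dangling_refs are hypothetical, invented only for
illustration. It suggests why id-reference checking is easy when the
equation is logically inline and stops working inside a single document
once the equation moves out into a sub-document.

```python
import xml.etree.ElementTree as ET

def dangling_refs(xml_text):
    """Return the 'ref' values that match no 'id' in the same document."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    refs = [el.get("ref") for el in root.iter() if el.get("ref") is not None]
    return [r for r in refs if r not in ids]

# Equation is logically inline: the reference resolves within one document.
inline_doc = """
<doc>
  <para>See <xref ref="eq1"/>.</para>
  <eqn id="eq1">E = mc^2</eqn>
</doc>
"""

# Equation moved to a sub-document: the container only holds a pointer
# (the 'subdoc' element and its 'href' attribute are assumptions).
container_doc = """
<doc>
  <para>See <xref ref="eq1"/>.</para>
  <subdoc href="equation.xml"/>
</doc>
"""

print(dangling_refs(inline_doc))     # no dangling references
print(dangling_refs(container_doc))  # 'eq1' no longer resolves locally
```

In the second case the reference can only be resolved by some additional
linking layer that knows how to open the sub-document.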

> Object-oriented programming has proven the value of
> encapsulation, and the compound-document idiom is standard on millions
> of desktops already, so we can hardly argue that subdocuments are an
> unfamiliar approach.

Not so. Word does not use externally embedded data by default. If you
create a table, formula or a graphic, it is inlined by default.
Typically you only externally link to a file if it already exists (e.g.
it has some meaning independent of this document). I think Microsoft
made the right choice there.

> I am a big fan of pragmatism on the implementation side, as people
> might have noticed from my postings on the design of AElfred; on the
> standards side, though, I wouldn't want to cripple a spec just to work
> around a temporary problem that will have to be solved anyway for
> non-XML objects.  

SGML is 12 years old. We are only marginally closer to having decent
tools that will manage this stuff for us. I personally have no faith
that they will arrive soon. I also think that we have 10 years of good
experience with what we need to guide our choices. Most major DTDs
incorporate ad hoc DTD modularity features. We know what they need to
make these features robust -- just namespace protection.

> SGML people will remember unfortunate features like
> SHORTREF, DATATAG, and OMITTAG -- included a little over a decade ago,
> likewise, for the sake of making things easy and working around
> temporary deficiencies in the available tools.  

Well, I still use two of those three features, so obviously the problems
with the tools have not sufficiently cleared up yet. It also isn't clear
to me if those features have helped or hurt SGML's popularity. OMITTAG
in particular is very widely used. Even HTML uses it.

>  >  * element type constrainability (how do I specify a SUBDOC root element
>  > type in a content model?)
> 
> Use HyTime (just joking).  Seriously, I cannot see that this is a
> worse case than not being able to use a DTD at all.  

It isn't. But in XML we do have DTDs and we want to use them for these
heterogeneous (not "compound") documents.

> The general idea
> of compound documents (Netscape with plug-ins, OLE documents, Andrew
> documents, or otherwise) is that you can plug in any object -- I had
> imagined that this was the goal of namespaces as well.  

I don't think so. In my paper I quoted from the XML Namespaces spec:

"We envision applications of XML in which a document instance may
contain markup defined in multiple schemas. These schemas may have been
authored independently. One motivation for this is that writing good
schemas is hard, so it is beneficial to reuse parts from existing,
well-designed schemas. Another is the advantage of allowing search
engines or other tools to operate over a range of documents that vary in
many respects but use common names for common element types."

The goal of combining schemas is central to the concept.
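
As a rough illustration of "markup defined in multiple schemas" in one
document instance, here is a small Python sketch. It uses the xmlns
attribute syntax that Python's ElementTree understands, which may differ
from the draft syntax under discussion at the time. The two namespace
URIs and all element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs and all element names are hypothetical.
BOOK_NS = "http://example.org/book"
TABLE_NS = "http://example.org/table"

doc = f"""
<b:chapter xmlns:b="{BOOK_NS}" xmlns:t="{TABLE_NS}">
  <b:title>Results</b:title>
  <t:table>
    <t:title>Measurements</t:title>
    <t:row><t:cell>42</t:cell></t:row>
  </t:table>
</b:chapter>
"""

root = ET.fromstring(doc)

# ElementTree expands each name to "{namespace-uri}local-name", so the two
# 'title' element types stay distinct even though they share a local name.
book_titles = [el.text for el in root.iter(f"{{{BOOK_NS}}}title")]
table_titles = [el.text for el in root.iter(f"{{{TABLE_NS}}}title")]

print(book_titles)
print(table_titles)
```

The point of the sketch is only the name protection: the table fragment
sits inline in the chapter, yet its names cannot collide with the book's.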

> In XML you can
> constrain the placement of pointers to external objects, at least.

Cold comfort. :)

>  >  * "content model communication" (how do I pass a %cell; content model
>  > into my table subdoc)
> 
> You're thinking of CALS here.  I'd suggest that we move away from the
> older SGML model of heavily parameterised DTDs (as from heavily
> #IFDEF'ed C header files): remember that one of the arguments for the
> namespace model is to reuse stylesheets and other processing
> specifications -- if a table model can vary its content unpredictably,
> then you will not be able to reuse stylesheets anyway.  

The formatting for the contents of table cells and for the shape of the
table can be specified independently. In HTML, (for example) essentially
anything can go in a table cell. The table formatter just figures it
out. A good stylesheet language will provide quite a bit of independence
between construction rules. Yes, we may need some conventions for more
complex combinations (e.g. metadata formatting conventions), but most
things will "just work."
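
A toy Python sketch of that independence, under the assumption that a
stylesheet can be modelled as a lookup from element type to a formatting
rule. This is not any real stylesheet language; the rule names, the
element names (table, row, cell, emph, eqn) and the output conventions
are all invented for illustration. The table rule knows only about rows
and cells, and whatever sits in a cell is handled by its own rule.

```python
import xml.etree.ElementTree as ET

def format_children(el, rules):
    """Format an element's text and children by dispatching on tag name."""
    parts = [el.text or ""]
    for child in el:
        # Unknown element types fall back to formatting their content.
        rule = rules.get(child.tag, format_children)
        parts.append(rule(child, rules))
        parts.append(child.tail or "")
    return "".join(parts).strip()

def format_table(el, rules):
    # The table rule decides the shape (rows, column separators) only.
    rows = []
    for row in el.findall("row"):
        cells = [format_children(cell, rules) for cell in row.findall("cell")]
        rows.append(" | ".join(cells))
    return "\n".join(rows)

def format_emph(el, rules):
    return "*" + format_children(el, rules) + "*"

def format_eqn(el, rules):
    return "[" + format_children(el, rules) + "]"

RULES = {"table": format_table, "emph": format_emph, "eqn": format_eqn}

doc = ET.fromstring(
    "<table>"
    "<row><cell>plain</cell><cell><emph>bold</emph> text</cell></row>"
    "<row><cell><eqn>x+1</eqn></cell><cell>done</cell></row>"
    "</table>"
)
output = RULES["table"](doc, RULES)
print(output)
```

Adding a rule for a new cell content type does not require touching
format_table, which is the kind of independence I mean.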

>  >  * ID linkage (even for simple links I must use some more advanced
>  > linking strategy)
> 
> HREFs would work fine -- HTML people are already used to
> 
>   <a href="book.html#chapter3">
> 
> so we should have no confusion here.  

>  >  * semantics (i.e. SUBDOC has none...you need VALUEREF or something else
>  > on top of subdoc)
> 
> I expect that XLL will provide mechanisms for expressing the 'embed'
> semantic.

Both of these proposals just add hassles to something that should be
simple.

> Furthermore, you have the
> advantage that your document's validity does not depend on its child
> objects (this is very important for document management in large,
> multi-author systems -- if subdocuments are atomic, then a change by
> one author to a table, for example, will not make the containing
> chapter invalid).  Again, as in programming, encapsulation will be a
> big win in the medium term.

Yes, there are occasions where this encapsulation is important and
useful. There are also times where it is not.

Let me put it this way: do you feel that the creators of DocBook, TEI
and HTML were mistaken in including table models rather than forcing
their users to use subdocs? If yes, then you have a very different idea
of usable DTD design than I do. If no, then I cannot understand why you
are opposed to making this process of including table models easier so
that you do not need people with brains the size of planets and a
serious commitment to DTD use to accomplish it.

All I am asking is to make this common DTD fragment combination idiom
simpler, more standard and more robust so that casual (and expert!)
users can whip up their own DTDs by combining fragments instead of
manually merging fragments, disambiguating names, adding architectural
forms etc. etc.
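
To show how mechanical the "disambiguating names" step is, here is a toy
Python sketch. It is not the namespace proposal and not any existing
tool: DTD fragments are modelled as plain dictionaries from element type
name to content model, and the dot-prefix convention and the helpers
qualify and combine are assumptions made up for this example.

```python
import re

# A name token in a content model; the lookbehind skips keywords such as
# #PCDATA and avoids matching in the middle of an already qualified name.
NAME = re.compile(r"(?<![#\w.-])[A-Za-z_][\w.-]*")

def qualify(fragment, prefix):
    """Prefix every element type declared in the fragment, including
    its uses inside the fragment's own content models."""
    def rename(match):
        name = match.group(0)
        return f"{prefix}.{name}" if name in fragment else name
    return {f"{prefix}.{name}": NAME.sub(rename, model)
            for name, model in fragment.items()}

def combine(**fragments):
    """Merge fragments, one prefix per fragment, so names cannot collide."""
    merged = {}
    for prefix, fragment in fragments.items():
        merged.update(qualify(fragment, prefix))
    return merged

# Two independently written fragments that both declare 'title'.
# The book fragment refers to the table by its qualified name.
book = {
    "chapter": "(title, (para | tbl.table)*)",
    "title": "(#PCDATA)",
    "para": "(#PCDATA)",
}
table = {
    "table": "(title?, row+)",
    "title": "(#PCDATA)",
    "row": "(cell+)",
    "cell": "(#PCDATA)",
}

merged = combine(book=book, tbl=table)
for name, model in merged.items():
    print(f"<!ELEMENT {name} {model}>")
```

Something this simple is roughly what DTD authors do by hand today when
they merge fragments; a standard mechanism would do it for them.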

 Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
