Blockland (was Re: XML Information Set Requirements, W3C Note 18-February-1999)

Sat Feb 20 08:32:25 GMT 1999

 (responding to Marc McM and Jeff S.)
From: Marc.McDonald at Design-Intelligence.com

>In considering a document to be a stream or information set, it allows
>a distributive organization over a network. Instead of requiring the
>entire 'document' to be transferred en-masse as a file, it can be done
>piece-wise over a stream. Consider this just-in-time manufacturing of
>the 'document'.

Err, isn't this what entities are about? An entity could be a file or a
"piece" on a stream.

I guess you mean entity "prefetch" or "fetch on declaration" rather
than "fetch on reference" for external entities. The XML SIG briefly
considered whether this should be part of XML (I remember raising it),
and I think the consensus was that entity management was an
implementation issue to be dealt with by PIs or some other layer.
XML does not mandate a specific entity-fetch policy, though
certainly the expectation is that entities are fetched on reference.
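
To make the two policies concrete, here is a rough sketch in Python. The
names EntityTable and fake_fetch are made up for illustration and do not
belong to any XML parser API; the fetch function stands in for whatever
really retrieves the entity's resource.

```python
from typing import Callable, Dict, List


class EntityTable:
    """Hypothetical table of external entity declarations with a fetch policy."""

    def __init__(self, fetch: Callable[[str], str], policy: str = "on-reference"):
        if policy not in ("on-reference", "on-declaration"):
            raise ValueError(f"unknown policy: {policy}")
        self._fetch = fetch
        self._policy = policy
        self._system_ids: Dict[str, str] = {}
        self._cache: Dict[str, str] = {}

    def declare(self, name: str, system_id: str) -> None:
        """Record a declaration; fetch at once only under 'on-declaration'."""
        self._system_ids[name] = system_id
        if self._policy == "on-declaration":
            self._cache[name] = self._fetch(system_id)

    def resolve(self, name: str) -> str:
        """Return the entity's replacement text, fetching it if not yet cached."""
        if name not in self._cache:
            self._cache[name] = self._fetch(self._system_ids[name])
        return self._cache[name]


fetched: List[str] = []


def fake_fetch(system_id: str) -> str:
    """Stand-in for a network or file fetch; it just records what was asked for."""
    fetched.append(system_id)
    return f"<content of {system_id}>"


lazy = EntityTable(fake_fetch)           # fetch on reference (the expected default)
lazy.declare("logo", "logo.xml")
lazy.declare("appendix", "appendix.xml")
lazy.resolve("logo")                     # only logo.xml has been fetched so far
```

With policy="on-declaration" both resources would be fetched as soon as
they are declared, whether or not anything ever refers to them.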

Fetching on declaration would be a bad default policy, because
many entities are just link ends, and will never be traversed. The
XLink "auto|user" option will bring a nice possibility for smarter
entity fetching. I would also like XLink to include a priority
indicator (e.g., a number) on links to indicate the fetching priority.

So a company logo can arrive first, the content second, and the
advertising last, for example; or so that data is only fetched after
the script to run the data has been fetched.
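
A minimal sketch of what such a priority indicator could drive, in
Python. The numeric priority is my wished-for feature, not anything in
XLink, and fetch_order and the link list are hypothetical; I assume a
lower number means "fetch earlier" and that ties keep document order.

```python
import heapq
from typing import List, Tuple


def fetch_order(links: List[Tuple[int, str]]) -> List[str]:
    """Return hrefs in fetch order: lowest priority number first.

    Each link is a (priority, href) pair. The position in the list is
    used as a tie-breaker so equal priorities keep document order.
    """
    heap = [(priority, position, href)
            for position, (priority, href) in enumerate(links)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, href = heapq.heappop(heap)
        order.append(href)
    return order


# Links in document order: advertising, logo, content.
links = [(3, "advert.xml"), (1, "logo.gif"), (2, "content.xml")]
order = fetch_order(links)  # logo first, then content, then advertising
```

The same ordering handles the script-before-data case: give the script
a lower number than the data it operates on.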

> Naturally, you can think of cases where only part of the entire
>document is needed. Subsetting of the document tree is one of the
>features of XSL.

XML deliberately culled three features from SGML to allow this:
    * a special kind of ENTITY attribute called CONREF: an element with
      that attribute could either have its content specified directly, or
      it could point to some other entity;
    * a special kind of ENTITY attribute called SUBDOC, which meant that
      the entity referred to was a document with its own DTD and local ID
      namespace;
    * data attributes, which are attributes on entity declarations: you
      could use them, conceivably, to specify the prefetching attributes
      for the entity's resource.

Actually, you could also use PIs for this, and even (yuck) special
elements at the head of your document (to simulate the data attributes).
HyTime and SMIL also could be used to support prefetching policy.
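
For instance, the PI route might look like the following Python sketch.
The "prefetch" PI target and its entity/priority pseudo-attributes are
invented for this example; nothing standardises them. The code only
relies on xml.dom.minidom keeping prolog PIs as children of the document
node.

```python
import re
from xml.dom import minidom

# Hypothetical PIs at the head of the document carrying prefetch hints.
SOURCE = """<?xml version="1.0"?>
<?prefetch entity="logo" priority="1"?>
<?prefetch entity="advert" priority="3"?>
<doc/>"""


def prefetch_hints(xml_text):
    """Collect pseudo-attributes from every <?prefetch ...?> PI before the root."""
    doc = minidom.parseString(xml_text)
    hints = []
    for node in doc.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "prefetch":
            # PI data is plain text, so the name="value" pairs are
            # picked out by hand rather than by the XML parser.
            hints.append(dict(re.findall(r'(\w+)="([^"]*)"', node.data)))
    return hints


hints = prefetch_hints(SOURCE)
```

An application that does not know the "prefetch" target simply ignores
these PIs, which is the attraction of keeping the policy out of XML
itself.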

>Unifying these 2 ideas provides a new use for a DTD. It is not only a
>means to describe the valid structure of a document, but now can
>advertise the information available. A site can be described as
>capable of providing information sets in a set of structures defined
>by DTDs (or their replacement). A consuming application could request
>information by a pattern or query which would return the desired
>subset of information.

This is more like what RDF is attempting: to provide a way to describe
a resource, so that applications can determine whether the schema being
used is one that they understand. This is what para 2 of
    http://www.w3.org/TR/PR-rdf-syntax/#intro
seems to suggest.

>In terms of architecture, it removes bottlenecks. Converting to a file
>model is expensive if the information is large and it can be used
>piecemeal on the other side. It is a worst-case solution. A
>demand-based stream model will create entire documents only if
>required by the ultimate consumer of the information and otherwise
>incrementally provide elements.

I think in your mind is the idea that there is only one big fat
document; hence "entire document". If elements are provided
"incrementally", each of them is a document. Together they are used for
a "publication", not for an "entire document".

Jeff Sussna writes:
>If you approach XML as a type system, the concept of document loses
>its first-class status (or at least should, in my opinion).

XML is not a type system. A document is a graph of elements, data,
comments and PIs with
   * an ID namespace
   * optionally some element type declarations
   * optionally some entity declarations and notation declarations
   * optionally namespace declarations which allow local type names to
     be qualified by a URI

In other words, the document is the block mechanism for metadata and
namespaces for a subtree of the entire hyper-document. XML is a
labelling notation, not a type system.
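
To illustrate the document as the scope for IDs, here is a small Python
sketch. It assumes, purely for the example, that any attribute named
"id" is of type ID (no DTD is read), and id_table is a made-up helper.

```python
import xml.etree.ElementTree as ET


def id_table(xml_text):
    """Build the ID namespace of one document: a map from ID value to element.

    Assumption for this sketch: every attribute named 'id' is an ID.
    A duplicate inside a single document is an error.
    """
    root = ET.fromstring(xml_text)
    table = {}
    for element in root.iter():
        identifier = element.get("id")
        if identifier is None:
            continue
        if identifier in table:
            raise ValueError(f"duplicate ID within one document: {identifier}")
        table[identifier] = element
    return table


# Two separate documents may both use "p1": each has its own ID namespace.
chapter = id_table('<chapter><para id="p1">First chapter</para></chapter>')
appendix = id_table('<appendix><para id="p1">Appendix</para></appendix>')
```

Put both paragraphs into one document and the second "p1" is rejected.
The document boundary is what makes the reuse legal.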

If the document loses its first-class status, which of these things
should be gotten rid of? Do you want arbitrary scoping of IDs, element
type declarations, entity declarations, notation declarations and
namespaces?  If so, you need some block mechanism to allow these.  If
not, what are you proposing: universal scope? All typing to be performed
out-of-band (i.e., external M)?

> It is
>interesting that the concept of document (even physical document as
>file) has crept into programming languages, and has caused problems
>there as well. The C language include directive is a physical rather
>than a logical mechanism. When you try to build a database-driven
>incremental build system, includes become problematic.

Ah, so your point is relational databases don't support XML entities.
Actually, relational databases support XML entities but not XML
elements. I hope we are not going to lurch into some discussion of the
benefits of relational models rather than network models...please please
please leave that for some other mailing list.

>I would like to encourage the XML community to 1) pay attention to the
>lessons of 30 years of development in the arena programming and type
>languages, and 2) not get bogged down by the historical baggage of the
>M in XML.

Has history shown us that blocks/functions/classes/modules/packages are
bad things? On the contrary, the lesson of the last 30 years of
development is that it is vital to large systems to be able to package
things neatly. I would say that XML needs to enhance the possibilities
of what a document is, not get bogged down by this historical baggage of
relational databases.

Indeed, history has shown us that when people try to avoid the M, they
make monolithic, proprietary, binary systems that are hard to maintain
or distribute, and which don't allow incremental enhancements or data
annotation, or which they suddenly find leave out major parts (notably,
internationalization), or which can only be used by gurus and those with
specialist tools. Look at the graveyard of compound document systems.

However, if you are also saying that there is enormous scope for
reconciling declarative programming and X*L, then I certainly agree with
you. I certainly expect to see some kind of Prolog-in-XML system (i.e.,
the logic programming language) sometime, for example; perhaps RDF
provides part of this.

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1