Documents and Document Fragments (Was RE: XML Information Set Requirements, W3C Note 18-February-1999)

James Tauber jtauber at
Sun Feb 21 04:02:26 GMT 1999

>is one document for an issue of a magazine, but it also 'contains' three
>more documents - one for each article in that issue. A closing element
>is therefore effectively the end of a document - even if that document
>may be inside another document (in the *logical* sense in which the word
>is used in the spec.) I don't think we therefore need the notion of a
>'document fragment', because in XML 1.0 terms a fragment *is* a

We must be careful when using the word "document" because it does have a
specific meaning in the XML spec. It is *NOT* true that a document may be
inside another document in the logical sense in which the word is used in
the spec.

An XML *document* (document in the spec sense) is PROLOG+ELEMENT+optional
You can't have PROLOGs in the content of elements; therefore you cannot have
documents in documents if you mean document in the spec sense. (Note that all
XML documents have a prolog, even if it is empty.)
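
A rough way to see this for yourself, as a minimal sketch only: hand a parser
some markup with pieces of a prolog (an XML declaration, a DOCTYPE) sitting
inside element content and watch it get rejected. The sample markup and the
helper name is_well_formed are made up for illustration; Python's standard
xml.etree.ElementTree is used here simply because it is a convenient
non-validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single XML document (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A document in the spec sense: a prolog (here an XML declaration)
# followed by exactly one root element.
magazine = (
    '<?xml version="1.0"?>'
    '<issue><article>One</article><article>Two</article></issue>'
)

# Prolog material placed inside element content: not allowed.
nested_decl = '<issue><?xml version="1.0"?><article>One</article></issue>'
nested_doctype = '<issue><!DOCTYPE article><article>One</article></issue>'

# The prolog may be empty, so a bare element on its own is still a document.
bare = '<article>One</article>'

for name, text in [
    ("magazine", magazine),
    ("nested_decl", nested_decl),
    ("nested_doctype", nested_doctype),
    ("bare", bare),
]:
    print(name, is_well_formed(text))
```

The two "nested" strings are the closest you can get to writing a document
inside a document, and neither is well-formed.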

This is exactly why I introduced the notion of an überdocument in a previous
thread on this list about my vision for an OS shell that treats a file
system as XML.

The reason I used the term überdocument is that terms like "document" and
"logical" have a particular meaning in XML. In the usual sense of the word
logical, an überdocument is logically but not physically a document. But in
the XML sense of the word logical, it is not logically a document at all.

>Whether this approach is of any use to you obviously depends on what you
>are doing. In our case we have stored all the data that makes up the
>articles and issues of a magazine in an object-type database, and then
>built interfaces onto it that allow any node and its children to be
>exported as XML, as if they were a document. This means that the notion
>of a document that we normally have (the physical one) is no good, since
>all 'documents' are dynamic and can start at any point in the tree.

The key is when you say "*as if* they were a document". An element within an
XML document can be cut off and promoted to full-blown "document" status, sure,
but while it is an element in an XML document it is not an XML document.
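
To make the "promotion" concrete, here is a small sketch under my own
assumptions: the magazine markup, the id attributes and the function name
promote are invented for illustration and are not taken from the system
described in the quoted message. The point is only that the article is an
element while it sits inside the issue, and becomes a document (with an empty
prolog) once it is serialized and parsed on its own.

```python
import xml.etree.ElementTree as ET

# One XML document: an issue containing two article elements.
issue = ET.fromstring(
    '<issue n="7">'
    '<article id="a1"><title>First</title></article>'
    '<article id="a2"><title>Second</title></article>'
    '</issue>'
)


def promote(element):
    """Serialize an element and its children as standalone XML text."""
    return ET.tostring(element, encoding="unicode")


# Inside the issue, this is just an element, not a document.
article = issue.find("article[@id='a2']")

# Cut off and serialized, the same markup parses as a document of its own,
# with <article> as the root element.
standalone = promote(article)
root = ET.fromstring(standalone)
print(root.tag, root.get("id"))
```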

I agree that the notion of a physical document is no good. I have actually
submitted a poster to WWW8 entitled "Rethinking websites as single
documents" which discusses many of these issues. Basically I argue that the
model of SGML document + stylesheet -> physical document shouldn't carry
over to the web. We should not see XML document + stylesheet -> web page,
because the interrelatedness of web pages on a particular site is far
greater than the interrelatedness between separate physical documents.
Instead I suggest the view XML document + stylesheet -> web site.

Actually I'm being sloppy here because for scalability reasons I suspect
larger sites will be represented by an überdocument rather than a single XML
document. Of course an überdocument can always be thought of as an XML
document (a logical one in the general sense of logical, not the XML spec
sense) just as an element can be made into a full-blown XML document.

>We now treat our web servers logically as 'XML servers', with either one
>massive document on or thousands of smaller ones, whichever way you want
>to slice it.

Yep. This is the idea I'm exploring. I'm just using the term "überdocument"
for the "one massive document".

James Tauber / jtauber at /
Associate Researcher, Electronic Commerce Network
Curtin University of Technology, Perth, Western Australia

Full-day XML Tutorial @ WWW8 :

